What is Stable Diffusion?
Stable Diffusion (SD) is a text-to-image generative AI model that was launched in 2022 by Stability AI, a UK-based company that builds open AI tools.
Stable Diffusion generates images in seconds from text descriptions, which are known as prompts. Beyond text-to-image generation, it also handles prompt-guided tasks such as inpainting, outpainting, and image-to-image generation.
As a deep learning model, Stable Diffusion was trained on billions of text-image pairs, which is how it learned to generate images from text alone.
- How Does Stable Diffusion Work?
- Is Stable Diffusion Open-Source?
- How to Use Stable Diffusion & Pricing
- How to Get More Stable Diffusion Free Credits
- Use Cases of Stable Diffusion
- Stable Diffusion Community
- Are Stable Diffusion Images Copyrighted? Can It Be Used for Commercial Purpose?
How Does Stable Diffusion Work?
Stable Diffusion is a deep learning generative AI model. To understand how it works, you need to know what deep learning, generative AI, and latent diffusion models are.
Deep learning (DL) is a specialized type of machine learning (ML), which is a subset of artificial intelligence (AI).
Deep learning loosely mimics how the human brain works: data passes through layers of connected nodes, much like signals passing between neurons, and the network learns patterns from the given datasets.
Unlike traditional machine learning, deep learning can analyze unstructured data, such as text, images, videos, and audio, with the help of a layered structure of algorithms known as an artificial neural network (ANN).
Generative AI is a technique that creates new content, such as text, images, audio, video, or code, based on existing datasets. It leverages deep learning algorithms to learn the underlying patterns in the training data and produce similar content.
Some examples of generative AI include GPT-3, Stable Diffusion, DALL·E 2, and Midjourney.
Now that you know about deep learning and generative AI, let’s move on to the architecture behind Stable Diffusion.
Diffusion models are powerful generative models. Two processes take place in a diffusion model: forward diffusion and reverse diffusion.
In forward diffusion, the model adds Gaussian noise to the input image in many small steps. The image gradually moves from its original state to pure noise, and because each step depends only on the result of the previous one, the sequence forms a Markov chain.
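The forward process can be sketched in a few lines of plain Python. This is only an illustration: the function names are made up for this example, and the linear noise schedule (1,000 steps, variances from 0.0001 to 0.02) is a commonly used choice that I am assuming here, not something specific to Stable Diffusion.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances for each step, rising linearly (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_step(x_prev, beta, rng):
    """One Markov-chain step: x_t = sqrt(1 - beta) * x_{t-1} + sqrt(beta) * noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_diffusion(x0, betas, rng):
    """Apply every step in order; each step only sees the previous result."""
    x = list(x0)
    for beta in betas:
        x = forward_step(x, beta, rng)
    return x

def signal_fraction(betas):
    """Share of the original signal amplitude left after all steps."""
    return math.sqrt(math.prod(1.0 - b for b in betas))

rng = random.Random(0)
pixels = [0.8, -0.3, 0.5, 0.1]   # a tiny "image", values scaled to [-1, 1]
betas = linear_beta_schedule(1000)
noisy = forward_diffusion(pixels, betas, rng)
remaining = signal_fraction(betas)
print(f"signal left after {len(betas)} steps: {remaining:.4f}")
```

With this schedule, less than 1% of the original signal amplitude survives the last step, so the result is practically pure Gaussian noise.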
In reverse diffusion, the added noise is removed step by step. Removing the noise means recovering the pixel values of the input image. During this process, the image moves from the noisy state back toward the original state with the help of a convolutional neural network (CNN) known as a U-Net, which predicts the noise to remove.
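To see why predicting the noise is enough to recover the image, here is a toy sketch. It assumes the standard formulation in which a noisy sample is a weighted mix of the original and Gaussian noise. The names are hypothetical, and an "oracle" that already knows the true noise stands in for the trained U-Net, so this shows the arithmetic only, not a real denoiser.

```python
import math
import random

def add_noise(x0, alpha_bar, noise):
    """Noisy sample: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise, with a = alpha_bar."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * n
            for v, n in zip(x0, noise)]

def estimate_original(x_t, alpha_bar, predicted_noise):
    """Invert the formula above using the predicted noise."""
    return [(v - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for v, n in zip(x_t, predicted_noise)]

rng = random.Random(1)
x0 = [0.8, -0.3, 0.5, 0.1]
alpha_bar = 0.25   # assumed value: a quarter of the signal variance is left
noise = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = add_noise(x0, alpha_bar, noise)

# Stand-in for the U-Net: an oracle that returns the true noise.
recovered = estimate_original(x_t, alpha_bar, predicted_noise=noise)
print([round(v, 6) for v in recovered])
```

A real U-Net only approximates the noise, which is one reason the reverse process runs over many small steps instead of a single jump.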
If the input were audio or video, a different type of network, such as a recurrent neural network (RNN), might be used instead.
Latent Diffusion Model (LDM)
The latent diffusion model (LDM) is the actual mechanism behind Stable Diffusion. It follows the same process as a diffusion model but in a more efficient manner.
A standard diffusion model works directly on pixels, which makes the input data large and the computation expensive.
To overcome these computational issues, Robin Rombach and his colleagues proposed the latent diffusion model (LDM).
The main difference between the diffusion model and the latent diffusion model is that the LDM works with compressed data for faster and more efficient computation.
The latent diffusion model deals with compressed images instead of regular images, so it no longer has to process large inputs.
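A quick calculation shows how much smaller the compressed data is. The numbers below are assumptions based on the first Stable Diffusion releases (a 512×512 RGB image, an 8× reduction per side, and 4 latent channels); the article itself does not state them, and the function names are hypothetical.

```python
def element_count(height, width, channels):
    """Number of values the model has to process."""
    return height * width * channels

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the compressed representation (assumed 8x reduction, 4 channels)."""
    return height // downsample, width // downsample, latent_channels

image_elems = element_count(512, 512, 3)
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent_elems = element_count(lat_h, lat_w, lat_c)
ratio = image_elems / latent_elems

print(f"pixels: {image_elems}, latent values: {latent_elems}, ratio: {ratio:.0f}x")
```

Under these assumptions, the diffusion process handles about 48 times fewer values per image than it would in pixel space.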
Working with compressed images while keeping as much information as possible leads to faster generation because the data size is much smaller. In the overview diagram of the latent diffusion architecture, the input image X is encoded into a compressed representation in the latent space Z. The encoder extracts the essential information from the input and passes it to the latent space. The encoder and its matching decoder form an autoencoder. GAN-style adversarial training is used to keep its reconstructions sharp, but the encoder itself is not a GAN.
Then, the latent representation gradually turns into noise following the Markov chain until it becomes the noisy latent vector ZT (the latent after T noising steps). After that, denoising takes place in a U-Net, a denoising network with skip connections. The text prompt steers this step through cross-attention layers, which is what Q, K, and V (query, key, and value) refer to in the diagram.
Lastly, the decoder D reconstructs the image from the denoised latent, producing the final high-quality image.
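The Q, K, V attention mentioned above is the step where the prompt influences the image. Below is a minimal sketch of scaled dot-product cross-attention in plain Python. The tiny vectors and function names are invented for illustration; in a real model, the queries come from the image latents, and the keys and values come from the encoded prompt.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, computed one query row at a time."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        output.append(mixed)
    return output

# Made-up numbers: 2 latent positions attend over 3 prompt tokens.
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
prompt_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
attended = cross_attention(latent_queries, prompt_keys, prompt_values)
print(attended)
```

Each latent position ends up with a weighted blend of the prompt token values, with more weight on the tokens whose keys match its query.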
Is Stable Diffusion Open-Source?
As mentioned earlier, Stability AI, the company that released Stable Diffusion (SD), made Stable Diffusion open-source. This means that anyone can view, edit, and build models upon the SD code, which is available on Hugging Face as well as in a Google Colab notebook.
Most of the companies behind generative AI models or AI art generators have likely used open-source datasets to train their models. Likewise, Stable Diffusion used the LAION-5B dataset, which consists of 5.85 billion multilingual CLIP-filtered image-text pairs. LAION-5B is an open-source dataset that was created by the Large-scale Artificial Intelligence Open Network (LAION).
How to Use Stable Diffusion
Due to its open-source nature, Stable Diffusion can be run locally. I will show you how to get started with Stable Diffusion for free.
Meanwhile, you can check out the Stable Diffusion public demo on Hugging Face. Enter the prompt in the given box and click “Generate image”. After clicking the “Generate image” button, you will get 4 AI-generated images.
If you don’t know how to prompt, check out our famous article “Prompt Engineering Made Easy” or click on any of the examples given in the Hugging Face SD demo page. You can also access the “Advanced options” given.
Stable Diffusion hosted on Hugging Face is just a demo, and it runs on CPU rather than GPU. Therefore, you should expect slower generation.
To experience Stable Diffusion to its fullest, with more control and rapid generation, you need to use it on DreamStudio.
DreamStudio is an in-browser graphical user interface (GUI) for Stable Diffusion, developed by Stability AI. It is easy to use and gives a seamless experience for everyone.
To use Stable Diffusion for free on DreamStudio, you need a Discord account. Sign up with your Discord account on DreamStudio to get 200 free generations to play with the text-to-art generator Stable Diffusion.
Once you have signed up for DreamStudio, you can start prompting. You can also use the free prompt generators available for Stable Diffusion.
Enter the prompt in the given box and click “Dream”. Wait a few seconds to get the high-quality AI-generated image.
You can find all the AI-generated images on the “History” tab. To check the free credits, click the “Profile icon” and click “Membership”.
For better output, try the additional settings located on the right side of the DreamStudio application.
If you have used all of your free credits, you will need to pay for a membership. See below for pricing details.
Stable Diffusion Pricing
Stable Diffusion is free forever if you run it on your local device (see “Run Stable Diffusion on your Local Machine” below).
However, using Stable Diffusion on DreamStudio is not always free. If you run out of your free credits, you need to pay for the membership.
While Stability AI provides free open-source AI models, its services, like DreamStudio, are neither open-source nor free.
Taking complex things and simplifying them comes with a price.
Currently, the pricing structure of DreamStudio is very simple. However, Stability AI might change the pricing in the future. Once you run out of the 200 free generations, you need to buy a membership that costs $10. You can perform approximately 1,000 generations with this membership.
Normally, each generation consumes one credit. However, the credits per image can vary from 0.2 to 28.2, depending on additional settings such as image resolution and step count.
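To get a feel for what these numbers mean in dollars, here is a small calculation. It assumes that the $10 membership buys about 1,000 credits and that a default generation costs exactly one credit. The figures come from the paragraphs above; the function names are just for this example.

```python
MEMBERSHIP_PRICE_USD = 10.0     # price quoted above
MEMBERSHIP_GENERATIONS = 1000   # "approximately 1,000 generations"

def price_per_credit():
    """Dollar value of one credit, assuming 1 default generation = 1 credit."""
    return MEMBERSHIP_PRICE_USD / MEMBERSHIP_GENERATIONS

def cost_of_images(num_images, credits_per_image=1.0):
    """Estimated cost in USD for a batch of images."""
    return num_images * credits_per_image * price_per_credit()

default_cost = cost_of_images(100)                          # default settings
cheapest = cost_of_images(100, credits_per_image=0.2)       # lightest settings
priciest = cost_of_images(100, credits_per_image=28.2)      # heaviest settings

print(f"100 images: ${cheapest:.2f} to ${priciest:.2f} (default ${default_cost:.2f})")
```

So 100 images cost about $1 at default settings, but anywhere from roughly $0.20 to $28.20 depending on resolution and step count.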
If you are an organization or an individual looking for a Stable Diffusion API, here is the pricing for it.
- 10,000 generations for $100
- 50,000 generations for $500
- 100,000 generations for $1,000
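A quick check of the price per generation for each API tier, using only the figures listed above (the variable names are my own):

```python
# generations -> price in USD, as listed above
API_TIERS = {10_000: 100, 50_000: 500, 100_000: 1_000}

def unit_price(generations, price_usd):
    """Price of a single generation in USD."""
    return round(price_usd / generations, 6)

unit_prices = {gens: unit_price(gens, usd) for gens, usd in API_TIERS.items()}
flat_rate = len(set(unit_prices.values())) == 1

for gens, price in unit_prices.items():
    print(f"{gens:>7} generations: ${price:.2f} each")
```

Every tier works out to $0.01 per generation, so at these prices there is no volume discount for buying a larger package.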
For more details on pricing, check out the Stable Diffusion Pricing article.
Run Stable Diffusion on your Local Machine
Since Stable Diffusion is an open-source model, you can run it on your Linux, Windows, or Mac computer for free. However, your machine needs to fulfill the following system requirements:
- NVIDIA GPU with at least 6 GB VRAM (4 GB VRAM is enough but will be slower)
- 10 GB of local storage space
- Linux, Windows (8, 8.1, 10, or 11), or macOS
Since SD uses complex algorithms, it requires a lot of computational power. Hence, it relies heavily on GPUs.
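If you want to check a machine against the requirements listed above before installing anything, a small helper like the following can do it. This is a sketch with hypothetical function names. Python's standard library cannot read GPU memory, so the VRAM figure is something you look up and pass in yourself, while free disk space is read with `shutil.disk_usage`.

```python
import shutil

MIN_VRAM_GB = 4           # "enough but will be slower"
RECOMMENDED_VRAM_GB = 6   # recommended minimum from the list above
MIN_FREE_DISK_GB = 10

def vram_verdict(vram_gb):
    """Classify a GPU by its memory size, using the thresholds above."""
    if vram_gb >= RECOMMENDED_VRAM_GB:
        return "recommended"
    if vram_gb >= MIN_VRAM_GB:
        return "works, but slower"
    return "below the listed minimum"

def free_disk_gb(path="."):
    """Free space on the drive that holds `path`, in GB."""
    return shutil.disk_usage(path).free / 1024 ** 3

def can_run_locally(vram_gb, disk_gb):
    """True if both the VRAM and storage requirements are met."""
    return vram_gb >= MIN_VRAM_GB and disk_gb >= MIN_FREE_DISK_GB

my_vram_gb = 8   # example value; replace with your GPU's VRAM
print(vram_verdict(my_vram_gb), can_run_locally(my_vram_gb, free_disk_gb()))
```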
The tools below are the easiest way to run Stable Diffusion without frequent trips to the command line or a local web server.
- UnstableFusion for Linux, Windows, and Mac
- Diffusion Bee for Mac
UnstableFusion is a GUI for Stable Diffusion like DreamStudio but for desktop. It comes with all the essential features, which include inpainting, img2img, and more.
UnstableFusion requires you to install Python, the Stable Diffusion model, and other components before using it.
For more information on how to run and use UnstableFusion, refer to this step-by-step article.
Diffusion Bee, on the other hand, is exclusively for Mac computers. You need a Mac with an M1 or M2 chip to use it.
Unlike UnstableFusion, Diffusion Bee does not require you to install Python, the Stable Diffusion model, or other components separately. It is plug-and-play: install the Diffusion Bee application, enter the prompt, and click “Generate”.
The system requirements for Diffusion Bee are macOS 12.5.1 or later and at least 8 GB of RAM, although 16 GB is ideal.
How to Get More Stable Diffusion Free Credits
You can run Stable Diffusion with unlimited free credits if you run it on your local machine.
If you find it difficult to use Stable Diffusion on your local machine because of the system requirements or the manual installation of Python and other dependencies, you can get more free generations on platforms with a GUI similar to DreamStudio.
To get more free Stable Diffusion generations:
- Use the Stable Diffusion public demo if you are okay with the slight time delay.
- Use the Playground AI website.
- Use the Stable Diffusion Discord channel.
Use Cases of Stable Diffusion
Stable Diffusion has enormous untapped potential. It is like a car with high horsepower: if you know how to handle it (prompt well), you will get unforgettable results.
Stable Diffusion use cases include:
- Concept Art
- Graphic Design
Some people have built money-making projects using Stable Diffusion and other generative AI models. Danny Postma, a serial entrepreneur, has built many AI products.
Stable Diffusion Community
Being part of a community keeps you informed. There are many updates, events, and models being built on Stable Diffusion every week.
The best way to stay updated is to join the Stable Diffusion and generative AI communities. When we talk about communities, Discord is the go-to platform. Most of the Stable Diffusion and generative AI-related Discord channels allow members to generate images, see other artists’ creations, and collaborate with other members.
Apart from news, trends, and event updates, the community is one of the best places to take prompt inspiration from fellow artists. In fact, I wrote an article about stealing prompt ideas from other artists.
The Stable Diffusion official Discord channel allows its members to try the new version of the model.
There are many quality generative AI Discord channels available for all AI enthusiasts and emerging AI artists other than the Stable Diffusion Discord channel.
Are Stable Diffusion Images Copyrighted? Can It Be Used for Commercial Purpose?
This is the single biggest question that pops into the mind of all Stable Diffusion users. The short answer to this question is no for copyright and yes for commercial use.
As of now, Stability AI does not claim any copyright over the images generated using Stable Diffusion. It would also be hard for the company to make such a claim, since the training dataset is the work of many artists. However, in the future, the company may ask for a royalty if the AI-generated images are minted as NFTs.
Stable Diffusion is released under a permissive license. According to Stability AI’s policy on commercial use, users of Stable Diffusion can use the generated images for both commercial and non-commercial purposes.
However, users must follow the ethical use restrictions listed in the license. Also, the policies of Stability AI are not definitive and are subject to change.
To learn more about the person who copyrighted her AI-assisted image and the person who minted his AI-assisted image as an NFT, read this interesting article.