Diffusion models like these are trained on billions of labeled examples to produce the image a prompt asks for. They do not rely on labeled data alone to produce images, audio, or video. In addition, a method known as “classifier-free guidance” is used to improve sample quality at the cost of reduced sample diversity. This method was proposed by Ho and Salimans in 2021; it builds on the classifier guidance introduced by Dhariwal and Nichol the same year.
What is the CFG Scale?
Like Seed, the classifier-free guidance scale (CFG Scale) is one of the additional settings found in Stable Diffusion. The CFG scale controls how closely the generated image follows the prompt and/or input image. At higher CFG Scale values, the output is more in line with the prompt and/or input image, but it is more likely to be distorted. At lower values, the output is more likely to drift away from the prompt or input image, but image quality tends to be better.
In general, fidelity between the prompt and the output image increases with the CFG scale, while image quality decreases as the scale is pushed higher. Neither relationship is strictly proportional, so treat both as tendencies rather than rules.
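To make the trade-off concrete, here is a minimal sketch of the blending step that classifier-free guidance is usually described with: at each denoising step the model makes one noise prediction without the prompt and one with it, and the CFG scale decides how far to push from the first toward (and past) the second. The function name `apply_cfg` and the toy arrays are assumptions for illustration, not code taken from Stable Diffusion itself.

```python
import numpy as np

def apply_cfg(noise_uncond, noise_cond, cfg_scale):
    """Blend the unconditional and prompt-conditioned noise predictions.

    cfg_scale = 0 ignores the prompt, 1 uses the conditioned prediction as-is,
    and larger values extrapolate further in the direction of the prompt.
    """
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Toy stand-ins for the two predictions a model would return at one step.
noise_uncond = np.array([0.0, 1.0, 2.0])
noise_cond = np.array([1.0, 1.0, 0.0])

for scale in (0.0, 1.0, 7.0):
    print(scale, apply_cfg(noise_uncond, noise_cond, scale))
```

Because large scales extrapolate well beyond either prediction, it is plausible that very high CFG values produce the oversaturation and noise described in this article.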
Let’s understand the CFG Scale functionality with an experiment. In this article, I’ll be using DreamStudio, Playground AI, and Lexica to show how the CFG Scale works. You can also follow this article if you are running Stable Diffusion on your local machine.
An Experiment to Understand CFG Scale Functionality
The aim of this experiment is to understand how the CFG scale works and to find its sweet spot.
Prompt: Portrait of Tom Cruise in red suit, 4k, high quality
The above images were generated in Stable Diffusion at various CFG scale values. As you can see, the fidelity increases with the CFG values.
The images with the CFG value of 1 and 3 are nowhere near the face of Tom Cruise.
At the same time, if you compare the image with the CFG value of 7 and the image with the CFG value of 18, the color and the portrait in the image with the CFG value of 18 are perfectly in line with the prompt. However, it has a lot more noise than the image with a CFG value of 7.
In the images with CFG values of 9 and 10, the face looks good, but the color of the suit is completely different from the one in the prompt. The faces look oversaturated in images with CFG values above 12. In my opinion, the image with the CFG value of 7 looks the most realistic.
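If you run Stable Diffusion locally, a comparison like this can be sketched as a loop that holds the prompt and seed fixed and changes only the CFG value, so that any difference between images comes from the scale alone. The helper names `sweep_cfg` and `make_diffusers_generate` are hypothetical, and the adapter assumes the Hugging Face `diffusers` and `torch` packages are installed; the model ID is left as a parameter because this article does not name one.

```python
def sweep_cfg(generate, prompt, cfg_values, seed=42):
    """Call `generate` once per CFG value with the same prompt and seed."""
    results = {}
    for cfg in cfg_values:
        results[cfg] = generate(prompt=prompt, guidance_scale=cfg, seed=seed)
    return results

def make_diffusers_generate(model_id):
    """Hypothetical adapter around diffusers' StableDiffusionPipeline."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)

    def generate(prompt, guidance_scale, seed):
        # Re-seeding for every call keeps the starting noise identical.
        generator = torch.Generator().manual_seed(seed)
        result = pipe(prompt, guidance_scale=guidance_scale, generator=generator)
        return result.images[0]

    return generate

# Example usage (downloads model weights, so it is left commented out):
# generate = make_diffusers_generate("<your Stable Diffusion model ID>")
# images = sweep_cfg(
#     generate,
#     "Portrait of Tom Cruise in red suit, 4k, high quality",
#     [1, 3, 7, 9, 10, 12, 18],
# )
# for cfg, image in images.items():
#     image.save(f"cfg_{cfg}.png")
```

Keeping the seed constant matters here: with a different seed per image, you could not tell whether a change came from the CFG value or from the random starting noise.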
When Should I Change the Value of the CFG Scale in Stable Diffusion?
Stable Diffusion, which was trained on the LAION dataset, can produce effective results from just a short and simple prompt or set of img2img instructions. However, if you try to generate something that Stable Diffusion has no prior knowledge of, or want to combine multiple concepts or people, you might need to turn up the CFG scale and/or denoising strength. As a result, Stable Diffusion will generate more creatively at the expense of image quality.
How to Use CFG Scale in DreamStudio, Lexica, and Playground AI
- Sign Up for DreamStudio or Playground AI
- Enter the Prompt
- Adjust the CFG Scale Value
- Find the Optimal or Sweet Spot of CFG
1. Sign Up for DreamStudio or Playground AI
If you want to try Stable Diffusion on Lexica, you don’t need to sign in. But the other two platforms require you to sign in for image generation.
If you want to use DreamStudio or Playground AI, sign in using your Gmail account or Discord account.
2. Enter the Prompt
As a second step, you need to enter the prompt. If you have difficulty creating good prompts, refer to our article prompt engineering made easy. Otherwise, use free prompt generators or GPT-3 to create a good prompt.
3. Adjust the CFG Scale Value
Once you have entered the prompt, you need to adjust the CFG value.
In DreamStudio, you can find the “Cfg Scale” slider on the right-hand side of the screen. In Lexica, you can find the “Guidance Scale” once you have clicked the “Generate” button.
In Playground AI, you can find “Prompt Guidance” on the right-hand side of the screen.
After adjusting the CFG value, click “Dream” if you are in DreamStudio, or “Generate” if you are in Lexica or Playground AI.
4. Find the Optimal or Sweet Spot of CFG
Keep playing with the CFG value and see which one meets your requirements.
Once you have found the sweet spot of the CFG value, you can download and use that image. Remember that the sweet spot or optimal value of CFG varies depending on your requirements. However, a CFG value of 7–11 gives optimal output in most cases.
The CFG Scale value in Stable Diffusion is an important setting that changes the way images look. In most cases, the standard CFG value works well. If you want fidelity to the prompt over image quality, increase the CFG scale value. If you are looking for quality, try lowering it.
1. How do I adjust the Stable Diffusion-generated images to match my prompt?
If your Stable Diffusion-generated images do not match your prompt closely enough, try adjusting the CFG value. Raising it makes the images follow your prompt more closely, though very high values can add noise and distortion. In most cases, the standard value of 7 works well.
2. What is the sweet spot of the CFG Scale? Or What is the Optimal Value of CFG Scale?
The CFG scale ranges from 0 to 20. In general, a CFG Scale value of 7 to 11 will give the best results with low noise. However, the optimal value can differ when you ask Stable Diffusion for something it has no prior knowledge of.