Stable Diffusion Project: Creating Illustration


Many people write in their jobs. Not everyone is a novelist; some write technical documentation, business plans, news articles, or blog posts. In such writing, illustrations are not essential but are often good to have. They serve as decorations, interpretations, or visual explanations of the text. However, you may not want to spend much time on an illustration, or you may not have the drawing skills to create one. Stable Diffusion is here to help!

In this post, you will see how you can use Stable Diffusion to create illustrations. After finishing this post, you will know:

  • How to create a prompt from text
  • How to adjust the prompt for a better illustration

Let’s get started.

Photo by Koushik Chowdavarapu. Some rights reserved.

Overview

This post is in three parts; they are:

  • Project Idea
  • Creating the Illustration
  • Facial Details

Project Idea

An illustration decorates the text it accompanies. Let’s begin with the following story:

A number of generals are attacking a fortress. The generals must decide as a group whether to attack or retreat; some may prefer to attack, while others prefer to retreat. The important thing is that all generals agree on a common decision, for a halfhearted attack by a few generals would become a rout, and would be worse than either a coordinated attack or a coordinated retreat.

The problem is complicated by the presence of treacherous generals who may not only cast a vote for a suboptimal strategy; they may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers). The problem is complicated further by the generals being physically separated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.

You may be familiar with this story: it is the description of the Byzantine Generals’ Problem. The text above is from Wikipedia. You are going to create a picture to accompany it.

Creating the Illustration

The simplest approach is to open the Stable Diffusion Web UI, enter the story as the positive prompt, and generate. But there are some problems. Firstly, the story may contain several scenes, and an illustration needs only one. Secondly, the description is not optimized for use as a prompt. Since the text will be converted into an embedding for the diffusion process, emphasizing the keywords matters more than writing grammatically correct sentences. Thirdly, besides the positive prompt, you may also need a negative prompt.

Paraphrasing the story to make it suitable to use as a prompt is the first step. ChatGPT or some other AI tool is good at paraphrasing. You can start with this prompt in ChatGPT:

Below is a story. Please extract a key scene and describe it as a prompt for Stable Diffusion to generate an illustration. If appropriate, please also describe the corresponding negative prompt.

Story: A number of generals are attacking a fortress. The generals must decide as a group whether to attack or retreat; some may prefer to attack, while others prefer to retreat. The important thing is that all generals agree on a common decision, for a halfhearted attack by a few generals would become a rout, and would be worse than either a coordinated attack or a coordinated retreat. The problem is complicated by the presence of treacherous generals who may not only cast a vote for a suboptimal strategy; they may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers). The problem is complicated further by the generals being physically separated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.

The response may look like this:

Key Scene Description for Stable Diffusion:

In a dramatic moment, a group of nine generals stands in a tense circle within a dimly lit war tent. The atmosphere is charged with suspense as they face the crucial decision of attacking or retreating from a looming fortress. The generals are in deep discussion, some with expressions of determination, others with doubt, reflecting their conflicting preferences. In the center of the scene, a messenger is depicted delivering a vote, his presence adding to the tension as the generals eye him warily. The backdrop includes maps and battle plans pinned to the tent walls, with the distant silhouette of the fortress visible through the tent’s open flap. The expressions and body language of the generals convey the gravity of their dilemma, hinting at underlying treachery and the looming threat of a disastrous miscommunication.

Negative Prompt:

Avoid depicting modern technology, futuristic elements, and unrealistic or fantasy settings. The scene should not include any supernatural elements or anachronistic details like computers or modern uniforms.

Using ChatGPT to generate a prompt for Stable Diffusion
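If you have many stories to illustrate, the request text shown earlier can be assembled with a few lines of Python before you paste it into ChatGPT. The following is only a minimal sketch: the names `REQUEST_TEMPLATE` and `build_scene_request` are made up for this example, and the code only builds the text without calling any AI service.

```python
# Hypothetical helper: wrap a story in the instruction used in this post.
REQUEST_TEMPLATE = (
    "Below is a story. Please extract a key scene and describe it as a prompt "
    "for Stable Diffusion to generate an illustration. If appropriate, please "
    "also describe the corresponding negative prompt.\n\n"
    "Story: {story}"
)


def build_scene_request(story: str) -> str:
    """Collapse the story into a single paragraph and prepend the instruction."""
    one_paragraph = " ".join(story.split())  # remove line breaks and repeated spaces
    return REQUEST_TEMPLATE.format(story=one_paragraph)


if __name__ == "__main__":
    story = """A number of generals are attacking a fortress.

    The problem is complicated by the presence of treacherous generals."""
    print(build_scene_request(story))
```

Joining the paragraphs into one block mirrors the request above, in which the whole story follows the word “Story:” as a single paragraph.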

By simply copying and pasting the two paragraphs generated by ChatGPT into the Stable Diffusion Web UI as the positive and negative prompts, you may get this:

A picture generated from Stable Diffusion using the prompts as suggested by ChatGPT

This was generated using the model Deliberate with the output size set to 768×512 pixels. The result is not bad, but it may not be what you wanted. For one thing, this output looks like a photograph. If you repeat the generation a few times, you will see that different styles are produced because you never mentioned what style you wanted! Surely, you want to polish the prompts for a better picture.
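The same generation can also be scripted instead of clicked through. The sketch below assumes the Web UI is the AUTOMATIC1111 implementation, started with its `--api` option and listening on the default local address; the helper names `build_txt2img_payload` and `generate` are made up for this example. It uses whichever checkpoint is currently loaded in the Web UI, so you still need to select Deliberate there yourself.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt, width=768, height=512, seed=-1):
    """Collect the generation settings used in this post; seed -1 means random."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "seed": seed,
    }


def generate(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to the Web UI (assumed to run with --api); return PNG bytes."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # The response is assumed to carry the images as base64-encoded strings.
    return [base64.b64decode(image) for image in result["images"]]


if __name__ == "__main__":
    payload = build_txt2img_payload(
        "A group of nine generals in a dimly lit war tent",
        "modern technology, futuristic elements",
    )
    print(json.dumps(payload, indent=2))  # inspect the settings before sending
```

With the Web UI running, `generate(payload)` would return a list of image bytes that you can write to `.png` files.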

Let’s resort to ChatGPT again, as a follow-up:

Simplify the answer of the positive prompt into fewer words

Then you get:

Positive Prompt for Stable Diffusion:

A group of nine generals in a dimly lit war tent, deep in discussion about attacking or retreating from a fortress. Tense expressions and body language reflect their conflicting preferences. A messenger delivers a vote, adding to the tension. Maps and battle plans adorn the tent walls, with the fortress silhouette visible through the open flap.

Asking ChatGPT to simplify the prompt

You can replace your positive prompt with this shorter version. The result should not be very different. A good prompt should mention the main focus, the background, and the composition and style. This prompt lacks the last of these, so you can modify it to add such details. If you want a realistic-looking output, adding the keyword “photorealistic” should work.

The negative prompt is also important to the generation. You can simplify the negative prompt as well, so that it provides just the keywords. You should also add some style keywords to the negative prompt to control what you do not want. An example is the following:

Modern technology, futuristic elements, fantasy, supernatural elements, sketch, cartoon, anime, model

And the output is like the following:

Improved generation from Stable Diffusion by adjusting the prompt used
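Keeping the content keywords and the style keywords in separate lists makes it easier to adjust the negative prompt between attempts. The following is a small illustrative sketch; the function `join_keywords` and the list names are made up for this example, and the keywords are the ones used in the negative prompt above.

```python
def join_keywords(*groups):
    """Flatten keyword groups into one comma-separated prompt, dropping repeats."""
    seen = set()
    ordered = []
    for group in groups:
        for keyword in group:
            cleaned = keyword.strip()
            key = cleaned.lower()  # compare case-insensitively
            if key and key not in seen:
                seen.add(key)
                ordered.append(cleaned)
    return ", ".join(ordered)


# What should not appear in the scene
unwanted_content = [
    "Modern technology",
    "futuristic elements",
    "fantasy",
    "supernatural elements",
]
# Which rendering styles to steer away from
unwanted_styles = ["sketch", "cartoon", "anime", "model"]

negative_prompt = join_keywords(unwanted_content, unwanted_styles)
print(negative_prompt)
```

The same helper can append style keywords such as “photorealistic” to the keywords of a positive prompt.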

Facial Details

A photorealistic picture can easily fail if there are many people in it. This is the case in the picture above: if you look closely at each person’s face, you will see that many of them have a weird facial expression or distorted anatomy. This can be fixed, but not with the prompt.

You need to install the “ADetailer” extension in the Stable Diffusion Web UI: go to the Extensions tab, enter the extension’s URL in the “Install from URL” section, and then restart the Web UI. Afterward, you will see an “ADetailer” section in the txt2img controls.

Repeat the generation with the same prompt, but this time check “ADetailer” to enable it and make sure the detector is “face_yolov8n.pt”. This will detect the faces in the generated picture and run inpainting to regenerate them. You do not need to provide any additional prompt in the ADetailer extension unless you want to add details to those faces. The result is as follows:

Generation from Stable Diffusion. Note the faces look malformed.

Improved picture after applying the ADetailer plugin.

The two pictures are generated with the same fixed random seed, so they look similar. But with the ADetailer extension, the faces look much more natural. Now you can use the generated picture as an illustration for your writing.
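If you drive the Web UI through its API rather than the browser, the same before-and-after comparison can be described as two request payloads that share one seed. Treat the sketch below as an assumption-heavy illustration: the seed value is arbitrary, `with_face_fix` is a made-up helper, and the layout of the `alwayson_scripts` entry for ADetailer is an assumption that may differ between versions of the extension, so check the extension’s own documentation before relying on it.

```python
import copy


def with_face_fix(payload, detector="face_yolov8n.pt"):
    """Return a copy of the payload that also requests ADetailer face inpainting.

    The "alwayson_scripts" argument layout is an assumption, not a verified API.
    """
    fixed = copy.deepcopy(payload)  # leave the original payload untouched
    scripts = fixed.setdefault("alwayson_scripts", {})
    scripts["ADetailer"] = {"args": [{"ad_model": detector}]}
    return fixed


# Same prompt, size, and seed for both runs, so only the faces should differ.
base = {
    "prompt": "nine generals in a dimly lit war tent, photorealistic",
    "negative_prompt": "sketch, cartoon, anime",
    "width": 768,
    "height": 512,
    "seed": 12345,  # arbitrary fixed seed chosen for this example
}
detailed = with_face_fix(base)

print(base["seed"] == detailed["seed"])
```

Fixing the seed is what makes the two pictures comparable: with a random seed, the composition would change between runs, and you could not tell what the face detector had contributed.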


Summary

In this post, you walked through a workflow for extracting a scene from text and converting it into a prompt for Stable Diffusion. With some attention to detail, you can modify the prompt to generate a picture that is suitable as an illustration for your text. You also learned that the ADetailer extension for the Stable Diffusion Web UI can help you make a better picture by replacing the faces in the generation with more natural-looking ones.


