Create Text to Image GUI App
In this step-by-step tutorial, we will learn how to create a professional text-to-image application using Stable Diffusion, Flask, and Docker. We will produce high-definition images, far greater than 512 x 512 pixels, using super-sampling techniques. In addition, we will learn how to delegate generative tasks to our GPU, making our app incomparably faster.
Watch Video Tutorial
In case you prefer watching rather than reading, this tutorial is also available in a video version that covers the same workflow. Otherwise, please continue with the article version below.
Install Docker Desktop
Let's begin by installing Docker Desktop from https://docker.com.
Once the installation is complete, we will make sure that the Docker engine is running, indicated by the green bar at the bottom left of the window.
In addition, since we will use a WSL 2 terminal for our Docker commands, we will need to enable WSL integration in the Settings menu (under Resources > WSL integration).
Introduction to Docker for Absolute Beginners - Optional
If you're not familiar with Docker, I highly recommend watching my detailed beginner's guide. You will learn about containers, images, Docker Compose, and Dockerfiles using a much simpler example than the current project.
Clone Starter Files from GitHub
Once we've set up Docker Desktop, we can move on with cloning some starter files. These files include a Flask web GUI interface that doesn't have any functionality. There are buttons, for example, but not much happens when you click them.
For this, we will need a terminal - in my case, WSL 2. If you haven't had a chance to explore it yet, it provides a Linux Ubuntu environment that can be easily installed on a Windows system.
Install WSL 2
You can install WSL 2 by opening your command prompt (type "cmd" in the Windows Start menu) and entering the following command:
wsl --install
Once you do so, you'll need to restart your system and then you'll have:
a new WSL terminal that you can access from the Windows Start menu as well.
a new Linux drive that you can access from your file system.
Clone Github Repository
Next, we will navigate to my Stable Diffusion GUI App repository on GitHub, press the green "Code" button, and copy the HTTPS URL.
We will then paste it back in our terminal as part of the following command:
git clone https://github.com/MariyaSha/StableDiffusion_GUI_App.git
This will download the entire repository into your new Linux drive. Once the download is complete, we can then navigate to the root directory of our project (the folder that stores the main executable file of our application) using the following terminal command:
cd StableDiffusion_GUI_App/starter_files
Environment Setup with Docker Init
From the root directory of our project, we can initialize our working environment. To do so, we will use a tool named Docker Init that generates all the files necessary to run a Docker container. We will simply type:
docker init
This will present us with a series of questions, and based on our answers, Docker will automatically determine the content of the following files:
.dockerignore
Dockerfile
compose.yaml
README
In our case, we will select Python as our platform, with version 3.11.5 (or whichever future version Docker Init suggests). The app will listen on port 8000, and the command we will use to run it is:
python3 app.py
Once the Docker files are generated, we can run our application with:
docker compose up --build
And access it through our web browser, navigating to: http://localhost:8000
Starter Application Overview
The interface of our starter application has one text input field and one "generate" button, meant to produce 3 different image options. Additionally, there's a "save" button under each image, which will allow us to store it as a super-sampled high-definition copy on our file system.
At this stage, when we enter text into the input field and press the "generate" button, we will receive the following print statement in our terminal:
user prompt received: Canadian bear eating fish in the river
If we press all the "save" buttons in the following order: right, center, and left, we will receive:
save button 0 was clicked!
save button 1 was clicked!
save button 2 was clicked!
This means that we don't need to worry about fetching user input or determining image ids. The starter application allows us to focus on the important details, rather than investigating interface elements. As a result, the only starter file we will modify is app.py.
Note: To shut down the container, press Ctrl + C in your terminal.
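Alternatively, you can stop and remove the container from a second terminal window with the dedicated Docker Compose command (run it from the root directory of the project):
docker compose down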
Source Code of app.py
from flask import Flask, render_template, request
from PIL import Image
import secrets

app = Flask(__name__)
# generate random secret key
app.config['SECRET_KEY'] = secrets.token_hex(16)

@app.route('/')
def hello():
    # home page
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )

@app.route('/prompt', methods=['POST', 'GET'])
def prompt():
    # generate images from user prompt
    print("user prompt received:", request.form['prompt_input'])
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )

@app.route('/supersample', methods=['POST', 'GET'])
def supersample():
    # enlarge and save prompt image in high quality
    print("save button", request.form['save_btn'], "was clicked!")
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )

if __name__ == '__main__':
    # run application
    app.run(
        host = '0.0.0.0',
        port = 8000,
        debug = True
    )
Dynamic Saving with Debug Mode
Before we learn how to convert text to images, we will need to ensure that our working environment is convenient and time efficient. Currently, when we make changes to our Python, HTML, CSS, and JavaScript files, they are saved in our file system - but our container is not aware of them at all.
The reason is, our container is isolated from the rest of the computer, and unless we manually enable dynamic saving - we will need to shut down and re-run our container every time we make a tiny change to our code. This process is unnecessary, as we can easily skip it.
Step 1: Enable Debug Mode
First, we need to make sure that debug mode is enabled inside app.py. In our case it is, given the last few lines of code:
app.run(
    host = '0.0.0.0',
    port = 8000,
    debug = True
)
Step 2: Mount Volumes
Additionally, we will need to mount a directory from our local system onto a directory in the container. To do so, we will navigate to the compose.yaml file generated by Docker Init and add a "volumes" attribute underneath the "ports" attribute, such that:
services:
  server:
    build:
      context: .
    ports:
      - 8000:8000
    volumes:
      - ./:/app
Please note: to make it work, the indentation level of "volumes" must match the indentation level of "ports".
But what exactly does ./:/app mean?
./ represents the root directory of our project (StableDiffusion_GUI_App/starter_files)
: separates the local directory from the container directory.
/app represents the root directory of our container.
But how do we know that /app is the root directory of our container?
We simply navigate to our Dockerfile, also generated by Docker Init, where we can see that our working directory is set to /app, given:
WORKDIR /app
If a future version of Docker Init sets a different working directory on your end, please use that path instead.
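As a quick sanity check, while the container is running, we can list the contents of /app from inside it and confirm that our local project files appear there. This is a sketch that assumes the Compose service is named server, as in the compose.yaml above:
docker compose exec server ls /app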
User Permissions in Dockerfile
Since we're already in our Dockerfile, there's one more change we need to apply to make our workflow more convenient. By default, Docker Init creates an appuser with very limited access and sets it as the main user of the container.
# Create a non-privileged user that the app will run under.
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    appuser
# Switch to the non-privileged user to run the application.
USER appuser
This user is not allowed to make changes in the application, such as saving files to the container. In our case, that's a big problem: if we can't save the AI-generated images, our app becomes useless.
To solve it, we will set the main user of our app to the system administrator, also known as root, with:
USER root
Which will give us the appropriate access permissions.
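Please note that running as root is the simplest option for a local project, but not the most secure one. If you'd rather keep the non-privileged user, a possible alternative (my own sketch, not part of the original starter files) is to grant appuser write access to the specific folders our app saves images into:
# create the image folders and hand ownership to appuser
RUN mkdir -p /app/static/images/saved && chown -R appuser /app/static/images
USER appuser
For this tutorial, however, we will continue with root.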
Convert Text to Image with Stable Diffusion
Once we've ensured dynamic saving and adequate user permissions, we can finally move on with converting text user input to images.
For this, we will need a Generative AI model, in our case Stable Diffusion Version 1.4, which we can find on Hugging Face.
Clone Stable Diffusion
To clone Stable Diffusion 1.4 onto our local machine, we will need a tool called Git LFS, or Git Large File Storage. It allows Git to transfer larger-than-usual files, such as the ones found in the Stable Diffusion model. Consequently, running the following commands in your terminal will require 43.4 GB of disk space.
sudo apt-get update && \
sudo apt-get install git-lfs && \
git-lfs install && \
git clone "https://huggingface.co/CompVis/stable-diffusion-v1-4"
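Once the clone is complete, you can check how much disk space the model actually occupies with a standard disk-usage command:
du -sh stable-diffusion-v1-4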
Model Requirements
Once the model is downloaded and we have a new "stable-diffusion-v1-4" folder in our root directory, we can move on with adding requirements. For this, we will open requirements.txt with a code editor, and add the following modules:
diffusers
transformers
torch
accelerate
For Stable Diffusion tasks, we will need a library named Diffusers that uses the Transformers infrastructure, as well as a deep learning library named Torch, also known as PyTorch. In addition, we will need the Accelerate library, which will help us load our model on CPU (and later on GPU) in a memory-efficient way.
Load Stable Diffusion Pipeline
Once the requirements are installed, we can move on with our code. For this, we will open app.py with a text editor, and we will import the StableDiffusionPipeline class, alongside Pytorch.
from diffusers import StableDiffusionPipeline
import torch
Then, right below the secret key at the first few lines of code, we will load our local model:
# load stable diffusion model
pipeline = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
The only problem is, this pipeline produces very basic images. They are OK, but they are not as detailed or realistic as they could be.
To solve it, we will use a tool named FreeU, designed to improve the sample quality of Stable Diffusion models. We will apply it to our pipeline with the following command:
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
FreeU allows us to customize 4 quality-control parameters: s1, s2, b1, and b2. The values assigned to these parameters are the values officially recommended for Stable Diffusion 1.4, as observed on GitHub in July 2024.
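If you'd like to compare results with and without this enhancement, newer versions of Diffusers also expose a matching toggle (if the call is missing on your end, please check your Diffusers version):
pipeline.disable_freeu()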
Use Stable Diffusion Pipeline
Generate One Image
To use the Stable Diffusion pipeline, we will find the app route that prints the user input to the console and we will add the following lines of code:
@app.route('/prompt', methods=['POST', 'GET'])
def prompt():
    # generate images from user prompt
    print("user prompt received:", request.form['prompt_input'])
    # generate image from prompt
    image = pipeline(request.form['prompt_input']).images[0]
    # save generated image
    image.save("output.png")
    return render_template(
        "index.html",
        btn_range = range(3),
        # render placeholder images on the page
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )
Then, once we run our container once again with:
docker compose up --build
We can enter a prompt, press the "generate" button and save a 512 x 512 pixel image as "output.png" inside our "starter_files" root directory.
Generate Several Images
To apply the same principles to a few images, rather than just one, we will wrap the image generation and saving commands in a for loop. Additionally, if we're already generating 3 different images, we might as well display them on the page with the following code:
@app.route('/prompt', methods=['POST', 'GET'])
def prompt():
    # generate images from user prompt
    print("user prompt received:", request.form['prompt_input'])
    # generate 3 images and save them
    for i in range(3):
        image = pipeline(request.form['prompt_input']).images[0]
        image.save("./static/images/demo_img" + str(i) + ".png")
    return render_template(
        "index.html",
        btn_range = range(3),
        # render the newly saved images on the page
        prompt_images = ["./static/images/demo_img" + str(i) + ".png" for i in range(3)]
    )
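As a side note, the Diffusers pipeline can also produce several images in a single call through its num_images_per_prompt parameter. A minimal sketch of the same idea (it requires more memory per call, so the loop above is the safer default):
images = pipeline(request.form['prompt_input'], num_images_per_prompt=3).images
for i, image in enumerate(images):
    image.save("./static/images/demo_img" + str(i) + ".png")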
The only problem is, it takes several minutes to generate these 3 images, and app users are unlikely to tolerate this timeframe. So let's find a way to speed it up.
Enable GPU in Container
To increase the speed of our generative tasks, we will use a software platform named CUDA, which allows parallel processing on Nvidia-based GPUs. You can find a list of all compatible graphics cards here: https://developer.nvidia.com/cuda-gpus
If your GPU is not there - don't worry! You can still follow along. Just skip to the upscaling section and continue with a slower implementation.
Now, since our Docker container is isolated from the rest of the computer, we will need to grant it access to our GPU hardware. For this, we will need to install the Nvidia Container Toolkit and integrate it in our Docker Compose file.
Install Nvidia Container Toolkit
To install the Nvidia Container Toolkit, we will navigate to the official installation guide, which may be updated in the future: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
As observed in July 2024, the current Production Repository Configuration command is:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Once we run the above command, we can then install the toolkit with:
sudo apt-get install -y nvidia-container-toolkit
We will then configure the container runtime with:
sudo nvidia-ctk runtime configure --runtime=docker
And finally, we will restart the Docker daemon to complete the installation:
sudo systemctl restart docker
NOTE: if you are unable to restart the Docker daemon with the command above, please do so from Docker Desktop. On the green bar at the bottom left of the window, please click on the "Quit Docker Desktop" button. Then open Docker Desktop once again, and continue with the next steps.
Enable GPU Hardware in Docker Compose
Once we've installed the drivers and libraries required to run CUDA in containers, we will turn on GPU access with Docker Compose, following Docker's GPU support guide (linked in the References below).
As a result, our compose.yaml will have the following content:
services:
  server:
    build:
      context: .
    ports:
      - 8000:8000
    volumes:
      - ./:/app
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Check if CUDA is Available
Once we've updated our compose file, we can re-build our container. If the installation process was successful, adding the following statement at the top of our code will print "processing on cuda":
if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
print("processing on", device)
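If you prefer, the same check can be written as a one-line conditional expression:
device = "cuda" if torch.cuda.is_available() else "cpu"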
Run Stable Diffusion Pipeline with CUDA
To generate images using CUDA, we will simply send our pipeline to be processed on GPU using the .to() method:
pipeline = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4").to(device)
Since we've set the device variable to "cuda" only when it is available, and otherwise to "cpu", users who do not have compatible GPU hardware will still be able to use the app. It will run significantly slower, but it will not raise errors, as explicitly stating "cuda" would:
pipeline = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4").to("cuda")
Now, when we try generating another batch of images, they'd be created within seconds.
But the only problem is, our images are very small! They are only 512 x 512 pixels, which is not very useful. So let's find a way of enlarging them!
Upscale Images with EDSR
Download EDSR
To enlarge our images, we will need another AI model named EDSR, or Enhanced Deep Residual Networks for Single Image Super-Resolution. We will download it from its GitHub repository, but since we only need the model itself rather than the entire repository, we will not use git clone this time, but rather:
wget https://github.com/Saafke/EDSR_Tensorflow/raw/master/models/EDSR_x4.pb
This command will download an x4 upscaling model, turning 512 pixels into 2048 pixels. If you'd like to use a smaller scale factor of 2 or 3, please replace the "4" in the wget command above.
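For example, fetching the x2 variant instead would look like this (note that the scale factor we pass to super_res.setModel later on must match):
wget https://github.com/Saafke/EDSR_Tensorflow/raw/master/models/EDSR_x2.pb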
EDSR Requirements - OpenCV
To use EDSR, we will add the following requirements to requirements.txt:
opencv-python
opencv-contrib-python
However, to properly install OpenCV, we will also need some operating system dependencies that cannot simply be installed with pip. Therefore we cannot add them to requirements.txt - we must specify them inside our Dockerfile.
So right after the command that installs requirements.txt (line 39 of the Dockerfile generated by Docker Init), we will install the OpenCV OS dependencies:
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=requirements.txt \
    python -m pip install -r requirements.txt
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
Lastly, we will navigate to app.py and we will import OpenCV at the top of our code with:
import cv2
Load EDSR Model
To load the EDSR model, we will add the following block of code right below our pipeline initiation and FreeU commands:
# load super resolution model
super_res = cv2.dnn_superres.DnnSuperResImpl_create()
super_res.readModel("EDSR_x4.pb")
super_res.setModel("edsr", 4)
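Before wiring EDSR into our app route, we can run a quick sanity check. The sketch below assumes the placeholder image from the starter files is still in place:
img = cv2.imread("./static/images/placeholder_image.png")
# the upscaled array should report 4x the original width and height
print(img.shape, "->", super_res.upsample(img).shape)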
Use EDSR Model
Once EDSR is loaded, we can then use it to enlarge our demo images. So let's find the app route where we perform the super sampling, and let's populate it with the following code:
@app.route('/supersample', methods=['POST', 'GET'])
def supersample():
    # enlarge and save prompt image in high quality
    print("save button", request.form['save_btn'], "was clicked!")
    # read demo image that was selected for saving
    demo_img = cv2.imread(
        "./static/images/demo_img" + str(request.form['save_btn']) + ".png"
    )
    # convert image colour format to RGB
    demo_img = cv2.cvtColor(demo_img, cv2.COLOR_BGR2RGB)
    # enlarge image x4
    XL_img = super_res.upsample(demo_img)
    # convert the Numpy array to an actual image
    # (using the Image class we imported from PIL at the top of app.py)
    XL_img = Image.fromarray(XL_img)
    # save image
    XL_img.save("XL_output.png")
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = [
            "./static/images/demo_img" + str(i) + ".png" for i in range(3)
        ]
    )
Now, when we save one of our demo images, we can see it appear in our root directory at an upscaled resolution of 2048 x 2048 pixels, under the name of "XL_output.png".
But the only problem is - we would like to save an entire collection of images, rather than overwriting the same file time and again. So let's fix it by giving our extra-large images unique names.
Unique Image Naming
One way to do so is by using unique numbers such as date and time. So let's import the datetime module at the top of our code with:
from datetime import datetime
Then, we will generate a unique image id with:
img_id = str(datetime.today())
which will result in a human-readable output:
2024-07-22 02:20:11.919895
However, in our case, we don't really care about readability, so we can keep the numeric characters only by removing the punctuation symbols from the id, such that:
img_id = str(datetime.today())
img_id = img_id.replace(":", "").replace(".", "").replace(" ", "").replace("-", "")
which will now return a unique numeric value instead:
20240722022349018563
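By the way, the same numeric id can be produced in a single line with strftime formatting, which you may find tidier than chaining replace calls:
img_id = datetime.today().strftime("%Y%m%d%H%M%S%f")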
So let's use this unique image id to name our extra large images. For this, we will create a new directory inside static/images and we will call it saved. Then we will update our code with the following lines:
# generate unique image id
img_id = str(datetime.today())
img_id = img_id.replace(":", "").replace(".", "").replace(" ", "").replace("-", "")
# save image
XL_img.save("./static/images/saved/img_" + img_id + ".png")
Now, when we save our tiny demo images, they will appear in high resolution inside a new designated folder with unique names.
Creative Licenses
If we'd like to publish our app and share it with the world - we will need to take care of licensing. And the idea is, since we are using models and tools that somebody else created - we also need to follow their rules.
So let's download the EDSR license from its GitHub repository (Saafke/EDSR_Tensorflow, the same repository we fetched the model from). We will then rename the newly downloaded file from LICENSE to EDSR_LICENSE.
Then we will download the FreeU license from its official GitHub repository, and similarly rename the file to FREEU_LICENSE.
And lastly, we will download the Stable Diffusion license from its Hugging Face repository, renaming it to StableDiffusion_LICENSE.txt. (Since we've already cloned the full model repository, a copy of this license also sits inside our local stable-diffusion-v1-4 folder.)
Finally, before publishing your app, I highly recommend reading these licenses and ensuring that you comply with their terms. Since in my case, I am sharing the application free of charge and using it for educational purposes, I can move on with the next steps.
Publish Application on DockerHub
Important Pre-Publishing Steps
Before we publish our application, we will ensure that:
we delete the "output.png" and "XL_output.png" test images from the root directory.
we delete the 3 demo images from the static/images directory, as they will be automatically generated by the app.
we delete the uniquely named images inside the static/images/saved directory (but please do not delete the saved directory itself!)
we rebuild our image for the very last time with "docker compose up --build", which will save all the file updates we just made into our container.
Publish Image on DockerHub
Create DockerHub Repository
To publish our application on DockerHub, we will create a Docker account (in my case, mariyasha) as well as a new Docker repository (in my case, diffuse_me). Together, they compose the remote image name mariyasha/diffuse_me.
Rename Local Docker Image
Once we have a remote repository, we will need to rename our local image accordingly. First, let's find out the local name of our image with:
docker images
This will present all the Docker images stored on our system, where we can see that the name of our image is starter_files-server, after our root directory. We can also see that it has the tag latest and a size of 29.8GB.
So let's rename it to match our remote repository name with:
docker image tag starter_files-server:latest mariyasha/diffuse_me:1.0
And if we check the Docker images once again, we see a new instance of our image with the remote repository name.
NOTE: we chose to give it the tag 1.0 because it's the first version of our image. The tag added to the image is determined by us and defaults to "latest".
Push to DockerHub
Finally, to upload our image to DockerHub, we will push it with:
docker push mariyasha/diffuse_me:1.0
Once the upload is complete, we will refresh our repository page and see a new entry within Tags. It means that our upload was successful and our app is officially public!
Use Application
Let's say you shared your application with your friends - but how can they use it on their end?
Step 1: Install Docker Desktop.
Step 2: Make sure you have 29.8GB of free disk space.
Step 3: Pull the image from DockerHub onto your computer.
docker pull mariyasha/diffuse_me:1.0
Step 4: Navigate to a directory where you'd like to save your AI-generated image collection.
cd path/to/folder
Step 5: Run the newly pulled local image.
docker run --gpus=all -p 8000:8000 -v ./:/app/static/images/saved mariyasha/diffuse_me:1.0
Here, --gpus=all grants the container access to the GPU, -p 8000:8000 publishes the app on port 8000, and -v ./:/app/static/images/saved mounts the current folder onto the container's saved-images directory, so every image you save lands right where you are.
Step 6: Enjoy! :)
References
A short list of links referred to throughout the tutorial, for your convenience:
Starter Files: https://github.com/MariyaSha/StableDiffusion_GUI_App
Stable Diffusion v1-4: https://huggingface.co/CompVis/stable-diffusion-v1-4
Nvidia Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Docker GPU Support Guide: https://docs.docker.com/compose/gpu-support/
DockerHub: https://hub.docker.com/
Thank you!
I hope you enjoyed this tutorial, and if you did - please share it with the world!
If you came up with a cool software based on this guide, please give me a shout somewhere on social media - I'd love to see where your imagination leads you! :)
Thank you so much for reading and see you soon in another incredible tutorial!