Create Text to Image GUI App
In this step-by-step tutorial, we will learn how to create a professional text-to-image application using Stable Diffusion, Flask, and Docker. We will produce high-definition images, far greater than 512 x 512 pixels, using super-sampling techniques. In addition, we will learn how to delegate generative tasks to our GPU, making our app incomparably faster.
Watch Video Tutorial
In case you prefer watching rather than reading, this tutorial is also available in a video version that covers the same workflow. Otherwise, please continue with the article version below.
Install Docker Desktop
Let's begin by installing Docker Desktop from https://docker.com.
Once the installation is complete, we will make sure that the Docker engine is running, indicated by the green bar at the bottom left of the window.
In addition, since we will use a WSL 2 terminal for our Docker commands, we will need to enable WSL integration in the Settings menu (under Resources > WSL integration).
Introduction to Docker for Absolute Beginners - Optional
If you're not familiar with Docker, I highly recommend watching my detailed beginner's guide. You will learn about containers, images, Docker Compose, and Dockerfiles using a much simpler example than the current project.
Clone Starter Files from GitHub
Once we've set up Docker Desktop, we can move on with cloning some starter files. These files include a Flask web GUI interface that doesn't have any functionality. There are buttons, for example, but not much happens when you click them.
For this, we will need a terminal - in my case, WSL 2. If you haven't had a chance to explore it yet, it provides a Linux Ubuntu environment that can be easily installed on a Windows system.
Install WSL 2
You can install WSL 2 by opening your command prompt (type "cmd" in the Windows Start menu) and entering the following command:
wsl --install
Once you do so, you'll need to restart your system and then you'll have:
a new WSL terminal that you can access from the Windows Start menu as well.
a new Linux drive that you can access from your file system.
Clone Github Repository
Next, we will navigate to my Stable Diffusion GUI App repository on GitHub, press the green "Code" button, and copy the HTTPS URL.
We will then paste it back in our terminal as part of the following command:
git clone https://github.com/MariyaSha/StableDiffusion_GUI_App.git
This will download the entire repository into your new Linux drive. Once the download is complete, we can then navigate to the root directory of our project (the folder that stores the main executable file of our application) using the following terminal command:
cd StableDiffusion_GUI_App/starter_files
Environment Setup with Docker Init
From the root directory of our project, we can initialize our working environment. To do so, we will use a tool named Docker Init that generates all the files necessary to run a Docker container. We will simply type:
docker init
This will present us with a series of questions, and based on our answers, Docker will automatically determine the content of the following files:
.dockerignore
Dockerfile
compose.yaml
README
In our case, we will select Python as our platform, with version 3.11.5 (or whichever future version Docker Init suggests). The app will listen on port 8000, and the command we will use to run it is:
python3 app.py
Once the Docker files are generated, we can run our application with:
docker compose up --build
And access it through our web browser, navigating to: http://localhost:8000
Starter Application Overview
The interface of our starter application has one text input field and one "generate" button, meant to produce 3 different image options. Additionally, there's a "save" button under each image, which will allow us to store it as a super-sampled high-definition copy on our file system.
At this stage, when we enter text into the input field and press the "generate" button, we will receive the following print statement in our terminal:
user prompt received: Canadian bear eating fish in the river
If we press all the "save" buttons in the following order: right, center, and left, we will receive:
save button 0 was clicked!
save button 1 was clicked!
save button 2 was clicked!
This means that we don't need to worry about fetching user input or determining image ids. The starter application allows us to focus on the important details, rather than investigating interface elements. As a result, the only starter file we will modify is app.py.
Note: To shut down the container, press Ctrl + C in your terminal.
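Alternatively, you can stop and remove the container from a second terminal window with the dedicated Docker Compose command (run it from the root directory of the project):
docker compose down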
Source Code of app.py
from flask import Flask, render_template, request
from PIL import Image
import secrets

app = Flask(__name__)
# generate random secret key
app.config['SECRET_KEY'] = secrets.token_hex(16)

@app.route('/')
def hello():
    # home page
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )

@app.route('/prompt', methods=['POST', 'GET'])
def prompt():
    # generate images from user prompt
    print("user prompt received:", request.form['prompt_input'])
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )

@app.route('/supersample', methods=['POST', 'GET'])
def supersample():
    # enlarge and save prompt image in high quality
    print("save button", request.form['save_btn'], "was clicked!")
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )

if __name__ == '__main__':
    # run application
    app.run(
        host = '0.0.0.0',
        port = 8000,
        debug = True
    )
Dynamic Saving with Debug Mode
Before we learn how to convert text to images, we will need to ensure that our working environment is convenient and time efficient. Currently, when we make changes to our Python, HTML, CSS, and JavaScript files, they are saved in our file system - but our container is not aware of them at all.
The reason is, our container is isolated from the rest of the computer, and unless we manually enable dynamic saving - we will need to shut down and re-run our container every time we make a tiny change to our code. This process is unnecessary, as we can easily skip it.
Step 1: Enable Debug Mode
First, we need to make sure that debug mode is enabled inside app.py. In our case it is, given the last few lines of code:
app.run(
    host = '0.0.0.0',
    port = 8000,
    debug = True
)
Step 2: Mount Volumes
Additionally, we will need to mount a directory from our local system onto a directory in the container. To do so, we will navigate to the compose.yaml file generated by Docker Init and add a "volumes" attribute underneath the "ports" attribute, such that:
services:
  server:
    build:
      context: .
    ports:
      - 8000:8000
    volumes:
      - ./:/app
Please note: to make it work, the indentation level of "volumes" must match the indentation level of "ports".
But what exactly does ./:/app mean?
./ represents the root directory of our project (StableDiffusion_GUI_App/starter_files)
: separates the local directory from the container directory.
/app represents the root directory of our container.
But how do we know that /app is the root directory of our container?
We simply navigate to our Dockerfile, also generated by Docker Init, where we can see that our working directory is set to /app, given:
WORKDIR /app
If a future version of Docker Init sets a different working directory on your end, please use that path instead.
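As a quick sanity check, while the container is running, we can list the contents of /app from inside it and confirm that our local project files appear there. This is a sketch that assumes the Compose service is named server, as in the compose.yaml above:
docker compose exec server ls /app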
User Permissions in Dockerfile
Since we're already in our Dockerfile, there's one more change we need to apply to make our workflow more convenient. By default, Docker Init creates an appuser with very limited access and sets it as the main user of the container.
# Create a non-privileged user that the app will run under.
RUN adduser \
    --disabled-password \
    --gecos "" \
    --home "/nonexistent" \
    --shell "/sbin/nologin" \
    --no-create-home \
    --uid "${UID}" \
    appuser
# Switch to the non-privileged user to run the application.
USER appuser
This user is not allowed to make changes in the application, such as saving files to the container. In our case, that's a big problem: if we can't save the AI-generated images, our app becomes useless.
To solve it, we will set the main user of our app to the system administrator, also known as root, with:
USER root
Which will give us the appropriate access permissions.
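Please note that running as root is the simplest option for a local project, but not the most secure one. If you'd rather keep the non-privileged user, a possible alternative (my own sketch, not part of the original starter files) is to grant appuser write access to the specific folders our app saves images into:
# create the image folders and hand ownership to appuser
RUN mkdir -p /app/static/images/saved && chown -R appuser /app/static/images
USER appuser
For this tutorial, however, we will continue with root.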
Convert Text to Image with Stable Diffusion
Once we've ensured dynamic saving and adequate user permissions, we can finally move on with converting text user input to images.
For this, we will need a Generative AI model, in our case Stable Diffusion Version 1.4, which we can find on Hugging Face.
Clone Stable Diffusion
To clone Stable Diffusion 1.4 onto our local machine, we will need a tool called Git LFS, or Git Large File Storage. It allows Git to transfer larger-than-usual files, such as the ones found in the Stable Diffusion model. Consequently, running the following commands in your terminal will require 43.4 GB of disk space.
sudo apt-get update && \
sudo apt-get install git-lfs && \
git-lfs install && \
git clone "https://huggingface.co/CompVis/stable-diffusion-v1-4"
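Once the clone is complete, you can check how much disk space the model actually occupies with a standard disk-usage command:
du -sh stable-diffusion-v1-4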
Model Requirements
Once the model is downloaded and we have a new "stable-diffusion-v1-4" folder in our root directory, we can move on with adding requirements. For this, we will open requirements.txt with a code editor, and add the following modules:
diffusers
transformers
torch
accelerate
For Stable Diffusion tasks, we will need a library named Diffusers that uses the Transformers infrastructure, as well as a deep learning library named Torch, also known as PyTorch. In addition, we will need the Accelerate library, which will help us load our model on CPU (and later on GPU) in a memory-efficient way.
Load Stable Diffusion Pipeline
Once the requirements are installed, we can move on with our code. For this, we will open app.py with a text editor, and we will import the StableDiffusionPipeline class, alongside Pytorch.
from diffusers import StableDiffusionPipeline
import torch
Then, right below the secret key at the first few lines of code, we will load our local model:
# load stable diffusion model
pipeline = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
The only problem is, this pipeline produces very basic images. They are OK, but they are not as detailed or realistic as they could be.
To solve it, we will use a tool named FreeU, designed to improve the sample quality of Stable Diffusion models. We will apply it to our pipeline with the following command:
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
FreeU allows us to customize 4 quality-control parameters: s1, s2, b1, and b2. The values assigned to these parameters are the values officially recommended for Stable Diffusion 1.4, as observed on GitHub in July 2024.
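If you'd like to compare results with and without this enhancement, newer versions of Diffusers also expose a matching toggle (if the call is missing on your end, please check your Diffusers version):
pipeline.disable_freeu()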
Use Stable Diffusion Pipeline
Generate One Image
To use the Stable Diffusion pipeline, we will find the app route that prints the user input to the console and we will add the following lines of code:
@app.route('/prompt', methods=['POST', 'GET'])
def prompt():
    # generate images from user prompt
    print("user prompt received:", request.form['prompt_input'])
    # generate image from prompt
    image = pipeline(request.form['prompt_input']).images[0]
    # save generated image
    image.save("output.png")
    return render_template(
        "index.html",
        btn_range = range(3),
        # render placeholder images on the page
        prompt_images = ["/static/images/placeholder_image.png" for i in range(3)]
    )
Then, once we run our container once again with:
docker compose up --build
We can enter a prompt, press the "generate" button and save a 512 x 512 pixel image as "output.png" inside our "starter_files" root directory.
Generate Several Images
To apply the same principles to a few images, rather than just one, we will wrap the image generation and saving commands in a for loop. Additionally, if we're already generating 3 different images, we might as well display them on the page with the following code:
@app.route('/prompt', methods=['POST', 'GET'])
def prompt():
    # generate images from user prompt
    print("user prompt received:", request.form['prompt_input'])
    # generate 3 images and save them
    for i in range(3):
        image = pipeline(request.form['prompt_input']).images[0]
        image.save("./static/images/demo_img" + str(i) + ".png")
    return render_template(
        "index.html",
        btn_range = range(3),
        # render the newly saved images on the page
        prompt_images = ["./static/images/demo_img" + str(i) + ".png" for i in range(3)]
    )
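As a side note, the Diffusers pipeline can also produce several images in a single call through its num_images_per_prompt parameter. A minimal sketch of the same idea (it requires more memory per call, so the loop above is the safer default):
images = pipeline(request.form['prompt_input'], num_images_per_prompt=3).images
for i, image in enumerate(images):
    image.save("./static/images/demo_img" + str(i) + ".png")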
The only problem is, it takes several minutes to generate these 3 images, and app users are unlikely to tolerate this timeframe. So let's find a way to speed it up.
Enable GPU in Container
To increase the speed of our generative tasks, we will use a software platform named CUDA, which allows parallel processing on Nvidia-based GPUs. You can find a list of all compatible graphics cards here: https://developer.nvidia.com/cuda-gpus
If your GPU is not there - don't worry! You can still follow along. Just skip to the upscaling section and continue with a slower implementation.
Now, since our Docker container is isolated from the rest of the computer, we will need to grant it access to our GPU hardware. For this, we will need to install the Nvidia Container Toolkit and integrate it in our Docker Compose file.
Install Nvidia Container Toolkit
To install the Nvidia Container Toolkit, we will navigate to the official installation guide, which may be updated in the future: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
As observed in July 2024, the current Production Repository Configuration command is:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Once we run the above command, we can then install the toolkit with:
sudo apt-get install -y nvidia-container-toolkit
We will then configure the container runtime with:
sudo nvidia-ctk runtime configure --runtime=docker
And finally, we will restart the Docker daemon to complete the installation:
sudo systemctl restart docker
NOTE: if you are unable to restart the Docker daemon with the command above, please do so from Docker Desktop. On the green bar at the bottom left of the window, please click on the "Quit Docker Desktop" button. Then open Docker Desktop once again, and continue with the next steps.
Enable GPU Hardware in Docker Compose
Once we've installed the drivers and libraries required to run CUDA in containers, we will turn on GPU access with Docker Compose, following Docker's GPU support guide (linked in the References below).
As a result, our compose.yaml will have the following content:
services:
  server:
    build:
      context: .
    ports:
      - 8000:8000
    volumes:
      - ./:/app
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Check if CUDA is Available
Once we've updated our compose file, we can re-build our container. If the installation process was successful, adding the following statement at the top of our code will print "processing on cuda":
if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
print("processing on", device)
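If you prefer, the same check can be written as a one-line conditional expression:
device = "cuda" if torch.cuda.is_available() else "cpu"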
Run Stable Diffusion Pipeline with CUDA
To generate images using CUDA, we will simply send our pipeline to be processed on GPU using the .to() method:
pipeline = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4").to(device)
Since we've set the device variable to "cuda" only when it is available, and otherwise to "cpu", users who do not have compatible GPU hardware will still be able to use the app. It will run significantly slower, but it will not raise errors, as explicitly stating "cuda" would:
pipeline = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4").to("cuda")
Now, when we try generating another batch of images, they'd be created within seconds.
But the only problem is, our images are very small! They are only 512 x 512 pixels, which is not very useful. So let's find a way of enlarging them!
Upscale Images with EDSR
Download EDSR
To enlarge our images, we will need another AI model named EDSR, or Enhanced Deep Residual Networks for Single Image Super-Resolution. We will download it from its GitHub repository, but since we only need the model itself rather than the entire repository, we will not use git clone this time, but rather:
wget https://github.com/Saafke/EDSR_Tensorflow/raw/master/models/EDSR_x4.pb
This command will download an x4 upscaling model, turning 512 pixels into 2048 pixels. If you'd like to use a smaller scale factor of 2 or 3, please replace the "4" in the wget command above.
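For example, fetching the x2 variant instead would look like this (note that the scale factor we pass to super_res.setModel later on must match):
wget https://github.com/Saafke/EDSR_Tensorflow/raw/master/models/EDSR_x2.pb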
EDSR Requirements - OpenCV
To use EDSR, we will add the following requirements to requirements.txt:
opencv-python
opencv-contrib-python
However, to properly install OpenCV, we will also need some operating system dependencies that cannot simply be installed with pip. Therefore we cannot add them to requirements.txt - we must specify them inside our Dockerfile.
So right after the command that installs requirements.txt (line 39 of the Dockerfile generated by Docker Init), we will install the OpenCV OS dependencies:
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=requirements.txt,target=requirements.txt \
    python -m pip install -r requirements.txt
RUN apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
Lastly, we will navigate to app.py and we will import OpenCV at the top of our code with:
import cv2
Load EDSR Model
To load the EDSR model, we will add the following block of code right below our pipeline initiation and FreeU commands:
# load super resolution model
super_res = cv2.dnn_superres.DnnSuperResImpl_create()
super_res.readModel("EDSR_x4.pb")
super_res.setModel("edsr", 4)
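Before wiring EDSR into our app route, we can run a quick sanity check. The sketch below assumes the placeholder image from the starter files is still in place:
img = cv2.imread("./static/images/placeholder_image.png")
# the upscaled array should report 4x the original width and height
print(img.shape, "->", super_res.upsample(img).shape)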
Use EDSR Model
Once EDSR is loaded, we can then use it to enlarge our demo images. So let's find the app route where we perform the super sampling, and let's populate it with the following code:
@app.route('/supersample', methods=['POST', 'GET'])
def supersample():
    # enlarge and save prompt image in high quality
    print("save button", request.form['save_btn'], "was clicked!")
    # read demo image that was selected for saving
    demo_img = cv2.imread(
        "./static/images/demo_img" + str(request.form['save_btn']) + ".png"
    )
    # convert image colour format to RGB
    demo_img = cv2.cvtColor(demo_img, cv2.COLOR_BGR2RGB)
    # enlarge image x4
    XL_img = super_res.upsample(demo_img)
    # convert the Numpy array to an actual image
    # (using the Image class we imported from PIL at the top of app.py)
    XL_img = Image.fromarray(XL_img)
    # save image
    XL_img.save("XL_output.png")
    return render_template(
        "index.html",
        # pass variables into the HTML template
        btn_range = range(3),
        prompt_images = [
            "./static/images/demo_img" + str(i) + ".png" for i in range(3)
        ]
    )
Now, when we save one of our demo images, we can see it appear in our root directory at an upscaled resolution of 2048 x 2048 pixels, under the name of "XL_output.png".
But the only problem is - we would like to save an entire collection of images, rather than overwriting the same file time and again. So let's fix it by giving our extra-large images unique names.
Unique Image Naming
One way to do so is by using unique numbers such as date and time. So let's import the datetime module at the top of our code with:
from datetime import datetime
Then, we will generate a unique image id with:
img_id = str(datetime.today())
which will result in a human-readable output:
2024-07-22 02:20:11.919895
However, in our case, we don't really care about readability, so we can keep the numeric characters only by removing the punctuation symbols from the id, such that:
img_id = str(datetime.today())
img_id = img_id.replace(":", "").replace(".", "").replace(" ", "").replace("-", "")
which will now return a unique numeric value instead:
20240722022349018563
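By the way, the same numeric id can be produced in a single line with strftime formatting, which you may find tidier than chaining replace calls:
img_id = datetime.today().strftime("%Y%m%d%H%M%S%f")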
So let's use this unique image id to name our extra large images. For this, we will create a new directory inside static/images and we will call it saved. Then we will update our code with the following lines:
# generate unique image id
img_id = str(datetime.today())
img_id = img_id.replace(":", "").replace(".", "").replace(" ", "").replace("-", "")
# save image
XL_img.save("./static/images/saved/img_" + img_id + ".png")
Now, when we save our tiny demo images, they will appear in high resolution inside a new designated folder with unique names.
Creative Licenses
If we'd like to publish our app and share it with the world - we will need to take care of licensing. And the idea is, since we are using models and tools that somebody else created - we also need to follow their rules.
So let's download the EDSR license from its GitHub repository (Saafke/EDSR_Tensorflow, the same repository we fetched the model from). We will then rename the newly downloaded file from LICENSE to EDSR_LICENSE.
Then we will download the FreeU license from its official GitHub repository, and similarly rename the file to FREEU_LICENSE.
And lastly, we will download the Stable Diffusion license from its Hugging Face repository, renaming it to StableDiffusion_LICENSE.txt. (Since we've already cloned the full model repository, a copy of this license also sits inside our local stable-diffusion-v1-4 folder.)
Finally, before publishing your app, I highly recommend reading these licenses and ensuring that you comply with their terms. Since in my case, I am sharing the application free of charge and using it for educational purposes, I can move on with the next steps.
Publish Application on DockerHub
Important Pre-Publishing Steps
Before we publish our application, we will ensure that:
we delete the "output.png" and "XL_output.png" test images from the root directory.
we delete the 3 demo images from the static/images directory, as they will be automatically generated by the app.
we delete the uniquely named images inside the static/images/saved directory (but please do not delete the saved directory itself!)
we rebuild our image for the very last time with "docker compose up --build", which will save all the file updates we just made into our container.
Publish Image on DockerHub
Create DockerHub Repository
To publish our application on DockerHub, we will create a Docker account (in my case, mariyasha) as well as a new Docker repository (in my case, diffuse_me). Together, they compose the remote image name mariyasha/diffuse_me.
Rename Local Docker Image
Once we have a remote repository, we will need to rename our local image accordingly. First, let's find out the local name of our image with:
docker images
This will present all the Docker images stored on our system, where we can see that the name of our image is starter_files-server, after our root directory. We can also see that it has the tag latest and a size of 29.8GB.
So let's rename it to match our remote repository name with:
docker image tag starter_files-server:latest mariyasha/diffuse_me:1.0
And if we check the Docker images once again, we see a new instance of our image with the remote repository name.
NOTE: we chose to give it the tag 1.0 because it's the first version of our image. The tag added to the image is determined by us and defaults to "latest".
Push to DockerHub
Finally, to upload our image to DockerHub, we will push it with:
docker push mariyasha/diffuse_me:1.0
Once the upload is complete, we will refresh our repository page and see a new entry within Tags. It means that our upload was successful and our app is officially public!
Use Application
Let's say you shared your application with your friends - but how can they use it on their end?
Step 1: Install Docker Desktop.
Step 2: Make sure you have 29.8GB of free disk space.
Step 3: Pull the image from DockerHub onto your computer.
docker pull mariyasha/diffuse_me:1.0
Step 4: Navigate to a directory where you'd like to save your AI-generated image collection.
cd path/to/folder
Step 5: Run the newly pulled local image.
docker run --gpus=all -p 8000:8000 -v ./:/app/static/images/saved mariyasha/diffuse_me:1.0
Here, --gpus=all grants the container access to the GPU, -p 8000:8000 publishes the app on port 8000, and -v ./:/app/static/images/saved mounts the current folder onto the container's saved-images directory, so every image you save lands right where you are.
Step 6: Enjoy! :)
References
A short list of links referred to throughout the tutorial, for your convenience:
Starter Files: https://github.com/MariyaSha/StableDiffusion_GUI_App
Stable Diffusion v1-4: https://huggingface.co/CompVis/stable-diffusion-v1-4
Nvidia Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Docker GPU Support Guide: https://docs.docker.com/compose/gpu-support/
DockerHub: https://hub.docker.com/
Thank you!
I hope you enjoyed this tutorial, and if you did - please share it with the world!
If you came up with a cool software based on this guide, please give me a shout somewhere on social media - I'd love to see where your imagination leads you! :)
Thank you so much for reading and see you soon in another incredible tutorial!