Stable Diffusion in Java (SD4J) Enables Generating Images with Deep Learning – InfoQ.com

Oracle Open Source has introduced the Stable Diffusion in Java (SD4J) project, a modified port of the Stable Diffusion C# implementation with support for negative text inputs. Stable Diffusion is a text-to-image deep learning model based on diffusion techniques. SD4J can be used to generate images via its GUI or programmatically from Java applications. SD4J runs on top of ONNX Runtime, a cross-platform machine learning accelerator for inference and training.

Git Large File Storage, a Git extension for versioning large files, should be installed first, for example with the following command on Linux:
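On a Debian-based distribution, for example, the extension can be installed via the package manager and then initialized for the current user (the package name below follows the standard Git LFS installation instructions):

```shell
# Install the Git LFS package (Debian/Ubuntu; other distributions use their own package manager)
sudo apt-get install git-lfs

# Initialize Git LFS for the current user
git lfs install
```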

Afterwards, the SD4J project can be cloned locally with the following command:
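Assuming the project lives in Oracle's GitHub organization, the clone command would look like:

```shell
# Clone the SD4J repository and enter its directory
git clone https://github.com/oracle/sd4j.git
cd sd4j
```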

SD4J uses ONNX models. Compatible pre-built models are available from Hugging Face, and one of those is used for the examples in this news story:
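One way to fetch a pre-built ONNX model is to clone its Hugging Face repository with Git LFS enabled; the repository and branch below are an illustrative example, not necessarily the model used in this story (consult the SD4J README for compatible models):

```shell
# Clone a Stable Diffusion ONNX model repository from Hugging Face
# (repository name and branch are illustrative; see the SD4J README)
git clone --branch onnx https://huggingface.co/runwayml/stable-diffusion-v1-5
```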

The README contains more information on using other models, such as those not in ONNX format.

ONNXRuntime-Extensions is a library which extends the capabilities of ONNX models and inference with the ONNX Runtime:
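The extensions library is built from source; its repository is hosted in Microsoft's GitHub organization:

```shell
# Clone the ONNXRuntime-Extensions repository
git clone https://github.com/microsoft/onnxruntime-extensions.git
```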

After cloning the project, the following command can be executed inside the onnxruntime-extensions directory to compile the ONNXRuntime-Extensions for your platform:
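On Linux or macOS, the build is typically started via the shell script at the repository root (script name assumed from the repository layout; Windows uses a corresponding .bat file):

```shell
cd onnxruntime-extensions
# Build the native ONNXRuntime-Extensions library for the current platform
./build.sh
```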

The following error might be displayed if CMake isn't installed:

Install at least version 3.25 of CMake to resolve the error, for example with the following command on Linux:
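One portable way to get a sufficiently recent CMake on Linux is via pip, which distributes prebuilt CMake wheels (alternatives include Kitware's APT repository or a binary download from cmake.org):

```shell
# Install CMake 3.25 or newer from PyPI
pip install 'cmake>=3.25'

# Verify the installed version
cmake --version
```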

When the build is successful, the resulting library (libortextensions.[dylib,so] or ortextensions.dll) can be found inside the following directory:

The resulting library should be copied to the root directory of the SD4J project.
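On Linux, the copy might look like the following; the build output path is an assumption and varies per platform and build configuration:

```shell
# Copy the compiled extensions library into the SD4J root directory
# (source path is illustrative; locate libortextensions.so in your build output)
cp onnxruntime-extensions/out/Linux/Release/lib/libortextensions.so sd4j/
```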

After these preparations, the GUI can be started by executing the Maven command, containing the model path, inside the sd4j directory:
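The exact invocation is documented in the SD4J README; a plausible form, with the model path passed as a property (the property name below is an assumption, used purely for illustration), is:

```shell
cd sd4j
# Compile the project and launch the GUI, pointing it at the downloaded model
# (the property name is illustrative; consult the README for the exact flag)
mvn clean package exec:exec -Dmodel-path=../stable-diffusion-v1-5
```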

The SD4J GUI is shown after the Maven command has executed successfully:

The images in this news story are created with guidance scale 10, seed 42, 50 inference steps, and the Euler Ancestral image scheduler, unless stated otherwise.

First, the GUI is used to create an image of a sports car on the road, with the following image text:

This results in a red sports car on a road:

When generating images of sports cars, most of them turn out red. In order to create images of sports cars that aren't red, the image negative text may be used to specify what the image shouldn't contain. For example, with the value red for the image negative text, a white car is generated in this example:

The guidance scale indicates how closely the resulting image should follow the text prompt. A higher value keeps the image closer to the prompt, while a lower value may be used if more creativity in the image is desired. For Stable Diffusion, most models use a default guidance scale between 7 and 7.5.

A clear picture of a house on a hill surrounded by trees is generated using the image text: Professional photograph of house on a hill, surrounded by trees, while it rains, high resolution, high quality and guidance scale 10:

Using the same image text with guidance scale 1 allows more creativity: the house is now partly hidden between the trees and the hill is less visible:

The seed is a random number used to generate noise. The generated images stay the same when using the same seed, prompt and other parameters.

Stable diffusion starts with an image of random noise. With each inference step, the noise is reduced and the image is steered towards the prompt. More steps are not always better, as they might introduce unwanted details. The Hugging Face documentation generally recommends 50 inference steps.

Creating an image of a tree in a park with 10 inference steps results in a relatively noisy tree image:

Increasing the inference steps to 50 results in a clearer image of a tree:

Increasing the inference steps further to 200 results in an image clearly displaying multiple trees and some other elements, some of them red:

The image scheduler takes a model's output and returns a denoised version, while the batch size specifies the number of generated images.

Working manually via the GUI allows generating images; however, the project also provides the SD4J Java class to access SD4J programmatically.
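A sketch of such programmatic use might look like the following; the factory, configuration, and method names are assumptions based on the class description above, not verified against the SD4J API, so consult the project's source and README for the real signatures:

```java
// Hypothetical sketch of programmatic SD4J usage -- the class, method, and
// parameter names below are assumptions, not the verified SD4J API.
import java.util.List;

public class GenerateExample {
    public static void main(String[] args) {
        // Construct the pipeline from the downloaded ONNX model directory
        SD4J sd4j = SD4J.factory("../stable-diffusion-v1-5");

        // Generate one 512x512 image: 50 inference steps, guidance scale 10,
        // seed 42, with "red" as the negative text
        List<SD4J.SDImage> images = sd4j.generateImage(
                50,                              // inference steps
                "A sports car on the road",      // image text
                "red",                           // image negative text
                10.0f,                           // guidance scale
                1,                               // batch size
                new SD4J.ImageSize(512, 512),    // output size
                42);                             // seed

        sd4j.close();
    }
}
```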

Faster image generation is possible on NVIDIA GPUs by enabling the CUDA integration, changing the exec-maven-plugin configuration in the pom.xml from CPU to CUDA.

More information can be found in the SD4J README, while the Hugging Face documentation provides additional background on the different concepts.
