High-quality SDXL Image Generation in under 1 second on OctoAI

Nov 20, 2023

OctoAI recently onboarded two major advances in image GenAI models that vastly improve the speed and cost of generating high-quality images. The first is SSD-1B, a distilled version of SDXL that is 50% faster, with only a small difference in image quality. The second is LCM-LoRA, an even newer innovation that delivers high-quality image output in less than one second at the lowest SDXL image prices possible today. Critical to this sub-second SDXL latency is OctoAI’s Asset Orchestration architecture and its Asset Orchestrator technology, which adds smart caching and fast asset loading on top of what the LCM-LoRA approach typically achieves on its own. This post explains the benefits and caveats of both innovations and how you can use them on OctoAI. To try the OctoAI Image Gen Solution for yourself, sign up and start for free today. Please refer to our API docs for details on how to use this capability.


SSD-1B launched a few weeks ago and is open source and available for commercial use. Its main benefit is speed: because it is a smaller model, it is 50% faster than SDXL. In addition, we applied our proprietary compiler and cloud system optimizations to this model, bringing the end-to-end average generation time down to 1.4 seconds, the fastest version of SSD-1B on the market.

There are two main pitfalls. First, because SSD-1B has been distilled to a smaller size, its output images differ from those of SDXL. In other words, you can no longer reproduce the same output images as SDXL, even if you use the same seed and API parameters. Second, the community around this new model is still relatively small compared to that around SDXL, so fewer fine-tuned assets such as checkpoints and LoRAs are available for customizing styles, objects, and faces. Read about why customization around image models is important and how OctoAI enables customization.


The second innovation that we are very excited about is LCM-LoRAs. This innovation enables you to achieve high-quality output from SDXL with only 4 steps and less than 1 second of latency. More importantly, it is completely compatible with existing image customization techniques available on OctoAI such as SDXL LoRAs. Customers using this can achieve both fast inference speed and product differentiation via customization.

Exploring a real-world example

Let’s say you are an e-commerce merchant selling glass orbs for the holiday season. Using OctoAI SDXL with default settings (base style and 30 steps), you can generate the following image in 2.8 seconds:

Now, if you use SDXL with LCM-LoRA, you can generate a similarly high-quality image in only 4 steps and 660 milliseconds. Note that OctoAI built a proprietary Asset Orchestrator technology to ensure that the LCM asset is cached in memory intelligently; without this technology, generation can take closer to 2 to 3 seconds:

Finally, if you combine an SDXL LCM-LoRA with another custom LoRA tuned on glass orbs for the holiday season, you can generate an image like the following. This generation can also be done in 4 steps, but bringing in the custom LoRA introduces a hot-swapping cost:

Here are a few things to keep in mind when using LCM-LoRA:

  • You must set cfg_scale to a small number between 1 and 2; otherwise, image quality turns out very poor. If you pick a cfg_scale outside this range, we will automatically force the value to a valid number.

  • You must use the LCM sampler. If you pick a different sampler, we will automatically force the value to LCM.

  • Typically, 4-8 steps are sufficient for generating high-quality images.
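The constraints above can be sketched as a small client-side check, so you can see which values an LCM-LoRA request effectively ends up with. This is only an illustration, not OctoAI's implementation: `GenerationParams` and `normalize_for_lcm` are hypothetical names, the `"DDIM"` sampler string is a placeholder, and clamping an out-of-range cfg_scale to the nearest bound is an assumption, since the exact value the service substitutes is not specified here.

```python
from dataclasses import dataclass, replace

# Bounds taken from the guidance above: cfg_scale should be between 1 and 2.
CFG_SCALE_MIN = 1.0
CFG_SCALE_MAX = 2.0
LCM_SAMPLER = "LCM"


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container for the request parameters discussed above."""

    prompt: str
    steps: int = 4  # 4-8 steps are typically sufficient with LCM-LoRA
    cfg_scale: float = 1.5
    sampler: str = LCM_SAMPLER


def normalize_for_lcm(params: GenerationParams) -> GenerationParams:
    """Return a copy with cfg_scale and sampler made valid for LCM-LoRA.

    Assumption: an out-of-range cfg_scale is clamped to the nearest bound.
    The step count is left untouched, because the guidance on steps is a
    recommendation rather than an enforced rule.
    """
    clamped = min(max(params.cfg_scale, CFG_SCALE_MIN), CFG_SCALE_MAX)
    return replace(params, cfg_scale=clamped, sampler=LCM_SAMPLER)


if __name__ == "__main__":
    requested = GenerationParams(
        prompt="a glass orb for the holiday season",
        cfg_scale=7.5,   # typical for plain SDXL, too high for LCM-LoRA
        sampler="DDIM",  # placeholder name for a non-LCM sampler
    )
    print(normalize_for_lcm(requested))
```

Running a check like this before sending a request makes the forced values explicit, which helps when comparing outputs against a plain SDXL run that used a higher cfg_scale.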

