Stable Diffusion XL 1.0 images in under 3 seconds on OctoAI

Bassem Yacoube

Sep 26, 2023

3 minutes

In this article

Speed: Image creation in under 3 seconds with no drop in quality

Concurrency: Consistent sub 3-second latencies at multiple levels of concurrency

All this and more, in the upcoming OctoAI Image Generation Solution

OctoAI continues to push the boundary for speed in image generation. We're excited to share today that we can now consistently generate 30-step, 1024x1024-pixel images with SDXL 1.0 at a p95 latency of 2.8 seconds. This is part of the upcoming OctoAI Image Generation Solution, a curated ensemble of models and tools to fast-track the adoption of image generation in your applications, available today in private preview.

Since the launch of SDXL 1.0 on OctoAI, the team has been actively investigating ways to accelerate SDXL without trading off quality. There are many ways to speed up generation, but most of them cut the compute requirement by reducing the number of steps or limiting other parameters. Our goal for SDXL acceleration has been to retain quality and control while making image generation faster. Read on for detailed test configurations, results, and comparisons.

Speed: Image creation in under 3 seconds with no drop in quality

As the largest open image generation model to date, SDXL has about three times as many parameters as its Stable Diffusion predecessors (2.6B for SDXL versus 865M for SD 2.1). SDXL therefore requires a larger infrastructure footprint and more compute at runtime, which translates into higher cost and longer image generation latency. OctoAI is uniquely positioned to address this problem, building on our AI systems and Machine Learning Compilation (MLC) expertise without affecting the quality or accuracy of your images. These optimizations deliver a median (p50) latency of 2.77 seconds and a p95 latency of 2.8 seconds.
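Dividing the two parameter counts quoted above gives the size ratio directly:

```latex
\frac{2.6 \times 10^{9}}{865 \times 10^{6}} = \frac{2600}{865} \approx 3.0
```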

[Chart: SDXL endpoint latency, OctoAI vs. Hugging Face, at p50 and p95; OctoAI is faster at both]

OctoAI's median (p50) and p95 image generation speeds are over 3x faster than Hugging Face's on the same hardware. OctoAI's p95 latency was also within 2% of its p50 latency.
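As a quick check on the p50-to-p95 figure, the relative gap between the two latencies reported above can be computed as follows. Measuring the gap relative to p50 is an assumption about how the comparison was made:

```latex
\frac{t_{p95} - t_{p50}}{t_{p50}} = \frac{2.80\,\text{s} - 2.77\,\text{s}}{2.77\,\text{s}} \approx 0.011 \approx 1.1\%
```

This is consistent with the under-2% figure.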

The configuration used for the test was as follows:

Image resolution: 1024x1024 pixels
Number of steps: 30
Refiner: Off
CFG scale: 7
All 30 steps are run in full, with no customizations to reduce computation
Client: AWS us-east-1 region
Endpoints: AWS us-east-1 region
Platforms compared: OctoAI and Hugging Face Inference Endpoints
Hugging Face configuration: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0, deployed with the one-click deploy for SDXL on 1x NVIDIA A100 80GB GPU
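A minimal latency harness for a configuration like the one above might look like the sketch below. It is not the harness used for these results. The payload field names in `SDXL_CONFIG` are hypothetical, `generate` is any callable that sends one request and waits for the image, and percentiles use the nearest-rank method.

```python
import math
import time
from typing import Callable, Dict, List

# Hypothetical payload mirroring the test configuration above.
# Field names are illustrative, not OctoAI's or Hugging Face's actual schema.
SDXL_CONFIG: Dict[str, object] = {
    "width": 1024,
    "height": 1024,
    "steps": 30,
    "cfg_scale": 7,
    "use_refiner": False,
}


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def benchmark(generate: Callable[[Dict[str, object]], object],
              config: Dict[str, object],
              runs: int = 50) -> Dict[str, float]:
    """Time `runs` sequential end-to-end calls and report p50/p95 in seconds."""
    latencies: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(config)  # e.g. an HTTP POST to the endpoint under test
        latencies.append(time.perf_counter() - start)
    return {"p50": percentile(latencies, 50), "p95": percentile(latencies, 95)}
```

For example, `benchmark(lambda cfg: requests.post(ENDPOINT_URL, json=cfg), SDXL_CONFIG)` would time real HTTP calls, where `ENDPOINT_URL` is a placeholder for your deployment's URL. Running the client in the same region as the endpoint, as in the test above, keeps network round-trip time from dominating the measurement.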

Concurrency: Consistent sub 3-second latencies at multiple levels of concurrency

As an application grows, so do its number of users and its volume of image generation requests. Usage can be bursty, such as weekends for a consumer entertainment application or weekday mornings for a workforce application. It also rises steadily as the application gains adoption. A key consideration for application builders is predictable performance, both during intermittent bursts and through gradual growth. Concurrency tests measure how a system performs as the number of simultaneous users increases, which helps evaluate its behavior as usage scales.

The chart below shows the results of the same test, run with 1, 2, 5, and 10 concurrent requests to simulate different traffic levels. OctoAI's image generation latency stays nearly constant, varying by less than 2% across all the concurrency levels tested.
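To run a test of this shape against your own endpoint, a closed-loop sweep can hold a fixed number of requests in flight and record the median latency at each level. The sketch below is an assumption about methodology, not OctoAI's actual harness. `generate` stands in for any zero-argument callable that sends one image request, and "variation" is taken to mean the spread between the slowest and fastest per-level medians, relative to the fastest.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def timed_call(generate: Callable[[], object]) -> float:
    """Wall-clock latency of one end-to-end request, in seconds."""
    start = time.perf_counter()
    generate()
    return time.perf_counter() - start


def concurrency_sweep(generate: Callable[[], object],
                      levels: Sequence[int] = (1, 2, 5, 10),
                      requests_per_level: int = 20) -> Dict[int, float]:
    """Median latency per concurrency level.

    A pool with `level` workers keeps at most `level` requests in flight;
    each worker starts its next request as soon as the previous one returns.
    """
    medians: Dict[int, float] = {}
    for level in levels:
        with ThreadPoolExecutor(max_workers=level) as pool:
            futures = [pool.submit(timed_call, generate)
                       for _ in range(requests_per_level)]
            latencies = [f.result() for f in futures]
        medians[level] = statistics.median(latencies)
    return medians


def max_variation(medians: Dict[int, float]) -> float:
    """Relative spread between the slowest and fastest per-level medians."""
    values = list(medians.values())
    return (max(values) - min(values)) / min(values)
```

With a real endpoint, a result of `max_variation(concurrency_sweep(send_request))` below 0.02 would correspond to the under-2% variation reported above. Here `send_request` is a hypothetical function you would supply. Threads are adequate because each worker spends nearly all its time waiting on the network.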

[Chart: OctoAI SDXL latency at different levels of request concurrency]

Importantly, this concurrency performance does not require any usage commitments or over-provisioning by the customer. OctoAI automatically scales infrastructure resources up and down to match traffic, maintaining the target image generation latency as usage volumes change. Customers with committed usage contracts can also add custom Service Level Agreements (SLAs) to their agreement.

All this and more, in the upcoming OctoAI Image Generation Solution

The accelerated SDXL 1.0 model tested here will soon be available to all customers, as part of the upcoming OctoAI Image Generation Solution. The solution brings together an ensemble of models, customization capabilities including fine-tuning, and operational enhancements, to fast-track and scale the use of image generation in your applications. Our early access customers have already generated millions of images using the solution. If you would like to test the OctoAI Image Generation Solution and evaluate if it meets your needs, please reserve an early access spot.

Stay tuned for more in the coming weeks. You're also welcome to join us on Discord to keep up with the latest on OctoAI!
