Stable Diffusion XL API

Generate an image using a Stable Diffusion XL (SDXL) model.

OctoAI’s SDXL API supports both text-to-image, image-to-image use cases and, adding custom assests like LoRAs, checkpoints, VAES, textual inversions. It also support ControlNets. For more details, refer SDXL-ControlNets.

You need to create an OctoAI Authentication Token to access this API.

How to use

Invoke https://image.octoai.run/generate/sdxl endpoint with a POST request.

The headers of the request must include an Authentication Token in the authorization field. The accept header should be set to application/json to receive the image encoded as base64 in a JSON response.

Generating with a prompt: Commonly referred to as text-to-image, this mode generates an image from text alone. While the only required parameter is the prompt, it also supports a width and height parameter which can be used to control the aspect ratio of the generated image.

Generating with a prompt and an image: Commonly referred to as image-to-image, this mode also generates an image from text but uses an existing image as the starting point. The required parameters are:

prompt - text to generate the image from
init_image - the image to use as the starting point for the generation. Argument takes an image encoded as a string in base64 format.
strength - controls how much influence the image parameter has on the output image

Generating with a prompt and a custom asset: This mode generates an image from text but uses either a custom checkpoint, LoRA, textual inversion, or VAE. Note that using a custom asset increases generation time.

prompt - text to generate the image from
checkpoint - Here you can specify a checkpoint either from the OctoAI asset library or your private asset library.
loras - Here you can specify LoRAs, in name-weight pairs, either from the OctoAI asset library or your private asset library.
textual_inversions - Here you can specify textual inversions and their corresponding trigger words.
vae - Here you can specify variational autoencoders.

Inpainting with a prompt: Inpainting replaces or edits specific areas of an image. This makes it a useful tool for image restoration like removing defects and artifacts, or even replacing an image area with something entirely new. Inpainting relies on a mask to determine which regions of an image to fill in. The required parameters are:

prompt - text to generate the image from
init_image - the image to use as the starting point for the generation. Argument takes an image encoded as a string in base64 format.
mask_image - area of the picture to inpaint. Argument takes an image encoded as a string in base64 format.

Outpainting with a prompt: Outpainting is the process of using an image generation model like Stable Diffusion to extend beyond the canvas of an existing image. Outpainting is very similar to inpainting, but instead of generating a region within an existing image, the model generates a region outside of it. The required parameters are:

prompt - text to generate the image from
init_image - the existing image you’d like to outpaint. You need to create a source image that places your original image within a larger canvas. Argument takes an image encoded as a string in base64 format.
mask_image - a black and white mask representing the extended area. Argument takes an image encoded as a string in base64 format.
outpainting - Argument takes a boolean value to determine Whether the request requires outpainting or not. If so, special preprocessing is applied for better results. Defaults to false. This needs to be set to true, if you wish to use outpainting.

Transfer faces from source image to new generation Parameter aka Photo Merge feature.

prompt - text to generate the image from
transfer_images (dict, Optional): Applicable for use cases where you wish to transfer the subject in the uploaded image(s) to the output image(s). It works best for face transfer use cases. Argument takes a dictionary containing a mapping of trigger words to a list of sample images which demonstrate the desired object to transfer.

For more details about all parameters, please see the request schema below.

Output

The resolution of the generated image will be 1 megapixel. The default resolution is 1024x1024.

Pricing

SDXL Base: $0.004 per image, 1024x1024, 30 steps; billed per image
SDXL custom asset: $0.008 per image, 1024x1024, 30 steps; billed per image
Fine tuning job SDXL: $0.25 per tune, using the 500 step default
Inpainting: Same price as corresponding SDXL image
Outpainting: Same price as corresponding SDXL image
Photomerge: $0.015 per image, 1024x1024, 30 steps; billed per image

Check Pricing Page for more details.

Request Details

Headers:

Authorization (Required): Your OCTOAI_TOKEN
Content-Type (Required): Set to application/json

Parameters:

prompt (string [ upto 77 tokens], Required): A string of text describing the image to generate. You can use prompt weighting, e.g. (A tall (beautiful:1.5) woman:1.0) (some other prompt with weight:0.8) . The weight will be the product of all brackets a token is a member of. The brackets, colons and weights do not count towards the number of tokens.
prompt_2 (string, Optional): This only applies to SDXL. By default, setting only prompt copies the input to both prompt and prompt_2. When prompt and prompt_2 are both set, they have very different functionality. The second prompt is meant for more human readable descriptions of the desired image.
For example, prompt is used for “word salad” style control of the image. This is the type of prompting you are likely familiar with from SD 1.5. Prompts like the following work well:
prompt = "photorealistic, high definition, masterpiece, sharp lines"
whereas prompt_2 is meant for more human readable descriptions of the desired image. For example:
prompt_2 = "A portrait of a handsome cat wearing a little hat. The cat is in front of a colorful background.
negative_prompt (string, Optional): Text describing image traits to avoid during generation.
negative_prompt_2 (string, Optional): This only applies to SDXL. This prompt is meant for human readable descriptions of what you don’t want the image, e.g. you would say “Low resolution” in negative_prompt then “Bad hands” in negative_prompt_2.
sampler (string, Optional): A string specifying which scheduler to use when generating an image. Defaults to DDIM. Regular samplers include DDIM,DDPM,DPM_PLUS_PLUS_2M_KARRAS,DPM_SINGLE,DPM_SOLVER_MULTISTEP,K_EULER, K_EULER_ANCESTRAL,PNDM,UNI_PC. Premium samplers (2x price) include DPM_2, DPM_2_ANCESTRAL,DPM_PLUS_PLUS_SDE_KARRAS, HEUN and KLMS.
cfg_scale (double, Optional): Floating-point number represeting how closely to adhere to prompt description. Must be a positive number no greater than 50.0. Defaults to 12.
image_encoding (enum, Optional): Define which encoding process should be applied before returning the generated image(s). Allowed values: jpeg png
num_images (integer, Optional): Integer representing how many output images to generate with a single prompt/configuration. Defaults to 1. Allowed values: 1-16.
seed (union, Optional): Integer number or list of integers representing the seeds of random generators. Fixing random seed is useful when attempting to generate a specific image. Must be greater than 0 and less than 2^32.
steps (integer, Optional Defaults to 30): Integer representing how many steps of diffusion to run. Must be greater than 0 and less than or equal to 200.
init_image (string, Optional): The image (encoded in b64 string) to use as the starting point for the generation. This parameter is for Image-to-Image generation and Inpainting.
Use .jpg format to ensure best latency
strength (double,Optional): Floating-point number indicating how much creative the Image to Image generation mode should be. Must be greater than 0 and less than or equal to 1.0. Defaults to 0.8. This parameter is for Image-to-Image generation.
height (integer, Optional): Integer representing the height of image to generate. Default to 1024.
width (integer, Optional): Integer representing the width of image to generate. Default to 1024.

Supported Output Resolutions (Width x Height) are as follows:

SDXL:

(1024, 1024),(896, 1152),(832, 1216),(768,
1344),(640, 1536),(1536, 640),(1344, 768),
(1216, 832),(1152, 896)

use_refiner (Boolean, Optional): A boolean true or false determines whether to use the refiner or not
high_noise_frac (double, Optional): A floating point or integer determining how much noise should be applied using the base model vs. the refiner. A value of 0.8 will apply the base model at 80% and Refiner at 20%. Defaults to 0.8 when not set.
style_preset (string, Optional): This only applies to SDXL. Used to guide the output image towards a particular style. Defaults to None. Supported values for styles present include base, 3d-model, Abstract,Advertising, Alien, analog-film, anime,Architectural, cinematic, Collage,comic-book, Craft Clay, Cubist,digital-art, Disco,Dreamscape,Dystopian, enhance, Fairy Tale,fantasy-art, Fighting Game, Film Noir, Flat Papercut, Food Photography, Gothic, Graffiti, Grunge, HDR, Horror, Hyperrealism, Impressionist, isometric, Kirigami, line-art,Long Exposure,low-poly,Minimalist,modeling-compound,Monochrome,Nautical, Neon Noir,neon-punk,origami,Paper Mache, Paper Quilling,Papercut Collage,Papercut Shadow Box,photographic,pixel-art,Pointillism,Pop Art,Psychedelic,Real Estate,Renaissance,Retro Arcade,Retro Game,RPG Fantasy,Game,Silhouette,Space,Stacked Papercut,Stained Glass,Steampunk,Strategy Game,Surrealist,Techwear Fashion,Thick Layered Papercut,tile-texture,Tilt-Shift, Tribal,Typography,Watercolor,Zentangle

Custom Assets

checkpoint (string, Optional): Here you can specify a checkpoint either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
loras(string, Optional): Here you can specify LoRAs, in name-weight pairs, either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.
textual_inversions (string, Optional): Here you can specify textual inversions and their corresponding trigger words. Note that using a custom asset increases generation time.
vae (string, Optional): Here you can specify variational autoencoders. Note that using a custom asset increases generation time.

Inpainting and Outpating Parameters

mask_image (string, Optional): Only applicable for inpainting use cases i.e. to specify which area of the picture to paint onto. Argument takes an image encoded as a string in base64 format.
Use .jpg format to ensure best latency
outpainting (boolean, Optional): Only applicable for outpainting use cases. Argument takes a boolean value to determine Whether the request requires outpainting or not. If so, special preprocessing is applied for better results. Defaults to false

Photo Merge Parametes

transfer_images (dict, Optional): This is our Photo Merge feature. Applicable for use cases where you wish to transfer the subject in the uploaded image(s) to the output image(s). It works best for face transfer use cases. Argument takes a dictionary containing a mapping of trigger words to a list of sample images which demonstrate the desired object to transfer.

Request Examples

$ curl -H 'Content-Type: application/json' -H "Authorization: Bearer $OCTOAI_TOKEN" -X POST "https://image.octoai.run/generate/sdxl" \
>     -d '{
>         "prompt": "The angel of death Hyperrealistic, splash art, concept art, mid shot, intricately detailed, color depth, dramatic, 2/3 face angle, side light, colorful background",
>         "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
>         "sampler": "DDIM",
>         "cfg_scale": 11,
>         "height": 1024,
>         "width": 1024,
>         "seed": 2748252853,
>         "steps": 20,
>         "num_images": 1,
>         "high_noise_frac": 0.7,
>         "strength": 0.92,
>         "use_refiner": true,
>         "style_preset": "3d-model"
>     }' > response.json

Here’s an example of how to mix OctoAI assets (checkpoints, loras, and textual_inversions) in the same API request.

1 payload = {
2     ...
3 	"checkpoint": "octoai:realcartoon",
4     "loras": {
5         "octoai:crayon-style": 0.7,
6         "your-custom-lora": 0.3
7     },
8     "textual_inversions": {
9         "octoai:NegativeXL": "negativeXL_D",
10     },
11 	"vae": "your_vae_name"
12     ...
13 }

1 payload = {
2     ...
3         "prompt": "A trigger_word_1 sitting on a golden throne",
4         "negative_prompt": "Blurry photo, distortion, low-res, bad quality",
5         "checkpoint": "octoai:RealVisXL",
6         "width": 1024,
7         "height": 1024,
8         "num_images": 2,
9         "sampler": "K_EULER_ANCESTRAL",
10         "steps": 20,
11         "cfg_scale": 7.5,
12         "transfer_images": {"trigger_word_1": ["$BASE64_IMAGE_1", "$BASE64_IMAGE_2"]
13     ...
14 }

$	curl -H 'Content-Type: application/json' -H "Authorization: Bearer $OCTOAI_TOKEN" -X POST "https://image.octoai.run/generate/sdxl" \
>	-d '{
>	"prompt": "The angel of death Hyperrealistic, splash art, concept art, mid shot, intricately detailed, color depth, dramatic, 2/3 face angle, side light, colorful background",
>	"negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
>	"sampler": "DDIM",
>	"cfg_scale": 11,
>	"height": 1024,
>	"width": 1024,
>	"seed": 2748252853,
>	"steps": 20,
>	"num_images": 1,
>	"high_noise_frac": 0.7,
>	"strength": 0.92,
>	"use_refiner": true,
>	"style_preset": "3d-model"
>	}' > response.json

1	payload = {
2	...
3	"checkpoint": "octoai:realcartoon",
4	"loras": {
5	"octoai:crayon-style": 0.7,
6	"your-custom-lora": 0.3
7	},
8	"textual_inversions": {
9	"octoai:NegativeXL": "negativeXL_D",
10	},
11	"vae": "your_vae_name"
12	...
13	}

1	payload = {
2	...
3	"prompt": "A trigger_word_1 sitting on a golden throne",
4	"negative_prompt": "Blurry photo, distortion, low-res, bad quality",
5	"checkpoint": "octoai:RealVisXL",
6	"width": 1024,
7	"height": 1024,
8	"num_images": 2,
9	"sampler": "K_EULER_ANCESTRAL",
10	"steps": 20,
11	"cfg_scale": 7.5,
12	"transfer_images": {"trigger_word_1": ["$BASE64_IMAGE_1", "$BASE64_IMAGE_2"]
13	...
14	}