ControlNet Stable Diffusion 1.5 API

Generate an image using a ControlNet Stable Diffusion 1.5 (SD1.5) model.

OctoAI’s SD1.5 ControlNet API supports both text-to-image and image-to-image use cases, and works with custom assets like LoRAs, checkpoints, VAEs, and textual inversions. We offer the following public OctoAI SD1.5 ControlNet checkpoints in the OctoAI Asset Library:

octoai:canny_sd15
octoai:depth_sd15
octoai:inpaint_sd15
octoai:ip2p_sd15
octoai:lineart_sd15
octoai:openpose_sd15
octoai:scribble_sd15
octoai:tile_sd15

In addition to using the default ControlNet checkpoints, you can also upload your own private ControlNet checkpoints to the OctoAI Asset Library. These custom checkpoints can then be utilized during generation by specifying the controlnet parameter. When using custom ControlNet checkpoints, please ensure you provide your own ControlNet mask using the controlnet_image parameter.

You need to create an OctoAI Authentication Token to access this API.

How to use

Invoke the https://image.octoai.run/generate/controlnet-sd15 endpoint with a POST request.

The request headers must include an Authentication Token in the Authorization field. The Accept header should be set to application/json to receive the image encoded as base64 in a JSON response.

Generating with a prompt: Commonly referred to as text-to-image, this mode generates an image from text alone. The relevant parameters are listed below, followed by a minimal request sketch.

  • prompt - text to generate the image from
  • controlnet - Required if using a ControlNet engine. Takes the name of the ControlNet to use during image generation.
  • controlnet_image - Required if using a ControlNet engine. The ControlNet image, encoded as a base64 string, used to guide image generation.
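For example, a minimal text-to-image request with the public canny checkpoint might look like the following sketch. The file name guide.jpg is a placeholder for your own input image; with the public checkpoints, automatic ControlNet preprocessing generates the canny map for you (see controlnet_preprocess below).

# Minimal text-to-image request: a prompt plus the two required ControlNet arguments.
# guide.jpg is a placeholder; with the public checkpoints, the matching
# ControlNet map is auto-generated from it by default.
BASE64_CONTROL=$(base64 < guide.jpg | tr -d '\n')

curl -X POST "https://image.octoai.run/generate/controlnet-sd15" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{
    "prompt": "A photo of a cute tiger astronaut in space",
    "controlnet": "octoai:canny_sd15",
    "controlnet_image": "'"$BASE64_CONTROL"'"
  }' > response.json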

Generating with a prompt and an image: Commonly referred to as image-to-image, this mode also generates an image from text but uses an existing image as the starting point. The required parameters, illustrated in the sketch after this list, are:

  • prompt - text to generate the image from
  • init_image - the image to use as the starting point for the generation. Argument takes an image encoded as a string in base64 format.
  • strength - controls how much influence the image parameter has on the output image
  • controlnet - Required if using a ControlNet engine. Takes the name of the ControlNet to use during image generation.
  • controlnet_image - Required if using a ControlNet engine. The ControlNet image, encoded as a base64 string, used to guide image generation.
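A sketch of an image-to-image request is shown below. photo.jpg and guide.jpg are placeholders for your own starting image and ControlNet guide image, and the prompt and strength values are illustrative.

# Image-to-image request: init_image is the starting point and strength controls
# how much the output may deviate from it. File names are placeholders.
BASE64_INIT=$(base64 < photo.jpg | tr -d '\n')
BASE64_CONTROL=$(base64 < guide.jpg | tr -d '\n')

curl -X POST "https://image.octoai.run/generate/controlnet-sd15" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{
    "prompt": "A watercolor painting of a tiger astronaut in space",
    "init_image": "'"$BASE64_INIT"'",
    "strength": 0.6,
    "controlnet": "octoai:openpose_sd15",
    "controlnet_image": "'"$BASE64_CONTROL"'"
  }' > response.json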

Generating with a prompt and a custom asset: This mode generates an image from text but uses a custom checkpoint, LoRA, textual inversion, or VAE. Note that using a custom asset increases generation time. The relevant parameters are listed below, with a request sketch after the list.

  • prompt - text to generate the image from
  • controlnet - Required if using a ControlNet engine. Takes the name of the ControlNet to use during image generation.
  • controlnet_image - Required if using a ControlNet engine. The ControlNet image, encoded as a base64 string, used to guide image generation.
  • checkpoint - Here you can specify a checkpoint either from the OctoAI asset library or your private asset library.
  • loras - Here you can specify LoRAs, in name-weight pairs, either from the OctoAI asset library or your private asset library.
  • textual_inversions - Here you can specify textual inversions and their corresponding trigger words.
  • vae - Here you can specify variational autoencoders.
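Assuming your custom assets have already been uploaded to the Asset Library, a request using them could look like the sketch below. The asset names my-checkpoint and my-style-lora are hypothetical, and expressing the loras name-weight pairs as a JSON object is an assumption; check the request schema below for the exact format.

# Custom-asset request: a private checkpoint plus a LoRA applied with weight 0.7.
# Asset names are placeholders for assets in your own asset library.
BASE64_CONTROL=$(base64 < guide.jpg | tr -d '\n')

curl -X POST "https://image.octoai.run/generate/controlnet-sd15" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{
    "prompt": "A photo of a cute tiger astronaut in space",
    "controlnet": "octoai:canny_sd15",
    "controlnet_image": "'"$BASE64_CONTROL"'",
    "checkpoint": "my-checkpoint",
    "loras": {"my-style-lora": 0.7}
  }' > response.json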

For more details about all parameters, please see the request schema below.

Output

The generated image defaults to a resolution of 512x512 pixels; other supported resolutions can be requested with the width and height parameters (see below).

Pricing

  • SD1.5 ControlNet: $0.003 per image

Check the Pricing Page for more details.

Request Details

Headers:

Authorization (Required): Your OCTOAI_TOKEN
Content-Type (Required): Set to application/json

Parameters:

  • prompt (string [up to 77 tokens], Required): A string of text describing the image to generate. You can use prompt weighting, e.g. (A tall (beautiful:1.5) woman:1.0) (some other prompt with weight:0.8). A token's weight is the product of the weights of all brackets it is a member of; in the first example, beautiful receives a weight of 1.5 × 1.0 = 1.5. The brackets, colons, and weights do not count towards the number of tokens.
  • negative_prompt (string, Optional): Text describing image traits to avoid during generation.
  • sampler (string, Optional): A string specifying which scheduler to use when generating an image. Defaults to DDIM. Regular samplers include DDIM, DDPM, DPM_PLUS_PLUS_2M_KARRAS, DPM_SINGLE, DPM_SOLVER_MULTISTEP, K_EULER, K_EULER_ANCESTRAL, PNDM, and UNI_PC. Premium samplers (2x price) include DPM_2, DPM_2_ANCESTRAL, DPM_PLUS_PLUS_SDE_KARRAS, HEUN, and KLMS.
  • cfg_scale (double, Optional): Floating-point number representing how closely to adhere to the prompt description. Must be a positive number no greater than 50.0. Defaults to 12.
  • image_encoding (enum, Optional): Defines which encoding process should be applied before returning the generated image(s). Allowed values: jpeg, png.
  • num_images (integer, Optional): Integer representing how many output images to generate with a single prompt/configuration. Defaults to 1. Allowed values: 1-16.
  • seed (union, Optional): Integer number or list of integers representing the seeds of random generators. Fixing random seed is useful when attempting to generate a specific image. Must be greater than 0 and less than 2^32.
  • steps (integer, Optional): Integer representing how many steps of diffusion to run. Must be greater than 0 and less than or equal to 200. Defaults to 30.
  • init_image (string, Optional): The image (encoded in b64 string) to use as the starting point for the generation. This parameter is for Image-to-Image generation and Inpainting.
    Use .jpg format to ensure best latency
  • strength (double, Optional): Floating-point number indicating how creative the image-to-image generation should be. Must be greater than 0 and less than or equal to 1.0. Defaults to 0.8. This parameter is for Image-to-Image generation.
  • height (integer, Optional): Integer representing the height of the image to generate. Defaults to 512.
  • width (integer, Optional): Integer representing the width of the image to generate. Defaults to 512.

Supported Output Resolutions (Width x Height) are as follows:

SD1.5:

(512, 512), (640, 512), (768, 512), (512, 704),
(512, 768), (576, 768), (640, 768), (576, 1024),
(1024, 576)

  • use_refiner (boolean, Optional): Boolean determining whether or not to use the refiner.
  • high_noise_frac (double, Optional): A floating-point number determining how much noise should be applied using the base model vs. the refiner. A value of 0.8 will apply the base model at 80% and the refiner at 20%. Defaults to 0.8 when not set.

ControlNet parameters

  • controlnet (string, Required if using a ControlNet engine): Takes the name of the ControlNet to use during image generation.
  • controlnet_image (string, Required if using a ControlNet engine): The ControlNet image, encoded as a base64 string, used to guide image generation.
  • controlnet_conditioning_scale (double, Optional): Only applicable when using ControlNets. Determines how strong the effect of the ControlNet will be. Defaults to 1.
  • controlnet_early_stop (integer, Optional): Only applicable when using ControlNets. If provided, indicates the fraction of steps at which to stop applying the ControlNet. This can sometimes be used to generate better outputs.
  • controlnet_preprocess (boolean, Optional): Only applicable when using ControlNets. Determines whether or not to apply automatic ControlNet preprocessing. For the public ControlNet checkpoints listed above, we default to auto-generating the corresponding ControlNet map/mask that is fed into the ControlNet; you can override this by passing controlnet_preprocess: false, as in the sketch below.
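The sketch below shows how these tuning parameters might be combined when you supply your own pre-computed ControlNet map (my_canny_map.jpg is a placeholder file name):

# Supplying a pre-computed canny map, so automatic preprocessing is disabled,
# and weakening the ControlNet's influence with a lower conditioning scale.
BASE64_CONTROL=$(base64 < my_canny_map.jpg | tr -d '\n')

curl -X POST "https://image.octoai.run/generate/controlnet-sd15" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{
    "prompt": "A photo of a cute tiger astronaut in space",
    "controlnet": "octoai:canny_sd15",
    "controlnet_image": "'"$BASE64_CONTROL"'",
    "controlnet_preprocess": false,
    "controlnet_conditioning_scale": 0.5
  }' > response.json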

Custom Assets

  • checkpoint (string, Optional): Here you can specify a checkpoint either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.

  • loras (string, Optional): Here you can specify LoRAs, in name-weight pairs, either from the OctoAI asset library or your private asset library. Note that using a custom asset increases generation time.

  • textual_inversions (string, Optional): Here you can specify textual inversions and their corresponding trigger words. Note that using a custom asset increases generation time.

  • vae (string, Optional): Here you can specify variational autoencoders. Note that using a custom asset increases generation time.

Request Examples

# BASE64_IMAGE holds your base64-encoded ControlNet input image, for example:
# BASE64_IMAGE=$(base64 < canny_map.jpg | tr -d '\n')

curl -X POST "https://image.octoai.run/generate/controlnet-sd15" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{
    "controlnet_image": "'"$BASE64_IMAGE"'",
    "controlnet": "octoai:canny_sd15",
    "controlnet_preprocess": false,
    "prompt": "A photo of a cute tiger astronaut in space",
    "negative_prompt": "low quality, bad quality, sketches, unnatural",
    "steps": 20,
    "num_images": 1,
    "seed": 768072361,
    "height": 512,
    "width": 512
  }' > response.json
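The response written to response.json contains the generated image(s) as base64 strings. Assuming the payload follows the common OctoAI image-generation shape of an images array with an image_b64 field per image (verify against the response schema), the first image can be extracted like this:

# Decode the first generated image from the JSON response.
# The .images[0].image_b64 path is an assumption about the response shape;
# adjust it to match the actual schema if it differs.
jq -r '.images[0].image_b64' response.json | base64 --decode > result.jpg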