Customizations

ControlNets

OctoAI's Asset Library is pre-populated with the most popular publicly available ControlNets, which accept an additional image input to influence and customize image generation.

While traditional image generation models can produce stunning visuals, they offer no structural guidance, so there is no way to steer the output toward a user-desired composition. ControlNet changes the game by accepting an additional image input that conditions (influences) the final image generation. This input can be anything from simple scribbles to detailed depth maps or edge maps. By conditioning on these input images, ControlNet directs the Stable Diffusion model to generate images that align closely with the user's intent.

OctoAI's Asset Library comes pre-populated with the following list of public ControlNets:

octoai:canny_sdxl
octoai:depth_sdxl
octoai:openpose_sdxl
octoai:canny_sd15
octoai:depth_sd15
octoai:inpaint_sd15
octoai:ip2p_sd15
octoai:lineart_sd15
octoai:openpose_sd15
octoai:scribble_sd15
octoai:tile_sd15

In addition to the default ControlNet checkpoints, you can upload private ControlNet checkpoints to the OctoAI Asset Library and use them at generation time via the controlnet parameter in the API. For custom ControlNet checkpoints, make sure to provide your own ControlNet mask in the controlnet_image parameter, as sketched below.
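
For illustration, here is a minimal Python sketch of the relevant request fields. The parameter names are the same ones used in the curl examples below; the asset name my-custom-controlnet and the file mask.png are hypothetical placeholders for your own checkpoint and mask.

Example Code (Python sketch):

import base64

# Base64-encode your own ControlNet mask; custom checkpoints require it.
with open("mask.png", "rb") as f:
    mask_b64 = base64.b64encode(f.read()).decode("utf-8")

# "my-custom-controlnet" is a hypothetical placeholder for the name of
# the private checkpoint you uploaded to the OctoAI Asset Library.
payload = {
    "prompt": "A photo of a woman wearing a (rose pink dress:1)",
    "controlnet": "my-custom-controlnet",
    "controlnet_image": mask_b64,
}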

Below is an example of using the Canny ControlNet, along with the ControlNet image (left) and the simple prompt A photo of a woman wearing a (rose pink dress:1). The Canny ControlNet is designed to detect a wide range of edges in images: given a raw image or sketch, it extracts the image's contours and edges and uses them to guide generation. The image on the right was generated by SDXL with the Canny ControlNet applied.

ControlNet Image

SDXL with Canny ControlNet
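
If you want to compute the edge map yourself, for example to tune the edge-detection thresholds, a minimal sketch using OpenCV follows. It assumes the opencv-python package is installed; the file name and the 100/200 thresholds are illustrative.

Example Code (Python sketch):

import base64
import cv2

# Load the source photo and convert it to grayscale for edge detection.
image = cv2.imread("input.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Extract edges with the Canny detector; tune the thresholds per image.
edges = cv2.Canny(gray, 100, 200)

# Encode the edge map as PNG, then base64, ready for controlnet_image.
ok, png = cv2.imencode(".png", edges)
assert ok, "PNG encoding failed"
edge_b64 = base64.b64encode(png.tobytes()).decode("utf-8")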

Example Code for Canny ControlNet:

curl -X POST "https://image.octoai.run/generate/controlnet-sdxl" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OCTOAI_TOKEN" \
    --data-raw '{
        "prompt": "A photo of a woman wearing a (rose pink dress:1)",
        "negative_prompt": "Blurry photo, distortion, low-res, poor quality",
        "controlnet": "octoai:canny_sdxl",
        "width": 1024,
        "height": 1024,
        "num_images": 1,
        "sampler": "DDIM",
        "steps": 30,
        "cfg_scale": 12,
        "use_refiner": true,
        "high_noise_frac": 0.8,
        "style_preset": "base",
        "controlnet_conditioning_scale": 1,
        "controlnet_image": "<BASE64 IMAGE>"
    }' | jq -r ".images[0].image_b64" | base64 -d > result.jpg
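
The <BASE64 IMAGE> placeholder stands for the base64-encoded ControlNet input image. As a rough Python equivalent of the curl call above, here is a sketch assuming the requests package and an OCTOAI_TOKEN environment variable; controlnet_input.png is a placeholder file name.

Example Code (Python sketch):

import base64
import os
import requests

# Base64-encode the ControlNet input image.
with open("controlnet_input.png", "rb") as f:
    controlnet_image = base64.b64encode(f.read()).decode("utf-8")

# Same payload as the curl example above.
payload = {
    "prompt": "A photo of a woman wearing a (rose pink dress:1)",
    "negative_prompt": "Blurry photo, distortion, low-res, poor quality",
    "controlnet": "octoai:canny_sdxl",
    "width": 1024,
    "height": 1024,
    "num_images": 1,
    "sampler": "DDIM",
    "steps": 30,
    "cfg_scale": 12,
    "use_refiner": True,
    "high_noise_frac": 0.8,
    "style_preset": "base",
    "controlnet_conditioning_scale": 1,
    "controlnet_image": controlnet_image,
}

resp = requests.post(
    "https://image.octoai.run/generate/controlnet-sdxl",
    headers={"Authorization": f"Bearer {os.environ['OCTOAI_TOKEN']}"},
    json=payload,
)
resp.raise_for_status()

# Decode the first generated image, mirroring the jq/base64 step above.
with open("result.jpg", "wb") as out:
    out.write(base64.b64decode(resp.json()["images"][0]["image_b64"]))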

Below is an example of using the OpenPose ControlNet, along with the ControlNet image (left) and the prompt A photo of a white man on a Japanese tatami mat. OpenPose is a fast human keypoint detection model that can extract human poses, such as the positions of the hands, legs, and head. The image on the right was generated by SDXL with the OpenPose ControlNet applied.

ControlNet Image

SDXL with OpenPose ControlNet
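
If you want to produce the pose image yourself, one option is the community controlnet_aux package, which wraps an OpenPose detector. A minimal sketch follows, assuming controlnet_aux is installed and can download its annotator weights; person.jpg and pose.png are placeholder file names.

Example Code (Python sketch):

from controlnet_aux import OpenposeDetector
from PIL import Image

# Download the annotator weights and build the OpenPose detector.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# Extract a stick-figure pose image from a source photo.
pose = detector(Image.open("person.jpg"))
pose.save("pose.png")  # use this as the controlnet_image input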

Example Code for OpenPose ControlNet:

curl -X POST "https://image.octoai.run/generate/controlnet-sdxl" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OCTOAI_TOKEN" \
    --data-raw '{
        "prompt": "A photo of a white man on a Japanese tatami mat",
        "negative_prompt": "Blurry photo, distortion, low-res, poor quality, distorted legs, distorted feet, disproportionate hands",
        "controlnet": "octoai:openpose_sdxl",
        "width": 1024,
        "height": 1024,
        "num_images": 1,
        "sampler": "DDIM",
        "steps": 30,
        "cfg_scale": 12,
        "use_refiner": true,
        "high_noise_frac": 0.8,
        "style_preset": "base",
        "controlnet_conditioning_scale": 1,
        "controlnet_image": "<BASE64 IMAGE>"
    }' | jq -r ".images[0].image_b64" | base64 -d > result.jpg
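
In both examples, controlnet_conditioning_scale is set to 1, which applies the ControlNet input at full strength; lowering it toward 0 loosens how strictly the output follows the edge or pose map, which can help when the conditioning image fights the prompt.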