Stable Video Diffusion REST API

Generate videos using a Stable Video Diffusion (SVD) model.

OctoAI’s SVD API supports image-to-video use cases. For text-to-video use cases, first generate an image with one of our Image Gen APIs (such as SDXL, SD15, SD3, or Juggernaut XI), then pass that image to the SVD API.

You need to create an OctoAI Authentication Token to access this API.

How to use

Invoke the https://image.octoai.run/generate/svd endpoint with a POST request.

The request headers must include an Authentication Token in the Authorization field. Set the Accept header to application/json to receive the video encoded as base64 in a JSON response.

Generating a video from an image: The only required parameter is image, which must be a base64-encoded string.

For more details about all parameters, please see the request schema below.
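The request described above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not an official client: the helper name build_svd_request is hypothetical, and the function only builds the request object so that the headers and body can be inspected before anything is sent.

```python
import base64
import json
import urllib.request

SVD_URL = "https://image.octoai.run/generate/svd"


def build_svd_request(image_bytes: bytes, token: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for the SVD endpoint.

    build_svd_request is a hypothetical helper name, not part of any OctoAI SDK.
    """
    # The only required parameter is the base64-encoded input image.
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    # Optional parameters (for example steps or fps) are passed through unchanged.
    payload.update(params)
    return urllib.request.Request(
        SVD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body with json.load. Reading the image from disk (for example with open(path, "rb").read()) and loading the token from the OCTOAI_TOKEN environment variable are left to the caller.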

Output

A 3-4 second video at the supported resolution closest to that of the input image. For all supported resolutions, refer to the Parameters section below.

Pricing

$0.15 per video, any resolution, 25 steps; billed per video

Check the Pricing Page for more details.

Request Details

Headers:

Authorization (Required): Your OCTOAI_TOKEN
Content-Type (Required): Set to application/json

Parameters:

image (base64 encoded image, required) - Starting image, encoded as a base64 string.
height (int; optional) - Height of the video/animation to generate. If not provided, the output height is inferred from the input ‘image’, and the closest supported resolution is chosen.
width (int; optional) - Width of the video/animation to generate. If not provided, the output width is inferred from the input ‘image’, and the closest supported resolution is chosen.

Supported resolutions are (w,h): (576, 1024), (1024, 576), (768, 768)
cfg_scale (float; optional) - How closely the generation adheres to the input ‘image’. Must be a positive number no greater than 10.0.
fps (int; optional) - How fast the generated frames should play back.
steps (int; optional) - Number of diffusion steps to run. Must be greater than 0 and less than or equal to 50.
motion_scale (float; optional) - A number between 0 and 1 indicating how much motion should be in the generated animation.
noise_aug_strength (float; optional) - How much noise to add to the initial image. Higher values encourage creativity.
num_videos (int; optional) - Number of output videos/animations to generate from a single image and configuration. You can generate up to 16 videos in a single API request. All videos are generated in sequence with the same configuration but different seed values.
seed (int; optional) - Integer or list of integers used to seed the random generators. Fixing the random seed is useful when attempting to reproduce a specific video (or set of videos).
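The documented limits can be checked on the client before a request is sent. The sketch below is one possible way to do that, not the service's own validation logic. The function name validate_svd_params is hypothetical, and two details are assumptions: the 0 to 1 range for motion_scale is treated as inclusive, and num_videos is required to be at least 1.

```python
# Supported (width, height) pairs, as listed in the Parameters section.
SUPPORTED_RESOLUTIONS = {(576, 1024), (1024, 576), (768, 768)}


def validate_svd_params(params: dict) -> list[str]:
    """Return a list of problems found in a request payload (empty if none).

    Hypothetical client-side check; the server may apply different rules.
    """
    errors = []
    if "image" not in params:
        errors.append("image is required")

    width, height = params.get("width"), params.get("height")
    if width is not None and height is not None:
        if (width, height) not in SUPPORTED_RESOLUTIONS:
            errors.append(f"unsupported resolution: ({width}, {height})")

    cfg_scale = params.get("cfg_scale")
    if cfg_scale is not None and not (0 < cfg_scale <= 10.0):
        errors.append("cfg_scale must be positive and no greater than 10.0")

    steps = params.get("steps")
    if steps is not None and not (0 < steps <= 50):
        errors.append("steps must be greater than 0 and at most 50")

    motion_scale = params.get("motion_scale")
    # Assumption: the bounds 0 and 1 are inclusive.
    if motion_scale is not None and not (0 <= motion_scale <= 1):
        errors.append("motion_scale must be between 0 and 1")

    num_videos = params.get("num_videos")
    # Assumption: at least one video must be requested.
    if num_videos is not None and not (1 <= num_videos <= 16):
        errors.append("num_videos must be between 1 and 16")

    return errors
```

For example, validate_svd_params({"image": "<BASE_64_STRING>", "steps": 40, "cfg_scale": 3}) returns an empty list, while a payload with steps set to 51 returns one error message.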

Response

videos (list) - List of videos generated by the request.
prediction_time_ms (float) - Total runtime of the video generation, in milliseconds.
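A parsed response can be turned into video files along these lines. This is a minimal sketch: the helper name save_videos and the output file naming are hypothetical, and the per-item video field is taken from the curl example on this page, which reads .videos[0].video and base64-decodes it.

```python
import base64
from pathlib import Path


def save_videos(response: dict, out_dir: str = ".", prefix: str = "result") -> list[Path]:
    """Decode every base64 video in a parsed JSON response and write it to disk.

    save_videos is a hypothetical helper; files are named <prefix>_<index>.mp4.
    """
    paths = []
    for index, item in enumerate(response.get("videos", [])):
        path = Path(out_dir) / f"{prefix}_{index}.mp4"
        # Each list entry carries the encoded video under the "video" key.
        path.write_bytes(base64.b64decode(item["video"]))
        paths.append(path)
    return paths
```

When num_videos is greater than 1, this writes one file per entry in videos, in the order the entries appear in the response.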

$ curl -X POST "https://image.octoai.run/generate/svd" \
>     -H "Content-Type: application/json" \
>     -H "Authorization: Bearer $OCTOAI_TOKEN" \
>     --data-raw '{
>         "image": "<BASE_64_STRING>",
>         "steps": 40,
>         "cfg_scale": 3,
>         "fps": 4,
>         "motion_scale": 0.2,
>         "noise_aug_strength": 0.55,
>         "num_videos": 1,
>         "seed": 2138732363
>     }' | jq -r ".videos[0].video" | base64 -d >result.mp4
