Rate limits

Rate limits are restrictions on the rate at which an individual account can submit inference requests.

OctoAI applies rate limits to restrict how quickly an individual account can submit inference requests against an API endpoint. They keep platform performance predictable, so that all OctoAI customers experience predictable inference latencies. An inference request that is not completed because of a rate limit cap returns an HTTP 429 response code and can be retried after an appropriate backoff period.
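One way to handle a 429 response is to retry with exponential backoff. The following is a minimal sketch, not an OctoAI client: `call_with_backoff`, its parameters, and the delay values are hypothetical, and `send` stands in for whatever function issues your inference request and returns a `(status_code, body)` pair.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    (status_code, body). The delay values are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            # Success or a non-rate-limit error: hand it back to the caller.
            return status, body
        if attempt == max_attempts - 1:
            # Out of attempts: return the last 429 instead of sleeping again.
            break
        # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Add up to 10% random jitter so parallel clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))

    return status, body
```

If the last attempt still returns 429, the function returns that response so the caller can decide whether to queue the request or report the failure.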

OctoAI API rate limits

API endpoint    Free tier                 Pro tier
Text Gen        10 requests per minute    240 requests per minute
Media Gen       10 requests per minute    60 requests per minute
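
To stay under a per-minute limit on the client side, requests can be spaced evenly: 240 requests per minute allows one request every 60 / 240 = 0.25 seconds, and 10 requests per minute allows one every 6 seconds. The sketch below assumes even spacing is acceptable for your workload. This page does not describe how the limit window is measured, so treat this as a conservative approach rather than a mirror of the server's behavior. `MinIntervalThrottle` is a hypothetical helper, not part of any OctoAI SDK.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(
        self,
        requests_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request slot; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        # Reserve the following slot one interval after this one.
        self._next_allowed = now + self.interval
        return waited
```

Call `wait()` immediately before each request, for example `MinIntervalThrottle(240)` for the Pro tier Text Gen limit in the table above.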