FAQs

Rate limits

Rate limits are restrictions on the rate at which an individual account can submit inference requests.

Rate limits are restrictions applied by OctoAI on the rate at which an individual account can submit inference requests against an API endpoint. They help keep platform performance predictable, so that all OctoAI customers experience consistent inference latencies. Inference requests that are rejected because of a rate limit return an HTTP 429 response code and can be retried after an appropriate backoff period.
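One common way to retry after an HTTP 429 is exponential backoff with jitter. The sketch below is illustrative only and is not an official OctoAI client: `call_with_backoff` and its parameters are hypothetical names, `send` stands in for whatever function actually issues your inference request and returns a `(status_code, body)` pair, and the delay values are assumptions rather than recommended settings.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical zero-argument callable that performs one
    inference request and returns (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            # Not rate limited: hand the result back to the caller.
            return status, body
        if attempt == max_retries:
            break
        # Double the wait on each retry, capped at max_delay (assumed values).
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add random jitter so many clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    # Still rate limited after all retries: return the last 429 response.
    return status, body
```

In use, `send` would wrap your real HTTP call, for example a small function that posts the request and returns the response status and body. If the final result is still a 429, the caller can decide whether to fail or to queue the request for later.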

OctoAI API rate limits

API endpoint	Free tier	Pro tier	Enterprise tier
Text Gen	10 requests per minute	240 requests per minute	Contact us
Media Gen	10 requests per minute	60 requests per minute	Contact us

Higher rate limits are available. Please reach out if you need an increase.
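A client can also reduce the chance of hitting a limit by spacing its own requests. At 10 requests per minute, for instance, that works out to one request every 60 / 10 = 6 seconds. The sketch below is a minimal client-side pacer under that assumption; `Pacer` is a hypothetical helper, not part of any OctoAI SDK, and because this page does not describe how the service counts requests within a minute, even spacing may not prevent every 429.

```python
import time


class Pacer:
    """Spaces calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic,
                 sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent, then reserve its slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling `pacer.wait()` before each request keeps a single-threaded client at or below the configured rate. Retrying on HTTP 429 is still worth keeping as a fallback, since other clients on the same account share the limit.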