Rate limits
Rate limits are restrictions on the rate and individual account can submit inference requests.
Rate limits are restrictions applied by OctoAI on the rate at which an individual account can submit inference requests against an API endpoint. It is a mechanism used to ensure predictable performance of the platform, and to allow all OctoAI customers to experience predictable inference latencies. Inference requests that are not completed because of a rate limit cap will return an HTTP 429 response code, and can be retried after an appropriate backoff period.