Inference
Limitations
Asynchronous inference request size is currently limited to 10 MB. Asynchronous inference output data is stored for 24 hours, then automatically deleted.
An inference running longer than 1 minute may occasionally encounter an error. If this happens, re-submit your request or reach out to us for help.
Create inference
Starts an inference at the specified endpoint URL for the inputs you provide. Requests are synchronous by default; you can optionally mark a request as asynchronous. Input parameters are included in the cURL example for each endpoint.
API requests to your endpoints must be authenticated with a token, which you can generate from the Account Settings page. Be sure to include your token in the header of your requests.
Example synchronous cURL request:
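A minimal sketch of a synchronous request; the endpoint URL and JSON payload are placeholders for your endpoint's actual URL and input parameters, and the `Authorization: Bearer` header format is assumed here.

```bash
# Illustrative only: substitute your endpoint URL and input parameters.
curl -X POST "https://your-endpoint.example.com/predict" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -d '{"prompt": "an example input"}'
```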
Asynchronous inference
You can create an asynchronous inference by specifying `X-OctoAI-Async: 1` in the request header.
Example asynchronous cURL request:
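The same sketch as above with the `X-OctoAI-Async: 1` header added; all other details remain placeholders.

```bash
# Identical to the synchronous request, plus the async header.
curl -X POST "https://your-endpoint.example.com/predict" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -H "X-OctoAI-Async: 1" \
  -d '{"prompt": "an example input"}'
```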
You’ll receive a response ID and poll URL where you can poll for the status and results:
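An illustrative response body; the exact field names and URL structure may differ, but the response carries the response ID and the poll URL:

```json
{
  "response_id": "e5f4a9b2-0000-0000-0000-000000000000",
  "poll_url": "https://async.example.com/v1/requests/e5f4a9b2/status"
}
```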
Get inference
Poll for status
Use the `poll_url` to retrieve the status of an inference, which will be one of these values:

- `pending`: the inference is waiting or starting up
- `running`: the inference is in progress
- `completed`: the inference is finished
Example poll cURL request:
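A sketch of polling, assuming the token goes in an `Authorization: Bearer` header as above; the URL is the placeholder `poll_url` from the example response:

```bash
# Poll the URL returned when the asynchronous inference was created.
curl -H "Authorization: Bearer $OCTOAI_TOKEN" \
  "https://async.example.com/v1/requests/e5f4a9b2/status"
```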
Example pending poll response:
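An illustrative pending response, assuming a `status` field that carries one of the three values listed above:

```json
{
  "status": "pending"
}
```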
Get inference data
Once the inference is completed, the provided `response_url` will include the inference data. Asynchronous inference output data is stored for 24 hours, then automatically deleted.
Example completed poll response:
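An illustrative completed response; the `response_url` field is described above, while the surrounding shape is assumed:

```json
{
  "status": "completed",
  "response_url": "https://async.example.com/v1/requests/e5f4a9b2/responses"
}
```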
Example cURL request for completed inference data:
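A sketch of fetching the output, using the placeholder `response_url` from the completed response above:

```bash
# Fetch the inference output; remember it is deleted after 24 hours.
curl -H "Authorization: Bearer $OCTOAI_TOKEN" \
  "https://async.example.com/v1/requests/e5f4a9b2/responses"
```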