Pricing & billing
Only pay for what you use.
At OctoAI, you only pay for what you use. Upon sign-up you'll receive $10 of free credit in your account, and these credits don't expire. That is equivalent to:
- Over a million words of output, even with the largest models such as Llama 3 70B and Mixtral 8x7B.
- 1,000 SDXL default images, and about 66 Stable Video Diffusion animations.
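To make those equivalences concrete, here is a back-of-the-envelope sketch. The per-unit prices below are hypothetical placeholders, not published rates (only the implied ~$0.01 per default SDXL image follows from the 1,000-image figure); check the pricing tables later on this page for actual figures.

```python
# Illustrative only: converts a credit balance into approximate usage.
FREE_CREDIT_USD = 10.00

# Hypothetical rates, assumed for illustration:
PRICE_PER_MILLION_TOKENS = 0.90   # assumed text-gen rate, USD
PRICE_PER_SDXL_IMAGE = 0.01       # implied by 1,000 images per $10

tokens = FREE_CREDIT_USD / PRICE_PER_MILLION_TOKENS * 1_000_000
words = tokens * 0.75             # rough rule of thumb: ~0.75 words per token
images = FREE_CREDIT_USD / PRICE_PER_SDXL_IMAGE

print(f"~{words:,.0f} words of output, or ~{images:,.0f} SDXL default images")
```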
How does billing work?
OctoAI uses post-paid billing: add a credit card and pay for your usage at the end of each month. Any existing credits remain available in your account and are used before any post-paid charges apply.
On the 1st day of each month, we’ll send an invoice so you can see the upcoming charge. On the 7th day of each month, we’ll charge the card on file for the prior billing period. If there’s an issue charging your credit card, you can manually pay via the invoice.
Where can I find my billing data?
You can view your plan tier, invoices, and itemized usage for all OctoAI services in Billing & Usage in your account at any time.
What are the rate limits for each solution?
See rate limits for details. You will receive an HTTP 429 response code if you reach the limit.
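If you do hit a 429, a standard client-side response is to retry with exponential backoff. The sketch below is a generic pattern, not an OctoAI SDK helper; it assumes a `call` that returns a response object with a `status_code` attribute (as the `requests` library does).

```python
import time

def with_rate_limit_retry(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff while it returns HTTP 429.

    Generic sketch: `call` is any zero-argument function returning a
    response-like object with a `status_code` attribute.
    """
    for attempt in range(max_retries):
        resp = call()
        if resp.status_code != 429:
            return resp
        time.sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
    return resp  # still rate-limited after max_retries; caller inspects status
```

In practice, also honor a `Retry-After` header if the server sends one, rather than relying on backoff alone.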
Media Gen Solution
Below is a full feature breakdown of the Media Gen Solution tiers.
Pro pricing for Media Gen Solution
Pricing for default image features and configurations is listed below:
For non-default configurations, the price for each feature type changes as listed below:
Here are a few examples to illustrate how this pricing works, so you can apply it to your own use case:
Text Gen Solution
We offer simple, competitive token-based pricing for text gen endpoints, with prices varying depending on parameter size and quantization level:
*The cost of fine-tuning training is calculated per epoch per million tokens. After fine-tuning, LoRA inference is billed at the same rate as base model inference, with no extra fees. For Llama 3.1 70B model fine-tuning, please contact us for access.
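As a sketch of that formula, the hypothetical helper below computes training cost as epochs × millions of training tokens × the per-model rate. The rate used in the example is an assumed placeholder, not a published price; see the pricing table above for actual per-model rates.

```python
def finetune_cost(train_tokens: int, epochs: int, usd_per_million_tokens: float) -> float:
    """Fine-tuning training cost, charged per epoch per million tokens."""
    return (train_tokens / 1_000_000) * epochs * usd_per_million_tokens

# e.g. a 5M-token dataset trained for 3 epochs at an assumed $4.00
# per million tokens (placeholder rate):
cost = finetune_cost(5_000_000, 3, 4.00)
print(f"${cost:.2f}")
```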
Compute Service
Pro pricing for Compute Service
- Large 80: A100 GPU with 80GB memory @ $0.00145 per second (~$5.20 per hour)
- Large 40: A100 GPU with 40GB memory @ $0.00114 per second (~$4.10 per hour)
- Medium: A10 GPU with 24GB memory @ $0.00032 per second (~$1.15 per hour)
- Small: T4 GPU with 16GB memory @ $0.00011 per second (~$0.40 per hour)
Billing is per second of compute usage, starting when the endpoint is ready for inferences: either when the health check on your endpoint begins returning 200, or, if there is no health check, when you see the "Replica is running" log line in your Events tab.
- You will be billed for the total inference duration and any timeout duration
- You will not be billed for the duration of cold start
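The rules above can be sketched as a small cost calculator. The per-second rates come from the tier list above; `billable_seconds` is a stand-in for inference plus timeout time, with cold start excluded.

```python
# Per-second rates for each Compute Service tier, from the list above.
RATE_PER_SECOND = {
    "large_80": 0.00145,  # A100, 80GB
    "large_40": 0.00114,  # A100, 40GB
    "medium":   0.00032,  # A10, 24GB
    "small":    0.00011,  # T4, 16GB
}

def compute_cost(tier: str, billable_seconds: float) -> float:
    """Cost in USD; billable time = inference + timeout, excluding cold start."""
    return RATE_PER_SECOND[tier] * billable_seconds

# One hour on Large 80 comes to about $5.22, in line with the
# ~$5.20/hour figure quoted above.
one_hour = compute_cost("large_80", 3600)
```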
Example models in the platform have a pre-set hardware and pricing tier. If you create an endpoint from a custom model, you can choose the tier best suited to your needs.