At OctoAI you only pay for what you use. Upon sign-up you will receive $10 of free credit in your account, which can be used until the end of your first month after sign-up. That is equivalent to:

  • Over 500,000 words with the largest Llama 2 70B model, and over a million words with the new Mixtral 8x7B model
  • 1,000 SDXL default images
  • 2+ hours of compute on our large tier hardware
  • 9+ hours of compute on our medium tier hardware
  • 27+ hours of compute on our small tier hardware

How does billing work?

You can enter a credit card at any time, and your account will automatically be charged to keep your credit replenished. The reload amount can be a minimum of $10 or a maximum set by you. We auto-reload your account when the balance reaches 10% of your reload amount. Your account must have a positive balance to use any OctoAI services.
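The auto-reload rule above is simple to state in code. This is an illustrative sketch of that rule, not OctoAI's implementation; the function name and signature are made up:

```python
def should_reload(balance: float, reload_amount: float) -> bool:
    """Sketch of the auto-reload trigger: top up when the balance
    falls to 10% of the configured reload amount."""
    return balance <= 0.10 * reload_amount

# With a $50 reload amount, a reload triggers at or below $5.00.
```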

When you are close to running out of free credits, we will prompt you to enter your credit card information, which you can do on your account's usage page. If you do not enter a credit card before your free credits run out, your account will be suspended and your endpoints terminated.

Where can I find my billing data?

You can view your plan tier, invoices, and itemized usage for all OctoAI services in Billing & Usage in your account at any time.

What are the rate limits for each solution?

See rate limits for details, and feel free to contact us to discuss higher limits to meet your needs. You will receive an HTTP 429 response code if you reach the limit cap.
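A common way to handle a 429 is to retry with exponential backoff. This is a minimal sketch, not an official client; `send_request` is a stand-in for your actual HTTP call:

```python
import time

def with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a callable that returns an HTTP status code,
    backing off exponentially on 429 (rate-limited) responses."""
    for attempt in range(max_retries):
        status = send_request()
        if status != 429:
            return status
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return 429
```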

Image Gen Solution

Below is a full feature breakdown of the Image Gen Solution tiers.

| Feature | Free | Pro | Enterprise |
| --- | --- | --- | --- |
| SDXL and SD 1.5 text2img, img2img, inpainting, ControlNet | Cost-optimized | Cost-optimized | Option for cost-optimized or latency-optimized |
| Custom assets (checkpoints, LoRAs, inversions, VAEs) | ❌ | ✅ | ✅ |
| Option for SLA guarantees | ❌ | ❌ | ✅ |
| Option for private deployment (at higher price) | ❌ | ❌ | ✅ |
| Dedicated Customer Success Manager | ❌ | ❌ | ✅ |

Pro pricing for Image Gen Solution

Pricing for default image features and configurations is below:

| Feature Type | Steps | Resolution | Sampler | Price |
| --- | --- | --- | --- | --- |
| SDXL | 30 | 1024x1024 | DDIM (and any not listed below as premium) | $0.004/image |
| SDXL with Custom Asset (fine-tuned) | 30 | 1024x1024 | DDIM (and any not listed below as premium) | $0.008/image |
| SDXL fine-tuning | 500 | N/A | N/A | $0.25/tune |
| SD 1.5 with base or custom asset (fine-tuned) | 30 | 512x512 | DDIM (and any not listed below as premium) | $0.0015/image |
| SD 1.5 fine-tuning | 500 | N/A | N/A | $0.10/tune |
| Asset library (storage) | N/A | N/A | N/A | $0.05/GB stored per month, after the first 50 GB |
| Background removal | N/A | N/A | N/A | $0.002/request |

The price for each feature type changes as listed below for non-default configurations:

| Configuration Type | Price Formula |
| --- | --- |
| Image generation steps | Default price * (step_count / 30) |
| SDXL resolutions | Default price * (pixel_count / (1024 * 1024)) |
| SD 1.5 resolutions | Default price * (pixel_count / (512 * 512)) |
| Premium samplers: DPM_2, DPM_2_ANCESTRAL, DPM_PLUS_PLUS_SDE_KARRAS, HEUN, KLMS | Default price * 2 |
| Fine-tuning steps | Default price * (step_count / 500) |
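These multipliers compose: step count, resolution, and sampler choice each scale the default price independently. A quick sketch of the arithmetic (the function and its defaults are illustrative, not part of any API):

```python
PREMIUM_SAMPLERS = {"DPM_2", "DPM_2_ANCESTRAL",
                    "DPM_PLUS_PLUS_SDE_KARRAS", "HEUN", "KLMS"}

def image_price(default_price, steps, width, height,
                default_steps=30, default_side=1024, sampler="DDIM"):
    """Apply the step, resolution, and sampler multipliers to a default price."""
    price = default_price
    price *= steps / default_steps                   # step multiplier
    price *= (width * height) / (default_side ** 2)  # resolution multiplier
    if sampler in PREMIUM_SAMPLERS:
        price *= 2                                   # premium sampler surcharge
    return price

# SDXL at 40 steps, 1024x1024, DDIM: 0.004 * (40/30) ≈ $0.0053
```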

Here are a few examples to illustrate how this works, as well as a calculator (coming soon!) to assist you in applying the formulas to your own use case:

| Feature Type | Steps | Resolution | Sampler | Price |
| --- | --- | --- | --- | --- |
| SDXL | 40 | 1024x1024 | DDIM (default) | $0.0053 |
| SDXL | 40 | 1024x1024 | DPM_2_ANCESTRAL (premium) | $0.0107 |
| SDXL with LCM-LoRA (fine-tuned) | 4 | 1024x1024 | LCM | $0.001 |
| SDXL with Custom Asset (fine-tuned) | 60 | 1024x1024 | DDIM (default) | $0.016 |
| SDXL with Custom Asset (fine-tuned) | 60 | 1024x1024 | DPM_2 (premium) | $0.032 |
| SD 1.5 | 40 | 512x512 | DDIM (default) | $0.002 |
| SD 1.5 | 60 | 1024x1024 | DDIM (default) | $0.003 |
| SD 1.5 | 40 | 1024x1024 | DPM_2 (premium) | $0.009 |
| SDXL fine-tuning | 1000 | N/A | N/A | $0.50 |

Text Gen Solution

We offer simple, competitive token-based pricing for text gen endpoints, with prices varying depending on parameter size and quantization level:

| Model | Input Price | Output Price |
| --- | --- | --- |
| Mixtral-8x7B models | $0.00030 / 1K tokens | $0.00050 / 1K tokens |
| Mistral-7B models | $0.00010 / 1K tokens | $0.00025 / 1K tokens |
| Llama2-70B models | $0.00060 / 1K tokens | $0.00190 / 1K tokens |
| Llama2-13B models | $0.00020 / 1K tokens | $0.00050 / 1K tokens |
| CodeLlama-70B models | $0.00060 / 1K tokens | $0.00190 / 1K tokens |
| CodeLlama-34B models | $0.00050 / 1K tokens | $0.00100 / 1K tokens |
| CodeLlama-13B models | $0.00020 / 1K tokens | $0.00050 / 1K tokens |
| CodeLlama-7B models | $0.00010 / 1K tokens | $0.00025 / 1K tokens |
| gte-large | $0.00005 / 1K tokens | N/A |
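The cost of a request is just the input and output token counts multiplied by their per-1K rates. A quick sketch, with the Mixtral-8x7B rates from the table above plugged in as an example:

```python
def text_gen_cost(input_tokens, output_tokens,
                  input_rate_per_1k, output_rate_per_1k):
    """Cost of one request given per-1K-token input and output rates."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# Mixtral-8x7B: 2,000 input tokens + 500 output tokens
# = 2 * $0.00030 + 0.5 * $0.00050 = $0.00085
cost = text_gen_cost(2000, 500, 0.00030, 0.00050)
```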

If you would like to explore pricing for other models, quantization levels, or specific fine tunes, contact us.

Compute Service

| Feature | Free | Pro | Enterprise |
| --- | --- | --- | --- |
| Deploy endpoint from any container (private or public registry) | ✅ | ✅ | ✅ |
| Example models from community | ✅ | ✅ | ✅ |
| CLI and SDK for containerizing + deploying Python models | ✅ | ✅ | ✅ |
| Max endpoints per account | 2 | 10 | No limit |
| Max replicas per endpoint | 3 | 10 | No limit |
| Auto-acceleration of PyTorch models | ❌ | ❌ | Early access |
| Dedicated Customer Success Manager | ❌ | ❌ | ✅ |

Pro pricing for Compute Service

  1. Large 80: A100 GPU with 80GB memory @ $0.00145 per second (~$5.20 per hour)
  2. Large 40: A100 GPU with 40GB memory @ $0.00114 per second (~$4.10 per hour)
  3. Medium: A10 GPU with 24GB memory @ $0.00032 per second (~$1.15 per hour)
  4. Small: T4 GPU with 16GB memory @ $0.00011 per second (~$0.40 per hour)
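Per-second billing makes costs straightforward to estimate. A sketch with the per-second rates from the list above (the tier keys are illustrative names, not API identifiers):

```python
RATE_PER_SECOND = {
    "large_80": 0.00145,  # A100 GPU, 80GB memory
    "large_40": 0.00114,  # A100 GPU, 40GB memory
    "medium":   0.00032,  # A10 GPU, 24GB memory
    "small":    0.00011,  # T4 GPU, 16GB memory
}

def compute_cost(tier, seconds):
    """Cost of `seconds` of billed compute on a hardware tier."""
    return RATE_PER_SECOND[tier] * seconds

# One hour on Large 80: 0.00145 * 3600 = $5.22 (~$5.20/hour, as listed)
```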

Billing is by the second of compute usage, starting when the endpoint is ready for inferences. The endpoint is considered ready when the health check on your endpoint begins returning 200, or, if there is no health check, when the “Replica is running” log line appears in your events tab.

  • You will be billed for the total inference duration and timeout duration
  • You will not be billed for the duration of cold start
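Putting the two rules above together: billed time covers inference and timeout duration, while cold start is free. A trivial sketch (the parameter names are illustrative):

```python
def billed_seconds(cold_start_s, inference_s, timeout_s):
    """Cold start is not billed; inference and timeout duration are."""
    return inference_s + timeout_s

# 45s cold start, 120s of inference, 10s spent in timeouts
# -> 130 billed seconds; the cold start adds nothing.
```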

Example models in the platform have a pre-set hardware / pricing tier. If you create an endpoint from a custom model, you can choose the tier best suited to your needs.