
Pricing & Plans

Get started today on OctoAI and receive $10 of free credit in your account.

Text Gen Solution

The $10 credit is equivalent to over a million words of output with large models such as Llama 3 70B and Mixtral 8x7B.

OctoAI’s unified API endpoint lets you build on your choice of models, including your own fine-tunes.

New
Model Remix Credits

We're giving away up to 150x bonus credits for our brand-new Text Gen Solution on top of our industry-leading cost per token. Requires a qualifying spend or spend commitment.

See detailed pricing

Plans

  • Free Trial: $10 free credit upon sign up. Get started building your project.

  • Pro: $0.15 per 1M tokens for 7B and 8B models; $3 in / $9 out per 1M tokens for 405B models.

  • Enterprise: Contact us. Bring your own checkpoint.

Features

  • GTE Large
  • Bring your fine-tune
  • Fine-tuning
  • Bring your choice of checkpoints
  • Committed use discounts
  • Performance optimization options
  • Contractual SLAs
  • Dedicated Customer Success Manager
  • Option for private deployment

Sign up
Contact us

Frequently asked questions

Don’t see the answer to your question here? Feel free to reach out so we can help.

What are your rate limits for the Text Gen Solution?

The rate limits are as follows:

  • Free Tier = 10 RPM

  • Pro Tier = 240 RPM

  • Enterprise Tier = Contact us

Higher rate limits are available; please reach out if you need an increase.
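If you want to stay under a tier's cap on the client side (for example, the Free Tier's 10 RPM), a small sliding-window throttle is enough. This is an illustrative sketch, not part of any OctoAI SDK; the class name and the injectable clock are our own:

```python
import time
from collections import deque

class RpmThrottle:
    """Client-side sliding-window limiter for a requests-per-minute cap.

    `now` is injectable so the limiter can be tested with a fake clock;
    it defaults to time.monotonic.
    """

    def __init__(self, rpm, now=time.monotonic):
        self.rpm = rpm
        self.now = now
        self.sent = deque()  # timestamps of requests in the last 60 s

    def acquire(self):
        """Return 0.0 and record the request if one is allowed now;
        otherwise return the seconds to wait before retrying."""
        t = self.now()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and t - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(t)
            return 0.0
        return 60.0 - (t - self.sent[0])
```

A caller would sleep for the returned duration and retry; on the Pro Tier you would construct it with `rpm=240` instead.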

What are input and output tokens?

Tokens are the units used to measure input and output text for LLMs; 1,000 tokens is roughly 750 words. Input tokens count the tokens in your prompt (including any context information); output tokens are those generated by the model.
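Because input and output tokens are billed at separate rates, estimating a request's cost is simple arithmetic. A small sketch using the per-1M-token prices from the table above (the function name and example token counts are our own):

```python
def estimate_cost_usd(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Estimate request cost from token counts and per-1M-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# 405B-class pricing from the table above: $3 in / $9 out per 1M tokens.
cost = estimate_cost_usd(2_000, 500, price_in_per_m=3.0, price_out_per_m=9.0)
# 2,000 * $3/1M + 500 * $9/1M = $0.006 + $0.0045 = $0.0105
```

For the 7B and 8B models, which are priced at a flat $0.15 per 1M tokens, pass the same rate for both directions.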

How is RAG implemented?

There are multiple ways customers can build a RAG application on OctoAI:

  • Run your choice of LLMs (like Llama 2 70B, Mixtral 8x7B, Mixtral 8x22B) and embedding models (like gte-large), and use your preferred vector database as the reference data store for your RAG application.

  • Use OctoAI's integrations with popular LLM application development frameworks like LangChain, whose pre-built functions simplify RAG application development.

  • Use OctoAI's integrations with turnkey RAG frameworks like Pinecone Canopy to easily implement RAG with your data.
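The first approach boils down to: embed the query, rank stored document embeddings by similarity, and pass the top matches to the LLM as context. A minimal, standalone sketch of the ranking step in plain Python; in practice the vectors would come from an embedding model such as gte-large served on OctoAI, and a vector database would perform the search at scale:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The retrieved documents' text would then be concatenated into the prompt's context before calling the LLM.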

Is it possible to pre-define a prompt?

All our Text Gen Solution code samples include a system prompt, for example: "role": "system", "content": "You are a helpful assistant." Note that Mistral models do not support system prompts out of the box.
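The messages above follow the OpenAI-style chat format. For models without native system-prompt support, a common workaround (ours, not an official OctoAI API) is to fold the instructions into the first user turn. A small sketch with a hypothetical helper:

```python
def build_messages(user_prompt, system_prompt=None, supports_system_role=True):
    """Build an OpenAI-style chat `messages` list.

    For models that lack native system-prompt support (e.g. some Mistral
    models, per the FAQ above), prepend the instructions to the first
    user turn instead of using a "system" role.
    """
    if system_prompt is None:
        return [{"role": "user", "content": user_prompt}]
    if supports_system_role:
        return [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}]
    # Fallback: merge the system instructions into the user message.
    return [{"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"}]
```

The resulting list goes into the `messages` field of a chat completion request.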

Start building with ease in minutes using OctoAI

We enable users to harness the value from AI innovations to build the next generation of intelligent applications. Sign up and enjoy the freedom to choose your model, infrastructure, and deployment templates.

Sign Up Today
Talk to sales