Text Gen Solution

Fast, cost-optimized open source LLM endpoints

Quickly evaluate and scale the latest open source language models through OctoAI's single API. Our deep expertise in model compilation and ML systems means you get low-latency, affordable endpoints that can handle production workloads.

An LLM summarization and question and answer chatbot powered by OctoAI

Read about our customers

Latitude Games lowers costs by 5x and unlocks new game experiences with Mixtral on OctoAI

Deepak Mohan & Nick Walton
Feb 16, 2024

Adopt open source LLMs in three lines of code

Build on your choice of Mixtral, Nous Hermes 2 Mixtral, Mistral, Llama 2-Chat, or Code Llama models on our blazing-fast API endpoints. Scale seamlessly and reliably without sacrificing performance.

Migrate with Ease

OpenAI SDK users can move to OctoAI's OpenAI-compatible API with minimal effort.

Adaptive scalability

Growth-ready for your app

Robust reliability

Ensuring your services will always work properly

Low cost with high performance

Keeping customers and finance departments happy

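from openai import OpenAI
import os

# Point the OpenAI client at OctoAI's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_TOKEN"],
)

completion = client.chat.completions.create(
    # model="gpt-3.5-turbo",
    # Placeholder: the model line is missing from this snippet; substitute a model ID served by OctoAI.
    model="<octoai-model-id>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)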

The LLM landscape is changing almost every day, and we need the flexibility to quickly select and test the latest options. OctoAI made it easy for us to evaluate a number of fine-tuned Llama 2 variants for our needs, identify the best one, and move it to production for our application.

Matt Shumer

CEO & Co-Founder Otherside AI

Fine-tune LLMs on OctoAI

Work with OctoAI’s fine-tuning capabilities to customize your text app for your needs. You can fine-tune easily with our proprietary tools, and trust the safety of your data.

NEW: GTE Large

Text embedding for RAG

Use the GTE Large embedding endpoint for retrieval-augmented generation (RAG) or semantic search in your apps. With a score of 63.13% on the MTEB leaderboard and an OpenAI-compatible API, migrating from OpenAI requires minimal code updates.
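As a rough illustration of the semantic-search step in a RAG flow, the sketch below ranks documents by cosine similarity to a query embedding. It assumes the embedding endpoint can be reached through the OpenAI SDK's `embeddings.create` call at the same base URL used for chat completions; that is an assumption, not something this page confirms. The helper names `embed`, `cosine_similarity`, and `rank_by_similarity` are hypothetical, and the model ID is left for the caller to supply.

```python
import math
import os


def embed(texts, model):
    """Fetch embeddings through the OpenAI SDK (assumed to work against OctoAI).

    The base URL below is an assumption (same as the chat endpoint);
    `model` is whatever ID OctoAI lists for GTE Large.
    """
    from openai import OpenAI  # imported lazily so the ranking code runs offline

    client = OpenAI(
        base_url="https://text.octoai.run/v1",  # assumed: same URL as chat
        api_key=os.environ["OCTOAI_TOKEN"],
    )
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In a RAG setup, the top-ranked documents would typically be pasted into the chat prompt as context; for large corpora a vector database would usually replace the linear scan shown here.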

Build using high quality and low cost Mixtral Instruct

Our accelerated Mixtral delivers quality competitive with GPT-3.5, with the flexibility of open source. Reduce costs with a price per token 4x lower than GPT-3.5. Migration is easy with one unified, OpenAI-compatible API.

Build using more models for your use case

With OctoAI, you can chain several generative models together to build a high-performance pipeline. Create new experiences tailored to your industry using language, image, and audio models, or your own custom models.
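One way to picture chaining models is as a list of stages where each stage's output feeds the next. The sketch below is a generic illustration, not OctoAI's API: `run_pipeline` and the three stand-in stages are hypothetical names, and each stage returns a canned string where a real application would call a model endpoint.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def run_pipeline(stages: List[Stage], initial_input: str) -> str:
    """Feed each stage's output into the next stage, in order."""
    result = initial_input
    for stage in stages:
        result = stage(result)
    return result


# Stand-in stages (hypothetical): each would wrap a real model call in practice.
def transcribe_audio(audio_ref: str) -> str:
    return f"transcript of {audio_ref}"


def summarize_text(text: str) -> str:
    return f"summary({text})"


def build_image_prompt(summary: str) -> str:
    return f"illustration for: {summary}"


if __name__ == "__main__":
    stages = [transcribe_audio, summarize_text, build_image_prompt]
    print(run_pipeline(stages, "call.wav"))
```

Keeping every stage behind the same string-in, string-out signature makes it easy to swap one model for another without touching the rest of the chain.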

