
Run, tune, and scale generative AI in the cloud

OctoAI delivers production-grade GenAI solutions running on the most efficient compute, empowering builders to launch the next generation of AI applications.


Make GenAI work for you

"Working with OctoAI, we quickly evaluated Mixtral, validated its performance, and moved the model to production. Mixtral on OctoAI serves a majority of the inferences on AI Dungeon."

Nick Walton, CEO & Co-Founder @ Latitude

“Speed is key to the AI art experience we deliver. We've increased our image generation speeds by 5x with OctoAI’s low latency inferences, resulting in more usage and growth for our platform!”

Angus Russell, Founder @ NightCafe

"The LLM landscape is changing almost every day. OctoAI made it easy to evaluate a number of fine-tuned models for our needs, identify the best, and move it to production for our app."

Matt Shumer, CEO & Co-Founder @ Otherside AI

Tap into the AI expertise builders need to succeed

OctoAI emerged from deep expertise in AI systems: hardware enablement, model acceleration, and machine learning compilation and infrastructure. Leave the complexities of scaling ML to us and focus your resources on developing an app that meets the moment.

Security

The only SOC 2 Type II certified, production-grade GenAI platform on the market.

Reliability

Our strong cloud partnerships ensure ample compute capacity, with autoscaling and aggressive SLAs to support your app as usage grows.

Scale

OctoAI scales effortlessly with your app and user base, allowing you to provide the best possible user experience.


Expert Support

Ensure technical and business success by working hand-in-hand with an experienced team of customer engineers and account managers at every step.

Read more about our customers

Text Gen Solution

Otherside AI achieves a 12x reduction in LLM costs over OpenAI with Mixtral on OctoAI

Ben Hamm & Matt Shumer
Jan 23, 2024
Read more

Generate, classify, and summarize text with the utmost control

OctoAI is the fastest and most flexible place to leverage the best open source large language models: Mixtral, Smaug 72B, Mistral, Code Llama, and Llama 2 Chat. Build with the OSS model that delivers best for your users and business, controlling development from end to end.

Learn more
An LLM summarization and question and answer chatbot powered by OctoAI
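As a sketch of how such a summarization chatbot might call the service, here is a minimal Python example using only the standard library. The endpoint, model name, and request fields mirror the Mixtral curl example further down this page; the summarization prompt wrapper and helper names are our own illustrative assumptions, not an official SDK.

```python
import json
import os
import urllib.request

# Endpoint and model taken from the curl example on this page.
ENDPOINT = "https://text.octoai.run/v1/completions"

def build_summarize_request(text, model="mixtral-8x7b-instruct-fp16"):
    """Build the JSON body for a summarization completion (illustrative prompt)."""
    return {
        "model": model,
        "prompt": "Summarize the following in two sentences:\n\n" + text,
        "max_tokens": 250,
        "temperature": 0.1,
        "top_p": 0.9,
    }

def summarize(text, token):
    """POST the request and return the first completion's text."""
    body = json.dumps(build_summarize_request(text)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

if __name__ == "__main__":
    token = os.environ.get("OCTOAI_TOKEN")
    if token:
        print(summarize("OctoAI serves open source LLMs like Mixtral at scale.", token))
```

Because the endpoint follows the familiar completions-style API, swapping in a different model is a one-line change to the `model` field.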

Generate breathtaking imagery in your app

OctoAI’s Image Generation is the most performant and customizable solution for Stable Diffusion and Stable Diffusion XL. Create, store, and orchestrate model assets at scale to deliver highly differentiated end-user experiences.

Learn more
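As a rough sketch of an image generation call, the following Python example builds the same request the Stable Diffusion curl command on this page sends. The endpoint is a placeholder for your own deployment URL, and the response handling is left generic since the response schema depends on the deployment.

```python
import json
import os
import urllib.request

# Placeholder endpoint, as in the curl example on this page;
# replace with your own deployment's URL.
ENDPOINT = "https://your-sd-endpoint.octoai.cloud/predict"

def build_image_request(prompt, negative_prompt="", seed=0):
    """Build the JSON body for a Stable Diffusion generation call."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": 512,
        "height": 512,
        "guidance_scale": 7.5,
        "num_images_per_prompt": 1,
        "num_inference_steps": 30,  # assumed value; tune for quality vs. speed
        "seed": seed,
        "solver": "DPMSolverMultistep",
    }

def generate(prompt, token):
    """POST the request and return the parsed JSON response."""
    body = json.dumps(build_image_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    token = os.environ.get("OCTOAI_TOKEN")
    if token:
        result = generate("an oil painting of an octopus playing chess", token)
        print(json.dumps(result)[:200])
```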

Run your choice of OSS, fine-tuned, or custom models performantly at scale

Save the significant engineering resources otherwise spent building deployment pipelines, and tap into OctoAI's sophisticated ML infrastructure and efficient, scalable compute. Effortlessly bring custom models or models from popular hubs like Hugging Face.

Learn more
Bring your fine-tuned model and use the OctoAI compute service for fast, efficient, and reliable serving

Our ML experts deliver the fastest, cheapest foundational models

The OctoAI team includes recognized leaders in ML systems, ML compilation, and hardware intrinsics who have founded widely adopted open source ML projects including Apache TVM and XGBoost. Our accelerated models are in production at hyperscalers like Microsoft where they process billions of images a month in services like Xbox.

curl -X POST 'https://your-sd-endpoint.octoai.cloud/predict' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {apiKey}' \
--data '{"prompt": "an oil painting of an octopus playing chess", "width": 512, "height": 512, "guidance_scale": 7.5, "num_images_per_prompt": 1, "num_inference_steps": 30, "seed": 0, "negative_prompt": "frog", "solver": "DPMSolverMultistep"}' \
> test_curl.json

SDXL (accelerated): image generation in seconds, over 2x faster than the base model

curl -X POST "https://text.octoai.run/v1/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OCTOAI_TOKEN" \
--data-raw '{"model": "mixtral-8x7b-instruct-fp16", "prompt": "Hello world!", "max_tokens": 250, "presence_penalty": 0, "temperature": 0.1, "top_p": 0.9}'

Mixtral (accelerated): lower price per token than GPT-3.5, and lower latency than GPT-4 Turbo

Read about our work