On-demand webinar
Learn from our technical deep dive into using function calling to develop AI agents.
Watch now

Efficient GenAI inference

Build and scale production applications on the latest optimized models and fine tunes

Try APIs Free
Self-hosted Demo

Text Gen API

LLMs for chat, summarization, and structured output

Media Gen API

Diffusion models for stunning images and video

OctoStack

Turnkey GenAI stack in your environment

Innovators Choose OctoAI

“Working with the OctoAI team, we were able to quickly evaluate the new model, validate its performance through our proof of concept phase, and move the model to production. Mixtral on OctoAI serves a majority of the inferences and end player experiences on AI Dungeon today.”

Nick Walton

CEO & Co-Founder, Latitude

Read story

GenAI production stack: SaaS or in your environment

OctoAI is built on systems and compilation technologies we’ve pioneered, such as XGBoost, TVM, and MLC, giving you an enterprise system that runs in our SaaS or in your private environment.

Diagram of OctoAI GenAI systems stack showing OctoAI's solutions, models, and AI serving stack powered by broad hardware

Enterprise-grade inference

SOC 2 Type II certified

The security and privacy of your data are top priorities for OctoAI. We continually invest in security capabilities and practices across our platform and processes.

Learn more
OctoAI is SOC 2 Type II certified as of fall 2023
New Solution

OctoStack from OctoAI: GenAI in your environment

OctoStack is a turnkey GenAI serving stack to run your optimized models in your environment on your GPUs. Lower your total cost of ownership and deploy models with greater agility while ensuring data privacy.

Learn more
Overview diagram of how OctoStack by OctoAI would work in your infrastructure environment

What’s New at OctoAI

Customer & Product Updates

Stable Diffusion 3 (SD3) is now available on OctoAI

Jul 9, 2024
2 minutes

HyperWrite: Elevating User Experience and Business Performance with OctoAI's Cutting-Edge AI Platform

Jun 26, 2024
3 minutes

IP Adapter for Creative Content Production

Jun 25, 2024
5 minutes

Streamline Jira ticket creation with OctoAI’s structured outputs

Jun 19, 2024
8 minutes
Visit the blog

Latest Models

See all models

Demos & Webinars

View all demos & webinars

Your choice of models and fine tunes

Start building in minutes. Gain the freedom to run any model or checkpoint on our efficient API endpoints.

octoai asset create \
  --name checkpoint-panda \
  --upload-from-hf-repo NeuralNovel/Panda-7B-v0.1 \
  --engine text/mistral-7b-instruct \
  --data-type fp16 \
  --format safetensors \
  --type checkpoint \
  --transfer-api sts
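
To swap in a different checkpoint, the command above can be assembled from parameters. The following is a minimal sketch, assuming the flags behave as shown in that command. The helper name `build_asset_create_command` is hypothetical and is not part of the OctoAI CLI or SDK; it only builds and prints the argument list and does not run anything.

```python
import shlex


def build_asset_create_command(name, hf_repo, engine, data_type="fp16",
                               file_format="safetensors",
                               asset_type="checkpoint", transfer_api="sts"):
    """Hypothetical helper: return the argv list for `octoai asset create`.

    Only the flags that appear in the command above are used; the default
    values are copied from that command and are assumptions, not documented
    CLI defaults.
    """
    return [
        "octoai", "asset", "create",
        "--name", name,
        "--upload-from-hf-repo", hf_repo,
        "--engine", engine,
        "--data-type", data_type,
        "--format", file_format,
        "--type", asset_type,
        "--transfer-api", transfer_api,
    ]


# Rebuild the example command with the same values shown above.
argv = build_asset_create_command(
    name="checkpoint-panda",
    hf_repo="NeuralNovel/Panda-7B-v0.1",
    engine="text/mistral-7b-instruct",
)

# shlex.join quotes each argument safely for a POSIX shell.
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would invoke the CLI, provided `octoai` is installed and authenticated on your machine.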