OctoAI is on a mission to offer easy access to efficient compute and enable users to integrate their choice of AI models into applications. OctoAI helps you run, tune, and scale AI applications easily:


We give you pre-optimized endpoints for popular open-source models that you can use immediately to prototype your app for free, as well as a CLI for deploying endpoints from your own custom models.


We automatically scale your endpoints from 0 hardware replicas to as many as you need. We support customers in scaling to 100,000 monthly active users this year.

New users get $10 worth of free credits for signing up; the credits expire within about one month. That is equivalent to:

  • Over 500,000 words with the largest Llama 2 70B model, and over a million words with the new Mixtral 8x7B model
  • 1,000 SDXL default images
  • 2+ hours of compute on our large tier hardware
  • 9+ hours of compute on our medium tier hardware
  • 27+ hours of compute on our small tier hardware
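
For a rough sense of scale, the figures above can be turned into implied unit costs by dividing the $10 credit by each quantity. This is only back-of-the-envelope arithmetic on the numbers quoted in this list, not a published price sheet; the dictionary keys and the helper name `implied_max_unit_price` are made up for illustration.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the list above.
# These are implied bounds, NOT official prices.
CREDIT_USD = 10.0

# Quantities the free credit is said to cover. A "+" in the text means
# "at least this much", so the implied unit price is an upper bound.
covered = {
    "SDXL default image": 1_000,
    "large tier hour": 2,
    "medium tier hour": 9,
    "small tier hour": 27,
}


def implied_max_unit_price(credit_usd: float, quantity: float) -> float:
    """Return the highest unit price at which the credit still covers `quantity` units."""
    return credit_usd / quantity


implied = {
    name: implied_max_unit_price(CREDIT_USD, qty) for name, qty in covered.items()
}

for name, price in implied.items():
    print(f"{name}: about ${price:.2f} or less")
```

Read this way, the credit works out to roughly one cent per default SDXL image and no more than $5 per hour on the large tier. Actual billing may be metered differently, so treat these as estimates.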

Join our Discord community to learn about the applications other customers are building, get help, or just tell us what you are excited about!

To get started, you can:


  • Try our demo endpoints
  • Install our CLI/SDK
  • Start building endpoints for your custom models!