Run GenAI in your environment
OctoStack is a turnkey production GenAI serving stack that delivers highly optimized inference at enterprise scale.
Efficient, reliable, self-contained GenAI deployment
OctoStack allows you to run your choice of models in your environment, including any cloud platform, VPC, or on-premise, ensuring full control over your data. This solution encompasses state-of-the-art model serving technology meticulously optimized at every layer, from data input to GPU code.
![Overview diagram of how OctoStack by OctoAI would work in your infrastructure environment](https://www.datocms-assets.com/45680/1714762039-octostack-diagram-dark-mode-centered.png?auto=format&w=1800)
GenAI serving stack
OctoAI removes the complications of running, managing, and scaling GenAI systems, so you can focus on developing your AI apps and projects.
OctoAI is built upon systems and compilation technologies we launched: XGBoost, Apache TVM, and MLC/MLC-LLM, providing you an enterprise system running in your private environment.
![Diagram overview of OctoAI systems showing APIs, solutions, SOC 2 Type 2 certification, and support for hardware and environments](https://www.datocms-assets.com/45680/1720630638-octoai-overall-diagram-soc2-type2-included.png?auto=format&w=795)
OctoStack delivers on our performance and security-sensitive use case. It lets us easily and efficiently run the customized models we need within the environments we choose and supports the scale our customers require.
![Dali Kaafar portrait](https://www.datocms-assets.com/45680/1711407449-dali-kaafar-apate-ceo.jpeg?auto=format&w=431)
Dali Kaafar
CEO, Apate AI
![Apate AI logo](https://www.datocms-assets.com/45680/1711049682-apate-ai-logo.png?auto=format&w=100)
OctoStack's 10x performance boost
We built optimization technologies into the OctoAI systems stack, and these are available for builders to turn on or off, as needed, for their deployments. Benchmarking shows 4x to 5x improvements in low concurrency use cases, and up to 10x or more for larger scale deployments with tens to hundreds of concurrent users.
![Chart comparing multi-user throughput of vLLM and OctoStack](https://www.datocms-assets.com/45680/1714760904-octostack-multi-user-throughput-compared-to-vllm-chart-4.png?auto=format&w=3132)
See 4x GPU utilization improvements
Maximize the effectiveness of your GPUs when you combine them with OctoStack’s optimized serving layer. Instantly reduce costs and latency compared to proprietary model providers and DIY deployment methods.
Get the most from your data with OctoStack and Snowflake
Generative AI is making it easier for companies to derive more value from their data by using LLMs in their secure environment. Using Retrieval Augmented Generation (RAG) with OctoStack in Snowflake’s Snowpark expands the use of your own datasets, so users can ask questions conversationally and improve business outcomes.
![Diagram showing how RAG can work using OctoStack in your environment to enrich your existing data and how users work with that data](https://www.datocms-assets.com/45680/1720827260-rag-with-octostack-and-your-snowflake-datastores.png?auto=format&w=1425)
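To make the RAG idea concrete, here is a minimal Python sketch of the retrieval and prompt-assembly steps. It is an illustration only and does not use any OctoStack or Snowflake API: the function names (`tokenize`, `retrieve`, `build_prompt`), the word-count similarity, and the sample documents are all hypothetical. A production setup would typically use an embedding model and a vector store, and would send the resulting prompt to a model served in your environment.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question.

    Assumption: simple word overlap stands in for a real embedding search.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample data standing in for rows from your own datastore.
docs = [
    "Q3 revenue grew 12 percent year over year.",
    "The office cafeteria opens at 8 am.",
    "Support tickets dropped after the March release.",
]
prompt = build_prompt("How much did revenue grow in Q3?", docs, k=1)
```

The resulting `prompt` string is what you would pass to the LLM. Because retrieval and generation both run next to your data, the source documents never need to leave your environment.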
Securely and confidently run GenAI at scale
Benefit from OctoAI's expertise in inference optimization while meeting privacy and compliance requirements. Scale enterprise applications with reliable performance.
![Blue speedometer icon](https://www.datocms-assets.com/45680/1697495905-blue-speedometer-icon.png?auto=format&w=90)
Run any model, fast
Select the ideal mix of open-source, custom, and proprietary models while maximizing performance.
In your environment
Run in your virtual private cloud (VPC) in your cloud of choice: AWS, Microsoft Azure, CoreWeave, Google Cloud Platform, Lambda Labs, OCI, Snowflake, and others.
Hardware flexibility
Run models on-premise on your choice of hardware including a broad range of NVIDIA GPUs, AMD, Google TPUs, AWS Inferentia, and more.
Data privacy
Our production-ready stack runs in your environment next to your data to meet your security and privacy needs.
Learn more about OctoStack
Review these resources for an in-depth understanding of how OctoStack can help you run GenAI in your environment sooner, privately and securely.
OctoStack Info Brief
GenAI in your environment
Bring GenAI to your Datastore
Frequently asked questions
Don’t see the answer to your question here? Feel free to reach out so we can help.
Smaller models (7B and 8B) can run on NVIDIA A10G GPUs, while the minimum recommendation for larger models (70B or 8x7B) is two A100 or H100 GPUs. These GPUs tend to provide the best throughput and latency. If you are unsure about your hardware capabilities, please contact us, and our experts will work to understand your requirements and current resources.
Yes, OctoStack is designed to support deployment within customer environments, including environments with no connectivity to the Internet.
OctoStack emits metrics that let you decide when to scale GPUs up and down. These metrics include the number of pending and in-flight requests, along with requests per second.
OctoStack includes an advanced load balancing solution to increase GPU throughput and utilization. Our system is designed for generative AI workloads and can manage both homogeneous and heterogeneous workload profiles.
Yes, OctoStack runs the same OctoAI serving stack as our SaaS API endpoint, and has the same capabilities available.
We have integrations and partnerships with industry-leading services, including LangChain, Unstructured, LlamaIndex, Pinecone, and others. All of these integrations are available in OctoStack.
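The scaling answer above mentions that OctoStack emits the number of pending and in-flight requests along with requests per second. As a rough illustration of how such metrics could drive a scaling decision, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ServingMetrics` class, the `desired_replicas` function, and the per-replica targets are hypothetical and are not part of any OctoStack API. Appropriate targets depend on your model, hardware, and latency goals.

```python
import math
from dataclasses import dataclass


@dataclass
class ServingMetrics:
    """Hypothetical snapshot of the three metrics named in the FAQ."""
    pending_requests: int
    in_flight_requests: int
    requests_per_second: float


def desired_replicas(
    metrics,
    target_load_per_replica=8,     # assumed: queued + running requests one replica handles well
    target_rps_per_replica=5.0,    # assumed: sustained requests per second per replica
    min_replicas=1,
    max_replicas=16,
):
    """Return how many GPU replicas to run for the observed load."""
    load = metrics.pending_requests + metrics.in_flight_requests
    by_load = math.ceil(load / target_load_per_replica)
    by_rate = math.ceil(metrics.requests_per_second / target_rps_per_replica)
    # Take the more demanding signal, then clamp to the allowed range.
    wanted = max(by_load, by_rate)
    return max(min_replicas, min(max_replicas, wanted))


snapshot = ServingMetrics(pending_requests=20, in_flight_requests=12, requests_per_second=3.0)
replicas = desired_replicas(snapshot)
```

In practice you would feed the returned count into whatever autoscaler your platform provides, and smooth the metrics over a time window to avoid scaling up and down too quickly.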
GenAI in your environment with optimized performance
Run your choice of models while controlling your data and using OctoAI's inference optimization.