
OthersideAI achieves 12x reduction in LLM costs over OpenAI, with Mixtral on OctoAI

By Ben Hamm and Matt Shumer

Jan 23, 2024 · 4 minute read
HyperWrite is an AI writing assistant built by OthersideAI that delivers real-time task assistance to customers. Quality, reliability, and cost are critical for retention and growth, and the OthersideAI team wanted to explore adding open source LLMs to the mix to improve these attributes. Working with OctoAI, OthersideAI evaluated and incorporated multiple open source models. Today, OctoAI serves production inferences for HyperWrite’s millions of customers, at 12x lower cost compared to OpenAI, with comparable quality and up to 30% lower latency.

Read on to learn more. You can get started with the OctoAI Text Gen Solution with a free trial today.

OthersideAI's Goal: Tap into the power of open source LLMs

Open source LLMs saw tremendous growth through 2023. From Meta’s first Llama, through Falcon, Llama 2, and CodeLlama, to the Mixtral 8x7B mixture-of-experts model from Mistral, open source LLMs have improved in leaps and bounds. These models give AI innovators the ability to deliver highly customized and differentiated experiences, at lower overall cost and latency compared to proprietary models like GPT-3.5 or GPT-4 from OpenAI.

OthersideAI and HyperWrite sit at the forefront of this space. With expanding features and growing usage, HyperWrite had started hitting OpenAI rate limits with increasing frequency. The real-time nature of the product and millions of active customers make reliability and experience critical. To address these challenges, the team wanted to complement OpenAI usage with open source models, and to explore ways to deliver the desired production experience more cost-effectively.

Evaluations validate the efficacy of fine-tuned open source models

After an initial proof-of-concept deployment and live traffic testing to validate reliability and latency, the OthersideAI team started a series of evaluations comparing different open source models to the baseline GPT-powered experience. Given the subjective nature of LLM output quality, OthersideAI used a direct measure of outcomes: customer feedback, inferred through mechanisms like user upvotes (shown in the figure below) and conversions into paid services. Each new model was introduced and evaluated through A/B testing, with quality quantified through this framework.
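To illustrate the shape of such an evaluation, the sketch below computes an upvote rate per model arm from feedback logs. The arm names, feedback values, and data are all invented for illustration; this is a minimal sketch of upvote-based A/B comparison, not OthersideAI’s actual pipeline.

```python
from collections import Counter

def upvote_rate(feedback):
    """Fraction of rated responses that received an upvote."""
    counts = Counter(feedback)
    rated = counts["up"] + counts["down"]
    return counts["up"] / rated if rated else 0.0

# Hypothetical feedback logs for two arms of an A/B test.
arm_feedback = {
    "baseline-gpt": ["up", "up", "down", "up", "down", "up"],
    "mixtral-8x7b-ft": ["up", "up", "up", "down", "up", "up"],
}

# Compare arms by upvote rate and pick the leader.
rates = {arm: upvote_rate(fb) for arm, fb in arm_feedback.items()}
winner = max(rates, key=rates.get)
print(rates, winner)
```

In production such a comparison would run over far larger samples and typically include a significance test before declaring a winner.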

Custom fine-tuning against use-case-specific needs and data showed a measurable improvement in quality of outcomes, including measurable increases in upvotes for models fine-tuned by OthersideAI. The same pattern held for smaller models: custom fine-tuned variants of smaller models outperformed larger models that were not fine-tuned for the use case. With the launch of Mixtral 8x7B, the evaluations shifted to Mixtral on OctoAI, starting with the Mixtral 8x7B Instruct variant and then moving on to a custom fine-tuned version.

In parallel, the OctoAI team expanded capabilities to improve flexibility and efficiency for such evaluations, including the ability to easily bring and load new fine-tuned checkpoints for a given model, batch processing for higher throughput, and lower latencies through effective multi-GPU utilization for the newer models, all building on OctoAI’s model acceleration and efficiency work.

12x savings over OpenAI and lower latency

With the custom fine-tuned Mixtral 8x7B on OctoAI, OthersideAI has seen LLM costs for the feature drop by about 12x compared to using GPT-4 Turbo on OpenAI, along with latency reductions of up to 30%. The speed and cost improvements, combined with the overall quality, make Mixtral on OctoAI compelling as usage scales. The team believes ongoing fine-tuning will further improve quality and the end-customer experience possible with HyperWrite; work to deploy and validate these improvements is actively in progress.
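To make the savings arithmetic concrete, here is a small sketch that derives a cost-savings factor from per-token prices and a monthly token volume. The prices and volumes below are hypothetical placeholders for illustration only; they are not the actual rates or traffic behind the 12x figure, and real savings depend on the real price sheet and input/output token mix.

```python
# Hypothetical USD prices per 1M tokens -- illustrative only,
# not the actual OpenAI or OctoAI rates.
PRICE_PER_1M = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "mixtral-8x7b": {"input": 0.50, "output": 0.50},
}

def monthly_cost(model, input_tokens, output_tokens):
    """Cost in USD for a month's traffic at the given per-1M-token prices."""
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical workload: 2B input and 500M output tokens per month.
gpt_cost = monthly_cost("gpt-4-turbo", 2_000_000_000, 500_000_000)
mix_cost = monthly_cost("mixtral-8x7b", 2_000_000_000, 500_000_000)
savings_factor = gpt_cost / mix_cost
print(f"{savings_factor:.1f}x cheaper")
```

Because output tokens are often priced higher than input tokens, the realized savings factor shifts with how generation-heavy the workload is.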

The momentum in open source GenAI models is fascinating. In a year’s time, many if not most AI-powered business use-cases will be best served on open source models and fine-tuned versions of these models. The 12x cost savings and comparable quality we are seeing today are just early evidence of that, and we intend to continue investing in this direction in partnership with OctoAI.

Matt Shumer, CEO & Co-Founder, OthersideAI

Looking ahead, this open-source focus will also be key to new products being built by OthersideAI, including the self-operating-computer project — an open-source multi-modal framework to operate a computer using GenAI. OthersideAI and OctoAI will continue to collaborate on model evaluations and model serving improvements, to drive further improvements to the HyperWrite experience and to support future projects like the self-operating-computer.

Try the OctoAI Text Gen Solution today

You can start evaluating LLMs and building on the OctoAI Text Gen Solution with a free trial today. You can also sign up and get started with HyperWrite at no cost with the Starter plan. If you’d like to engage with the broader OctoAI community and teams, please join the OctoAI Discord.

For customers currently using a closed source LLM like GPT-3.5 Turbo, GPT-4, or GPT-4 Turbo from OpenAI, OctoAI has a new promotion to help accelerate adoption of open source LLMs. You can read more in the OctoAI Model Remix Program introduction.

