
Introducing the Llama 3.1 Herd on OctoAI

Jul 23, 2024 · 3 minutes
[Figure: Quality evaluation chart for Llama 3.1, Claude 3.5 Sonnet, and GPT-4o, showing Llama 3.1 matching or exceeding the quality of these closed models]

OctoAI is excited to offer the full Llama 3.1 herd to our customers, featuring models with 8 billion, 70 billion, and a groundbreaking 405 billion parameters. These latest models include a host of new developer features to increase quality and ease-of-use, including native tool-calling functionality, multi-language support, and up to a 128k token context window. 

Unlocking New Use Cases with Llama 3.1 Features

In the 96 days since the release of Llama 3, hundreds of OctoAI customers like Hyperwrite have deployed it at scale in production, and we fully expect the same with this latest batch of models. The release of Llama 3.1 opens up new capabilities that help developers build smarter, faster, more accurate applications with GenAI. Let’s take a look at some of the most exciting updates:

Context Window 

Longer context windows allow LLMs to ingest much more information at once when producing an answer. With the context window growing from 8k tokens in Llama 3 to up to 128k in Llama 3.1, OctoAI customers can now pass 16x more information into the model, opening new possibilities for Retrieval Augmented Generation (RAG), document summarization, business intelligence reporting, and other enterprise applications.
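As a rough illustration of what the larger window buys you, here is a sketch of packing retrieved documents into a fixed token budget, as a RAG pipeline might. The 4-characters-per-token heuristic and the helper names are illustrative assumptions, not part of Llama or OctoAI; a real application would count tokens with the model's tokenizer.

```python
# Sketch: greedily pack whole documents into a model's context budget.
# The ~4 characters-per-token heuristic is a rough estimate for English
# text; use the model's actual tokenizer in production.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def pack_context(documents: list[str], budget_tokens: int) -> list[str]:
    """Include whole documents in order until the token budget is spent."""
    packed, used = [], 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget_tokens:
            break
        packed.append(doc)
        used += cost
    return packed

docs = ["a" * 4000, "b" * 4000, "c" * 4000]  # ~1,000 tokens each

# An 8k budget fits all three documents; a tighter budget drops some.
# A 128k window fits roughly 16x what an 8k window does.
print(len(pack_context(docs, 8_000)))  # 3
print(len(pack_context(docs, 2_500)))  # 2
```

The same logic scales directly: with a 128k-token budget, the loop above can pack on the order of a hundred documents of this size instead of a handful.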

Multi-Language Support 

The more languages an LLM supports, the more reach it has for business applications around the world. Last year, Mixtral broke new ground by supporting five languages. Llama 3.1 supports eight, a new high-water mark for state-of-the-art open source LLMs, meaning these models can serve many more markets and global organizations.

Enhanced Tool Calling Support

The big theme in GenAI right now is AI agents, and the technology that fundamentally powers these agents is an LLM’s ability to perform tool calling. If an LLM is a brain, tool calling gives that brain arms and legs to act in the real world: performing a web search, retrieving information from a database, sending an SMS, booking a flight, or submitting a code review on GitHub. With their enhanced support for tool calls, the Llama 3.1 models enable powerful agentic workflows across your enterprise needs.

The Llama 3.1 models also bring built-in support for a suite of tools we anticipate our customers will want to take advantage of immediately:

  • Brave Search: Conduct web searches seamlessly.

  • Wolfram Alpha: Execute complex mathematical calculations with ease.

  • Code Interpreter: Generate Python code directly.
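To make the mechanics concrete, here is a sketch of a tool-call round trip using the widely adopted OpenAI-style `tools` schema. The `get_weather` function, the model id, and the simulated response below are illustrative assumptions, not OctoAI specifics; in a real call the model itself emits the structured tool call.

```python
import json

# Illustrative tool definition in the OpenAI-style "tools" schema.
# The get_weather function and model id are hypothetical examples.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "meta-llama-3.1-70b-instruct",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "What's the weather in Seattle?"}
    ],
    "tools": tools,
}

# Instead of prose, the model answers with a structured tool call. Your
# application executes it and feeds the result back as a "tool" message.
simulated_tool_call = {
    "name": "get_weather",
    "arguments": json.dumps({"city": "Seattle"}),
}
args = json.loads(simulated_tool_call["arguments"])
print(args["city"])  # Seattle
```

The key design point is that the model only *decides* which tool to invoke and with what arguments; executing the tool and returning its output to the model remains the application's job.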

The Most Capable Open-Source LLM: Meta Llama 3.1 405B

The Llama 3.1 herd includes the largest, most capable open-source LLM available: Llama 3.1 405B. Quality benchmarks indicate that 405B is highly competitive with leading closed-source models, including GPT-4o and Claude 3.5 Sonnet, across a variety of task types, far outpacing GPT-4 in tasks like code generation, math, and function calling.

[Figure: Quality evaluation charts for the Llama 3.1 models, with performance that meets or exceeds industry benchmarks]

When Meta released Llama 2 in July 2023, there were just north of 16,000 open source LLMs available to developers. In the intervening 12 months, this number has skyrocketed 10-fold, and the state-of-the-art has taken a massive leap forward. As each successive model climbs the quality leaderboards, more companies choose to transition from proprietary model APIs to open source for greater privacy, cost savings, and model transparency. 

There are still a stubborn few use cases where open source quality hasn’t met the bar. Based on these promising benchmarks, Llama 3.1 405B may well clear the final hurdle, eliminating the tradeoff between privacy and quality that has held back GenAI adoption in industries like healthcare and finance, where security and compliance are paramount.

Deploying Llama 3.1 405B on OctoStack

With 405 billion parameters, the largest Llama 3.1 model is more compute-intensive than any foundation model that’s come before. Serving at that scale requires a highly optimized AI systems infrastructure to run 405B in a performant and cost-efficient manner. Enter OctoStack, an end-to-end serving stack for LLMs that runs in a customer’s VPC. With OctoStack, enterprises can easily self-host a powerful LLM that meets or exceeds GPT-4 quality, without ever transmitting data out of their own environment.
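To sketch what "never transmitting data out" looks like in practice, here is an illustrative request an application might build against a self-hosted endpoint. The base URL, model id, and the assumption of an OpenAI-compatible chat completions route are all hypothetical examples for illustration, not documented OctoStack specifics.

```python
import json

# Hypothetical in-VPC endpoint: the host below is an illustrative
# placeholder, and the OpenAI-compatible route is an assumption.
BASE_URL = "https://octostack.internal.example.com/v1"

payload = {
    "model": "meta-llama-3.1-405b-instruct",  # hypothetical model id
    "messages": [
        {"role": "system", "content": "You are a compliance-aware assistant."},
        {"role": "user", "content": "Summarize this patient intake note."},
    ],
    "max_tokens": 512,
}

# Serialize the request as it would go over the wire. Because BASE_URL
# resolves inside the VPC, the payload never leaves your environment.
body = json.dumps(payload)
print(f"POST {BASE_URL}/chat/completions ({len(body)} bytes)")
```

Because the application speaks a standard chat completions shape, moving from a hosted API to the in-VPC deployment is largely a matter of changing the base URL.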

Enhanced Security with LlamaGuard

Meta is also evolving their Llama Guard model with enhanced prompt customizations and new categories to detect defamation and code interpreter abuse. We look forward to further exploration and experimentation to find out how Llama Guard 3 can enhance GenAI trust and security. 

Get Started with Llama 3.1 on OctoAI Today!

You can start using the entire herd of Llama 3.1 models for free today at octoai.cloud. Get in touch to request a no-cost Proof of Concept (POC) for Llama 3.1 405B on OctoStack.

Stay tuned for more updates and explore the future of AI with OctoAI!