The latest version of Meta AI's Llama series, Llama 3.1, represents a significant leap forward in open-source language modeling. With support for eight languages, expanded context length, and its massive 405 billion parameter architecture on its larger model, Llama 3.1 is poised to revolutionize how developers build AI-powered applications (Meta AI Blog). One of the model's most exciting features is its built-in support for tool use and function calling, enabling more sophisticated and dynamic interactions. In this post, we'll explore what this is and how to use these capabilities that are available today in OctoAI endpoints.
Llama 3.1's built-in tools and function calling
Llama 3.1 comes with three built-in tools:
Web search with Brave Search.
Mathematical reasoning with Wolfram Alpha.
Code generation and interpretation with Python.
These tools allow the model to tap into external knowledge and perform complex computations, greatly enhancing its problem-solving abilities. For instance, the search tool can retrieve up-to-date information from the web to answer questions, while the Wolfram Alpha integration enables the model to use the Wolfram Alpha API to solve intricate mathematical problems. Likewise, the model is better prepared to answer questions that require code generation, specifically for Python.
Unlocking new possibilities
The combination of Llama 3.1's built-in tools and function calling, along with OctoAI's deployment and inference capabilities, opens up a vast range of possibilities for building AI-powered applications. Here are a few examples of what you might build:
AI Assistants: Create personalized AI assistants that can answer questions, perform complex tasks, and interact with external services on the user's behalf.
Business Intelligence: Build automated workflows that leverage the model's tool use and function calling capabilities to streamline business processes, like the creation of reports, or the analysis of databases. These models are great at those SQL queries.
Knowledge-Driven Applications: Develop applications that tap into real-time data to provide users with up-to-date and insightful information.
What does built-in tool mean?
Llama built-in tool support means that the model has been fine-tuned to more accurately make use of these functions whenever an interaction could benefit from it. You are still in charge of making a function implementation that runs locally, perhaps on the application’s backend server, that produces valid results for the given query. We have come up with a series of examples, which we will explore below in more detail.
New tools, same interface
You can easily take advantage of Llama 3.1’s built-in tools and function calling capabilities to build advanced AI applications using OctoAI. You may have heard that using built-in tools requires the decoding of special tokens or handling a new chat template. With OctoAI there is no need to worry about this. We have dealt with these low level implementation details for you. Calling a built-in tool is as easy as passing the name of the desired tool as part of the parameters of the request:
Code examples
With this new capability it can be daunting to get started. Let’s take a look at simple examples below that you can use to understand the potential of these functions.
Using Brave Search
Using the Brave Search tool is a powerful way to provide the model with real-time information. You can think of this as Retrieval Augmented Generation (RAG) using the context of the whole internet. For example, the training corpus of the model would not have contained any information about a major IT outage that occurred during the month of July of 2024:
However with the use of web search the model can craft a query to run-pass a search engine’s API, like the Brave Search API. The responses from the API can then be used to formulate the last response:
As a reference, our local implementation of the web search tool looks like this:
Using Wolfram Alpha
Using the Wolfram Alpha tool is a very simple way to enable the model to handle queries that require complex mathematical calculations, like equation resolution or prime number factorization:
We have truncated the output as the response is clearly wrong. Now let’s take a look at a response that benefits from an API call to the Wolfram Alpha API:
As a reference, our local implementation of the Wolfram Alpha tool looks like this:
Using Code Interpreter
Using the Code Interpreter tool is an effective way to avoid the model hallucinating responses where precision is an absolute requirement. Consider this example of a bank assistant, where a user is asking about the total interest that will be paid over their new mortgage:
However we know this calculation is wrong. When activating the use of Code Interpreter the model will generate a snippet of Python code, that will be executed by the application. The output of the code run will then be used by the LLM to produce its last response. This is an example of the code generated by the LLM to solve the problem above:
Now let’s take a look at the same interaction above with Code Interpreter enabled:
As you can see this is quite a difference between paying $116,800 and $249,064.70 of interest!
Code examples and more resources
In this post we have focused on the high level capabilities of this new feature. If you want to replicate this or get started with your own application then we have the following resources to guide you along the way:
There you will find guidelines and reference designs to get you using built-in tools today!
Conclusion
Llama 3.1's built-in tools and function calling capabilities represent a major breakthrough in language modeling, enabling developers to build more powerful and dynamic AI applications. By leveraging these features with OctoAI endpoints you can use the same API you already know and love, and start unlocking today new possibilities for automating complex tasks, providing personalized assistance, and driving business value with AI.
Did we miss anything? Is there something you want to chat to us about? Join the conversation on either of the following platforms:
Reply to our launch post in X
Comment on our launch post in LinkedIn
Give a thumbs up or ask a question in our launch video in YouTube