Built-in Tools with Llama 3.1

In this tutorial you will learn how to use “Built-In Tools” as introduced by the Llama 3.1 family of models.

Introduction

Function calling is a feature that gives LLMs the ability to decide when external code functions should be used to respond to a user query. When the LLM triggers the use of a tool, it sends back a tool message to the application with the name and parameters of the function to be called. The backend application then uses this information to execute the function locally. OctoAI models already support function calling, as described in this documentation page.
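With standard function calling, each tool is described to the model with a full JSON schema for its parameters. As a point of comparison (the get_weather function below is hypothetical and used only for illustration), a typical tool definition looks like this:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical custom function, for illustration only
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "Name of the city"},
                },
                "required": ["city"],
            },
        },
    },
]

As you will see below, built-in tools let you skip most of this boilerplate.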

A New Type of Function

The release of the Llama 3.1 family of models introduced the concept of “Built-in Tools”: the models have enhanced support for a specific set of functions by default, without extra prompting or fine-tuning. To support these, the models were trained with a set of special tags. Using Llama built-in tools is easy with OctoAI. They are supported through our standard tool API, so you don’t need to worry about any low-level implementation details.

Let’s take a look at how to use them in the next sections. Each section contains snippets of code that you can copy and test in your environment.

Built-In Tools

These are the built-in tools available in the Llama 3.1 models, along with their respective tool names in code:

  • Brave Search: brave_search
    • Used to perform web searches.
  • Wolfram Alpha: wolfram_alpha
    • Used to perform complex mathematical calculations.
  • Code Interpreter: code_interpreter
    • Used to evaluate the generated Python code.

Built-in support only means that the models are better trained at triggering the use of these functions. The functions still need to be implemented locally. In the following sections we cover how you can trigger each of the Llama 3.1 Built-in tools.

Brave Search Tool

The Brave Search tool gets triggered by the model when the response benefits from a web search of a given query. We will mock the function so we can get up and running quickly.

Using the Brave Search Tool

The following snippet of code shows how to handle a chat interaction that uses the Brave Search tool:

import os
from openai import OpenAI
import json


# Brave search definition
def brave_search(query: str) -> str:
    return "Search results: The weather in Boston is Sunny, with 70 degrees Fahrenheit and clear skies."


tools = [
    {"type": "function", "function": {"name": "brave_search"}},
]


client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)
model = "meta-llama-3.1-8b-instruct"

messages = [
    {
        "role": "user",
        "content": "what is the current weather like in Boston?",
    },
]

# First LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

# Append the assistant response to messages
assistant_response = completion.choices[0].message
messages.append(
    {
        "role": "assistant",
        "content": "",
        "tool_calls": completion.choices[0].message.tool_calls,
    }
)

# Handle function call from tool message
tool_call = completion.choices[0].message.tool_calls[0]
function_params = json.loads(tool_call.function.arguments)

# Compute the results (done by the backend application)
function_result = brave_search(**function_params)

# Append to the tools response
messages.append(
    {"role": "tool", "content": function_result, "tool_call_id": tool_call.id}
)

# Second LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    tools=tools,
    tool_choice="auto",
    max_tokens=512,
)

print(completion.choices[0].message.content)

As you can see, we don’t have to specify the parameters of the function, because this function has built-in support. This also means that custom functions cannot use the brave_search identifier.

You can expect a final response similar to this:

The current weather in Boston is sunny, with a temperature of 70 degrees Fahrenheit and clear skies.
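If you want to go beyond the mock, here is a minimal sketch of a real brave_search implementation. It assumes you have a Brave Search API subscription token in a BRAVE_API_KEY environment variable; the endpoint, header, and response fields are taken from Brave's public API documentation, so verify them against the current docs before relying on this:

import os
import requests


def brave_search(query: str) -> str:
    # Endpoint, header, and response fields are assumptions based on Brave's
    # public Search API docs; double-check them against the current documentation.
    response = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"]},
        params={"q": query},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("web", {}).get("results", [])
    # Return a compact text summary the model can ground its final answer on
    return "Search results: " + " | ".join(
        f"{r.get('title', '')}: {r.get('description', '')}" for r in results[:3]
    )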

Wolfram Alpha Tool

The Wolfram Alpha tool gets triggered by the model when the response benefits from querying the Wolfram Alpha API. Let’s mock the function so we can get up and running quickly.

Using the Wolfram Alpha Tool

The following snippet of code shows how to handle a chat interaction that uses the Wolfram Alpha tool:

import os
from openai import OpenAI
import json


# Wolfram Alpha definition
def wolfram_alpha(query: str) -> str:
    """
    Returns a representative response from Wolfram Alpha API
    """
    return '{"plaintext": "x = -1"}'


tools = [
    {"type": "function", "function": {"name": "wolfram_alpha"}},
]

client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)
model = "meta-llama-3.1-8b-instruct"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "what is the solution to the equation x^2 + 2x + 1 = 0?",
    },
]

# First LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

# Append the assistant response to messages
assistant_response = completion.choices[0].message
messages.append(
    {
        "role": "assistant",
        "content": "",
        "tool_calls": completion.choices[0].message.tool_calls,
    }
)

# Handle function call from tool message
tool_call = completion.choices[0].message.tool_calls[0]
function_params = json.loads(tool_call.function.arguments)

# Compute the results (done by the backend application)
function_result = wolfram_alpha(**function_params)

# Append to the tools response
messages.append(
    {"role": "tool", "content": function_result, "tool_call_id": tool_call.id}
)

# Second LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

print(completion.choices[0].message.content)

As with the Brave Search tool, we don’t specify the parameters, and custom functions cannot use the wolfram_alpha identifier.

You can expect a final response similar to this:

The solution to the equation x^2 + 2x + 1 = 0 is x = -1.
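To replace the mock with a real query, one option is Wolfram's Short Answers API. The sketch below assumes an app ID in a WOLFRAM_APP_ID environment variable; the endpoint and parameter names are taken from Wolfram's public documentation, so treat them as assumptions to verify:

import json
import os
import requests


def wolfram_alpha(query: str) -> str:
    # Endpoint and parameter names follow Wolfram's Short Answers API docs;
    # verify them against the current documentation before relying on this.
    response = requests.get(
        "https://api.wolframalpha.com/v1/result",
        params={"appid": os.environ["WOLFRAM_APP_ID"], "i": query},
        timeout=10,
    )
    response.raise_for_status()
    # Mirror the shape of the mocked response so the rest of the flow is unchanged
    return json.dumps({"plaintext": response.text})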

Code Interpreter Tool

The Code Interpreter tool gets triggered by the model when the response requires executing a snippet of Python code generated by the model itself.

Using the Code Interpreter Tool

The following snippet of code shows how to handle a chat interaction that uses the Code Interpreter tool:

import os
from openai import OpenAI
import json


# Code interpreter definition
def code_interpreter(code: str) -> str:
    return "Code executed successfully. Exit code: 0"


tools = [
    {"type": "function", "function": {"name": "code_interpreter"}},
]

client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)
model = "meta-llama-3.1-8b-instruct"

messages = [
    {
        "role": "user",
        "content": "create a sine wave in python",
    },
]

# First LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

# Append the assistant response to messages
assistant_response = completion.choices[0].message

# If there are function calls, handle the calls
if assistant_response.tool_calls:
    print("Function call detected")
    # Append the assistant response to messages
    messages.append(
        {
            "role": "assistant",
            "content": "",
            "tool_calls": assistant_response.tool_calls,
        }
    )

    # Get tool call information
    tool_call = assistant_response.tool_calls[0]
    function_name = tool_call.function.name
    function_params = json.loads(tool_call.function.arguments)

    # Print the code created
    print("=================================")
    print("Code to be executed:")
    print(function_params["code"])
    print("=================================")

    # Call the function
    function_result = code_interpreter(**function_params)

    # Append to the tools response
    messages.append(
        {"role": "tool", "content": function_result, "tool_call_id": tool_call.id}
    )

    # Second LLM inference
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.1,
        max_tokens=512,
        tools=tools,
        tool_choice="auto",
    )
    assistant_response = completion.choices[0].message

print(assistant_response.content)

As with the Wolfram Alpha tool, you don’t need to define the parameters of the function, and custom functions cannot use the code_interpreter identifier.

From this request you can expect the model to generate the following code:

import numpy as np
import matplotlib.pyplot as plt

# Create an array of x values from 0 to 4π
x = np.linspace(0, 4 * np.pi, 1000)

# Create a sine wave with amplitude 1 and frequency 1
y = np.sin(x)

# Create a plot of the sine wave
plt.plot(x, y)

# Add title and labels
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')

# Display the plot
plt.show()

With our mocked function, you can expect the final response to look like this:

This code creates a sine wave with amplitude 1 and frequency 1, and plots it using matplotlib. The `np.linspace(0, 4 * np.pi, 1000)` function creates an array of 1000 x values from 0 to 4π, and the `np.sin(x)` function calculates the corresponding y values. The `plt.plot(x, y)` function creates the plot, and the `plt.title()`, `plt.xlabel()`, and `plt.ylabel()` functions add a title and labels to the plot. Finally, the `plt.show()` function displays the plot.
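If you want the code_interpreter mock to actually run the generated code, a minimal sketch is to execute it in a separate Python process with a timeout. Note that this offers no isolation by itself; only use it with code you are comfortable executing, or add proper sandboxing on top:

import subprocess
import sys


def code_interpreter(code: str) -> str:
    # Runs the model-generated code in a separate Python process with a timeout.
    # This is NOT a sandbox: add isolation (containers, restricted users,
    # resource limits) before exposing it to untrusted inputs.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return "Code execution timed out."
    output = result.stdout or result.stderr
    return f"Exit code: {result.returncode}\n{output}"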

Implementation Notes

Care must be taken to handle the possibility of the model calling code_interpreter at any point when any of the other built-in tools are active. This is expected behavior, and your implementation needs to handle this case.
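One way to handle this is to dispatch on the tool name for every tool call the model emits, so a code_interpreter call is handled even when you were only expecting brave_search or wolfram_alpha. The sketch below assumes the mock implementations from the earlier sections are in scope:

# Map the built-in tool names to your local implementations
TOOL_HANDLERS = {
    "brave_search": brave_search,
    "wolfram_alpha": wolfram_alpha,
    "code_interpreter": code_interpreter,
}

for tool_call in (assistant_response.tool_calls or []):
    handler = TOOL_HANDLERS.get(tool_call.function.name)
    if handler is None:
        continue  # unknown tool: skip, or raise depending on your application
    function_params = json.loads(tool_call.function.arguments)
    messages.append(
        {
            "role": "tool",
            "content": handler(**function_params),
            "tool_call_id": tool_call.id,
        }
    )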

You are in charge of the final implementation of these functions, which provides interesting opportunities to create new and innovative experiences. We will also be providing example implementations of these functions for the default cases in our Text-Gen Cookbook repository soon.

Conclusion

In this tutorial we have seen how to use Llama 3.1’s Built-in Tools. You can easily take advantage of them using OctoAI’s convenient API, without having to worry about low-level implementation details or cumbersome tool definitions.

For more examples and reference designs, take a look at our Text-Gen Cookbook repository on GitHub, or browse through our demo pages for more inspiration.