Built-in Tools with Llama 3.1

In this tutorial you will learn how to use “Built-In Tools” as introduced by the Llama 3.1 family of models.

Introduction

Function calling is a feature that gives LLMs the ability to decide when external code functions should be used to respond to a user query. When the LLM triggers the use of a tool, it sends back a tool message to the application with the name and parameters of the function to be called. The backend application then uses this information to execute the function locally. OctoAI models already support function calling, as described in this documentation page.
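With standard function calling, each tool is described to the model with a full JSON schema for its parameters. As a point of comparison (the get_weather function below is hypothetical and used only for illustration), a typical tool definition looks like this:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical custom function, for illustration only
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "Name of the city"},
                },
                "required": ["city"],
            },
        },
    },
]

As you will see below, built-in tools let you skip most of this boilerplate.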

A New Type of Function

The release of the Llama 3.1 family of models introduced the concept of “Built-in Tools”: the models have enhanced support for a specific set of functions by default, without extra prompting or fine-tuning. To support these, the models were trained with a set of special tags. Using Llama built-in tools is easy with OctoAI. They are supported through our standard tool API, so you don’t need to worry about any low-level implementation details.

Let’s take a look at how to use them in the next sections. Each section contains snippets of code that you can copy and test in your environment.

Built-In Tools

These are the built-in tools available in the Llama 3.1 models, along with their respective tool names in code:

  • Brave Search: brave_search
    • Used to perform web searches.
  • Wolfram Alpha: wolfram_alpha
    • Used to perform complex mathematical calculations.
  • Code Interpreter: code_interpreter
    • Used to evaluate the generated Python code.

Built-in support only means that the models are better trained at triggering the use of these functions. The functions still need to be implemented locally. In the following sections we cover how you can trigger each of the Llama 3.1 Built-in tools.

Brave Search Tool

The Brave Search tool gets triggered by the model when the response benefits from a web search of a given query. We will mock the function so we can get up and running quickly.

Using the Brave Search Tool

The following snippet of code shows how to handle a chat interaction that uses the Brave Search tool:

import os
from openai import OpenAI
import json


# Brave search definition
def brave_search(query: str) -> str:
    return "Search results: The weather in Boston is Sunny, with 70 degrees Fahrenheit and clear skies."


tools = [
    {"type": "function", "function": {"name": "brave_search"}},
]


client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)
model = "meta-llama-3.1-8b-instruct"

messages = [
    {
        "role": "user",
        "content": "what is the current weather like in Boston?",
    },
]

# First LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

# Append the assistant response to messages
assistant_response = completion.choices[0].message
messages.append(
    {
        "role": "assistant",
        "content": "",
        "tool_calls": completion.choices[0].message.tool_calls,
    }
)

# Handle function call from tool message
tool_call = completion.choices[0].message.tool_calls[0]
function_params = json.loads(tool_call.function.arguments)

# Compute the results (done by the backend application)
function_result = brave_search(**function_params)

# Append to the tools response
messages.append(
    {"role": "tool", "content": function_result, "tool_call_id": tool_call.id}
)

# Second LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    tools=tools,
    tool_choice="auto",
    max_tokens=512,
)

print(completion.choices[0].message.content)

As you can see, we don’t have to specify the parameters of the function, because this function has built-in support. This also means that custom functions cannot use the brave_search identifier.

You can expect a final response similar to this:

The current weather in Boston is sunny, with a temperature of 70 degrees Fahrenheit and clear skies.
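If you want to go beyond the mock, here is a minimal sketch of a real brave_search implementation. It assumes you have a Brave Search API subscription token in a BRAVE_API_KEY environment variable; the endpoint, header, and response fields are taken from Brave's public API documentation, so verify them against the current docs before relying on this:

import os
import requests


def brave_search(query: str) -> str:
    # Endpoint, header, and response fields are assumptions based on Brave's
    # public Search API docs; double-check them against the current documentation.
    response = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"]},
        params={"q": query},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("web", {}).get("results", [])
    # Return a compact text summary the model can ground its final answer on
    return "Search results: " + " | ".join(
        f"{r.get('title', '')}: {r.get('description', '')}" for r in results[:3]
    )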

Wolfram Alpha Tool

The Wolfram Alpha tool gets triggered by the model when the response benefits from querying the Wolfram Alpha API. Let’s mock the function so we can get up and running quickly.

Using the Wolfram Alpha Tool

The following snippet of code shows how to handle a chat interaction that uses the Wolfram Alpha tool:

import os
from openai import OpenAI
import json


# Wolfram Alpha definition
def wolfram_alpha(query: str) -> str:
    """
    Returns a representative response from Wolfram Alpha API
    """
    return '{"plaintext": "x = -1"}'


tools = [
    {"type": "function", "function": {"name": "wolfram_alpha"}},
]

client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)
model = "meta-llama-3.1-8b-instruct"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": "what is the solution to the equation x^2 + 2x + 1 = 0?",
    },
]

# First LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

# Append the assistant response to messages
assistant_response = completion.choices[0].message
messages.append(
    {
        "role": "assistant",
        "content": "",
        "tool_calls": completion.choices[0].message.tool_calls,
    }
)

# Handle function call from tool message
tool_call = completion.choices[0].message.tool_calls[0]
function_params = json.loads(tool_call.function.arguments)

# Compute the results (done by the backend application)
function_result = wolfram_alpha(**function_params)

# Append to the tools response
messages.append(
    {"role": "tool", "content": function_result, "tool_call_id": tool_call.id}
)

# Second LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

print(completion.choices[0].message.content)

As with the Brave Search tool, we don’t specify the parameters, and custom functions cannot use the wolfram_alpha identifier.

You can expect a final response similar to this:

The solution to the equation x^2 + 2x + 1 = 0 is x = -1.
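To replace the mock with a real query, one option is Wolfram's Short Answers API. The sketch below assumes an app ID in a WOLFRAM_APP_ID environment variable; the endpoint and parameter names are taken from Wolfram's public documentation, so treat them as assumptions to verify:

import json
import os
import requests


def wolfram_alpha(query: str) -> str:
    # Endpoint and parameter names follow Wolfram's Short Answers API docs;
    # verify them against the current documentation before relying on this.
    response = requests.get(
        "https://api.wolframalpha.com/v1/result",
        params={"appid": os.environ["WOLFRAM_APP_ID"], "i": query},
        timeout=10,
    )
    response.raise_for_status()
    # Mirror the shape of the mocked response so the rest of the flow is unchanged
    return json.dumps({"plaintext": response.text})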

Code Interpreter Tool

The Code Interpreter tool gets triggered by the model when the response requires executing a snippet of Python code generated by the model itself.

Using the Code Interpreter Tool

The following snippet of code shows how to handle a chat interaction that uses the Code Interpreter tool:

import os
from openai import OpenAI
import json


# Code interpreter definition
def code_interpreter(code: str) -> str:
    return "Code executed successfully. Exit code: 0"


tools = [
    {"type": "function", "function": {"name": "code_interpreter"}},
]

client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)
model = "meta-llama-3.1-8b-instruct"

messages = [
    {
        "role": "user",
        "content": "create a sine wave in python",
    },
]

# First LLM inference
completion = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=0.1,
    max_tokens=512,
    tools=tools,
    tool_choice="auto",
)

# Append the assistant response to messages
assistant_response = completion.choices[0].message

# If there are function calls, handle the calls
if assistant_response.tool_calls:
    print("Function call detected")
    # Append the assistant response to messages
    messages.append(
        {
            "role": "assistant",
            "content": "",
            "tool_calls": assistant_response.tool_calls,
        }
    )

    # Get tool call information
    tool_call = assistant_response.tool_calls[0]
    function_name = tool_call.function.name
    function_params = json.loads(tool_call.function.arguments)

    # Print the code created
    print("=================================")
    print("Code to be executed:")
    print(function_params["code"])
    print("=================================")

    # Call the function
    function_result = code_interpreter(**function_params)

    # Append to the tools response
    messages.append(
        {"role": "tool", "content": function_result, "tool_call_id": tool_call.id}
    )

    # Second LLM inference
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.1,
        max_tokens=512,
        tools=tools,
        tool_choice="auto",
    )
    assistant_response = completion.choices[0].message

print(assistant_response.content)

As with the Wolfram Alpha tool, you don’t need to define the parameters of the function, and custom functions cannot use the code_interpreter identifier.

From this request you can expect the model to generate the following code:

import numpy as np
import matplotlib.pyplot as plt

# Create an array of x values from 0 to 4π
x = np.linspace(0, 4 * np.pi, 1000)

# Create a sine wave with amplitude 1 and frequency 1
y = np.sin(x)

# Create a plot of the sine wave
plt.plot(x, y)

# Add title and labels
plt.title('Sine Wave')
plt.xlabel('x')
plt.ylabel('sin(x)')

# Display the plot
plt.show()

With our mocked function, you can expect the final response to look like this:

This code creates a sine wave with amplitude 1 and frequency 1, and plots it using matplotlib. The `np.linspace(0, 4 * np.pi, 1000)` function creates an array of 1000 x values from 0 to 4π, and the `np.sin(x)` function calculates the corresponding y values. The `plt.plot(x, y)` function creates the plot, and the `plt.title()`, `plt.xlabel()`, and `plt.ylabel()` functions add a title and labels to the plot. Finally, the `plt.show()` function displays the plot.
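If you want the code_interpreter mock to actually run the generated code, a minimal sketch is to execute it in a separate Python process with a timeout. Note that this offers no isolation by itself; only use it with code you are comfortable executing, or add proper sandboxing on top:

import subprocess
import sys


def code_interpreter(code: str) -> str:
    # Runs the model-generated code in a separate Python process with a timeout.
    # This is NOT a sandbox: add isolation (containers, restricted users,
    # resource limits) before exposing it to untrusted inputs.
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return "Code execution timed out."
    output = result.stdout or result.stderr
    return f"Exit code: {result.returncode}\n{output}"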

Implementation Notes

Care must be taken to handle the possibility of the model calling code_interpreter at any point when any of the other built-in tools are active. This is expected behavior, and your implementation needs to handle this case.
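One way to handle this is to dispatch on the tool name for every tool call the model emits, so a code_interpreter call is handled even when you were only expecting brave_search or wolfram_alpha. The sketch below assumes the mock implementations from the earlier sections are in scope:

# Map the built-in tool names to your local implementations
TOOL_HANDLERS = {
    "brave_search": brave_search,
    "wolfram_alpha": wolfram_alpha,
    "code_interpreter": code_interpreter,
}

for tool_call in (assistant_response.tool_calls or []):
    handler = TOOL_HANDLERS.get(tool_call.function.name)
    if handler is None:
        continue  # unknown tool: skip, or raise depending on your application
    function_params = json.loads(tool_call.function.arguments)
    messages.append(
        {
            "role": "tool",
            "content": handler(**function_params),
            "tool_call_id": tool_call.id,
        }
    )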

You are in charge of the final implementation of these functions, which provides interesting opportunities to create new and innovative experiences. We will also be providing example implementations of these functions for the default cases in our Text-Gen Cookbook repository soon.

Conclusion

In this tutorial we have seen how to use Llama 3.1’s Built-in Tools. You can easily take advantage of them using OctoAI’s convenient API, without having to worry about low-level implementation details or cumbersome tool definitions.

For more examples and reference designs, take a look at our Text-Gen Cookbook repository on GitHub, or browse through our demo pages for more inspiration.