LlamaIndex Integration

Developers building AI apps can now access highly optimized LLM and embedding models on OctoAI.

Introduction

LlamaIndex helps manage the interactions between your language models and private data. If you are building your application with LlamaIndex, you benefit from its vast ecosystem of integrations and from the top LLM and embedding models hosted by OctoAI.

OctoAI’s LLMs and Embeddings classes in LlamaIndex

To get started, review the LlamaIndex documentation.

LlamaIndex has both Python and TypeScript libraries. OctoAI is available in the Python SDK.

Using the LlamaIndex integration class for OctoAI LLMs

To use OctoAI LLM endpoints with LlamaIndex, start with the code below, which uses Llama 3.1 70B Instruct as the LLM.

from os import environ
from llama_index.llms.octoai import OctoAI

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")

octoai = OctoAI(model="meta-llama-3.1-70b-instruct", token=OCTOAI_API_KEY)

# Using complete
response = octoai.complete("Octopi can not play chess because...")
print(response)

print("\n=====================\n")

# Using the chat interface
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system",
        content="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
    ),
    ChatMessage(role="user", content="Write a short blog about Seattle"),
]
response = octoai.chat(messages)
print(response)

Using the LlamaIndex integration class for OctoAI Embeddings

To use OctoAI embedding endpoints with LlamaIndex, you can use the code below to get started. The example uses GTE Large, the default model.

from os import environ
from llama_index.embeddings.octoai import OctoAIEmbedding

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")
embed_model = OctoAIEmbedding(api_key=OCTOAI_API_KEY)

# Single embedding request
embeddings = embed_model.get_text_embedding("Once upon a time in Seattle.")
assert len(embeddings) == 1024
print(embeddings[:10])


# Batch embedding request
texts = [
    "Once upon a time in Seattle.",
    "This is a test.",
    "Hello, world!",
]
embeddings = embed_model.get_text_embedding_batch(texts)
assert len(embeddings) == 3
print(embeddings[0][:10])

Building LlamaIndex Agents with OctoAI endpoints

There are different types of agent classes in LlamaIndex, each implementing a different agentic programming pattern. In the following section we will look at how to build two types of agents with OctoAI endpoints:

  • ReAct Agents
  • OpenAI Agents
Note that to build agents we will use the OpenAILike class instead of the OctoAI class used above.

LlamaIndex ReAct Agents with OctoAI

ReAct agents are based on an execution cycle comprised of three steps: Reason, Act, and Observe, as outlined in the ReAct research paper. You can build ReAct agents in LlamaIndex using the OpenAILike and ReActAgent classes like so:

from os import environ

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike

# Create an LLM object to use for the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    is_function_calling_model=True,
    is_chat_model=True,
    temperature=0.4,
    max_tokens=60000,
)

# Here we define a list of tools available to the ReAct agent
tools = [...]

agent = ReActAgent.from_tools(
    tools,
    llm=llm,
    verbose=True,
    max_iterations=10,
)


ReAct agents are convenient and popular to create. One consideration is that this class in particular does not use the LLM's tools API for its requests, so prompts may need to be adjusted to meet your performance requirements. You can see a fully working example of this pattern in this Jupyter Notebook written by Yujian Tang.

LlamaIndex OpenAI Agents with OctoAI

The OpenAI agent class is a good choice for developing agents based on OSS models. In particular, this class makes use of the tools API of our LLM endpoints, which provides better behavior than a prompt-only approach. You can build OpenAI agents in LlamaIndex using the OpenAILike and OpenAIAgent classes like so:

from os import environ

from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    is_function_calling_model=True,
    is_chat_model=True,
    temperature=0.4,
    max_tokens=60000,
)

# We have pre-defined a set of built-in tools for this example
agent = OpenAIAgent.from_tools(
    [
        brave_tool,
        code_tool,
    ],
    llm=llm,
    system_prompt="You are a helpful AI assistant that can answer questions and run code. Answer questions based on the information returned by the tools, even when they are wrong. You will provide to the user the answers given by the tools.",
    verbose=True,
)


OpenAI agents are the preferred way to create LlamaIndex agents using OctoAI LLM endpoints. This guarantees that your requests benefit from the enhancements and adaptations distributed through our API. For a fully functioning script of the above example, take a look at our Text-Gen Cookbook Recipe.

The details matter

If you take a closer look at the constructors of the agent classes in the snippets above, you will notice that we are using quite a few parameters:

context_window=10000,
is_function_calling_model=True,
is_chat_model=True,
temperature=0.4,
max_tokens=60000,

Setting these parameters is important to guarantee good behavior. Too low a temperature, or too few output tokens, will severely hinder the performance of the model.

Learn with our shared resources

  • Learn how to use LLMs and Embedding APIs with OctoAI’s documentation.
  • Take a look at other LlamaIndex Cookbook recipes in GitHub, here.
  • Learn how to build agents, indexes, parse documents, and more, at the LlamaIndex resources here.