Integrations

LlamaIndex Integration

Developers building AI apps can now access highly optimized LLM and embedding models on OctoAI.

Introduction

LlamaIndex helps manage the interactions between your language models and private data. If you are building your application with LlamaIndex, you benefit from its vast ecosystem of integrations as well as the top LLM and embedding models hosted by OctoAI.

Using OctoAI’s LLMs and LlamaIndex

Get started by learning more about LlamaIndex and signing up for a free OctoAI account.

LlamaIndex has both Python and TypeScript libraries; OctoAI is available in the Python SDK.
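Depending on your setup, you may also need to install the OctoAI integration packages, which are typically published as llama-index-llms-octoai and llama-index-embeddings-octoai.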

To use OctoAI LLM endpoints with LlamaIndex, start with the code below, which uses Llama 3 8B as the LLM.

from os import environ
from llama_index.llms.octoai import OctoAI

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")

octoai = OctoAI(model="meta-llama-3-8b-instruct", token=OCTOAI_API_KEY)

# Using complete
response = octoai.complete("Octopi can not play chess because...")
print(response)

print("\n=====================\n")

# Using the chat interface
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system",
        content="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
    ),
    ChatMessage(role="user", content="Write a short blog about Seattle"),
]
response = octoai.chat(messages)
print(response)
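
The OctoAI LLM class follows the standard LlamaIndex LLM interface, so streaming variants should also be available. Below is a minimal sketch, assuming stream_complete is implemented for this integration, that continues from the snippet above and prints tokens as they arrive.

# Streaming completion (assumes stream_complete is available on this integration)
for chunk in octoai.stream_complete("Octopi can not play chess because..."):
    print(chunk.delta, end="", flush=True)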

To use OctoAI embedding endpoints with LlamaIndex, you can use the code below to get started. This example uses GTE Large, the default model.

from os import environ
from llama_index.embeddings.octoai import OctoAIEmbedding

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")
embed_model = OctoAIEmbedding(api_key=OCTOAI_API_KEY)

# Single embedding request
embeddings = embed_model.get_text_embedding("Once upon a time in Seattle.")
assert len(embeddings) == 1024
print(embeddings[:10])


# Batch embedding request
texts = [
    "Once upon a time in Seattle.",
    "This is a test.",
    "Hello, world!",
]
embeddings = embed_model.get_text_embedding_batch(texts)
assert len(embeddings) == 3
print(embeddings[0][:10])

If you are using LlamaIndex, you can easily switch model providers and enjoy models hosted and optimized for scale on OctoAI.
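
For example, the sketch below wires both OctoAI models into LlamaIndex's global Settings and builds a simple query engine; the "data" directory and the query string are placeholders for illustration.

from os import environ

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.llms.octoai import OctoAI

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")

# Route all LLM and embedding calls through OctoAI
Settings.llm = OctoAI(model="meta-llama-3-8b-instruct", token=OCTOAI_API_KEY)
Settings.embed_model = OctoAIEmbedding(api_key=OCTOAI_API_KEY)

# Build an index over local documents ("data" is a placeholder directory) and query it
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about Seattle?"))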