Prerequisites
This project was inspired by ChatPDF. We have fully built end-to-end examples on GitHub that you can clone and edit.
Environment setup
To run our example app, complete the following steps:
1. Use the Llama 2 demo model.
2. Paste the endpoint URL into a file called .env in the root directory of the project:
ENDPOINT_URL="https://text.octoai.run/v1/chat/completions"
3. Get an OctoAI API token.
4. Paste the OctoAI API token into the same .env file:
OCTOAI_API_TOKEN=<your key here>
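To confirm both variables load before running the apps, a quick sanity check with python-dotenv (the same library the example apps use) might look like the following; the file and variable names match the steps above:

import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for var in ("ENDPOINT_URL", "OCTOAI_API_TOKEN"):
    # Report only whether each variable is set, without printing secrets
    print(f"{var} is {'set' if os.getenv(var) else 'MISSING'}")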
Python wrapper for LangChain
The following is a walkthrough of the code in OctoAI's endpoint wrapper for LangChain.
At a high level, we define a Python wrapper class that helps developers easily use OctoAI's LLM endpoints within a LangChain application. LangChain is a Python library commonly used to build LLM applications. Our class extends the LLM base class from the LangChain library.
First, our class defines some attributes:
endpoint_url: str = ""
"""Endpoint URL to use."""

task: Optional[str] = None
"""Task to call the model with. Should be a task that returns generated_text."""

model_kwargs: Optional[dict] = None
"""Keyword arguments to pass to the model."""

octoai_api_token: Optional[str] = None
"""OctoAI API token, read from the OCTOAI_API_TOKEN environment variable if not set."""
The endpoint_url points to the OctoAI-hosted endpoint for your model, task refers to the model task or function to call, model_kwargs holds any arguments to pass to the model, and octoai_api_token is the API access token.
Next, the class defines a Config class and a root_validator to validate the required environment variables:
class Config:
    """Configuration for this pydantic object."""

    extra = Extra.forbid

@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
    """Validate that the API token and endpoint URL exist in the environment."""
    values["octoai_api_token"] = get_from_dict_or_env(
        values, "octoai_api_token", "OCTOAI_API_TOKEN"
    )
    values["endpoint_url"] = get_from_dict_or_env(
        values, "endpoint_url", "ENDPOINT_URL"
    )
    return values
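Here, get_from_dict_or_env is a LangChain utility that looks the key up in the passed values dict first and falls back to the named environment variable. A small illustration with hypothetical values:

import os

from langchain.utils import get_from_dict_or_env

os.environ["OCTOAI_API_TOKEN"] = "token-from-env"  # hypothetical value

# An explicit value in the dict wins over the environment variable
print(get_from_dict_or_env(
    {"octoai_api_token": "explicit-token"}, "octoai_api_token", "OCTOAI_API_TOKEN"
))  # -> explicit-token

# With no dict entry, the OCTOAI_API_TOKEN environment variable is used
print(get_from_dict_or_env({}, "octoai_api_token", "OCTOAI_API_TOKEN"))  # -> token-from-env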
The _llm_type property returns the model type, which is octoai_cloud_llm:
@property
def _llm_type(self) -> str:
    """Return the type of the language model."""
    return "octoai_cloud_llm"
The _identifying_params property returns the parameters that identify the model, such as the endpoint, task, and arguments:
@property
def _identifying_params(self) -> Mapping[str, Any]:
    """Get the identifying parameters."""
    return {
        "endpoint_url": self.endpoint_url,
        "task": self.task,
        "model_kwargs": self.model_kwargs or {},
    }
Finally, the _call method makes a request to the inference endpoint to generate text:
def _call(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
    """
    Call out to inference endpoint.

    Args:
        prompt: The prompt to pass into the model.
        stop: Optional list of stop words to use when generating.

    Returns:
        The string generated by the model.
    """
    # Prepare the payload
    parameter_payload = {"prompt": prompt, "parameters": self.model_kwargs or {}}

    # Prepare the headers
    headers = {
        "Authorization": f"Bearer {self.octoai_api_token}",
        "Content-Type": "application/json",
    }

    # Send the request
    response = requests.post(
        self.endpoint_url, headers=headers, json=parameter_payload
    )

    # Extract the generated text
    generated_text = response.json()

    # Enforce stop tokens if provided
    text = generated_text["generated_text"]
    if stop is not None:
        text = enforce_stop_tokens(text, stop)

    return text
The method constructs the request payload and headers, sends a POST request to the endpoint, and returns the generated text from the response.
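Putting it together, usage looks roughly like the sketch below. It assumes the class above is exposed as OctoAIEndpoint (as in the apps that follow) and that the two environment variables from the setup section are set; the exact keys accepted in model_kwargs depend on the endpoint you deployed:

from dotenv import load_dotenv
from langchain.llms.octoai_endpoint import OctoAIEndpoint

load_dotenv()  # make ENDPOINT_URL and OCTOAI_API_TOKEN visible to the root_validator

# endpoint_url and octoai_api_token are filled in by validate_environment
llm = OctoAIEndpoint(model_kwargs={"max_tokens": 256})

# LangChain LLM objects are callable on a prompt string, which routes through _call
print(llm("Write a one-line summary of LangChain."))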
Chat app that responds to a user
The following is a code walkthrough for the chatbot app.
First, we import the necessary libraries for logging, timing, environment variables, the OctoAI-hosted LLM endpoint, and LangChain:
import logging
import os
import sys
import time

from dotenv import load_dotenv
from langchain.llms.octoai_endpoint import OctoAIEndpoint
from langchain import PromptTemplate, LLMChain
Next, we set the current directory and load environment variables from a .env file to get credentials for the OctoAI endpoint:
# Get the current file's directory
current_dir = os.path.dirname(os.path.abspath(__file__))

# Change the current working directory
os.chdir(current_dir)

# Load environment variables
load_dotenv()
Then we define a function to handle exiting the program:
def handle_exit():
    """Print a goodbye message and exit the program."""
    print("\nGoodbye!\n")
    sys.exit(1)
Next, we define the main ask() function, which will interactively ask questions to the model:
def ask():
    """Interactively ask questions to the language model."""
    print("Loading...")

    # Load necessary values from environment
    endpoint_url = os.getenv("ENDPOINT_URL")

    # Set up the language model
    llm = OctoAIEndpoint(
        endpoint_url=endpoint_url,
        model_kwargs={
            "model": "llama-2-70b-chat-fp16",
            "messages": [
                {
                    "role": "system",
                    "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
                }
            ],
            "stream": False,
            "max_tokens": 256,
        },
    )
We load the endpoint URL from the environment and instantiate the OctoAI LLM endpoint with our model parameters.
    # Define a prompt template
    template = "{question}"
    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Set up the language model chain
    llm_chain = LLMChain(prompt=prompt, llm=llm)
We define a prompt template with a {question} placeholder, create a PromptTemplate, and construct an LLMChain to generate responses.
    while True:
        # Collect user's prompt
        user_prompt = input("\nPrompt: ")
        if user_prompt.lower() == "exit":
            handle_exit()

        # Generate and print the response
        start_time = time.time()
        response = llm_chain(user_prompt)
        end_time = time.time()
        elapsed_time = end_time - start_time
        response = str(response).lstrip("\n").split("\n")[-1]
        print(f"Response({round(elapsed_time, 1)} sec): {response}")
We then enter a loop that collects user prompts and prints timed responses until the user types "exit".
Finally, we call the ask() function:
if __name__ == "__main__":
    ask()
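The pass-through {question} template is the simplest case; PromptTemplate also supports multiple input variables. A hypothetical variant, shown for illustration and reusing the llm object created in ask():

# A hypothetical two-variable template; both variables must be supplied at run time
qa_template = PromptTemplate(
    template="You are a {persona}. Answer concisely:\n{question}",
    input_variables=["persona", "question"],
)
qa_chain = LLMChain(prompt=qa_template, llm=llm)
result = qa_chain({"persona": "helpful tutor", "question": "What is LangChain?"})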
Q&A on a custom PDF app
Below is a code walkthrough for the app that indexes a PDF document and answers questions about that document.
First, we import the necessary libraries:
import logging
import os
import sys
import time

from dotenv import load_dotenv
from langchain.llms.octoai_endpoint import OctoAIEndpoint as OctoAiCloudLLM
from langchain.embeddings.octoai_embeddings import OctoAIEmbeddings
from langchain.vectorstores import Chroma, FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.text_splitter import CharacterTextSplitter
from PyPDF2 import PdfReader
We import libraries for environment variables, the OctoAI LLM endpoint and embeddings, LangChain's vector stores and question-answering chain, and PDF reading.
Next, we set the current directory and logging level:
# Get the current file's directory
current_dir = os.path.dirname(os.path.abspath(__file__))

# Change the current working directory
os.chdir(current_dir)

# Load environment variables
load_dotenv()

# Set logging level to CRITICAL
logging.basicConfig(level=logging.CRITICAL)
We load environment variables from a .env file to get credentials for the OctoAI endpoint, and we set the logging level to CRITICAL to reduce noise.
Then we define a function to initialize the files directory:
def init():
    """Initialize the files directory."""
    if not os.path.exists(FILES):
        os.mkdir(FILES)
Next, we define a function to handle exiting the program:
def handle_exit():
    """Handle exit gracefully."""
    print("\nGoodbye!\n")
    sys.exit(1)
Next, we define functions to set up the LLM and embeddings, extract text from a PDF, and run an interactive question-answering session:
def setup_langchain_environment():
    """
    Set up the language model and embeddings.
    """
    endpoint_url = os.getenv("ENDPOINT_URL")
    if not endpoint_url:
        raise ValueError("The ENDPOINT_URL environment variable is not set.")

    # Initialize the LLM and embeddings
    llm = OctoAiCloudLLM(
        endpoint_url=endpoint_url,
        model_kwargs={
            "model": "llama-2-70b-chat-fp16",
            "messages": [
                {
                    "role": "system",
                    "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
                }
            ],
            "stream": False,
            "max_tokens": 256,
        },
    )
    embeddings = OctoAIEmbeddings(
        endpoint_url="https://instructor-large-f1kzsig6xes9.octoai.run/predict"
    )
    return llm, embeddings

def extract_text_from_pdf(pdf_path):
    """
    Extract text from the given PDF file.
    """
    pdf_reader = PdfReader(pdf_path)
    return "".join(page.extract_text() or "" for page in pdf_reader.pages)

def interactive_qa_session(file_path):
    """
    Interactively answer user questions about the document.
    """
    print("Loading...")
    raw_text = extract_text_from_pdf(file_path)
    text_splitter = CharacterTextSplitter(
        separator="\n", chunk_size=400, chunk_overlap=100, length_function=len
    )
    texts = text_splitter.split_text(raw_text)

    llm, embeddings = setup_langchain_environment()
    print("Creating embeddings")
    document_search = FAISS.from_texts(texts, embeddings)
    chain = load_qa_chain(llm, chain_type="stuff")

    clear_screen()  # small helper defined elsewhere in the app
    print("Ready! Ask anything about the document.")
    print("\nPress Ctrl+C to exit.")

    try:
        # Flush any pending input so the first prompt starts clean
        from termios import tcflush, TCIFLUSH

        tcflush(sys.stdin, TCIFLUSH)
        while True:
            prompt = input("\nPrompt: ").strip()
            if not prompt:
                continue
            if prompt.lower() == "exit":
                handle_exit()

            start_time = time.time()
            docs = document_search.similarity_search(prompt)
            response = chain.run(input_documents=docs, question=prompt)
            elapsed_time = time.time() - start_time
            print(f"Response ({round(elapsed_time, 1)} sec): {response}\n")
    except KeyboardInterrupt:
        handle_exit()
We extract text from the selected PDF, split it into overlapping chunks, instantiate the OctoAI-hosted LLM and embeddings, build a FAISS index over the chunks, and use a question-answering chain that retrieves the most similar chunks and generates an answer.
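To inspect what retrieval alone returns before the chain generates an answer, you can query the FAISS index directly with similarity_search, the standard LangChain vector store method:

# A retrieval-only sketch, assuming document_search was built as above
docs = document_search.similarity_search("What is the main topic?", k=4)
for doc in docs:
    # Each hit is a LangChain Document; page_content holds the chunk text
    print(doc.page_content[:80], "...")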
The select_file() function prompts the user to select a PDF file to process:
def select_file():
    """Select a file for processing."""
    ...
    file_path = os.path.abspath(os.path.join(FILES, files[selection - 1]))
    return file_path
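The body of select_file() is elided above; one possible implementation, shown purely as a sketch (it assumes FILES is the directory created by init() and that it holds the PDFs to index), could be:

def select_file():
    """Select a file for processing (illustrative sketch, not the repo's exact code)."""
    files = sorted(f for f in os.listdir(FILES) if f.lower().endswith(".pdf"))
    if not files:
        return None
    for i, name in enumerate(files, start=1):
        print(f"{i}. {name}")
    selection = int(input("Select a file number: "))
    file_path = os.path.abspath(os.path.join(FILES, files[selection - 1]))
    return file_path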
Finally, we call the initialization function, prompt the user to select a file, and if a file is selected, start the interactive query session:
if __name__ == "__main__":
    # Initialize the file directory
    init()

    # Prompt user to select a file
    file = select_file()
    if file:
        # Start the interactive query session
        interactive_qa_session(file)
    else:
        print("No files found")
        handle_exit()
Share what you build
We are excited to see what you build. Feel free to showcase it in our Discord, and see what other community members are building.