Build a Q&A app using an LLM

Model

Llama 2 Chat

Libraries

LangChain, LlamaIndex

Date Published

Aug 7, 2023

Publisher

Bassem Yacoube

Learn how to build an end-to-end chatbot and custom question-answering app using OctoAI. The app uses OctoAI, LangChain, and LlamaIndex.

Prerequisites

This app was inspired by chatPDF. We have fully built end-to-end examples on GitHub that you can clone and edit.

Environment setup

To run our example app, complete the following steps:

  • Use the Llama 2 demo model

  • Paste the endpoint URL into a file called .env in the root directory of the project:

ENDPOINT_URL="https://text.octoai.run/v1/chat/completions"

  • Get an OctoAI API token

  • Paste the OctoAI API token into the same .env file (the complete file is shown below):

OCTOAI_API_TOKEN=<your key here>
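
Putting both steps together, the .env file at the root of the project ends up with just these two lines (the token value is your own key):

# .env — endpoint URL and API token combined (your token will differ)
ENDPOINT_URL="https://text.octoai.run/v1/chat/completions"
OCTOAI_API_TOKEN=<your key here>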

Python wrapper for LangChain

The following is a walkthrough of the code in OctoAI's endpoint wrapper for LangChain.

At a high level, we define a Python wrapper class to help developers easily use OctoAI’s LLM endpoints within a LangChain application. LangChain is a Python library commonly used to build LLM applications. Our class extends the LLM base class from the LangChain library.
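
For orientation, here is a minimal sketch of the imports and class declaration that the snippets below live inside. The exact import paths depend on the LangChain version used, so treat this as an assumption rather than the library's canonical layout:

from typing import Any, Dict, List, Mapping, Optional

import requests
from pydantic import Extra, root_validator

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env


class OctoAIEndpoint(LLM):
    """LLM wrapper around an OctoAI-hosted endpoint."""
    # Attributes, validators, and _call are shown piece by piece below.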

First, our class defines some attributes:

endpoint_url: str = ""  
"""Endpoint URL to use."""  
task: Optional[str] = None  
"""Task to call the model with. Should be a task that returns generated_text."""  
model_kwargs: Optional[dict] = None  
"""Key word arguments to pass to the model."""  
octoai_api_token: Optional[str] = None

The endpoint_url points to the OctoAI-hosted endpoint for your model. The task refers to the model task/function to call. model_kwargs are any arguments to pass to the model. octoai_api_token is the API access token.

Next, the class defines a Config class and root_validator to validate the required environment variables:

class Config:  
    """Configuration for this pydantic object."""  
    extra = Extra.forbid  

@root_validator()  
def validate_environment(cls, values: Dict) -> Dict:  
    """Validate that api key and python package exists in environment."""  
    values["octoai_api_token"] = get_from_dict_or_env(  
        values, "octoai_api_token", "OCTOAI_API_TOKEN")  
    values["endpoint_url"] = get_from_dict_or_env(  
        values, "endpoint_url", "ENDPOINT_URL")  
    return values

The _llm_type property returns the identifier for this LLM type, which is octoai_cloud_llm:

@property  
def _llm_type(self) -> str:  
    """Return the type of the language model."""  
    return "octoai_cloud_llm"

The _identifying_params property returns the parameters that identify the model, such as the endpoint URL, task, and model arguments:

@property  
def _identifying_params(self) -> Mapping[str, Any]:  
    """Get the identifying parameters."""  
    return {  
        "endpoint_url": self.endpoint_url,  
        "task": self.task,  
        "model_kwargs": self.model_kwargs or {},  
    }

Finally, the _call method makes a request to the inference endpoint to generate text:

def _call(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
    """
    Call out to inference endpoint.

    Args:
        prompt: The prompt to pass into the model.
        stop: Optional list of stop words to use when generating.

    Returns:
        The string generated by the model.
    """
    # Prepare the payload
    parameter_payload = {"prompt": prompt, "parameters": self.model_kwargs or {}}

    # Prepare the headers
    headers = {
        "Authorization": f"Bearer {self.octoai_api_token}",
        "Content-Type": "application/json",
    }

    # Send the request
    response = requests.post(
        self.endpoint_url, headers=headers, json=parameter_payload
    )

    # Extract the generated text
    generated_text = response.json()

    # Enforce stop tokens if provided
    text = generated_text["generated_text"]
    if stop is not None:
        text = enforce_stop_tokens(text, stop)

    return text

The method constructs the request payload and headers, sends a POST request to the endpoint, and returns the generated text from the response.
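
As a quick sketch of using the wrapper on its own, assuming the .env file from the setup steps is present and that these model_kwargs suit the endpoint you are calling:

from dotenv import load_dotenv
from langchain.llms.octoai_endpoint import OctoAIEndpoint

# Load ENDPOINT_URL and OCTOAI_API_TOKEN so the wrapper's
# validator can pick them up from the environment
load_dotenv()

llm = OctoAIEndpoint(
    model_kwargs={"model": "llama-2-70b-chat-fp16", "max_tokens": 128},
)

# LangChain LLMs are callable with a plain prompt string
print(llm("Explain what an inference endpoint is in one sentence."))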

Chat app that responds to a user

The following is a code walkthrough for the chatbot app.

First, we import the necessary libraries for logging, environment variables, the OctoAI-hosted LLM endpoint, and LangChain’s prompt template and chain classes:

import logging
import os
import sys
import time

from dotenv import load_dotenv
from langchain.llms.octoai_endpoint import OctoAIEndpoint
from langchain import PromptTemplate, LLMChain

Next, we set the current directory and load environment variables from a .env file to get credentials for the OctoAI endpoint:

# Get the current file's directory
current_dir = os.path.dirname(os.path.abspath(__file__))

# Change the current working directory
os.chdir(current_dir)  

# Load environment variables
load_dotenv()

Then we define a function to handle exiting the program:

def handle_exit():  
    """Print a goodbye message and exit the program."""  
    print("\nGoodbye!\n")  
    sys.exit(1)

Next, we define the main ask() function which will interactively ask questions to the model:

def ask():  
    """Interactively ask questions to the language model."""  
    print("Loading...")
    
    # Load necessary values from environment
    endpoint_url = os.getenv("ENDPOINT_URL")

    # Set up the language model and predictor
    llm = OctoAIEndpoint(
        endpoint_url=endpoint_url,
        model_kwargs={
            "model": "llama-2-70b-chat-fp16",
            "messages": [
                {
                    "role": "system",
                    "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
                }
            ],
            "stream": False,
            "max_tokens": 256,
        },
    )

We load the endpoint URL from the environment and instantiate the OctoAI LLM endpoint with the model arguments, including a system message and a token limit.


# Define a prompt template
template = "{question}"
prompt = PromptTemplate(template=template, input_variables=["question"])

# Set up the language model chain
llm_chain = LLMChain(prompt=prompt, llm=llm)

We define a prompt template with a {question} placeholder, create a PromptTemplate, and construct an LLMChain to generate responses.
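
As a quick illustration (not part of the app itself), the chain can be invoked directly with a single question once it is constructed:

# Hypothetical one-off call, outside the interactive loop
answer = llm_chain.run("What is an LLM endpoint?")
print(answer)

In the app itself, this call is wrapped in an interactive loop: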


while True:
    # Collect user's prompt
    user_prompt = input("\nPrompt: ")
    if user_prompt.lower() == "exit":
        handle_exit()

    # Generate and print the response
    start_time = time.time()
    response = llm_chain(user_prompt)
    end_time = time.time()
    elapsed_time = end_time - start_time
    response = str(response).lstrip("\n").split("\n")[-1]
    print(f"Response ({round(elapsed_time, 1)} sec): {response}")

We enter a loop that collects user prompts, times each call, and prints the generated response until the user types "exit".

Finally, we call the ask() function:

if __name__ == "__main__":  
    ask()

Q & A on a custom PDF app

Below is a code walkthrough for the app that indexes a PDF document and answers questions about that document.

First, we import the necessary libraries:

import logging
import os
import sys
import time

from dotenv import load_dotenv
from langchain.llms.octoai_endpoint import OctoAIEndpoint as OctoAiCloudLLM
from langchain.embeddings.octoai_embeddings import OctoAIEmbeddings
from langchain.vectorstores import Chroma, FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.text_splitter import CharacterTextSplitter
from PyPDF2 import PdfReader

We import libraries for environment variables, the OctoAI LLM endpoint and embeddings, LangChain’s vector stores, question-answering chain, and text splitter, and PyPDF2 for reading PDF files.

Next, we set the current directory and logging level:

# Get the current file's directory
current_dir = os.path.dirname(os.path.abspath(__file__))

# Change the current working directory
os.chdir(current_dir)   

# Set logging level to CRITICAL
logging.basicConfig(level=logging.CRITICAL)

As before, environment variables are loaded from the .env file to get credentials for the OctoAI endpoint, and the logging level is set to CRITICAL to reduce noise.

Then we define a function to initialize the files directory:

def init():  
    """Initialize the files directory."""  
    if not os.path.exists(FILES):  
        os.mkdir(FILES)

Next, we define a function to handle exiting the program:

def handle_exit():  
    """Handle exit gracefully."""  
    print("\nGoodbye!\n")  
    sys.exit(1)

Next, we define functions to set up the language model and embeddings, extract text from a PDF file, and interactively answer the user's questions about the document:

def setup_langchain_environment():
    """
    Set up the language model and embeddings.
    """
    endpoint_url = os.getenv("ENDPOINT_URL")
    if not endpoint_url:
        raise ValueError("The ENDPOINT_URL environment variable is not set.")

    # Initialize the LLM and Embeddings
    llm = OctoAiCloudLLM(
        endpoint_url=endpoint_url,
        model_kwargs={
            "model": "llama-2-70b-chat-fp16",
            "messages": [
                {
                    "role": "system",
                    "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
                }
            ],
            "stream": False,
            "max_tokens": 256,
        },
    )
    embeddings = OctoAIEmbeddings(
        endpoint_url="https://instructor-large-f1kzsig6xes9.octoai.run/predict"
    )
    return llm, embeddings
  
  
def extract_text_from_pdf(pdf_path):
    """
    Extract text from the given PDF file.
    """
    pdf_reader = PdfReader(pdf_path)
    return "".join(page.extract_text() or "" for page in pdf_reader.pages)

def interactive_qa_session(file_path):
    """
    Interactively answer user questions about the document.
    """
    print("Loading...")
    raw_text = extract_text_from_pdf(file_path)
    text_splitter = CharacterTextSplitter(
        separator="\n", chunk_size=400, chunk_overlap=100, length_function=len
    )
    texts = text_splitter.split_text(raw_text)

    llm, embeddings = setup_langchain_environment()
    print("Creating embeddings")
    document_search = FAISS.from_texts(texts, embeddings)
    chain = load_qa_chain(llm, chain_type="stuff")

    clear_screen()
    print("Ready! Ask anything about the document.")
    print("\nPress Ctrl+C to exit.")

    try:
        from termios import tcflush, TCIFLUSH
        tcflush(sys.stdin, TCIFLUSH)
        while True:
            prompt = input("\nPrompt: ").strip()
            if not prompt:
                continue
            if prompt.lower() == "exit":
                handle_exit()

            start_time = time.time()
            docs = document_search.similarity_search(prompt)
            response = chain.run(input_documents=docs, question=prompt)
            elapsed_time = time.time() - start_time
            print(f"Response ({round(elapsed_time, 1)} sec): {response}\n")
    except KeyboardInterrupt:
        handle_exit()

We extract the text from the selected PDF, split it into overlapping chunks, instantiate the OctoAI-hosted LLM and embeddings, build a FAISS index over the chunks, and use a question-answering chain to answer each prompt from the most similar chunks.
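
If you want to reuse these pieces outside the interactive loop, a minimal sketch of a single programmatic query, using the same functions and a hypothetical PDF path, might look like this:

# Hypothetical one-off query against a local PDF
raw_text = extract_text_from_pdf("files/report.pdf")
texts = CharacterTextSplitter(
    separator="\n", chunk_size=400, chunk_overlap=100, length_function=len
).split_text(raw_text)

llm, embeddings = setup_langchain_environment()
document_search = FAISS.from_texts(texts, embeddings)
chain = load_qa_chain(llm, chain_type="stuff")

question = "What is the main conclusion of the document?"
docs = document_search.similarity_search(question)
print(chain.run(input_documents=docs, question=question))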

The select_file() function prompts the user to select a PDF file to process:

def select_file():  
    """Select a file for processing."""  
    ...  
    file_path = os.path.abspath(os.path.join(FILES, files[selection - 1]))  
    return file_path
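
The body is elided above; a minimal illustrative implementation, assuming the PDFs live in the FILES directory and the user picks one by number, could look like the following sketch (not the repository's exact code):

def select_file():
    """Select a file for processing (illustrative sketch)."""
    # List PDF files in the FILES directory
    files = [f for f in os.listdir(FILES) if f.lower().endswith(".pdf")]
    if not files:
        return None

    # Show a numbered menu and read the user's choice
    for i, name in enumerate(files, start=1):
        print(f"{i}. {name}")
    selection = int(input("Select a file number: "))

    file_path = os.path.abspath(os.path.join(FILES, files[selection - 1]))
    return file_path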

Finally, we call the initialization function, prompt the user to select a file, and if a file is selected, start the interactive query session:

if __name__ == "__main__":
    # Initialize the file directory
    init()

    # Prompt the user to select a file
    file = select_file()
    if file:
        # Start the interactive query session
        interactive_qa_session(file)
    else:
        print("No files found")
        handle_exit()

Share what you build

We are excited to see what you build. Feel free to showcase it in our Discord, and see what other community members are building.