Make sure you’ve followed the steps in LLM application examples to clone our example repo, create your own OctoAI LLM endpoint, and set up your local environment.

Code walkthrough

Below is an explanation of the code in OctoAI’s endpoint wrapper for LangChain.

At a high level, we define a Python wrapper class that lets developers use OctoAI’s LLM endpoints within LangChain, a Python library commonly used to build LLM applications. Our class extends the LLM base class from the LangChain library.

First, our class defines some attributes:

endpoint_url: str = ""  
"""Endpoint URL to use."""  
task: Optional[str] = None  
"""Task to call the model with. Should be a task that returns generated_text."""  
model_kwargs: Optional[dict] = None  
"""Keyword arguments to pass to the model."""  
octoai_api_token: Optional[str] = None

endpoint_url points to the OctoAI-hosted endpoint for your model. task names the model task to call, which should be one that returns generated_text. model_kwargs holds any keyword arguments to pass to the model. octoai_api_token is the API access token.

Next, the class defines a Config class and a root_validator that fills in the API token and endpoint URL, reading them from environment variables when they are not passed in directly:

class Config:
    """Configuration for this pydantic object."""
    extra = Extra.forbid

@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
    """Validate that the API token and endpoint URL exist in the arguments or environment."""
    values["octoai_api_token"] = get_from_dict_or_env(
        values, "octoai_api_token", "OCTOAI_API_TOKEN")
    values["endpoint_url"] = get_from_dict_or_env(
        values, "endpoint_url", "ENDPOINT_URL")
    return values
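
The lookup order used by the validator can be illustrated with a small, self-contained sketch. The function resolve_setting below is a hypothetical stand-in, not LangChain's get_from_dict_or_env itself. It assumes the helper prefers a non-empty value passed to the constructor, then falls back to the named environment variable, and raises an error if neither is set. The token strings are dummy values.

```python
import os
from typing import Any, Dict


def resolve_setting(values: Dict[str, Any], key: str, env_key: str) -> str:
    """Hypothetical stand-in for the lookup done by get_from_dict_or_env."""
    # Prefer a non-empty value passed explicitly (for example, to the constructor).
    if values.get(key):
        return values[key]
    # Otherwise fall back to the environment variable.
    env_value = os.environ.get(env_key)
    if env_value:
        return env_value
    raise ValueError(
        f"Did not find {key}: pass it as a named parameter "
        f"or set the {env_key} environment variable."
    )


# Dummy values for illustration only.
os.environ["OCTOAI_API_TOKEN"] = "token-from-env"

# An explicit argument takes precedence over the environment.
explicit = resolve_setting(
    {"octoai_api_token": "token-from-arg"}, "octoai_api_token", "OCTOAI_API_TOKEN"
)

# With no explicit value, the environment variable is used.
fallback = resolve_setting({}, "octoai_api_token", "OCTOAI_API_TOKEN")
```

Under this assumption, an empty string such as the default endpoint_url = "" counts as "not provided", so the ENDPOINT_URL environment variable would be consulted in that case.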

The _llm_type property returns the model type, which is octoai_cloud_llm:

@property
def _llm_type(self) -> str:
    """Return the type of the language model."""
    return "octoai_cloud_llm"

The _identifying_params property returns the parameters that identify the model: the endpoint URL, the task, and the model arguments:

@property
def _identifying_params(self) -> Mapping[str, Any]:
    """Get the identifying parameters."""
    return {
        "endpoint_url": self.endpoint_url,
        "task": self.task,
        "model_kwargs": self.model_kwargs or {},
    }

Finally, the _call method makes a request to the inference endpoint to generate text:

def _call(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
    """Call out to inference endpoint.

    Args:
        prompt: The prompt to pass into the model.
        stop: Optional list of stop words to use when generating.

    Returns:
        The string generated by the model.
    """
    # Prepare the payload
    parameter_payload = {"prompt": prompt,
                         "parameters": self.model_kwargs or {}}

    # Prepare the headers
    headers = {
        "Authorization": f"Bearer {self.octoai_api_token}",
        "Content-Type": "application/json",
    }

    # Send the request
    response = requests.post(
        self.endpoint_url, headers=headers, json=parameter_payload
    )

    # Extract the generated text
    generated_text = response.json()
    # Enforce stop tokens if provided
    text = generated_text["generated_text"]
    if stop is not None:
        text = enforce_stop_tokens(text, stop)

    return text

The method constructs the request payload and headers, sends a POST request to the endpoint, and returns the generated text from the response. If stop is provided, the stop tokens are enforced on the text before it is returned.
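
To make the request shape and the stop-token step concrete without calling a live endpoint, here is a minimal sketch. build_request and truncate_at_stop are hypothetical helpers written for illustration. truncate_at_stop assumes that enforcing stop tokens means cutting the text at the first occurrence of any stop sequence, which may differ in detail from LangChain's enforce_stop_tokens. The max_length parameter and the token are made-up values.

```python
import re
from typing import Any, Dict, List, Optional, Tuple


def build_request(
    prompt: str, model_kwargs: Optional[Dict[str, Any]], api_token: str
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Build the JSON payload and HTTP headers in the shape _call uses."""
    payload = {"prompt": prompt, "parameters": model_kwargs or {}}
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    return payload, headers


def truncate_at_stop(text: str, stop: List[str]) -> str:
    """Hypothetical stand-in: cut the text at the first stop sequence."""
    if not stop:
        return text
    # Escape each stop sequence so it is matched literally, not as a regex.
    pattern = "|".join(re.escape(token) for token in stop)
    return re.split(pattern, text, maxsplit=1)[0]


# "max_length" and the token are illustrative values, not documented parameters.
payload, headers = build_request("Say hello", {"max_length": 64}, "dummy-token")

# Everything from the first stop sequence onward is dropped.
text = truncate_at_stop("Hello there.\nHuman: next question", ["\nHuman:"])
```

In the wrapper itself, payload and headers would be passed to the POST request against endpoint_url, and the truncation would be applied to the generated_text field of the JSON response.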