At a Glance

For a quick glance at the text generation features supported by the TypeScript SDK, please see the client.chat and client.completions SDK docs.

The Client class in our Example Models on TypeScript SDK examples allows you to simply run inferences to any model that accepts JSON-formatted inputs as an object, and provides you with all JSON-formatted outputs as an object or StreamReady for the inferStream method.

The TypeScript SDK supports both the Chat Completions API and the Completions API through the client.chat.completions and client.completions classes respectively. Each interface supports streaming and non-streaming inferences. Streaming can be enabled by simply setting stream to true in your create method calls. Additionally, if you have a custom endpoint that you would like to call that adheres to either the Chat Completions or Completions interface, you can provide it in your create calls. If no custom endpoint is provided, inferences will default to OctoAI’s text gen endpoint. More in-depth examples are shown further on in this guide.

This guide will walk you through the simplest version of using this API and using streaming in the TypeScript SDK. As with the Text Gen API, there are additional parameters you can pass to the create request including frequency_penalty, max_tokens, presence_penalty, stop, temperature, top_p, etc. to have a finer gradient of control over your response as well.

Requirements

  • Please follow How to create an OctoAI API token if you don’t have one already.
  • Please also verify you’ve completed TypeScript SDK Installation & Setup.
    • If you use the OCTOAI_TOKEN envvar for your token, you can instantiate the client with client = new Client(), otherwise you will need to pass an API token using: client = new Client(OCTOAI_TOKEN)

Supported Models

You can get an array of supported models by using the following:

TypeScript
import { Client } from "@octoai/client";

const OCTOAI_TOKEN = process.env.OCTOAI_TOKEN; // Or your token here
const client = new Client(OCTOAI_TOKEN);

const models = client.chat.listAllModels();

These are the values that you can put in the model field in either your chat completion creation requests or completion requests.

Chat Completions API

Non-Streaming Example

The requirements to create a chat completion are to have messages and a model. A simple example looks as follows:

TypeScript
import { Client } from "@octoai/client";

const OCTOAI_TOKEN = process.env.OCTOAI_TOKEN; // Or your token here
const client = new Client(OCTOAI_TOKEN);

const response = await client.chat.completions.create({
  messages: [
    {
      role: "system",
      content:
        "You are a helpful assistant. Keep your responses limited to one short paragraph if possible.",
    },
    {
      role: "user",
      content: "Write `hello world` in TypeScript",
    },
  ],
  model: "codellama-13b-instruct-fp16",
});

console.log(JSON.stringify(response));

This will return a response similar to the following:

JSON
{
  "id": "cmpl-2b32fb0c959d47ab99404059768c056f",
  "object": "chat.completion",
  "created": 1700526187,
  "model": "codellama-13b-instruct-fp16",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure, here's an example of how to write \"Hello, World!\" in TypeScript using the `console.log()` method:\n...",
        "function_call": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 47,
    "completion_tokens": 146,
    "total_tokens": 193
  }
}

The TypeScript SDK returns the JSON mapped to a CreateChatCompletionResponse object, and this type can be used as an interface in your code if desired.

Streaming Example

In the above example, we did not select whether or not to use streaming, and so by default the client will not use streaming. However if you’re working with larger requests and don’t want to store the entire response in memory or want to have a more interactive experience for users, you may opt to use the stream interface instead.

The request is almost identical, with just one minor change: explicitly requesting a stream response.

TypeScript
import { Client } from "@octoai/client";

const OCTOAI_TOKEN = process.env.OCTOAI_TOKEN; // Or your token here
const client = new Client(OCTOAI_TOKEN);

const stream = await client.chat.completions.create({
  messages: [
    {
      role: "system",
      content:
        "You are a helpful assistant. Keep your responses limited to one short paragraph if possible.",
    },
    {
      role: "user",
      content: "Write `hello world` in TypeScript",
    },
  ],
  mode: "codellama-13b-instruct-fp16",
  stream: true,
});

let text = "";

for await (const chunk of stream) {
  text += chunk.choices[0].delta.content;
  console.log(chunk);
}

This will return a Stream<ChatCompletionStreamResponse>. Sparing all the chunks that can be received, there is generally a unique starting and ending chunk. In interest of brevity, let’s review the starting chunk and the ending chunk. In examples that are not the starting or finishing tokens, you’ll typically find a string or token in chunk.choices[0].delta.content. The ChatCompletionStreamResponse can be used as an interface in your code for handling responses as well.

JSON
// Starting chunk, note that content is null and finish_reason is also null.
{
  "id": "cmpl-994f6307a891454cb0f57b7027f5f113",
  "created": 1700527881,
  "model": "codellama-13b-instruct-fp16",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": null
      },
      "finish_reason": null
    }
  ]
}

// Ending chunk, note the finish_reason "length" instead of null.
// This means we reached the max tokens allowed in this request.
{
  "id": "cmpl-994f6307a891454cb0f57b7027f5f113",
  "object": "chat.completion.chunk",
  "created": 1700527881,
  "model": "codellama-13b-instruct-fp16",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "",
        "function_call": null
      },
      "finish_reason": "length"
    }
  ]
}

Completions API

The Completions API is available through the client.completions class. The create method has similar customization options to the Chat Completions API, but the key difference is passing in a single prompt string instead of a list of message objects.

A full list of request options and response data for completion calls can be found on the create method page of the SDK docs.

Non-Streaming Example

Since the TypeScript SDK defaults to non-streaming responses, a bare minimum text gen example only requires the model you would like to call and the prompt you would like to use for inference. Take note that the response is structured differently than the Chat Completions API as well. The generated text for non-streamed responses is found under choices[0].text instead of choices[0].message.content.

TypeScript
import { Client } from "@octoai/client";

const client = new Client(process.env.OCTOAI_TOKEN);

const response = await client.completions.create({
  model: "llama-2-13b-chat-fp16",
  prompt: "Write a blog about Seattle",
});

console.log(response.choices[0].text);
// "Seattle is a vibrant and eclectic city..."

Streaming Example

Streamed responses are easily enabled by setting the stream flag to true in your create calls. You can then construct the final generated output by concatenating the text from each individual chunk. Again, take note that the chunks are structured a bit differently than chat completion chunks. A chunk’s text content is available through chunk.choices[0].text instead of chunk.choices[0].delta.content.

TypeScript
import { Client } from "@octoai/client";

const client = new Client(process.env.OCTOAI_TOKEN);

const stream = await client.completions.create({
  model: "llama-2-13b-chat-fp16",
  prompt: "Write a blog about Seattle",
  stream: true,
});

for await (const chunk of stream) {
  text += chunk.choices[0].text;
}

stream.controller.abort();

console.log(text);
// "Seattle is a vibrant and eclectic city..."