Text Gen Python SDK
Use the OctoAI Chat Completion API to easily generate text.
The OctoAI class lets you run inferences against any model that accepts JSON-formatted inputs as a dictionary, and returns all JSON-formatted outputs as a dictionary. The OctoAI class also supports the Chat Completions API, giving you easy access to a set of highly optimized text models on OctoAI.
This guide will walk you through how to select your model of interest, how to call highly optimized text models on OctoAI using the Chat Completions API, and how to use the responses in both streaming and regular modes.
Requirements
- Please create an OctoAI API token if you don’t have one already.
- Please also verify you’ve completed Python SDK Installation & Setup.
- If you use the OCTOAI_TOKEN environment variable for your token, you can instantiate the OctoAI client with octoai = OctoAI() after importing the octoai package.
Text Generation
The following snippet shows you how to use the Chat Completions API to generate text using Llama2.
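A minimal sketch of such a call (the import path, model id, and dict-style message format are assumptions; verify them against the current SDK. The request is guarded behind an OCTOAI_TOKEN check so the snippet is a no-op without credentials):

```python
import os

# Dict-style messages, mirroring the Chat Completions message schema.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about clouds."},
]

if os.environ.get("OCTOAI_TOKEN"):
    from octoai.client import OctoAI  # assumed import path

    client = OctoAI()  # reads OCTOAI_TOKEN from the environment
    response = client.text_gen.create_chat_completion(
        model="llama-2-13b-chat",  # assumed Llama 2 model id; check the model list
        messages=messages,
        max_tokens=128,
    )
    print(response)
```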
The response is of type octoai.text_gen.ChatCompletionResponse. If you print the response from this call as in the example above, it looks similar to the following:
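A representative response shape is sketched below (field names follow the OpenAI-style chat completion schema that the OctoAI API mirrors; the ids, timestamps, and counts are placeholders, not real model output):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-2-13b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 48,
    "total_tokens": 73
  }
}
```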
Note that billing is based upon “prompt tokens” and “completion tokens” above. View prices on our pricing page.
Streaming Responses
The following snippet shows you how to obtain the model’s response incrementally as it is generated using streaming (stream=True).
When using streaming mode, the response is of type Iterable[ChatCompletionChunk]. To read each incremental response from the model, you can use a for loop over the returned object. The example above prints each incremental response as it arrives; the chunks accumulate to form the complete response as the model prediction progresses.
Additional Parameters
To learn about the additional parameters supported by the OctoAI().text_gen.create_chat_completion() method, see the API reference.
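As a sketch, the commonly supported sampling parameters (assumed to mirror the OpenAI-style Chat Completions API; verify the exact names and valid ranges in the API reference) can be collected in a dictionary and passed as keyword arguments:

```python
# Hypothetical parameter set mirroring OpenAI-style chat completion APIs;
# confirm supported names and valid ranges in the OctoAI API reference.
params = {
    "model": "llama-2-13b-chat",  # assumed model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256,         # cap on the length of the completion
    "temperature": 0.7,        # higher values sample more randomly
    "top_p": 0.9,              # nucleus-sampling probability cutoff
    "presence_penalty": 0.0,   # discourages introducing repeated topics
    "frequency_penalty": 0.0,  # discourages repeating the same tokens
    "stream": False,           # set True for incremental chunks
}
# client.text_gen.create_chat_completion(**params)
print(sorted(params))
```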