Unstructured.io Integration
The OctoAIEmbeddingEncoder is available, so documents parsed with Unstructured can easily be embedded with the OctoAI embeddings endpoint.
Introduction
Unstructured is both an open-source library and an API service. The library provides components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more.
It also provides components for embedding these documents. In Unstructured's terminology, such a component is called an EmbeddingEncoder. The OctoAIEmbeddingEncoder is one of them: it sends documents parsed with Unstructured to the OctoAI embeddings endpoint.
Using the OctoAIEmbeddingEncoder
You will need an OctoAI API token for this integration. Before you get started, let's review some concepts:
- The OctoAIEmbeddingEncoder class connects to the OctoAI Text&Embedding API to obtain embeddings for pieces of text.
- embed_documents receives a list of Elements and returns an updated list that includes the embeddings attribute for each Element.
- embed_query receives a query as a string and returns a list of floats, which is the embedding vector for the given query string.
- num_of_dimensions is a metadata property that denotes the number of dimensions in any embedding vector obtained via this class.
- is_unit_vector is a metadata property that denotes whether embedding vectors obtained via this class are unit vectors.
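To make the two metadata properties concrete, here is a small sketch in plain Python that computes the same two facts for a single vector. The function name describe_vector and the tolerance value are hypothetical and are not part of Unstructured or OctoAI; the encoder reports these values itself, so this only illustrates what they mean.

```python
import math


def describe_vector(vector, tolerance=1e-6):
    """Return the dimension count and whether the vector has length 1."""
    # Euclidean (L2) norm of the embedding vector.
    norm = math.sqrt(sum(x * x for x in vector))
    return {
        "num_of_dimensions": len(vector),
        "is_unit_vector": math.isclose(norm, 1.0, abs_tol=tolerance),
    }


print(describe_vector([0.6, 0.8]))
print(describe_vector([3.0, 4.0]))
```

When embeddings are unit vectors, the dot product of two embeddings equals their cosine similarity, which is why this property is worth checking before comparing a query vector with document vectors.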
Now, let's get started with the following code example for how to use OctoAIEmbeddingEncoder.
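A minimal sketch of such an example follows. It assumes the encoder and its config class can be imported from unstructured.embed.octoai as OctoAIEmbeddingEncoder and OctoAiEmbeddingConfig, and that Text lives in unstructured.documents.elements; check these names against your installed version. The helpers build_octoai_encoder, embed_and_report, and _metadata are hypothetical names introduced here for illustration. Because the metadata values may be exposed as methods or as properties depending on the version, the sketch accepts both.

```python
import os


def build_octoai_encoder():
    """Create the encoder. Needs `unstructured` installed and OCTOAI_API_KEY set."""
    # Assumed import path and class names; verify against your version.
    from unstructured.embed.octoai import (
        OctoAiEmbeddingConfig,
        OctoAIEmbeddingEncoder,
    )

    return OctoAIEmbeddingEncoder(
        config=OctoAiEmbeddingConfig(api_key=os.environ["OCTOAI_API_KEY"])
    )


def _metadata(value):
    # Accept either a method (call it) or a plain property (use it as is).
    return value() if callable(value) else value


def embed_and_report(encoder, elements, query):
    """Embed the elements and the query, then collect the model metadata."""
    embedded = encoder.embed_documents(elements=elements)
    query_embedding = encoder.embed_query(query=query)
    return {
        "elements": embedded,
        "query_embedding": query_embedding,
        "num_of_dimensions": _metadata(encoder.num_of_dimensions),
        "is_unit_vector": _metadata(encoder.is_unit_vector),
    }


def main():
    from unstructured.documents.elements import Text

    report = embed_and_report(
        build_octoai_encoder(),
        [Text("This is sentence 1"), Text("This is sentence 2")],
        "This is the query",
    )
    for element in report["elements"]:
        print(element.embeddings, element)
    print(report["query_embedding"])
    print(report["is_unit_vector"], report["num_of_dimensions"])


# The live call runs only when an API key is present in the environment.
if __name__ == "__main__" and os.environ.get("OCTOAI_API_KEY"):
    main()
```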
You will see the updated elements list (with the embeddings attribute included for each element), the embedding vector for the query string, and some metadata properties about the embedding model.
You will need to set an environment variable named OCTOAI_API_KEY to be able to run this example.
Learn more about our partnership: