Text Gen Fine-tuning CLI Guide
Fine-tune with OctoAI's CLI Tools
This guide walks you through creating a fine-tuned LoRA with our command line tools: uploading text file assets, creating a tune, setting the hyperparameters, monitoring performance, and then using that LoRA to run an inference with our Text Gen service once it’s ready.
Please refer to the Text Gen Fine-tuning CLI Reference for more details on each parameter in the fine-tuning CLI. Additionally, our Asset Library API reference provides further specifics on using various asset methods.
Visit our pricing page and pricing plan to learn more about how fine-tuned model training and usage are billed.
OctoAI now supports creating fine-tuned LoRAs for the following models:
- Meta-Llama-3.1-8b-Instruct (8K token context)
- Meta-Llama-3.1-70b-Instruct (8K token context) (contact us for access)
Requirements
- Please create an OctoAI API token if you don’t have one already.
- Please also verify that you’ve completed the steps in the CLI installation section of the Installation Guide.
- Set your token as the `OCTOAI_TOKEN` environment variable.
- An account and API token are required for all the following steps.
High-Level Steps to Create a Fine-Tuned LoRA
Assuming the dataset is already prepared in the required format (see the next section for an example), you can fine-tune with OctoAI in three steps:
- Asset Create (to upload your data)
- Tune Create (to make the fine-tune)
- Inference (supplying your fine-tune in your Chat Completion request to OctoAI)
Here are the three steps presented together for fine-tuning Llama 3.1 8B on chat completion data:
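At a glance, the flow can be sketched as follows. All flag names here are assumptions for illustration; check `octoai asset create --help` and `octoai tune create --help` for the exact syntax:

```shell
# 1. Asset Create: upload the chat-completion dataset
octoai asset create --name my-chat-dataset --file ./train.jsonl

# 2. Tune Create: fine-tune Llama 3.1 8B Instruct on it
octoai tune create --name my-first-tune \
  --checkpoint "octoai:meta-llama-3-1-8b-instruct" \
  --dataset my-chat-dataset

# 3. Inference: once the tune succeeds, pass its name in the
#    "peft" field of your Chat Completion request
```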
Once fine-tuning is complete, the tune is ready for inference. Note: You can specify the name of your fine-tuning in the “peft” field. Please ensure the name you use matches your completed fine-tuning.
1. Prepare and Upload Datasets
Prepare Datasets
We accept JSONL files in which each line is a JSON object. Prepare your JSONL datasets according to the following requirements. Deviating from these requirements will result in upload failure.
- Format: Each line in your JSONL file must be a valid JSON object, conforming to the chat completion or completion format outlined in OctoAI docs.
- Size: Datasets should be under 100MB and limited to 20,000 examples.
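Before uploading, it can help to sanity-check the file locally. A minimal sketch (the file name and contents are illustrative, not required by OctoAI):

```shell
# Write a minimal chat-completion dataset: one JSON object per line
cat > sample.jsonl <<'EOF'
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello! How can I help?"}]}
EOF

# Check that every line parses as JSON and carries a "messages" list
python3 - <<'EOF'
import json

with open("sample.jsonl") as f:
    for n, line in enumerate(f, 1):
        obj = json.loads(line)                      # must be valid JSON
        assert "messages" in obj, f"line {n}: missing 'messages'"
print("sample.jsonl looks valid")
EOF
```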
See the following for examples of different data formats:
a. Chat Completion Format Examples:
Each message object has a role (either system, user, or assistant) and content.
- The system message (optional): Can be used to set the behavior of the assistant.
- The user messages (required): Provide requests or comments for the assistant to respond to.
- The assistant messages (required): Store previous assistant responses, but can also be written by you to give examples of desired behavior.
- Tool calls (optional): Allow for triggering specific actions or functions within the assistant or integrating external functionalities to enhance the conversation.
Here are some examples in chat completion format.
Single-turn example:
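For instance, a single-turn line could look like this (the contents are illustrative):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is a LoRA?"}, {"role": "assistant", "content": "A LoRA is a small set of low-rank adapter weights trained on top of a frozen base model."}]}
```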
Multi-turn example:
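A multi-turn line simply continues the alternating user and assistant turns (contents are illustrative):

```json
{"messages": [{"role": "user", "content": "Hi, can you help me with my order?"}, {"role": "assistant", "content": "Of course! What is your order number?"}, {"role": "user", "content": "It's 12345."}, {"role": "assistant", "content": "Thanks! Order 12345 shipped yesterday and should arrive soon."}]}
```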
b. Completion Format Example:
(Note that if you fine-tune using Completions format, you must be sure to use our /completions API, not the /chat/completions API. Mismatching these can lead to poor quality outputs.)
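A completion-format line pairs a prompt with its completion. The field names below follow the common prompt/completion convention and are an assumption here; verify them against the data-format examples in the OctoAI docs:

```json
{"prompt": "Translate to French: Hello, world!", "completion": "Bonjour, le monde !"}
```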
Upload Datasets
Run the following command to upload the asset. Wait for the asset’s status to change to “uploaded”.
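The command looks roughly like this sketch (flag names are assumptions; check `octoai asset create --help` for the exact syntax):

```shell
octoai asset create \
  --name my-chat-dataset \
  --file ./train.jsonl
```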
Use `octoai asset list` to check asset status. The latest asset appears at the top.
Check the status of the dataset; it should be “uploaded”. If not, review the error code, make any necessary changes, and retry. Wait until the dataset is marked “uploaded” before moving to the next step.
2. Start a Text Generation Fine-tuning Job
To start a fine-tuning job, several elements need to be provided:
- `name` (required): The name of the fine-tuning task. It is unique for each tune.
- `tune_type` (required): Set `tune_type` to `text_tune` for text generation.
- `checkpoint` (required): Specify a checkpoint for a particular model. Presently, we support only Llama 3.1 8B and Llama 3.1 70B with an 8K token context. For Llama 3.1 8B Instruct, use `"octoai:meta-llama-3-1-8b-instruct"`. For Llama 3.1 70B Instruct, use `"octoai:meta-llama-3-1-70b-instruct"`. Please contact us if you want to use Llama 3.1 70B.
- `dataset` (required): Specify the dataset name or ID from the asset library. OctoAI also provides 3 sample datasets: `octoai:golden-gate-5k`, `octoai:ai-medical-chatbot-20k`, and `octoai:ai-medical-chatbot-1k`.
- hyperparameters (optional): OctoAI will default to the following hyperparameter values: learning rate = 0.0001, LoRA rank = 8, epochs = 3. We recommend leaving these at their defaults unless you have expertise. Available LoRA ranks are 8 (default), 16, and 32; deeper LoRA ranks can increase learning but can also result in overfitting or loss of the model’s original knowledge. Note that increasing epochs also increases the amount you will be billed.
- Weights and Biases parameters (optional): Enable with `--wandb-project-name` and `--wandb-secret-key`. To use this feature, sign up for Weights and Biases and follow the Weights & Biases Setup Guide.
Use this command to start a tune:
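The command might look like the following sketch. The `--wandb-*` flag names come from this guide; the remaining flag names (`--name`, `--tune-type`, `--checkpoint`, `--dataset`) are assumptions for illustration, so check `octoai tune create --help` for the exact syntax. The Weights & Biases flags can be omitted if you are not using that integration:

```shell
octoai tune create \
  --name my-first-tune \
  --tune-type text_tune \
  --checkpoint "octoai:meta-llama-3-1-8b-instruct" \
  --dataset my-chat-dataset \
  --wandb-project-name my-project \
  --wandb-secret-key "$WANDB_API_KEY"
```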
Job Status: Get a list of all tunes and their statuses using `octoai tune list`. The status is “pending” while waiting in the queue, “running” when processing begins, and “succeeded” upon successful completion of fine-tuning.
Job Management: You can cancel a fine-tuning job while it is running and still get a LoRA from the training completed so far. Deleting a job removes it from the tune list, but the LoRA remains in the asset library.
Evaluate LoRA Performance: Navigate to your project in the Weights & Biases dashboard to see the metrics and charts. The following image is a screenshot of the Weights & Biases dashboard.
When the job is completed, it produces a LoRA model. Check your LoRA using `octoai asset list`.
3. Run an Inference with the Tuned LoRA
Once the status of your LoRA model is “ready” and the fine-tuning status is “succeeded”, you can use it for inference. To do so, you’ll need to specify the right “model” and “peft” arguments. “PEFT” stands for “parameter-efficient fine-tuning”, and LoRAs are a type of PEFT.
- Input your `OCTOAI_TOKEN` for authorization.
- In the “model” parameter, use “meta-llama-3.1-8b-instruct” for Llama 3.1 8B. For Llama 3.1 70B, use “meta-llama-3.1-70b-instruct”. Contact us if you want to use Llama 3.1 70B.
- In the `"peft"` field, use the `tune_name` of the fine-tuning job. You can use `octoai tune list` to find a list of tune jobs.
- For details on other parameters, refer to the Inference Guide.
Example of Chat Completion Request:
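A request might look like the following curl sketch. The endpoint URL is an assumption based on OctoAI’s text-gen service, and “my-first-tune” is a hypothetical tune name; verify both against the Inference Guide and your own `octoai tune list` output:

```shell
curl https://text.octoai.run/v1/chat/completions \
  -H "Authorization: Bearer $OCTOAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama-3.1-8b-instruct",
    "peft": "my-first-tune",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```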
If you used completion data during fine-tuning, you will need to send a completion inference request. Change the `"messages"` field to `"prompt": "<Your Prompt>"`.
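For example, a completion-style request body might look like this sketch (“my-first-tune” is a hypothetical tune name used for illustration):

```json
{
  "model": "meta-llama-3.1-8b-instruct",
  "peft": "my-first-tune",
  "prompt": "<Your Prompt>"
}
```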
If you were able to fine-tune Llama 3.1 70B, you will need to run inference with the Llama 3.1 70B LoRA. To do so, set the “model” parameter in your inference request to “meta-llama-3.1-70b-instruct”. For Llama 3.1 70B fine-tuning access, please contact us.