
Llama 2 API Endpoint

The world’s most popular open LLM, released by Meta in July 2023 and served via a fast, affordable developer API. It is also used for source code generation and instruction following.


Advice from ML Experts

Chat configuration settings shape how a language model responds. Take 'temperature' as an example. Decreasing the temperature makes the model's responses less random, which is useful for tasks like answering questions precisely. Increasing the temperature helps keep responses from becoming repetitive and can lead to more imaginative outputs. A temperature of around 0.7 is often a good starting point because it balances controlled, focused responses with creativity.
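To make the effect of temperature concrete, here is a minimal sketch of the common formulation in which next-token scores are divided by the temperature before being turned into probabilities. It does not call the OctoAI API and is not a description of how OctoAI implements sampling; the function name and the example scores are assumptions made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all of the probability goes to the top-scoring token, while higher temperatures spread it across the alternatives, which is why higher values produce more varied output.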

Supported Model Variants:

Llama 2 13B Chat

Llama 2 70B Chat

Code Llama (coming soon)

Your Llama 2 variant

License:

Meta
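
from octoai.client import Client

# Create the client. With no arguments, it is assumed here that the
# OctoAI API token is supplied through the environment.
client = Client()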
completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Keep your responses limited."
        },
        {
            "role": "user",
            "content": "Hello world"
        }
    ],
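    model="llama-2-13b-chat-fp16",
    max_tokens=1000,      # upper bound on the number of generated tokens
    presence_penalty=0,   # 0 applies no penalty to tokens already present
    temperature=1,        # sampling randomness; lower it for more focused answers
    top_p=1,              # nucleus sampling cutoff; 1 keeps the full distribution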
)

Token generation at human speed

OctoAI offers easy multi-GPU inference for the largest models, such as Llama 2 70B, so you can use the most capable models. It also offers quantized versions of all base models for faster, lower-cost applications.

OctoAI allows customers to run Llama 2 70B with a variety of options for hitting latency and quality targets.
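A rough back-of-the-envelope calculation shows why the largest models need several GPUs and why quantization lowers cost. The sketch below estimates the memory taken by model weights alone at different precisions. The helper name is hypothetical, the precisions listed are illustrative rather than a list of what OctoAI serves, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts taken from the model names; precisions are illustrative.
for name, params in (("Llama 2 13B", 13e9), ("Llama 2 70B", 70e9)):
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

At 16-bit precision the 70B weights come to roughly 140 GB, more than a single 80 GB accelerator can hold, while a 4-bit version needs roughly 35 GB.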

Your data's accuracy, our blazing speeds

Modify LLM behavior at runtime, lightning fast, by applying LoRAs. Contact us to accelerate your checkpoints.
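For readers unfamiliar with LoRA, the idea is that a fine-tune is stored as a small low-rank update that is added to a frozen base weight matrix, which is what makes switching behavior at runtime cheap. The following is a toy sketch of that arithmetic in plain Python, not OctoAI's serving code; the matrices, the `scale` parameter, and the function names are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without mutating weight."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Hypothetical 3x3 base weight and a rank-1 adapter (B is 3x1, A is 1x3).
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]

adapted = apply_lora(base, lora_b, lora_a)
print(adapted)
```

The adapter here holds 6 numbers against 9 in the base matrix. In a real model the gap is far larger, so many adapters can share one copy of the base weights.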

Fine-tune Llama 2 for your use case

You can use pre-fine-tuned models on OctoAI, such as Llama 2 Chat, which is fine-tuned on public instruction datasets. Coming soon to OctoAI is Code Llama, trained on 500B tokens of code in Python, C++, Java, JavaScript, C#, and Bash.

Llama 2 on OctoAI features

Features | OctoAI

Bring your own fine-tunes and checkpoints | Coming soon via fine-tuning

Token-based pricing and metering | Coming soon on all models

Llama model history

Large Language Model Meta AI (LLaMA)

LLaMA is the LLM that Meta released on Feb. 24, 2023, open for research use. Meta reported that it outperformed GPT-3 on most benchmarks. LLaMA was offered in several sizes: 7B, 13B, 33B, and 65B.

Llama 2

In July 2023, Meta released several models as Llama 2, with 7, 13, and 70 billion parameters. Unlike LLaMA, Llama 2 is open source and available for commercial use.

Code Llama

Meta released Code Llama, an AI tool for coding, on Aug. 24, 2023. It is built on top of Llama 2 and fine-tuned for generating and discussing code. Like Llama 2, it is available for commercial use.