Using Structured Outputs (JSON mode) with Text Gen endpoints

Ensure Text Gen outputs fit into your desired JSON schema.

OctoAI’s Large Language Models (LLMs) can generate outputs that not only adhere to JSON format but also align with your unique schema specifications. This guide covers two approaches to JSON mode: OpenAI Compatible JSON mode for Llama-3.1-8B and 70B, and Legacy JSON mode.

**Supported models**

  • Llama 3.1 8B
  • Llama 3.1 70B

OpenAI Compatible JSON mode for Llama-3.1-8B and 70B

This section covers the JSON mode compatible with OpenAI’s `response_format` standard, available specifically for the Llama-3.1-8B and 70B models.

Setup

First, configure the OpenAI client to point at OctoAI’s base URL and use your OctoAI API token.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://text.octoai.run/v1",
    api_key=os.environ["OCTOAI_API_KEY"],
)

model = "meta-llama-3.1-8b-instruct"
```
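The client reads your token from the `OCTOAI_API_KEY` environment variable, which you can set in your shell before running the examples (placeholder value shown):

```shell
export OCTOAI_API_KEY="<your-octoai-token>"
```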

Generate JSON without adhering to any schema (json_object)

If you want the response as a JSON object but without any specific schema:

```python
import json

def generate_json_object():
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Generate a JSON object, without any additional text or comments.",
            },
            {"role": "user", "content": "who won the world cup in 2022? answer in JSON"},
        ],
        max_tokens=512,
        response_format={
            "type": "json_object",
        },
        temperature=0,
    )

    content = response.choices[0].message.content
    data = json.loads(content)
    return data
```
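The `json.loads` call above parses the raw string content into a Python object. As a quick offline sketch (using a hypothetical response string, not actual model output):

```python
import json

# Hypothetical response content in the shape the model might return.
content = '{"winner": "Argentina", "year": 2022}'
data = json.loads(content)

# data is now a regular Python dict.
print(data["winner"])
```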

Generate JSON adhering to a schema (without constrained decoding)

Use this mode to generate JSON that adheres to a simple schema, but without strictly guaranteed schema following (note `"strict": False` below). This mode is faster and works on both Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct. For most use cases, it is sufficient and recommended.

```python
from pydantic import BaseModel
from jsonschema import validate

class Output(BaseModel):
    answer: str

def generate_json_schema_strict_false():
    schema = Output.model_json_schema()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Generate a JSON object, without any additional text or comments.",
            },
            {"role": "user", "content": "who won the world cup in 2022?"},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "output", "schema": schema, "strict": False},
        },
        temperature=0,
    )
    content = response.choices[0].message.content
    data = json.loads(content)
    validate(instance=data, schema=schema)
    return data
```
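Under the hood, `Output.model_json_schema()` yields a standard JSON Schema with one required string field, and `validate` rejects any response missing it. A stdlib-only sketch of that check (a hand-written stand-in for the Pydantic and jsonschema calls above, for illustration only):

```python
import json

# Hand-written equivalent of the schema Pydantic generates for Output.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

def conforms(data: dict, schema: dict) -> bool:
    # Minimal check mirroring what jsonschema.validate enforces here:
    # every required key present, every declared property a string.
    has_required = all(key in data for key in schema["required"])
    right_types = all(
        isinstance(data[key], str)
        for key in schema["properties"]
        if key in data
    )
    return has_required and right_types

good = json.loads('{"answer": "Argentina won the 2022 World Cup."}')
bad = json.loads('{"winner": "Argentina"}')
```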

Generate JSON adhering to a schema (with constrained decoding)

When you need strict adherence to a JSON schema, activate this mode (available on Llama-3.1-8B-Instruct only). It is recommended for more complex schemas, but may increase latency.

```python
from textwrap import dedent

math_tutor_prompt = """
    You are a helpful math tutor. You will be provided with a math problem,
    and your goal will be to output a step by step solution, along with a final answer.
    For each step, provide the output as an equation and use the explanation field to detail the reasoning.
"""

question = "how can I solve 8x + 7 = -23"

schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
                "required": ["explanation", "output"],
                "additionalProperties": False,
            },
        },
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

def generate_json_schema_strict_true():
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": dedent(math_tutor_prompt)},
            {"role": "user", "content": question},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "math_reasoning", "schema": schema, "strict": True},
        },
        temperature=0,
    )
    content = response.choices[0].message.content
    data = json.loads(content)
    validate(instance=data, schema=schema)
    return data
```
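With `"strict": True`, the returned JSON is constrained to match the `math_reasoning` schema, so a conforming payload has the shape below (the values are illustrative, not actual model output):

```python
import json

# Illustrative payload matching the math_reasoning schema above.
sample = json.loads("""
{
  "steps": [
    {"explanation": "Subtract 7 from both sides.", "output": "8x = -30"},
    {"explanation": "Divide both sides by 8.", "output": "x = -30/8 = -15/4"}
  ],
  "final_answer": "x = -15/4"
}
""")

print(sample["final_answer"])
```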