Structured Output

For APIs that support passing a Pydantic model as response_format, such as OpenAI's, you can pass this argument directly to APPL's gen function. Otherwise, you can use Instructor, which offers similar functionality through a different argument, response_model.

Get Started

Let's take the example from Instructor and implement it in two ways:

from pydantic import BaseModel

from appl import gen, ppl


# Define your desired output structure
class UserInfo(BaseModel):
    name: str
    age: int


@ppl
def get_user_info() -> UserInfo:
    # Extract structured data from natural language
    "John Doe is 30 years old."
    return gen(response_format=UserInfo).results


@ppl
def get_user_info_instructor() -> UserInfo:
    # Extract structured data from natural language
    "John Doe is 30 years old."
    return gen(response_model=UserInfo).results


print("Using response_format:")
user_info = get_user_info()

print(user_info.name)
# > John Doe
print(user_info.age)
# > 30

try:
    import instructor

    print("Using Instructor's response_model:")
    user_info = get_user_info_instructor()

    print(user_info.name)
    # > John Doe
    print(user_info.age)
    # > 30
except (ImportError, ModuleNotFoundError):
    print("Instructor is not installed, skipping instructor example.")
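Both variants aim to hand back a validated `UserInfo` instance rather than raw text. As a rough, offline illustration of that final validation step (not APPL's or Instructor's internal implementation), the sketch below parses a hand-written JSON string with Pydantic. It assumes Pydantic v2, and `raw_reply` is a made-up stand-in for a model reply.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    name: str
    age: int


# Hypothetical model reply; no API call is made here.
raw_reply = '{"name": "John Doe", "age": 30}'

# Parse and validate the JSON text against the model (Pydantic v2 API).
user_info = UserInfo.model_validate_json(raw_reply)
print(user_info.name, user_info.age)

# A reply that does not fit the model raises ValidationError.
try:
    UserInfo.model_validate_json('{"name": "John Doe", "age": "thirty"}')
    rejected = False
except ValidationError:
    rejected = True
print("Malformed reply rejected:", rejected)
```

The benefit of a typed result is that downstream code can rely on `user_info.age` being an `int` instead of re-parsing free text.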

Usage: Choices

One common use case of structured output is to make the response choose from a set of options. For example,

from typing import Literal

from appl import gen, ppl


@ppl
def answer(question: str):
    "Answer the question below."
    question
    return gen(response_format=Literal["Yes", "No"])
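To make the idea of a fixed option set concrete, here is a minimal standard-library sketch of how the allowed values can be read back from a `Literal` type and used to check a reply. `check_choice` is a hypothetical helper written for this illustration; it is not part of APPL.

```python
from typing import Literal, get_args

YesNo = Literal["Yes", "No"]

# get_args recovers the allowed options from the Literal type.
ALLOWED = get_args(YesNo)


def check_choice(reply: str) -> str:
    """Hypothetical helper: return the reply if it is an allowed option."""
    if reply not in ALLOWED:
        raise ValueError(f"{reply!r} is not one of {ALLOWED}")
    return reply


print(ALLOWED)
print(check_choice("Yes"))
```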

Usage: Thoughts

You can extend the previous example so that the model writes out its thoughts before the answer:

from typing import Literal

from pydantic import BaseModel, Field

from appl import gen, ppl


class Answer(BaseModel):
    # You can use the description annotation to guide the generation of the structured output.
    thoughts: str = Field(..., description="The thoughts for thinking step by step.")
    answer: Literal["Yes", "No"]


@ppl
def answer(question: str):
    "Answer the question below."
    question
    # Retrieving the result through response_obj gives the correct type hint.
    return gen(response_format=Answer).response_obj


ans = answer(
    "The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1."
)
print(f"Thoughts:\n{ans.thoughts}")
print(f"Answer: {ans.answer}")

An example output:

Thoughts:
1. **Identify the Odd Numbers:**
   - 15
   - 5
   - 13
   - 7
   - 1   

2. **Add the Odd Numbers Together:**
   \[
   15 + 5 + 13 + 7 + 1 = 41
   \]

3. **Determine Whether the Sum is Even or Odd:**
   - 41 is an odd number because it cannot be evenly divided by 2.

4. **Conclusion:** The odd numbers in the given group add up to an odd number, not an even number.
Answer: No
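The `description` given to `Field` becomes part of the JSON schema that Pydantic generates for the model, which is presumably how it reaches the model as guidance. A small sketch, assuming Pydantic v2, that inspects the schema locally without calling any API:

```python
from typing import Literal

from pydantic import BaseModel, Field


class Answer(BaseModel):
    thoughts: str = Field(..., description="The thoughts for thinking step by step.")
    answer: Literal["Yes", "No"]


# Build the JSON schema for the model (Pydantic v2 API).
schema = Answer.model_json_schema()

# The Field description is carried into the property's schema entry.
print(schema["properties"]["thoughts"]["description"])
# The Literal options appear as an enum of allowed values.
print(schema["properties"]["answer"]["enum"])
```

Declaring `thoughts` before `answer` keeps the reasoning ahead of the final choice in the schema's property order.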

With Streaming

With response_format, the stream is captured and displayed while it is generated, and the returned object is the complete object. With response_model, you need to wrap the model in Partial or Iterable so that it can be streamed; the response object is then a generator that yields partial objects.

Let's slightly modify the example from Instructor:

# https://jxnl.github.io/instructor/why/?h=iterable#partial-extraction
from typing import List

try:
    from instructor import Partial

    run_instructor = True
except (ImportError, ModuleNotFoundError):
    run_instructor = False
from pydantic import BaseModel

from appl import gen, ppl


class User(BaseModel):
    name: str
    age: int


class Info(BaseModel):
    users: List[User]


@ppl
def generate_info() -> Info:
    f"randomly generate 10 users."
    return gen(response_format=Info, stream=True).response_obj


print(f"Generated Info: {generate_info()}")
# The stream is displayed, but the call does not return a generator object.

if run_instructor:

    @ppl
    def generate_info_instructor():
        f"randomly generate 10 users."
        return gen(response_model=Partial[Info], stream=True).response_obj

    print("Generated Info:", generate_info_instructor())
    # The stream is displayed, but the call does not return a generator object.

It will gradually print the output:

{
  'users': [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Charlie', 'age': 20}
  ]
}
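To illustrate what "a generator that yields partial objects" means, the sketch below uses only the standard library. `stream_partial_users` is a hypothetical stand-in, not the APPL or Instructor implementation: it yields a growing snapshot each time another user becomes available, and the last snapshot is the complete object.

```python
from typing import Dict, Iterator, List

User = Dict[str, object]


def stream_partial_users(users: List[User]) -> Iterator[Dict[str, List[User]]]:
    """Hypothetical stand-in for a partial-object stream."""
    seen: List[User] = []
    for user in users:
        seen.append(user)
        # Yield a copy so earlier snapshots are not mutated later.
        yield {"users": list(seen)}


sample = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 20},
]

snapshots = []
for partial in stream_partial_users(sample):
    print(partial)  # each snapshot contains one more user than the last
    snapshots.append(partial)

final = snapshots[-1]  # the last snapshot is the complete object
```

A caller that only needs the finished result can ignore the intermediate snapshots and keep the last one.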