Using Caching and Tracing

APPL supports tracing APPL functions and LM calls to help users understand and debug program executions. Both the trace and the persistent LLM cache are useful for reproducing (potentially partial) execution results by loading cached responses of the LM calls, which enables failure recovery and avoids the extra cost of resending these calls. This also makes it possible to conveniently debug one specific LM call out of the whole program.

Enabling APPL Caching

The persistent LLM cache (default path: ~/.appl/caches/cache.db) has been enabled automatically since v0.1.5. LLM calls with temperature 0 look up the cache first and use the cached response if found; otherwise, they generate a response and cache it.

appl.yaml
settings:
  caching:
    enabled: true # caching is enabled by default
    folder: "~/.appl/caches" # The folder to store the cache files
    max_size: 100000  # Maximum number of entries in cache
    time_to_live: 43200 # Time-to-live in minutes (30 days)
    cleanup_interval: 1440 # Cleanup interval in minutes (1 day)
    allow_temp_greater_than_0: false # Whether to cache generation results when temperature is greater than 0
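
For example, the following sketch makes a deterministic LM call that benefits from the persistent cache; it assumes gen forwards sampling parameters such as temperature to the LLM backend (only gen() with default or stop arguments appears elsewhere on this page):

cache_demo.py
from appl import gen, ppl


@ppl  # marks APPL function
def summarize(text: str):
    "Summarize the following text in one sentence."
    text  # append to the prompt
    return gen(temperature=0)  # temperature 0: eligible for the persistent cache


# The first run sends the request and stores the response in ~/.appl/caches/cache.db;
# rerunning the script with the same prompt loads the cached response instead of
# resending the call.
print(summarize("Simplicity is the ultimate sophistication."))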

Enabling APPL Tracing

To enable tracing in APPL, set tracing.enabled to true in appl.yaml:

appl.yaml
settings:
  tracing:
    enabled: true
    path_format: <Your custom path for trace files>
    # The default path format is "./dumps/traces/{basename}_{time:YYYY_MM_DD__HH_mm_ss}"
    patch_threading: true # whether to patch `threading.Thread`
    strict_match: true # used when loading trace files; explained below

Obtaining the Trace File

Then we run the QA example with tracing enabled:

answer_questions.py
from appl import AIRole, gen, ppl
from appl.const import NEWLINE


@ppl(ctx="copy")  # copy the context from caller
def get_answer(question: str):
    question  # append to the prompt
    return gen()  # return as a future object


@ppl  # marks APPL function
def answer_questions(quotation: str, questions: list[str]):
    "Extract the name of the author from the quotation below and answer questions."
    quotation  # append to the prompt
    with AIRole():  # assistant message
        f"The name of the author is {gen(stop='.')}"  # specify the prefix
    return [get_answer(q) for q in questions]  # parallelize calls


quotation = '"Simplicity is the ultimate sophistication." -- Leonardo da Vinci'
questions = [
    "In what era did the author live?",
    "What is the most famous painting of the author?",
]
for ans in answer_questions(quotation, questions):
    print(ans)

$ python answer_questions.py

You can find the resulting trace file at the specified path; the default location is ./dumps/traces/answer_questions_<timestamp>.pkl. You can then visualize the trace file using any of the methods described below.

Resuming from a Previous Trace

You can reproduce the execution results from a previous trace by specifying the APPL_RESUME_TRACE environment variable with the path to the trace file:

$ APPL_RESUME_TRACE=<path to the trace file> python answer_questions.py

Then each LM call is loaded from the trace file if a matching entry exists (loading from the trace takes priority over the persistent cache). Such loading is useful for:

  • Debugging a specific LM call: the LM calls before it can be loaded from the trace file, so there is no need to resend them and incur extra costs.
  • Reproducible results: the trace file can be shared with others to reproduce the same results.
  • Recovery from failures: if the program fails, you can resume from the trace file to avoid resending the LM calls that already succeeded (see the example below).
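
For instance, a failed run can be resumed as sketched below; this assumes a trace file was written before the failure, and the file name (following the default path format above) is only illustrative:

$ python answer_questions.py
# ... the run fails partway, e.g., due to a network error ...
$ APPL_RESUME_TRACE=./dumps/traces/answer_questions_<timestamp>.pkl python answer_questions.py
# LM calls recorded in the trace are loaded instead of being resent;
# only the remaining calls go to the LLM.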

strict_match for calls with the same prompt

When strict_match is false, LM calls with the same prompt load the same response from the trace file. To load the recorded response for each individual LM call, set strict_match to true (the default); then the gen_id of the LM call is also used for matching.
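
As a sketch of the difference, consider a hypothetical APPL program that issues two LM calls with identical prompts:

same_prompt.py
from appl import gen, ppl


@ppl(ctx="copy")  # copy the context from caller
def ask(question: str):
    question  # append to the prompt
    return gen()


@ppl  # marks APPL function
def ask_twice():
    "Answer the question below concisely."
    # Both calls below construct exactly the same prompt,
    # but each LM call gets its own gen_id.
    return [ask("What is 2 + 2?"), ask("What is 2 + 2?")]

When resuming from a trace with strict_match set to false, both calls would load the same recorded response; with strict_match set to true (the default), the gen_id distinguishes them, so each call loads the response recorded for that specific call.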

Visualizing the Trace

Langfuse is an open-source web-based tool for visualizing traces and LLM calls.

You can host Langfuse locally or use the public version.

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up

Then you can set the environment variables for the Langfuse server by:

.env
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>
# Set to http://localhost:3000 if you are hosting Langfuse locally

You can find your Langfuse public and secret API keys on the project settings page (Project Dashboard -> Configure Tracing).

Then you can visualize the traces by:

$ appltrace <path to the trace file>

Then you will see a conversation view like:

Langfuse Conversation

and a timeline view like:

Langfuse Timeline

Troubleshooting: Incomplete traces on Langfuse

You may see incomplete traces (function call trees) in Langfuse when you open a trace from the Traces page. This might be because Langfuse applies a filter based on the timestamp. Try removing the ?timestamp=<timestamp> part of the URL and refreshing the page.

Lunary

Lunary is another open-source web-based tool for visualizing traces and LLM calls.

You can host Lunary locally or use their hosted version.

You can follow the steps in the Lunary documentation to start a local Lunary server. After installing Postgres, create a database for Lunary:

createuser postgres --createdb
createdb lunary -U postgres
# Your Postgres URL is: "postgresql://postgres:@localhost:5432/lunary"
# You can verify the database connection with:
psql postgresql://postgres:@localhost:5432/lunary

Then use this URL to set DATABASE_URL in "packages/backend/.env". You may also change other environment variables in that ".env" file according to your needs.
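
For example, a minimal "packages/backend/.env" could contain just the database URL (a sketch only; your Lunary setup may require additional variables):

packages/backend/.env
DATABASE_URL="postgresql://postgres:@localhost:5432/lunary"

Then you can set the environment variables for the Lunary server: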

# `1c1975c5-13b9-4977-8003-89fff5c71c27` is the project ID of the default project; you can get your project ID from the website.
export LUNARY_API_KEY=<your project ID>
# `http://localhost:3333` is the default URL
export LUNARY_API_URL=<the url of the Lunary server>

Then you can visualize the traces by:

$ appltrace <path to the trace file> --platform lunary

Then you will see:

Lunary

Simple HTML and Chrome Tracing

You can also visualize the traces using the appltrace script:

$ appltrace <path to the trace file> -o <output file>

The default output is an HTML file, which can be viewed in a browser. We provide a sample trace file here.

If you specify the output file to be a .json file, the script will generate a JSON file that is loadable by Chrome's tracing viewer (at address chrome://tracing/).
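
For example (the output filename here is only illustrative):

$ appltrace <path to the trace file> -o chrome_trace.json

The loaded trace will look like this: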

Chrome Trace Viewer

This method is going to be deprecated, since Langfuse provides much better visualization.

LangSmith

Optionally, you can use LangSmith to inspect the LM calls and responses in the trace files. You need to obtain your API key from LangSmith and add the following environment variables to your .env file:

.env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<your api key>
# [Optional] specify the project name
# LANGCHAIN_PROJECT=<your project name>

LangSmith may contain inaccurate statistics for asynchronous LM calls

LangSmith

When running the example, the time statistics for the get_answer function calls are not consistent.

Nonetheless, it is sometimes useful to record and inspect the LM calls and responses using LangSmith.