Setup Guide

To interact with LLMs, you need to set up API keys or your own LLM backends. In addition, APPL provides many configurations that you can customize to fit your needs.

Setup Environment Variables

We recommend storing all your environment variables, including API keys, in a .env file in the root directory of your project (or in other directories; see Priority of Configurations for more details). APPL uses the python-dotenv package to automatically load these variables into the current environment.

Remember to set up your .gitignore file

Make sure to add .env to your .gitignore file to prevent it from being committed to your repository.

For example, you can create a .env file with the following content to specify your OpenAI API key:

.env
OPENAI_API_KEY=<your openai api key>
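
APPL loads this file for you automatically; under the hood, the behavior is roughly equivalent to calling python-dotenv yourself. A minimal sketch for reference (you do not need to add this to your own code):

import os
from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # locate a .env file and load its variables into os.environ
print(os.environ.get("OPENAI_API_KEY"))  # now visible to APPL and litellm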

We provide an example .env.example file in the root directory; you can copy it to your project directory and modify it.

.env.example
# API keys
OPENAI_API_KEY=<your-openai-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>

# Observability platform
## Langfuse
## You can find the keys at: <your-langfuse-host>/project/<project-id>/setup (Project Dashboard -> Configure Tracing)
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>

## Lunary
LUNARY_PUBLIC_KEY=<your-lunary-public-key>
LUNARY_API_URL=<your-lunary-api-url>
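
To start from this example, copy it into your project (adjusting the path as needed) and fill in only the keys you need; unused entries can be removed:

cp .env.example .env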

Export or Shell Configuration

Alternatively, you can export the environment variables directly in your terminal, or add them to your shell configuration file (e.g., .bashrc, .zshrc). For example:

export OPENAI_API_KEY=<your openai api key>

Setup APPL Configuration

Default Configs

default_configs.yaml contains the default configurations for APPL.

default_configs.yaml
metadata: {}

settings:
  logging:
    format: >- # The format of the log message, change HH:mm:ss.SSS to HH:mm:ss for the default loguru format
      <green>{time:YYYY-MM-DD HH:mm:ss}</green> |
      <level>{level: <8}</level> |
      <cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> |
      <level>{message}</level>
    log_level: "INFO" # The level of the log messages
    max_length: 800 # The maximum length of the log message in bash
    suffix_length: 200 # The length of the suffix (when truncated)
    enable_stderr: true  # enable logging to stderr
    enable_file: true  # enable logging to a file
    log_file:
      path_format: './logs/{basename}_{time:YYYY_MM_DD__HH_mm_ss}'
      # The path to the log file, ext will be added automatically
      log_level: null # default to use the same log level as stdout
    display:
      configs: false # Display the configurations
      configs_update: false # Display the updates of the configurations
      docstring_warning: true # Display the warning message when docstring are excluded
      llm_raw_call_args: false # Display the raw args for the llm calls
      llm_raw_response: false # Display the raw response of the llm calls
      llm_raw_usage: false # Display the raw usage of the llm calls
      llm_call_args: false # Display the args for the llm calls
      llm_response: true # Display the response of the llm calls
      llm_usage: false # Display the usage of the llm calls
      llm_cache: false # Display the cache info
      llm_cost: true # Display the cost of the calls
      tool_calls: true # Display the tool calls
      tool_results: true # Display the results of the tool calls
      streaming_mode: "print" # The mode to display the streaming output, choices are "live", "print", "none"
      rich:
        lexer: "markdown" # The language of the rich output
        theme: "monokai" # The theme of the rich output
        line_numbers: false # Whether to display the line numbers
        word_wrap: true # Whether to wrap the words
        refresh_per_second: 4 # The refresh rate of the rich output

  caching:
    enabled: true # default to enable the caching
    folder: "~/.appl/caches" # The folder to store the cache files
    max_size: 100000  # Maximum number of entries in cache
    time_to_live: 43200 # Time-to-live in minutes (30 days)
    cleanup_interval: 1440 # Cleanup interval in minutes (1 day)
    allow_temp_greater_than_0: false # Whether to cache the generation results with temperature to be greater than 0

  tracing:
    enabled: false # default to not trace the calls
    path_format: './dumps/traces/{basename}_{time:YYYY_MM_DD__HH_mm_ss}'
    # The path to the trace file, ext will be added automatically
    patch_threading: true # whether to patch `threading.Thread`
    strict_match: true # when saving and loading cache, whether need to match the generation id
    display_trace_info: true # whether to display the trace info
    trace_to_resume: null  # the trace to resume

  concurrency:
    llm_max_workers: 10 # The maximum number of workers for the llm executor
    thread_max_workers: 20 # The maximum number of workers for the threading executor
    process_max_workers: 10 # The maximum number of workers for the processing executor

  messages:
    colors:
      system: red
      user: green
      assistant: cyan
      tool: magenta

  misc:
    suppress_litellm_debug_info: true

prompts:
  continue_generation: >-
    The previous message was cut off due to length limit, please continue to
    complete the message by starting with the last line (marked with
    {last_marker}). Make sure the indentation is correct when continuing.
    Begin your continuation with {last_marker}.
  continue_generation_alt: >-
    The previous message was cut off due to length limit, please continue to
    complete the message by starting with the last part (marked with
    {last_marker}). Make sure the newline and indentation are correct when
    continuing. Begin your continuation with {last_marker}.

# When using APIs through litellm,
#   see the list of available models at https://docs.litellm.ai/docs/providers
# You can directly use the model name as server name.
default_servers:
  default: null
  small: null
  large: null

# When using SRT server,
#   set model to "srt" and api_base to the address of the server.
servers:
  # create aliases for the models, you may use the model name as server name directly
  # You can also set the default args for the server, for example:
  # gpt-4o-t07:
  #   model: gpt-4o
  #   temperature: 0.7

Setup your default models

You should specify your own default model in the appl.yaml file. You may also specify default "small" and "large" models, which fall back to the default model if not specified. The name can be either a server name defined in your configuration (the servers section) or a model name supported by litellm.

appl.yaml (example)
settings:
  model:
    default: gpt-4o-mini # small model fallback to this
    large: gpt-4o

Override Configs

You can override these configurations by creating an appl.yaml file in the root directory of your project (or in other directories; see Priority of Configurations for more details). A typical use case is to override the servers configuration to specify the LLM servers you want to use, as shown in the following example appl.yaml file.

appl.yaml (example)
settings:
  logging:
    # log_file:
    #   enabled: true
    display:
      configs: false
      # llm_call_args: false # true
      # llm_response: false # true
      # llm_cache: false # true
      # llm_cost: false # true
  tracing:
    enabled: true

# default_servers:
#   default: azure-gpt35 # override the default server according to your needs

# example for setting up servers
servers:
  azure-gpt35: # the name of the server
    model: azure/gpt35t # the model name
    # temperature: 1.0 # set the default temperature for the calls to this server
  gpt4-preview:
    model: gpt-4-0125-preview
  claude-35-sonnet:
    model: claude-3-5-sonnet-20240620
  claude-3-opus:
    model: claude-3-opus-20240229
  moonshot-8k:
    model: moonshot-v1-8k
    provider: custom # https://docs.litellm.ai/docs/providers/custom
    api_key: os.environ/MOONSHOT_API_KEY # setup your API key in the environment variable
    base_url: "https://api.moonshot.cn/v1"
    # add cost, see also https://docs.litellm.ai/docs/proxy/custom_pricing
    input_cost_per_token: 1.2e-5
    output_cost_per_token: 1.2e-5
    cost_currency: "CNY"
  # litellm now supports DeepSeek, so you can simply set up the DeepSeek servers like this:
  # deepseek-chat:
  #   model: deepseek/deepseek-chat
  # deepseek-coder:
  #   model: deepseek/deepseek-coder
  # Below, the DeepSeek APIs illustrate how to set up a group of APIs not (yet) natively supported by litellm
  deepseek-chat:
    model: deepseek-chat
    provider: custom
    api_key: os.environ/DEEPSEEK_API_KEY
    base_url: "https://api.deepseek.com/v1"
    input_cost_per_token: 1.0e-6
    output_cost_per_token: 2.0e-6
    cost_currency: "CNY"
  deepseek-coder:
    template: deepseek-chat
    model: deepseek-coder
  srt-llama2: # llama2 served using SRT
    model: custom/default # "default" is just a placeholder name; change it to your preference
    base_url: "http://127.0.0.1:30000/v1" # the example address of the SRT server

How are configurations updated?

The configurations are implemented as a nested dictionary in Python using addict.Dict. Updates are applied recursively following addict's update semantics: nested dictionaries are merged key by key rather than replaced wholesale.
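
For illustration, the merge behaves like a recursive dictionary update. A minimal sketch using addict directly (APPL performs this merge for you when it loads configuration files):

from addict import Dict

defaults = Dict({"settings": {"tracing": {"enabled": False}, "logging": {"log_level": "INFO"}}})
override = {"settings": {"tracing": {"enabled": True}}}

defaults.update(override)  # nested keys are merged rather than replaced
print(defaults.settings.tracing.enabled)   # True (overridden)
print(defaults.settings.logging.log_level) # "INFO" (preserved)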

Other file formats like JSON or TOML are also supported.

You can also use other file formats, such as JSON (appl.json) or TOML (appl.toml), to specify the configurations. We recommend using YAML for better readability.
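
For example, the tracing override shown above could equivalently be written in JSON (assuming the same keys):

appl.json (example)
{
  "settings": {
    "tracing": {
      "enabled": true
    }
  }
}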

Setup LLMs

You can configure the LLM servers in the appl.yaml file by overriding the servers configuration as shown in the example above.

LLM APIs

APPL uses litellm to support various LLM APIs using the OpenAI format. Please refer to the list of supported providers.

You need to set up the corresponding API keys for the LLM backends you want to use as environment variables, and specify the corresponding configurations in appl.yaml.

The following example .env file matches the appl.yaml example above and supports APIs from OpenAI, Anthropic, Azure, Moonshot, and DeepSeek:

.env
OPENAI_API_KEY=<your openai api key>
# Anthropic environment variables
ANTHROPIC_API_KEY=<your anthropic api key>
# Azure environment variables
AZURE_API_KEY=<your azure api key>
AZURE_API_BASE=<the base url of the API>
AZURE_API_VERSION=<the version of the API>
# Moonshot environment variables
MOONSHOT_API_KEY=<your moonshot api key>
# DeepSeek environment variables
DEEPSEEK_API_KEY=<your deepseek api key>

Local LLMs

We recommend using SGLang Runtime (SRT) to serve local LLMs; it is fast and supports regex constraints. You can install it by following the official guide.

To serve local LLMs, please follow SGLang's official guide.

Using Llama-2-7b-chat-hf as an example:

python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
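
With the server running, point APPL at it via a server entry in appl.yaml, mirroring the srt-llama2 entry in the example above (the model name is just a placeholder for SRT):

appl.yaml (example)
servers:
  srt-llama2:
    model: custom/default
    base_url: "http://127.0.0.1:30000/v1"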

You may also use vLLM to host a local server; its usage in APPL is similar to SRT.
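
For reference, a vLLM OpenAI-compatible server can typically be launched as follows (check the vLLM documentation for the exact command of your version), after which you can point a server entry's base_url at http://127.0.0.1:8000/v1 just like the SRT example:

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf --port 8000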

Setup Tracing

APPL Tracing

You can enable APPL tracing by setting tracing.enabled to true in appl.yaml.

appl.yaml
settings:
  tracing:
    enabled: true

To resume from a previous trace, you can specify the APPL_RESUME_TRACE environment variable with the path to the trace file. See more details in the tutorial.
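
For example:

.env
APPL_RESUME_TRACE=<path-to-the-trace-file>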

Visualize Traces

Langfuse is an open-source web-based tool for visualizing traces and LLM calls.

You can host Langfuse locally or use the public cloud version. To host it locally:

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up

Then you can set the environment variables for the Langfuse server:

.env
LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
LANGFUSE_HOST=<your-langfuse-host>
# Set to http://localhost:3000 if you are hosting Langfuse locally

You can find your Langfuse public and secret API keys on the project settings page (Project Dashboard -> Configure Tracing).

Please see the tutorial for more details.

You can see conversations like:

Langfuse Conversation

and the timeline like:

Langfuse Timeline

Lunary

Please see the tutorial for more details.

LangSmith

To enable LangSmith tracing, you need to obtain your API key from LangSmith and add the following environment variables to your .env file:

.env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<your api key>
# [Optional] specify the project name
# LANGCHAIN_PROJECT=<your project name>

Priority of Configurations

Let's say you are running a script (python main.py) in the project directory, and you have the following directory structure:

.env
appl.yaml
project/
├── .env
├── appl.yaml
└── main.py

Starting from the directory containing the file to be executed (in this case, main.py), APPL walks up the directory tree to find custom configurations. Configurations in files closer to the executed file have higher priority. For the .env file, the search starts from the current working directory (the project directory in this case).

In this case, the configurations in ./appl.yaml are loaded first and then overridden by those in project/appl.yaml. Likewise, the environment variables in ./.env are loaded first and then overridden by those in project/.env.

Difference between .env and appl.yaml

  • .env is used to store environment variables, including API keys. The contents should not be committed to the repository as they may contain sensitive information.
  • appl.yaml is used to store APPL configurations, such as LLM servers and tracing settings. It should not contain sensitive information and, in general, should be committed to the repository so that others can reproduce your results.