
📈 [BETA] Prometheus metrics

✨ Prometheus metrics is available on LiteLLM Enterprise, starting at $250/mo

Enterprise Pricing

Contact us to get a free trial

LiteLLM exposes a /metrics endpoint for Prometheus to poll.

Quick Start

If you're using the LiteLLM CLI (litellm --config proxy_config.yaml), install the Prometheus client first: pip install prometheus_client==0.20.0. It is pre-installed on the LiteLLM Docker image.

Add this to your proxy config.yaml:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]

Start the proxy

litellm --config config.yaml --debug

Test Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
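If you would rather send the test request from Python, the sketch below builds the same POST as the curl command using only the standard library (urllib). It assumes the proxy is listening on http://0.0.0.0:4000 as in the curl example and, like that example, sends no API key; add an Authorization header if your proxy requires one. build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same /chat/completions request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the proxy running, send it with urllib.request.urlopen(build_chat_request("http://0.0.0.0:4000", "gpt-3.5-turbo", "what llm are you")) and read the JSON body from the response.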

View the metrics by visiting the proxy's /metrics endpoint:

http://localhost:4000/metrics

# <proxy_base_url>/metrics
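The /metrics endpoint returns plain text in the Prometheus exposition format: one sample per line, such as name{label="value"} 1.0, plus # HELP and # TYPE comment lines. For a quick check from Python without running a Prometheus server, a minimal standard-library sketch like the one below can fetch and parse that output. It only handles simple lines (no escaped quotes, commas, or braces inside label values), and parse_metrics / fetch_metrics are illustrative helper names, not part of LiteLLM.

```python
import re
import urllib.request

# name{labels} value [timestamp]; label values must not contain escaped quotes.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() accepts NaN and +Inf
    return samples


def fetch_metrics(base_url: str = "http://localhost:4000") -> list[tuple[str, dict[str, str], float]]:
    """Fetch <proxy_base_url>/metrics and parse it (requires a running proxy)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/metrics") as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, a line like litellm_requests_metric_total{model="gpt-3.5-turbo",team="a"} 3.0 parses to ("litellm_requests_metric_total", {"model": "gpt-3.5-turbo", "team": "a"}, 3.0). That sample line is illustrative; the exact series and label names depend on your LiteLLM version.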

📈 Metrics Tracked

Virtual Keys, Teams, Internal Users Metrics

Use these metrics to track usage and spend per user, key, team, etc.

Metric Name | Description
litellm_requests_metric | Number of requests made, per "user", "key", "model", "team", "end-user"
litellm_spend_metric | Total spend, per "user", "key", "model", "team", "end-user"
litellm_total_tokens | Total input + output tokens, per "user", "key", "model", "team", "end-user"
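These metrics carry per-user, per-key, per-team, and per-model labels, which is what makes per-team or per-key breakdowns possible. As a sketch, the helper below sums one metric grouped by one label, starting from (name, labels, value) tuples such as a simple /metrics parser would produce. It assumes counters appear with the _total suffix that prometheus_client appends (both spellings are accepted), and the label name "team" in the example is an assumption; check your /metrics output for the exact label names. total_by_label is a hypothetical helper.

```python
from collections import defaultdict


def total_by_label(samples, metric: str, label: str) -> dict[str, float]:
    """Sum a metric's samples grouped by one label (e.g. spend per team).

    samples: iterable of (name, labels, value) tuples.
    Counters exported by prometheus_client get a "_total" suffix, so both
    "metric" and "metric_total" are matched; "_created" series are ignored.
    """
    totals: dict[str, float] = defaultdict(float)
    for name, labels, value in samples:
        if name in (metric, f"{metric}_total"):
            totals[labels.get(label, "")] += value
    return dict(totals)
```

In Prometheus itself the equivalent would be a query like sum by (team) (litellm_spend_metric_total), again assuming those series and label names.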

LLM API / Provider Metrics

Use these metrics to monitor LLM API errors and to track the remaining rate limits and token limits of each deployment.

Labels Tracked for LLM API Metrics

litellm_model_name: The name of the LLM model used by LiteLLM
requested_model: The model sent in the request
model_id: The model_id of the deployment. It is autogenerated by LiteLLM, and each deployment has a unique model_id
api_base: The API base of the deployment
api_provider: The LLM API provider, e.g. azure, openai, vertex_ai
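These labels are what you filter on when querying the LLM API metrics. As an illustration, the sketch below builds a PromQL label selector such as litellm_deployment_failure_responses{api_provider="azure"}, escaping backslashes, double quotes, and newlines in label values as PromQL string literals require. promql_selector is a hypothetical helper, not a LiteLLM API, and the series name you query may carry a suffix (e.g. _total) depending on how it is exported.

```python
def promql_selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector, e.g. metric{api_provider="azure"}.

    Label matchers are sorted by name so the output is deterministic.
    """
    def escape(value: str) -> str:
        # Escape backslashes first so later escapes are not doubled.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return metric
    matchers = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{metric}{{{matchers}}}"
```

For example, promql_selector("litellm_deployment_failure_responses", api_provider="azure", requested_model="gpt-3.5-turbo") returns litellm_deployment_failure_responses{api_provider="azure",requested_model="gpt-3.5-turbo"}.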

Metric Name | Description
litellm_deployment_success_responses | Total number of successful LLM API calls for a deployment
litellm_deployment_failure_responses | Total number of failed LLM API calls for a specific LLM deployment. exception_status is the status of the exception from the LLM API
litellm_deployment_total_requests | Total number of LLM API calls for a deployment (success + failure)
litellm_remaining_requests_metric | Tracks x-ratelimit-remaining-requests returned from the LLM API deployment
litellm_remaining_tokens | Tracks x-ratelimit-remaining-tokens returned from the LLM API deployment
litellm_deployment_state | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage
litellm_deployment_latency_per_output_token | Latency per output token for a deployment
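Assuming the success, failure, and total metrics are exported as counters (they are described as running totals), a failure ratio is most useful over a time window: take two readings and divide the increase in failures by the increase in total requests. The sketch below does this for one deployment (one model_id) and also maps litellm_deployment_state values to the meanings listed in the table. It assumes no counter reset (for example a proxy restart) between the two readings and that failures have already been summed across exception_status values. DeploymentSnapshot and the function names are hypothetical.

```python
from dataclasses import dataclass

# litellm_deployment_state value -> meaning, as listed in the metrics table.
DEPLOYMENT_STATES = {0: "healthy", 1: "partial outage", 2: "complete outage"}


@dataclass
class DeploymentSnapshot:
    """Counter readings for one deployment (one model_id) at one point in time."""
    total_requests: float  # litellm_deployment_total_requests
    failures: float        # litellm_deployment_failure_responses, summed over exception_status


def window_failure_ratio(before: DeploymentSnapshot, after: DeploymentSnapshot) -> float:
    """Share of calls that failed between two readings; 0.0 if no calls were made.

    Assumes the counters were not reset between the two readings.
    """
    calls = after.total_requests - before.total_requests
    failed = after.failures - before.failures
    return failed / calls if calls > 0 else 0.0


def describe_state(value: float) -> str:
    """Translate a litellm_deployment_state sample into a readable label."""
    return DEPLOYMENT_STATES.get(int(value), "unknown")
```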

Load Balancing, Fallback, Cooldown Metrics

Use these metrics to track LiteLLM router load balancing, fallbacks, and cooldowns.

Metric Name | Description
litellm_deployment_cooled_down | Number of times a deployment has been cooled down by LiteLLM load balancing logic. exception_status is the status of the exception that caused the deployment to be cooled down
litellm_deployment_successful_fallbacks | Number of successful fallback requests from primary model -> fallback model
litellm_deployment_failed_fallbacks | Number of failed fallback requests from primary model -> fallback model
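The two fallback counters combine naturally into a fallback success rate: successful / (successful + failed). The sketch below computes it from two counter values (or from their increases over a window) and returns None when no fallbacks happened, so "no fallbacks" is not confused with "every fallback failed". fallback_success_rate is a hypothetical helper.

```python
from typing import Optional


def fallback_success_rate(successful: float, failed: float) -> Optional[float]:
    """Share of fallback attempts (primary model -> fallback model) that succeeded.

    successful: value (or increase) of litellm_deployment_successful_fallbacks
    failed:     value (or increase) of litellm_deployment_failed_fallbacks
    Returns None when there were no fallback attempts at all.
    """
    attempts = successful + failed
    if attempts == 0:
        return None
    return successful / attempts
```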

Request Latency Metrics

Metric Name | Description
litellm_request_total_latency_metric | Total latency (seconds) for a request to the LiteLLM Proxy Server, tracked for labels litellm_call_id, model
litellm_llm_api_latency_metric | Latency (seconds) for just the LLM API call, tracked for labels litellm_call_id, model
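Since the total latency covers the whole request through the proxy and the LLM API latency covers only the upstream call, the difference between their averages roughly approximates time spent in the proxy itself. The sketch below computes that, assuming both metrics are exported as Prometheus histograms, which expose _sum and _count series; the function names are hypothetical.

```python
def mean_latency(sum_seconds: float, count: float) -> float:
    """Average latency in seconds from a histogram's _sum and _count series."""
    return sum_seconds / count if count else 0.0


def mean_proxy_overhead(total_sum: float, total_count: float,
                        api_sum: float, api_count: float) -> float:
    """Approximate average time (seconds) a request spends outside the LLM API call.

    total_*: litellm_request_total_latency_metric _sum / _count
    api_*:   litellm_llm_api_latency_metric _sum / _count
    """
    return mean_latency(total_sum, total_count) - mean_latency(api_sum, api_count)
```

In PromQL, the average over a window would typically be written as rate(litellm_request_total_latency_metric_sum[5m]) / rate(litellm_request_total_latency_metric_count[5m]), under the same histogram assumption.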

Budget Metrics

Metric Name | Description
litellm_remaining_team_budget_metric | Remaining budget for a team (a team created on LiteLLM)
litellm_remaining_api_key_budget_metric | Remaining budget for an API key (a key created on LiteLLM)
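A common use of the budget metrics is alerting before a team or key runs out. The sketch below picks out every series whose remaining budget is under a threshold, from (name, labels, value) tuples such as a simple /metrics parser would produce. It assumes these are exported as gauges (so no _total suffix); the label name you group by (for example a team or key identifier) is an assumption, so check your /metrics output. budgets_below is a hypothetical helper.

```python
def budgets_below(samples, metric: str, label: str, threshold: float) -> dict[str, float]:
    """Return {label_value: remaining_budget} for series of `metric` below `threshold`.

    samples: iterable of (name, labels, value) tuples.
    """
    low: dict[str, float] = {}
    for name, labels, value in samples:
        if name == metric and value < threshold:
            low[labels.get(label, "")] = value
    return low
```

In Prometheus, the same filter can be expressed directly as an alert expression such as litellm_remaining_team_budget_metric < 10.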

Monitor System Health

To monitor the health of LiteLLM-adjacent services (Redis / Postgres), add:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  service_callback: ["prometheus_system"]

Metric Name | Description
litellm_redis_latency | Histogram of latency for Redis calls
litellm_redis_fails | Number of failed Redis calls
litellm_self_latency | Histogram of latency for successful LiteLLM API calls
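litellm_redis_latency and litellm_self_latency are histograms, so percentiles are usually read in Prometheus with a query such as histogram_quantile(0.95, sum by (le) (rate(litellm_redis_latency_bucket[5m]))). To show what that query computes, the sketch below re-implements the same linear interpolation over cumulative buckets in Python. It assumes the standard _bucket series with cumulative counts and an le="+Inf" bucket, as prometheus_client exports histograms; the function is illustrative and, unlike PromQL, raises an error for q outside [0, 1] instead of returning plus or minus infinity.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Uses linear interpolation inside the bucket containing the target rank,
    like PromQL's histogram_quantile, for non-negative bucket bounds.
    `buckets` must include the +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with an +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the open-ended bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound
```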

🔥 Community Maintained Grafana Dashboards

Link to Grafana dashboards made by the LiteLLM community:

https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard