Metric Descriptions
All metrics below are broken down by model and sampled once per minute. Depending on the time interval you select, a data point may not be displayed for every minute; in that case, the data points within each displayed interval are averaged.
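As a rough illustration of the averaging described above, the sketch below groups per-minute data points into wider buckets and averages each bucket. The function name `downsample`, the `(minute, value)` tuple layout, the bucket alignment, and the use of a simple mean are assumptions made for this example; the dashboard's exact aggregation method is not documented here.

```python
from statistics import mean

def downsample(points, bucket_minutes):
    """Average per-minute (minute_index, value) points into wider buckets.

    Hypothetical helper: each bucket starts at a multiple of bucket_minutes
    and its value is the simple mean of the per-minute points it contains.
    """
    buckets = {}
    for minute, value in points:
        start = minute - minute % bucket_minutes  # first minute of the bucket
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

# Six per-minute RPM samples viewed at a 5-minute resolution.
per_minute_rpm = [(0, 10), (1, 20), (2, 30), (3, 40), (4, 50), (5, 60)]
five_minute_view = downsample(per_minute_rpm, 5)
```

Here minutes 0 through 4 collapse into one point and minute 5 starts the next bucket, so a short spike in a single minute can be smoothed out at wider intervals.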
- Requests Per Minute (RPM): Shows the number of API requests made per minute, helping you understand usage patterns and API concurrency levels.
- Request Success Rate: Shows the percentage of API responses per minute that are successful (non-5xx status codes), reflecting API availability.
- Average Tokens Per Request: Shows the average number of input and output tokens per request in each minute, helping you understand token consumption patterns.
- End-to-End (E2E) Latency: Shows the total time the model takes to generate a complete response, aggregated over the requests in each minute. Includes p99, p95, and average latency metrics.
- Time to First Token (TTFT): Shows the time required to process the prompt and generate the first output token, aggregated over the requests in each minute. Includes p99, p95, and average latency metrics. This metric is only tracked for streaming requests with the stream=true parameter enabled.
- Time Per Output Token (TPOT): Shows the average time between consecutive output tokens, aggregated over the requests in each minute. Includes p99, p95, and average latency metrics. This metric is only tracked for streaming requests with the stream=true parameter enabled.
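To make the definitions above concrete, here is a minimal sketch of how one minute of request records for a single model could be turned into these metrics. The record fields (`status`, `input_tokens`, `output_tokens`, `sent_at`, `finished_at`, `stream`, `token_times`), the function names, and the nearest-rank percentile method are all assumptions chosen for illustration; they are not the platform's actual schema or computation. In particular, whether failed requests count toward the latency metrics is not stated above, so this sketch simply includes every record.

```python
import math
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_stats(values):
    """Return p99, p95, and average, or None when there is nothing to report."""
    if not values:
        return None
    return {"p99": percentile(values, 99),
            "p95": percentile(values, 95),
            "avg": mean(values)}

def minute_metrics(records):
    """Aggregate hypothetical request records from one minute for one model."""
    streaming = [r for r in records if r["stream"]]
    # TTFT: from sending the request to the first output token.
    ttft = [r["token_times"][0] - r["sent_at"] for r in streaming]
    # TPOT: average gap between consecutive output tokens of a request.
    tpot = [(r["token_times"][-1] - r["token_times"][0]) / (len(r["token_times"]) - 1)
            for r in streaming if len(r["token_times"]) > 1]
    return {
        "rpm": len(records),
        # Any non-5xx status counts as a success, including 4xx responses.
        "success_rate": 100 * sum(r["status"] < 500 for r in records) / len(records),
        "avg_input_tokens": mean(r["input_tokens"] for r in records),
        "avg_output_tokens": mean(r["output_tokens"] for r in records),
        "e2e": latency_stats([r["finished_at"] - r["sent_at"] for r in records]),
        "ttft": latency_stats(ttft),  # streaming requests only
        "tpot": latency_stats(tpot),  # streaming requests only
    }

# Four made-up requests in the same minute; times are seconds within the minute.
sample = [
    {"status": 200, "input_tokens": 100, "output_tokens": 4, "sent_at": 0.0,
     "finished_at": 1.25, "stream": True, "token_times": [0.5, 0.75, 1.0, 1.25]},
    {"status": 200, "input_tokens": 300, "output_tokens": 8, "sent_at": 10.0,
     "finished_at": 12.0, "stream": False, "token_times": []},
    {"status": 503, "input_tokens": 200, "output_tokens": 0, "sent_at": 20.0,
     "finished_at": 20.5, "stream": False, "token_times": []},
    {"status": 429, "input_tokens": 100, "output_tokens": 0, "sent_at": 30.0,
     "finished_at": 30.25, "stream": False, "token_times": []},
]
metrics = minute_metrics(sample)
```

In this sample the 429 response still counts as successful because only 5xx status codes are treated as failures, and the TTFT and TPOT figures come from the single streaming request.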