Metrics

Latency, throughput, error rates, and resource usage per model

p50 Latency: 210 ms
p95 Latency: 440 ms
p99 Latency: 620 ms
Peak RPS: 51.7
Avg Error Rate: 1.4%
Total Errors: 68
GPU Avg: 63%
Mem Avg: 67%

Latency Distribution

Request count per latency bucket (all models)

Normal (<400 ms) | Warning (400-500 ms) | Slow (>500 ms)
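
The bucketing behind this legend can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the boundaries are an assumption (Normal ends where Warning begins, at 400 ms, and 500 ms itself counts as Warning), since the dashboard does not state how edge values are handled.

```python
from collections import Counter


def latency_bucket(latency_ms: float) -> str:
    """Classify one request latency into a legend bucket.

    Assumed boundaries (not confirmed by the dashboard):
    Normal < 400 ms, Warning 400-500 ms inclusive, Slow > 500 ms.
    """
    if latency_ms < 400:
        return "Normal"
    if latency_ms <= 500:
        return "Warning"
    return "Slow"


def bucket_counts(latencies_ms):
    """Count requests per bucket, as a latency histogram would."""
    return Counter(latency_bucket(x) for x in latencies_ms)
```

Under these assumptions, the headline p50 (210 ms) falls in Normal, p95 (440 ms) in Warning, and p99 (620 ms) in Slow.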

Latency Percentiles by Model

p50 / p90 / p95 / p99 in milliseconds
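
For readers unfamiliar with percentile metrics, the sketch below shows one common way to derive p50 / p90 / p95 / p99 from raw latency samples, using the nearest-rank definition. How this dashboard actually computes its percentiles is not stated, so the method and the `percentile` helper are assumptions for illustration only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to limit floating-point rounding in the rank.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the four percentiles shown per model, in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 95, 99)}
```

Other definitions (for example, linear interpolation between neighboring samples) give slightly different values on small sample sets, so results should only be compared when the same method is used.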

Throughput Trend

Requests per second (RPS) per model over time

Error Rate & p95 Latency

Error % (left axis) and p95 latency ms (right axis) over time

Error Breakdown

By error type — last period

Error type          Count   Share
5xx Server Error    38      55.9%
Timeout (>30s)      16      23.5%
4xx Client Error    9       13.2%
OOM / Resource      5       7.4%
Total errors        68
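
Each share in the breakdown is that error type's count divided by the total of 68 errors. A minimal Python check, using the counts as read from the breakdown (38, 16, 9, and 5); the variable names are illustrative only:

```python
# Error counts by type, as read from the breakdown above.
error_counts = {
    "5xx Server Error": 38,
    "Timeout (>30s)": 16,
    "4xx Client Error": 9,
    "OOM / Resource": 5,
}

total_errors = sum(error_counts.values())

# Share of each error type as a percentage of all errors, to one decimal place.
error_shares = {
    name: round(100 * count / total_errors, 1)
    for name, count in error_counts.items()
}

for name, share in error_shares.items():
    print(f"{name}: {error_counts[name]} ({share}%)")
```

The counts sum to 68 and the computed shares match the percentages displayed in the breakdown.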

Resource Usage per Model

GPU utilization, memory, CPU, and request volume

Model           GPU Util.   Memory   CPU   Requests
Qwen3.6-35B     78%         82%      34%   51,700
Llama-3.1-70B   91%         88%      41%   26,100
Mistral-7B      45%         52%      22%   17,400
Gemma-2-9B      38%         44%      18%   11,200