Metrics
Latency, throughput, error rates, and resource usage per model
p50 Latency: 210 ms
p95 Latency: 440 ms
p99 Latency: 620 ms
Peak RPS: 51.7
Avg Error Rate: 1.4%
Total Errors: 68
GPU Avg: 63%
Mem Avg: 67%
Latency Distribution
Request count per latency bucket (all models)
Normal (<400 ms), Warning (400-500 ms), Slow (>500 ms)
Latency Percentiles by Model
p50 / p90 / p95 / p99 in milliseconds
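
The dashboard does not say how its percentiles are computed. As a rough illustration only, the sketch below uses the nearest-rank definition, which is one common choice. The function name `percentile_nearest_rank` and the sample latencies are hypothetical and are not taken from the dashboard.

```python
import math

def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds (illustrative, not dashboard data).
latencies_ms = [180, 195, 200, 210, 215, 230, 260, 310, 440, 620]

summary = {f"p{p}": percentile_nearest_rank(latencies_ms, p) for p in (50, 90, 95, 99)}
```

With only a handful of samples the upper percentiles all land on the slowest request, which is why tail percentiles need a large window to be meaningful. Monitoring systems that work from histogram buckets typically interpolate instead, so their values can differ from this definition.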
Throughput Trend
Requests per second (RPS) per model over time
Error Rate & p95 Latency
Error % (left axis) and p95 latency ms (right axis) over time
Error Breakdown
By error type — last period
| Error Type | Count | Share |
|---|---|---|
| 5xx Server Error | 38 | 55.9% |
| Timeout (>30s) | 16 | 23.5% |
| 4xx Client Error | 9 | 13.2% |
| OOM / Resource | 5 | 7.4% |

Total errors: 68
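
The percentage shares in the breakdown follow directly from the per-type counts and the total of 68. A minimal sketch of that arithmetic, where the helper name `error_shares` is hypothetical:

```python
# Error counts by type, copied from the breakdown above.
error_counts = {
    "5xx Server Error": 38,
    "Timeout (>30s)": 16,
    "4xx Client Error": 9,
    "OOM / Resource": 5,
}

def error_shares(counts):
    """Return each error type's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

total_errors = sum(error_counts.values())
shares = error_shares(error_counts)

for name, share in shares.items():
    print(f"{name}: {error_counts[name]} ({share}%)")
```

Because each share is rounded independently, the displayed percentages are not guaranteed to sum to exactly 100% in general.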
Resource Usage per Model
GPU utilization, memory, CPU, and request volume
| Model | GPU Util. | Memory | CPU | Requests |
|---|---|---|---|---|
| Qwen3.6-35B | 78% | 82% | 34% | 51,700 |
| Llama-3.1-70B | 91% | 88% | 41% | 26,100 |
| Mistral-7B | 45% | 52% | 22% | 17,400 |
| Gemma-2-9B | 38% | 44% | 18% | 11,200 |
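
The summary figures GPU Avg 63% and Mem Avg 67% appear to be plain, unweighted means of the per-model rows. That is an inference from the numbers, not something the dashboard states. The sketch below checks it and, for contrast, computes a request-weighted mean. The helper names are hypothetical.

```python
# Per-model figures copied from the table above (utilization in percent).
usage = {
    "Qwen3.6-35B":   {"gpu": 78, "mem": 82, "cpu": 34, "requests": 51_700},
    "Llama-3.1-70B": {"gpu": 91, "mem": 88, "cpu": 41, "requests": 26_100},
    "Mistral-7B":    {"gpu": 45, "mem": 52, "cpu": 22, "requests": 17_400},
    "Gemma-2-9B":    {"gpu": 38, "mem": 44, "cpu": 18, "requests": 11_200},
}

def unweighted_mean(metric):
    """Average one metric across models, giving every model equal weight."""
    values = [row[metric] for row in usage.values()]
    return sum(values) / len(values)

def request_weighted_mean(metric):
    """Average one metric across models, weighting each model by its request volume."""
    total = sum(row["requests"] for row in usage.values())
    return sum(row[metric] * row["requests"] for row in usage.values()) / total

total_requests = sum(row["requests"] for row in usage.values())
gpu_avg = unweighted_mean("gpu")
mem_avg = unweighted_mean("mem")
gpu_weighted = request_weighted_mean("gpu")
```

The unweighted means come out to 63 for GPU and 66.5 for memory, consistent with the summary cards if 66.5 is rounded up to 67%. The request-weighted GPU mean is about 71.6%, higher because the two busiest models also run the hottest.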