Insights
Platform overview — usage, tokens, active models, and system health
Total Requests
4.83M
across all deployments
Tokens Consumed
19.2B
input + output combined
Active Models
3
2 deploying · 1 failed
Avg p95 Latency
312 ms
SLA threshold: 500 ms
GPU Utilization
78%
eu-west-1 at 96%
Error Rate
1.4%
68K errors / 4.83M req
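As a quick sanity check on this card, the error count implied by the displayed rate and request total can be recomputed. This is a minimal sketch that assumes the rate is simply errors divided by total requests; the dashboard does not state its exact formula, and the function name is hypothetical.

```python
def implied_error_count(total_requests: float, error_rate_pct: float) -> int:
    """Return the error count implied by a percentage error rate.

    Assumption: error_rate_pct = errors / total_requests * 100.
    """
    return round(total_requests * error_rate_pct / 100)


# Figures taken from the Total Requests and Error Rate cards.
errors = implied_error_count(4.83e6, 1.4)
print(errors)
```

Because both inputs are rounded for display, the result is only an approximation of the underlying count.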
Token Consumption
Input vs. output tokens (billions)
Model Usage Share
% of total requests by model
Request Volume (today)
Requests per 2-hour window
Running Models
Live status of all model deployments
| Model | Status | GPU | p95 Latency | Req/s | Uptime |
|---|---|---|---|---|---|
| Qwen/Qwen3.6-35B-A3B | active | A100 × 4 | 312 ms | 18.4 | 99.8% |
| meta-llama/Llama-3.1-70B | active | H100 × 2 | 428 ms | 9.2 | 99.5% |
| mistralai/Mistral-7B-v0.3 | active | A10G × 1 | 98 ms | 6.1 | 100% |
| google/gemma-2-9b-it | deploying | A10G × 2 | — | — | — |
| Phi-3.8B-mini | failed | A10G × 1 | — | — | — |
Recent Alerts
System events & warnings
eu-west-1 GPU utilization at 96% — near capacity limit
4 min ago
Phi-3.8B-mini deployment failed — OOM on node gpu-07
22 min ago
Gemma-2-9B-it deployment started — ETA ~8 min
31 min ago
p95 latency spike on Llama-3.1-70B (+140 ms vs baseline)
1 h ago
Available Quota
$4,280
of $5,000 monthly allocation
Shared Infra Charges
$412
developer background clusters
Dedicated Infra Charges
$308
isolated enterprise nodes
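The three billing figures are consistent with available quota being the monthly allocation minus the shared and dedicated charges. The sketch below reproduces that arithmetic under that assumption; the dashboard does not say how the figure is derived, and `remaining_quota` is a hypothetical helper.

```python
def remaining_quota(allocation: int, charges: list[int]) -> int:
    """Return the quota left after subtracting all charges (whole dollars).

    Assumption: available quota = allocation - sum of infra charges.
    """
    return allocation - sum(charges)


# $5,000 allocation, $412 shared and $308 dedicated infra charges.
remaining = remaining_quota(5000, [412, 308])
print(remaining)
```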