Inbox
4 unread alerts
Deployment failed — Phi-3.8B-mini
Node gpu-07 ran out of memory during model load. OOM error at 94% allocation. Retry or select a larger GPU instance.
GPU quota at 91% — eu-west-1
Your dedicated GPU allocation in eu-west-1 is at 91% capacity. Consider requesting a quota increase to avoid throttling.
p95 latency spike — Llama-3.1-70B
p95 latency increased by +140 ms above the 7-day baseline. Current p95: 568 ms. Investigate GPU saturation or request queue depth.
Quota increase request — pending approval
Your request for +8 A100 GPUs in us-east-1 is pending review by the infrastructure team. Expected SLA: 24 h.
Deployment started — Gemma-2-9B-it
Model deployment initiated on A10G × 2 in us-east-1. Estimated time to ready: ~8 minutes.
Error rate exceeded threshold — Mistral-7B
Error rate reached 4.2% over the last 15 minutes, exceeding the 2% alert threshold. Top error: HTTP 503 (upstream timeout).
Token quota at 78% — shared environment
Monthly token consumption in the shared dev cluster has reached 78% of the allocated quota. Resets on June 1.
Deployment successful — Qwen3.6-35B-A3B
Model is live and serving traffic on A100 × 4 in us-east-1. Current RPS: 18.4. Health checks passing.
Quota increase approved — ap-southeast-1
+4 A10G GPUs approved for ap-southeast-1. Quota is now active. You can deploy additional model instances immediately.
Throughput anomaly detected — Qwen3.6-35B
Throughput dropped 22% between 09:00–09:30 UTC. Likely correlated with upstream load balancer health check interval. Auto-resolved.
Deployment failed — Falcon-40B
Scheduling failed: no eligible nodes with H100 GPUs available in eu-west-1. All H100 slots are currently occupied.
Quota request requires additional info
Your H100 quota increase request for eu-west-1 requires a business justification. Please update the request with a use-case description.