Insights

Platform overview — usage, tokens, active models, and system health

Total Requests

4.83M

across all deployments

+12.4% vs. prev period

Tokens Consumed

19.2B

input + output combined

+8.7% vs. prev period

Active Models

3 active · 1 deploying · 1 failed

no change

Avg p95 Latency

312 ms

SLA threshold: 500 ms

+18 ms vs. 1h ago

GPU Utilization

78%

eu-west-1 at 96%

+9% vs. 1h ago

Error Rate

1.4%

68K errors / 4.83M req

-0.3% vs. prev period

Token Consumption

Input vs. output tokens (billions)

Legend: Input, Output

Model Usage Share

% of total requests by model

Request Volume (today)

Requests per 2-hour window

Peak: 610 req at 14:00

Running Models

Live status of all model deployments

3 active · 1 deploying · 1 failed

Model	Status	GPU	p95 Latency	Req/s	Uptime
Qwen/Qwen3.6-35B-A3B	active	A100 × 4	312 ms	18.4	99.8%
meta-llama/Llama-3.1-70B	active	H100 × 2	428 ms	9.2	99.5%
mistralai/Mistral-7B-v0.3	active	A10G × 1	98 ms	6.1	100%
google/gemma-2-9b-it	deploying	A10G × 2	—	—	—
Phi-3.8B-mini	failed	A10G × 1	—	—	—
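
The status counts and the latency headroom against the SLA can be derived from the table rows. The following is a minimal sketch, assuming the rows are available as tab-separated text and that the 500 ms threshold from the Avg p95 Latency card applies to each model individually. The names `parse_rows` and `summarize` are hypothetical and not part of any platform API.

```python
from collections import Counter

SLA_P95_MS = 500  # assumption: threshold taken from the Avg p95 Latency card

# Rows copied from the Running Models table (tab-separated, header omitted).
TABLE = """\
Qwen/Qwen3.6-35B-A3B\tactive\tA100 × 4\t312 ms\t18.4\t99.8%
meta-llama/Llama-3.1-70B\tactive\tH100 × 2\t428 ms\t9.2\t99.5%
mistralai/Mistral-7B-v0.3\tactive\tA10G × 1\t98 ms\t6.1\t100%
google/gemma-2-9b-it\tdeploying\tA10G × 2\t—\t—\t—
Phi-3.8B-mini\tfailed\tA10G × 1\t—\t—\t—
"""


def parse_rows(text):
    """Turn tab-separated table rows into dicts; '—' cells become None."""
    rows = []
    for line in text.strip().splitlines():
        model, status, gpu, p95, req_s, _uptime = line.split("\t")
        rows.append({
            "model": model,
            "status": status,
            "gpu": gpu,
            "p95_ms": None if p95 == "—" else int(p95.split()[0]),
            "req_s": None if req_s == "—" else float(req_s),
        })
    return rows


def summarize(rows):
    """Return status counts, per-model SLA headroom (ms), and total req/s."""
    counts = Counter(r["status"] for r in rows)
    headroom = {
        r["model"]: SLA_P95_MS - r["p95_ms"]
        for r in rows
        if r["p95_ms"] is not None
    }
    total_req_s = sum(r["req_s"] for r in rows if r["req_s"] is not None)
    return counts, headroom, total_req_s


rows = parse_rows(TABLE)
counts, headroom, total_req_s = summarize(rows)

print(dict(counts))
for model, ms in headroom.items():
    print(f"{model}: {ms} ms below SLA")
print(f"combined throughput: {total_req_s:.1f} req/s")
```

With these rows, Llama-3.1-70B has the least headroom (72 ms), which is consistent with the latency-spike alert listed under Recent Alerts.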

Recent Alerts

System events & warnings

eu-west-1 GPU utilization at 96% — near capacity limit

4 min ago

Phi-3.8B-mini deployment failed — OOM on node gpu-07

22 min ago

Gemma-2-9B-it deployment started — ETA ~8 min

31 min ago

p95 latency spike on Llama-3.1-70B (+140 ms vs baseline)

1 h ago

Available Quota

$4,280

of $5,000 monthly allocation

Shared Infra Charges

$412

developer background clusters

Dedicated Infra Charges

$308

isolated enterprise nodes
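
The three billing figures appear to be related by subtraction: the monthly allocation minus the shared and dedicated charges gives the available quota. The dashboard does not state this formula, so the sketch below treats it as an assumption; the function name `available_quota` is illustrative only.

```python
MONTHLY_ALLOCATION = 5000  # USD, from the Available Quota card
SHARED_CHARGES = 412       # USD, Shared Infra Charges
DEDICATED_CHARGES = 308    # USD, Dedicated Infra Charges


def available_quota(allocation, *charges):
    """Assumed relationship: quota left = allocation minus all charges."""
    return allocation - sum(charges)


remaining = available_quota(MONTHLY_ALLOCATION, SHARED_CHARGES, DEDICATED_CHARGES)
used_pct = 100 * (MONTHLY_ALLOCATION - remaining) / MONTHLY_ALLOCATION

print(f"remaining: ${remaining:,}")
print(f"used: {used_pct:.1f}% of allocation")
```

Under this assumption the result matches the $4,280 shown on the Available Quota card, with 14.4% of the allocation spent.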