Insights

Platform overview — usage, tokens, active models, and system health

Total Requests

4.83M

across all deployments

+12.4% vs. prev period

Tokens Consumed

19.2B

input + output combined

+8.7% vs. prev period

Active Models

3

1 deploying · 1 failed

no change

Avg p95 Latency

312 ms

SLA threshold: 500 ms

+18 ms vs. 1h ago

GPU Utilization

78%

eu-west-1 at 96%

+9% vs. 1h ago

Error Rate

1.4%

68K errors / 4.83M req

-0.3% vs. prev period

Token Consumption

Input vs. output tokens (billions)

Input · Output

Model Usage Share

% of total requests by model

Request Volume (today)

Requests per 2-hour window

Peak: 610 req at 14:00

Running Models

Live status of all model deployments

3 active · 1 deploying · 1 failed
Model                      Status     GPU       p95 Latency  Req/s  Uptime
Qwen/Qwen3.6-35B-A3B       active     A100 × 4  312 ms       18.4   99.8%
meta-llama/Llama-3.1-70B   active     H100 × 2  428 ms       9.2    99.5%
mistralai/Mistral-7B-v0.3  active     A10G × 1  98 ms        6.1    100%
google/gemma-2-9b-it       deploying  A10G × 2
Phi-3.8B-mini              failed     A10G × 1

Recent Alerts

System events & warnings

eu-west-1 GPU utilization at 96% — near capacity limit

4 min ago

Phi-3.8B-mini deployment failed — OOM on node gpu-07

22 min ago

Gemma-2-9B-it deployment started — ETA ~8 min

31 min ago

p95 latency spike on Llama-3.1-70B (+140 ms vs baseline)

1 h ago

Available Quota

$4,280

of $5,000 monthly allocation

Shared Infra Charges

$412

developer background clusters

Dedicated Infra Charges

$308

isolated enterprise nodes