Deploy a Model

Choose a preset model or enter any HuggingFace model path to deploy an inference server.

Choose Model

Model Family	Version	Specialized Task	GPU (type:count)	VRAM	Tags
Qwen3-Coder	30B MoE	Code Generation	H200:1	46 GB	Tool calling, fp8
Qwen3-Coder	30B MoE FP8	Code Generation	H200:1	48 GB	Tool calling, fp8
Qwen3	4B Instruct 2507	Text Generation	RTX5090:1	24 GB	Tool calling, fp8
Qwen3	0.6B	Text Generation	RTX5090:1	16 GB	Tool calling
Qwen3	1.7B	Text Generation	RTX5090:1	16 GB	Tool calling
Qwen3	4B	Text Generation	RTX5090:1	16 GB	Tool calling
Qwen3	8B	Text Generation	RTX5090:1	24 GB	Tool calling
Qwen3	3.5B	Text Generation	RTX5090:1	24 GB	Tool calling

23 models · page 1 of 3

Custom HuggingFace model path (optional)
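
A HuggingFace model path is normally written as `namespace/model-name`, for example `Qwen/Qwen3-8B`. The sketch below shows one way to sanity-check such a path before submitting it. The function name `looks_like_model_path` and the exact character rules are assumptions made for illustration; the form may accept or reject other inputs.

```python
import re

# Assumed shape: one or two slash-separated segments made of letters,
# digits, '.', '_' or '-'. This is an illustrative rule, not the form's own.
_REPO_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*(/[A-Za-z0-9][A-Za-z0-9._-]*)?")


def looks_like_model_path(path: str) -> bool:
    """Return True if `path` has the shape of a repo id such as 'Qwen/Qwen3-8B'."""
    return _REPO_ID.fullmatch(path.strip()) is not None
```

A check like this only catches obvious typos such as extra slashes or empty input. It does not confirm that the model exists or that it fits on the selected GPU.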

Serving Framework

GPU Type

Number of GPUs

Context Length (tokens)

Strict GPU (require exact GPU match, no fallback)

Retry until up (keep retrying GPU provisioning until available)
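
The form fields above can be pictured as a single request object, and the two checkboxes as rules for how a GPU gets picked. The following is a minimal sketch under stated assumptions: `DeployRequest`, `provision`, the `fallbacks` list, the `is_available` callback, and the `max_attempts` cap are all hypothetical names introduced here, and the platform's actual provisioning logic is not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class DeployRequest:
    """Hypothetical mirror of the form fields."""
    model: str                    # preset name or custom HuggingFace model path
    framework: str                # serving framework
    gpu_type: str                 # e.g. "H200" or "RTX5090"
    context_length: int           # tokens
    num_gpus: int = 1
    strict_gpu: bool = False      # require exact GPU match, no fallback
    retry_until_up: bool = False  # keep retrying provisioning until available


def provision(
    req: DeployRequest,
    is_available: Callable[[str], bool],
    fallbacks: Sequence[str] = (),
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the GPU type that was obtained, or None if nothing was available.

    Assumption: the real option retries indefinitely; this sketch caps the
    number of rounds with `max_attempts` so that it always terminates.
    """
    # With strict_gpu set, only the requested type is ever considered.
    candidates = [req.gpu_type] if req.strict_gpu else [req.gpu_type, *fallbacks]
    attempts = 0
    while True:
        attempts += 1
        for gpu in candidates:
            if is_available(gpu):
                return gpu
        # Without retry_until_up, a single failed round ends the request.
        if not req.retry_until_up or attempts >= max_attempts:
            return None
```

In this sketch, enabling Strict GPU removes every fallback from consideration, so combining it with Retry until up means waiting for the exact GPU type rather than accepting a substitute.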