ZenCreator

Pro-grade AI content creation. Image, video, face-swap, lipsync, and upscaling behind one API.

14 tools

Up to 4K

4.4(288)

Visit

Loading…

ReleasesPlatformNSFW

HuggingFace Jobs launches one-command vLLM deployment on H100 and A100

HuggingFace's Jobs platform added one-command vLLM deployment this week, letting practitioners launch inference servers on H100 or A100 hardware without writing YAML or managing containers.

ByAlex Sokoloff·July 5, 2026

HuggingFace Jobs launches one-command vLLM deployment on H100 and A100

HuggingFace Jobs now deploys vLLM inference servers with a single terminal command. The new CLI flag --vllm spins up a production-ready endpoint on rented H100 or A100 hardware, skipping the usual container-config and YAML-writing steps that slow down model deployment.

The feature targets practitioners who want to serve open-weight models—Llama, Qwen, Mistral, Gemma—at scale without managing Kubernetes or cloud-provider dashboards. A typical invocation looks like hf jobs create --vllm meta-llama/Llama-3.1-70B-Instruct --gpu h100:1, which provisions a single H100 instance, pulls the model weights from the Hub, and returns an OpenAI-compatible API endpoint. The server auto-scales request batching and supports streaming, function calling, and multi-GPU tensor parallelism when more than one accelerator is specified.

What stands out

Zero-config deployment. No Dockerfile, no docker-compose.yml, no Helm chart. The CLI reads model metadata from the HuggingFace card, picks vLLM's optimal quantization and attention backend, and starts serving.
OpenAI-drop-in compatibility. The endpoint exposes /v1/completions and /v1/chat/completions routes that match OpenAI's schema, so existing client code—LangChain, LlamaIndex, custom Python scripts—works without modification.
Per-minute billing. Jobs charges only for active inference time, not idle uptime. An H100 instance costs roughly $3.60/hour when serving requests; the meter stops when traffic drops to zero.

ZenCreator

HuggingFace Jobs launches one-command vLLM deployment on H100 and A100

What stands out

More in Releases

Hugging Face embeds 200+ benchmark scores directly on model cards

NVIDIA NeMo AutoModel cuts fine-tuning setup time for Llama, Mistral, Gemma

Ford admits AI design tools fell short, rehires veteran engineers

Fable 5 returns to Claude after three-week suspension

OpenAI maps 450 EU occupations for AI automation risk and growth potential