Selectel deploys one-click inference for Qwen, Whisper, DeepSeek on GPU clusters
Russian cloud provider Selectel launched a catalog of pre-configured AI models that deploy inference services in two clicks on dedicated GPU hardware with pay-per-use pricing.
Selectel, a Russian cloud infrastructure provider, released an AI model catalog that deploys inference endpoints for Qwen, Whisper, DeepSeek, and other open-weight models on private GPU clusters. Users pick a model from the catalog and receive a running inference API in under a minute, eliminating the need to provision GPUs, tune batch sizes, or manage container orchestration.
The offering targets teams that run open-weight models in production and need both performance and data residency. Selectel pre-configures each model with optimized serving infrastructure, and compute scales automatically under load. Pricing is usage-based: customers are billed for the GPU time they actually consume rather than for reserved capacity, so cost tracks demand for workloads with variable throughput. The catalog covers text generation, code completion, speech recognition, and content synthesis models, all running on current-generation GPUs in Selectel's data centers.
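To make the billing trade-off concrete, here is a minimal sketch of the arithmetic behind usage-based versus reserved pricing. Selectel's actual rates are not given in the announcement, so every number below (the per-GPU-hour rates, the 720-hour month, the 200 hours of use) is a hypothetical placeholder, and the function names are illustrative only.

```python
# Hypothetical comparison of usage-based vs. reserved GPU billing.
# All rates and hours are made-up placeholders, not Selectel prices.

def usage_cost(gpu_hours_used: float, usage_rate: float) -> float:
    """Cost when billed only for GPU time actually consumed."""
    return gpu_hours_used * usage_rate


def reserved_cost(reserved_gpus: int, hours_in_period: float, reserved_rate: float) -> float:
    """Cost when GPUs are reserved for the whole period, used or not."""
    return reserved_gpus * hours_in_period * reserved_rate


def break_even_utilization(usage_rate: float, reserved_rate: float) -> float:
    """Fraction of the period a GPU must be busy before reserving is cheaper."""
    return reserved_rate / usage_rate


if __name__ == "__main__":
    USAGE_RATE = 4.0       # hypothetical price per GPU-hour, pay-per-use
    RESERVED_RATE = 2.0    # hypothetical price per GPU-hour, reserved
    HOURS_IN_MONTH = 720   # 30 days * 24 hours

    print(usage_cost(200, USAGE_RATE))                       # 200 busy hours in the month
    print(reserved_cost(1, HOURS_IN_MONTH, RESERVED_RATE))   # one GPU held all month
    print(break_even_utilization(USAGE_RATE, RESERVED_RATE))
```

Under these assumed rates, a workload that keeps a GPU busy for less than half of the month pays less on usage-based billing, which is the case the catalog appears to target: bursty or variable inference traffic.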
Keeping model weights and user data within Selectel's infrastructure addresses data sovereignty requirements for organizations handling sensitive workloads. The approach mirrors managed inference offerings from Replicate, Modal, and Together AI in Western markets, but is anchored to Russian infrastructure.




