Meituan's LongCat 2.0: 1.6T parameters trained on 50,000 Chinese chips | UncensoredHub

Meituan released LongCat 2.0, a 1.6-trillion-parameter model trained on 50,000 domestic Chinese chips, marking the first large-scale LLM pretraining run on hardware other than Nvidia GPUs or Google TPUs.

By Alex Sokoloff · June 30, 2026

Meituan released LongCat 2.0, a 1.6-trillion-parameter mixture-of-experts model trained entirely on 50,000 unnamed Chinese chips believed to be Huawei Ascend 910C accelerators. The company trained the model on 35 trillion tokens, including several hundred billion tokens at context lengths near one million tokens. Until now, pretraining at this scale had only been demonstrated on Nvidia GPUs and Google TPUs.

LongCat 2.0 activates 48 billion parameters per forward pass. Meituan priced API access at $0.75 per million input tokens and $3 per million output tokens. For the past two months the model ran on OpenRouter under the codename "Owl Alpha," where its performance was middling. Weights are expected soon, likely under an Apache 2.0 or MIT license in line with the company's typical practice.
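
The pricing and parameter figures above lend themselves to quick arithmetic. The sketch below uses hypothetical helpers (request_cost and active_fraction are our names, not part of any Meituan or OpenRouter API) to estimate a request's cost from the quoted per-million-token rates and to compute the share of parameters active per forward pass. It assumes the flat rates quoted in the article and nothing else.

```python
# Per-million-token rates quoted for LongCat 2.0's API (USD).
INPUT_USD_PER_M = 0.75
OUTPUT_USD_PER_M = 3.00

# Parameter counts from the article.
TOTAL_PARAMS = 1.6e12
ACTIVE_PARAMS = 48e9


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the flat quoted rates (hypothetical helper)."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def active_fraction() -> float:
    """Share of all parameters used in a single forward pass of the MoE model."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


if __name__ == "__main__":
    print(f"${request_cost(200_000, 50_000):.2f}")  # prints "$0.30"
    print(f"{active_fraction():.1%}")                # prints "3.0%"
```

At these rates, a hypothetical request with 200,000 input tokens and 50,000 output tokens costs about $0.30, and only about 3 percent of the model's parameters participate in each forward pass.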

Architecture highlights

01. N-gram embeddings consume 10 percent of total parameters. LongCat spends its budget of inactive parameters not only on MoE expert layers but also on massive n-gram embedding tables. In the smaller LongCat Flash-Lite variant, n-gram embeddings account for nearly half of all parameters.
02. Six-dimensional parallelism across embeddings. Meituan parallelizes the n-gram embedding layer itself, adding a sixth dimension to the training parallelism strategy on top of the standard data, pipeline, tensor, expert, and sequence splits.
03. Custom sparse attention derived from DSA. The team built a proprietary sparse attention mechanism by heavily modifying Dynamic Sparse Attention (DSA), though Meituan has not published details of the changes.
04. Million-token context in pretraining data. Several hundred billion of the 35 trillion pretraining tokens came from documents with context lengths around one million tokens, making LongCat one of the few models pretrained on ultra-long sequences rather than extended to them afterward through fine-tuning.
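
Meituan has not published how LongCat indexes its n-gram tables, so the following is only a generic illustration of the idea behind item 01: a hashed n-gram embedding, where each token's vector is its ordinary token embedding plus vectors looked up for the n-grams that end at that position. The table sizes, the polynomial hash, and the choice to add rather than concatenate vectors are all assumptions made for this sketch, not LongCat's design.

```python
import random

# Hypothetical sizes chosen for illustration; not LongCat's real configuration.
VOCAB_SIZE = 1000
DIM = 8
NGRAM_BUCKETS = 4096
MAX_N = 3  # look up bigrams and trigrams


def make_table(rows: int, dim: int, seed: int) -> list[list[float]]:
    """Create a small randomly initialised embedding table."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


token_table = make_table(VOCAB_SIZE, DIM, seed=0)
ngram_table = make_table(NGRAM_BUCKETS, DIM, seed=1)


def ngram_bucket(ngram: list[int]) -> int:
    """Hash a sequence of token ids to a row of the shared n-gram table.

    Distinct n-grams may collide on the same row; hashed-embedding schemes tolerate this.
    """
    h = 0
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % NGRAM_BUCKETS
    return h


def embed(token_ids: list[int], max_n: int = MAX_N) -> list[list[float]]:
    """Token embedding plus one hashed embedding per n-gram (orders 2..max_n) ending at each position."""
    out = []
    for i, tok in enumerate(token_ids):
        vec = list(token_table[tok])
        for n in range(2, max_n + 1):
            start = i - n + 1
            if start < 0:
                break  # too few preceding tokens for this order or any higher one
            row = ngram_table[ngram_bucket(token_ids[start : i + 1])]
            vec = [a + b for a, b in zip(vec, row)]
        out.append(vec)
    return out


if __name__ == "__main__":
    vectors = embed([5, 17, 42, 17])
    print(len(vectors), len(vectors[0]))  # prints "4 8"
```

In this toy setup the n-gram table holds about four times as many parameters as the token table, yet each position reads at most MAX_N - 1 extra rows, which is why such tables can grow a model's parameter count without adding much per-token compute.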