DeepSeek v4 replica at 100M parameters trained on TinyStories
A miniature implementation of the DeepSeek v4 architecture at 100 million parameters, trained on the TinyStories corpus, is now available on HuggingFace with an interactive demo Space.
A 100-million-parameter toy implementation of the DeepSeek v4 architecture has been released on HuggingFace, trained on the TinyStories dataset. The model, ml-intern-v4-100m-tinystories, replicates the architectural patterns of DeepSeek's latest release at a scale suitable for educational experimentation and local testing. Weights and a model card are available at AlexWortega/ml-intern-v4-100m-tinystories-20260512-1721, with an interactive demo hosted in a HuggingFace Space. Released May 12, 2026, the model sits in a range accessible to consumer hardware: small enough to run inference on a laptop CPU, yet large enough to exhibit the routing and expert-selection behaviors that define modern mixture-of-experts architectures.
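For readers who want to try the model directly, a standard transformers loading pattern along the following lines should suffice. The model ID comes from the release; the prompt, generation settings, and the trust_remote_code flag (commonly needed when a repo ships custom modeling code) are assumptions, so treat this as a sketch rather than the project's documented usage.

```python
# Minimal local-inference sketch. Assumes the repo is transformers-compatible;
# trust_remote_code may be required if the DeepSeek-v4-style architecture
# relies on custom modeling code bundled with the weights (an assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexWortega/ml-intern-v4-100m-tinystories-20260512-1721"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# TinyStories-style prompt; at 100M parameters this is feasible on a laptop CPU.
inputs = tokenizer("Once upon a time, a little fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```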
TinyStories is a synthetic dataset of simple children's stories designed for training small language models on coherent narrative generation. Toy-scale reproductions of frontier architectures have become a common pattern in the open-source AI community, letting students and hobbyists trace the mechanics of systems like DeepSeek, Mixtral, and Qwen without the infrastructure overhead of billion-parameter training runs; a minimal routing sketch in that spirit appears below. Because the model is open-weight and runs locally, it can also be fine-tuned or prompted without server-side safety enforcement.
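To make "routing and expert-selection" concrete, here is a minimal top-k mixture-of-experts layer in PyTorch. It illustrates the general gating pattern used by Mixtral- and DeepSeek-style models, where a learned gate scores experts per token and only the top-k experts run; all names, dimensions, and the expert MLP shape here are hypothetical and are not taken from this release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTopKMoE(nn.Module):
    """Illustrative top-k MoE layer: a gate scores experts per token,
    the top-k experts run, and their outputs are mixed by gate weight.
    Hypothetical dimensions; not the released model's implementation."""
    def __init__(self, d_model=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                     # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)  # mix weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out, topk_idx                      # routing indices returned for inspection

moe = ToyTopKMoE()
y, routing = moe(torch.randn(4, 256))
print(routing)  # which two experts each of the 4 tokens selected
```

Returning the routing indices alongside the output is what makes toy models like this useful for study: one can print which experts each token selects and watch specialization patterns emerge during training.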
