DeepSeek v4 replica at 100M parameters trained on TinyStories
A miniature implementation of the DeepSeek v4 architecture at 100 million parameters, trained on the TinyStories corpus, is now available on HuggingFace with an interactive demo Space.
A 100-million-parameter toy implementation of the DeepSeek v4 architecture has been released on HuggingFace, trained on the TinyStories dataset. The model, ml-intern-v4-100m-tinystories, replicates the architectural patterns of DeepSeek's latest release at a scale suitable for educational experimentation and local testing. Weights and a model card are available at AlexWortega/ml-intern-v4-100m-tinystories-20260512-1721, with an interactive demo hosted in a HuggingFace Space. Released May 12, 2026, the model sits in a range accessible to consumer hardware: small enough to run inference on a laptop CPU, yet large enough to exhibit the routing and expert-selection behaviors that define modern mixture-of-experts architectures.
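For readers who want to try the model directly, a standard transformers loading pattern along the following lines should suffice. The model ID comes from the release; the prompt, generation settings, and the trust_remote_code flag (commonly needed when a repo ships custom modeling code) are assumptions, so treat this as a sketch rather than the project's documented usage.

```python
# Minimal local-inference sketch. Assumes the repo is transformers-compatible;
# trust_remote_code may be required if the DeepSeek-v4-style architecture
# relies on custom modeling code bundled with the weights (an assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AlexWortega/ml-intern-v4-100m-tinystories-20260512-1721"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# TinyStories-style prompt; at 100M parameters this is feasible on a laptop CPU.
inputs = tokenizer("Once upon a time, a little fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```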
TinyStories is a synthetic dataset of simple children's stories designed for training small language models on coherent narrative generation. Toy-scale reproductions of frontier architectures have become a common pattern in the open-source AI community, letting students and hobbyists trace the mechanics of systems like DeepSeek, Mixtral, and Qwen without the infrastructure overhead of billion-parameter training runs; a minimal routing sketch in that spirit appears below. Because the model is open-weight and runs locally, it can also be fine-tuned or prompted without server-side safety enforcement.
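To make "routing and expert-selection" concrete, here is a minimal top-k mixture-of-experts layer in PyTorch. It illustrates the general gating pattern used by Mixtral- and DeepSeek-style models, where a learned gate scores experts per token and only the top-k experts run; all names, dimensions, and the expert MLP shape here are hypothetical and are not taken from this release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTopKMoE(nn.Module):
    """Illustrative top-k MoE layer: a gate scores experts per token,
    the top-k experts run, and their outputs are mixed by gate weight.
    Hypothetical dimensions; not the released model's implementation."""
    def __init__(self, d_model=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                     # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)  # mix weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out, topk_idx                      # routing indices returned for inspection

moe = ToyTopKMoE()
y, routing = moe(torch.randn(4, 256))
print(routing)  # which two experts each of the 4 tokens selected
```

Returning the routing indices alongside the output is what makes toy models like this useful for study: one can print which experts each token selects and watch specialization patterns emerge during training.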
