DeepSeek v4 architecture shrunk to 100M parameters, trained on TinyStories
A 100-million-parameter DeepSeek v4 architecture experiment trained on the TinyStories dataset is now live on HuggingFace with an interactive demo space.
ml-intern-v4-100m is a 100-million-parameter implementation of the DeepSeek v4 architecture trained on the TinyStories dataset, released May 12, 2026. The model compresses DeepSeek's mixture-of-experts routing and multi-head latent attention into a weight class comparable to early GPT-2 checkpoints, making it runnable on consumer hardware. Weights are available on HuggingFace under AlexWortega/ml-intern-v4-100m-tinystories-20260512-1721, with an interactive demo at AlexWortega/ml-intern-v4-100m-tinystories-demo.
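To give a flavor of the mixture-of-experts routing mentioned above, here is a minimal NumPy sketch of top-k gated routing, the mechanism DeepSeek-style MoE layers are built on. The dimensions, gate weights, and top-2 choice are illustrative assumptions, not the released model's actual configuration or code.

```python
import numpy as np

def top_k_routing(x, gate_w, k=2):
    """Route each token to its top-k experts.

    x: (tokens, d_model) activations; gate_w: (d_model, n_experts) gate weights.
    Returns the top-k expert ids per token and their renormalized softmax
    weights. Hypothetical sizes for illustration only.
    """
    logits = x @ gate_w                           # (tokens, n_experts)
    idx = np.argsort(logits, axis=-1)[:, -k:]     # top-k expert ids per token
    top = np.take_along_axis(logits, idx, axis=-1)
    top = top - top.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(top)
    w /= w.sum(axis=-1, keepdims=True)            # weights over chosen experts
    return idx, w

# Example: 4 tokens, d_model=8, 4 experts, top-2 routing
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate = rng.normal(size=(8, 4))
idx, w = top_k_routing(x, gate, k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2)
```

Each token's output would then be the weighted sum of its chosen experts' outputs; only k of the experts run per token, which is what keeps the active compute small at any parameter count.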
Training on TinyStories, a corpus of roughly 2 million synthetic short stories totaling around 500MB of text, completed in a timeframe practical for individual researchers rather than institutional labs. The 100M-parameter scale creates a testbed for architectural experiments without multi-GPU infrastructure, demonstrating how DeepSeek's advanced routing and attention mechanisms behave in low-resource settings. An interactive HuggingFace Spaces demo allows browser-based text generation without local setup.
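The "comparable to early GPT-2 checkpoints" weight class can be checked with back-of-envelope arithmetic. The sketch below counts parameters for a hypothetical dense transformer at GPT-2-small-like hyperparameters; the released model's true configuration (and its MoE parameter split) is not published here, so this is only a scale sanity check.

```python
def transformer_params(vocab, d_model, n_layers, d_ff):
    """Rough dense-transformer parameter count, ignoring biases and norms.

    Per layer: attention ~ 4*d_model^2 (Q, K, V, output projections),
    MLP ~ 2*d_model*d_ff (up and down projections).
    Hypothetical config for illustration; not the model's stated hyperparameters.
    """
    embed = vocab * d_model
    per_layer = 4 * d_model**2 + 2 * d_model * d_ff
    return embed + n_layers * per_layer

# GPT-2-small-like shape: 50k vocab, 768-dim, 12 layers, 4x MLP expansion
total = transformer_params(vocab=50257, d_model=768, n_layers=12, d_ff=3072)
print(f"{total / 1e6:.0f}M parameters")  # prints "124M parameters"
```

A 100M-class budget lands in the same range, which is why single-GPU consumer hardware suffices for both training at TinyStories scale and local inference.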
