DeepSeek v4 architecture shrunk to 100M parameters, trained on TinyStories
A 100-million-parameter DeepSeek v4 architecture experiment trained on the TinyStories dataset is now live on HuggingFace with an interactive demo space.
ml-intern-v4-100m is a 100-million-parameter implementation of the DeepSeek v4 architecture trained on the TinyStories dataset, released May 12, 2026. The model compresses DeepSeek's mixture-of-experts routing and multi-head latent attention into a weight class comparable to early GPT-2 checkpoints, making it runnable on consumer hardware. Weights are available on HuggingFace under AlexWortega/ml-intern-v4-100m-tinystories-20260512-1721, with an interactive demo at AlexWortega/ml-intern-v4-100m-tinystories-demo.
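To give a flavor of the mixture-of-experts routing mentioned above, here is a minimal NumPy sketch of top-k gated routing, the mechanism DeepSeek-style MoE layers are built on. The dimensions, gate weights, and top-2 choice are illustrative assumptions, not the released model's actual configuration or code.

```python
import numpy as np

def top_k_routing(x, gate_w, k=2):
    """Route each token to its top-k experts.

    x: (tokens, d_model) activations; gate_w: (d_model, n_experts) gate weights.
    Returns the top-k expert ids per token and their renormalized softmax
    weights. Hypothetical sizes for illustration only.
    """
    logits = x @ gate_w                           # (tokens, n_experts)
    idx = np.argsort(logits, axis=-1)[:, -k:]     # top-k expert ids per token
    top = np.take_along_axis(logits, idx, axis=-1)
    top = top - top.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(top)
    w /= w.sum(axis=-1, keepdims=True)            # weights over chosen experts
    return idx, w

# Example: 4 tokens, d_model=8, 4 experts, top-2 routing
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate = rng.normal(size=(8, 4))
idx, w = top_k_routing(x, gate, k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2)
```

Each token's output would then be the weighted sum of its chosen experts' outputs; only k of the experts run per token, which is what keeps the active compute small at any parameter count.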
Training on TinyStories, a corpus of roughly 2 million synthetic short stories totaling around 500MB of text, completed in a timeframe practical for individual researchers rather than institutional labs. The 100M-parameter scale creates a testbed for architectural experiments without multi-GPU infrastructure, demonstrating how DeepSeek's advanced routing and attention mechanisms behave in low-resource settings. An interactive HuggingFace Spaces demo allows browser-based text generation without local setup.
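The "comparable to early GPT-2 checkpoints" weight class can be checked with back-of-envelope arithmetic. The sketch below counts parameters for a hypothetical dense transformer at GPT-2-small-like hyperparameters; the released model's true configuration (and its MoE parameter split) is not published here, so this is only a scale sanity check.

```python
def transformer_params(vocab, d_model, n_layers, d_ff):
    """Rough dense-transformer parameter count, ignoring biases and norms.

    Per layer: attention ~ 4*d_model^2 (Q, K, V, output projections),
    MLP ~ 2*d_model*d_ff (up and down projections).
    Hypothetical config for illustration; not the model's stated hyperparameters.
    """
    embed = vocab * d_model
    per_layer = 4 * d_model**2 + 2 * d_model * d_ff
    return embed + n_layers * per_layer

# GPT-2-small-like shape: 50k vocab, 768-dim, 12 layers, 4x MLP expansion
total = transformer_params(vocab=50257, d_model=768, n_layers=12, d_ff=3072)
print(f"{total / 1e6:.0f}M parameters")  # prints "124M parameters"
```

A 100M-class budget lands in the same range, which is why single-GPU consumer hardware suffices for both training at TinyStories scale and local inference.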
