SelfCI framework trains LLMs to honor privacy norms without sacrificing task performance

A new self-distillation method trains language models to honor contextual privacy norms without sacrificing task performance, outperforming reinforcement learning baselines on agentic workflows.

By Alex Sokoloff · May 18, 2026

Privacy in large language models isn't just about hiding information—it's about respecting when and how data should flow in a given context. A preprint published May 21 proposes SelfCI, a self-distillation framework that lets models learn to withhold sensitive information without degrading their ability to complete tasks.

The method, developed by Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi, Hyomin Lee, and Kangsan Kim, addresses what researchers call Contextual Integrity: the principle that privacy norms vary by situation. A calendar app should never share your medical history; an email draft tool should never leak your salary. Existing safety approaches either leak information or cripple performance. SelfCI decouples the two problems.

The framework trains two independent teacher distributions from feedback signals. One preserves task-relevant information to maintain utility; the other enforces minimal, appropriate disclosure. The model learns from both via complementary reverse KL divergences, producing what the authors call a Product-of-Experts target—essentially the intersection of "can do the job" and "respects privacy norms." No external supervision is required beyond the feedback itself.
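The two-teacher objective described above can be illustrated on a toy example. The sketch below is not the authors' implementation: the three response options, the teacher probabilities, the function names, and the equal weighting of the two reverse KL terms are all assumptions made for illustration. It relies on a standard identity: the distribution that minimizes the sum of two equally weighted reverse KL divergences is the normalized geometric mean of the two teachers, which is one form of a Product-of-Experts target.

```python
import math


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def reverse_kl(student, teacher):
    """KL(student || teacher); the expectation is taken under the student."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)


def product_of_experts(p_a, p_b):
    """Normalized geometric mean of two distributions over the same options."""
    return normalize([math.sqrt(a * b) for a, b in zip(p_a, p_b)])


def combined_loss(student, p_utility, p_privacy):
    """Equally weighted sum of the two reverse KL terms (an assumption)."""
    return reverse_kl(student, p_utility) + reverse_kl(student, p_privacy)


# Hypothetical options: [share only what the task needs, share everything, refuse]
p_utility = [0.50, 0.45, 0.05]  # toy "utility" teacher: dislikes refusing
p_privacy = [0.50, 0.05, 0.45]  # toy "privacy" teacher: dislikes oversharing

target = product_of_experts(p_utility, p_privacy)

# Compare the combined loss of the PoE target with a few other students.
candidates = {
    "poe_target": target,
    "utility_only": p_utility,
    "privacy_only": p_privacy,
    "uniform": [1 / 3, 1 / 3, 1 / 3],
}
for name, dist in candidates.items():
    loss = combined_loss(dist, p_utility, p_privacy)
    print(f"{name:>12}: loss = {loss:.3f}")
```

In this toy setup the target concentrates on the option both teachers rate highly, and copying either teacher alone yields a higher combined loss than the Product-of-Experts target. A real system would apply the same idea to token-level distributions from a language model rather than to three hand-picked options.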

In evaluations, SelfCI consistently outperformed online reinforcement learning baselines such as GRPO on both in-domain and out-of-domain tests. The gains held up in agentic workflows where models accumulate private context over multiple turns, the kind of scenario where a personal agent (calendar manager, email drafter, health tracker) must make nuanced decisions about what to reveal and when. The results suggest the approach scales to real deployment scenarios where models handle sensitive workflows as trusted agents.
