LiSA adapts AI agent guardrails to deployment contexts without retraining
Researchers introduce LiSA, a memory-based framework that adapts AI agent guardrails to real-world deployment contexts using occasional user-reported failures, and evaluate it across three safety benchmarks.

LiSA (Lifelong Safety Adaptation) is a conservative policy induction framework that lets deployed AI agent guardrails adapt to their operating environment without fine-tuning the underlying model. Posted to arXiv on May 15, the preprint addresses a practical gap: as agents move beyond chat into workflows that read private data and call external tools, guardrail failures can leak secrets or authorize unsafe actions. Yet deployment feedback is typically sparse and noisy, and repeated fine-tuning is often impractical.
The system converts occasional user-reported failures into reusable policy abstractions stored in structured memory. When a new query arrives, LiSA checks whether stored rules apply, adds conflict-aware local rules to prevent overgeneralization when remembered examples contradict each other, and gates memory reuse with an evidence-aware confidence threshold tied to a posterior lower bound. The result is a guardrail that improves over time without touching the base model's weights.
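To make the gating idea concrete, here is a minimal sketch of what evidence-gated memory reuse could look like, assuming a Beta-Bernoulli posterior over each stored rule's reliability. The `PolicyRule` structure, the `matches` callback, and the 0.6 threshold are illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

from scipy.stats import beta


@dataclass
class PolicyRule:
    """A reusable policy abstraction induced from user-reported failures (hypothetical)."""
    condition: str                    # e.g. "query forwards internal email externally"
    action: str                       # e.g. "block" or "ask for confirmation"
    successes: int = 1                # feedback reports agreeing with the rule
    failures: int = 0                 # feedback reports contradicting it
    exceptions: list[str] = field(default_factory=list)  # conflict-aware local rules

    def posterior_lower_bound(self, a0: float = 1.0, b0: float = 1.0,
                              credibility: float = 0.95) -> float:
        # Pessimistic reliability estimate: the lower credible bound of a
        # Beta(a0 + successes, b0 + failures) posterior over rule accuracy.
        return beta.ppf(1.0 - credibility, a0 + self.successes, b0 + self.failures)


def apply_guardrail(query: str, memory: list[PolicyRule],
                    matches: Callable[[str, str], bool],
                    threshold: float = 0.6) -> str | None:
    """Reuse a stored rule only if its pessimistic reliability clears the gate."""
    for rule in memory:
        if not matches(query, rule.condition):
            continue
        # Conflict-aware local exceptions narrow a rule before it can fire,
        # limiting overgeneralization when past feedback is contradictory.
        if any(matches(query, exc) for exc in rule.exceptions):
            continue
        if rule.posterior_lower_bound() >= threshold:
            return rule.action        # evidence-backed reuse of stored memory
    return None                       # no confident rule: defer to the base guardrail
```

Under this toy model, a rule backed by a single confirming report has a 95% lower bound of roughly 0.22 (the 5th percentile of Beta(2, 1)), so a freshly induced rule stays gated until corroborating feedback accrues, which mirrors the conservative bias the framework's name implies.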
Across three benchmarks—PrivacyLens+, ConFaide+, and AgentHarm—LiSA consistently outperforms memory-based baselines under sparse feedback, remains stable even at 20% label-flip rates, and pushes the latency-performance frontier beyond backbone model scaling. The framework is designed for the hardest class of guardrail failures: those that depend on local privacy norms, organizational policies, or user expectations that can't be fully specified before deployment.
What stands out
- Sparse feedback regime. LiSA delivers consistent gains when only a small fraction of failures are flagged, making it practical for real-world deployments where user reports are rare and delayed.
- Noise robustness. The framework remains stable under 20% label-flip rates, a realistic reflection of user reports that may be wrong, inconsistent, or context-dependent.
- Latency vs. performance trade-off. Memory-based adaptation delivers better safety outcomes at lower inference cost than switching to a larger guardrail model, pushing beyond simple backbone scaling.