LLM trading agents show embedding drift before portfolio collapse

Researchers mapped embedding trajectories in LLM trading agents and found measurable pre-failure signatures—planning embeddings drift, effective rank contracts—across 80 rolling failure anchors and eight model runs.

By Alex Sokoloff · May 29, 2026

A new preprint reports that large language model (LLM) trading agents exhibit measurable embedding drift before portfolio drawdowns occur. Researchers used TradeArena, an auditable trading testbed with risk reports and execution simulation, to analyze 80 rolling failure anchors across eight LLM trajectories. They found that, ahead of failures, planning embeddings drift away from normal-state centroids and their effective rank contracts. The pattern holds across hash, LSA, and Transformer embeddings as well as white-box hidden-state probes.
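The article doesn't define exactly how drift or effective rank are measured. The sketch below assumes two common choices: drift as the cosine distance between a recent window's mean embedding and the normal-state centroid, and effective rank as the exponential of the Shannon entropy of the normalized singular values of the mean-centered window. The function names and the synthetic data are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def centroid_drift(normal: np.ndarray, window: np.ndarray) -> float:
    """Cosine distance between the mean embedding of a recent window and the
    centroid of embeddings from normal (non-failure) periods.
    Rows are embeddings; 0 means same direction, 2 means opposite."""
    centroid = normal.mean(axis=0)
    current = window.mean(axis=0)
    cos = centroid @ current / (np.linalg.norm(centroid) * np.linalg.norm(current))
    return float(1.0 - cos)

def effective_rank(window: np.ndarray) -> float:
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values of the mean-centered embedding window."""
    centered = window - window.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    if s.sum() == 0:
        return 0.0  # constant window: no spread in any direction
    p = s / s.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    # Hypothetical normal-state embeddings, spread across all dimensions.
    normal = rng.normal(size=(200, dim)) + 0.5
    # Hypothetical pre-failure window: shifted elsewhere and squeezed into 2 directions.
    basis = rng.normal(size=(2, dim))
    shift = 2.0 * rng.normal(size=dim)
    collapsed = rng.normal(size=(40, 2)) @ basis + shift
    print("drift, normal window:   ", centroid_drift(normal, normal[:40]))
    print("drift, collapsed window:", centroid_drift(normal, collapsed))
    print("erank, normal window:   ", effective_rank(normal[:40]))
    print("erank, collapsed window:", effective_rank(collapsed))
```

In a monitoring setting, both quantities would be computed over a rolling window of planning embeddings, with rising drift and falling effective rank read together as a possible pre-failure signature.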

Stress tests using chain-of-thought-free model weights, lexical controls, OHLCV price noise, and falsified audit reports showed that rationale-level contraction can vanish when the model produces no explicit reasoning, while intent-space contraction may remain. Structured risk feedback acted as an external alignment signal without any fine-tuning, but its effect was inconsistent: true audit feedback improved calibration for some models and return-drawdown metrics for others, while hidden or placebo feedback sometimes produced higher short-horizon returns alongside weaker alignment diagnostics.

A 51-stock intraday experiment, which used a rolling Markowitz portfolio as a covariance reference, exposed a blind spot: LLM rationales often justify concentrated exposure to coupled assets, exposure that the risk layer then repeatedly clips.
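To make the covariance-reference idea concrete, here is a minimal sketch of one common Markowitz variant, the global minimum-variance portfolio, re-estimated on a rolling window of returns. The article doesn't say which Markowitz formulation, window length, or constraints the authors used; the function names, the small ridge term, and the synthetic setup with two strongly correlated assets are assumptions for illustration. Comparing a concentrated allocation's variance with the baseline's under the same covariance shows why piling into coupled assets is the kind of exposure a risk layer would flag.

```python
import numpy as np

def min_variance_weights(returns: np.ndarray, ridge: float = 1e-6) -> np.ndarray:
    """Global minimum-variance weights (shorts allowed), summing to 1.
    returns: shape (T, N), one column per asset."""
    cov = np.cov(returns, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])  # small ridge keeps the solve stable
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # proportional to inv(cov) @ 1
    return x / x.sum()

def portfolio_variance(weights: np.ndarray, cov: np.ndarray) -> float:
    """Variance w' * cov * w of a portfolio with the given weights."""
    return float(weights @ cov @ weights)

def rolling_min_variance(returns: np.ndarray, window: int):
    """Yield (t, weights) estimated only from the `window` bars before bar t."""
    for t in range(window, returns.shape[0] + 1):
        yield t, min_variance_weights(returns[t - window:t])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 250
    common = rng.normal(scale=0.01, size=T)
    # Hypothetical data: assets 0 and 1 are strongly coupled; 2 and 3 are independent.
    returns = np.column_stack([
        common + rng.normal(scale=0.002, size=T),
        common + rng.normal(scale=0.002, size=T),
        rng.normal(scale=0.01, size=T),
        rng.normal(scale=0.01, size=T),
    ])
    recent = returns[-60:]
    cov = np.cov(recent, rowvar=False)
    baseline = min_variance_weights(recent)
    concentrated = np.array([0.5, 0.5, 0.0, 0.0])  # hypothetical agent allocation
    print("baseline weights:     ", np.round(baseline, 3))
    print("baseline variance:    ", portfolio_variance(baseline, cov))
    print("concentrated variance:", portfolio_variance(concentrated, cov))
```

The unconstrained solution can include short positions; a long-only or position-capped baseline would need a constrained optimizer instead.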

The authors frame their work as a research claim rather than a profitability claim, arguing that auditable risk feedback and representation trajectories reveal when LLM financial reasoning is aligning, drifting, or failing.
