Dywave cuts IoT sensor token length 75% with event-aligned compression
A new tokenization framework uses wavelet decomposition to compress IoT sensor streams by up to 75% while improving activity recognition accuracy by up to 12% across five real-world datasets.
Dywave, a dynamic tokenization framework for IoT sensing signals, compresses input sequences by identifying temporal boundaries aligned with physical events. The method applies wavelet-based hierarchical decomposition to non-stationary sensor streams — accelerometer traces, physiological signals, environmental readings — and adaptively shortens redundant intervals while preserving the temporal structure that matters for downstream tasks. Across five real-world datasets spanning human activity recognition, stress assessment, and nearby object detection, Dywave reduced input token lengths by up to 75 percent compared to fixed-window tokenization, while lifting accuracy by up to 12 percent over state-of-the-art baselines.
The core insight is that IoT signals are multi-scale and event-driven: a sudden acceleration spike during a fall, a gradual heart-rate climb under stress, a sharp temperature drop when a door opens. Standard fixed-length tokenization treats all intervals equally, forcing models to process long stretches of near-constant readings. Dywave instead decomposes the signal into wavelet subbands, detects boundaries where the signal changes character, and merges stable regions into long, coarse tokens while assigning short, fine-grained tokens to transient events. The authors tested the approach on mainstream sequence models — Transformers, LSTMs, temporal convolutional networks — and found consistent gains in both accuracy and inference speed. The method also proved more robust to domain shifts and varying sequence lengths, a practical advantage when deploying models across different sensor hardware or sampling rates.
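The preprint's exact algorithm is not public, but the idea can be illustrated with a minimal sketch: use one level of Haar wavelet detail coefficients as a change detector, then cut token boundaries at transients while letting stable stretches grow into long tokens. The function names, the threshold, and the size cap below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def haar_detail(x):
    """One level of Haar decomposition: detail coefficients.
    These are large wherever adjacent samples differ sharply."""
    x = x[: len(x) - len(x) % 2]           # trim to even length
    pairs = x.reshape(-1, 2)
    return (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)

def event_aligned_tokens(signal, threshold=0.5, max_token=16):
    """Segment a 1-D signal into variable-length (start, end) tokens.

    A boundary is cut where Haar detail energy exceeds `threshold`
    (a transient event) or when a token hits `max_token` samples,
    so redundant stable intervals cost far fewer tokens.
    Hypothetical sketch, not Dywave's published method.
    """
    detail = haar_detail(np.asarray(signal, dtype=float))
    tokens, start = [], 0
    for i, d in enumerate(detail):
        pos = 2 * (i + 1)                  # last signal index covered so far
        if abs(d) > threshold or pos - start >= max_token:
            tokens.append((start, pos))    # close token at event or size cap
            start = pos
    if start < len(signal):
        tokens.append((start, len(signal)))
    return tokens

# Toy stream: flat, a brief spike, flat again
sig = np.concatenate([np.zeros(32), [0.0, 5.0, 5.0, 0.0], np.zeros(28)])
toks = event_aligned_tokens(sig, threshold=0.5, max_token=16)
```

On this 64-sample stream the segmentation yields six tokens: two short tokens bracketing the spike, and long tokens spanning the flat stretches — the same samples under fixed windows of four samples would cost sixteen tokens.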
The preprint, posted to arXiv on May 15, stops short of open weights or a public code repository, leaving practitioners without a drop-in implementation. The evaluation focused on relatively small datasets (the largest had tens of thousands of samples), and the authors did not report wall-clock training time or memory footprint under real-time constraints. Industrial IoT deployments often run on edge devices with tight power budgets, and it remains unclear whether the wavelet decomposition overhead fits within those limits. Whether the team releases a reference implementation and benchmarks the method on larger-scale datasets or streaming scenarios — where token compression directly translates to battery life — will determine how quickly the approach moves from theory to production.
