Robots that learn from a handful of examples just got a more efficient backbone.
Researchers have released RoboSSM, a framework for in-context imitation learning that replaces the Transformer architecture with state-space models (SSMs), specifically a model called Longhorn. The key claim: SSM inference cost grows linearly with sequence length, and SSMs handle long inputs better than Transformers, which tend to degrade when test-time prompts are longer than those seen during training. On the LIBERO robotics benchmark, RoboSSM outperformed Transformer-based approaches on both unseen tasks and longer-horizon tasks. Code is public on GitHub.
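To make the linear-time claim concrete, here is a minimal sketch of a generic scalar state-space recurrence. It is not Longhorn's actual update rule or anything from the RoboSSM code; the function names and the parameters `a`, `b`, and `c` are hypothetical and chosen purely for illustration. The point is that each new input touches only a fixed-size state, so total work grows linearly with sequence length, whereas causal self-attention scores a number of token pairs that grows quadratically.

```python
from typing import List, Sequence


def ssm_scan(inputs: Sequence[float], a: float = 0.9, b: float = 0.1, c: float = 1.0) -> List[float]:
    """Illustrative linear recurrence (assumed form, not Longhorn's):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    One constant-cost update per input, so the whole pass is O(n)."""
    h = 0.0  # fixed-size state carried across the whole sequence
    outputs: List[float] = []
    for x in inputs:
        h = a * h + b * x      # update the state from the new input only
        outputs.append(c * h)  # read the output from the current state
    return outputs


def attention_pair_count(n: int) -> int:
    """Number of (query, key) pairs causal self-attention scores for n tokens."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        ssm_updates = len(ssm_scan([1.0] * n))
        print(n, ssm_updates, attention_pair_count(n))
```

Running it prints, for each prompt length, the number of state updates the recurrence performs next to the number of pairs attention would score, which is the gap the linear-versus-quadratic argument rests on.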
This matters because in-context imitation learning is one of the more practical paths to flexible robots: no retraining is required at deployment, just feed in a few demonstrations and go. Transformers' degradation at long contexts is a real limitation, and if SSMs genuinely overcome it, that opens the door to more complex, multi-step task prompts without ballooning compute costs.
SSMs have already made headway in language modeling as a leaner alternative to Transformers, so applying them to robotics is a logical extension — though benchmark results on LIBERO, a controlled simulation suite, still leave open how well any of this holds up on physical hardware in messier real-world settings.