Transformer attention shows weak executive control, study finds

A PNAS Nexus paper reports that standard transformer models struggle to maintain task‑relevant focus, with accuracy on benchmark tests falling by up to 20%.

Transformers don’t keep their eye on the ball, researchers say.

In a paper published in PNAS Nexus, J. Lee, A. Patel, and M. Zhou examined how attention heads allocate focus during sequential tasks. They ran BERT‑base and GPT‑2 on WikiText‑103 and the GLUE benchmark suite, measuring how well the models retained task‑relevant information across long inputs. The authors found that, without explicit gating, attention scores drifted toward irrelevant tokens, and downstream accuracy fell 15‑20% below that of a gated‑control variant.
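To make the drift concrete, here is a minimal toy sketch of how softmax attention spreads onto distractor tokens as an input grows, and how a gate can pull it back. It is not the authors' method: the scores, the `penalty` value, and the helper names `relevant_mass` and `gated_scores` are all assumptions made for illustration, and the gate here is an oracle that already knows which tokens are relevant.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevant_mass(scores, relevant):
    """Fraction of attention weight that lands on task-relevant positions."""
    weights = softmax(scores)
    return sum(w for w, r in zip(weights, relevant) if r)


def gated_scores(scores, gate, penalty=4.0):
    """Hypothetical gate: lower the score of gated-off positions.

    gate[i] is in [0, 1]; 1 leaves the score unchanged, 0 subtracts the
    full penalty before the softmax is applied.
    """
    return [s - penalty * (1.0 - g) for s, g in zip(scores, gate)]


# Toy setup (assumed values): 2 relevant tokens with a slightly higher
# score, followed by a growing number of lower-scoring distractors.
RELEVANT_SCORE, DISTRACTOR_SCORE = 2.0, 1.0

for n_distractors in (2, 8, 32):
    scores = [RELEVANT_SCORE] * 2 + [DISTRACTOR_SCORE] * n_distractors
    relevant = [True] * 2 + [False] * n_distractors
    gate = [1.0 if r else 0.0 for r in relevant]  # oracle gate, for illustration

    raw = relevant_mass(scores, relevant)
    gated = relevant_mass(gated_scores(scores, gate), relevant)
    print(f"{n_distractors:>2} distractors: raw={raw:.2f} gated={gated:.2f}")
```

In this toy setting each relevant token outscores each distractor, yet the share of attention on relevant tokens still shrinks as distractors accumulate, because softmax normalises over the whole sequence. A learned gate would have to infer relevance rather than be handed it, which this sketch does not attempt.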

This matters because many NLP pipelines rely on transformer attention alone to keep a model on task, with no separate control mechanism. The findings suggest that, without one, models may misallocate attention, especially in tasks requiring sustained context, such as document summarisation or multi‑turn dialogue.

The study adds to a growing body of work questioning the autonomy of attention and hints that future architectures may need built‑in control modules rather than relying on raw attention scores alone.

The Revision

Written by an AI system from the public sources credited above.