Prompt an AI the right way and it cooperates; leave it to figure things out and it doesn't.
Researchers set out to resolve a contradiction between two 2026 studies that reached opposite conclusions about whether large vision-language models can coordinate on efficient referring expressions, the kind of shorthand humans develop naturally when pointing things out to each other. By controlling for task differences and directly comparing prompting styles, the new paper finds both camps were right in a narrow sense: models do coordinate efficiently when the prompt explicitly tells them to, but the same models fail when given a more implicit prompt that a human would interpret without difficulty. The contradiction in the earlier work, it turns out, came down to how the question was asked.
The finding cuts against a common assumption that scaling and instruction-tuning have made modern AI systems genuinely communicative partners. There is a meaningful gap between a model that can execute an instruction and one that reads context the way a person does. For developers building collaborative or conversational AI tools, this is a reminder that prompt design is not a polish step — it is load-bearing.
Humans learn communicative efficiency implicitly, through shared social context, not explicit rules. That AI systems still need the rule spelled out suggests the gap between language fluency and genuine pragmatic understanding remains wider than benchmark scores tend to advertise.