- A recently discovered bug could leave CUBIC, Linux's default congestion controller, pinned at its minimum congestion window (cwnd) after a short burst of packet loss, stalling QUIC transfers.
- The issue surfaced in Cloudflare's quiche library during a test that injects 30% packet loss for the first two seconds of a 10 MB download. In about 60% of runs, the download did not finish within the 10-second timeout. In those runs, cwnd stayed at 2700 bytes and the controller flipped between recovery and congestion avoidance every RTT. The same test with Reno passed every time, which points to a CUBIC-specific flaw.
- The root cause was traced to a 2017 Linux kernel optimization that reset the CUBIC epoch only on loss events. When an application resumes sending after an idle period, the time elapsed since the epoch start includes the idle gap. That makes the epoch delta huge, and CUBIC's growth function responds by inflating cwnd dramatically. A later fix that simply reset the epoch on resume introduced a new problem: CUBIC repeatedly concluded it was in recovery, which locked cwnd at the floor. The final patch adds a small guard that prevents this premature recovery transition, so cwnd grows normally once the loss stops.
- The patch is tiny, essentially one line, but it resolves a corner case that could affect any QUIC deployment using CUBIC, including major CDNs and browsers. Cloudflare has upstreamed the change, and it is slated for inclusion in the next Linux mainline release. From there, the fix should reach kernels that ship the default CUBIC implementation.
- Although the bug appears only under heavy early loss, it shows how kernel-level tweaks can ripple into higher-level protocols. With the fix merged, QUIC traffic should regain its expected recovery speed without needing a custom congestion controller.
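Some quick arithmetic shows why a window stuck at 2700 bytes cannot meet the 10-second timeout in the quiche test described above. This is a minimal sketch under stated assumptions. It treats the payload as 10 MiB and uses a hypothetical 50 ms RTT, since the report gives neither the exact byte count nor the RTT. It also assumes a stuck cwnd delivers at most 2700 bytes per round trip. Note that 2700 bytes equals two 1350-byte datagrams, which is consistent with RFC 9002's recommended minimum window of twice the maximum datagram size. The 1350-byte datagram size is an inference, not something the report states.

```python
# Assumptions (not from the source): a 10 MiB payload and a hypothetical 50 ms RTT.
PAYLOAD_BYTES = 10 * 1024 * 1024
FLOOR_CWND_BYTES = 2700      # cwnd value reported in the failing runs
ASSUMED_RTT_S = 0.050        # hypothetical round-trip time
TIMEOUT_S = 10.0             # test timeout from the source


def rtts_needed(payload: int, cwnd: int) -> int:
    """Round trips needed if at most `cwnd` bytes are delivered per RTT (integer ceiling)."""
    return (payload + cwnd - 1) // cwnd


def transfer_time(payload: int, cwnd: int, rtt: float) -> float:
    """Lower bound on transfer time when the window never grows past `cwnd`."""
    return rtts_needed(payload, cwnd) * rtt


n = rtts_needed(PAYLOAD_BYTES, FLOOR_CWND_BYTES)
t = transfer_time(PAYLOAD_BYTES, FLOOR_CWND_BYTES, ASSUMED_RTT_S)
print(f"{n} RTTs, about {t:.1f} s versus a {TIMEOUT_S:.0f} s timeout")
```

At that floor the transfer needs thousands of round trips. Even with a short RTT, it lands far outside the timeout. The estimate ignores the two-second loss phase entirely, so it is optimistic.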
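The idle-time inflation described in the root-cause discussion can be illustrated with the CUBIC window function from RFC 9438, `W_cubic(t) = C*(t - K)^3 + W_max`, where `K = cbrt((W_max - cwnd_epoch) / C)`, `C = 0.4`, and `beta = 0.7`. This minimal sketch uses hypothetical values: W_max of 100 segments, 1 s of active sending, and a 20 s idle gap. It compares counting the idle gap in `t` against excluding it by shifting the epoch start forward. Shifting is one common way to exclude idle time. It is not necessarily what either kernel change described here does.

```python
# Constants recommended by RFC 9438 (CUBIC); windows in segments, time in seconds.
C = 0.4
BETA_CUBIC = 0.7


def cubic_k(w_max: float, cwnd_epoch: float) -> float:
    """Seconds for W_cubic to climb from cwnd_epoch back up to w_max."""
    return ((w_max - cwnd_epoch) / C) ** (1.0 / 3.0)


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """CUBIC target window t seconds after the current epoch started."""
    k = cubic_k(w_max, cwnd_epoch)
    return C * (t - k) ** 3 + w_max


W_MAX = 100.0                    # hypothetical window at the last loss
CWND_EPOCH = W_MAX * BETA_CUBIC  # window right after the multiplicative decrease
ACTIVE_S = 1.0                   # hypothetical time actually spent sending
IDLE_S = 20.0                    # hypothetical idle gap before the app resumes

# Bug pattern: the epoch is reset only on loss, so idle time counts toward t.
inflated = w_cubic(ACTIVE_S + IDLE_S, W_MAX, CWND_EPOCH)
# Excluding idle time by shifting the epoch start forward by IDLE_S.
shifted = w_cubic(ACTIVE_S, W_MAX, CWND_EPOCH)
print(f"idle counted: {inflated:.0f} segments; idle excluded: {shifted:.0f} segments")
```

The cubic term grows with the cube of elapsed time. As a result, counting the 20 s gap pushes the target from just below W_max to roughly 2,000 segments. Excluding the gap keeps the window on the concave part of the curve, still below W_max.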
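The summary does not show the final guard itself. For comparison, RFC 9002's reference pseudocode (Appendix B) uses a related kind of guard. If a lost packet was sent before the current recovery period began, that loss does not trigger another window reduction. The sketch below is a hypothetical controller that applies this rule to a burst of 20 losses. The names `Controller` and `simulate`, the 1350-byte datagram size, and the loss timings are all invented for illustration, and this is not the actual patch. Without the guard, each loss counts as a new congestion event and cwnd collapses to the floor.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DATAGRAM_SIZE = 1350            # assumption: 2 x 1350 matches the 2700-byte floor
MIN_WINDOW = 2 * MAX_DATAGRAM_SIZE  # RFC 9002 recommended minimum window
LOSS_REDUCTION = 0.7                # CUBIC multiplicative decrease (beta)


@dataclass
class Controller:
    """Hypothetical loss handler; not quiche's or Linux's actual code."""
    cwnd: int
    use_guard: bool
    recovery_start: Optional[float] = None

    def in_recovery(self, sent_time: float) -> bool:
        # A packet sent before recovery began belongs to the same congestion event.
        return self.recovery_start is not None and sent_time <= self.recovery_start

    def on_packet_lost(self, sent_time: float, now: float) -> None:
        if self.use_guard and self.in_recovery(sent_time):
            return  # already reduced cwnd for this congestion event
        self.recovery_start = now
        self.cwnd = max(int(self.cwnd * LOSS_REDUCTION), MIN_WINDOW)


def simulate(use_guard: bool) -> int:
    """Return cwnd after a hypothetical burst of 20 losses."""
    ctrl = Controller(cwnd=100 * MAX_DATAGRAM_SIZE, use_guard=use_guard)
    # Packets sent at t = 0.00..0.19 s are declared lost at t = 0.30..0.49 s.
    for i in range(20):
        ctrl.on_packet_lost(sent_time=i * 0.01, now=0.30 + i * 0.01)
    return ctrl.cwnd


print(simulate(False), simulate(True))
```

`simulate(False)` ends at the 2700-byte floor. `simulate(True)` applies a single reduction for the whole burst and leaves room for cwnd to grow once the losses stop.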
