[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-fomoe-trains-huge-models-across-data-centers-without-full-copies":10,"sections":41},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":30,"tags":31,"sources":36,"feedback":40,"feedback_at":22,"cost_usd":40,"total_tokens":40},1653,"fomoe-trains-huge-models-across-data-centers-without-full-copies","FoMoE Trains Huge Models Across Data Centers Without Full Copies","A new system called FoMoE splits expert layers across sites, cutting communication costs and memory demands for distributed LLM pre-training.","Researchers say they have found a way to train massive language models across geographically separated data centers without keeping a full copy of the model at every site.\n\nPre-training large language models typically requires tightly coupled, high-speed hardware inside a single data center. Mixture-of-Experts architectures eased the compute burden by decoupling parameter count from active computation, but they still hit a wall: existing distributed approaches like DiLoCo and Photon need a complete model replica at each node, which turns memory and bandwidth into hard limits. FoMoE breaks that constraint by partitioning expert layers across workers rather than duplicating them everywhere. The paper reports up to a 1.42x reduction in communication costs over those efficient baselines, and up to 45.44x over standard distributed data-parallel training — a figure that reflects a much lower bar, since DDP does not apply any of the same optimizations.\n\nThe practical stakes are real: whoever can train frontier models on loosely connected, commodity infrastructure has a large cost advantage over labs that depend on dense GPU clusters. FoMoE also claims up to 1.4x throughput gains via a skip-token mechanism, and the authors project the memory and communication benefits to 100-billion-parameter scale through system modeling — though those larger numbers are projections, not measured results.\n\nFoMoE follows a line of work — DiLoCo from Google DeepMind, Photon shortly after — that treats geographic distribution as a first-class training constraint rather than an afterthought. If the 100B projections hold up in practice, the gap between what a well-funded lab and a well-organized coalition of smaller operators can train may narrow considerably.","[\"machine learning\",\"distributed training\",\"mixture of experts\",\"llm\"]","2026-06-18T04:00:00.000Z","2026-06-19T09:13:38.331Z","2026-06-19T09:13:39.922Z","published",null,[24],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"The article misrepresents the 1.42x communication reduction as the headline figure against 'efficient baselines' — the 45.44x is vs. standard DDP, not vs. DiLoCo\u002FPhoton, and the draft implies the latter; clarify the correct baseline for each figure to avoid misleading readers.","resolved","ai",[32,33,34,35],"machine learning","distributed training","mixture of experts","llm",[37],{"name":38,"url":39},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.19025",0,{"sections":42},[43,47,51,56,61,66,71,76,80,84,89,94,99,104],{"name":44,"slug":30,"count":45,"latest_published_at":46},"AI",490,"2026-06-19T04:00:00.000Z",{"name":48,"slug":49,"count":50,"latest_published_at":46},"Security","security",132,{"name":52,"slug":53,"count":54,"latest_published_at":55},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":57,"slug":58,"count":59,"latest_published_at":60},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":62,"slug":63,"count":64,"latest_published_at":65},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":67,"slug":68,"count":69,"latest_published_at":70},"Software","software",58,"2026-06-16T20:00:00.000Z",{"name":72,"slug":73,"count":74,"latest_published_at":75},"Deals","deals",56,"2026-06-19T12:30:04.000Z",{"name":77,"slug":78,"count":79,"latest_published_at":46},"Dev Tools","dev-tools",50,{"name":81,"slug":82,"count":83,"latest_published_at":18},"Science","science",38,{"name":85,"slug":86,"count":87,"latest_published_at":88},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":90,"slug":91,"count":92,"latest_published_at":93},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":95,"slug":96,"count":97,"latest_published_at":98},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":100,"slug":101,"count":102,"latest_published_at":103},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":105,"slug":106,"count":107,"latest_published_at":108},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]