[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-why-ai-still-flunks-turning-physics-papers-into-working-code":10,"sections":45},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":35,"tags":36,"sources":40,"feedback":44,"feedback_at":22,"cost_usd":44,"total_tokens":44},1606,"why-ai-still-flunks-turning-physics-papers-into-working-code","Why AI still flunks turning physics papers into working code","Forcing a paper's unwritten conventions into an explicit spec made quantum simulation code reproducible, but a weak model still flunked a strong one's spec.","Getting an AI to turn a physics paper into working code still fails the moment the paper leaves something unsaid.\n\nA revised preprint argues the hard part of AI-assisted scientific coding isn't the equations — it's the conventions papers never bother to print: index choices, gauge fixing, fermionic sign rules, contraction order, and the checks that tell you the answer is right. The authors call making these explicit \"knowledge externalization,\" and they write it all into a specification before any code is generated. As calibration they used DMRG — the density matrix renormalization group, a workhorse method for simulating one-dimensional quantum systems — drawn from a well-known pedagogical review. Spec-guided runs passed in all 16 model pairings, versus 6 of 13 for direct paper-to-code attempts, and a prose-only version worked just as well, showing the content mattered, not the LaTeX. The stress test was nastier: converting Hartree-Fock-Bogoliubov states into matrix product states, a compact representation of quantum states, from a five-page letter with no public implementation. There the workflow logged 11 of 26 audited passes; direct prompting got none.\n\nThis matters because computational physics runs on code that often never ships with the paper, leaving published results no one outside the group can rerun. Externalizing the unwritten rules makes a textbook algorithm reproducible and a research-grade one auditable, which is a real gain for a field where the working code is often the missing half of the proof. But the same experiment marks a hard ceiling.\n\nCross-model results were lopsided: GPT 5.5 implemented other models' specs 4 out of 4 times, while weaker models failed GPT 5.5's specs 4 out of 4. A better spec, in other words, can't rescue a weaker coder. The takeaway is almost mundane — write down what you know — paired with a less comfortable one: prompting discipline has limits, and past them you simply need a more capable model.","[\"ai\",\"scientific-computing\",\"code-generation\",\"reproducibility\"]","2026-06-18T04:00:00.000Z","2026-06-19T05:38:09.399Z","2026-06-19T05:38:12.397Z","published",null,[24,30],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"publisher-r1","publisher",1,"The DMRG comparison cites mismatched denominators for what reads as the same set of model pairings (spec-guided 'all 16' vs direct prompting '6 of 13'), an internal numerical inconsistency that must be reconciled before publishing.","resolved",{"id":31,"reviewer":32,"round":33,"reason":34,"status":29},"editor-r2","editor",2,"Strong, accurate draft but it runs only ~250 words, under the 300-word minimum — expand it past the threshold with genuine context (briefly explain what DMRG and HFB-to-MPS conversion are and why reproducible physics code matters) rather than padding.","ai",[35,37,38,39],"scientific-computing","code-generation","reproducibility",[41],{"name":42,"url":43},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.04089",0,{"sections":46},[47,51,56,61,66,71,76,81,86,90,95,100,105,110],{"name":48,"slug":35,"count":49,"latest_published_at":50},"AI",385,"2026-06-19T04:00:00.000Z",{"name":52,"slug":53,"count":54,"latest_published_at":55},"Security","security",129,"2026-06-19T00:05:00.000Z",{"name":57,"slug":58,"count":59,"latest_published_at":60},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":62,"slug":63,"count":64,"latest_published_at":65},"Consumer Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":67,"slug":68,"count":69,"latest_published_at":70},"Hardware","hardware",62,"2026-06-18T15:24:16.000Z",{"name":72,"slug":73,"count":74,"latest_published_at":75},"Software","software",58,"2026-06-16T20:00:00.000Z",{"name":77,"slug":78,"count":79,"latest_published_at":80},"Deals","deals",54,"2026-06-16T15:26:40.000Z",{"name":82,"slug":83,"count":84,"latest_published_at":85},"Dev Tools","dev-tools",49,"2026-06-16T04:00:00.000Z",{"name":87,"slug":88,"count":89,"latest_published_at":18},"Science","science",38,{"name":91,"slug":92,"count":93,"latest_published_at":94},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":96,"slug":97,"count":98,"latest_published_at":99},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":101,"slug":102,"count":103,"latest_published_at":104},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":106,"slug":107,"count":108,"latest_published_at":109},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":111,"slug":112,"count":113,"latest_published_at":114},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]