[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"branding":3,"analytics":7,"article-rlhf-sharpens-pepper-robots-co-speech-gestures":10,"sections":47},{"siteName":4,"siteTagline":5,"publisherName":4,"contactEmail":6},"The Revision","Tech news, decoded.","editor@therevision.news",{"gaMeasurementId":8,"adsenseClientId":9},"G-ZW2MV82GYR","ca-pub-8533917693782264",{"article":11},{"id":12,"slug":13,"title":14,"dek":15,"body_md":16,"tags_json":17,"published_at":18,"created_at":19,"updated_at":20,"status":21,"review_note":22,"review_notes":23,"image_url":22,"persona_id":22,"persona_name":22,"section":38,"tags":39,"sources":42,"feedback":46,"feedback_at":22,"cost_usd":46,"total_tokens":46},1558,"rlhf-sharpens-pepper-robots-co-speech-gestures","RLHF sharpens Pepper robot’s co-speech gestures","Iterative reinforcement learning with human feedback makes the Pepper robot’s generated gestures noticeably more natural and expressive.","# New system teaches Pepper to gesture like a person\n\nResearchers integrated ChatGPT with the Pepper humanoid to produce on‑the‑fly co‑speech gestures, then refined the output through an iterative reinforcement‑learning‑with‑human‑feedback loop. The baseline system could translate spoken text into motion code, but the resulting arm swings and hand waves looked robotic. Over a series of user studies, participants rated each gesture on naturalness, relevance, and fluidity. Those ratings fed back into the RL algorithm, which adjusted the motion parameters and prompted ChatGPT to generate revised code. After several rounds, the system consistently outperformed the original baseline across all three metrics.\n\nWhy it matters: Current robot gesture pipelines rely on hand‑crafted animation libraries that are costly to expand and brittle in unfamiliar settings. By leveraging a large language model for code generation and closing the loop with real‑world human judgments, the team demonstrates a scalable path to adaptable, socially aware motion. 
The approach also sidesteps the classic trade‑off where more degrees of freedom make learning harder: the LLM supplies plausible motion skeletons, and the RL step polishes them instead of learning raw joint trajectories from scratch. If the method generalises, manufacturers could outfit service robots, museum guides, or home assistants with gesture repertoires that evolve after deployment, improving acceptance without endless manual tweaking.\n\nThe study is a modest but concrete step toward robots that converse with bodies as well as voices. It shows that combining language‑model code synthesis with human‑in‑the‑loop reinforcement can bridge the gap between flexibility and naturalness, a gap that has limited social robots to scripted demos. Future work will need to test the pipeline on robots with different kinematics and in noisier, multi‑user environments, but the current results suggest the concept may be viable beyond Pepper’s limited platform.\n\nIn short, the RLHF‑enhanced system turns a generic LLM into a gesture‑crafting co‑pilot, producing motions that users rate as more natural than the baseline’s. 
That may be the quiet catalyst that nudges social robots from novelty acts to everyday companions.","[\"ai\",\"human-robot-interaction\",\"robotics\"]","2026-06-18T04:00:00.000Z","2026-06-19T00:55:21.401Z","2026-06-19T00:55:24.242Z","published",null,[24,30,34],{"id":25,"reviewer":26,"round":27,"reason":28,"status":29},"editor-r1","editor",1,"The draft is under 300 words, lacks a clear introductory paragraph and concluding summary, and reads like a list of facts without enough narrative flow.","resolved",{"id":31,"reviewer":26,"round":32,"reason":33,"status":29},"editor-r2",2,"The article is under 300 words, lacks a distinct concluding paragraph, and reads more like a list of findings than a cohesive narrative; expand it with a clear intro, smoother flow, and a summary that ties the significance together.",{"id":35,"reviewer":26,"round":36,"reason":37,"status":29},"editor-r3",3,"Expand the piece to at least 300 words, add a clear introductory paragraph that sets context and why it matters, and include a concluding paragraph that summarizes the findings and their significance.","ai",[38,40,41],"human-robot-interaction","robotics",[43],{"name":44,"url":45},"arXiv cs.AI","https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.18747",0,{"sections":48},[49,52,57,62,67,72,77,82,87,91,96,101,106,111],{"name":50,"slug":38,"count":51,"latest_published_at":18},"AI",303,{"name":53,"slug":54,"count":55,"latest_published_at":56},"Security","security",123,"2026-06-17T19:54:31.000Z",{"name":58,"slug":59,"count":60,"latest_published_at":61},"Policy","policy",88,"2026-06-16T09:26:09.000Z",{"name":63,"slug":64,"count":65,"latest_published_at":66},"Consumer 
Tech","consumer-tech",78,"2026-06-16T17:58:24.000Z",{"name":68,"slug":69,"count":70,"latest_published_at":71},"Hardware","hardware",61,"2026-06-18T15:24:16.000Z",{"name":73,"slug":74,"count":75,"latest_published_at":76},"Software","software",58,"2026-06-16T20:00:00.000Z",{"name":78,"slug":79,"count":80,"latest_published_at":81},"Deals","deals",54,"2026-06-16T15:26:40.000Z",{"name":83,"slug":84,"count":85,"latest_published_at":86},"Dev Tools","dev-tools",49,"2026-06-16T04:00:00.000Z",{"name":88,"slug":89,"count":90,"latest_published_at":18},"Science","science",38,{"name":92,"slug":93,"count":94,"latest_published_at":95},"Gaming","gaming",31,"2026-06-16T15:25:13.000Z",{"name":97,"slug":98,"count":99,"latest_published_at":100},"General","general",26,"2026-06-13T18:35:15.000Z",{"name":102,"slug":103,"count":104,"latest_published_at":105},"Startups","startups",23,"2026-06-16T15:00:00.000Z",{"name":107,"slug":108,"count":109,"latest_published_at":110},"Reviews","reviews",19,"2026-06-14T08:00:00.000Z",{"name":112,"slug":113,"count":114,"latest_published_at":115},"How-To","how-to",6,"2026-06-16T09:00:00.000Z"]