June 26, 2023

Meta AI Develops an Algorithm That Enables Robots to Learn Tasks from YouTube Videos

In Brief

Researchers have developed a visual affordance model using internet videos of human behavior to train robots to perform complex tasks.

This approach bridges the gap between static datasets and real-world robot applications.

The researchers use large-scale human video datasets like Ego4D and Epic Kitchens to extract affordances, integrating computer vision techniques with robotic manipulation.

The Vision-Robotics Bridge (VRB) concept showcases the potential of this approach, enabling robots to learn from human videos and acquire the skills necessary for complex tasks.

Meta AI unveiled a new algorithm that enables robots to learn and replicate human actions by watching YouTube videos. In a recent paper entitled “Affordances from Human Videos as a Versatile Representation for Robotics,” the authors explore how videos of human interactions can be leveraged to train robots to perform complex tasks.

This research aims to bridge the gap between static datasets and real-world robot applications. While previous models have shown success on static datasets, applying them directly to robots has remained a challenge. The researchers propose that training a visual affordance model on internet videos of human behavior could be a solution. This model estimates where and how a human is likely to interact in a scene, providing valuable information for robots.

The concept of “affordances” is central to this approach. Affordances refer to the potential actions or interactions an object or environment offers. By understanding affordances through human videos, the robot gains a versatile representation that enables it to perform various complex tasks. The researchers integrate their affordance model with four different robot learning paradigms: offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning.

To extract affordances, the researchers utilize large-scale human video datasets like Ego4D and Epic Kitchens. They employ off-the-shelf hand-object interaction detectors to identify the contact region and track the wrist’s trajectory after contact. However, an important challenge arises when the human is still present in the scene, causing a distribution shift. To address this, the researchers use available camera information to project the contact points and post-contact trajectory to a human-agnostic frame, which serves as input to their model.
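
As a rough illustration of the projection step described above, the sketch below maps 2D contact points and wrist positions into the pixel coordinates of a reference frame. It assumes, purely for illustration, that the camera motion between frames can be summarized by a 3x3 homography per frame. The function names and the homography assumption are not taken from the article, and the researchers' actual procedure may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, point: Point) -> Point:
    """Map a pixel (x, y) through a 3x3 homography, including the perspective divide."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def project_contacts(h_contact: Matrix3, contact_points: List[Point]) -> List[Point]:
    """Move contact points from the contact frame into the reference frame."""
    return [apply_homography(h_contact, p) for p in contact_points]


def project_wrist_track(
    homographies: List[Matrix3], wrist_points: List[Point]
) -> List[Point]:
    """Move each post-contact wrist position into the reference frame.

    Hypothetical convention: homographies[i] maps the frame in which
    wrist_points[i] was observed to the reference (human-agnostic) frame.
    """
    if len(homographies) != len(wrist_points):
        raise ValueError("need exactly one homography per wrist point")
    return [apply_homography(h, p) for h, p in zip(homographies, wrist_points)]
```

In this sketch the reference frame would be one in which the person is not visible, so the projected points can be paired with an image that resembles what a robot camera sees.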

Previously, robots could mimic actions, but only in the specific environments they had been trained in. With the latest algorithm, the researchers report significant progress in “generalizing” robot actions, so that robots can apply their acquired knowledge in new and unfamiliar environments. This achievement aligns with the vision of Artificial General Intelligence (AGI) advocated by AI researcher Yann LeCun.

The model takes the human-agnostic frame as input and produces two key outputs: a contact heatmap and wrist waypoints. The contact heatmap indicates the likely points of contact, while the wrist waypoints predict the trajectory after contact. These outputs can be directly used during inference time, leveraging sparse 3D information such as depth and robot kinematics. / Credit: robo-affordances.github.io
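
To make the inference step more concrete, here is a minimal sketch of how a contact heatmap and pixel-space waypoints could be lifted into 3D using a depth image. It assumes a standard pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`). The function names and data layout are hypothetical and not taken from the project's code, and the final conversion from the camera frame to the robot's frame (the kinematics part) is left out.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]              # (row, col)
Point3 = Tuple[float, float, float]  # (x, y, z) in the camera frame, metres


def most_likely_contact(heatmap: Sequence[Sequence[float]]) -> Pixel:
    """Return the (row, col) of the highest-scoring heatmap cell."""
    best_rc = (0, 0)
    best_val = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, val in enumerate(row):
            if val > best_val:
                best_rc, best_val = (r, c), val
    return best_rc


def deproject(u: float, v: float, depth_m: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pinhole back-projection of pixel (u, v) with known depth."""
    return ((u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m)


def pixels_to_camera_points(pixels_rc: List[Pixel],
                            depth_map: Sequence[Sequence[float]],
                            fx: float, fy: float,
                            cx: float, cy: float) -> List[Point3]:
    """Lift contact or waypoint pixels to 3D using a per-pixel depth map."""
    points = []
    for r, c in pixels_rc:
        # Image column is the horizontal coordinate u, row is the vertical coordinate v.
        points.append(deproject(c, r, depth_map[r][c], fx, fy, cx, cy))
    return points
```

A robot controller would then transform these camera-frame points into its own base frame and move the gripper to the contact point before following the waypoints.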

Meta AI is committed to advancing the field of computer vision and is planning to share its project’s code and dataset. This will enable other researchers and developers to further explore and build upon this technology. With increased access to the code and dataset, the development of self-learning robots capable of acquiring new skills from YouTube videos will continue to progress.

To evaluate the efficacy of their approach, the researchers conducted experiments across four real-world environments involving ten different tasks and two robotic platforms operating in the wild. The results demonstrated the seamless integration of computer vision techniques with robotic manipulation, showcasing the potential of their Vision-Robotics Bridge (VRB) concept. / Credit: robo-affordances.github.io

By leveraging the vast amount of online instructional videos, robots can become more versatile and adaptable in various environments.

About The Author

Damir Yalalov

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet.
