So far, we have experienced AI only as software. What can we expect when it meets physical technology as well?

Around six months ago, Jensen Huang, co-founder, president, and CEO of Nvidia Corporation, said that the next big reinvention will most likely happen where AI and the physical world converge.

The next generation of AI, where AI meets $100 trillion of the world’s industry would be in the physical world

Jensen Huang, co-founder, president and CEO of Nvidia Corporation

“The next generation of AI, where AI meets $100 trillion of the world’s industry would be in the physical world,” he said, pointing to transportation, robotic surgery, warehouse management, manufacturing, energy, and fabrication plants.

Recently, NVIDIA announced plans to grow its Jetson platform for edge AI and robotics, thereby integrating generative AI into the real world. According to the media, the tech giant’s vision of bringing AI into the physical world is coming to fruition faster than expected. 

Read more: Humans can’t define AI interference in creative processes as fast as AI is learning to mimic them 

“The next wave of AI is here. Robotics, powered by physical AI, will revolutionize industries,” Huang said recently at Computex 2024 in Taipei.

So far, robots haven’t really managed to impress us, mainly because they can’t do our housework for us. I feel that only if and when robots enter our households will we show them the same warmth we have shown GenAI.

It’s hard for robots to work reliably in homes, around children and pets, because our homes have varying floor plans and contain all sorts of messes. There are too many variations to learn.

MIT Technology Review’s Melissa Heikkilä points out Moravec’s paradox, a popular observation among roboticists that “what is hard for humans is easy for machines, and what is easy for humans is hard for machines.” But she adds that things are changing, and the reason is AI.

“Robots are starting to become capable of doing tasks such as folding laundry, cooking and unloading shopping baskets, which not too long ago were seen as almost impossible tasks,” she says. That’s good news for those of us stuck with these tedious house tasks.

AI is indeed helping build the brain part of robots. Because of AI, the focus is shifting from physical dexterity to building “general-purpose robot brains” in the form of neural networks. This means robots will learn from their environment, much as GenAI learns from data.

For example, last summer, Google launched a vision-language-action model called RT-2. The model learns both from the online text and images it is trained on and from its own interactions, and it translates that knowledge into robotic actions.
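For readers curious about the mechanics, the rough Python sketch below shows how such a vision-language-action loop is wired: a camera frame and a text instruction go in, discretized action tokens come out, and they are decoded into an arm command. The policy here is a toy stand-in under assumed dimensions, not Google’s actual model or API.

```python
import numpy as np

# Hypothetical sketch of a vision-language-action (VLA) control loop in the
# spirit of RT-2. The toy policy and the 7-DoF action layout are assumptions
# for illustration, not Google's published interface.

ACTION_BINS = np.linspace(-1.0, 1.0, 256)  # discretized action "tokens"

def vla_policy(image: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for a trained VLA model: maps an image + text instruction to
    token indices, then decodes them into a 7-DoF command
    (6 end-effector deltas + 1 gripper value)."""
    # A real model would run a vision-language transformer here; we derive
    # deterministic token indices from a hash so the sketch is runnable.
    seed = abs(hash((image.tobytes(), instruction))) % (2**32)
    rng = np.random.default_rng(seed)
    token_ids = rng.integers(0, len(ACTION_BINS), size=7)
    return ACTION_BINS[token_ids]  # decode tokens back to continuous values

def control_loop(steps: int = 5) -> None:
    for t in range(steps):
        image = np.zeros((224, 224, 3), dtype=np.uint8)  # camera frame stub
        action = vla_policy(image, "pick up the sponge")
        print(f"step {t}: dx,dy,dz,droll,dpitch,dyaw,grip = {np.round(action, 2)}")
        # a real system would send `action` to the robot controller here

if __name__ == "__main__":
    control_loop()
```

The point of the structure is that language and vision share one model, so new instructions do not require new task-specific code, only a new prompt.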

Tesla’s Optimus, which could only wave at the audience last year, can now pick up and sort objects, do yoga, and navigate its surroundings. The pace at which it is learning, thanks to AI, is astounding. The aim for the Tesla bot is a “bi-pedal, autonomous humanoid robot capable of performing unsafe, repetitive or boring tasks.”

In August last year, Carnegie Mellon and Meta released a robotics dataset that teaches an AI agent the way an infant learns. The agent learns by watching, imitating, and replaying, teaching itself toddler-level manipulation skills. One researcher likened the process to the way human babies learn. The teams say it is the largest publicly accessible robotics dataset collected on standard hardware.

Researchers at Oregon State University taught a humanoid robot called Digit V3 to stand, walk, lift a box, and move it from one place to another. Another group of researchers from the University of California, Berkeley, taught Digit to walk in unfamiliar environments while carrying different loads, without toppling over. Both groups used an AI technique called sim-to-real reinforcement learning, in which the robot is first trained in simulation and then transfers those skills to the real world. Researchers say this method will result in stronger, more reliable two-legged machines that can interact with their surroundings more safely while learning quickly.
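The sketch below illustrates the general sim-to-real recipe in toy form: a simple policy is trained entirely in a simulator whose physics (mass, friction) are randomized on every rollout, then checked against dynamics it never saw during training. The 1-D “push a box to a target” simulator and the evolution-strategies search are illustrative assumptions, not the Oregon State or Berkeley training code.

```python
import numpy as np

# Minimal sketch of sim-to-real reinforcement learning with domain
# randomization. Everything here (toy physics, linear policy, ES search)
# is an assumption made to keep the example runnable.

rng = np.random.default_rng(0)

def simulate(policy_w, mass, friction, steps=50):
    """Roll out a linear policy in a randomized toy simulator; return reward."""
    pos, vel, target = 0.0, 0.0, 1.0
    reward = 0.0
    for _ in range(steps):
        obs = np.array([pos - target, vel])
        force = float(policy_w @ obs)
        acc = (force - friction * vel) / mass
        vel += 0.05 * acc
        pos += 0.05 * vel
        reward -= abs(pos - target)  # closer to the target means higher reward
    return reward

def train(iters=200, pop=32, sigma=0.1):
    """Evolution-strategies search over policy weights, with the box mass and
    friction randomized every rollout (domain randomization)."""
    w = np.zeros(2)
    for _ in range(iters):
        noise = rng.normal(0.0, 1.0, size=(pop, 2))
        rewards = np.array([
            simulate(w + sigma * n,
                     mass=rng.uniform(0.5, 2.0),      # randomized dynamics
                     friction=rng.uniform(0.1, 1.0))
            for n in noise
        ])
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        w += 0.02 / (pop * sigma) * noise.T @ adv     # ES gradient estimate
    return w

if __name__ == "__main__":
    weights = train()
    # "Real-world" check: dynamics the policy never saw exactly in training.
    print("transfer reward:", round(simulate(weights, mass=1.3, friction=0.4), 2))
```

Randomizing the simulator is what lets the learned behavior survive the jump to hardware, because the policy can never overfit to one exact set of physics parameters.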

Last year, the Toyota Research Institute (TRI) announced a generative AI approach based on Diffusion Policy to efficiently teach robots new, dexterous skills. This development substantially upgrades robot utility and leans toward building “Large Behavior Models (LBMs)” for robots, corresponding to the Large Language Models (LLMs) that have recently revolutionized conversational AI.
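Conceptually, a Diffusion Policy generates actions the way image diffusion models generate pictures: it starts from noise and iteratively denoises a short trajectory of future actions, conditioned on the current observation. The sketch below shows that sampling loop with a placeholder denoiser; TRI’s real networks are learned from human demonstrations, and the dimensions and schedule here are assumptions.

```python
import numpy as np

# Conceptual sketch of Diffusion Policy action sampling. The stub
# `predict_noise` stands in for a trained denoising network; the horizon,
# action dimension, and step count are illustrative choices.

HORIZON, ACTION_DIM, STEPS = 16, 7, 20   # predict 16 future 7-DoF actions

def predict_noise(actions, obs, t):
    """Placeholder for the learned denoiser epsilon_theta(actions, obs, t).
    Here it simply nudges actions toward an observation-dependent target so
    the loop runs end to end; a real model is trained on demonstration data."""
    target = np.tile(obs[:ACTION_DIM], (HORIZON, 1))
    return actions - target

def sample_actions(obs, rng):
    """Reverse-diffusion sampling loop (heavily simplified)."""
    actions = rng.normal(size=(HORIZON, ACTION_DIM))       # start from pure noise
    for t in reversed(range(STEPS)):
        eps = predict_noise(actions, obs, t)
        actions = actions - (1.0 / STEPS) * eps            # small denoising step
        if t > 0:
            actions += 0.01 * rng.normal(size=actions.shape)  # keep some noise
    return actions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    observation = np.linspace(0.0, 1.0, 10)  # stand-in for camera/proprioception features
    plan = sample_actions(observation, rng)
    print("first planned action:", np.round(plan[0], 2))
```

Because the skill lives in the learned denoiser rather than in hand-written control logic, teaching a new behavior means collecting new demonstrations, not writing new code, which is the point TRI emphasizes.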

With this new approach, TRI has already taught robots over 60 difficult, dexterous skills, including pouring liquids, using tools, and manipulating objects. The amazing part: not a single line of code was written. TRI aims to teach robots 1,000 new skills by the end of 2024.

Covariant, a robotics startup, has also built a multimodal model called RFM-1 that uses GenAI to act on prompts in the form of text, images, video, robot instructions, or measurements.
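In other words, every modality is turned into tokens that one sequence model can read. The toy sketch below illustrates that idea; the tokenizers and vocabulary are crude placeholders, not Covariant’s actual encoders.

```python
import numpy as np

# Rough sketch of the multimodal-prompt idea behind a model like RFM-1:
# text, images, and sensor measurements are mapped into one shared token
# sequence. All tokenizers below are illustrative stand-ins.

VOCAB = 1024

def tokenize_text(text: str) -> list[int]:
    return [hash(w) % VOCAB for w in text.lower().split()]

def tokenize_image(image: np.ndarray, patches: int = 4) -> list[int]:
    # Quantize coarse patch averages into token ids (a crude stand-in for a
    # learned visual tokenizer).
    h = image.shape[0] // patches
    return [int(image[i * h:(i + 1) * h].mean()) % VOCAB for i in range(patches)]

def tokenize_measurement(values: list[float]) -> list[int]:
    return [int((v % 1.0) * VOCAB) for v in values]

def build_prompt(text, image, measurements) -> list[int]:
    """Concatenate modality streams, separated by a reserved token id."""
    SEP = VOCAB
    return (tokenize_text(text) + [SEP] +
            tokenize_image(image) + [SEP] +
            tokenize_measurement(measurements))

if __name__ == "__main__":
    img = np.full((64, 64), 128, dtype=np.uint8)
    prompt = build_prompt("pick the red tote", img, [0.25, 0.75])
    print("prompt length:", len(prompt), "first tokens:", prompt[:6])
```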

Read more: Going neural: Will function creep pollute BCI technology for humans?

Last year, Northwestern University researchers developed an AI that rapidly designs functioning robots, compressing evolution’s billions of years into seconds. Running on a lightweight PC, it generates novel designs without human bias or labeled data. The lead researcher previously made headlines for developing xenobots, the first living robots made entirely from biological cells.

As robots enter the GenAI era of LLMs and LBMs, they’re likely to wow us very soon with unprecedented abilities. The day might not be far when a C-3PO or R2-D2 butler assists us with cooking or folding laundry. Not many will complain if that becomes a reality. I know I won’t.

But while that paints a rosy picture of the future, it also opens the floodgates to ethical concerns around privacy, data security, and job displacement.
