Robots are getting smarter: what has changed?

2024-07-28 09:15:00, Tech CNA

Artificial Intelligence is not just about ChatGPT, writes The Economist

"What do you see now?" a man asks a humanoid robot standing behind a desk. "I see a red apple on a plate in the center of the table, a tray of glasses and an empty plate, and you standing nearby with your hand on the table," the robot replies.

"Excellent. Can I have something to eat?" says the man. "Of course," replies the robot.

The robot bends its arms, picks up the apple with its right hand, passes it to its left hand, and then hands it to the man. The man asks the robot to explain its actions.

"I gave you the apple because it is the only edible item I can offer you from the table," the robot replies.

This video, published in March by the robotics startup Figure, caused great surprise. It shows a robot that can talk, recognize and move objects, solve problems and explain its actions.

Figure is not the only firm to have produced such impressive results. After several years of slow progress, robots suddenly seem to be getting a lot smarter.

What changed?

The magic ingredient is Artificial Intelligence. Academic researchers, startups and tech giants are taking advances in Artificial Intelligence, such as large language models (LLMs), speech synthesis and image recognition, and applying them to robotics.

Large language models can power chatbots like ChatGPT, but they can also power real robots. "Algorithms can transform," says Peter Chen, chief executive of Covariant, a startup based in Emeryville, California. "This is powering the renaissance of robotics."

The robot in the video had speech and recognition capabilities powered by artificial intelligence firm OpenAI, which is an investor in Figure. OpenAI shut down its robotics unit around 2020, preferring to invest in Figure and other startups.

But OpenAI has now changed its mind, and last month it started building a new robotics team. This is a sign that sentiment is beginning to shift.

A key step towards the implementation of Artificial Intelligence in robots was the development of "multimodal" models. These are Artificial Intelligence models trained on different types of data.

For example, a language model is trained on large amounts of text, while "vision language models" are also trained on images (still or moving) paired with their corresponding text descriptions.

Thus, such models learn the relationship between the two, allowing them to answer questions about what is happening in a photo or video, or create new images based on text prompts.

VLAM robotic models

New models used in robotics take this idea a step further. These "vision-language-action models" (VLAMs) receive text and images, plus data about the robot's presence in the physical world, including readings from internal sensors, the rotation rates of various joints, and the positions of moving parts (such as a robot's fingers).

Consequently, these models can answer questions about a particular scene, for example, the question "can you see an apple?" But they can also predict how a robotic arm should move to pick up that apple, and how that will affect the surrounding environment.
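The inputs and outputs described above can be pictured with a small Python sketch. It is illustrative only and does not represent any company's actual model: the class names, the field names and the `toy_policy` function are all hypothetical, and joint readings are assumed here to be angles in degrees. The stand-in "model" simply steps each joint toward a pose that is supplied by hand, whereas a real VLAM would infer that pose from the image and the instruction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One time step of multimodal input (all field names are hypothetical)."""
    instruction: str           # text, e.g. "pick up the apple"
    image: List[List[int]]     # camera frame as rows of pixel intensities
    joint_angles: List[float]  # joint readings from internal sensors, in degrees
    gripper_open: bool         # state of the robot's fingers


@dataclass
class Action:
    """What the model hands back to the robot's controllers."""
    joint_deltas: List[float]  # how far to move each joint, in degrees
    close_gripper: bool


def toy_policy(obs: Observation, target_angles: List[float],
               max_step: float = 5.0) -> Action:
    """Placeholder for a trained model: step each joint toward a target pose.

    A real model would infer the target from the image and instruction;
    here it is passed in by hand so the example stays runnable.
    """
    deltas = []
    for current, target in zip(obs.joint_angles, target_angles):
        error = target - current
        # Clamp the step so no joint moves more than max_step per time step.
        deltas.append(max(-max_step, min(max_step, error)))
    # Close the gripper only once every joint has reached its target.
    at_target = all(abs(t - c) < 1e-9
                    for c, t in zip(obs.joint_angles, target_angles))
    return Action(joint_deltas=deltas,
                  close_gripper=at_target and obs.gripper_open)
```

Calling `toy_policy` repeatedly, feeding each new sensor reading back in as the next observation, gives the perceive-then-act loop that such models run on a real robot.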

In other words, a VLAM model can act as a "brain" for robots with all kinds of bodies, whether giant stationary arms in factories or warehouses, or mobile robots with legs or wheels.

And unlike large language models, which use only text, VLAMs must reconcile several different kinds of data from the outside world, whether in the form of text, images or sensor readings. This grounding in the real world greatly reduces the chances of hallucinations (the tendency of large language models to make things up or make mistakes).

Covariant, Dr. Chen's company, has created a model called RFM-1, which has been trained using text, images and data from more than 30 types of robots.

Its software is also used by automated robots in warehouses and distribution centers located in suburban areas where land is cheap but labor is scarce. Covariant does not manufacture any of the robotic hardware itself; instead, its software is used to improve the robots' "brains".

"We predict that the intelligence of the robots will improve as the speed of the software increases, because we have made much more data available for the robot to learn from," says Dr. Chen.

Using these new models to control robots has several advantages over previous approaches, says Marc Tuscher, co-founder of Stuttgart-based robotics startup Sereact.

One of the benefits is "zero-shot" learning, which in tech jargon means the ability to do something new, such as follow the command to "pick up the yellow fruit," without being explicitly trained to do so.

The multimodal nature of VLAMs has given robots far more knowledge about the outside world, such as knowing that bananas are a type of fruit and that they are yellow.

Chat with robots

Another benefit of the new models is "in-context learning," which in tech parlance means the ability to change a robot's behavior using text prompts instead of elaborate reprogramming.

Dr. Tuscher gives the example of a warehouse robot that was programmed to sort packages, but became confused when open boxes were mistakenly placed into the system. Previously, getting the robot to ignore these boxes would have required reprogramming it.

"Whereas now it's enough to give it a text command to ignore open boxes and the robot only selects closed boxes," says Dr. Tuscher. "We can change the behavior of our robot just by giving it an order, which is crazy."

Robots can be programmed by lay people, using ordinary language, instead of computer code.

Such models can also answer questions about what they did. "When the robot makes a mistake, you can ask it and it responds in text form," says Dr. Chen. This is useful for debugging, because new instructions can then be given that change the robot's behavior, says Dr. Tuscher.

"You can tell the robot, 'this is wrong, please do it differently in the future.'" This makes working with robots easier for people who are not specialists.

Being able to ask a robot what it's doing and why is particularly useful in the field of self-driving cars, which are essentially a form of robot. Wayve, an autonomous vehicle startup based in London, has created a VLAM model called Lingo-2.

In addition to controlling the car, the model can understand text commands and explain the reasoning behind each decision.

"It can give explanations while driving and allow us to correct mistakes, give instructions to the system, or change its behavior to follow a certain style," says Alex Kendall, co-founder of Wayve.

For example, he says, the model can be asked what the speed limit is and what environmental cues (such as road signs) the car used to arrive at its answer. "We can check what kind of context the robot can understand and what it can see," he says.

As with other forms of Artificial Intelligence, access to large amounts of training data is of critical importance.

Covariant, founded in 2017, has been collecting data for many years, which it used to train the RFM-1 robotic model. Robots can also be guided manually through a certain task several times, and the model can then generalize from the resulting data. This process is known as "imitation learning".

But this is not the only possibility. A clever research project at Stanford University, called Mobile Aloha, generated data to teach a robot simple household tasks like making coffee using a process known as teleoperation, in which a person controls the robot directly.

The researchers stood behind the robot and moved its limbs directly, enabling it to sense, learn and then repeat a set of actions. They say this approach "allows humans to teach robots arbitrary skills."
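The idea of learning from recorded demonstrations can be illustrated with behavior cloning, one simple form of imitation learning. The sketch below is a toy under stated assumptions and is not the method used by Covariant or the Stanford team: each demonstration is reduced to a single number observed and a single number chosen by the human operator, and the hypothetical `fit_linear_policy` function fits a straight line through those pairs by ordinary least squares.

```python
from typing import List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit action = w * observation + b to (observation, action) pairs.

    Uses the closed-form least-squares solution for a single input variable.
    """
    n = len(demos)
    if n < 2:
        raise ValueError("need at least two demonstrations")
    mean_obs = sum(obs for obs, _ in demos) / n
    mean_act = sum(act for _, act in demos) / n
    var_obs = sum((obs - mean_obs) ** 2 for obs, _ in demos)
    if var_obs == 0:
        raise ValueError("observations must not all be identical")
    cov = sum((obs - mean_obs) * (act - mean_act) for obs, act in demos)
    w = cov / var_obs
    b = mean_act - w * mean_obs
    return w, b


def act(policy: Tuple[float, float], observation: float) -> float:
    """Apply the learned policy to a new, unseen observation."""
    w, b = policy
    return w * observation + b


# Hypothetical recordings: what the operator did at each observed value.
demonstrations = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
policy = fit_linear_policy(demonstrations)
```

Once fitted, `act(policy, 4.0)` predicts what the operator would have done for an observation that never appeared in the recordings. Real systems replace the straight line with a neural network and the single number with camera images and joint readings.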

Investors are showing interest. Chelsea Finn, the Stanford professor who oversaw the Mobile Aloha project, is also one of the co-founders of Physical Intelligence, a startup that recently raised $70 million from backers including OpenAI.

Skild, a robotics startup spun out of Carnegie Mellon University, is believed to have raised $300 million in April. Figure, which focuses on humanoid robots, raised $675 million in February.

Wayve raised $1.05 billion in May, the largest funding round ever for a European AI startup.

Dr. Kendall of Wayve says that the growing interest in robots reflects the development of Artificial Intelligence, as progress in Artificial Intelligence software is increasingly applied to hardware that interacts with the real world.

"Artificial Intelligence is much more than chatbots," he says. "In a few decades, when people think about Artificial Intelligence, the first thing that will come to mind will be robots that interact with people."

While software for robotic devices is improving, hardware is now becoming the limiting factor, researchers say, especially when it comes to humanoid robots. "But in terms of robot brains," says Dr. Chen, "we are making very rapid progress in intelligence." / Monitor.al