Can machines learn to walk and act in the real world the way children do? Starting from scratch, watching, copying, trying, failing, retrying? Learning to move is not as easy as learning to think – but the robots are getting closer.
MYON WOULD BE A 6TH GRADER BY NOW. It was built in 2011 by Manfred Hild, a professor of neurorobotics in Berlin.1 A humanoid-looking robot, Myon was specifically designed for – nothing. It had no special skills, but with 200 sensors, 50 motors and lots of limbs and joints, the machine was supposed to develop itself through observation of its environment.
“Myon has no goal,” said its creator Hild. “But we do have a goal: to understand things.” In this case, to understand learning. “Intelligence does not come overnight. Not in children, and not in robots either. How long does it take for children to stand, to walk, to talk?”
So, instead of a program, the new robot was simply equipped with some rules to follow. For example, follow conspicuous signals. Or, once you’ve decided on something, stick with it for a while. And it was given a rather childlike design, with one oversized eye in its head. The cuddle factor was supposed to lower the communication barrier and increase the patience of interacting humans. Look, it’s a child; it’s still learning.
Its moment of glory came in the summer of 2015: Myon became an actor at the opera. At the Komische Oper Berlin, the Robaby played a leading role in the show “My Square Lady.” A combination of video recordings and stage action allowed the audience to participate in the project and witness Myon’s first successes, as well as its setbacks.
Mainly the latter. The “follow the signal” rule, for example, caused Myon to turn its head to the loudspeakers, not to the singer. We know that a soprano’s voice belongs to the person currently moving on stage, but the robot attributed it to the place the sound was coming from.
And Myon’s performance hasn’t improved much since then, as Benjamin Panreck, one of its co-creators, admits.4 The robot needs intensive training to learn specific actions, and shows little sign of being able to transfer a once-learned lesson to a slightly altered situation. Today Myon is often used to teach students at Professor Hild’s neurorobotics lab how to train robots, but its own learning curve remains shallow at best.

ROBOTS CAN LEARN THROUGH INCENTIVES

But wait a minute, don’t we live in the Age of Machine Learning? Yes, we do. It started 25 years ago, when Jürgen Schmidhuber and Sepp Hochreiter published their groundbreaking work on Long Short-Term Memory (LSTM; see box page 3).5 They found a way for neural networks to “forget” certain outcomes and focus on the right ones – a rough sketch of the mechanism follows below. With sufficient training data, these recurrent neural networks can produce astonishing results for tasks like speech recognition or machine translation. And they can do it almost autonomously, with relatively little human training effort involved.

Autonomously learning neural networks have become quite common. Autonomous robots, such as self-driving cars, are already a familiar concept. But autonomously learning robots remain only an aspiration. Robots that learn how to move and to act in the real world still rely heavily on human intervention.
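To make that forgetting idea a little more concrete, here is a minimal sketch of a single LSTM step in Python. It is illustrative only: the variable names, sizes and random weights are our own assumptions, not taken from the original paper. The point is the forget gate, which scales down old memory that is no longer useful.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    """One step of a single LSTM cell (illustrative, simplified).

    The forget gate f decides how much of the old memory c_prev to keep,
    so the network can drop outcomes that turned out to be irrelevant.
    """
    z = np.concatenate([h_prev, x])            # previous state plus new input
    f = sigmoid(Wf @ z + bf)                   # forget gate: 0 = drop, 1 = keep
    i = sigmoid(Wi @ z + bi)                   # input gate: how much new info to store
    o = sigmoid(Wo @ z + bo)                   # output gate
    c = f * c_prev + i * np.tanh(Wc @ z + bc)  # updated cell memory
    h = o * np.tanh(c)                         # new hidden state
    return h, c

# Tiny usage example with random weights (purely illustrative).
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = lambda: rng.normal(size=(n_hidden, n_hidden + n_in))
b = lambda: np.zeros(n_hidden)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden),
                 W(), W(), W(), W(), b(), b(), b(), b())
```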
So how do robots learn? The main robot education principle is one they share with children and babies: learning through incentives. The robot first behaves in random ways and then evaluates how well these behaviors have worked. That can be done via feedback from an instructor, who tells the robot whether its actions were effective or not. The robot chooses the behavior that earns it the highest reward, and then moves on to the next iteration: it applies a number of random variations to the chosen behavior and determines by trial and error which of the new behaviors is now the most successful, and so on.
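As a toy illustration of that loop – and not how Myon or any particular robot is actually trained – the following Python sketch “learns” a vector of behavior parameters by trying random variations and keeping whichever one the reward signal favors. The reward function, the target gait and all parameter names are invented for the example.

```python
import numpy as np

def learn_by_incentives(evaluate, n_params=8, n_variants=10, n_rounds=50,
                        noise=0.1, seed=0):
    """Toy version of learning through incentives: start with a random
    behavior, try noisy variations, keep whichever earns the most reward."""
    rng = np.random.default_rng(seed)
    behavior = rng.normal(size=n_params)          # initial random behavior
    best_reward = evaluate(behavior)
    for _ in range(n_rounds):
        variants = behavior + noise * rng.normal(size=(n_variants, n_params))
        rewards = np.array([evaluate(v) for v in variants])
        if rewards.max() > best_reward:           # keep the best-rewarded variant
            best_reward = rewards.max()
            behavior = variants[rewards.argmax()]
    return behavior, best_reward

# Hypothetical reward: behaviors closer to a made-up "target gait" score higher.
target_gait = np.linspace(-1.0, 1.0, 8)
reward = lambda b: -float(np.sum((b - target_gait) ** 2))
best_behavior, best_score = learn_by_incentives(reward)
```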
This method is called “reinforcement learning,”7 and as long as you stay in the pure world of data, it’s not that different from the “forget gate” in LSTM networks. But robots by definition come into contact with a physical world beyond pure data, and there things become different – and more complex. Take a robot that is learning to walk. In many cases the outcome is a fall, usually an undesired one. And then the robot should not simply forget the trial, but remember it so as not to repeat it – like a child that touches a hot plate for the first time. So robo-learning through incentives can go both ways, reward and punishment.

And speaking of falling, that’s costly. Every time the robot falls down or wanders out of its training environment, it needs someone to pick it up and set it back on track. That’s a lot of manpower, because robots need a lot of training. It also poses a risk of damaging the robot. You may try to construct a robust robot, but to be able to walk, it also has to be flexible, with a lot of moving parts – joints and motors and sensors. And even if the damage risk is low for a single event, robots learning how to walk will produce a lot of such events during their training.
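The reward-and-punishment idea can be captured in a deliberately simple, hypothetical reward function: forward progress earns points, while a fall subtracts a fixed penalty, so a failed trial leaves a clear negative trace instead of being forgotten. The function name and the penalty value are assumptions made for illustration.

```python
def step_reward(forward_progress, fell, fall_penalty=10.0):
    """Hypothetical per-trial reward: progress is rewarded, falling is punished,
    so the failed attempt is remembered as a strongly negative outcome."""
    return forward_progress - (fall_penalty if fell else 0.0)

# Example: a trial that moved 0.4 m forward but ended in a fall scores poorly.
print(step_reward(0.4, fell=True))   # -9.6
```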