How Deep Learning and Simulated Worlds are Leading to the Rise in Humanoid Robotics | EP.6

In episode 6 of Hidden Layers, Ron interviews DeUmbra's Principal Scientist, Dr. Jonathan Mugan. Dr. Mugan sheds light on how robots are learning situational awareness, common sense, and the ability to adapt to never-before-seen situations. Learn about the latest advancements in humanoid robotics and the potential future where robots seamlessly integrate into our daily lives. This episode is a must-listen for anyone fascinated by the convergence of AI, robotics, and our future society.

Ron Green: Welcome to Hidden Layers, where we explore the people and the tech behind artificial intelligence. I'm your host Ron Green, and I'm excited to be joined today by Dr. Jonathan Mugan. We're gonna discuss some of the significant advances made in robotics recently. Dr. Mugan is a research scientist at DeUmbra, an artificial intelligence company based in Austin, Texas. He earned his Ph.D. from the University of Texas at Austin, where his thesis focused on how a robot can awaken in a world and learn about its surroundings in a manner similar to that of a human child. His research primarily centers on empowering robots to independently construct models of their surroundings. Dr. Mugan is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion, a book for parents to help their children build curiosity by building models of their environment. Well, welcome, Jonathan. I'm excited to have you on board today.

Dr. Jonathan Mugan: Thanks for having me.

Ron Green: So we're going to talk about robotics, and this is a field that I actually am really unfamiliar with. I've never done anything professionally there. Maybe kick off a little bit with the background. What is the current state of robotics and how has it shifted pretty dramatically recently?

Dr. Jonathan Mugan: Yeah, so for decades, we've been trying to hard code what a robot should do and build control laws for how it should walk and not fall over and pick up things. And that's been slow. Obviously, we know that robots have been mostly in industrial settings where they do the same thing over and over again. And we've had a hard time putting in the subconscious intelligence they need to be able to act just like we do without ever thinking about it. But there's been a big shift recently, as we've seen in the last 10 years with natural language processing and image processing and video processing. All this deep learning, neural-network-type stuff is now starting to make its way into the robotics world. And we're now starting to see some significant progress in robotics.

Ron Green: So robotics, it sounds like, is yet another area, like the computer vision and natural language processing you mentioned, where these deep learning techniques are starting to dominate. Within robotics, what part of the process is deep learning involved in? Is it the core modeling software that drives the robot? Or is it also involved in the training? Maybe build a foundation for us here.

Dr. Jonathan Mugan: Yeah, so 20 years ago, I used to work with a professor, and he joked that robotics was a silly thing to be in, because all you had to do is have it pick up a pen, and you could earn a PhD. And the reason that was so interesting is because for us, we don't even have to think about that. But for a robot, you had to program in all of these different joints, and we don't even realize how hard it is. And you've got to pick which grip you're going to use, and your hand signal, and then your hand shape, and then you have to not crush the thing when you pick it up. Like, all these things have to happen, and it's actually really hard. And so we tried to write differential equations to get these things to work. And it turns out, if you just let a neural network try hundreds of thousands of times, eventually, it can work these things out. And the reason it took us so long to get these neural network things into robotics is because it's not easy to get training data. So obviously, like with ChatGPT, it just reads the internet, and it has everything we've ever said as a society, or watches YouTube and everything. And there's just so much training data. But for robotics, there really hasn't been that much training data. And so that's really slowed things down. But what we're seeing recently is people are starting to use more simulation to train robots. And so you can set up a simulated world, and then have the robot go over and over again to pick up that pen. And you can have 1,000 robots training at the same time, all trying to pick up pens. And eventually, it learns to pick up a pen.

Ron Green: I want to double back on the training a little bit. So in these simulated environments, where do you think that's headed over maybe the next five years?

Dr. Jonathan Mugan: Yeah, so what we're seeing is, if you look at the way video games progress, they're getting increasingly lifelike. And people are starting to build, like Nvidia has Isaac Sim, people are starting to build simulators that use video game technologies to raise a robot in. So you could have a robot live in a living room, tell it to move chairs around, pick up objects. And you can have it have this varied experience that is the kind of experience human children have growing up. We don't have mothers or fathers or caregivers in there, but you can give it this wide range of experience. And that allows us to go past kind of what we've been talking about so far, which has mostly been about skills. I keep doing this to mimic picking up the pen. So there's much more that robots need to know. They need situational awareness. They need to know that people like water in cups. They don't like it when the water is knocked over. You know, they need to know all these things. They need to know where the dishes go when you clean up the house and all that stuff. And so to teach them all that stuff, people are building simulated homes with simulated stuff in them.

Ron Green: All right, that's just fascinating. I've never heard about this at all. So within these simulated environments, the graphics are real enough now that you can take the raw computer graphics as input, and it's close enough to the real world to simulate it. That's one critical piece. What are the robots doing in these environments? Are they given specific tasks? Are they just kind of walking around and watching? How does the training work?

Dr. Jonathan Mugan: Yeah, so they're given tasks. One task is put things back. So it's supposed to go in and note where objects are.

Ron Green: Like put things back after they've been moved around in a random way. They saw it before and now they have to get it back to that before state. Okay.

Dr. Jonathan Mugan: And you can give a task like, you know, make a cup of coffee. It has to go to the drawer, or like a shelf, get out the coffee, put it in the thing, and move it over. And you can give tasks like, just go to this position. So go find the bathroom. And it goes to the bathroom. And you can give a task like, follow this other robot and do what that other robot does, where this other robot is tele-operated. And so other than the tele-operating, the great thing is that these can all be automated scripts. So they can be run millions of times while we're sleeping.

Ron Green: Right. And then you can also alter different parts of the scene. You can put in obstacles, remove things that may have been there for a million training sessions, remove them. Now it has to deal with these new scenarios.

Dr. Jonathan Mugan: That's right. That's right.
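
To illustrate the kind of scene variation discussed in this exchange, here is a minimal Python sketch of scripted scene randomization. It is only a toy under stated assumptions: the object names, the room size, and the `randomize_scene` helper are hypothetical and are not taken from any simulator mentioned in the conversation. Each call moves the furniture, removes one object that earlier episodes always contained, and adds a few new obstacles.

```python
import random

# Hypothetical object names, chosen only for illustration.
FURNITURE = ["chair", "table", "sofa", "lamp"]

def randomize_scene(rng, room_size=5.0, max_obstacles=3):
    """Build one randomized scene as a dict of name -> (x, y) position."""
    scene = {
        name: (rng.uniform(0.0, room_size), rng.uniform(0.0, room_size))
        for name in FURNITURE
    }
    # Remove one object that was "always there" in earlier episodes.
    removed = rng.choice(FURNITURE)
    del scene[removed]
    # Add a random number of new obstacles at random positions.
    for index in range(rng.randint(0, max_obstacles)):
        scene[f"obstacle_{index}"] = (
            rng.uniform(0.0, room_size),
            rng.uniform(0.0, room_size),
        )
    return scene

# A fixed seed keeps the scripted run reproducible.
rng = random.Random(42)
scenes = [randomize_scene(rng) for _ in range(1000)]
```

Because the whole thing is a script, generating a thousand or a million varied scenes costs nothing but compute time, which is the point made above about runs happening "while we're sleeping."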

Ron Green: Oh, just fascinating.

Dr. Jonathan Mugan: Yeah, it's really cool. We're trying to build in common sense. And it still has to interpolate between things it's seen before. It has to have seen the situation. And now we can build a whole bunch of simulations. But it has to have seen it somewhere along the way, and then where my research ties in is I'm trying to build these models so that it could, in a situation it's never seen before, reason from first principles if it needed to. And so we see this a little bit in self-driving cars. We initially thought if we gave it enough data, we'd get almost there. And we did. We're 99% of the way there. But it's not quite good enough. There are still situations where the robot just doesn't know what to do. This is the long-tail problem.

Ron Green: Yeah, the long tail, just too many corner cases.

Dr. Jonathan Mugan: Either you need to do a whole bunch more simulation so the corner cases are increasingly rare, or you need to enable the robot to build its own model, an actual model of the world, where it's like, okay, I am a thing that takes up space. And if I'm in a space, someone else can't be in that space. And if I drive over something, that's bad. And that something is a thing that takes up space. And some of those things are really bad to drive over, like children, but some are less bad, like cones. It needs to have that kind of level of understanding. But that's really hard. And one thing we can keep doing is giving it more and more training data. So we push further and further out into the long tail, so it can do more and more stuff. And in fact, the next level of this kind of AI-generated simulation is that the simulation is generated completely on the fly, not just what I was talking about before, where they have a living room and they can add in chairs and stuff. But now, with this image creation technology, you can create a whole other image. So you can have like a car. And you can say, okay, it's driving down the street. Now what would happen if you turn left? And in the simulation, as the thing turns left, all it sees is generated by the computer using this generative AI. And so it opens up even more of what you can simulate.

Ron Green: Oh, that's fascinating because in that scenario, you're kind of unbounded by the environment that you may be able to simulate.

Dr. Jonathan Mugan: Right, you're not limited to the things that the environment creator could think of. And then you can just do almost anything and the robot then needs to do it. So if it comes up in a weird case, then you could generate all the different branches in that weird case and you wouldn't have to have thought about that weird case ahead of time.

Ron Green: So the use of simulation has really unlocked the use of these deep learning techniques. What is the underlying model within these deep learning models? Like what is the design? Is it transformer based? Is it using classic, you know, backpropagation, gradient descent, etc., to learn from these simulations?

Dr. Jonathan Mugan: Yeah, so a lot of it is transformer based. And a lot of it is just feed-forward neural networks. And of course with computer vision now you can look and see where the pen is. You can see where your hand is. And then there are diffusion-type methods that we've seen with image generation now starting to make their way into robotics.

Ron Green: Yeah, okay, I want to talk about that. That's a perfect segue. I've read a little bit about diffusion policies. It sounds similar to the technique used within generative computer vision technologies. How does it work within robotics?

Dr. Jonathan Mugan: Yeah, so the way it works in vision is you take an image and then you add a little noise to it. So you get now an image with a little bit of noise, like just a few pixels. And then you do that again. You get an image with a little more noise, and a little more noise, a little more noise, all the way, and you do that like 1,000 times, and you end up with complete noise. And so you do that a whole bunch of times with a whole bunch of images. It's easy to add noise, but at each transition you get a training record. And so you say, oh, now I can learn how to de-noise from anywhere.

Ron Green: So each of those steps.

Dr. Jonathan Mugan: Each of those steps. And so you basically learn a model, and then with some conditioning, you can say what kind of image you want. Then it can start with pure noise and end up with images. And it's just amazing.
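
The noising process described above can be sketched in a few lines of Python. This is a toy illustration rather than how any production image model is built: the four-value "image", the constant noise level `beta`, and the `forward_noising` helper are all assumptions made for the example. Each step mixes in a little Gaussian noise and records the (noisier, cleaner) pair, which is the training record a de-noising model would learn from.

```python
import math
import random

def forward_noising(sample, num_steps=1000, beta=0.02, seed=0):
    """Return a list of (step, noisier, cleaner) training records."""
    rng = random.Random(seed)
    records = []
    current = list(sample)
    for step in range(1, num_steps + 1):
        # Shrink the signal slightly and add a small amount of Gaussian noise.
        noisier = [
            math.sqrt(1.0 - beta) * value + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for value in current
        ]
        # Each transition is one record: given `noisier`, recover `current`.
        records.append((step, noisier, current))
        current = noisier
    return records

image = [0.9, 0.1, 0.5, 0.7]  # a toy 4-pixel "image" (assumed values)
records = forward_noising(image)
```

After 1,000 steps almost nothing of the original values remains, so the last record is essentially pure noise. Real systems usually compute the noisy sample for any step directly and train a network to predict the added noise, but the step-by-step version matches the description in the conversation.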

Ron Green: And it is, it really is.

Dr. Jonathan Mugan: Yeah, and so for robotics, one of the things we've been wanting to do for a long time is, if I wanted it to pick up a pen, I can just show it. I can just take its arm, here you go, pick up a pen. But how does it generalize that? Because the hard thing about robotics, well, the great thing also, is that the real world is different every time you look at it. Your joints are never in the exact same spot.

Ron Green: The pen is never in the same spot.

Dr. Jonathan Mugan: Never in the same spot. But if we have a demonstration, that's like the initial image in the image analogy. And so, what you can do is you can take these demonstrations and then add a little more noise to that policy that created the demonstration, all the way to complete noise. And then you can go back from noise back to the policy. It's the same process, but instead of adding noise to images, you know, you're adding noise to the policy.

Ron Green: Okay, so at each step, like in the computer vision example where you take an image, add a little bit of noise, and then train it to de-noise at that step, the robots are being trained to essentially eliminate the errors from each of these different stages of the task.

Dr. Jonathan Mugan: That's right.

Ron Green: Okay. That makes perfect sense. I've also seen videos and read some papers about the use of sort of learning by watching, like robots literally watching people as the basis for the training. How frequently is that used these days?

Dr. Jonathan Mugan: It's used some. Google just came out with RT-X, where basically they had a bunch of different labs around the world generate a whole bunch of training data. And they were able to take that training data in different robot formats, similar enough that you still could use it together, and then just hand it over to like a neural network, transformer-type thing. And then with all this training data, the robot can then sift through that and learn how to do the right stuff. So it's hard to do just by watching YouTube video because the morphology is not the same when it's watching a human or something. But you can train from a bunch of different robots doing more or less the same thing, even if those robots themselves are different. And we've always talked about how great it is in robotics that if one robot learns something, suddenly all of the other ones know it. And we've kind of, at least with skill-based stuff, implicitly assumed it's the same exact morphology, but it doesn't need to be. That's what Google has shown, so that's pretty cool.

Ron Green: Oh, so you're saying that within these videos, there are different robots with just completely different morphologies, but.

Dr. Jonathan Mugan: Not completely different, but they're different types of robots. Yeah. And they can learn from that.

Ron Green: And they can learn from that despite the morphology differing.

Dr. Jonathan Mugan: Yeah, and the morphology doesn't differ by a lot. It's not like one's got horns on it or it's an octopus-type robot or something. And that's because of neural networks. Neural networks first unlocked vision, was it 2012? And then natural language processing with RNNs and then transformers. And the beauty of it is it can map the squishy world and still use it. So before, up until that time, when we were doing everything with symbols, it was either the symbol or it wasn't. And you couldn't really generalize very well. And so what neural networks allow us to do is to generalize states in a way we couldn't do before.

Ron Green: How much have the advancements in computer vision played a role here? Because obviously, completely independent of robotics, there's been an enormous amount of advancement in computer vision. Are these robotic systems relying on computer vision for a lot of their advancements? Or is that sort of only applicable in some systems?

Dr. Jonathan Mugan: They almost all use vision. Without vision you couldn't do this stuff, right? Or you couldn't do it as well. You wouldn't know where the pen was and all that. So vision is definitely a huge thing for robots, which is cool because for humans it's our main sense as well.

Ron Green: Right, right, right. Okay, let's transition a little bit. I'd like to ask about some of the current research that you're doing professionally right now. Could you maybe share a little bit of that with us?

Dr. Jonathan Mugan: Yeah, I could talk about my research goals. So what I want is for a robot to autonomously learn to understand its environment by building models of the environment. So we see that with ChatGPT, it does this incredible job of understanding what you mean and can take anything on the internet and explain it to you. So the other day, it was explaining general relativity to me. Oh, I didn't quite get that before. OK, now I got it. And I could ask it questions. Do you mean this? And it's like, no, no, I mean this other thing. It was amazing. But one thing ChatGPT and those types of things can't do is go outside of what humans already know. So it has to be on the internet somewhere. It has to be in its training data. They can't think from first principles. And that means not only can it not invent new things, but it can't act hyper-local. So if you say, hey, the guy brought the soda, he set it down on the table, actually, it can do that pretty well. But you can't teach it new things on the fly that it hasn't seen before. And in order to do that, you need to be able to set up some sort of concrete model of the way things work so that when it gets stuck, it can reason through from first principles.

Ron Green: Okay.

Dr. Jonathan Mugan: And that's what my research is. That's like the goal of my research. And so this is still using symbols. So I believe that you need to have some sort of symbolic model somewhere so you can do search over it well. But the neural networks are helpful because they allow us to map the squishy world to the symbols, which was always the problem before. So now I can say, you know, a table. I can say, hey, a table is something that you can put stuff on. It supports those things. And the robot could then understand what the word support means in a very concrete way as opposed to just interpolating between everything it learned on the internet.
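
As a rough illustration of what a searchable symbolic model of "support" could look like, here is a minimal Python sketch. It is a toy built for this transcript, not Dr. Mugan's actual system: the fact format, the object names, and both helper functions are assumptions. The facts say what rests on what, and a simple search answers a question the model was never shown directly, namely what falls if the table is removed.

```python
# Symbolic facts of the form (relation, supporter, supported).
# All names here are hypothetical and chosen only for illustration.
facts = {
    ("supports", "floor", "table"),
    ("supports", "table", "cup"),
    ("supports", "cup", "water"),
}

def supported_by(thing, facts):
    """Return everything that rests, directly or indirectly, on `thing`."""
    found = set()
    frontier = [thing]
    while frontier:
        current = frontier.pop()
        for relation, supporter, supported in facts:
            if relation == "supports" and supporter == current and supported not in found:
                found.add(supported)
                frontier.append(supported)
    return found

def consequences_of_removing(thing, facts):
    """Rule: if a support is removed, everything it held up falls."""
    return sorted(supported_by(thing, facts))

print(consequences_of_removing("table", facts))  # prints ['cup', 'water']
```

In the picture described above, a neural network would do the fuzzy part (deciding from pixels that this object is a table and that one a cup), and a symbolic layer like this would handle the explicit reasoning.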

Ron Green: I've never heard it put quite like that. So the use of these sort of deep learning models allows you to more easily access or model the outside world symbolically because they handle that fuzzy mapping. Is that what you're saying?

Dr. Jonathan Mugan: Mm-hmm. Yep.

Ron Green: Okay, I've seen you give talks about the importance of sort of humanoid robotics. Let's talk about that for a little bit. Why is humanoid robot morphology so important?

Dr. Jonathan Mugan: Yeah, so that's important because everything around us is built for us. And so if it's going to work with us in the kitchen, if it's going to work in the buildings, it needs to go upstairs, take elevators, all these things. It needs the human morphology. And the reason human morphology has been so hard is because we're so unstable. We just stand on these two feet. And when we walk, they found that we're really doing controlled falling. So if we take a step forward, at no point is our balance totally set. We're actually falling forward. And we continue that falling forward. And that was a big advance in the last 20 years, that they've been able to work that out. So now we can walk. You remember the little horse thing they kicked, like 20 years ago? Running around on ice. Boston Dynamics, I guess it was. And so now we can do that with humanoids. Just with two legs and they can walk. They're still not as good as you and I. But that has probably been one of the hardest parts, to get the humanoid form. And then the other part of the humanoid form that's hard is the arms. But we've been working on arms for a long time, usually tethered to a table and it picks up stuff. And then finally, one of the hardest parts is the fingers. Right? You remember in Planet of the Apes, the original one, he was doing his fingers like this, just to show the ape that he actually was intelligent. And that's what's really hard. And now we're starting to see robots with some fingers, not just these grippers or these little suction cup things, which we'll probably still have for a while. But you know, Tesla came out with that robot where it can do fingers, and there was some research on spinning a Rubik's Cube. And so we're finally getting to the fingers. And so it's just great.

Ron Green: That's just amazing. Given the progression right now, what do you think is the likelihood of humanoid robotics impacting everyday life? I mean, obviously robotics is tremendously useful in manufacturing, with a big impact there for many decades at this point. When do you think consumer impact with these humanoid robots is going to happen?

Dr. Jonathan Mugan: It's still gonna be a while. Maybe in 10 years you can buy one for the price of a car.

Ron Green: Okay, and what's it doing?

Dr. Jonathan Mugan: It can probably set the table. It's probably not useful. It'd be more of a, you know, novelty. Maybe in 20 years they're useful. Yeah, I'm hoping that they can help take care of me when I get old.

Ron Green: Well, all right, then that maybe makes this next part a little bit less daunting, when we talk about how it's gonna impact society. So if it's 10 years out before you think they're really impacting the consumer world, let's play out the next 20 years. How do you think it's gonna impact society?

Dr. Jonathan Mugan: Yeah, that I worry about a little bit. I don't worry about the Terminator scenario and that kind of thing, but I do worry about putting a lot of people out of jobs, and how are we gonna adjust? And we definitely need to balance these things, because we certainly wanna keep going with technology. We've got all these problems, all these diseases that haven't been cured, and we need to get off this planet eventually. Maybe an asteroid will come. We need to have the technology to do these things, and I wanna hopefully live forever and all that great stuff. But our system is built so that people trade labor for food, and when their labor's not needed anymore, I'm not sure. We'll have to have some sort of welfare for everybody, but welfare can make people feel bad. I mean, some people are gonna be artists and do great stuff, but some people are just not gonna feel great about it, and they're gonna need some outlet for their creative energy, for their drive. And maybe they won't know what that is, and that'll be a big challenge for us as a society.

Ron Green: I'd love to ask you: do you have any advice for young people thinking about entering robotics?

Dr. Jonathan Mugan: Yeah, so of course you want to read and do everything you can to learn, watch all the YouTube videos and all that. But the advice I would say is just pick something that you care about and just get started. And it obviously has to be something you can make some progress on. So if you care about mobile robots, get you a robot that you can control and have it move around. And in doing that, you'll understand some of the problems and why they're hard. And then when you read the paper about it, you're like, oh, you actually care about this. It's not just some abstract thing. And that gives you a grounding that really helps you actually really learn.

Ron Green: That makes complete sense. All right, we love to wrap up our conversations here with kind of a fun question, Jonathan. If you could have AI automate anything in your life, what would you do?

Dr. Jonathan Mugan: As silly as it sounds, I would love robot gardeners. So imagine you have a city, well, at first you start with your yard, but a city that could just plant all the cool plants and maintain them like little bonsai trees everywhere, and it would be like driving through a garden everywhere you go.

Ron Green: Just a beautifully manicured world.

Dr. Jonathan Mugan: Yeah, and it could pick the plants that are native and know when they need to be watered, and it just could be wonderful.

Ron Green: I love it. I love it. Well, Jonathan, this was so much fun. I learned so much about robotics. I really appreciate you coming on board today.

Dr. Jonathan Mugan: Well, thanks. I had a great time.

Ron Green: Thank you.