Hi, this is Kence Anderson. I want to pick up on strategy from the last video and explain why it's key to machine teaching.

What do we know about strategy? We know that deep reinforcement learning can learn strategy. We talked about AlphaZero in the last video, which learned, just by practicing, the 12 most common opening strategies in chess. I believe this tells us that strategies come from the structure of the environment, not from what the human wants to do. When I first started working with deep reinforcement learning, I heard this objection a whole lot: pre-existing strategies and known skills are really just conveniences that humans designed to help themselves make decisions, like a crutch. I couldn't disagree more. I believe that strategy is dictated by the environment.

Let me explain with our chess example. There are three basic phases in a chess match: the opening, the middle game, and the endgame, and they each have different goals. The objective of the opening is just to survive, the goal of the middle game is to gain a material advantage (get more pieces, get into a better position), and the goal of the endgame is to trap and pin your opponent's king. Why is the opening so difficult and treacherous? I like to describe the opening in chess as a situation where you open your front door in the morning and the La Brea Tar Pits, or a minefield, are waiting outside. There are lots of ways to make bad decisions and lose very quickly at the beginning of a chess game, which is why opening strategies are so prescriptive. They're like expert rules that say: make these six moves, and then if this happens, make this move, and if that happens, make that move. They're providing you stepping stones through the tar pits or through the minefield so you can get out safely.
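To see just how prescriptive an opening strategy is, here's a minimal sketch of an opening line as an expert lookup table. The moves are a small fragment of the Italian Game, shown only to illustrate the if-this-then-that shape of the rules; the data structure is my own illustration, not something from the video.

```python
# A minimal sketch: an opening strategy is essentially an expert lookup
# table of "if this happens, make this move" rules (here, a tiny
# fragment of the Italian Game: 1.e4 e5 2.Nf3 Nc6 3.Bc4).

OPENING_BOOK = {
    (): "e4",                           # first move
    ("e4", "e5"): "Nf3",                # if Black replies e5, develop the knight
    ("e4", "e5", "Nf3", "Nc6"): "Bc4",  # if Black develops, aim the bishop at f7
}

def opening_move(moves_so_far):
    # Follow the stepping stones while the position is still in the book;
    # once we leave the book, some other strategy has to take over.
    return OPENING_BOOK.get(tuple(moves_so_far))

print(opening_move([]))            # "e4"
print(opening_move(["e4", "e5"]))  # "Nf3"
```

The point is that the table's structure comes from the game, not from the player's preferences: these stepping stones exist because of what survives the opening.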
The structure of those strategies, those stepping stones, is not dictated by what any particular person wants to do; it's dictated by the structure of the game itself. And this is what happens in factories, too. As operators explore what to do, strategies emerge: in this scenario it's best to do this, in that scenario it's best to do that. But that's based on the reality of how the equipment operates and how the process functions.

Let me give you another chess example, because expertise really lies in choosing the right strategy for the right situation. I went on a quest to better understand chess, even though I'm not a chess player, and definitely not a great one, because of the relationship of AI to chess. Chess has been a proving ground for AI since the very beginning, from Alan Turing, Claude Shannon, and others, through things like Deep Blue, all the way to AlphaZero. I wanted to learn more about strategy, and chess has a lot of strategy; it's maybe the quintessential strategy game. So I picked up two books by a chess master and chess coach named Jeremy Silman. The first was an encyclopedia of chess strategy, and I flipped through hundreds of strategies and the discussion, and I didn't really get the connection I was looking for. But then I read another book by Silman called The Amateur's Mind. The Amateur's Mind states very clearly that an amateur plays the strategy they want to play, and a master plays the strategy the board tells them to play.

I further learned, and this was from my own observations in reading that book rather than from Silman directly, that there are basically two overarching chess strategies. One is offensive: it's very mobile, there's a lot of action in it, the pieces move long distances and make longer strikes. The other is defensive: a lot of building edifices, controlling the center, being able to move around the center while keeping your opponent from moving into it. Chess matches swing back and forth between these strategies, and no player, even a player like Bobby Fischer who very much preferred the aggressive style, gets to play that way all the time, because the board doesn't allow it. Great chess masters read the board and swing back and forth, applying the right strategy at the right time.

Now let's talk a little bit about what it must be like to learn strategy by trial and error, because trial and error is not the best way to learn strategy. Let me give you the example of Pac-Man, a classic Atari-era game that's quite old now. There's a picture of it playing in the corner, so you can see what it looks like in case you're too young to know about it or you've forgotten it by now. There are about 33 strategies in Pac-Man: too many possibilities to learn via trial and error, partly because there is no reasoning in this AI, just like I said before.

Imagine you're learning to play this game by trial and error. The actions you can take are to move up, down, or side to side with the joystick, so you pick a direction to move each second, each turn that you take. Then things move on the screen. But here's the problem: you don't know what any of those things on the screen are. You're this yellow chomping mouth thing, and part of your objective is to move through a maze. You want to gather pellets, and you get points for gathering pellets, and if you gather all the pellets on one level, you move to another stage, which has a whole bunch more pellets and lets you accumulate even more points. It lends itself to deep reinforcement learning because you have a score that you're tracking towards.

But I don't know that the chomper is me. If I'm this algorithm, I don't know that that yellow thing is me. I don't know what a pellet is. I don't know that this chomping motion makes most humans think about eating, so it makes them want to go chomp the pellets. I don't know that the ghosts, your adversaries in the game that can kill you if you make contact with them, are scary, because I don't know what a ghost is, and I certainly don't have the connotation that ghosts are scary like most humans would. I don't know what dying is, even though you die in the game when you come into contact with ghosts three times. All I know is that if that event happens three times, my ability to gain any more reward is permanently cut off. Yes, I'm dead, the game's over, but I don't know what "over" is, and I don't know what a game is. You're starting to get an idea of how difficult this might be.

Now let's talk about a couple of strategies that we can easily, intuitively think about. One strategy is to run around the board, avoid the ghosts, and collect as many pellets as you can. A second, more sophisticated strategy for evading ghosts uses the two tunnels on either side of the board, one on the left-hand side and one on the right-hand side: if you go through one tunnel, you're teleported and come out the other side. I might hang out near a tunnel, wait for the ghosts to come try to get me, then pop through the tunnel and teleport to the other side of the screen, where I'm home free and I can chomp away. A third strategy involves a specific type of pellet called a power pellet: if I eat that pellet, then I can actually kill the ghosts. I might wait for the ghosts to come to me while I'm near a power pellet, eat the power pellet, and then go kill the ghosts, after which I'm home free again and can run around.
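To make that concrete, here's a minimal sketch of the blind trial-and-error loop I just described, written against a Gym-style Atari interface. The environment name and the gymnasium setup are my assumptions for illustration, not something from this video: the agent sees only pixels and a score, with no idea which pixels are "me," a pellet, or a ghost.

```python
# A minimal sketch of blind trial and error in Pac-Man, assuming a
# Gym-style environment (e.g. gymnasium's "ALE/MsPacman-v5";
# requires gymnasium[atari] and the Atari ROMs).

import gymnasium as gym

env = gym.make("ALE/MsPacman-v5")
obs, info = env.reset()              # obs is just an array of pixels

total_reward = 0.0
done = False
while not done:
    # Pure exploration: pick a joystick direction at random each step.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward           # the score delta is the only feedback
    done = terminated or truncated   # "game over" is just: no more reward

print("episode score:", total_reward)
```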
These strategies are going to be almost impossible to figure out just by this trial-and-error method. When the DeepMind agents learned to play the Atari games, 57 of them I believe, they achieved human competence on 55 of them really, really easily. There were two games, though, that the AI didn't do well on at all: one was Pac-Man, and one was Montezuma's Revenge. Those are both games that have a lot of strategy in them; they require a lot of explicit strategies to be used at different times.

Later, DeepMind wrote a paper called "Ray Interference: A Source of Plateaus in Deep Reinforcement Learning" that explains mathematically, with examples, why it's so hard to learn separate skills at the same time. It essentially says that the more individual, separate skills are required to complete a task, when you try to learn them all together, the longer it will take to learn. If there are two skills or strategies required to play a particular game, it might take on the order of 10^2 practice tries, or iterations. But if there are eight strategies required, and I'm trying to learn without any of the strategies being taught, then it will be on the order of 10^8 practice tries. It gets to be so much that with more complex tasks, and most real tasks are more like Pac-Man and Montezuma's Revenge than they are like the simpler Atari games, it becomes essentially impossible to learn the task in any reasonable amount of time. That's why machine teaching explicitly teaches the strategies, and that's actually one of the conclusions you can come to from reading the DeepMind ray interference paper: maybe you should teach the strategies explicitly.
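Here's a rough illustration of that scaling argument. The specific per-skill numbers are invented for illustration; the exponential-versus-additive shape is the point.

```python
# Illustrative only: the scaling described above, where practice
# iterations grow exponentially in the number of skills learned
# together (Schaul et al., "Ray Interference", 2019), versus an
# additive cost when skills are taught separately and then assembled.

def monolithic_iterations(num_skills, base=10):
    # Learning k interfering skills all at once: ~base**k tries.
    return base ** num_skills

def modular_iterations(num_skills, per_skill=100, selector=100):
    # Teaching each skill separately, then a selector: additive cost.
    # (per_skill and selector are made-up illustrative budgets.)
    return num_skills * per_skill + selector

for k in (2, 4, 8):
    print(f"{k} skills: monolith ~{monolithic_iterations(k):,} tries, "
          f"modular ~{modular_iterations(k):,} tries")
```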
I do believe that the better way is teaching, so let me give you another example. What if I taught my son how to play basketball by the same trial-and-error reward system? In this method, I take my son out to the basketball court (we have a hoop in front of our house), I give him a basketball, and I say: son, I'll give you a cookie, or I'll give you a dime, every time you get the ball in the hoop. I don't teach him any strategies, I don't teach him any skills, I don't give him anything to practice. He's going to try everything under the sun, and most of the things he tries are not going to be good ways for humans with arms and legs to get the ball into the hoop. Most likely, he's going to quit before he gets really good at it. Now, an AI is not going to quit, but an AI will keep trying things long enough that we'll quit on its ability to ever do it.

Why do that when there are three tried-and-true ways for humans with arms and legs to get the ball into the hoop? One is the lay-up, one is the hook shot, and one is the jump shot. The lay-up is when you dribble up to the basket, and when you're close, you bring the ball up and usually bounce it off the backboard into the rim, or lay it directly into the hoop. It's a really good strategy when you're close to the basket. The second is the jump shot: you take the ball, place it on your shooting hand, and use your other hand to guide it as you jump and shoot. There were other ways that people used to try to get the ball in the hoop. For example, basketball was invented in the 1800s, and one of the older ways to shoot free throws was what's called the granny shot, where you take the basketball and toss it underhand. The reason people don't do the granny shot anymore is that the jump shot is a much more effective way to get the ball into the hoop.

Now, some people will say: well, you're constraining the exploration by teaching these skills. Yes, but I'm also guiding the exploration. I do have biases, because in our experience, the anatomy that humans with arms and legs have tends to work really well with lay-ups, with jump shots, and with hook shots, which is really a variant of the jump shot. And once I teach each of those strategies, it's not a rule. There are lots of different ways to shoot a jump shot, there are lots of different ways to shoot a lay-up, and there are lots of situations where you might trade off between them. For example, you might want to shoot a lay-up farther from the basket, even though it's traditionally the best strategy close to the basket; or, depending on how you're being defended, you might want to shoot a jump shot closer to the basket instead of a lay-up.

Let me give you another example of teaching skills and strategies explicitly, and how doing so can make the exploration of self-practice learning go much more quickly. It's from one of my favorite movies, The Karate Kid, which came out in the United States in 1984. It's about a teenager named Daniel, played by Ralph Macchio, who wants to learn karate because he's getting bullied at school. It turns out that the janitor at his apartment complex, Mr. Miyagi, played by Pat Morita, is a karate expert who learned ancient Okinawan karate before immigrating to the United States during World War II. Daniel asks Mr. Miyagi to teach him karate. Mr. Miyagi brings him to his house, a beautiful house, and has him start doing chores. The first day, all Daniel does is wax cars, and Mr. Miyagi teaches him this motion he calls wax on, wax off. The next day, he has him paint a fence, and teaches him to paint the fence like this. The third day, he teaches him to sand a deck, like this. And there's a fourth day, where he teaches him to paint the house, like this. By the fourth day of doing what he thinks are chores, Daniel gets very frustrated and wants to quit karate, because he thinks he's not learning anything. But he is actually learning very valuable skills. The wax-on, wax-off movement is actually [inaudible], the inward middle forearm block. Paint the fence is actually [inaudible], the rising block. And sand the floor is actually [inaudible], the circular block. There's a great scene near the end where it all culminates. It's late at night, Mr. Miyagi has been fishing instead of overseeing Daniel's karate practice, and Daniel is about to leave. Mr. Miyagi calls him back and has him show each of those movements. Daniel is very quickly able to bring all of those movements together into a cohesive blocking system that can block the variety of kicks and punches Mr. Miyagi throws. What does that have to do with autonomous AI?
We did something very similar when we taught a robotic arm with seven joints to grasp blocks and stack them on each other. We broke the task down into five skills, the same way Mr. Miyagi broke the karate blocking system down into its various movements. We broke the act of grasping and stacking down into various movements. One was reaching: extending your arm out from your body. Moving is lateral movement, whether vertical or horizontal. Orienting is orienting your hand around the block you're about to grasp. Grasping is moving your fingers to grasp the block. And stacking is placing the block onto the other block. We pre-trained each of those five skills, and then we trained another, special skill called a selector. This skill's job was to decide: when should I reach, when should I move, when should I orient, when should I grasp, and when should I stack? When we did that, the AI fused those skills together extremely quickly, just like Daniel did outside Mr. Miyagi's house. In fact, when the task was taught, or trained, as a monolith, the entire task at once, it took about a million tries, a million practice decisions. But the act of assembling those skills together, the selector learning how to use each of the pre-trained skills, took only about 22,000.

So you teach an autonomous AI brain by modularizing skills, defining strategies, and then using other skills to select between the strategies. In this case, you'll often have multiple different neural networks for the learned parts of the brain, and different pieces of programming and math for the other parts of the brain.

Last, I want to talk about how teaching allows us to trust the AI. Remember how deep reinforcement learning is a monolithic black-box system; any machine learning that uses a neural network is a black box. Well, when you modularize the AI, the modules don't just decide what to do and execute the skills, they also tell you which skill they're using at any one time. For the robotic arm, when it's reaching, the AI brain is telling me: I'm reaching right now. With a black box, you might look at it and ask, why doesn't it grasp? A modular brain can answer: I'm not trying to grasp yet, I'm still reaching. Maybe it's doing that mistakenly, and it may learn to do better, but at least you know: it's reaching now, it's moving now, it's orienting now; in its opinion as an AI, it's not time to grasp yet. At least you know what it's doing.
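Here's a minimal sketch of that modular arrangement, with hypothetical names and placeholder policies rather than our actual robot code: a selector chooses among pre-trained skills and reports the active skill, which is what gives you the explainability.

```python
# A minimal sketch (illustration only, not the actual system from the
# video): five pre-trained skill modules plus a learned selector that
# decides which skill acts, and reports the active skill to the human.

class Skill:
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy              # in a real system: a trained network

    def act(self, observation):
        return self.policy(observation)

def make_brain(skills, selector_policy):
    def brain(observation):
        # The selector is itself a learned skill: it picks which of the
        # pre-trained skills should act right now.
        skill = skills[selector_policy(observation)]
        # Explainability: the brain reports what it's doing, so a human
        # asking "why isn't it grasping?" gets "I'm still reaching."
        print(f"active skill: {skill.name}")
        return skill.act(observation)
    return brain

skills = [
    Skill("reach",  policy=lambda obs: [0.0] * 7),  # placeholder policies;
    Skill("move",   policy=lambda obs: [0.0] * 7),  # each would be its own
    Skill("orient", policy=lambda obs: [0.0] * 7),  # separately pre-trained
    Skill("grasp",  policy=lambda obs: [0.0] * 7),  # network (7 joint
    Skill("stack",  policy=lambda obs: [0.0] * 7),  # commands per action)
]
brain = make_brain(skills, selector_policy=lambda obs: 0)
action = brain(observation=None)        # prints "active skill: reach"
```

Because only the selector has to be trained on top of the pre-trained skills, the assembly step is cheap, which is the one-million-versus-22,000 difference described above.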
Now I want you to watch a clip from Grant Bristow. He's an aeronautical engineer at Bell Flight and one of the subject-matter experts we're following through this entire specialization. Watch as he talks about how important machine teaching and explainability are to certifying autonomous AI for drones.

Let's talk a little bit about some of the specific challenges to autonomy in the aerospace industry. I know you and Matt and I talked a little while ago, and you were telling me about the challenges of getting AI certified by a government agency like the FAA from a safety perspective, but you also had some views on how machine teaching, modularity and decomposition, and explainability could help with that. Can you tell us a little bit about that?

Sure. Certification is a fundamental part of the aerospace industry, specifically when we deal on the commercial side. For obvious reasons, we take great care in what we develop, because we have people's lives on the line. As part of that, demonstrating our confidence in any system we deploy on the aircraft, whether that's AI or anything else, is paramount. Explainability is a major portion of that: do we understand why things are happening? Even if they're happening in an unforeseen way, or in a way that's not desired, if we can explain why that's happening, that means we understand the system. Then, from a simulation perspective, we're also in an industry where, relatively speaking, our volume, both in actual units and in time, is significantly smaller than most other industries'. Simulation plays a very important part in acquiring exposure to phenomena in such a way that we develop confidence around those phenomena. And for obvious reasons, there are also scenarios that are difficult to replicate in the real world with an acceptable degree of safety when we're validating. Simulation is another way to evaluate those scenarios before we put human personnel in those situations.

Makes perfect sense. With machine teaching, we design and build modular brains that embed skills as the unit of competence. How does a modular brain, outlined with modules that each take care of a separate skill, potentially help with something like FAA certification?

Yeah, so you abstract out the problem. You can actually isolate changes to specific modules, and that eases the certification effort when it comes to continued sustainment. To simplify: when you want to make an improvement to any one module, you can improve and recertify that specific module, rather than having to recertify the entire monolithic codebase.

I see. That makes perfect sense.