Okay, so let's take a look at the code that we're going to use in this environment. If you're going to use this workbook, please make sure you're using a Python 3 environment. So we're going to go in here, say Change runtime type, make sure it's Python 3, and I'm just leaving it with a GPU accelerator to speed things up. If you import TensorFlow as tf and print out tf.__version__, you'll see the TensorFlow version. If you're doing this course at a later date, this may be 2.0. If it is 2.0, then you do not need to enable eager execution, but because it's 1.13, I'm going to turn on eager execution. Also, if you don't have TensorFlow Datasets installed, this line will install it for you, but right now I do have it installed. So the next thing I'm going to do is just import TensorFlow Datasets as tfds. Calling tfds.load on "imdb_reviews" will give me an IMDb set and an info set. I'm going to use the info set in another video; we're not going to use it here. But the IMDb set is what we're going to be looking at. So next up is the code where I'm going to get my IMDb training and my IMDb testing splits and load them into training and test data. I'm going to create lists of training sentences and labels, and testing sentences and labels, and I'm going to copy the contents of the tensors into these so that I can encode and pad them later. So this code will just do that. Also, for training, I need numpy arrays instead of plain lists, so I'm going to convert the training labels that I created into a numpy array like this. So that's all done. So next up is where I'm going to do my sentence encoding. I've decided I'm going to do a 10,000-word vocab size; of course, you can change that. My embedding dimensions, which you'll see in a moment, will be 16. I'm going to make sure all of my reviews are 120 words long. If they're shorter than that, they'll be padded. If they're longer than that, they'll be truncated.
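In case it helps to see the shape of that extraction step, here's a minimal sketch in plain Python. The toy `train_data` list is a stand-in for the (sentence, label) tensor pairs that tfds yields; in the real notebook, each sentence and label comes out of a tensor via `.numpy()`.

```python
import numpy as np

# Toy stand-in for the (sentence, label) pairs from imdb['train'];
# in the notebook each pair is a tensor and needs s.numpy() / l.numpy().
train_data = [("i loved this film", 1), ("utterly boring", 0)]

training_sentences = []
training_labels = []
for s, l in train_data:
    training_sentences.append(s)
    training_labels.append(l)

# Keras training wants numpy arrays for the labels, not plain Python lists.
training_labels_final = np.array(training_labels)
```

The same copy-then-convert pattern is repeated for the testing split.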
I'm setting the truncation type to be post, so we'll cut off the back of the review and not the front, and then my out-of-vocabulary token will be <OOV>, like this. So now I import my TensorFlow Keras preprocessing Tokenizer, and pad_sequences from sequence as before. I'll instantiate my tokenizer, passing it my vocabulary size, which as I said earlier was 10,000 — of course, you can change that — and my out-of-vocabulary token. The tokenizer will then be fit on the training sentences; not the testing sentences, just the training ones. If I want to look at the word index for the tokenizer, all I have to do is say tokenizer.word_index. I will then convert my sentences into sequences of numbers, with the number being the value and the word being the key, taken out of the training sentences when I did fit_on_texts, and that will give me my list of integers per sentence. If I want to pad or truncate them, I use pad_sequences to do that. So each of my sentences is now a list of numbers. Again, those numbers are the values in a key-value pair, where the key, of course, is the word. pad_sequences will ensure that they're all the same length, which in this case is 120 words, or 120 numbers; they'll be padded out or truncated to suit. I'm then going to do the same with the testing sequences, and I'm going to pad them in the same way. Do note that the testing sequences are tokenized based on the word index that was learned from the training words. So you may find a lot more OOVs in the testing sequences than you would have in the training sequences, because there will be a lot of words that the tokenizer hasn't encountered. But that's what makes it a good test, because later, if you're going to try out a review, you want to see how it will do with words that the tokenizer or the neural network hasn't previously seen. So I'll now run this code. I'll create my sequences, my padded sequences, my testing sequences, and my testing padded sequences.
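The tokenizing-and-padding steps above can be sketched like this, with a couple of toy sentences standing in for the real reviews; the hyperparameter values match what I set earlier, and the word "dazzling" in the test sentence is there deliberately to show a training-unseen word mapping to the OOV index.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size = 10000
max_length = 120
trunc_type = 'post'
oov_tok = "<OOV>"

training_sentences = ["i loved this film", "utterly boring and slow"]
testing_sentences = ["a dazzling film"]   # 'dazzling' never appears in training

# Fit on the TRAINING sentences only.
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index         # word -> integer lookup

# Words become integers; unknown words become the OOV index (1).
sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)

testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length,
                               truncating=trunc_type)
```

Every row of `padded` and `testing_padded` ends up exactly 120 numbers long, padded with zeros or truncated from the back as described.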
This will take a few moments. So I can now explore what this looks like by running this block of code. For example, here you can see I've just taken my reverse word index, and I can decode my review by taking the numbers in that review and reversing them back into words — taking the key for that value, since the reverse word index flips the key-value pair. So we can see here the decoded review — this is what would be fed in: "i saw this film on true movies", with "<OOV>" where words fell out of the vocabulary — versus the original text. You can see the original is capitalized, there's punctuation like brackets and commas in there, and the word "skeptical" ended up not being one of the top 10,000 words that were used. It just gives us a nice way of looking at the type of data that we're going to be feeding into the neural network. So let's now take a look at the neural network itself, and it's very simple — it's just a Sequential. The top layer of this is going to be an Embedding; the embedding is going to be my vocab size by the embedding dimensions that I wanted to use, which I had specified as 16. My input length for that is 120, which is the maximum length of the reviews. The output of the embedding will then be flattened, that will then be passed into a dense layer with six neurons, and then that will be passed to a final layer with a sigmoid activation and only one neuron. Because I know I've only got two classes, I'm just going to do one neuron instead of two — I don't need to one-hot encode. With my activation function being a sigmoid, it will push the result toward zero or one respectively. I can then compile that and take a look at the summary. Here's the summary; it all looks good. Again, each of my sentences is 120 words, my embedding has 16 dimensions, and out of that the Flatten layer will have 1,920 values. They get fed into the dense layer, and then into the output layer. So let's train it. I'm going to train for just 10 epochs, and I'm going to fit it.
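Here's a sketch of that model in Keras. One caveat: I don't say on screen which activation the six-neuron dense layer uses, so the relu here is an assumption; everything else follows the sizes described above.

```python
import tensorflow as tf

vocab_size = 10000     # top 10,000 words
embedding_dim = 16     # 16 dimensions per word
max_length = 120       # every review padded/truncated to 120 words

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,)),
    tf.keras.layers.Embedding(vocab_size, embedding_dim),  # 10,000 x 16 lookup table
    tf.keras.layers.Flatten(),                             # 120 * 16 = 1,920 values
    tf.keras.layers.Dense(6, activation='relu'),           # activation is an assumption
    tf.keras.layers.Dense(1, activation='sigmoid')         # one neuron for two classes
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```

Fitting is then just a call along the lines of `model.fit(padded, training_labels_final, epochs=10, validation_data=(testing_padded, testing_labels_final))`.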
So it's training — that's right, 25,000 samples, and validating on 25,000 samples. Let's see it train. Our accuracy starts at 73 percent on the training set, 85 on validation. Training accuracy over time is going to go up nicely. We're most likely overfitting on this, because our training accuracy is so high, but even then the validation accuracy is not bad — it's in the 80s. So we can see that even by epoch seven, our training accuracy is up to 1.0, while our validation accuracy is still in the low 80s — 81, 82 percent. Pretty good, but there's clear overfitting going on. So by the time I've reached my final epoch, my training accuracy was 100 percent and my validation accuracy was 82.35 percent. That's quite healthy, but I'm sure we could do better. So now let's take a look at what we'll do to view this in the Embedding Projector. First of all, I'm going to take the weights of my embedding, which was model.layers[0], and we can see that there were 10,000 possible words and I had 16 dimensions. Here is where I'm going to iterate through that array to pull out the values for the 16 dimensions per word and write those to out_v, which is my vecs.tsv. Then the actual word associated with that will be written to out_m, which is my meta.tsv. So if I run that, it will do its trick, and if you're running in Colab, this piece of code will then allow me to just download those files. It'll take a moment, and they'll get downloaded. There they are: vecs.tsv and meta.tsv. So if I now come over to the Embedding Projector, we see it's showing the Word2Vec 10K set right now. So if I scroll down here and say load data, I'll choose file and take the vecs.tsv; I'll choose file and take the meta.tsv; then load. I click outside, and now I see this. But if I spherize the data, you can see it's clustered like this. We do need to improve it a little bit, but we can begin to see that the words have been clustered into both positive and negative.
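The TSV-writing loop looks roughly like this. To keep the sketch self-contained, a small random matrix and a tiny reverse word index stand in for the trained 10,000 x 16 embedding weights (which the notebook pulls out via `model.layers[0].get_weights()[0]`) and the real reverse_word_index.

```python
import io
import numpy as np

embedding_dim = 16
# Stand-ins for the trained weights and the reverse word index;
# row 0 of the weights corresponds to the padding index and is skipped.
weights = np.random.rand(5, embedding_dim)
reverse_word_index = {1: "<OOV>", 2: "movie", 3: "film", 4: "great"}

out_v = io.open('vecs.tsv', 'w', encoding='utf-8')  # one row of 16 values per word
out_m = io.open('meta.tsv', 'w', encoding='utf-8')  # the matching word per row
for word_num in range(1, weights.shape[0]):         # start at 1 to skip padding
    word = reverse_word_index[word_num]
    embeddings = weights[word_num]
    out_m.write(word + "\n")
    out_v.write('\t'.join(str(x) for x in embeddings) + "\n")
out_v.close()
out_m.close()
```

The two files line up row for row: line N of meta.tsv names the word whose 16 tab-separated coordinates sit on line N of vecs.tsv, which is exactly the format the Embedding Projector expects.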
So for example, if I search for the word boring, we can see that the nearest neighbors for boring are things like stink, unlikeable, prom, unrealistic, wooden, devoid, unwatchable, and proverbial. So if I come over here, we can see these are bad words — words signaling a negative-looking review. I can see there are lots of words that have "fun" in them, some positive, some negative — unfunny, dysfunction, and funeral are quite negative. So let's try exciting. Now if I come over here, we're beginning to see that there are a lot of words clustered around positive movies matching exciting, that type of thing, and we can see them over on the left-hand side of the diagram. Again, maybe if I search for Oscar — nothing really associated with Oscar, because it's such a unique word. We could have all kinds of fun with it like that. What if I search for brilliant? Again, we can begin to see words clustering over on this side, but there aren't a lot of words that come close to brilliant — in this case, guardian, Jeffrey, Kidman, Gershwin. So these are cases of brilliant being used as an adjective. Some good stuff in there, though. So hopefully this is a good example of how you can start mapping words into vector spaces, and how you can start looking at sentiment and even visualizing how your model has learned sentiment from these sets of words. In the next video, we're going to look at a simpler way of doing IMDb than this one, where we'll write a lot less code by taking advantage of what's available to us in TensorFlow Data Services.