In this video, we're going to start to work with the pandas DatetimeIndex. So first, let's remind ourselves what data we're working with. Actually, this will be a bit cleaner if we write display here, which we imported specifically for working with the IPython kernel. We can see that we're working with the order date, the category, and the sales data.

Now we're going to set the index using the existing order date variable. The reason for this is that when we want to perform any type of analysis with time series data, it's going to be a lot easier when that time series is in the index. This will add a lot of functionality in terms of regrouping, creating differences, or computing rolling averages, and we'll see that in a bit. So we set our index to order date, and now when we look at the head, we see that's now the index. If we look at the index, we see that it's a DatetimeIndex, a specific type of index compared to the original index that we had before. And if we look at the dates we have, we can also notice that the frequency is None. That's going to come into play when we want to do time series modelling. Generally speaking, we want a specific frequency. Some models will work without it, but oftentimes it makes a big difference if we actually have that frequency set.

Now that we have our DatetimeIndex, we have the option to subset according to it. What do I mean by that? If we wanted the full year of 2011, we can just index base with 2011 here. And again I'm going to change this to display, and we'll change this one to display as well. We see here that this is just the first five values, but specific to just 2011. I can look at the tail as well, and we see that it goes through the end of 2011. And then if we wanted to subset further, we can do that using normal pandas subsetting, where the category is equal to office supplies.
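As a rough illustration of the steps so far, here is a minimal sketch. The toy rows and the column names "Order Date", "Category", and "Sales" are assumptions standing in for the course data, and the sketch uses .loc for the year selection, since recent pandas versions expect date-string row selection to go through .loc.

```python
import pandas as pd

# Hypothetical toy data standing in for the order data used in the video.
base = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2011-01-04", "2011-01-05", "2011-12-31", "2012-01-02"]
    ),
    "Category": ["Office Supplies", "Furniture", "Technology", "Office Supplies"],
    "Sales": [16.4, 120.0, 310.5, 42.0],
})

# Move the date column into the index; pandas builds a DatetimeIndex from it.
base = base.set_index("Order Date")
print(type(base.index).__name__)  # DatetimeIndex
print(base.index.freq)            # None: no frequency has been assigned yet

# Partial-string indexing: every row whose date falls in 2011.
sales_2011 = base.loc["2011"]

# Ordinary boolean subsetting still works alongside the date index.
office = base[base["Category"] == "Office Supplies"]
```

Selecting with just "2011" matches on the year only, so you don't have to spell out the first and last day of the period.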
And then we're also going to take just the data from 2011 until February 2012. When we're subsetting in this sense, it's a bit different from what you're normally used to in Python, in that it will actually include all of the dates within that month of February 2012. This will be clear if we look at the tail, where we see that it includes everything all the way through 2012-02-29.

Now, there's another thing that we can use when we're working with datetime objects, and we can use this even if it's not an index. We'll discuss that in just a bit. But here, if we look at the index, we can look at the day. So what day of the month is it? What week of the year is it? And we can even look at what day of the week it is. Is it a Monday, a Sunday, a Tuesday, whatever it is? This will output a numerical value where Monday is equal to zero and Sunday is equal to six, since there are only seven days in the week. So we're going to look at this very quickly. Again, I'm going to change this to display. And we see we have that day of the week, where January 4th, 2011 was a Tuesday. You can look that up. That can often add a lot of information when you're doing time series modelling. You can imagine if you have some type of store and you're trying to predict sales, it's very likely that the weekends have higher sales than the weekdays. So having this day of week can be very powerful. There's also going to be... well, let's just leave that as is.

We then have standardizing the DatetimeIndex. While the data from the existing variables may be sufficient, as I mentioned earlier, time series applications require that the data contain all the periods and have a frequency assigned. So right now we're missing many time periods, as we can see here.
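To make the inclusive slicing and the calendar attributes concrete, here is a small sketch on made-up dates. The data is hypothetical, and the week-of-year lookup uses isocalendar(), which assumes pandas 1.1 or later; the video may have used a different attribute.

```python
import pandas as pd

# Hypothetical, already-sorted dates around the February 2012 boundary.
dates = pd.to_datetime(
    ["2011-01-04", "2011-06-15", "2012-02-01", "2012-02-29", "2012-03-01"]
)
base = pd.DataFrame({"Sales": [16.4, 80.0, 25.0, 60.0, 99.0]}, index=dates)

# Slice from the start of 2011 through the END of February 2012.
# Unlike ordinary Python slices, the end label is included in full.
window = base.loc["2011":"2012-02"]
print(window.index.max())  # 2012-02-29 00:00:00

# Calendar attributes derived from the DatetimeIndex.
day_of_month = base.index.day
day_of_week = base.index.dayofweek             # Monday=0 ... Sunday=6
week_of_year = base.index.isocalendar().week   # ISO week number

print(day_of_week[0])  # 1, i.e. 2011-01-04 was a Tuesday
```

These attributes can be assigned to new columns if you want to use them as features, for example a weekend indicator built from day_of_week.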
We also have duplicates, which will be a problem when working with time series data, so we're going to want to get rid of those duplicates in some fashion, as well as ensure there are no missing values. You can't see this by just looking at the first five values, but there are missing dates as well. So that's what we have highlighted here: we want to transform the data set that we're working with so that there are no duplicate values and no missing values.

In order to deal with the duplicate values, we can pivot our data. Rather than this long format, where we have the date for furniture and the date for office supplies, we can move furniture and office supplies into column variables. That's similar to what we would do with pivot tables if we were working in Excel, or with groupby objects, which hopefully you're familiar with at this point. What we do is, since we have our index set in place, we pull that order date back out, and then we pivot so that our rows are the order dates, our columns are the categories, and the values are the sales. If we look at this, we can now see that we have pivoted our data so that each row has a unique value, and that's going to be each of the unique dates. And then we have the sales of furniture, office supplies, and technology. Note that there are going to be missing values. If we look up here at January 4th, there's only data for office supplies, so office supplies is 16.4 and the other two will be null. We can easily fill those in by calling fillna with 0.

Now, another method for doing what we just did, which may be simpler at times, is to use the stack or unstack methods, here specifically the unstack method. It will transform the long data set, which is the original data set we were looking at, into the wide data that we have here with our pivot table.
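The pivot step described above might look roughly like this. The rows and column names are hypothetical stand-ins, and the sketch assumes each date and category pair appears at most once, which pivot requires.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, category) pair,
# so a date repeats whenever several categories sold that day.
base = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2011-01-04", "2011-01-05", "2011-01-05", "2011-01-07"]
    ),
    "Category": ["Office Supplies", "Furniture", "Office Supplies", "Technology"],
    "Sales": [16.4, 120.0, 30.0, 310.5],
}).set_index("Order Date")

# Pull the date back out of the index, then pivot from long to wide:
# rows are dates, columns are categories, cells are sales.
sales_pivot = (
    base.reset_index()
        .pivot(index="Order Date", columns="Category", values="Sales")
        .fillna(0)  # a category with no sales on a date becomes 0
)

print(sales_pivot.index.is_unique)  # True
```

After the pivot each date appears exactly once in the index, which is what removes the duplicates.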
We can tell pandas that the date and the category values should be part of the index, and then use the unstack function to generate separate columns for each category. So we first set these as our index. Then we can say, from that index, what do we want to unstack? We want to unstack specifically the category, which again is what we have up here, and this time we're also going to call fillna with 0, so this will do the same thing. I'm going to pull this out real quickly, and let's just look at the output. You see here that we have the same thing. The only thing that's different is that the columns have this multi-level index, and we don't need it. So we're just going to rename those columns, and to do that we're going to call out the levels. This is a MultiIndex: if we set sales equal to this and then look at the actual sales columns, we see that it's a MultiIndex with sales furniture, sales office supplies, and sales technology. We just want the values from this row: furniture, office supplies, and technology. So we say we just want the first one, and we can see what that looks like as well. We see we have furniture, office supplies, and technology. This has a name of category, and we're just going to rename that to None, since we don't need any name for our columns. And then if we look at sales.head(), it's the same as what we had above, except we filled our NA values with zero.

Now we've figured out how to remove duplicates by pivoting our data. The next thing that we want to do is ensure that if there are any gaps, we fill in those gaps. So here we're going to generate a complete index. Just to make clear that we are actually missing days, we can take the length of our actual index, so how many unique dates do we actually have? We can then take our max value, sales.index.max(), and subtract the min value in order to get a timedelta, and figure out how many days are in that timedelta by calling out our date range and then the days.
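Here is one way the unstack route and the gap check could be sketched. The toy data is hypothetical, and the sketch flattens the columns with get_level_values(1) rather than indexing the levels directly, which is an assumption about a convenient way to do it, not necessarily the call used in the video.

```python
import pandas as pd

# Hypothetical long-format data with a gap between 01-05 and 01-10.
base = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2011-01-04", "2011-01-05", "2011-01-05", "2011-01-10"]
    ),
    "Category": ["Office Supplies", "Furniture", "Office Supplies", "Technology"],
    "Sales": [16.4, 120.0, 30.0, 310.5],
})

# Put date and category in the index, then unstack category into columns.
sales = base.set_index(["Order Date", "Category"]).unstack("Category").fillna(0)
print(sales.columns.nlevels)  # 2: ("Sales", <category>) pairs

# Keep only the category level of the column MultiIndex and drop its name.
sales.columns = sales.columns.get_level_values(1)
sales.columns.name = None

# Compare the number of observed dates with the calendar span in days.
n_unique = len(sales.index)
span_days = (sales.index.max() - sales.index.min()).days
print(n_unique, span_days)  # 3 6
```

Whenever the span in days is larger than or equal to the number of unique dates, some calendar days are missing from the index.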
And we see that in our data set we have 1,238 unique dates, whereas there are 1,457 days between our minimum date and our maximum date. So we have to fill in those values in some way. The way that we're going to do that is to create a new index using pd.date_range, which just goes from our min to our max. If we print this out, we see there's an entry for every single day from our min to our max value. Then we call sales.reindex with this new index. That new index will again be longer than our original index, which had just 1,238 days, so we have to have some fill value. We fill that value with zero. And now when we look at that index, we see that we have a frequency built in. We have that frequency equal to day, and that will be very powerful as we do much of our time series modelling later on.

That closes out this video in regards to ensuring that you have that full DatetimeIndex. In the next video, we'll leverage this to do some form of resampling, and we'll see what that means in terms of downsampling to weeks or upsampling from weeks back down to days, as well as looking at some differences in plotting that you can do using this time series data. All right, I'll see you there.
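For reference, the reindexing step from this video can be sketched as follows. The three-row frame is hypothetical, standing in for the wide sales table built earlier.

```python
import pandas as pd

# Hypothetical wide table with one row per observed date (gaps included).
sales = pd.DataFrame(
    {
        "Furniture": [0.0, 120.0, 0.0],
        "Office Supplies": [16.4, 30.0, 0.0],
        "Technology": [0.0, 0.0, 310.5],
    },
    index=pd.to_datetime(["2011-01-04", "2011-01-05", "2011-01-10"]),
)

# Build a complete daily index from the first to the last observed date.
new_index = pd.date_range(sales.index.min(), sales.index.max(), freq="D")

# Reindex onto the complete index; days with no data are filled with 0.
sales_full = sales.reindex(new_index, fill_value=0)

print(len(sales), len(sales_full))  # 3 7
print(sales_full.index.freqstr)     # D
```

Filling with zero fits sales data, where a missing day plausibly means no sales. For other kinds of series a different fill strategy may be more appropriate.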