Hi everyone, welcome to our lecture where we're going to continue modeling data. Instead of a small fake dataset with six values, I now have a larger spreadsheet. This is actual data of residential average monthly kilowatt-hours from the city of Detroit. This is per their website, and this is 195 data points. We're going down to row 196 and we have a header row. A couple of things about the dataset, whenever you get a new dataset like this, you want to look at the data. We have our date. So I have the dates going from the present to the past. We're starting with 3/1/2016 all the way down to 2000. So nice large dataset. Take the first of the month, the average kilowatt-hour and the average bill. So we have a nice dataset here. I'd like to do some analysis on this. I'd like to model this and see what's going on and maybe we can look for trends in the data, we can find some patterns, we can make some predictions. We want to know what the average kilowatt usage will be, what the cost will be. If we're trying to do some financial savings, we can really have some fun with this dataset. So I want to graph this thing. People are sometimes always afraid to test the data. When we graph this thing, I want to make a scatter plot. When I have to think about a scatter plot, what's going to be my x-axis, what's going to be my y-axis? The first thing I might want to do is reorder. Usually, when you're looking at x-axis for time, you go in the future is you go to the right. If I graph this as is, we're going to a problem because this date is moving backward. So let's see if we can do that. Let's sort. We go to Data tab, Sort. Again, it may look different from yours, but that's the idea. Let's do date, let's do values, and we're going to order from the oldest to the newest. That should switch the order and it does. There's my first switch. That looks a little better. I guess we could really do an analysis on the bill or the kilowatt-hours. Maybe we'll play around with the bill. So let's do this. Let's look at the bill. I'm going to ignore the kilowatt-hours. I know you could totally replace the analysis. We're going to do in this lecture with kilowatt-hours and maybe can relate them, but let's just do this. Let's do month and what I want to do along my x-axis is I want to have a date or a month, but I just want to go like 1, 2, 3. I want to start this off. It's pretty common to reorder, to relabel. I don't want to start on 2000. I don't really want to start like 1/1/2000. Let's just start with the first month, the second month, the third month. Think of it like your first observation, you're a scientist in a lab, second observation, third observation, so on, and so on. I'm going to introduce a new x variable that we're going to call month. Then our bill is, well, it is what it is. Let's do a little financial analysis and let's visualize all that. We're going to highlight our columns C and D. We're going to get all of it. You could use a mouse to do it or a keyboard shortcut, whatever you want, and we're going to insert, and we're going to insert a scatter plot. My goodness, look at that. That's quite a scatter plot. Stare at that for a minute. Average bill, don't like the name, so maybe we can do like scatter plot, well, it's all right, I guess. On the left side, on my y-axis, we have the dollar amount. On the x-axis, we have our months, really just the number of observations. We just have to keep track that we're going from 2000 to 2016. Let's go through the process. Let's add some things. Let's do a trendline, I don't know. Let's start linear. Linear, there it is. It is definitely there. Let's see if it's a good trendline. Let's look at the linear correlation coefficient. We've done this a few times, so I'm going to go quickly. Of course, it puts it in a little bit of an awkward spot, but now let's move it nicely. So we have R-squared is 0.0357. If I really want to know what r was equal to, this is going to be bad though, so let's see what that's going to equal to. Let's do equals square root 0.0357 and what do I get? 0.188944. Remember how to interpret this number. The closer you are to zero, the worst fit this. This is not a great fit. I mean, other r's values we had like 0.997. What this is telling you is that this data has a weak correlation. Modeling this with a linear function is going to be pretty bad. We can still go through the process for the other ones. We'll do maybe one more. If we were to jump to the log-log plot or something like that, it's probably going to be bad as well. It doesn't look like it's doing some other things. But this is where you play around then you have fun with it. You'll say, "Well, let's see. What would happen if it were exponential? What does that give me?" You can add the chart element and you can just keep going. Exponential R-squared, 0.0357. Pretty bad, pretty close to zero. If we were to keep the square root of that, also not great. So just because you're learning about exponential modeling or power function modeling or linear modeling, it doesn't mean that every datasets can follow these. Think about how many functions there are? There are so many functions and there's so many different datasets out there that you really want to be flexible with what you do. So what I'd like to do is move over to perhaps a more robust modeling tool. Excel is great, but it does have a finite limit of what options I have. Exponential, linear, logarithmic, that's a new one. Polynomial, power, moving average. There's only 1, 2, 3, 4, 5. It only has these six default trendlines. Let's look at Desmos where you can build your own model as you go. This scatter plot is new for us and I think it's a good example. All right. How do we go to Desmos? Well, we're going to grab our data. What's nice about Desmos is that you can copy. I'm just going to grab the data. I'm not going to grab the titles. We're going to highlight my entire data that I'm studying, trying to get some sense for, and we have 200 rows, give or take, and there's our scatter plot. We'll copy this and we're going to head over to Desmos. I go to desmos.com, click "Start Graphing", and I'm just going to paste into Desmos the table, and it labels column 1, x_1, it labels column 2, y_1 and it gives us the same random scatter plot that we saw before. Now we're going to use our brain here a little bit and we're going to think, what are we actually modeling? We're modeling the electric bill every month. Now, think about your personal usage for electric bill. Sometimes you use more. Maybe in the summer or the winter, you have your air conditioning going, every heat going. Sometimes you use less, you keep the windows open in the fall and the spring. Then what happens when we have 16 years of data? What happens? It repeats. It cycles. When you have data that repeats or cycles, we're looking for a function that repeats or cycle. Can you think of a function that repeats and cycles through? This is where we're going to use a little bit of trig. The common function to use when you're modeling something that cycles or cyclic is a trig function and that's sine. We'll build this up slowly here. If I just did sine of x_1, and why it's breaking? Because I put equals. Remember when you model in Desmos, you've got to do a little tilde. It puts a little baby side down here way far from our data. So we're building our own model. What is interesting here is that this is clearly a terrible model and we remember why. The sine curve it goes between negative one and one, and our data, the smallest value looks to be about 40 to perhaps 188. It looks to be 40, 188. So what we need to do is we need to shift this sine curve. We need to shift this sine curve up a little bit and perhaps play with it to get a better fit. But the key here is that we're using a sine curve. What are some things that we can do to a sine curve? Well, first and foremost, let's change the amplitude. How you change the amplitude? The amplitude is the number in front of a sine curve. How large and how gets. Sine itself just gets from one or negative one. So if I put like 10 in front, it gets a little bigger. If I put 12 in front, it goes from 12 to negative 12, or 15 in front. The beautiful thing about Desmos is like, I'm just going to put a variable up front and let Desmos play around with it. Desmos is still struggling to fit this curve, so it still needs some help here. But let's see what else we can do. It's not enough just to change the amplitude. Probably also we could tell the curve does not go through the scatter plot. So why don't we help this curve out by shifting it up a little bit? I'm going to shift it up. How do you shift up a function? You add a number to it. So let's add 100 to it. Remember if you subtract 100, if you subtract five, it pulls it down, if you add something. One hundred looks to be generally in the middle of my scatter plot. I could've said 90 or 105 or who knows? But what I could do is also raise this thing up a little bit, and maybe I'll even let Desmos trying to fit the best curve to it. I could give it a number and fix it or I can say, "Hey Desmos, you figure it out," and I'll call it plus C. The variables I'm picking are completely random. You can pick plus D, E, F, and G. At least Desmos now has a little bit of help of trying to get this thing to be right in the middle of it. How else can we help out with a trig function? What are some of the things? We changed the amplitude. We shifted the graph up a little bit to match our graph. It's also odd. What I want to do is also not worry about values that I don't care about. So the way that we can restrict the domain is with the curly bracket x_1 greater than or equal to zero. We're not looking at past time, so we'll just restrict it to quadrant one here. Clearly still a bad fit. So next thing we can do is the period of sine is 2Pi. Up and down, one period crest-to-crest, trial to trial, 2Pi. That's a full cycle. We are dealing with months. What is a full cycle? We want 12 months. We want 12 months. So how do we change the period? That's the coefficient on the x_1. In general, the period is 2Pi divided by, you can adjust this here, B, whatever your B is. Here we go. Now we're starting the curve, it's fitting B to negative 630, but in general, B is your period that you want to address. I could even force this. Maybe I won't let Desmos pick for me. I'll force this with a 12th because I know what my data is doing. I know it's a 12-month cycle, so I will force this with a 12. So 2Pi is always the constant, just the default. So again, the default is one. So 2Pi is for sine and then whatever you put down here, that will adjust the sine curve so that 12 becomes your new period. You can start to see that it's getting a little better. Slowly, but surely getting a little better. Well, we can also do a play around this. Again, there's no right or wrong answer. We're just playing here. You can pick any coefficient in front you want and you could even do a shift. Remember, plus C shifts it up or down. You could even do a left right shift or let it to line up a little better and Desmos is doing his best to get you these numbers that best fit the curve. This is a tough dataset. They don't follow normal pattern. I do like modeling this with the sine function because it moves in a periodic fashion but I don't love that the amplitude, A, is so small. So what I might do now is force some of these variables to just get a little higher. So maybe I'll set my amplitude, A. What's a good number here? So we're going from one to 15. Maybe we'll set it for 50 and will force the period as well. We're just having fun. Well, look at that. That's scary. 2Pi over 12, there we go. That's not too bad. When you view the graph as lining up to a sine curve, you see how the cycle probably matches what you expect. Use a little less maybe in the winter and use a little more in the summer, up and down and up and down This probably models your electric bill or electric usage at least. Yes, it's not going to perfectly fit the curve, but we don't necessarily have to perfectly fit the curve.This is 16 years and this is cost. So maybe there's little changes due to inflation or just energy costs, but from what one tell like a random smattering of dots on a scatter plot, there is now some rhyme or some reason to it. So while we could keep playing with this all day and maybe we can adjust and put some more coefficients on things, the key takeaways here is that you know a lot of functions, we've gone through a lot of functions in this class and I want you to use them and don't limit yourself to what Excel has. Excel has the very basics, introductory level, the popular ones; linear, exponential power, but you have other ones at your disposal. Anything that you use, logarithm, sine even just rational functions, whatever you want. You can put them inside of Desmos. You can copy data in, use the tilde, and play around with it till you get a model and it will always introduce and tell you the variables that make it a best fit for you. You can see our r-squared value is 0.619, better than anything else we had with the other ones. Is it the world's greatest model? It's pretty good but it's probably not the greatest one, but it's clearly better. We got at least further away from zero and closer to one than we did with the other models linear exponential. I think modeling this cyclic data with a sine function is a good fit. Different models, the more modeling you do, you'll see the uses for different functions. But just keep this in mind, if you're data moves in a pattern, try sine, play around with sine, play around the amplitude, and try to get that r-squared nice and high. All right, have fun with the data. No wrong answers. Play around with it. Do your best and always be able to justify the model and don't pick things at random. I'll say it one more time just because correlation does not imply causation. [inaudible] great job on this video, I'll see you next time.