Welcome back. Let's get started creating a new metric. We're going to talk about test metrics. Why? Because your team is always building new things, and they build each new thing because they have a hypothesis. It's your job to take that hypothesis and phrase it as a question you can answer with data. Part of your job might be to decide how to measure the success of a new thing. To do that, you'll need to figure out how to measure the user behaviors you're trying to affect, and you'll use SQL to figure out how many times users did each of those actions. In the exercises for this section, you'll be asked to create a metric that is pretty clearly tied to business value: it relates to users buying things from our imaginary store.

What we're hoping to do is measure a change in user behavior caused by the new feature. We have a record of when each user saw, or would have seen, the new experience, and to draw any meaningful conclusion from this analysis, we only want to compare how the populations behaved after being exposed to the treatment. But in exploring the test assignment event in 4.3, you might have noticed that the treatment events don't all occur at precisely the same time. This is expected. Remember that a user needs to be eligible to see the new feature, and then they need to reach a trigger where we decide which treatment they get. We never expected every user to be online, reaching the trigger, at the same moment. Since we also have timestamps for the interactions we'd like to measure, it isn't a big deal that users enter the experiment at different times: we can measure each user's behavior relative to their own exposure time, and SQL can do all of this heavy lifting for us.

To analyze an A/B test, you'll need to capture all three categories of metrics to get a full picture. Feature-level engagement: did users interact with the product you altered in a new way? Overall engagement: did the feature change the way users interact with the whole site? Business metrics: how does the company make money, and did this change affect the business's costs or its ability to collect revenue?

In our A/B testing primer, I hinted that sometimes it might make sense to look at a binary metric. The question we wanted to answer was, "will I get a seat on the train?", so it made more sense to ask how often there is at least one seat open than to ask for the average number of open seats. You may also find that some metrics are not very responsive to small changes. In monthly subscription businesses, changes to revenue may take a full month to become visible, whereas metrics like cancellations might respond immediately. Keep this in mind when you pick your metrics.
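To make this concrete, here's a minimal sketch of the kind of query that stitches these pieces together: it joins each user's test assignment to the actions they took after their own exposure time, then computes a binary order-rate metric per treatment group. The table and column names (test_events, orders, and so on) and the test_id value are assumptions for illustration, not the actual schema from the exercises.

```sql
-- Sketch: per-user binary "placed an order after exposure" metric.
-- Assumed hypothetical tables:
--   test_events(user_id, test_id, test_assignment, event_time)
--   orders(user_id, order_id, revenue, created_at)
SELECT
  te.test_assignment,
  COUNT(DISTINCT te.user_id) AS users,
  COUNT(DISTINCT o.user_id)  AS users_who_ordered,
  -- binary metric: share of users with at least one order after exposure
  COUNT(DISTINCT o.user_id) * 1.0
    / COUNT(DISTINCT te.user_id) AS order_rate
FROM test_events te
LEFT JOIN orders o
  ON o.user_id = te.user_id
 AND o.created_at >= te.event_time  -- only count actions after each user's own exposure
WHERE te.test_id = 7               -- assumed test id, for illustration
GROUP BY te.test_assignment;
```

Note the join condition: because each user's orders are compared against that user's own assignment timestamp, it doesn't matter that users entered the experiment at different times.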
Let's go through our example case studies from 4.3 and talk about the metrics we might care about. In our simple case, we've got the new welcome email. The engagement metrics we care about are email opens, email link clicks, and site visits. For business metrics, we care about the order-placed binary (did a user place an order? yes or no, one or zero), the total number of orders placed, and revenue. In our slightly more complicated case with push notifications, the engagement metrics we care about are push notification opens, mobile app visits, and item views. The business metrics are pretty much the same: order completed, total orders placed, revenue, et cetera.

Okay. Now that we have some examples to reference, let's talk about some of the tools we can use to improve our analysis. One tool is a time box. Both features in my examples represent an ephemeral experience: the user receives an extra bit of information that's emailed or pushed to them, but after being viewed, we can expect that information to be dismissed. The effects of a feature like this are short-lived. It might affect the user's actions in the days after the exposure, but the effect after a month or a year might be much less dramatic. So we can look at changes in behavior that occur in a short window after the exposure, i.e., we can time box the metrics.

We can also trim metrics. For the welcome email, the goal is to get users to browse items. If we used items viewed as a metric, we could end up with a few users in each group who browsed enormous quantities of different products. Those users could move the average for the whole group and will certainly increase the standard deviation. For a metric like this, you could cap the values, or trim off the top percentile of users, in order to get a read on the changes occurring for the typical user. Both tools are sketched below.
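Here's a hedged sketch of how both tools might look in SQL, again using assumed table and column names (test_events, item_views) rather than the real exercise schema. The join condition time boxes item views to the seven days after each user's exposure, and LEAST() caps each user's count so outliers can't dominate the average. The INTERVAL syntax shown is PostgreSQL-flavored.

```sql
-- Sketch: time-boxed, capped items-viewed metric.
-- Assumed hypothetical tables:
--   test_events(user_id, test_id, test_assignment, event_time)
--   item_views(user_id, item_id, viewed_at)
WITH per_user AS (
  SELECT
    te.test_assignment,
    te.user_id,
    COUNT(iv.item_id) AS items_viewed
  FROM test_events te
  LEFT JOIN item_views iv
    ON iv.user_id = te.user_id
   AND iv.viewed_at >= te.event_time
   AND iv.viewed_at <  te.event_time + INTERVAL '7 days'  -- time box: 7 days after exposure
  WHERE te.test_id = 7   -- assumed test id, for illustration
  GROUP BY te.test_assignment, te.user_id
)
SELECT
  test_assignment,
  -- cap each user at 50 views (an assumed cutoff) so outliers don't dominate the mean
  AVG(LEAST(items_viewed, 50)) AS avg_items_viewed_capped
FROM per_user
GROUP BY test_assignment;
```

Capping, shown here, keeps every user but clamps extreme values; trimming would instead drop users above a percentile cutoff, for example one computed with PERCENTILE_CONT, before averaging.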