Observing participants performing tasks is the most important part of a user test. However, there are a few other things you'll want to do as part of your test to make sure you've fully understood your participants' experience. Administering questionnaires to capture participants' reactions is something you'll do most, if not all, of the time, and it will give you data that you can't get through observation alone.

An important thing to keep in mind is that questionnaires are for quantification. You'll have a lot of questions that you'd like to ask your users, but many of them should be asked in an interview, which we'll talk about in other lectures. A questionnaire can't follow up on an answer that a user gives. A questionnaire can't clarify what a question meant to ask if the participant doesn't understand it. A questionnaire can't ask why participants answered a particular way, and a questionnaire can't read body language or tone of voice. These are all things you get from a conversation with a participant. Questionnaires used in user studies should be short, because you want to maximize the time your participants spend doing the tasks you've assigned and talking to you about their experience.

What kinds of things might we want to quantify through a questionnaire? One thing we might be interested in is the dimensions of diversity, or the ways in which our participants differ from one another. We might be interested in how they differ in terms of expertise, whether that's general computer expertise or expertise related to the particular system. We might be interested in how they differ in terms of demographics, like age; or attitudes, like how they feel about technology or about the particular domain our system operates in. Or we might be interested in how they differ according to behaviors; for example, how frequently they purchase items online, if we're testing a shopping site.

We also often want to quantify certain types of responses that we can measure numerically. We can measure responses related to perceived usability, or how usable our participants felt the system was. We might also be interested in measuring perceived usefulness: how useful do they think this system would be for the things they actually need to do? And we might be interested in measuring other aspects of preference or desirability, such as aesthetic appeal or preference relative to competitive products. For these types of data, you'll often also want to ask about them in the interview, because one of the most important things you can learn from a user test is why people had the reactions they did. It may not be enough to ask how usable participants felt the system was; you also want to know why they felt that way, and that's something you can only do in an interview. It's okay to ask some of the same questions twice, because you're really trying to get at different aspects of the answers.

In user tests, questionnaires are usually administered either before or after the tasks are completed, and sometimes both. When we administer questionnaires before the tasks, we're usually doing it to learn about the participants' expertise, behaviors, attitudes, and demographics.
Pre-test questionnaires, especially, should be kept short, because you want to get people moving on to performing the tasks, which is what you really want them spending their time on. To keep them short, make sure you're only asking about things you will actually use in analysis: things you think might impact the way different people approach or accomplish the tasks. So, while it might be easy to ask about age, gender, education, income level, and so on, you should ask yourself, "Does it actually matter? Am I actually likely to see different behaviors based on those characteristics?" If not, leave them off your questionnaire.

These types of questions are usually taken care of early in a user test, because you don't want to spend time on them after the tasks, while participants' reactions are still fresh. After the tasks have been accomplished, you want to use that time to follow up on users' reactions to the tasks, not on other information like their age or gender. Asking this information early in the test can also help you interpret what you see during the tasks: it can help you understand where your user might be coming from and tune your eye to the kinds of things you might see as the tasks are performed. Also, these questions are easy for participants to answer, which helps with warming up and getting them ready to perform the tasks.

On the other hand, pre-test questionnaires are sometimes not done at all. You might skip them because you already have all the information you need from a screener that you deployed to figure out whether people were eligible to participate. Or you might skip them because you don't expect to see differences among users that would impact performance, at least not ones you could measure through a questionnaire.

Post-test questionnaires, or questionnaires administered after a participant has finished the tasks, are much more common and are almost always used in user testing. Because we're interested in quantifiable data, we're generally not going to use free-text responses in a post-test questionnaire. Free-text responses are where you ask a question and then leave a blank text box, or blank lines on a paper questionnaire, that let people write whatever they want. These responses are not easily quantifiable: it's not easy to turn them into numbers you can use for computing averages or standard deviations, or for correlating with other data. They also take too long to fill out; compared to clicking a checkbox, writing text obviously takes longer, and you want people to move through the questionnaire as quickly as possible so you can get to other parts of the test. Generally speaking, anything that would require a free response is better suited for an interview, where you can follow up, ask further questions, and engage in conversation with your participant. You're going to get much better data that way.

An example of the type of question you would be likely to see on a post-test questionnaire is one like this: "It was easy to learn to use this system." The user is presented with a range from strongly disagree to strongly agree. This question gives you a number between one and five that indicates the participant's level of agreement, and across a number of user test sessions, it gives you a series of numbers that can be averaged, sorted, sliced, and so on.
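As a minimal sketch of what you can do with those numbers, here is a short Python example. The ratings below are made up for illustration; they aren't from any actual study.

    # Hypothetical 1-5 agreement ratings for "It was easy to learn to use
    # this system," one per participant, gathered across test sessions.
    from statistics import mean, stdev

    ratings = [4, 5, 3, 4, 2, 5, 4]

    print(mean(ratings))    # average level of agreement across participants
    print(stdev(ratings))   # spread: how much participants' ratings varied
    print(sorted(ratings))  # sorted, e.g., for finding the median or outliers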
An example of a question that we don't want to use on a post-test questionnaire would be a free-text one like this: "How easy was it to learn to use this system?" We don't want to use this because, first of all, it's likely to yield ambiguous answers; you might get answers like "Kind of." Also, this kind of response can't be aggregated easily. It's not going to be easy to summarize the responses you get to a question like this, at least compared to a numerical response like the one in the previous example.

The good news about post-test questionnaires is that a number of them have been developed and used by researchers over many years, and they ask many of the questions we would like to ask in a user test. Several of these usability questionnaires have been developed by research labs and companies. They vary in terms of length; the level of detail about users' responses to usability; the level of modularity, meaning the ability to ask different types of questions about different usability issues; the level of validation they've received; and, in some cases, the cost.

Here's just a small sample. There are actually many more than this, but these indicate the range of commonly used usability questionnaires that are already available and that you could incorporate into your user test without having to do the extra work of developing the questions yourself. SUMI has 50 questions and a reliability of 0.92, which is very high; 1.0 is as high as it gets. Reliability refers to how consistently people answer the questions over multiple instances of taking the questionnaire, so high reliability is good: it means the questions actually make sense and actually get at what they're trying to ask. The downside to SUMI is that it costs quite a bit of money to license, although there is a version that is free for students. QUIS is a standard usability questionnaire that was developed at the University of Maryland. It comes in two versions: the short version has 41 questions and the long version has 122. Its reliability is in the .90s as well, and it's available for a reduced fee for students. A couple of free options are the PSSUQ, which has 19 questions, and the SUS, which has 10.

I'm going to say a bit more about the SUS, or System Usability Scale. This is a questionnaire that was developed in the 1980s, and it's been used many thousands of times in many, many user tests. It consists of 10 questions that get at different aspects of users' reactions to the usability of a system; essentially, it measures the perceived usability of a system based on somebody's use of it. The 10 questions are shown here: "I think that I would like to use this system frequently," "I found the system unnecessarily complex," "I thought the system was easy to use," and so forth.

One thing to notice about this questionnaire is that the questions alternate in terms of whether a positive or a negative response is desired. For "I think that I would like to use this system frequently," good usability would lead you to expect a positive response.
For "I found the system unnecessarily complex," good usability would lead you to expect a negative response. Alternating between positive and negative framing is good questionnaire design, because it checks that the respondent is paying attention and thinking about each answer. It does, however, make scoring this questionnaire a little more complicated.

The way scoring for the SUS works is that the odd questions are positively framed, so the score is the answer the user gave minus one, which gives you a number between 0 and 4. The even questions are negatively framed, so the score is five minus the answer they gave; again, you end up with a number between 0 and 4. To get a nice round number, you then multiply each of those scores by 2.5 and add them up, which gives you a score between 0 and 100. Across many, many usability tests, the ballpark numbers you're looking for are these: 68 is an average usability score, below 50 means you've got some pretty big problems, and above 80 means you're probably doing pretty well. You could think of that as being an A.

Here's an example of how you would score the SUS. Shown on the left is a hypothetical set of responses: you can see that the user gave a 4 for "I would like to use this system frequently," a 2 for "I found the system unnecessarily complex," and so forth. In the table on the right, we see the formula for each of the questions, where R means the response that was given. For the first question, that's R minus 1; for the second question, that's 5 minus R. The adjusted score is shown in the middle column, and the weighted score, where we multiplied the adjusted score by 2.5, is shown in the right column. You take all the numbers in the right column and add them up, and you get your total usability score according to the SUS, which in this case is 67.5, or about average. (A short code sketch of this scoring procedure appears at the end of this lecture.)

So, as I said, the SUS measures perceived usability. It's useful, it's simple, it's free, and it's extremely widely used. It's worth noting that perceived usability correlates only weakly with task performance: people can perform poorly on tasks but still think a system is usable, and they can perform well on tasks but still think a system is not usable. So it's useful to measure the perception of usability in conjunction with the actual usability measured from task performance. Perceived usability is linked to adoption decisions; the perceptions people have about the usability of a system will influence whether they start using the system in the first place, or whether they keep using it after their first experience with it. Actual usability, or the ability to accomplish tasks and to do so efficiently, is more likely linked to abandonment decisions. If something is perceived to be usable but isn't actually usable, people might start using it but then abandon it once they realize it isn't helping them get done what they need to get done.

In this lecture, we've looked at the role of questionnaires in user testing. You can administer questionnaires before or after your participants perform tasks, and your questionnaires can include both custom and standardized questions. Using questionnaires is a great way to get some hard data about users' subjective reactions to using a system, and you're going to want to include them in most of your user studies.
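To make the SUS scoring procedure concrete, here is a minimal sketch in Python. It isn't from the lecture materials: the function name is illustrative, and only the first two responses below (the 4 and the 2) come from the lecture's example; the remaining eight are assumed values chosen so the total matches the 67.5 shown on the slide.

    # Minimal sketch of SUS scoring. Assumes `responses` is a list of ten
    # integers from 1 (strongly disagree) to 5 (strongly agree), in the
    # standard SUS order: odd items positively framed, even items negatively.

    def sus_score(responses):
        if len(responses) != 10:
            raise ValueError("The SUS has exactly 10 questions.")
        total = 0.0
        for i, r in enumerate(responses, start=1):
            if i % 2 == 1:
                adjusted = r - 1      # odd (positive) items: R - 1, giving 0-4
            else:
                adjusted = 5 - r      # even (negative) items: 5 - R, giving 0-4
            total += adjusted * 2.5   # weight each 0-4 score so the sum is 0-100
        return total

    # A 4 for "I would like to use this system frequently," a 2 for "I found
    # the system unnecessarily complex," and illustrative values for the rest.
    responses = [4, 2, 4, 2, 4, 2, 4, 3, 3, 3]
    print(sus_score(responses))  # 67.5 -- about average (the benchmark is 68)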