There’s a well-known truth in usability testing that to catch the vast majority of usability issues in a piece of software you only need 5-8 participants. The reasoning is that the value of each additional participant diminishes rapidly beyond that point. The same number has been applied to user interviews.
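For context, the number is usually traced back to Nielsen and Landauer’s problem-discovery model, in which the proportion of issues found with n participants is roughly 1 − (1 − L)^n, where L is the probability that any single participant uncovers a given issue. With their often-quoted average of L ≈ 0.31, five participants surface about 1 − 0.69^5 ≈ 85% of the issues, and each further participant mostly re-finds what you already know.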
My feeling is that the research behind the 5-8 participant number is far too often academic, which means the studies were likely done on a homogeneous sample of users (usually college students), ignoring the heterogeneity of the general population. And we have some truly bad examples of what happens when product testing is too limited.
Mostly this number is cited when people are making the case that doing user research isn’t such a huge task, but rather something that needs to be incorporated into the process and done frequently.
Which I totally agree with. Many people are daunted by the idea of user research because they envision a lot of effort and time, and for many it is difficult to see the value beforehand. Which leads to too little user research being done too late. So helping them overcome that hesitation is very important, so that they can embrace user research as part of their process. The 5-8 participant number goes a long way towards that.
BUT (and that is a big but) this places a lot of weight on who these participants are.
Let me bring you (back) to a basic concept of research methodology: Validity.
Whenever we do research we need to be concerned about its validity. Research validity is often discussed in two parts: internal and external. I’ll leave internal validity aside for now, because it’s external validity that comes into question when recruiting for user research.
External validity is whether you can generalise your findings to your wider population. The big question here is this: using just 5-8 participants, can we assume the findings apply to our target audience? Yes, you may find issues, and yes, you’ll have insights, but are they relevant to the majority of your target audience?
What I am saying is, when we’re developing software for residents in elderly assisted living homes, testing it out on our office mates is not going to tell us anything about how the residents will respond to it. Even if our office mates have excellent relationships with their grandparents.
Taking another example, when we're developing an app to help people understand what to do in a weather emergency, testing it out on our office mates will give us very limited insight into how members of the general public (who may already be experiencing adverse weather effects) will respond to it. Our office mates probably know too much about the app, and their education and literacy levels are above average, to name just a few of the things that make them less than ideal participants.
In short: The quality of the insights you gather from user research is directly proportional to the quality of your participants.
If we are not careful in picking these participants we are effectively diminishing the value of the insights we gather. We are performing UX theatre: we convince ourselves that we are working with user insights and are therefore user-centric, but the insights we have gathered tell us little about what our actual users think and do.
What happens all too often is:
Convenience sampling: Picking participants who are easy to find (family, friends, colleagues) but who may not be representative of your target audience. I’ll go out on a limb and say that they usually are not; in all likelihood they are too much like the person who asked them to participate. They are convenient to find because they are inside our bubble.
Repeat sampling: Representative users are made available to the team as participants, but they are used over and over again, until they become too familiar with the product to provide valuable insights.
Undersampling: Testing only 5-8 participants in total even when the target audience includes two or more independent user groups (say, patients vs healthcare professionals). In such cases you need participants from each group, and you need to make sure that you have a representative sample of each. You don’t need the full 5-8 for each group; 3-4 per group will suffice (see the note below).
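As a rough sanity check, applying the same problem-discovery model per group (an assumption on my part, not something the original studies address): with L ≈ 0.31, four participants from one group still surface about 1 − 0.69^4 ≈ 77% of that group’s issues, so two groups of four cost only eight sessions in total while keeping each group reasonably covered.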
The flag that I am raising is this: Yes, you can use just 5-8 participants, but WHO THEY ARE is key.
Be very careful in defining who your target audience is and in picking participants who are truly representative of that audience.
Pro-tips for defining your target audience:
Think of the value that your product brings. Who is looking for that value? What distinguishes them from everyone else? How would you recognize them if you started chatting with them at a conference or family gathering?
Recruiting based on demographics (age, gender, income, etc.) will ensure some diversity in your test group, but the main focus should be on their commonalities with the defining factors of the target audience (experience with the product, role, knowledge of the problem domain, etc.).
If you have a very wide definition of your target audience (for example if you're working on eGovernment software), focus on the marginalised groups who need the most support. This will yield insights that help you serve the wider audience as well.
If you have truly integrated user research into your process and do it frequently, you may want to define your target audience narrowly for the sake of a particular test: to understand why your product is not being well received by a particular group, or to verify accessibility, to take a couple of examples.