Tom, Tapestry’s Director of Analytics, is a nerd. He likes nothing better than diving deep into a topic and getting lost in its eccentricities. Someone once made the mistake of asking him about research theory and 15 years later he’s still going. We challenged him to condense these ideas into a series of blog posts: he is geeking out so you don’t have to.
Something we get asked a lot is “does size matter?”. It’s not as personal a question as it sounds: what our clients and partners mean is ‘how many points should there be on an answer scale?’.
First, let’s be clear about what we mean by an ‘answer scale’: the set of answer options we offer a respondent for a question. Let’s look at an example.
How much do you agree with the statement “Ice cream is cool”?
1. Strongly Disagree
2. Somewhat Disagree
3. Neither Agree nor Disagree
4. Somewhat Agree
5. Strongly Agree
This is known as a rating scale (not a Likert scale, that’s a whole bunch of rating scales – more on that another time). It allows respondents to answer easily and allows us to quantify an inherently qualitative opinion. This is one of the basic tools of the questionnaire writer and you’ll see them in many different contexts.
Some researchers use a 5-point scale, many swear by 4, a few even go rogue and suggest 10. But is there a best length?
Yes. 5 or 7.
You can take my word for it and stop reading there but if you want to stick around, we’ll have a look at why. There are actually 2 parts to this: midpoints (that awkward ‘Neither’ in the middle) and scale length (how many options there are). We’ll look at each one in turn.
Like all good storytellers, I’ll start in the middle.
The big reason to include a mid-point is that it gives you a meaningful 0. This 0 helps limit the elephant in the room: when we use scales for statistical analysis, we assume the gap between each point is the same, treating the data as interval when it is, in fact, only ordinal (at least, we do in marketing and the social sciences; ‘proper’ statisticians will tell you it’s only ever ordinal, but they have no joy in their lives). Established wisdom is that a mid-point, and the 0 it supplies, helps justify that equal-interval assumption. It’s not perfect, but it has worked for the best part of a century.
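To make the interval-versus-ordinal distinction concrete, here’s a minimal sketch using hypothetical responses to our ice cream question, recoded so the mid-point sits at 0. Treating the data as interval licenses a mean; treating it as ordinal means trusting only the ordering, so we stick to the median.

```python
# Hypothetical responses on the 5-point "Ice cream is cool" scale,
# recoded so the mid-point sits at a meaningful 0:
#   -2 = Strongly Disagree ... 0 = Neither ... +2 = Strongly Agree
from statistics import mean, median

responses = [-2, -1, 0, 0, 1, 1, 1, 2, 2, 2]

# Interval treatment: assumes the gap between each point is equal,
# which is what makes averaging defensible at all.
print(mean(responses))    # 0.6

# Ordinal treatment: only the ordering is trusted, so we report the median.
print(median(responses))  # 1.0
```

The data here are invented purely for illustration; the point is that the mean only says anything sensible if the equal-gap assumption (anchored by that 0) roughly holds.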
One common argument against the midpoint is that, by excluding it, you ‘get people off the fence’. However, a study by O’Muircheartaigh et al. (2000 – don’t make me try to pronounce that name) found that excluding the mid-point doesn’t force people to tell you what they really think; it simply makes them select a point on either side at random.
Probably the best argument against midpoints is that the ‘neutral’ answer is ambiguous: it could mean neutral, ‘don’t know’ or ‘undecided’. If you offer an explicit ‘don’t know’ option alongside the mid-point, the number of people selecting the middle goes down. But leaving out the midpoint won’t solve this either: it just dumps all that ambiguity into one of the real answers, unless you also include a ‘don’t know’ option, which defeats the purpose of leaving the mid-point out.
That brings us to the other common reason for leaving out a midpoint: it makes the ‘yes’ numbers bigger. People choose randomly, and people like to agree with things (acquiescence bias). Put those two together and the ‘random’ answers will naturally skew positive. For some research you may well want a nice, big ‘yes’, but you should know it’s artificial. Instead of collecting valid data, a mid-point-free scale collects knowingly inflated numbers, designed to exaggerate an effect.
Excluding the midpoint increases the number of people just saying ‘yes’ or, at best, makes people who don’t care or don’t know choose an answer at random. Including a midpoint protects your data from noise, validates further analysis and preserves the accuracy of your results. It really is a no-brainer.
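A quick, purely hypothetical simulation of that argument: take respondents who genuinely have no opinion, remove the midpoint so they must pick a side, and tilt their coin-flip slightly towards ‘agree’ to represent acquiescence bias (the 60/40 lean below is an assumed figure, not one from the research).

```python
import random

random.seed(42)

AGREE_LEAN = 0.6  # assumed acquiescence bias: neutral people lean 60/40 towards 'agree'
N = 10_000        # simulated genuinely-neutral respondents

# With a midpoint, neutral respondents pick 'Neither' and add nothing to 'agree'.
agree_with_midpoint = 0

# Without a midpoint, they are forced onto one side, with a slight agreeable tilt.
agree_without_midpoint = sum(1 for _ in range(N) if random.random() < AGREE_LEAN)

print(f"Extra 'agree' answers from dropping the midpoint: "
      f"{agree_without_midpoint / N:.0%} of neutral respondents")
```

Every one of those ‘agree’ answers is noise: respondents with no opinion showing up in your data as mild fans of ice cream.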
Now, let’s turn to the number of options in a scale. Remember, because we are including a midpoint, it will be an odd number. Almost all the research I’ve read on this suggests 5 or 7 is optimal.
One paper (Revilla et al., 2014) reports a consistent decline in validity as the number of scale points went from 5 to 7 to 11. There are some issues with their methodology, but the general trend is echoed in most research. As a general guide, the choice for rating scales should be 5 or 7; very, very few studies come down in favour of 11. As a 2010 review put it: “Research confirms that data from [rating scales] becomes significantly less accurate when the number of scale points drops below five or above seven” (Johns, 2010, p. 6).
People sometimes claim that, by giving respondents longer scales, you get more granular data and therefore more insight. However, research shows that people’s answers are, paradoxically, less reliable on a longer scale: people tend to chunk the scale, merging adjacent numbers together.
You can see this by asking someone to rate how happy they feel on a scale of 1 to 10. Conversationally, the answer you get is “about a 7, maybe an 8” (happiness scores at Tapestry are much higher than average). The point is that, rather than giving more granularity, you’re actually making the task harder for the respondent, so they naturally chunk the scale into sections to help them answer. Their choice of 7 or 8 is, according to research, effectively random.
Now, having said all that, these results tend to show a fairly small effect on the things we usually care about in market research (reliability in measuring opinion, rather than stability over time). But by going below 5 or above 7 points you are introducing needless noise into your data for little to no gain.
Always include a mid-point and don’t go above 7 or below 5 points in your answer scale.
O’Muircheartaigh, C., Krosnick, J. & Helic, A. (2000). Middle Alternatives, Acquiescence, and the Quality of Questionnaire Data. Harris School of Public Policy Studies, University of Chicago, Working Papers.
Revilla, M. A., Saris, W. E., & Krosnick, J. A. (2014). Choosing the Number of Categories in Agree–Disagree Scales. Sociological Methods & Research, 43(1), 73–97.
Johns, R. (2010). Likert Items and Scales. SQB Methods Fact Sheet 1 (March 2010).