Tom, Tapestry’s Director of Analytics, is a nerd. He likes nothing better than diving deep into a topic and getting lost in its eccentricities. Someone once made the mistake of asking him about research theory and, 15 years later, he’s still going. We challenged him to condense these ideas into a series of blog posts: He is geeking-out so you don’t have to.
People who join Tapestry at the very start of their careers are always full of questions. “How do I access my email?” they ask. “What’s a ‘deck’?” they cry. “Do I have to laugh at Tom’s jokes?” they complain.
(Yes, you do.)
Another common question we get is “what on earth is weighting?”. When I started my own career back in the mists of time, no one explained weighting to me; I had to figure it out on my own. So, in the interest of not repeating the mistakes of my forebears, I’m going to walk you through it now.
In my last post I talked about sampling and ways we can make our sample more like the population it’s supposed to represent. If you’re not clear on the ideas behind sampling, it’s worth going and reading that post. I’ll wait.
Clear now? Great.
One of the ways (aside from sample size) we can make a sample more like a population is called stratification. This process chunks the population into smaller groups which have known characteristics and then samples within those groups so that our sample matches the population’s own strata. It’s probably clearer with an example.
Let’s take our bag full of balls out again, but this time we’ll mix up the shapes. As before we have 20 red shapes and 5 blue shapes, but this time 10 of those shapes are cubes and 15 are spheres. We could ignore shape and take a random sample of 5 objects, and we’d get a mixture of reds, blues, cubes, and spheres, but this isn’t a great representation of the population. Instead, we could account for shape and match our sample to the population: take 2 cubes (10/25 = 40% of the bag, so 2/5 of the sample) and 3 spheres (15/25 = 60%, so 3/5). By matching these groups, our sample is a better representation of the population of our bag.
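(If you like to see things in code, here’s a minimal Python sketch of that stratified draw – the bag, the function and the per-group sample sizes are my own toy construction, not a real sampling library.)

```python
import random

# A toy version of the bag: 25 shapes, of which 10 are cubes and 15 are spheres
# (colours left out to keep the example small).
bag = ["cube"] * 10 + ["sphere"] * 15

def stratified_sample(population, per_stratum):
    """Sample within each stratum so the sample mirrors the population's strata."""
    sample = []
    for stratum, n in per_stratum.items():
        members = [item for item in population if item == stratum]
        sample.extend(random.sample(members, n))
    return sample

# 2 cubes (10/25 of a 5-object sample) and 3 spheres (15/25 of it).
print(stratified_sample(bag, {"cube": 2, "sphere": 3}))
```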
In market research, an easy and common stratum (oh yes, the plural is strata, like data) to set is age. For instance, we know from the 2021 census that, in England and Wales, 13.5% of the population is aged between 25 and 34 (data available here). Therefore, 13.5% of any sample we take from England and Wales should also be aged 25-34.
Sometimes, however, that’s not possible. If our sample were 100 people, we couldn’t have 13.5 of them aged 25-34 as, while you might know someone aged 25-34 who you wouldn’t mind chopping in half, we probably couldn’t ask them many questions afterwards due to the blood loss. More commonly, our fieldwork is slightly off – we’ll get, for instance, 14% 25-34s rather than 13.5%, or we’ll sample 50% male/female while the population is 51% female (from the census again).
When this happens, or when violence is ruled out, we can use weighting to adjust the way we count our sample.
Unweighted, we count each of our shapes as one. One cube counts as one cube. Pretty simple, pretty obvious. Weighting changes all that and allows us to count respondents as fractions or multiples. If we have 1 cube but know we should have 2 to match the population, we can weight our sample so that that 1 cube is counted twice. We then have to down-weight each sphere to 0.75 of a shape so that, together, the spheres are counted as 3 objects rather than 4.
Our natural sample might have been 1 cube, 4 spheres, but our weighted sample now has 2 cubes and 3 spheres and matches the population we drew from.
| | Unweighted Sample | Weighting | Weighted Sample |
| --- | --- | --- | --- |
| Cubes | 1 | 2 | 2 (1 × 2) |
| Spheres | 4 | 0.75 | 3 (4 × 0.75) |
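In code, that table boils down to a one-line calculation: a group’s weight is just its target count divided by its observed count. (A toy Python sketch, not a production weighting routine.)

```python
# Observed counts in our sample and the counts we want (the targets).
observed = {"cube": 1, "sphere": 4}
targets = {"cube": 2, "sphere": 3}

# The weight for each group is simply target / observed.
weights = {shape: targets[shape] / observed[shape] for shape in observed}
print(weights)   # {'cube': 2.0, 'sphere': 0.75}

# Weighted counts: each shape now counts as its weight.
weighted = {shape: observed[shape] * weights[shape] for shape in observed}
print(weighted)  # {'cube': 2.0, 'sphere': 3.0}
```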
Weighting is, very simply, a way of rebalancing a sample so that it reflects a desired distribution. If you get into statistics and start looking at weighted models and other things it gets a bit more nuanced, but the philosophy is always the same: adjusting the balance to match known information.
Pretty simple right?
When you have clear counts like this (we know 135 of our 1000 sample should be aged 25-34), it’s straightforward. The same goes for two interlocked variables, as long as you know the figures for every combination. So, if we wanted to add gender to our age division, we can find out from the census that 6.9% of the population is female and aged 25-34 and 6.6% is male and aged 25-34. We know the targets and can calculate the weights as needed. This is known as cell weighting or target weighting.
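A quick sketch of cell weighting in Python – the observed counts here are invented for illustration, while the targets come from the census figures above:

```python
sample_size = 1000

# Hypothetical counts from fieldwork, and the census targets as proportions.
observed_counts = {"Female 25-34": 80, "Male 25-34": 55}
target_props = {"Female 25-34": 0.069, "Male 25-34": 0.066}

# Cell weighting: each cell's weight is its target count divided by its observed count.
cell_weights = {
    cell: (target_props[cell] * sample_size) / observed_counts[cell]
    for cell in observed_counts
}
print(cell_weights)  # Female 25-34 -> 69/80 = 0.8625, Male 25-34 -> 66/55 = 1.2
```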
However, if we don’t know the relationship between variables, things get harder. If, in the above example, we knew the gender split in the population and the age split, but not the exact gender split within each age band, we couldn’t just use cell weighting. In these cases, we use a technique called RIM weighting (Random Iterative Method), also known as raking. The underlying stats get complicated quickly, but the algorithm essentially looks for a set of weights that distorts each variable (age, gender) as little as possible while moving the observed counts (from your data) to the target counts (in the population).
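A real RIM weighting routine has more safeguards (convergence checks, weight trimming and so on), but a stripped-down raking loop looks roughly like this – the respondents and targets below are invented purely for illustration:

```python
import numpy as np

def rim_weight(age, gender, age_targets, gender_targets, iterations=50):
    """Bare-bones raking: repeatedly rescale the weights so each variable's
    weighted distribution hits its target, until the two settle down."""
    n = len(age)
    weights = np.ones(n)
    for _ in range(iterations):
        for values, targets in ((age, age_targets), (gender, gender_targets)):
            for category, target_prop in targets.items():
                mask = values == category
                current = weights[mask].sum()
                if current > 0:
                    # Scale this category so its weighted share matches the target.
                    weights[mask] *= (target_prop * n) / current
    return weights

# Six made-up respondents with an age band and a gender each.
age = np.array(["25-34", "25-34", "35-44", "35-44", "35-44", "35-44"])
gender = np.array(["F", "M", "F", "M", "M", "M"])

w = rim_weight(age, gender,
               age_targets={"25-34": 0.4, "35-44": 0.6},
               gender_targets={"F": 0.5, "M": 0.5})
print(w.round(3), w.sum())  # the weights sum to the sample size of 6
```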
Now, RIM weighting has its problems – if there is a strong relationship between two variables you want to weight on (e.g. knowing about weighting and being great fun at parties), the algorithm won’t reflect that – but it is a powerful, common and useful technique which you’ll be using, in some form, from your first week.
You do need to be aware of one other side-effect of RIM weighting: effective sample size. Remember that the weighting algorithm distorts the observed data to get to the target counts. While the average weight should be 1 and the weights should sum to the sample size, that distortion has the effect of reducing your effective sample size.
This is important, because all your assumptions about margin of error and statistical significance (a topic for another day) are based on this new sample size. Too big a distortion and you end up with a lot less sample than you started with and, importantly, have paid for.
Let’s say we have an original sample of 1000, but it’s an imperfect representation of the population: someone wasn’t paying attention during fieldwork and we have massively under-sampled women, so we go ahead and weight the data. Our weighting scheme distorts the data so that it matches the population, but in doing so delivers an effective sample size of 500. Suddenly, we have half the data we anticipated, and the margin of error is much wider than we hoped.
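How do we put a number on that shrinkage? One common approximation is Kish’s effective sample size: the squared sum of the weights divided by the sum of the squared weights. The weights below are invented just to show the effect – they won’t reproduce the exact 500 figure above:

```python
import numpy as np

def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# 1000 respondents, but one group was badly under-sampled, so half the sample
# carries a weight of 1/3 and the other half a weight of 5/3 (average weight = 1).
weights = [1 / 3] * 500 + [5 / 3] * 500
print(round(effective_sample_size(weights)))  # about 692, noticeably fewer than 1000
```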
The point is that, like most things in life, weighting isn’t free. It’s powerful, common and very, very useful, but it has a cost. Now you understand what it’s doing and why we use it, you can decide whether it’s a price worth paying.