Why Small Data Sets Show More Volatility

A client recently asked me to write down in an email a concept I had explained in our meeting. I figured other people might benefit from the explanation too, so here we go: a little blog post on it!

The Concept

When probabilities are involved, the smaller your dataset, the greater the level of volatility in the data.

Real World Application For Us Marketers

A very real-world way this applies to us as digital marketers: when you look at your day-to-day conversions, you will see more volatility. For example, maybe one day conversions are 15, then the next day 5, then the next day 25, and so on. Compared to the first day, conversions have varied by 10 out of 15, which is two thirds, or 67% variability! That seems like a lot when you put it this way.

But if you zoom out to the weekly level, you might see that week-on-week volatility is lower. Maybe one week gets 105 conversions, the next week 90, the next week 110, and so on. You're no longer varying by two thirds; in this example the biggest week-on-week swing, from 90 up to 110, is only about 22%.

And if you zoom out even further, to the monthly level, the volatility might be lower still, and your month-on-month conversion volumes will often look a lot more stable.
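If you want to see this effect without waiting on real campaign data, you can fake it. Here's a little Python sketch (the click volume and conversion rate are made-up numbers, purely for illustration) that simulates daily conversions and then compares the relative volatility of the same data at the daily, weekly and monthly levels:

```python
import random
import statistics

random.seed(42)

# Hypothetical campaign: ~1,000 clicks/day, each click has a 1% chance
# of converting. These numbers are invented purely for illustration.
CLICKS_PER_DAY = 1_000
CONV_RATE = 0.01
DAYS = 420  # 60 whole weeks, or 14 "months" of 30 days

daily = [sum(random.random() < CONV_RATE for _ in range(CLICKS_PER_DAY))
         for _ in range(DAYS)]
weekly = [sum(daily[i:i + 7]) for i in range(0, DAYS, 7)]
monthly = [sum(daily[i:i + 30]) for i in range(0, DAYS, 30)]

def volatility(series):
    """Relative volatility: standard deviation as a % of the mean."""
    return 100 * statistics.stdev(series) / statistics.mean(series)

for label, series in [("daily", daily), ("weekly", weekly), ("monthly", monthly)]:
    print(f"{label:8s} volatility: {volatility(series):.1f}%")
```

On a typical run, the daily series is by far the most volatile and the monthly series the most stable, which is exactly the zooming-out pattern described above. Nothing about the underlying probability changed; only the size of each grouping did.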

What’s Going On? The Coin Toss Experiment

To illustrate what’s going on, let me first get you to imagine a normal coin. I'll run this as a thought experiment, but if you want, you can actually do it for real and verify what I say for yourself.

What is the probability, when you toss this coin, that it will land heads?

50%, right?

OK.

So, say you toss the coin two times. That means you will get 50% x 2 = 1 heads and 1 tails result, right?

Ummmm no. Well, maybe, yes. But also maybe, no.

You might actually get 2 heads, or you might get 2 tails. If you got 2 heads, you might conclude this coin does not have a 50% chance of heads but a 100% chance. And if you got 2 tails, you might conclude that this coin has a 0% chance of heads.

Of course, if you concluded that, you would be wrong. And to get a more accurate estimate of the probability, you simply toss the coin more times!

If you toss it 10 times, the proportion of heads you get will likely be much closer to 50%. If you toss it 100 times, the proportion of heads will be closer still. And if you tossed it 1,000,000,000 times, the proportion of heads you get would be so close to precisely 50% that the difference would be negligible.

So, in my original language: when we use a small data set (toss the coin 2 times), we get volatile data. Sometimes it tells us the chance is 50%, but just as often it tells us the chance is 100% or 0%. If we toss it a billion times, we increase the data set enormously, and now we get an estimate of the probability that is much, much closer to the truth.
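If you'd rather not stand around tossing a real coin a million times, a few lines of Python will do it for you. This is just a sketch of the thought experiment above: the proportion of heads wobbles a lot at small toss counts and settles towards 50% as the count grows.

```python
import random

random.seed(0)

def heads_fraction(n_tosses):
    """Toss a fair coin n_tosses times and return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Small samples are volatile; big samples hug 50%.
for n in (2, 10, 100, 10_000, 1_000_000):
    print(f"{n:>9,} tosses -> {heads_fraction(n):.4f} heads")
```

Run it a few times with different seeds and you'll see the 2-toss and 10-toss results jump all over the place, while the million-toss result barely moves from 0.5000.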

Bringing It Back To PPC

The number of conversions you get per day depends on a bunch of probabilities. Namely, the probability that someone clicks your ad (AKA the Click-Through Rate, or CTR) and the probability that, once they have clicked, they convert. When you only look at a small data set, e.g. daily conversions, it's as if you didn't toss the coin many times, so your volatility might be high. When you look at a larger data set, e.g. you expand your date ranges to see more conversion data per grouping, it's like tossing the coin lots more times. You get a more and more accurate estimate of the actual chances, and this smooths out the data somewhat.

Some Fun With Probabilities

I’m a huge Derren Brown fanboy. Not only is he very entertaining, he’s also a super smart, reason-based thinker who can help you understand more about how probability works in a fun way! Here’s a “trick” he did some years ago. In a single one-take camera shot, he predicted and then tossed a coin and landed it on heads ten times in a row. The coin was a perfectly normal coin. It was not weighted. It was a one-take shot, no camera tricks, he REALLY did this, no faking. How did he do it?

Maybe now that I have gotten you thinking about coin toss probabilities and data volatility… you can figure out how this was done?

If not, don’t sweat it. I couldn’t either, until I saw his explanation.

It is marvellously simple once you know how it is done. He just stood there, pretty much all day, tossing the coin thousands upon thousands of times, until eventually he got a streak of ten heads.

If you toss the coin enough times, even with a completely fair coin, it eventually becomes very probable that somewhere along the line you toss ten heads in a row. Heck, with enough tosses you could even make it likely to get 100 heads in a row somewhere, although for that, "enough" is an astronomically large number of tosses, far beyond mere millions.
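You can even put a rough number on how long Derren had to stand there. For a fair coin, a known result says the expected number of tosses before you first see a streak of k heads is 2^(k+1) − 2, which for k = 10 is 2,046 tosses. Here's a quick simulation (my own sketch, not anything from the show) that agrees with that figure:

```python
import random
import statistics

random.seed(1)

def tosses_until_streak(streak_len):
    """Toss a fair coin until we see streak_len heads in a row;
    return how many tosses it took."""
    run = tosses = 0
    while run < streak_len:
        tosses += 1
        # Extend the current run of heads, or reset it on tails.
        run = run + 1 if random.random() < 0.5 else 0
    return tosses

# Repeat the whole "stand there all day" exercise 2,000 times.
trials = [tosses_until_streak(10) for _ in range(2_000)]
print(f"average tosses to get 10 heads in a row: {statistics.mean(trials):.0f}")
# Theory predicts 2**11 - 2 = 2,046 for a fair coin.
```

A couple of thousand tosses is an afternoon's work, which is why the trick is achievable for a streak of ten. For 100 heads in a row the same formula gives roughly 2^101 tosses, which is why nobody will ever film that one.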

In Conclusion

Bringing this back to the marketing world again: every day, when you look at your campaign data, you are dealing with probabilities, exactly like tossing a coin. The only difference is that we are not sure what the true probabilities are; we have to estimate them based on the data (coin tosses, conversions, clicks, etc.) that we have.

The more impressions and clicks we have, the better our estimate of the CTR probability. The more conversions we have, the better our Conversion Rate probability estimate. But these are only ever estimates: we can never know the true probability, only increasingly accurate estimates as we get more data. We are always playing a guessing game, to a degree.

When we look at things like our conversion data, we have to remember that this data is driven by probabilities, and thus the more data we collect, the more certain we can be that we have a good estimate, and the less volatility we are likely to see.