How To Set Up Your First Website Experiment
Experimentation is one of the most under-used tools in business today, but those who do use it often lead the pack.
Today my goal is to establish a simple framework for how to run an experiment on your website.
This can apply to any channel: paid, organic/SEO, direct, you name it. I’ve spent most of my experimentation experience working with paid and organic channels to improve performance, but I’ve also used testing to validate new feature launches, drive users to a new product, and collect previously unseen data.
Here’s what I’m looking for when I run an experiment on my website:
A minimum of 1,000 users per test cell
This means at least 2,000 users in the test overall, but ideally more. If you can’t get at least 2,000, you might have a hard time reaching statistically significant results.
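The 1,000-per-cell number is a rule of thumb; the real minimum depends on your baseline conversion rate and the size of the lift you hope to detect. Here’s a minimal sketch of that calculation in Python, using the standard normal-approximation formula for comparing two proportions (the function name and the example rates are my own, not from any specific tool):

```python
import math
from statistics import NormalDist

def sample_size_per_cell(p_base, p_variant, alpha=0.10, power=0.80):
    """Rough per-cell sample size for a two-proportion test, via the
    normal approximation. alpha=0.10 matches a 90% confidence target;
    power is the chance of detecting a real effect of this size."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = abs(p_variant - p_base)
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Detecting a lift from a 5% to a 7% conversion rate at 90% confidence
# needs well over 1,000 users per cell.
print(sample_size_per_cell(0.05, 0.07))
```

Smaller expected lifts or lower baseline rates push the number up quickly, which is why 1,000 per cell is a floor, not a guarantee.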
One very specific success metric
This can be the number of clicks, a level of engagement, leads created, form fills, whatever. But it has to be clearly identified at the beginning of the test; typically it’s the outcome highlighted in your If, Then hypothesis.
Your hypothesis
What are you testing? If you do something, then something happens. If I replace a button with a form, then more users will fill out the form. If I move the form higher on the page, then more users will fill it out. You should also consider your null hypothesis (the assumption that your change makes no difference), but for testing basics I wouldn’t get hung up on it.
A very specific timeline
Depending on the channel the right length varies, but usually I don’t like a test to run for longer than 4 weeks, and that upper end usually applies only to organic tests because that experience is less controllable. If I’m working on a paid channel test, 2 weeks is generally enough time to get the right data. Don’t let your test go longer than the timeline established at the beginning; that can make your results less statistically valid. There are a lot of reasons why, which I won’t dive into here, but I’ll link some resources at the end.
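A quick way to sanity-check your timeline is to divide the sample you need by your weekly traffic. This is a hypothetical helper with made-up traffic numbers, just to show the arithmetic:

```python
import math

def weeks_needed(users_per_week, users_per_cell, cells=2):
    """Whole weeks required to fill every test cell, given how many
    eligible users the page sees per week."""
    return math.ceil(users_per_cell * cells / users_per_week)

# A page with ~2,500 eligible users a week, needing ~1,750 per cell:
print(weeks_needed(2500, 1750))  # -> 2, comfortably inside a 2-week paid test
```

If the answer comes back as 8+ weeks, that’s a sign to test a bigger change (a larger expected effect needs fewer users) rather than to stretch the timeline.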
A target confidence level
Having a target confidence level is crucial to experimentation success: you need to know how likely your results are to be correct. When we talk about confidence levels, we’re basically saying “In this scenario, with all these specific variables, we expect to see this result repeated [confidence level]% of the time”.
This is an important footnote, because it means that even after running a test, it’s still possible that something else happens.
Generally we shoot for a 90-95% confidence level. Roughly speaking, that means if we reran this experiment many times, we’d expect the result to hold in 90-95% of those runs, but 5-10% of the time the outcome could just be down to chance.
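To make that concrete, here’s one way to turn raw test results into a confidence number, using a pooled two-proportion z-test. This is a sketch under assumptions: the function name and the example counts are mine, and dedicated testing tools handle more edge cases than this does:

```python
from math import sqrt
from statistics import NormalDist

def confidence_variant_wins(conv_a, n_a, conv_b, n_b):
    """One-sided confidence that variant B's conversion rate beats
    control A's, using a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return NormalDist().cdf(z)                        # e.g. 0.95 -> "95% confident"

# 1,000 users per cell: control converts 50, variant converts 70.
print(round(confidence_variant_wins(50, 1000, 70, 1000), 2))
```

With those example numbers the variant clears a 95% confidence bar; shrink the gap to 50 vs 60 conversions and it wouldn’t, which is exactly the situation the next section is about.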
I love these tools for simple numbers.
Having both a specific target timeline and confidence level is crucial
Sometimes testers let a test run longer to try to achieve the target confidence level, but extending a test until it hits significance (often called “peeking”) inflates your false-positive rate. By having a strict timeline and a target confidence level, you can explain what happened during the time you were testing and how confident you are in the result from that timeframe.
It’s possible that your two-week test finishes and you’ve only hit 75% confidence. That’s still valuable information: you are now 75% confident that this result is correct, and 25% unsure.
This data is super valuable, because maybe it teaches you that you need a retest, or maybe the confidence you have is enough to make a decision. Jeff Bezos has a quote that goes something like this: “you should make the decision as soon as you have 70% of the information you need to make that decision”, so maybe 70% confidence is enough. He also says high-quality decisions should wait for about 90% of the info.
So when you make your call is up to you.
Conclusion
With this framework and these loose rules you can at least start to test your way into success.
I’ll add this footnote: there are a lot of testing professionals out there, in business and academia, and all of them will have different standards for testing. I don’t think the framework I’ve suggested here would capture all the nuance those professionals would bring, but I do think you need to start somewhere.
In my personal experience, just getting started has brought other questions to the surface and allowed me to do much deeper dives into specific areas of testing and really refine and improve.
Let me know what you think about this framework. I’ll also add some links to other testing resources that I’ve found to be really helpful.
SEO experiments at Airbnb: super insightful, but a little advanced.