A/B Testing, But Without the Guesswork
Running an A/B test is easy. Knowing if it actually worked? That’s where things get messy. You could eyeball the results and hope for the best, or you could use actual statistics.
That’s why I built an A/B Test Calculator. It cuts through the noise and tells you whether your test results mean anything. If you’re curious about how it works, here’s the breakdown.
The Simple Version: “Did My Test Work or Not?”
You test two versions of something: the original (the control) and the new version (the variant). Each has a conversion rate.
The calculator checks if the difference between them is big enough to matter. If it is, it picks a winner. If it’s not, you probably need more data. That’s it.
If that’s all you need to know, go run your test. If you want a bit more, keep reading.
The Medium Version: “I Like Stats, But Let’s Keep It Light.”
Here’s what happens behind the scenes.
Estimate Conversion Rates
For each group, conversion rate = conversions / total visitors
Example: If 50 out of 1,000 users converted, the conversion rate is 50/1000 = 5%
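In code, that's a one-liner. Here's a minimal Python sketch of the same arithmetic (the function name is just for illustration, not the calculator's internals):

def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who converted."""
    return conversions / visitors

print(conversion_rate(50, 1000))  # 0.05, i.e. 5%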
Build Confidence Intervals
A 95% confidence interval estimates where the true conversion rate might fall:
Formula:
CI = conversion rate ± 1.96 * sqrt((conversion rate * (1 - conversion rate)) / total visitors)
This accounts for randomness in the data.
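Here's a small Python sketch of that interval, using the same normal-approximation formula as above; the function name is illustrative, not the calculator's code:

import math

def confidence_interval(conversions: int, visitors: int, z: float = 1.96):
    """95% confidence interval for a conversion rate (normal approximation)."""
    p = conversions / visitors
    margin = z * math.sqrt(p * (1 - p) / visitors)
    return p - margin, p + margin

print(confidence_interval(50, 1000))  # roughly (0.036, 0.064)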
Run a Z-Test
The test checks:
Null hypothesis (H0): No difference between control and variant
Alternative hypothesis (H1): A real difference exists
First, we calculate the pooled proportion, which estimates the overall conversion rate across both groups:
pooled proportion = (conversions in control + conversions in variant) / (total visitors in control + total visitors in variant)
Then, we calculate the Z-score, which measures how unusual the difference is:
Z = (conversion rate of variant - conversion rate of control) / sqrt(pooled proportion * (1 - pooled proportion) * (1 / visitors in control + 1 / visitors in variant))
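Putting those two formulas together, a minimal Python sketch looks like this (illustrative only, not the calculator's actual implementation):

import math

def z_score(conv_c: int, n_c: int, conv_v: int, n_v: int) -> float:
    """Two-proportion z-statistic using the pooled conversion rate."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    pooled = (conv_c + conv_v) / (n_c + n_v)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    return (p_v - p_c) / se

print(z_score(50, 1000, 70, 1000))  # about 1.88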
Check If the Difference Is Real
The calculator looks at the p-value: the probability of seeing a difference at least this large if there were no real difference at all.
If p < 0.05, the difference is statistically significant.
If multiple variants meet this threshold, we pick the one with the highest relative lift:
Lift = (conversion rate of variant - conversion rate of control) / conversion rate of control
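A rough Python sketch of these last two steps, using the standard normal distribution for a two-sided p-value (names and numbers are illustrative, not calculator output):

from math import erf, sqrt

def p_value_two_sided(z: float) -> float:
    """Two-sided p-value for a z-statistic under the standard normal."""
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF via the error function
    return 2 * (1 - cdf)

def relative_lift(rate_control: float, rate_variant: float) -> float:
    """Relative improvement of the variant over the control."""
    return (rate_variant - rate_control) / rate_control

print(p_value_two_sided(1.88))    # about 0.06, so not significant at 0.05
print(relative_lift(0.05, 0.07))  # 0.4, i.e. a 40% relative lift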
The PhD Version: “Give Me the Details, Including the Flaws.”
This Method Assumes a Few Things:
Large Enough Sample Size
The normal approximation behind the z-test works well once each group has plenty of conversions and non-conversions (a common rule of thumb is at least 10 of each per group). For small tests, exact methods or Bayesian approaches are better.
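As an example of the small-sample alternative, here's what an exact test could look like with SciPy's fisher_exact. This is not what the calculator does; it's a sketch of that option, with made-up counts:

from scipy.stats import fisher_exact

# 2x2 table: rows are control/variant, columns are converted / not converted.
table = [[5, 45],    # control: 5 conversions out of 50 visitors
         [12, 38]]   # variant: 12 conversions out of 50 visitors

odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(p)  # exact p-value, no normal approximation needed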
No Multiple Testing Corrections
If you test several variants at once, the false positive rate increases. Adjustments like Bonferroni or Benjamini-Hochberg help, but this calculator doesn’t apply them.
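For reference, a Bonferroni adjustment is just a stricter per-variant threshold. Here's a sketch of applying it yourself, with hypothetical p-values:

def bonferroni(p_values, alpha: float = 0.05):
    """Flag which tests stay significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Three variants tested against the same control.
print(bonferroni([0.01, 0.04, 0.30]))  # [True, False, False] at alpha = 0.05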
Independent Observations
If the same users see multiple variants, or if users influence each other's behavior, the results may not be valid.
Not Built for Peeking
Checking results mid-test increases the chance of false positives. If you need a method that accounts for this, sequential analysis is a better fit.
Why This Calculator Uses Frequentist Stats
Frequentist methods are simple, widely used, and don't require you to choose a prior. Bayesian stats are great, but the priors you pick are subjective and can influence the results.
Try It for Yourself
No need to take my word for it. Try the A/B Test Calculator and see how it handles your data. Whether you’re testing a new design, a product feature, or the best way to make coffee, this tool gives you an answer based on numbers instead of intuition.
Good luck testing.