Question 1

Can I peek at results before the test ends?

Accepted Answer

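Looking is fine; acting on what you see is not. Every interim look at which you might stop early is another chance for noise to cross the significance threshold, so the false-positive rate climbs above its nominal level. If you must monitor for catastrophic regressions, use sequential analysis methods that explicitly account for repeated looks (such as SPRT, group-sequential boundaries, or Bayesian stopping rules designed for repeated looks). Otherwise, commit to the full planned duration.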
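A small simulation can make the inflation concrete. The sketch below is illustrative only: it assumes a two-sided pooled z-test on conversion rates, evenly spaced looks, and A/A data (both arms share one true rate), and the function name `false_positive_rate` and all its defaults are hypothetical.

```python
import random
from statistics import NormalDist

def false_positive_rate(peeks, n_per_arm=2000, rate=0.10, alpha=0.05,
                        trials=400, seed=0):
    """Share of simulated A/A tests flagged significant at any of `peeks` looks.

    Hypothetical helper: both arms share the same true rate, so every
    "significant" result is a false positive.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Evenly spaced looks; the last one always lands at the full sample.
    checkpoints = [n_per_arm * (i + 1) // peeks for i in range(peeks)]
    flagged = 0
    for _ in range(trials):
        conv_a = conv_b = seen = 0
        significant = False
        for n in checkpoints:
            for _ in range(n - seen):
                conv_a += rng.random() < rate
                conv_b += rng.random() < rate
            seen = n
            pooled = (conv_a + conv_b) / (2 * n)
            se = (2 * pooled * (1 - pooled) / n) ** 0.5
            if se > 0 and abs(conv_a - conv_b) / n / se > z_crit:
                significant = True  # keep drawing so every setting sees the same data
        flagged += significant
    return flagged / trials

if __name__ == "__main__":
    print("1 look:  ", false_positive_rate(1))
    print("10 looks:", false_positive_rate(10))
```

Because both calls use the same seed and the final look is identical, any test flagged with one look is also flagged with ten, so the ten-look rate can only be equal or higher.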

Question 2

What's a reasonable minimum detectable effect?

Accepted Answer

Whatever effect would change your business decision. If a 2% lift is too small to bother shipping, don't power for 2%. Common defaults are 5–10% relative lift on the primary metric. Smaller effects exist but require dramatically larger samples that often aren't feasible.
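To see how quickly the sample grows as the target effect shrinks, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. It is not necessarily the formula this calculator uses; `sample_size_per_arm` and its defaults (two-sided alpha of 0.05, 80% power) are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion test.

    Hypothetical helper; `relative_mde` is the relative lift (0.10 = 10%).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

if __name__ == "__main__":
    for mde in (0.10, 0.05, 0.02):
        print(f"{mde:.0%} relative lift:", sample_size_per_arm(0.05, mde))
```

The required sample scales with the inverse square of the absolute difference, which is why a 2% target needs far more than five times the traffic of a 10% target.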

Question 3

Should I split traffic 50/50?

Accepted Answer

Yes for two-variant tests: equal allocation maximizes statistical power for a given total sample. Unequal splits (e.g., 90/10) trade power for lower exposure, since fewer users see a variant that might turn out to be disastrous. Most tests benefit from 50/50; product-launch tests with high downside risk sometimes justify unequal allocation.
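The cost of an unequal split can be estimated with a short calculation. This sketch assumes both arms have roughly the same outcome variance, in which case the variance of the difference is proportional to 1/(p(1-p)) for a treatment share p; the name `sample_inflation` is hypothetical.

```python
def sample_inflation(share_in_treatment):
    """Factor by which total sample must grow versus a 50/50 split
    to keep the same power (assumes equal variance in both arms)."""
    p = share_in_treatment
    if not 0 < p < 1:
        raise ValueError("share must be strictly between 0 and 1")
    return 1 / (4 * p * (1 - p))

if __name__ == "__main__":
    for share in (0.5, 0.3, 0.1):
        print(f"{share:.0%} in treatment: x{sample_inflation(share):.2f} total sample")
```

Under that assumption, a 90/10 split needs a bit under three times the total traffic of a 50/50 split for the same power.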

Question 4

What if I can't get the sample size I need?

Accepted Answer

Three options: relax the minimum effect (target 10% lifts instead of 2%), accept lower power (60% instead of 80%), or accept that some questions can't be answered with your traffic and fall back on qualitative methods. Pretending the sample is enough is the worst option.
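When the sample is fixed, it can help to compute the power you would actually get instead of guessing. The sketch below uses the usual normal approximation for a two-sided two-proportion test and ignores the negligible chance of rejecting in the wrong direction; `achieved_power` is a hypothetical name, not part of this calculator.

```python
from math import sqrt
from statistics import NormalDist

def achieved_power(n_per_arm, baseline, relative_mde, alpha=0.05):
    """Approximate power of a two-sided test given a fixed sample per arm."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)

if __name__ == "__main__":
    # Example inputs are made up: 5% baseline, 10,000 visitors per arm.
    for mde in (0.02, 0.05, 0.10, 0.20):
        print(f"{mde:.0%} lift: power = {achieved_power(10_000, 0.05, mde):.2f}")
```

A low number here is the quantitative version of "pretending the sample is enough": the test will usually miss an effect of that size even when it is real.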

Question 5

Is this calculator for one-tailed or two-tailed tests?

Accepted Answer

Two-tailed by default: you want to detect both lifts and drops. A one-tailed test needs roughly 20% fewer samples at the same significance level (at α = 0.05 and 80% power), but it commits you to ignoring effects in the unexpected direction, which is rarely what you actually want.
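The size of the one-tailed saving can be checked directly. Under the normal approximation, required sample is proportional to the squared sum of the two critical z-values, so the ratio depends only on alpha and power; at alpha 0.05 and 80% power it comes out near 0.79. The function name `one_tailed_sample_ratio` is hypothetical.

```python
from statistics import NormalDist

def one_tailed_sample_ratio(alpha=0.05, power=0.80):
    """Required sample for a one-tailed test divided by the two-tailed
    requirement, at the same alpha and power (normal approximation)."""
    z = NormalDist().inv_cdf
    two_tailed = (z(1 - alpha / 2) + z(power)) ** 2
    one_tailed = (z(1 - alpha) + z(power)) ** 2
    return one_tailed / two_tailed

if __name__ == "__main__":
    print(f"one-tailed needs {one_tailed_sample_ratio():.0%} of the two-tailed sample")
```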

Question 6

What if my baseline rate changes during the test?

Accepted Answer

Common in e-commerce: baseline shifts with seasonality, marketing spend, and traffic-source mix. Mid-test baseline drift doesn't invalidate the test as long as both variants experience it equally, which random assignment ensures in expectation. What does invalidate the test is assignment that becomes correlated with baseline, e.g., variant A serving only mobile users one day. Audit the assignment for balance within segments and over time, not just in the totals.
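One way to audit balance is a chi-square goodness-of-fit check on assignment counts, run per segment as well as overall (often called a sample ratio mismatch check). The sketch below assumes two variants and uses the one-degree-of-freedom identity between the chi-square tail and the normal tail; `srm_p_value` and the segment counts are made up for illustration.

```python
from math import erfc, sqrt

def srm_p_value(count_a, count_b, expected_share_a=0.5):
    """p-value for observed assignment counts versus the intended split
    (chi-square goodness of fit, 1 degree of freedom)."""
    total = count_a + count_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (count_a - exp_a) ** 2 / exp_a + (count_b - exp_b) ** 2 / exp_b
    # For 1 dof, P(chi2 > x) equals P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))

if __name__ == "__main__":
    # Hypothetical counts: totals look balanced, but segments do not.
    segments = {"mobile": (5300, 4700), "desktop": (4700, 5300)}
    for name, (a, b) in segments.items():
        print(name, srm_p_value(a, b))
    print("overall", srm_p_value(10000, 10000))
```

In the made-up data the overall split is exactly 50/50 while each segment is clearly skewed, which is the situation a totals-only audit would miss.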

Question 7

Can I test more than two variants?

Accepted Answer

Yes, with caveats. Multiple comparisons inflate false-positive rates unless you correct (Bonferroni, Holm-Sidak, or other methods). Each additional variant adds traffic requirements. For early-stage tests, prefer 2-variant — A/B is cleaner than A/B/C/D unless you have abundant traffic and a specific reason to test multiple alternatives.
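As an illustration of what a correction does, here is a minimal sketch of Bonferroni next to the Holm step-down procedure in its Bonferroni form (not the Holm-Sidak variant). Both function names and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare the sorted p-values against alpha / m,
    alpha / (m - 1), ... and stop at the first one that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

if __name__ == "__main__":
    # Made-up p-values for variants B, C, D, each compared with control A.
    p = [0.01, 0.02, 0.04]
    print("Bonferroni:", bonferroni_reject(p))
    print("Holm:      ", holm_reject(p))
```

Holm never rejects fewer hypotheses than Bonferroni while controlling the same family-wise error rate, which is why it is usually the better default of the two.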

Question 8

What's a Bayesian A/B test?

Accepted Answer

An alternative to frequentist hypothesis testing. Bayesian tests express results as posterior probabilities ("95% chance variant B is better") rather than p-values. They are often described as tolerating early stopping, but stopping the moment a posterior threshold is crossed can still inflate the rate of false wins, so the stopping rule needs to be chosen and validated up front. They also require more sophisticated tooling and careful prior selection. For most teams, frequentist tests with strict no-peeking are easier to operate correctly.
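A posterior probability of this kind can be sketched with a Beta-Binomial model. The example assumes independent uniform Beta(1, 1) priors on each conversion rate and estimates the probability by Monte Carlo sampling; the prior choice, the function name `prob_b_beats_a`, and the counts are all assumptions for illustration.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

if __name__ == "__main__":
    # Made-up data: 100/1000 conversions for A, 120/1000 for B.
    print(prob_b_beats_a(100, 1000, 120, 1000))
```

A different prior changes the answer, most noticeably with small samples, which is part of why prior selection needs deliberate thought.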

Question 9

How do I handle low-traffic experimentation?

Accepted Answer

Three options: relax the MDE (you can detect 30% lifts but not 5%), use proxy metrics with higher base rates (clicks instead of conversions), or skip A/B testing in favor of qualitative methods (user interviews, expert review). Trying to run statistically rigorous tests on 100 visitors a day produces noise. The calculator's output of '300 days needed' is a real signal, not a problem to ignore.
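The duration figure itself is simple arithmetic once the required sample is known. This sketch assumes traffic is split evenly across variants and that every eligible visitor enters the test unless `traffic_share` says otherwise; `days_needed` and its parameters are hypothetical, not this calculator's internals.

```python
from math import ceil

def days_needed(sample_per_arm, daily_visitors, arms=2, traffic_share=1.0):
    """Days required to collect `sample_per_arm` visitors in each variant.

    `traffic_share` is the fraction of daily visitors enrolled in the test.
    """
    enrolled_per_day = daily_visitors * traffic_share
    return ceil(sample_per_arm * arms / enrolled_per_day)

if __name__ == "__main__":
    # Made-up inputs: 15,000 visitors needed per arm, 100 visitors a day.
    print(days_needed(15_000, 100), "days")
```

With these example inputs the result is 300 days, which shows how a modest per-arm requirement becomes impractical at 100 visitors a day.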

Question 10

Should I test interactions between features?

Accepted Answer

Sometimes yes. Multivariate testing (MVT) tests combinations of changes simultaneously. Useful for finding interactions but requires substantially more traffic than serial A/B tests. For most product teams, serial A/B testing of independent changes is the right default; MVT is a specialized tool for specific questions.
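The traffic cost of MVT follows from how fast the number of combinations grows. The sketch below assumes a full-factorial design in which every combination is its own cell and each cell needs the same sample as one arm of an A/B test, which is a pessimistic simplification; `mvt_cells` and `traffic_multiplier` are hypothetical names.

```python
from math import prod

def mvt_cells(levels_per_factor):
    """Number of combinations in a full-factorial test."""
    return prod(levels_per_factor)

def traffic_multiplier(levels_per_factor):
    """Total traffic relative to a single two-arm A/B test, assuming each
    cell needs the same sample as one A/B arm."""
    return mvt_cells(levels_per_factor) / 2

if __name__ == "__main__":
    # Three changes, each either on or off.
    design = [2, 2, 2]
    print(mvt_cells(design), "cells,", traffic_multiplier(design), "x the traffic")
```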

A/B Test Duration Calculator
