## The Statistics of A/B Testing, Part II

Calculating Statistical Significance

Before I describe the hypothesis test used in Part I, I want to lay a foundation.

Let’s begin with the phrase *hypothesis test* itself. A hypothesis test is a statistical procedure for testing a claim. Every test weighs two competing hypotheses: the null hypothesis, which is the default assumption (the status quo), and the alternative hypothesis, which is the claim you are testing.

When I was learning about statistical testing, the term *null hypothesis* used to confuse me, but the more I wrapped my head around it, the more I realized that the null is simply the status quo. In the example from Part I, we are comparing Page A and Page B. The null hypothesis is that both pages have the same efficacy, that is, they convert at statistically the same rate. Put another way, there is no statistical difference between them. The alternative hypothesis, on the other hand, is the idea we actually care about. Going back to Part I, we were looking at the data to determine whether Page B converts better than Page A, so the alternative hypothesis is that Page B > Page A in conversion rate (equivalently, Page A < Page B: Page A performs worse).

Now that we have established the null hypothesis (Page A = Page B) and the alternative hypothesis (Page A < Page B), we can begin our test.
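Written in symbols, with $p_A$ and $p_B$ standing for the true underlying conversion rates of the two pages (notation chosen here for illustration, not taken from Part I), the pair of hypotheses is:

$$H_0: p_A = p_B \qquad\qquad H_1: p_A < p_B$$

Because $H_1$ points in only one direction, this is a one-sided (lower-tail) test, so only the lower tail of the bell curve matters when we read off the result.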

We need the following items to arrive at our result (*warning: a little bit of math below*):

1) A Z-score to percentile conversion table. (The chart here is for IQ tests, but it has the data we need; ignore all columns except Z score and percentile.) What is a Z-score, you ask? The Z-score, also called the standard score, describes the relative position of a single value on the bell curve of all values. Anyone who has ever been in a college class is familiar with the idea of a bell curve, or normal distribution. In statistical studies, the quantities we test often follow this bell shape too, at least approximately; the difference between two conversion rates measured on large samples is one example.

The Z-score (or standard score) is the number of standard deviations a particular data point falls from the center of the bell curve (the mean), and it can be converted to a percentile. Standard scores are great because, once you have calculated them, you don’t need to know the specifics of the underlying data: 2 standard deviations above the mean, or roughly the 97.7th percentile, means the same thing to everyone.
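Instead of reading a printed table, the Z-score to percentile conversion can be computed directly from the standard normal cumulative distribution function. The sketch below assumes Python 3.8+ and uses the standard library's `statistics.NormalDist`; the helper name `z_to_percentile` is our own invention for illustration, not something from the original article.

```python
from statistics import NormalDist


def z_to_percentile(z: float) -> float:
    """Convert a Z-score to a percentile (0-100) on the standard normal curve.

    This is the same lookup a printed Z-score table performs: the share of
    the bell curve (mean 0, standard deviation 1) lying at or below z.
    """
    return 100.0 * NormalDist(mu=0.0, sigma=1.0).cdf(z)


if __name__ == "__main__":
    # A few reference points on the bell curve.
    for z in (-2.0, -1.2, 0.0, 1.0, 2.0):
        print(f"z = {z:+.1f} -> {z_to_percentile(z):.2f}th percentile")
```

For example, z = 2 comes out at about the 97.7th percentile, and z = -1.2 at about the 11.5th.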

2) The number of people participating in each segment of the A/B test: n1 for Page A and n2 for Page B.

- n1 = 31,500, n2 = 33,500

3) The sample proportion for each group. In our case, this is the conversion rate for Page A (p1) and for Page B (p2).

4) The overall (pooled) sample proportion, which is the *proportion* of all individuals, across both samples, who converted. In the A/B test from Part I, this is calculated by dividing the total *number* of conversions on both pages by the total number of people who saw either page. Writing $x_1$ and $x_2$ for the number of conversions on Page A and Page B:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

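5) The formula for the test statistic comparing two population proportions, which gives us the Z-score. Here $p_1$ and $p_2$ are the sample conversion rates, $n_1$ and $n_2$ the sample sizes, and $\hat{p}$ the overall sample proportion from item 4:

$$z = \frac{p_1 - p_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$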
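The pooled proportion and the test statistic can be computed together in a few lines. This is a minimal sketch of the standard pooled two-proportion z-test; the function name `two_proportion_z` and its arguments are hypothetical, and it takes raw conversion counts (x1, x2) rather than rates.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: Page A and Page B convert equally.

    x1, x2 -- number of conversions on Page A and Page B
    n1, n2 -- number of visitors who saw Page A and Page B
    """
    p1 = x1 / n1                    # sample conversion rate, Page A
    p2 = x2 / n2                    # sample conversion rate, Page B
    p_pool = (x1 + x2) / (n1 + n2)  # overall (pooled) sample proportion
    variance = p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)
    if variance == 0:
        # Happens when nobody (or everybody) converted; z is undefined then.
        raise ValueError("pooled proportion is 0 or 1; z statistic is undefined")
    # A negative z means Page A's rate is below Page B's.
    return (p1 - p2) / math.sqrt(variance)
```

The standard error uses the pooled proportion because, under the null hypothesis, both pages share a single conversion rate. With the Part I sample sizes, the call would be `two_proportion_z(x1, 31500, x2, 33500)` with the two conversion counts filled in.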

Substituting our values into the test statistic equation gives a Z-score of about -1.20.

Because the alternative hypothesis is one-sided (Page A < Page B), we look at the lower tail of the bell curve, and our Z-score corresponds to a percentile of 11.51%. This 11.51% is the p-value of the test. But what does that mean?

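If we subtract that percentile from 100, we get 88.5%. This is often read informally as being 88.5% confident that Page B converts better than Page A. Strictly speaking, though, 11.51% (the test’s p-value) is the probability of seeing a gap at least this large in Page B’s favor if the two pages actually converted equally well; it is not the probability that the null hypothesis is true, so 88.5% is not the probability that it is false. Either way, the result does not meet the usual threshold for statistical significance: at 95% confidence, the p-value would need to be below 5%. We therefore fail to reject the null hypothesis and conclude that the two pages are not statistically different.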
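The decision step can be checked in code. This sketch assumes Python 3.8+ (`statistics.NormalDist`) and a significance level of 0.05, matching the 95% threshold above; the helper names are hypothetical, and z = -1.20 is the Z-score that corresponds to the 11.51% percentile in the text.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_sided_p_value(z: float) -> float:
    """Lower-tail p-value P(Z <= z), matching H1: Page A < Page B."""
    return STANDARD_NORMAL.cdf(z)


def is_significant(z: float, alpha: float = 0.05) -> bool:
    """Reject H0 only when the p-value is below alpha (0.05 means 95% confidence)."""
    return one_sided_p_value(z) < alpha


if __name__ == "__main__":
    z = -1.20  # Z-score implied by the 11.51% percentile
    p = one_sided_p_value(z)
    critical_z = STANDARD_NORMAL.inv_cdf(0.05)  # cutoff for a one-sided 5% test
    print(f"p-value = {p:.4f}, 1 - p = {1 - p:.4f}")
    print(f"critical z = {critical_z:.3f}, significant: {is_significant(z)}")
```

Run as a script, it reports a p-value of about 0.1151 and a critical value of about -1.645. The Z-score would have to fall at or below that cutoff for the result to be significant at 95%, and -1.20 does not.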

So there you have it: we plowed through an uncertain situation and, using statistics, came up with a definitive business decision.