Cheese of the Month Club Test

A consultant ran an experiment for the cheese of the month club. Version B added the following feature: users who did not complete a purchase and then visited a different product page saw a button in the top-right corner labeled “Don’t Forget Your Club”. Clicking this button resumed checkout where they had left off, without requiring them to re-enter the same information. The consultant used spend per user as the success metric.

[Screenshots: variant A and variant B]

Summary

head(df)
##   visit_id user_spend variant
## 1    44222          0       A
## 2    59016          0       A
## 3    74729          0       A
## 4    96396          0       A
## 5    11771          0       A
## 6    33865          0       A
df %>% group_by(variant) %>% summarise("mean"=mean(user_spend), "sd"=sd(user_spend), "count"=n())
## # A tibble: 2 × 4
##   variant  mean    sd count
##   <chr>   <dbl> <dbl> <int>
## 1 A        4.07  13.9  3401
## 2 B        5.33  16.9  3426

Analysis

t.test(user_spend~variant, data=df)
## 
##  Welch Two Sample t-test
## 
## data:  user_spend by variant
## t = -3.3702, df = 6594, p-value = 0.0007555
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -1.9935691 -0.5272799
## sample estimates:
## mean in group A mean in group B 
##        4.071233        5.331658
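To show where the Welch statistic comes from, here is a minimal sketch that rebuilds t and the Welch-Satterthwaite degrees of freedom from the summary table. The function name `welch_from_summary` is hypothetical. Because the standard deviations in the table are rounded, the results only approximately match `t.test()`.

```r
# Welch two-sample t statistic rebuilt from summary statistics.
# Means come from the t.test() output; sds and counts from the rounded summary table,
# so the result is approximate.
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # squared standard error of mean, group 1
  v2 <- s2^2 / n2                      # squared standard error of mean, group 2
  t <- (m1 - m2) / sqrt(v1 + v2)       # unpooled (Welch) t statistic
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(t = t, df = df)
}

res <- welch_from_summary(4.071233, 13.9, 3401, 5.331658, 16.9, 3426)
res
```

With these inputs, t comes out near -3.37 and df near 6,600, close to the `t.test()` values above.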

For variant B only, a one-sample t-test gives a 95% confidence interval for its mean spend per user:

dfB<-df %>% filter(variant=="B")
t.test(dfB$user_spend)
## 
##  One Sample t-test
## 
## data:  dfB$user_spend
## t = 18.475, df = 3425, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  4.765840 5.897476
## sample estimates:
## mean of x 
##  5.331658
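The one-sample interval is mean ± t quantile × sd/√n. The sketch below rebuilds it from the printed mean and the rounded sd and count in the summary table. The rounding means the endpoints agree with `t.test()` only to about two decimal places.

```r
# 95% one-sample t interval for variant B's mean spend per user
m <- 5.331658   # sample mean from the t.test() output
s <- 16.9       # rounded sd from the summary table
n <- 3426       # number of users in variant B
se <- s / sqrt(n)                                   # standard error of the mean
ci <- m + c(-1, 1) * qt(0.975, df = n - 1) * se     # two-sided 95% interval
ci
```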

Costco Ad Test

Costco ran an ad test on MSN.com. Version A (the current version) showed a carousel of 12 items; version B (the test version) increased the carousel to 16 items. Costco measured the click-through rate on the ad (whether or not each user clicked on it).

[Screenshots: variant A and variant B]

Preliminary Sample Size Determination

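Costco wants to increase its click-through rate on these ads from about 2% to 3%. Given that target, it ran the following preliminary power calculations to determine the required sample size per group, using its company-wide standard of power = 0.85. In the t-test version, sd = 0.15 approximates the standard deviation of a 0/1 click indicator, sqrt(p(1 - p)), which is about 0.14 at p = 0.02 and 0.17 at p = 0.03.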

power.t.test(n=NULL, power=0.85, delta=0.01, sd=0.15, sig.level = 0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 4041.239
##           delta = 0.01
##              sd = 0.15
##       sig.level = 0.05
##           power = 0.85
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.prop.test(n=NULL, power=0.85, p1=0.02, p2=0.03, sig.level=0.05)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 4375.416
##              p1 = 0.02
##              p2 = 0.03
##       sig.level = 0.05
##           power = 0.85
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
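Both calculations have closed-form normal approximations. In this sketch, `n_two_prop` follows the formula `power.prop.test()` solves with its default `strict = FALSE`. `n_two_mean` is the z-based version of the two-sample t calculation; because `power.t.test()` uses t quantiles, its answer is slightly larger. Both helper names are hypothetical.

```r
# Per-group n to detect p1 vs p2 with a two-sided test (normal approximation)
n_two_prop <- function(p1, p2, sig.level = 0.05, power = 0.85) {
  z_a <- qnorm(1 - sig.level / 2)          # critical value for the test
  z_b <- qnorm(power)                      # quantile matching the target power
  pbar <- (p1 + p2) / 2                    # pooled proportion under the null
  num <- z_a * sqrt(2 * pbar * (1 - pbar)) +
    z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
  (num / abs(p1 - p2))^2
}

# Per-group n to detect a mean difference delta with common sd (z approximation)
n_two_mean <- function(delta, sd, sig.level = 0.05, power = 0.85) {
  2 * (sd * (qnorm(1 - sig.level / 2) + qnorm(power)) / delta)^2
}

n_two_prop(0.02, 0.03)   # about 4375, matching power.prop.test()
n_two_mean(0.01, 0.15)   # about 4040; power.t.test() gives the slightly larger 4041
```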

Analysis

Here is the head of the data they collected.

head(df)
## # A tibble: 6 × 3
##   version click_yes user 
##   <chr>       <int> <chr>
## 1 a               0 26138
## 2 b               0 37266
## 3 a               0 90905
## 4 b               0 32635
## 5 a               0 03776
## 6 b               0 75106

Here are some data summaries.

df %>% group_by(version) %>% summarise("mean"=mean(click_yes), "sd"=sd(click_yes), "count"=n())
## # A tibble: 2 × 4
##   version   mean    sd count
##   <chr>    <dbl> <dbl> <int>
## 1 a       0.0329 0.178  5014
## 2 b       0.0221 0.147  5014
df %>% group_by(version, click_yes) %>%  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
## # A tibble: 4 × 4
## # Groups:   version [2]
##   version click_yes     n   freq
##   <chr>       <int> <int>  <dbl>
## 1 a               0  4849 0.967 
## 2 a               1   165 0.0329
## 3 b               0  4903 0.978 
## 4 b               1   111 0.0221

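Here is the analysis: a Welch t-test on the 0/1 click indicator, followed by a two-sample test for equality of proportions.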

t.test(click_yes~version, data=df)
## 
##  Welch Two Sample t-test
## 
## data:  click_yes by version
## t = 3.2976, df = 9675.5, p-value = 0.0009788
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
##  0.004367777 0.017171912
## sample estimates:
## mean in group a mean in group b 
##      0.03290786      0.02213801
prop.test(x=c(170, 115), n=c(4844+170, 4899+115))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(170, 115) out of c(4844 + 170, 4899 + 115)
## X-squared = 10.531, df = 1, p-value = 0.001174
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.004268717 0.017669855
## sample estimates:
##     prop 1     prop 2 
## 0.03390507 0.02293578
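The counts passed to `prop.test()` above (170 and 115 clicks) do not match the frequency table (165 and 111 clicks), although the per-version totals of 5014 agree. The sketch below re-runs the test on the table counts. For comparison with the Welch interval, it also computes an unpooled normal-approximation (Wald) interval for p_a - p_b.

```r
# Click counts and impressions per version, taken from the frequency table
clicks <- c(a = 165, b = 111)
shown  <- c(a = 4849 + 165, b = 4903 + 111)

# Two-sample test for equal proportions (continuity correction by default)
res <- prop.test(x = clicks, n = shown)

# Unpooled normal-approximation interval for p_a - p_b, no continuity correction
p <- clicks / shown
se <- sqrt(sum(p * (1 - p) / shown))
wald <- (p[["a"]] - p[["b"]]) + c(-1, 1) * qnorm(0.975) * se

res$p.value
res$conf.int
wald
```

The Wald interval is nearly identical to the Welch t-test interval. With about 5000 users per group, a t-test on a 0/1 outcome behaves like a normal-approximation test for proportions.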

Etsy Test

Etsy ran a simple test on its “Add to basket” button, changing its color from black to yellow. Etsy measured whether each person clicked the button, hoping to see an increase in clicks with the more colorful button.

[Screenshots: variant A and variant B]

Summary

head(df)
##   variant click  user
## 1       a     0 24967
## 2       a     0 75361
## 3       a     0 96956
## 4       a     0 66919
## 5       a     0 75555
## 6       a     0 11708
df %>% group_by(variant, click) %>% summarise(n = n()) %>%
  mutate(freq = n / sum(n))
## # A tibble: 4 × 4
## # Groups:   variant [2]
##   variant click     n   freq
##   <chr>   <int> <int>  <dbl>
## 1 a           0  5798 0.966 
## 2 a           1   202 0.0337
## 3 b           0  6405 0.976 
## 4 b           1   155 0.0236

Analysis

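iter <- 100000
# Beta(1, 1) prior on each click rate, so each posterior is Beta(clicks + 1, non-clicks + 1)
# version a: 186 clicks, 5814 non-clicks
a <- 186 + 1
b <- 5814 + 1
# version b: 157 clicks, 6403 non-clicks
a1 <- 157 + 1
b1 <- 6403 + 1
# draw iter values from each posterior and record how often A's rate exceeds B's
A <- rbeta(iter, a, b)
B <- rbeta(iter, a1, b1)
count <- ifelse(A > B, 1, 0)
pdiff <- sum(count) / iter
pdiff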
## [1] 0.99222
prop.test(x=c(186, 157), n=c( 5814 +186, 6403+157))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(186, 157) out of c(5814 + 186, 6403 + 157)
## X-squared = 5.629, df = 1, p-value = 0.01767
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.001170649 0.012963497
## sample estimates:
##     prop 1     prop 2 
## 0.03100000 0.02393293
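The hard-coded counts in this analysis (186 and 157 clicks) differ from the frequency table (202 of 6000 and 155 of 6560 clicks), although the group totals agree. The sketch below repeats both the Beta posterior comparison and `prop.test()` on the table counts. It uses the same Beta(1, 1) prior and adds a fixed seed so the Monte Carlo estimate is reproducible.

```r
set.seed(1)                       # reproducible Monte Carlo draws
iter <- 100000

# Posteriors under Beta(1, 1) priors, using frequency-table counts
post_a <- rbeta(iter, 202 + 1, 5798 + 1)   # variant a: 202 clicks, 5798 non-clicks
post_b <- rbeta(iter, 155 + 1, 6405 + 1)   # variant b: 155 clicks, 6405 non-clicks
p_a_gt_b <- mean(post_a > post_b)          # estimated P(rate_a > rate_b | data)

# Frequentist counterpart on the same counts
res <- prop.test(x = c(202, 155), n = c(202 + 5798, 155 + 6405))

p_a_gt_b
res$p.value
```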

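Individual 95% intervals for variant A: an equal-tailed interval from the Beta(187, 5815) posterior, then the confidence interval from prop.test().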

a=186+1
b=5814+1
qbeta(0.975, a, b)
## [1] 0.03569723
qbeta(0.025, a, b)
## [1] 0.02691118
prop.test(x=186, n=186+5814)
## 
##  1-sample proportions test with continuity correction
## 
## data:  186 out of 186 + 5814, null probability 0.5
## X-squared = 5277.2, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.02682818 0.03578336
## sample estimates:
##     p 
## 0.031
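The two intervals above can be wrapped in a small helper so the same comparison is easy to repeat for any variant. The name `compare_intervals` is hypothetical. The helper assumes a Beta(1, 1) prior for the posterior interval and uses the `prop.test()` defaults, including the continuity correction, for the confidence interval.

```r
# Equal-tailed Beta posterior interval (Beta(1, 1) prior) next to the
# prop.test() confidence interval, for x successes out of n trials
compare_intervals <- function(x, n, level = 0.95) {
  alpha2 <- (1 - level) / 2
  beta_int <- qbeta(c(alpha2, 1 - alpha2), x + 1, n - x + 1)
  prop_int <- as.numeric(prop.test(x = x, n = n, conf.level = level)$conf.int)
  rbind(beta = beta_int, prop.test = prop_int)
}

compare_intervals(186, 186 + 5814)   # variant A, with the counts used above
```

For counts this large, the two intervals differ only in the fourth decimal place.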