A consultant ran an experiment for the cheese of the month club. Version B added the following: users who did not complete a purchase and then visited a different product page saw a button in the top-right corner labeled “Don’t Forget Your Club”. Clicking this button resumed the checkout process where they left off, without requiring them to re-enter the same information. The consultant measured spend per user to determine success.
##   visit_id user_spend variant
## 1    44222          0       A
## 2    59016          0       A
## 3    74729          0       A
## 4    96396          0       A
## 5    11771          0       A
## 6    33865          0       A
## # A tibble: 2 × 4
##   variant  mean    sd count
##   <chr>   <dbl> <dbl> <int>
## 1 A        4.07  13.9  3401
## 2 B        5.33  16.9  3426
##
## Welch Two Sample t-test
##
## data: user_spend by variant
## t = -3.3702, df = 6594, p-value = 0.0007555
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -1.9935691 -0.5272799
## sample estimates:
## mean in group A mean in group B
## 4.071233 5.331658
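The Welch statistic above can be checked by hand from the summary table. The following is a minimal sketch, not the consultant's code: it plugs in the rounded standard deviations (13.9 and 16.9) from the table, so it lands close to, but not exactly on, the reported t, df, and confidence interval. The variable names are illustrative.

```r
# Summary statistics copied from the output above (sds are rounded)
mean_a <- 4.071233; sd_a <- 13.9; n_a <- 3401
mean_b <- 5.331658; sd_b <- 16.9; n_b <- 3426

# Per-group variance of the mean and the Welch standard error
va <- sd_a^2 / n_a
vb <- sd_b^2 / n_b
se <- sqrt(va + vb)

# Test statistic for the difference A - B
t_stat <- (mean_a - mean_b) / se

# Welch-Satterthwaite degrees of freedom
df_w <- (va + vb)^2 / (va^2 / (n_a - 1) + vb^2 / (n_b - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_w)
ci <- (mean_a - mean_b) + c(-1, 1) * qt(0.975, df_w) * se

round(c(t = t_stat, df = df_w, p = p_value), 4)
round(ci, 3)
```

The interval is for A minus B, so a negative interval means spend per user was higher in version B.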
For variant B only:
##
## One Sample t-test
##
## data: dfB$user_spend
## t = 18.475, df = 3425, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 4.765840 5.897476
## sample estimates:
## mean of x
## 5.331658
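The one-sample interval for variant B follows from the same summary numbers. This is a rough check under the assumption that the rounded standard deviation of 16.9 is close enough to the unrounded value; the names are illustrative.

```r
# Variant B summary statistics from the table above (sd is rounded)
mean_b <- 5.331658; sd_b <- 16.9; n_b <- 3426

# Standard error of the mean
se_b <- sd_b / sqrt(n_b)

# t statistic against a null mean of 0, as in the printed test
t_b <- mean_b / se_b

# 95% confidence interval for mean spend per user in variant B
ci_b <- mean_b + c(-1, 1) * qt(0.975, n_b - 1) * se_b

round(c(t = t_b, lower = ci_b[1], upper = ci_b[2]), 3)
```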
Costco ran an ad test on MSN.com. Version A (current version) showed a carousel of 12 items. Version B (test version) increased that carousel to 16 items. They measured the click-through rate on the ad (whether or not a user clicked on the ad).
Costco wishes to increase the click-through rate on these ads from about 2% to 3%. Given that, they ran the following preliminary power calculations to determine the appropriate sample sizes. Costco uses a company-wide standard of power = 0.85.
##
## Two-sample t test power calculation
##
## n = 4041.239
## delta = 0.01
## sd = 0.15
## sig.level = 0.05
## power = 0.85
## alternative = two.sided
##
## NOTE: n is number in *each* group
##
## Two-sample comparison of proportions power calculation
##
## n = 4375.416
## p1 = 0.02
## p2 = 0.03
## sig.level = 0.05
## power = 0.85
## alternative = two.sided
##
## NOTE: n is number in *each* group
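Both power calculations can be reproduced with `power.t.test()` and `power.prop.test()` from base R's `stats` package. The arguments below are read off the printed output; the source does not show the original calls, so treat this as a sketch of how they may have been run.

```r
# t-test version: detect a 0.01 difference with an assumed sd of 0.15
t_power <- power.t.test(delta = 0.01, sd = 0.15,
                        sig.level = 0.05, power = 0.85)

# Proportion version: detect a change from 2% to 3%
p_power <- power.prop.test(p1 = 0.02, p2 = 0.03,
                           sig.level = 0.05, power = 0.85)

# Required users per group, rounded up to whole users
ceiling(c(t_test = t_power$n, prop_test = p_power$n))
```

The two answers differ because the t-test version uses a single assumed standard deviation, while the proportion version derives the variance from `p1` and `p2`.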
Here is the head of the data they collected.
## # A tibble: 6 × 3
##   version click_yes user 
##   <chr>       <int> <chr>
## 1 a               0 26138
## 2 b               0 37266
## 3 a               0 90905
## 4 b               0 32635
## 5 a               0 03776
## 6 b               0 75106
Here are some data summaries.
## # A tibble: 2 × 4
##   version   mean    sd count
##   <chr>    <dbl> <dbl> <int>
## 1 a       0.0329 0.178  5014
## 2 b       0.0221 0.147  5014
## # A tibble: 4 × 4
## # Groups:   version [2]
##   version click_yes     n   freq
##   <chr>       <int> <int>  <dbl>
## 1 a               0  4849 0.967 
## 2 a               1   165 0.0329
## 3 b               0  4903 0.978 
## 4 b               1   111 0.0221
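Because `click_yes` is coded 0/1, its mean is the click-through rate and its standard deviation is determined by that rate. The sketch below rebuilds the 0/1 vectors from the counts in the frequency table (the raw data are not available here) and recovers the summary table; the object names are illustrative.

```r
# Rebuild each version's 0/1 outcomes from the frequency table counts
clicks_a <- rep(c(0, 1), times = c(4849, 165))
clicks_b <- rep(c(0, 1), times = c(4903, 111))

# Mean of a 0/1 variable is the click-through rate
summ <- data.frame(
  version = c("a", "b"),
  mean    = c(mean(clicks_a), mean(clicks_b)),
  sd      = c(sd(clicks_a), sd(clicks_b)),
  count   = c(length(clicks_a), length(clicks_b))
)
summ

# For 0/1 data the sample sd follows from the rate p alone
p_a <- mean(clicks_a)
n_a <- length(clicks_a)
sd_from_p <- sqrt(p_a * (1 - p_a) * n_a / (n_a - 1))
all.equal(sd_from_p, sd(clicks_a))
```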
Here is the analysis.
##
## Welch Two Sample t-test
##
## data: click_yes by version
## t = 3.2976, df = 9675.5, p-value = 0.0009788
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
## 0.004367777 0.017171912
## sample estimates:
## mean in group a mean in group b
## 0.03290786 0.02213801
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(170, 115) out of c(4844 + 170, 4899 + 115)
## X-squared = 10.531, df = 1, p-value = 0.001174
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.004268717 0.017669855
## sample estimates:
## prop 1 prop 2
## 0.03390507 0.02293578
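The proportion test can be rerun from its `data:` line with `prop.test()`. Note that this line uses 170 and 115 clicks, while the frequency table reports 165 and 111 clicks out of 5014 in each version. The sketch below first reproduces the printed output and then shows the same call with the frequency-table counts; the second result does not appear in the source.

```r
# Counts exactly as they appear in the printed data: line
clicks <- c(170, 115)
views  <- c(4844 + 170, 4899 + 115)
res <- prop.test(clicks, views)   # continuity correction is on by default

res$statistic
res$p.value
res$conf.int

# Same test using the counts from the frequency table instead
res_table <- prop.test(c(165, 111), c(5014, 5014))
res_table
```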
Etsy ran a simple test on its “Add to basket” button, changing it from black to yellow. Etsy measured the clicks on each button color (whether or not a person clicked the button), hoping to see an increase with a more colorful button.
##   variant click  user
## 1       a     0 24967
## 2       a     0 75361
## 3       a     0 96956
## 4       a     0 66919
## 5       a     0 75555
## 6       a     0 11708
## # A tibble: 4 × 4
## # Groups:   variant [2]
##   variant click     n   freq
##   <chr>   <int> <int>  <dbl>
## 1 a           0  5798 0.966 
## 2 a           1   202 0.0337
## 3 b           0  6405 0.976 
## 4 b           1   155 0.0236
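iter=100000
# Beta posterior parameters: clicks + 1 and non-clicks + 1,
# which corresponds to a uniform Beta(1, 1) prior
#version a
a=186+1
b=5814+1
#version b
a1=157+1
b1=6403+1
# Preallocate the indicator vector instead of growing it in the loop
count<-numeric(iter)
for (i in 1:iter){
  # Draw one click rate for each version from its Beta posterior
  A<-rbeta(1, a, b)
  B<-rbeta(1, a1, b1)
  # Record whether version a's draw exceeds version b's
  count[i]<-ifelse(A>B, 1, 0)
}
# Share of draws in which version a has the higher click rate
pdiff<-sum(count)/iter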
pdiff
## [1] 0.99222
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(186, 157) out of c(5814 + 186, 6403 + 157)
## X-squared = 5.629, df = 1, p-value = 0.01767
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.001170649 0.012963497
## sample estimates:
## prop 1 prop 2
## 0.03100000 0.02393293
Individual intervals for variant A.
## [1] 0.03569723
## [1] 0.02691118
##
## 1-sample proportions test with continuity correction
##
## data: 186 out of 186 + 5814, null probability 0.5
## X-squared = 5277.2, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.02682818 0.03578336
## sample estimates:
## p
## 0.031
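The two unlabeled values printed before the one-sample test are consistent with the 97.5% and 2.5% quantiles of a Beta(186 + 1, 5814 + 1) posterior for variant A. That reading is an assumption, since the source does not show the call that produced them. The sketch below computes that credible interval alongside the `prop.test()` interval so the two can be compared.

```r
# Variant A counts as used in the analysis above
clicks <- 186
n <- 186 + 5814

# Assumed: 95% credible interval from a Beta(clicks + 1, non-clicks + 1)
# posterior, upper bound first to match the printed order
cred <- qbeta(c(0.975, 0.025), clicks + 1, n - clicks + 1)
cred

# 95% confidence interval from the one-sample proportion test
# (default null of 0.5 and continuity correction, as in the printed output)
freq <- prop.test(clicks, n)
freq$conf.int
```

The null probability of 0.5 is only the `prop.test()` default; the confidence interval does not depend on it.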