Amazon Tests

Amazon ran a price styling experiment. Notice that they increased the font size and changed the color of the price. The experiment ran on multiple product pages. Amazon measured the click-through rate on the add-to-cart button, a binary outcome: each visitor either clicked the button or did not.

Preliminaries

First, the analysts at Amazon completed a preliminary sample size study. They wanted to detect a 5% relative difference in click-through rate. Their current click-through rate is 3.45%, so the target rate is 0.0345 × 1.05 = 0.036225.

power.t.test(n=NULL, sig.level = 0.05, power=0.8, delta = 0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 6280.064
##           delta = 0.05
##              sd = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.prop.test(n=NULL, sig.level = 0.05, power=0.8, p1=0.0345, p2=0.036225)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 179954.9
##              p1 = 0.0345
##              p2 = 0.036225
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
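
The `power.t.test` call above uses `delta = 0.05` with the default `sd = 1`, so it treats 0.05 as a difference measured in standard deviations. The sketch below puts the t-test approximation on the same scale as `power.prop.test`. It assumes the 5% is a relative lift (which matches `p2 = 0.036225`) and that the standard deviation of a click is the Bernoulli value at the baseline rate. The variable names are illustrative only.

```r
# Baseline click-through rate and the relative lift from the text
p1 <- 0.0345
relative_lift <- 0.05
p2 <- p1 * (1 + relative_lift)   # the p2 passed to power.prop.test above

# Absolute difference and an assumed Bernoulli standard deviation at baseline
delta_abs <- p2 - p1
sd_click <- sqrt(p1 * (1 - p1))

# t-test approximation on the proportion scale (assumption: sd from baseline)
n_t <- power.t.test(delta = delta_abs, sd = sd_click,
                    sig.level = 0.05, power = 0.8)$n

# Proportion-based calculation, same inputs as the original call
n_prop <- power.prop.test(p1 = p1, p2 = p2,
                          sig.level = 0.05, power = 0.8)$n

# Required visitors per group under each approximation, rounded up
ceiling(c(t_test = n_t, prop_test = n_prop))
```

With the inputs on a common scale, the two per-group sample sizes should land in the same range rather than differing by a factor of almost 30.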
[Images: variant A and variant B]

Summary

head(df)
##   a b
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
df %>% group_by(variant) %>% summarise("mean"=mean(click), "sd"=sd(click), "count"=n())
## # A tibble: 2 × 4
##   variant   mean    sd count
##   <chr>    <dbl> <dbl> <int>
## 1 a       0.0329 0.178  5014
## 2 b       0.0215 0.145  5014
df %>% group_by(variant, click) %>%  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
## # A tibble: 4 × 4
## # Groups:   variant [2]
##   variant click     n   freq
##   <chr>   <int> <int>  <dbl>
## 1 a           0  4849 0.967 
## 2 a           1   165 0.0329
## 3 b           0  4906 0.978 
## 4 b           1   108 0.0215

Analysis

t.test(df$click~df$variant)
## 
##  Welch Two Sample t-test
## 
## data:  df$click by df$variant
## t = 3.4995, df = 9628.5, p-value = 0.0004682
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
##  0.005000443 0.017735895
## sample estimates:
## mean in group a mean in group b 
##      0.03290786      0.02153969
prop.test(x=c(165, 108), n=c(5014,5014))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(165, 108) out of c(5014, 5014)
## X-squared = 11.809, df = 1, p-value = 0.0005896
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.004802437 0.017933902
## sample estimates:
##     prop 1     prop 2 
## 0.03290786 0.02153969
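
To see what `prop.test` is doing, here is a minimal sketch of the pooled two-proportion z-test computed by hand from the counts above. Without the continuity correction, the squared z statistic should equal the chi-squared statistic from `prop.test(..., correct = FALSE)`. The variable names are illustrative.

```r
# Clicks and visitors per variant, taken from the summary table above
x <- c(a = 165, b = 108)
n <- c(a = 5014, b = 5014)

p_hat <- x / n                      # observed click-through rates
p_pool <- sum(x) / sum(n)           # pooled rate under the null hypothesis
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

z <- unname((p_hat["a"] - p_hat["b"]) / se_pool)
p_value <- 2 * pnorm(-abs(z))       # two-sided p-value

# Chi-squared statistic from prop.test without the continuity correction
chisq_uncorrected <- unname(prop.test(x, n, correct = FALSE)$statistic)

c(z = z, z_squared = z^2, chisq = chisq_uncorrected, p_value = p_value)
```

The test reported above uses the continuity correction, so its X-squared value is slightly smaller than `z^2` here.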

Etsy Button Color Test

Etsy ran this simple button experiment on its product page in February 2025, testing the existing black add-to-cart button against a yellow one. Etsy measured sales over a set time period to determine whether the color change helped.

[Images: variant A and variant B]

Summary

head(df)
##   visit_id user_spend variant
## 1    44222          0       A
## 2    59016          0       A
## 3    74729          0       A
## 4    96396          0       A
## 5    11771          0       A
## 6    33865          0       A
df %>% group_by(variant) %>% summarize("mean"=mean(user_spend), "sd"=sd(user_spend), "count"=n())
## # A tibble: 2 × 4
##   variant  mean    sd count
##   <chr>   <dbl> <dbl> <int>
## 1 A        4.61  15.2  3401
## 2 B        5.89  18.3  3426
boxplot(user_spend~variant, data=df)

Analysis 1

t.test(user_spend~variant, data=df)
## 
##  Welch Two Sample t-test
## 
## data:  user_spend by variant
## t = -3.1353, df = 6618.9, p-value = 0.001724
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -2.0748544 -0.4784323
## sample estimates:
## mean in group A mean in group B 
##        4.614099        5.890743
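
As a rough check on the Welch test, the statistic can be rebuilt from the summary table alone. This is only a sketch: the standard deviation for variant A is available here only as the rounded value 15.2, so the results will match the `t.test` output approximately, not exactly.

```r
# Summary statistics reported above (sd for A is the rounded table value)
m <- c(A = 4.614099, B = 5.890743)   # mean spend per visit
s <- c(A = 15.2, B = 18.30756)       # standard deviations
n <- c(A = 3401, B = 3426)           # visits per variant

v <- s^2 / n                         # variance of each sample mean
se_diff <- sqrt(sum(v))              # standard error of the difference
t_stat <- unname((m["A"] - m["B"]) / se_diff)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- sum(v)^2 / sum(v^2 / (n - 1))

p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- unname(m["A"] - m["B"]) + c(-1, 1) * qt(0.975, df_welch) * se_diff

list(t = t_stat, df = df_welch, p_value = p_value, ci = ci)
```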

Sample mean and standard deviation for variant B:

dfB<-df %>% filter(variant=="B")
mean(dfB$user_spend)
## [1] 5.890743
sd(dfB$user_spend)
## [1] 18.30756

95% confidence interval for the variant B mean, using the normal quantile:

5.890743+qnorm(0.975)*18.30756/sqrt(3426)
## [1] 6.503777
5.890743-qnorm(0.975)*18.30756/sqrt(3426)
## [1] 5.277709

90% confidence interval for the variant B mean, using the normal quantile:

5.890743+qnorm(0.95)*18.30756/sqrt(3426)
## [1] 6.405218
5.890743-qnorm(0.95)*18.30756/sqrt(3426)
## [1] 5.376268

95% confidence interval for the variant B mean, using the t quantile with n - 1 = 3425 degrees of freedom:

5.890743+qt(0.975, 3425)*18.30756/sqrt(3426)
## [1] 6.503994
5.890743-qt(0.975, 3425)*18.30756/sqrt(3426)
## [1] 5.277492

90% confidence interval for the variant B mean, using the t quantile with n - 1 = 3425 degrees of freedom:

5.890743+qt(0.95, 3425)*18.30756/sqrt(3426)
## [1] 6.405357
5.890743-qt(0.95, 3425)*18.30756/sqrt(3426)
## [1] 5.376129
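
A confidence interval for a mean has the form mean ± critical value × standard error, where the standard error is sd / sqrt(n). The helper below is a minimal sketch of that formula. The function name `ci_mean` and its arguments are illustrative and are not part of the original analysis.

```r
# Confidence interval for a mean from summary statistics.
# use_t = TRUE uses the t quantile with n - 1 degrees of freedom,
# use_t = FALSE uses the normal quantile.
ci_mean <- function(xbar, s, n, level = 0.95, use_t = TRUE) {
  alpha <- 1 - level
  crit <- if (use_t) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  se <- s / sqrt(n)                 # standard error of the mean
  c(lower = xbar - crit * se, upper = xbar + crit * se)
}

# Variant B summary statistics from above
ci_z95 <- ci_mean(5.890743, 18.30756, 3426, level = 0.95, use_t = FALSE)
ci_t90 <- ci_mean(5.890743, 18.30756, 3426, level = 0.90, use_t = TRUE)
ci_z95
ci_t90
```

With n = 3426 the t and normal quantiles are nearly identical, so the two versions of each interval differ only in the fourth decimal place.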

Analysis 2

sub<-df %>% filter(user_spend>0)
boxplot(user_spend~variant, data=sub)

sub %>% group_by(variant) %>% summarize("mean"=mean(user_spend), "sd"=sd(user_spend), "count"=n())
## # A tibble: 2 × 4
##   variant  mean    sd count
##   <chr>   <dbl> <dbl> <int>
## 1 A        41.7  23.3   376
## 2 B        52.3  23.4   386
t.test(user_spend~variant, data=sub)
## 
##  Welch Two Sample t-test
## 
## data:  user_spend by variant
## t = -6.2301, df = 759.71, p-value = 7.717e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -13.872498  -7.224794
## sample estimates:
## mean in group A mean in group B 
##        41.73551        52.28416
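
Analysis 2 conditions on visits with a purchase. One way to connect it to Analysis 1 is to split mean spend per visit into the share of visits that buy and the mean spend per buyer. The sketch below uses only the counts and means reported above. Comparing the purchase rates with `prop.test` is an added illustration, not part of the original analysis.

```r
# Counts from the two summary tables above
buyers <- c(A = 376, B = 386)        # visits with user_spend > 0
visits <- c(A = 3401, B = 3426)      # all visits
spend_per_buyer <- c(A = 41.73551, B = 52.28416)

conversion <- buyers / visits        # share of visits that purchased

# Mean spend per visit = conversion rate * mean spend per buyer
spend_per_visit <- conversion * spend_per_buyer
spend_per_visit

# Illustrative comparison of the purchase rates between variants
conversion_test <- prop.test(x = buyers, n = visits)
conversion_test$p.value
```

The reconstructed spend per visit should reproduce the group means from Analysis 1, which shows how much of the overall difference comes from spend per buyer and how much from the purchase rate.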

Airbnb Test

Airbnb tested the idea of surfacing the next available dates near their calendar picker (see version b). They measured if the property was booked or not.

[Images: variant A and variant B]

Summary

head(df)
##   variant click
## 1       a     0
## 2       a     0
## 3       a     0
## 4       a     0
## 5       a     0
## 6       a     0
df %>% group_by(variant, click) %>%  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
## # A tibble: 4 × 4
## # Groups:   variant [2]
##   variant click     n   freq
##   <chr>   <int> <int>  <dbl>
## 1 a           0  1217 0.969 
## 2 a           1    39 0.0311
## 3 b           0  1350 0.974 
## 4 b           1    36 0.0260

Analysis

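# Beta posteriors under a uniform Beta(1, 1) prior: successes + 1, failures + 1
iter <- 100000
a <- 39 + 1      # variant a: successes + 1
b <- 1217 + 1    # variant a: failures + 1
a1 <- 36 + 1     # variant b: successes + 1
b1 <- 1350 + 1   # variant b: failures + 1
count <- numeric(iter)   # preallocate instead of growing the vector
for (i in 1:iter) {
  A <- rbeta(1, a, b)
  B <- rbeta(1, a1, b1)
  count[i] <- ifelse(A > B, 1, 0)   # 1 when variant a's draw beats variant b's
}
# Estimated posterior probability that variant a's rate exceeds variant b's
pdiff <- sum(count) / iter
pdiff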
## [1] 0.78395
prop.test(x=c(39, 36), n=c( 1217 +39, 1350+36))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(39, 36) out of c(1217 + 39, 1350 + 36)
## X-squared = 0.44543, df = 1, p-value = 0.5045
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.008415338  0.018569197
## sample estimates:
##     prop 1     prop 2 
## 0.03105096 0.02597403
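
The simulation loop can also be written without a loop, because `rbeta` draws many values at once. The sketch below is a vectorized version of the same idea, using the same Beta(successes + 1, failures + 1) posteriors. The seed and the interval for the difference are additions for illustration.

```r
set.seed(365)   # hypothetical seed, only so the sketch is reproducible
iter <- 100000

# Posterior draws for each variant's rate (uniform prior)
post_a <- rbeta(iter, 39 + 1, 1217 + 1)
post_b <- rbeta(iter, 36 + 1, 1350 + 1)

# Share of draws in which variant a's rate exceeds variant b's
p_a_better <- mean(post_a > post_b)
p_a_better

# 95% posterior interval for the difference in rates (a minus b)
diff_interval <- quantile(post_a - post_b, c(0.025, 0.975))
diff_interval
```

Because the draws are random, `p_a_better` will not match the value above exactly, but it should agree to about two decimal places.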

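Individual 95% intervals for variant A: a Bayesian credible interval from the Beta posterior, followed by a frequentist confidence interval from prop.test.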

a=39+1
b=1217+1
qbeta(0.975, a, b)
## [1] 0.04217153
qbeta(0.025, a, b)
## [1] 0.02282927
prop.test(x=39, n=1217+39)
## 
##  1-sample proportions test with continuity correction
## 
## data:  39 out of 1217 + 39, null probability 0.5
## X-squared = 1103, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.02245857 0.04262035
## sample estimates:
##          p 
## 0.03105096
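
The same Beta-quantile calculation works for either variant. The helper below is a small sketch that wraps the two `qbeta` calls, assuming the same uniform prior (successes + 1, failures + 1). The name `beta_interval` is illustrative.

```r
# Equal-tailed credible interval from a Beta(successes + 1, failures + 1) posterior
beta_interval <- function(successes, failures, level = 0.95) {
  alpha <- 1 - level
  qbeta(c(alpha / 2, 1 - alpha / 2), successes + 1, failures + 1)
}

interval_a <- beta_interval(39, 1217)   # variant A, same as the qbeta calls above
interval_b <- beta_interval(36, 1350)   # variant B
interval_a
interval_b
```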

High Speed Bike Helmet Effectiveness Study

In the Spring of 2025, the United States Cycling Safety Institute conducted a helmet effectiveness study to evaluate the impact of wearing a high-tech helmet in high-speed cycling crashes. The study involved 20 randomly selected students from ISA/STA 365, who were randomly assigned to wear either:

A standard helmet (Control group)

A newly designed impact-resistant helmet (Treatment group)

To maintain the integrity of the study, both groups wore helmets that appeared identical from the outside. The 30 trial observers were also blinded to which helmets were the high-tech versions. Each student rode a controlled downhill course at high speed and was instructed to navigate a difficult turn known to cause crashes. The primary outcome was whether the student sustained a serious head injury (binary response: 1 = Injury, 0 = No Injury). Note that this is just a hypothetical study :).

Random Sample from Population

Name
Amick, Tiffany
Appelwick, Evan
Bizon, Ashley
Boksha, Jenna
Brookshire, Kole
Dawson, Kendra
set.seed(2025)
study<-df[1:20,] #first 20 rows in the data frame
Name
Amick, Tiffany
Appelwick, Evan
Bizon, Ashley
Boksha, Jenna
Brookshire, Kole
Dawson, Kendra
Deaton, Mallory
Guevara, Peter
Hemm, Lilly
Judson, Nathaniel
Kahle, Adam
Krieg, Alex
Likitin, Danila
McClurg, Marissa
Metzler, Aj
Molnar, Cole
Nguyen, Thao
Phillips, Elaina
Ray, Benjamin
Rohlfs, Matt
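
The code above keeps the first 20 rows of the data frame, which is a random sample only if the rows are already in random order. If they are not, `sample` can draw the rows directly. The sketch below uses a made-up roster of 40 placeholder names, since the full class list is not shown here.

```r
set.seed(2025)

# Hypothetical roster standing in for the full population data frame
roster <- data.frame(Name = paste("Student", 1:40))

# Draw 20 distinct row indices at random, without replacement
picked <- sample(nrow(roster), 20)
study_random <- roster[picked, , drop = FALSE]

head(study_random)
```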

Random Assignment

treatment<-c(rep("new helmet",10), rep("old helmet",10))
treatment<-sample(treatment)
experiment<-data.frame(study, treatment)
Name               treatment
Amick, Tiffany     old helmet
Appelwick, Evan    old helmet
Bizon, Ashley      new helmet
Boksha, Jenna      new helmet
Brookshire, Kole   new helmet
Dawson, Kendra     old helmet
Deaton, Mallory    old helmet
Guevara, Peter     new helmet
Hemm, Lilly        old helmet
Judson, Nathaniel  old helmet
Kahle, Adam        old helmet
Krieg, Alex        new helmet
Likitin, Danila    old helmet
McClurg, Marissa   new helmet
Metzler, Aj        new helmet
Molnar, Cole       new helmet
Nguyen, Thao       old helmet
Phillips, Elaina   new helmet
Ray, Benjamin      old helmet
Rohlfs, Matt       new helmet
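
Shuffling a vector that holds exactly 10 labels of each kind guarantees balanced groups, whatever order the shuffle produces. The sketch below wraps that step in a small function and checks the group sizes. The function name, the placeholder rider names, and the seed are illustrative assumptions.

```r
# Balanced random assignment: equal numbers per arm, order shuffled
assign_balanced <- function(ids, arms = c("new helmet", "old helmet"),
                            seed = NULL) {
  stopifnot(length(ids) %% length(arms) == 0)   # groups must divide evenly
  if (!is.null(seed)) set.seed(seed)
  labels <- rep(arms, each = length(ids) / length(arms))
  data.frame(Name = ids, treatment = sample(labels))
}

# Placeholder participants standing in for the 20 students
demo <- assign_balanced(paste("Rider", 1:20), seed = 2025)
group_sizes <- table(demo$treatment)
group_sizes
```

An alternative is to flip a coin for each participant, but that can leave the two groups unequal in size, which matters with only 20 participants.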