Amazon Tests

Amazon ran a price-styling experiment on multiple product pages. Notice that they increased the font size and changed the color of the price. Amazon measured the click-through rate on the add-to-cart button (whether or not the visitor clicked it).

Preliminaries

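First, the analysts at Amazon completed a preliminary sample size study. Their current click-through rate is 3.45%, and they wanted to detect a 5% relative difference, i.e. a change from 3.45% to 3.45% × 1.05 = 3.6225% (the p2 used below). Note that the power.t.test call below uses its default sd = 1 and an absolute delta of 0.05, which does not describe a 0/1 click outcome with a 3.45% base rate (its standard deviation is about sqrt(0.0345 × 0.9655) ≈ 0.18). power.prop.test works directly with the two proportions and is the relevant calculation: roughly 180,000 visitors per group.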

power.t.test(n=NULL, sig.level = 0.05, power=0.8, delta = 0.05)
## 
##      Two-sample t test power calculation 
## 
##               n = 6280.064
##           delta = 0.05
##              sd = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
power.prop.test(n=NULL, sig.level = 0.05, power=0.8, p1=0.0345, p2=0.036225)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 179954.9
##              p1 = 0.0345
##              p2 = 0.036225
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
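As a check on power.prop.test, the per-group sample size for a two-sided comparison of two proportions can be written in closed form. The sketch below assumes the normal approximation that power.prop.test solves with its default strict = FALSE; the helper name n_two_prop is hypothetical. It also shows that power.t.test lands close to the same n once sd is set to the standard deviation of a 0/1 outcome and delta is the absolute difference.

```r
# Closed-form per-group sample size for a two-sided test of two proportions
# (normal approximation; n_two_prop is a hypothetical helper name)
n_two_prop <- function(p1, p2, sig.level = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - sig.level / 2)   # two-sided critical value
  z_beta  <- qnorm(power)               # quantile matching the target power
  p_bar   <- (p1 + p2) / 2              # pooled proportion under H0
  num <- z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
    z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
  (num / (p1 - p2))^2
}

p1 <- 0.0345        # current click-through rate
p2 <- p1 * 1.05     # 5% relative lift: 0.036225
n_formula <- n_two_prop(p1, p2)
ceiling(n_formula)  # visitors needed per group, rounded up

# power.t.test with sd = sqrt(p * (1 - p)) at the average rate and
# delta = p2 - p1 (an absolute difference) gives nearly the same n
p_bar <- (p1 + p2) / 2
n_t <- power.t.test(delta = p2 - p1, sd = sqrt(p_bar * (1 - p_bar)), power = 0.8)$n
c(formula = n_formula, t_approx = n_t)
```

The closed form makes the driver visible: n scales with 1 / (p1 - p2)^2, so a small absolute difference (0.001725 here) requires a very large sample.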
(Screenshots: version A and version B)

Summary

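library(dplyr)
head(df)
##   a b
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
# df has one 0/1 column per variant (a, b); reshape to long format with columns variant and click
df <- tidyr::pivot_longer(df, cols = c(a, b), names_to = "variant", values_to = "click")
df %>% group_by(variant) %>% summarise("mean"=mean(click), "sd"=sd(click), "count"=n())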
## # A tibble: 2 × 4
##   variant   mean    sd count
##   <chr>    <dbl> <dbl> <int>
## 1 a       0.0329 0.178  5014
## 2 b       0.0215 0.145  5014
df %>% group_by(variant, click) %>%  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
## # A tibble: 4 × 4
## # Groups:   variant [2]
##   variant click     n   freq
##   <chr>   <int> <int>  <dbl>
## 1 a           0  4849 0.967 
## 2 a           1   165 0.0329
## 3 b           0  4906 0.978 
## 4 b           1   108 0.0215
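The experiment ended up with 5,014 visitors per variant, far fewer than the roughly 180,000 per group that power.prop.test called for, and the observed difference runs in the opposite direction (b is lower). A minimal sketch computing the observed relative change of b versus a, and the power that 5,014 visitors per group would have had to detect the planned 5% lift, assuming the same power.prop.test settings as the preliminary study:

```r
clicks <- c(a = 165, b = 108)     # clicks per variant (from the summary table)
visits <- c(a = 5014, b = 5014)   # visitors per variant
ctr <- clicks / visits            # observed click-through rates
lift_b_vs_a <- ctr[["b"]] / ctr[["a"]] - 1
lift_b_vs_a   # about -0.345: b's rate is roughly 35% lower than a's

# Power that 5014 visitors per group had to detect the planned 3.45% -> 3.6225% change
achieved <- power.prop.test(n = 5014, p1 = 0.0345, p2 = 0.0345 * 1.05)$power
achieved      # far below the planned 0.8
```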

Analysis

t.test(df$click~df$variant)
## 
##  Welch Two Sample t-test
## 
## data:  df$click by df$variant
## t = 3.4995, df = 9628.5, p-value = 0.0004682
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
##  0.005000443 0.017735895
## sample estimates:
## mean in group a mean in group b 
##      0.03290786      0.02153969
prop.test(x=c(165, 108), n=c(5014,5014))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(165, 108) out of c(5014, 5014)
## X-squared = 11.809, df = 1, p-value = 0.0005896
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.004802437 0.017933902
## sample estimates:
##     prop 1     prop 2 
## 0.03290786 0.02153969
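prop.test applies a continuity correction by default. Without it, its chi-squared statistic equals the square of the pooled two-proportion z statistic, and its interval is the unpooled (Wald) interval for the difference. A sketch computing both by hand from the counts in the summary table:

```r
x <- c(165, 108)     # clicks in variants a and b
n <- c(5014, 5014)   # visitors in variants a and b
p_hat  <- x / n
p_pool <- sum(x) / sum(n)                           # pooled rate under H0: p_a = p_b
se0    <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))  # standard error under H0
z      <- (p_hat[1] - p_hat[2]) / se0
p_val  <- 2 * pnorm(-abs(z))                        # two-sided p-value
c(z = z, z_sq = z^2, p_value = p_val)

# Unpooled (Wald) 95% interval for p_a - p_b
se1 <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci_wald <- (p_hat[1] - p_hat[2]) + c(-1, 1) * qnorm(0.975) * se1
ci_wald
```

The pooled standard error is used for the test because it is computed under the null hypothesis of equal rates; the interval uses each group's own rate.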

Etsy Button Color Test

Etsy ran this simple button experiment on its product page in February 2025, testing its existing black add-to-cart button against a yellow one. Etsy measured spend per visit (user_spend) over the test period to determine whether the color change increased sales.

(Screenshots: version A and version B)

Summary

head(df)
##   visit_id user_spend variant
## 1    44222          0       A
## 2    59016          0       A
## 3    74729          0       A
## 4    96396          0       A
## 5    11771          0       A
## 6    33865          0       A
df %>% group_by(variant) %>% summarise("mean"=mean(user_spend), "sd"=sd(user_spend), "count"=n())
## # A tibble: 2 × 4
##   variant  mean    sd count
##   <chr>   <dbl> <dbl> <int>
## 1 A        4.61  15.2  3401
## 2 B        5.89  18.3  3426
boxplot(user_spend~variant, data=df)

Analysis 1

t.test(user_spend~variant, data=df)
## 
##  Welch Two Sample t-test
## 
## data:  user_spend by variant
## t = -3.1353, df = 6618.9, p-value = 0.001724
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -2.0748544 -0.4784323
## sample estimates:
## mean in group A mean in group B 
##        4.614099        5.890743
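Welch's test does not need the raw data: the statistic and its Satterthwaite degrees of freedom depend only on each group's mean, standard deviation and size. A sketch with a hypothetical helper welch_from_summary. Plugging in the rounded values from the summary table gives a statistic close to, but not exactly equal to, the t.test output, because those values are rounded.

```r
# Welch two-sample t statistic, Satterthwaite df and two-sided p-value
# from summary statistics (welch_from_summary is a hypothetical helper)
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                       # squared standard error, group 1
  v2 <- s2^2 / n2                       # squared standard error, group 2
  t  <- (m1 - m2) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p  <- 2 * pt(-abs(t), df)             # two-sided p-value
  c(t = t, df = df, p.value = p)
}

# Rounded Etsy summary values: variant A, then variant B
welch_from_summary(4.61, 15.2, 3401, 5.89, 18.3, 3426)
```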

Sample mean and standard deviation of spend for variant B:

dfB<-df %>% filter(variant=="B")
mean(dfB$user_spend)
## [1] 5.890743
sd(dfB$user_spend)
## [1] 18.30756

95% confidence interval for the mean spend in variant B, using the normal critical value (mean ± z * sd/sqrt(n)):

5.890743+qnorm(0.975)*18.30756/sqrt(3426)
## [1] 6.503777
5.890743-qnorm(0.975)*18.30756/sqrt(3426)
## [1] 5.277709

90% confidence interval for the mean spend in variant B, using the normal critical value:

5.890743+qnorm(0.95)*18.30756/sqrt(3426)
## [1] 6.405218
5.890743-qnorm(0.95)*18.30756/sqrt(3426)
## [1] 5.376268

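95% confidence interval for the mean spend in variant B, using the t critical value with n - 1 = 3425 degrees of freedom:

5.890743+qt(0.975, 3425)*18.30756/sqrt(3426)
## [1] 6.503994
5.890743-qt(0.975, 3425)*18.30756/sqrt(3426)
## [1] 5.277492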

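90% confidence interval for the mean spend in variant B, using the t critical value with 3425 degrees of freedom:

5.890743+qt(0.95, 3425)*18.30756/sqrt(3426)
## [1] 6.405357
5.890743-qt(0.95, 3425)*18.30756/sqrt(3426)
## [1] 5.376129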
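The four interval calculations for variant B follow one pattern: mean ± critical value * sd/sqrt(n), with the critical value taken from the normal distribution or from the t distribution with n - 1 degrees of freedom. A sketch of that pattern as a hypothetical helper mean_ci:

```r
# Confidence interval for a mean from summary statistics
# (mean_ci is a hypothetical helper; dist picks the critical value)
mean_ci <- function(x_bar, s, n, level = 0.95, dist = c("t", "z")) {
  dist  <- match.arg(dist)
  alpha <- 1 - level
  crit  <- if (dist == "z") qnorm(1 - alpha / 2) else qt(1 - alpha / 2, df = n - 1)
  se    <- s / sqrt(n)                  # standard error of the mean
  c(lower = x_bar - crit * se, upper = x_bar + crit * se)
}

# Variant B spend: mean 5.890743, sd 18.30756, n 3426
mean_ci(5.890743, 18.30756, 3426, level = 0.95, dist = "z")
mean_ci(5.890743, 18.30756, 3426, level = 0.90, dist = "z")
mean_ci(5.890743, 18.30756, 3426, level = 0.95, dist = "t")
mean_ci(5.890743, 18.30756, 3426, level = 0.90, dist = "t")
```

With n = 3426 the t and normal critical values differ only in the fourth decimal, so the four intervals differ mainly by confidence level.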

Analysis 2

sub<-df %>% filter(user_spend>0)
boxplot(user_spend~variant, data=sub)

sub %>% group_by(variant) %>% summarize("mean"=mean(user_spend), "sd"=sd(user_spend), "count"=n())
## # A tibble: 2 × 4
##   variant  mean    sd count
##   <chr>   <dbl> <dbl> <int>
## 1 A        41.7  23.3   376
## 2 B        52.3  23.4   386
t.test(user_spend~variant, data=sub)
## 
##  Welch Two Sample t-test
## 
## data:  user_spend by variant
## t = -6.2301, df = 759.71, p-value = 7.717e-10
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -13.872498  -7.224794
## sample estimates:
## mean in group A mean in group B 
##        41.73551        52.28416
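Because visits without a purchase contribute zero spend, the overall mean spend per visit factors exactly into (conversion rate) × (mean spend among buyers). A sketch checking this with the counts and means reported above; any discrepancy comes only from rounding in the reported means:

```r
visits       <- c(A = 3401, B = 3426)           # all visits per variant
buyers       <- c(A = 376,  B = 386)            # visits with user_spend > 0
spend_buyers <- c(A = 41.73551, B = 52.28416)   # mean spend among buyers
conv <- buyers / visits     # purchase (conversion) rate per variant
conv
conv * spend_buyers         # about 4.614 (A) and 5.891 (B), the overall means per visit
```

In these numbers the conversion rates are close (about 11.1% vs 11.3%), so most of the gap in mean spend per visit comes from spend among buyers.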

Airbnb Test

Airbnb tested surfacing the next available dates near its calendar picker (see version b). They measured whether the property was booked.

(Screenshots: version A and version B)

Summary

head(df)
##   variant click
## 1       a     0
## 2       a     0
## 3       a     0
## 4       a     0
## 5       a     0
## 6       a     0
df %>% group_by(variant, click) %>% summarise(n = n()) %>%
  mutate(freq = n / sum(n))
## # A tibble: 4 × 4
## # Groups:   variant [2]
##   variant click     n   freq
##   <chr>   <int> <int>  <dbl>
## 1 a           0  1217 0.969 
## 2 a           1    39 0.0311
## 3 b           0  1350 0.974 
## 4 b           1    36 0.0260

Analysis

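iter=100000
# Uniform Beta(1, 1) prior, so each posterior is Beta(bookings + 1, non-bookings + 1)
a=39+1       # variant a: 39 bookings
b=1217+1     # variant a: 1217 non-bookings
a1=36+1      # variant b: 36 bookings
b1=1350+1    # variant b: 1350 non-bookings
count<-numeric(iter)
for (i in 1:iter){
  A<-rbeta(1, a, b)      # draw a booking rate for variant a
  B<-rbeta(1, a1, b1)    # draw a booking rate for variant b
  count[i]<-ifelse(A>B, 1, 0)
}
# Monte Carlo estimate of P(rate_a > rate_b)
pdiff<-sum(count)/iter
pdiff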
## [1] 0.78395
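The loop can be replaced by vectorized draws, and the same posterior probability can also be computed by numerical integration, P(A > B) = ∫ f_A(x) F_B(x) dx, where f_A is the density of variant a's posterior and F_B the CDF of variant b's. A sketch assuming the same Beta(1, 1) prior; the seed is arbitrary and only there for reproducibility, and the integration range is cut at 0.2 because both posteriors put negligible mass above it.

```r
set.seed(365)  # arbitrary seed
iter <- 100000
draws_a <- rbeta(iter, 39 + 1, 1217 + 1)   # variant a: 39 bookings, 1217 non-bookings
draws_b <- rbeta(iter, 36 + 1, 1350 + 1)   # variant b: 36 bookings, 1350 non-bookings
p_mc <- mean(draws_a > draws_b)            # Monte Carlo estimate of P(rate_a > rate_b)

# Numerical integration of f_A(x) * F_B(x)
p_exact <- integrate(function(x) dbeta(x, 39 + 1, 1217 + 1) * pbeta(x, 36 + 1, 1350 + 1),
                     lower = 0, upper = 0.2)$value
c(monte_carlo = p_mc, integral = p_exact)
```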
prop.test(x=c(39, 36), n=c( 1217 +39, 1350+36))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(39, 36) out of c(1217 + 39, 1350 + 36)
## X-squared = 0.44543, df = 1, p-value = 0.5045
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.008415338  0.018569197
## sample estimates:
##     prop 1     prop 2 
## 0.03105096 0.02597403

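Individual 95% intervals for variant A: a Bayesian credible interval from the Beta(39 + 1, 1217 + 1) posterior, followed by a frequentist interval from prop.test. Note that the prop.test call below uses n = 1217, the number of non-bookings, rather than the 1217 + 39 = 1256 total visitors in variant A, so its estimate (39/1217) differs slightly from the two-sample test above (39/1256).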

a=39+1
b=1217+1
qbeta(0.975, a, b)
## [1] 0.04217153
qbeta(0.025, a, b)
## [1] 0.02282927
prop.test(x=39, n=1217)
## 
##  1-sample proportions test with continuity correction
## 
## data:  39 out of 1217, null probability 0.5
## X-squared = 1064.1, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.02318092 0.04397676
## sample estimates:
##          p 
## 0.03204601
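A sketch computing both intervals for variant A with the total number of visitors (bookings plus non-bookings) as the sample size, assuming the same uniform Beta(1, 1) prior for the credible interval:

```r
bookings <- 39
visitors <- 1217 + 39   # all variant a visitors (bookings + non-bookings)

# Bayesian 95% credible interval: Beta(1, 1) prior -> Beta(39 + 1, 1217 + 1) posterior
cred <- qbeta(c(0.025, 0.975), bookings + 1, visitors - bookings + 1)

# Frequentist 95% interval from prop.test, with n = total visitors
freq <- prop.test(x = bookings, n = visitors)$conf.int

rbind(credible = cred, prop_test = as.numeric(freq))
```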

High Speed Bike Helmet Effectiveness Study

In the Fall semester of 2025, the United States Cycling Safety Institute conducted a helmet effectiveness study to evaluate the impact of wearing a high-tech helmet in high-speed cycling crashes. The study involved 20 students randomly selected from the population of current ISA/STA 365 students, who were randomly assigned to wear either:

A standard helmet (Control group)

A newly designed impact-resistant helmet (Treatment group)

To maintain the integrity of the study, both groups wore helmets that appeared identical from the outside. The 30 trial observers were also blinded to which helmets were the high-tech versions. Each student rode a controlled downhill course at high speed and was instructed to navigate a difficult turn known to cause crashes. The primary outcome was whether the student sustained a serious head injury (binary response: 1 = Injury, 0 = No Injury). Note: this is just a hypothetical study :).

Random Sample from Population

Name
Anderson, Will
Archer, Nick
Banks, Madison
Cannon, Kim
DeFazio, Mia
Fuenfer, Braeden
set.seed(2025)
study<-df[1:20,] #first 20 rows in the data frame (not a random draw; df[sample(nrow(df), 20), ] would select 20 rows at random)
Name
Anderson, Will
Archer, Nick
Banks, Madison
Cannon, Kim
DeFazio, Mia
Fuenfer, Braeden
Gall, Boden
Gigantelli, Elina
Gualtieri, Lauren
Harmon, Sydney
Huhn, Cameron
Lamanna, Kathleen
Leatherman, Madeline
Miller, Drew
Morrison, Lydia
Neu, Logan
Ona Jauregui, Felipe
Schneeberger, Matthew
Shroder, Andrew
Southworth, Paige
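Taking the first 20 rows of an alphabetical roster is not a random sample. A minimal sketch of drawing a simple random sample of 20 students without replacement, using a hypothetical 60-student roster in place of the real population data frame:

```r
set.seed(2025)
# Hypothetical roster standing in for the population data frame
roster <- data.frame(Name = sprintf("Student %02d", 1:60))
idx <- sample(nrow(roster), 20)               # 20 distinct row indices, without replacement
study_sample <- roster[idx, , drop = FALSE]   # keep the data-frame shape
head(study_sample)
```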

Random Assignment

treatment<-c(rep("new helmet",10), rep("old helmet",10))
treatment<-sample(treatment)
experiment<-data.frame(study, treatment)
Name                   treatment
Anderson, Will         old helmet
Archer, Nick           old helmet
Banks, Madison         new helmet
Cannon, Kim            new helmet
DeFazio, Mia           new helmet
Fuenfer, Braeden       old helmet
Gall, Boden            old helmet
Gigantelli, Elina      new helmet
Gualtieri, Lauren      old helmet
Harmon, Sydney         old helmet
Huhn, Cameron          old helmet
Lamanna, Kathleen      new helmet
Leatherman, Madeline   old helmet
Miller, Drew           new helmet
Morrison, Lydia        new helmet
Neu, Logan             new helmet
Ona Jauregui, Felipe   old helmet
Schneeberger, Matthew  new helmet
Shroder, Andrew        old helmet
Southworth, Paige      new helmet
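Shuffling a fixed vector of 10 "new helmet" and 10 "old helmet" labels (complete randomization) guarantees two equal-sized groups, unlike flipping a coin for each student. A sketch of the same assignment with a balance check, using hypothetical placeholder names for the 20 students:

```r
set.seed(2025)
study_names <- sprintf("Student %02d", 1:20)    # hypothetical stand-ins for the sample
treatment <- sample(rep(c("new helmet", "old helmet"), each = 10))  # shuffle 10 + 10 labels
experiment <- data.frame(Name = study_names, treatment = treatment)
table(experiment$treatment)   # each arm has exactly 10 students
```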