Walmart ran this larger A/B test on its product detail pages, which moved them from a 2-column to a 3-column layout. The change seems in line with a similar finding that Amazon made a few years back (with an inverse setup). The A/B test contained at least the following 5 changes (shown in version B):
The first test was done on the spend per user, comparing versions A and B with a Welch two-sample t-test. The analysis is below.
## test spend
## 1 A 33.81
## 2 A 21.30
## 3 A 52.13
## 4 A 28.31
## 5 A 42.64
## 6 A 31.73
##
## Welch Two Sample t-test
##
## data: spend by test
## t = -3.087, df = 25518, p-value = 0.002024
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -1.0735104 -0.2397069
## sample estimates:
## mean in group A mean in group B
## 17.71779 18.37440
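The call that produces output like the above is not shown, so the following is only a minimal sketch of how a Welch test on spend by version might be run. The data frame `ab` and its simulated exponential spend values are assumptions made for illustration; they are not the Walmart data.

```r
# Hypothetical data: the real data set is not shown in the text.
set.seed(1)
n <- 500
ab <- data.frame(
  test  = rep(c("A", "B"), each = n),
  spend = c(rexp(n, rate = 1 / 17.7), rexp(n, rate = 1 / 18.4))
)

# t.test() uses var.equal = FALSE by default, which gives the Welch test
welch <- t.test(spend ~ test, data = ab)

welch$estimate  # group means
welch$conf.int  # 95% CI for mean(A) - mean(B)
welch$p.value
```

The confidence interval is for the mean of group A minus the mean of group B, so an interval entirely below zero points to higher average spend in version B.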
Another data scientist at Walmart decided that it would be better to test the purchase rate (purchase or not) than the spend.
## test spend purchase
## 1 A 33.81 1
## 2 A 21.30 1
## 3 A 52.13 1
## 4 A 28.31 1
## 5 A 42.64 1
## 6 A 31.73 1
## # A tibble: 4 × 4
## # Groups: purchase [2]
## purchase test n freq
## <dbl> <chr> <int> <dbl>
## 1 0 A 4757 0.513
## 2 0 B 4513 0.487
## 3 1 A 8244 0.506
## 4 1 B 8047 0.494
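set.seed(13)
iter <- 100000
# Beta parameters are (purchases + 1, non-purchases + 1),
# consistent with a uniform Beta(1, 1) prior on each purchase rate
a <- 8244 + 1   # version A purchases
b <- 4757 + 1   # version A non-purchases
a1 <- 8047 + 1  # version B purchases
b1 <- 4513 + 1  # version B non-purchases
count <- numeric(iter)  # preallocate instead of growing the vector
for (i in 1:iter) {
  A <- rbeta(1, a, b)    # one draw of the purchase rate for version A
  B <- rbeta(1, a1, b1)  # one draw of the purchase rate for version B
  count[i] <- ifelse(A > B, 1, 0)
}
# Share of draws in which A's purchase rate exceeds B's
pdiff <- sum(count) / iter
pdiff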
## [1] 0.1365
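The loop above can also be written without explicit iteration. This is a sketch of a vectorized equivalent; the names `draws_A`, `draws_B` and `p_A_gt_B` are chosen here for illustration. Because the random draws are consumed in a different order, the estimate will be close to, but not identical with, the value printed above.

```r
set.seed(13)
iter <- 100000

# Same Beta parameters as in the loop: (purchases + 1, non-purchases + 1)
draws_A <- rbeta(iter, 8244 + 1, 4757 + 1)
draws_B <- rbeta(iter, 8047 + 1, 4513 + 1)

# Proportion of paired draws in which A's purchase rate exceeds B's
p_A_gt_B <- mean(draws_A > draws_B)
p_A_gt_B
```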
##
## 1-sample proportions test with continuity correction
##
## data: c(8047) out of c(4513 + 8047), null probability 0.5
## X-squared = 993.8, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6322119 0.6490711
## sample estimates:
## p
## 0.6406847
## [1] 0.649031
## [1] 0.6322512
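For comparison, one way to test whether the purchase rate differs between the two versions is a two-sample test of proportions on the counts from the table above. This is a sketch rather than the analysis used in the text; the object names `purchases`, `users` and `two_prop` are assumptions.

```r
# Purchases and non-purchases per version, taken from the tibble above
purchases <- c(A = 8244, B = 8047)
users <- purchases + c(A = 4757, B = 4513)

# Two-sample test of equal purchase rates in A and B
two_prop <- prop.test(purchases, users)

two_prop$estimate  # observed purchase rate in each version
two_prop$conf.int  # 95% CI for rate(A) - rate(B)
two_prop$p.value
```

Unlike the one-sample test shown above, which compares version B's purchase rate with a fixed value of 0.5, this comparison sets the two versions against each other.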
The Anheuser-Busch beer company wanted to determine how much money to spend on advertising. They tested three treatments: (i) a 50% increase, (ii) no change, and (iii) a 25% decrease in advertising expenditure over a 12-month period. They studied the changes in three different markets. They wished to draw a general conclusion about advertising expenditure, regardless of market. They measured total sales by month.
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 2 1.581e+11 7.905e+10 65.27 <2e-16 ***
## Residuals 105 1.272e+11 1.211e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 2 1.581e+11 7.905e+10 64.167 <2e-16 ***
## marketing 2 2.774e+08 1.387e+08 0.113 0.894
## Residuals 103 1.269e+11 1.232e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
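The two ANOVA tables above correspond to a model with treatment only and a model that also includes the market as a block. The sketch below shows how such fits might be set up. The data frame `beer` and its simulated sales figures are hypothetical; only the layout (3 treatments, 3 markets, 12 months, so 108 observations) and the term names `treatment` and `marketing` follow the output.

```r
# Hypothetical data with the same layout as the study: 3 x 3 x 12 = 108 rows
set.seed(2)
trt_levels <- c("decrease25", "none", "increase50")
mkt_levels <- c("m1", "m2", "m3")
beer <- expand.grid(
  month     = 1:12,
  marketing = factor(mkt_levels, levels = mkt_levels),
  treatment = factor(trt_levels, levels = trt_levels)
)
# Simulated sales: an invented treatment trend plus noise
beer$sales <- 5e5 + 4e4 * as.integer(beer$treatment) +
  rnorm(nrow(beer), sd = 3e4)

fit_one <- aov(sales ~ treatment, data = beer)              # one-way ANOVA
fit_blk <- aov(sales ~ treatment + marketing, data = beer)  # market as block

summary(fit_one)
summary(fit_blk)
```

Adding the block moves two degrees of freedom from the residuals to the market term, which is why the residual degrees of freedom drop from 105 to 103.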
A retail company is testing the impact of three factors on sales performance. Each factor has two levels:
Factor A (Pricing Strategy): Low level: Regular pricing (no discount). High level: 20% discount.
Factor B (Advertising Channel): Low level: Social media ads only. High level: Social media + email campaign.
Factor C (Store Layout): Low level: Traditional layout. High level: Modern layout with improved navigation.
The company runs the experiment at a single location, systematically testing each combination of factor levels over eight distinct weeks. The weekly sales revenue (in dollars) is measured for each combination of factors. They assume that the weekly sales over this
A | B | C | y |
---|---|---|---|
1 | 1 | 1 | 1888.92 |
-1 | 1 | -1 | 1952.50 |
-1 | -1 | 1 | 1387.50 |
1 | -1 | -1 | 1959.80 |
-1 | -1 | -1 | 1388.47 |
1 | -1 | 1 | 1972.04 |
-1 | 1 | 1 | 1936.77 |
1 | 1 | -1 | 1916.06 |
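
The design table above can be entered directly and the full factorial model fit as sketched below. The names `df` and `reg` are taken from the model call and the later `halfnormal(reg, ...)` line; `eff` and `effect_A` are names chosen here for illustration.

```r
# The 2^3 design in -1/+1 coding, in the row order of the table above
df <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)

# Saturated model: 8 runs and 8 parameters, so no residual degrees of freedom
reg <- lm(y ~ A * B * C, data = df)
round(coef(reg), 3)

# With -1/+1 coding, each effect (high average minus low average)
# is twice the corresponding regression coefficient
eff <- 2 * coef(reg)[-1]

# Check the main effect of A directly from the two group means
effect_A <- with(df, mean(y[A == 1]) - mean(y[A == -1]))
```

Because the model is saturated, standard errors cannot be estimated from the fit itself, which is why a half-normal plot or a reduced model is needed to judge which effects matter.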
##
## Call:
## lm.default(formula = y ~ A * B * C, data = df)
##
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1800.257 NaN NaN NaN
## A 133.947 NaN NaN NaN
## B 123.305 NaN NaN NaN
## C -3.950 NaN NaN NaN
## A:B -155.020 NaN NaN NaN
## A:C 0.225 NaN NaN NaN
## B:C -6.768 NaN NaN NaN
## A:B:C -3.078 NaN NaN NaN
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 7 and 0 DF, p-value: NA
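library(DoE.base)
# Half-normal plot of the effects from the saturated fit `reg`;
# keep the coefficient estimates it returns
hncoeff <- halfnormal(reg, alpha = 0.95)$coef
# Lenth's pseudo standard error (PSE) of those coefficients
pse <- ME.Lenth(hncoeff)$PSE
# Add a reference line through the origin with slope 1/PSE
abline(a = 0, b = 1 / pse)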
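To make the PSE less of a black box, the sketch below computes Lenth's pseudo standard error by hand from the coefficients printed in the saturated model output. It follows the usual definition: an initial scale `s0` of 1.5 times the median absolute coefficient, then 1.5 times the median of the absolute coefficients smaller than 2.5 times `s0`. The helper name `lenth_pse` is hypothetical and is not part of DoE.base.

```r
# Coefficients as printed in the saturated model output (intercept excluded)
coefs <- c("A" = 133.947, "B" = 123.305, "C" = -3.950, "A:B" = -155.020,
           "A:C" = 0.225, "B:C" = -6.768, "A:B:C" = -3.078)

# lenth_pse is a hypothetical helper, written here for illustration only
lenth_pse <- function(x) {
  s0 <- 1.5 * median(abs(x))               # initial robust scale estimate
  1.5 * median(abs(x)[abs(x) < 2.5 * s0])  # re-estimate without large effects
}

pse <- lenth_pse(coefs)

# Coefficients relative to the PSE: large ratios suggest active effects
round(abs(coefs) / pse, 2)
```

The trimming step is what keeps the large A, B and A:B coefficients from inflating the scale estimate that the smaller effects are judged against.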
##
## Call:
## lm.default(formula = y ~ A * B * C - A:B:C, data = df)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -3.078 -3.078 -3.078 -3.078 3.078 3.078 3.078 3.078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1800.257 3.078 584.974 0.00109 **
## A 133.947 3.078 43.525 0.01462 *
## B 123.305 3.078 40.067 0.01589 *
## C -3.950 3.078 -1.284 0.42136
## A:B -155.020 3.078 -50.372 0.01264 *
## A:C 0.225 3.078 0.073 0.95354
## B:C -6.768 3.078 -2.199 0.27171
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.704 on 1 degrees of freedom
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9988
## F-statistic: 1007 on 6 and 1 DF, p-value: 0.02411
Social Media Ad Effectiveness
Vuori tested ads on Instagram. They ran an A/B test with two different versions of an ad. The response was a measure of user engagement on a scale of 0 to 100. The company also paid for user data from Instagram so they could gain insight into who was engaging with the ad. A new analyst has run the following tests. Each user was shown one of two ads, (A) or (B), and covariates such as age and device type were also collected.
Test 1
Test 2
Test 3