Walmart Test

Walmart ran this larger A/B test on their product detail pages, moving them from a 2-column to a 3-column layout. The change seems in line with a similar finding that Amazon discovered a few years back (with an inverse setup). The A/B test contained at least the following 5 changes (shown in version B):

  1. Centered Headline
  2. Visible Product Descriptions
  3. Bullet Product Descriptions
  4. Multiple Buy Box (Right Column) Changes
  5. Product Thumbnail Photos

[Figure: Walmart A]
[Figure: Walmart B]

First Test: Sales per User

The first test compared sales (spend) per user between versions A and B, using a boxplot and a Welch two-sample t-test. The analysis is below.

head(df)
##   test spend
## 1    A 33.81
## 2    A 21.30
## 3    A 52.13
## 4    A 28.31
## 5    A 42.64
## 6    A 31.73
boxplot(spend~test, data=df)

t.test(spend~test, data=df)
## 
##  Welch Two Sample t-test
## 
## data:  spend by test
## t = -3.087, df = 25518, p-value = 0.002024
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -1.0735104 -0.2397069
## sample estimates:
## mean in group A mean in group B 
##        17.71779        18.37440
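
As a rough illustration of what the Welch test above computes, the sketch below rebuilds the t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test`. The data are simulated placeholders, not Walmart's data; the names `sim`, `spend_a`, `spend_b` and the distribution parameters are assumptions made only for this example.

```r
# Simulated stand-in data: NOT the Walmart data, for illustration only
set.seed(1)
sim <- data.frame(
  test  = rep(c("A", "B"), times = c(500, 480)),
  spend = c(rexp(500, rate = 1 / 17.7), rexp(480, rate = 1 / 18.4))
)

spend_a <- sim$spend[sim$test == "A"]
spend_b <- sim$spend[sim$test == "B"]

# Variance of each group's sample mean
v_a <- var(spend_a) / length(spend_a)
v_b <- var(spend_b) / length(spend_b)

# Welch t statistic (A minus B, the same order t.test uses)
t_manual <- (mean(spend_a) - mean(spend_b)) / sqrt(v_a + v_b)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v_a + v_b)^2 /
  (v_a^2 / (length(spend_a) - 1) + v_b^2 / (length(spend_b) - 1))

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Reference result from the built-in test
res <- t.test(spend ~ test, data = sim)
c(t = t_manual, df = df_manual, p = p_manual)
```

The hand-computed values should agree with `res$statistic`, `res$parameter` and `res$p.value`. The non-integer degrees of freedom in the output above (25518 rather than n - 2) come from the same Welch-Satterthwaite formula.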

Second Test: Purchase Rate

Another data scientist at Walmart decided that it would be better to test the purchase rate (whether or not a user made a purchase) rather than the spend.

df$purchase<-ifelse(df$spend>0, 1, 0)
head(df)
##   test spend purchase
## 1    A 33.81        1
## 2    A 21.30        1
## 3    A 52.13        1
## 4    A 28.31        1
## 5    A 42.64        1
## 6    A 31.73        1
## # A tibble: 4 × 4
## # Groups:   purchase [2]
##   purchase test      n  freq
##      <dbl> <chr> <int> <dbl>
## 1        0 A      4757 0.513
## 2        0 B      4513 0.487
## 3        1 A      8244 0.506
## 4        1 B      8047 0.494
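
Note that `freq` in the table above appears to be each version's share within a purchase status (for example 4757 / (4757 + 4513) ≈ 0.513), not the purchase rate within a version. A minimal sketch of both quantities, using only the counts printed above (the object names `counts`, `share_within_purchase` and `rate_within_version` are assumptions for illustration):

```r
# Counts copied from the table above: rows = purchase status, columns = version
counts <- matrix(
  c(4757, 4513,   # purchase = 0
    8244, 8047),  # purchase = 1
  nrow = 2, byrow = TRUE,
  dimnames = list(purchase = c("0", "1"), test = c("A", "B"))
)

# Share of each version within a purchase status (what freq seems to report)
share_within_purchase <- prop.table(counts, margin = 1)

# Purchase rate within each version (the quantity the A/B test compares)
rate_within_version <- prop.table(counts, margin = 2)
round(rate_within_version["1", ], 4)
```

The purchase rates come out at roughly 63.4% for A and 64.1% for B.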
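set.seed(13)
iter <- 100000
# Posterior parameters: uniform Beta(1, 1) prior updated with the observed counts
a  <- 8244 + 1   # version A: purchases + 1
b  <- 4757 + 1   # version A: non-purchases + 1
a1 <- 8047 + 1   # version B: purchases + 1
b1 <- 4513 + 1   # version B: non-purchases + 1
count <- numeric(iter)
for (i in 1:iter) {
  A <- rbeta(1, a, b)     # posterior draw of A's purchase rate
  B <- rbeta(1, a1, b1)   # posterior draw of B's purchase rate
  count[i] <- ifelse(A > B, 1, 0)
}
# Estimated posterior probability that A's purchase rate exceeds B's
pdiff <- sum(count) / iter
pdiff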
## [1] 0.1365
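
The simulation above can also be written without a loop. The following is a sketch, not the original author's code: it draws all posterior samples at once and adds a normal approximation as a sanity check. Because the random numbers are consumed in a different order, the Monte Carlo estimate will differ slightly from the 0.1365 printed above. The names `post_a`, `post_b`, `p_mc`, `p_norm`, `beta_mean` and `beta_var` are introduced here for illustration.

```r
set.seed(13)
iter <- 100000

# Posterior parameters: uniform Beta(1, 1) prior plus the observed counts
a  <- 8244 + 1; b  <- 4757 + 1   # version A
a1 <- 8047 + 1; b1 <- 4513 + 1   # version B

# Vectorized posterior draws
post_a <- rbeta(iter, a, b)
post_b <- rbeta(iter, a1, b1)
p_mc <- mean(post_a > post_b)    # Monte Carlo estimate of P(A > B)

# Normal approximation based on the Beta posterior means and variances
beta_mean <- function(s, f) s / (s + f)
beta_var  <- function(s, f) s * f / ((s + f)^2 * (s + f + 1))
p_norm <- pnorm(
  (beta_mean(a, b) - beta_mean(a1, b1)) /
    sqrt(beta_var(a, b) + beta_var(a1, b1))
)

c(monte_carlo = p_mc, normal_approx = p_norm)
```

Both values should land near 0.14. Read the other way round, the posterior probability that version B has the higher purchase rate is roughly 0.86.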

Individual intervals for version B: a 95% confidence interval from prop.test, followed by a 95% credible interval from the Beta posterior:

prop.test(x=c(8047), n=c( 4513+8047))
## 
##  1-sample proportions test with continuity correction
## 
## data:  c(8047) out of c(4513 + 8047), null probability 0.5
## X-squared = 993.8, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.6322119 0.6490711
## sample estimates:
##         p 
## 0.6406847
a1=8047+1
b1=4513+1
qbeta(0.975, a1, b1)
## [1] 0.649031
qbeta(0.025, a1, b1)
## [1] 0.6322512
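
The intervals above describe version B on its own. To compare the two versions directly in a frequentist way, one option (an addition here, not part of the original analysis) is a two-sample test of proportions on the same counts. The object names `purchases`, `users` and `cmp` are assumptions for illustration.

```r
# Purchases and group sizes taken from the counts table
purchases <- c(A = 8244, B = 8047)
users     <- c(A = 8244 + 4757, B = 8047 + 4513)

# Two-sample test of equal purchase rates, with a CI for the difference (A - B)
cmp <- prop.test(x = purchases, n = users)
cmp$estimate   # purchase rate per version
cmp$conf.int   # 95% CI for the difference in purchase rates
cmp$p.value
```

With these counts the interval for the difference should straddle zero, which is consistent with the Bayesian result, where the posterior probability of A beating B was about 0.14 rather than something close to 0 or 1.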

Anheuser-Busch

The Anheuser-Busch beer company wanted to determine how much money to spend on advertising. They tested three levels of advertising expenditure over a three-month period: (i) a 50% increase, (ii) no change, and (iii) a 25% decrease. They studied the changes in three different markets and wished to draw a general conclusion about advertising expenditure, regardless of the market. They assumed that sales were not affected by the month over this time period. The response measured was total sales.

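Analysis

Two ANOVA tables are shown. The first models total sales on treatment alone; the second adds a second factor, labelled "marketing" in the output and presumably the market, as a blocking factor.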

##             Df    Sum Sq   Mean Sq F value Pr(>F)   
## treatment    2 3.233e+10 1.617e+10   13.33 0.0062 **
## Residuals    6 7.279e+09 1.213e+09                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df    Sum Sq   Mean Sq F value  Pr(>F)   
## treatment    2 3.233e+10 1.617e+10  23.850 0.00599 **
## marketing    2 4.567e+09 2.284e+09   3.369 0.13877   
## Residuals    4 2.711e+09 6.778e+08                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
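
The code behind the two ANOVA tables is not shown. A minimal sketch of how tables of this shape could be produced with `aov` follows, assuming the second model adds the market as a blocking factor. The sales figures are simulated placeholders and every name (`beer`, `treatment_levels`, `market_levels`, `fit_oneway`, `fit_block`) is hypothetical; none of this is the Anheuser-Busch data.

```r
# Hypothetical layout: 3 advertising treatments x 3 markets, one observation each
treatment_levels <- c("increase_50", "no_change", "decrease_25")
market_levels    <- c("market_1", "market_2", "market_3")
beer <- data.frame(
  treatment = factor(rep(treatment_levels, times = 3), levels = treatment_levels),
  market    = factor(rep(market_levels, each = 3), levels = market_levels)
)

# Simulated sales: made-up treatment and market effects plus noise
set.seed(42)
treatment_effect <- c(increase_50 = 60000, no_change = 0, decrease_25 = -80000)
market_effect    <- c(market_1 = -30000, market_2 = 0, market_3 = 30000)
beer$sales <- 500000 +
  unname(treatment_effect[as.character(beer$treatment)]) +
  unname(market_effect[as.character(beer$market)]) +
  rnorm(nrow(beer), sd = 20000)

# One-way ANOVA: treatment only (6 residual degrees of freedom)
fit_oneway <- aov(sales ~ treatment, data = beer)
summary(fit_oneway)

# Randomized block ANOVA: market as a block (4 residual degrees of freedom)
fit_block <- aov(sales ~ treatment + market, data = beer)
summary(fit_block)
```

In a balanced layout like this the treatment sum of squares is the same in both fits. Blocking only moves market-to-market variation out of the residual, which is why the F value for treatment changes between the two tables above while its sum of squares does not.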

Retail Experiment

A retail company is testing the impact of three factors on sales performance. Each factor has two levels:

Factor A (Pricing Strategy): Low level: Regular pricing (no discount). High level: 20% discount.

Factor B (Advertising Channel): Low level: Social media ads only. High level: Social media + email campaign.

Factor C (Store Layout): Low level: Traditional layout. High level: Modern layout with improved navigation.

The company runs the experiment at a single location, systematically testing each combination of factor levels over eight distinct weeks. The weekly sales revenue (in dollars) is measured for each combination of factors. They assume that the weekly sales over this

 A   B   C        y
 1   1   1  1888.92
-1   1  -1  1952.50
-1  -1   1  1387.50
 1  -1  -1  1959.80
-1  -1  -1  1388.47
 1  -1   1  1972.04
-1   1   1  1936.77
 1   1  -1  1916.06
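
The construction of `df` is not shown in the original. The sketch below is one way the table above could be entered, together with a hand calculation of two coefficients as a check on the coding. The names `coef_A` and `coef_AB` are introduced here for illustration.

```r
# Design matrix and responses copied from the table above
df <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)

# With -1/+1 coding, each regression coefficient is the contrast sum divided
# by the number of runs, i.e. half the usual factorial "effect"
coef_A  <- sum(df$A * df$y) / nrow(df)
coef_AB <- sum(df$A * df$B * df$y) / nrow(df)
c(intercept = mean(df$y), A = coef_A, AB = coef_AB)
```

These values should match the (Intercept), A and A:B estimates reported by `lm` in the analysis steps that follow.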

Analysis Step 1

reg<-lm(y~A*B*C, data=df)
summary(reg)
## 
## Call:
## lm.default(formula = y ~ A * B * C, data = df)
## 
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1800.257        NaN     NaN      NaN
## A            133.947        NaN     NaN      NaN
## B            123.305        NaN     NaN      NaN
## C             -3.950        NaN     NaN      NaN
## A:B         -155.020        NaN     NaN      NaN
## A:C            0.225        NaN     NaN      NaN
## B:C           -6.768        NaN     NaN      NaN
## A:B:C         -3.078        NaN     NaN      NaN
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 7 and 0 DF,  p-value: NA

Analysis Step 2

library(DoE.base)

hncoeff<-halfnormal(reg, alpha=0.95)$coef
pse<-ME.Lenth(hncoeff)$PSE
abline(a=0, b=1/pse)
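
To make the pseudo standard error used above less of a black box, here is a base R sketch of Lenth's method applied to the coefficient estimates printed in Step 1. It uses the textbook formulas with a t quantile on m / 3 degrees of freedom; `ME.Lenth` may use simulated critical values, so its margin of error need not match this version exactly. The names `est`, `abs_est`, `s0`, `me` and `active` are introduced for illustration.

```r
# Coefficient estimates copied from the saturated model output (no intercept)
est <- c(A = 133.947, B = 123.305, C = -3.950,
         "A:B" = -155.020, "A:C" = 0.225, "B:C" = -6.768, "A:B:C" = -3.078)
abs_est <- abs(est)

# Lenth's pseudo standard error (PSE)
s0  <- 1.5 * median(abs_est)                       # initial robust scale
pse <- 1.5 * median(abs_est[abs_est < 2.5 * s0])   # re-estimate without large effects

# Margin of error from a t quantile with m / 3 degrees of freedom
m  <- length(est)
me <- qt(0.975, df = m / 3) * pse

# Terms whose absolute estimate exceeds the margin of error
active <- names(est)[abs_est > me]
list(PSE = pse, ME = me, active = active)
```

On these estimates the PSE is about 5.27, and A, B and A:B stand out as the terms exceeding the margin of error.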

Analysis Step 3

reg<-lm(y~A*B*C-A:B:C, data=df)
summary(reg)
## 
## Call:
## lm.default(formula = y ~ A * B * C - A:B:C, data = df)
## 
## Residuals:
##      1      2      3      4      5      6      7      8 
## -3.078 -3.078 -3.078 -3.078  3.078  3.078  3.078  3.078 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1800.257      3.078 584.974  0.00109 **
## A            133.947      3.078  43.525  0.01462 * 
## B            123.305      3.078  40.067  0.01589 * 
## C             -3.950      3.078  -1.284  0.42136   
## A:B         -155.020      3.078 -50.372  0.01264 * 
## A:C            0.225      3.078   0.073  0.95354   
## B:C           -6.768      3.078  -2.199  0.27171   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.704 on 1 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9988 
## F-statistic:  1007 on 6 and 1 DF,  p-value: 0.02411

Analysis Step 4

Analysis Step 5

Analysis Step 6

Assumptions Plot 1

qqnorm(reg$residuals)
qqline(reg$residuals)

Assumptions Plot 2

plot(reg$fitted.values, reg$residuals )

Assumptions Plot 3

plot(run_order, reg$residuals)
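
The three assumption plots above rely on objects that are not defined in the visible code (`run_order`, and whichever model Steps 4 to 6 settled on). The following self-contained sketch shows one way to reproduce them under two loudly stated assumptions: that the reduced model keeps only A, B and A:B, and that the rows of the design table are listed in run order. The name `reg_reduced` is hypothetical.

```r
# Data re-entered from the design table so the block runs on its own
df <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)

# ASSUMPTION: reduced model with A, B and A:B only (Steps 4-6 are not shown)
reg_reduced <- lm(y ~ A * B, data = df)

# ASSUMPTION: rows are listed in the order the weeks were run
run_order <- seq_len(nrow(df))

op <- par(mfrow = c(1, 3))
# Plot 1: normality of residuals
qqnorm(resid(reg_reduced)); qqline(resid(reg_reduced))
# Plot 2: constant variance (residuals vs fitted values)
plot(fitted(reg_reduced), resid(reg_reduced),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Plot 3: independence (residuals vs run order)
plot(run_order, resid(reg_reduced), type = "b",
     xlab = "Run order", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
```

With only eight runs these plots are a coarse check at best. They can flag a gross outlier or a drift over the weeks, but not much more.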

Plots

library(sjPlot)
plot_model(reg, type="int")

Comcast Customer Service

Comcast customer service monitors the time it takes to service a customer. Below are the results from monitoring random daily samples of size 6, where each observation is the duration of a customer service call in seconds.

library(qcc)
q<-qcc(df, type="R")

q<-qcc(df, type="xbar")
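
As a rough guide to what the two charts above draw, the sketch below computes R chart and x-bar chart control limits by hand with the standard Shewhart constants for subgroups of size 6. The data are simulated placeholders (20 days is an arbitrary choice), not Comcast's data, and the names `calls`, `xbar`, `R` and `limits` are assumptions for illustration.

```r
# Simulated stand-in: 20 daily samples of 6 call times in seconds (NOT Comcast data)
set.seed(7)
calls <- matrix(rnorm(20 * 6, mean = 300, sd = 30), nrow = 20, ncol = 6)

# Standard Shewhart control chart constants for subgroup size n = 6
A2 <- 0.483
D3 <- 0
D4 <- 2.004

xbar <- rowMeans(calls)                               # daily sample means
R    <- apply(calls, 1, function(x) diff(range(x)))   # daily sample ranges
xbarbar <- mean(xbar)                                 # grand mean
Rbar    <- mean(R)                                    # average range

limits <- rbind(
  xbar = c(LCL = xbarbar - A2 * Rbar, CL = xbarbar, UCL = xbarbar + A2 * Rbar),
  R    = c(LCL = D3 * Rbar,           CL = Rbar,    UCL = D4 * Rbar)
)
limits
```

The R chart is usually read first: if the within-day spread is out of control, the x-bar limits, which are built from the average range, are not trustworthy.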