ISA 365 Output Packet Version A

Walmart Test

Walmart ran this larger A/B test on its product detail pages, moving them from a 2-column to a 3-column layout. The change seems in line with a similar finding that Amazon made a few years back (with an inverse setup). The A/B test contained at least the following 5 changes (shown in version B):

Centered Headline
Visible Product Descriptions
Bullet Product Descriptions
Multiple Buy Box (Right Column) Changes
Product Thumbnail Photos

Walmart A

Walmart B

First Test: Sales per User

The first test compared sales per user between the two versions. The analysis is below.

head(df)

##   test spend
## 1    A 33.81
## 2    A 21.30
## 3    A 52.13
## 4    A 28.31
## 5    A 42.64
## 6    A 31.73

boxplot(spend~test, data=df)

t.test(spend~test, data=df)

## 
##  Welch Two Sample t-test
## 
## data:  spend by test
## t = -3.087, df = 25518, p-value = 0.002024
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -1.0735104 -0.2397069
## sample estimates:
## mean in group A mean in group B 
##        17.71779        18.37440
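
The numbers in this output are tied together by simple arithmetic. As a minimal sketch, using only the values printed above (the variable names are illustrative and not part of the original analysis), the implied standard error and the t statistic can be recovered from the two group means and the confidence interval:

```r
# Values copied from the Welch t-test output above
mean_A <- 17.71779
mean_B <- 18.37440
ci     <- c(-1.0735104, -0.2397069)
df_w   <- 25518                       # Welch degrees of freedom as printed (rounded)

diff_means <- mean_A - mean_B         # estimated difference, A minus B
half_width <- diff(ci) / 2            # half the width of the 95% interval
se         <- half_width / qt(0.975, df_w)   # implied standard error
t_stat     <- diff_means / se         # should agree with the reported t

round(c(diff = diff_means, se = se, t = t_stat), 4)
```

The difference in means sits at the midpoint of the interval, and dividing it by the implied standard error gives back the reported t value up to rounding.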

Second Test: Purchase Rate

Another data scientist at Walmart decided that it would be better to test the purchase rate (purchase or not) than the spend.

df$purchase<-ifelse(df$spend>0, 1, 0)
head(df)

##   test spend purchase
## 1    A 33.81        1
## 2    A 21.30        1
## 3    A 52.13        1
## 4    A 28.31        1
## 5    A 42.64        1
## 6    A 31.73        1

## # A tibble: 4 × 4
## # Groups:   purchase [2]
##   purchase test      n  freq
##      <dbl> <chr> <int> <dbl>
## 1        0 A      4757 0.513
## 2        0 B      4513 0.487
## 3        1 A      8244 0.506
## 4        1 B      8047 0.494
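
Note that `freq` in the table above is computed within each `purchase` group (the two rows for each value of `purchase` sum to 1), so it is not the purchase rate of a version. A sketch in base R, using only the counts printed above (the object names are illustrative), reproduces `freq` and also computes the purchase rate for each version:

```r
# Counts copied from the table above: rows = purchase (0/1), columns = version
counts <- matrix(c(4757, 4513,
                   8244, 8047),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(purchase = c("0", "1"), test = c("A", "B")))

# Share of each version within a purchase group (matches the freq column)
freq <- prop.table(counts, margin = 1)

# Purchase rate within each version: purchasers / all users shown that version
rate <- counts["1", ] / colSums(counts)

round(freq, 3)
round(rate, 4)
```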

Test 1

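set.seed(13)
iter <- 100000
# Beta posterior parameters under a uniform prior: purchases + 1, non-purchases + 1
a  <- 8244 + 1   # version A purchases
b  <- 4757 + 1   # version A non-purchases
a1 <- 8047 + 1   # version B purchases
b1 <- 4513 + 1   # version B non-purchases
count <- numeric(iter)
for (i in 1:iter) {
  A <- rbeta(1, a, b)     # posterior draw of A's purchase rate
  B <- rbeta(1, a1, b1)   # posterior draw of B's purchase rate
  count[i] <- ifelse(A > B, 1, 0)
}
# Share of draws in which A's rate exceeds B's
pdiff <- sum(count) / iter
pdiff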

## [1] 0.1365
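
The simulation above draws one value at a time from each Beta posterior. The same quantity, the posterior probability that version A's purchase rate exceeds version B's, can also be estimated with vectorized draws. This is a sketch rather than a replacement: the random numbers are consumed in a different order, so the estimate should be close to, but not identical to, the value printed above.

```r
set.seed(13)
iter <- 100000

# Posterior draws of each version's purchase rate (uniform prior, counts from above)
draws_A <- rbeta(iter, 8244 + 1, 4757 + 1)
draws_B <- rbeta(iter, 8047 + 1, 4513 + 1)

# Monte Carlo estimate of P(rate_A > rate_B)
p_A_gt_B <- mean(draws_A > draws_B)
p_A_gt_B
```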

Individual confidence intervals for version B:

prop.test(x=c(8047), n=c( 4513+8047))

## 
##  1-sample proportions test with continuity correction
## 
## data:  c(8047) out of c(4513 + 8047), null probability 0.5
## X-squared = 993.8, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.6322119 0.6490711
## sample estimates:
##         p 
## 0.6406847

a1=8047+1
b1=4513+1
qbeta(0.975, a1, b1)

## [1] 0.649031

qbeta(0.025, a1, b1)

## [1] 0.6322512
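
The two `qbeta()` calls give the endpoints of an equal-tailed 95% interval from the Beta posterior for version B, which can be set beside the `prop.test()` interval. A compact sketch that computes both in one place (the object names `cred_int` and `freq_int` are illustrative):

```r
a1 <- 8047 + 1   # version B purchases + 1
b1 <- 4513 + 1   # version B non-purchases + 1

# Equal-tailed 95% interval from the Beta posterior
cred_int <- qbeta(c(0.025, 0.975), a1, b1)

# 95% confidence interval from the one-sample proportion test
freq_int <- as.numeric(prop.test(x = 8047, n = 4513 + 8047)$conf.int)

rbind(beta_posterior = cred_int, prop_test = freq_int)
```

With a sample this large the two intervals nearly coincide.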

Anheuser-Busch

The Anheuser-Busch beer company wanted to determine how much money to spend on advertising. They tested the following: (i) a 50% increase, (ii) no change, and (iii) a 25% decrease in advertising expenditure over a 12-month period. They studied the changes in three different markets. They wished to make a general conclusion about the advertising expenditure, regardless of market. They measured the total sales by month.

Analysis

##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## treatment     2 1.581e+11 7.905e+10   65.27 <2e-16 ***
## Residuals   105 1.272e+11 1.211e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## treatment     2 1.581e+11 7.905e+10  64.167 <2e-16 ***
## marketing     2 2.774e+08 1.387e+08   0.113  0.894    
## Residuals   103 1.269e+11 1.232e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
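
The code that produced these two tables is not shown. A minimal sketch of how tables of this shape could be produced with `aov()` follows. The data are simulated purely for illustration, and the response name `sales` and the layout (every treatment observed in every market for 12 months) are assumptions. Only the term names `treatment` and `marketing` and the degrees of freedom, which are consistent with 108 monthly observations, mirror the output above.

```r
set.seed(1)

# Hypothetical layout: 12 months x 3 treatments x 3 markets = 108 rows
sim <- expand.grid(month     = 1:12,
                   treatment = c("+50%", "no change", "-25%"),
                   marketing = c("M1", "M2", "M3"))
sim$sales <- rnorm(nrow(sim), mean = 1e5, sd = 1e4)   # simulated, not real sales

# One-way ANOVA on treatment, then treatment with market added as a block
fit1 <- aov(sales ~ treatment, data = sim)
fit2 <- aov(sales ~ treatment + marketing, data = sim)

summary(fit1)
summary(fit2)
```

Adding the market term moves two degrees of freedom out of the residuals, which is the difference between 105 and 103 in the tables above.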

Retail Experiment

A retail company is testing the impact of three factors on sales performance. Each factor has two levels:

Factor A (Pricing Strategy): Low level: Regular pricing (no discount). High level: 20% discount.

Factor B (Advertising Channel): Low level: Social media ads only. High level: Social media + email campaign.

Factor C (Store Layout): Low level: Traditional layout. High level: Modern layout with improved navigation.

The company runs the experiment at a single location, systematically testing each combination of factor levels over eight distinct weeks. The weekly sales revenue (in dollars) is measured for each combination of factors. They assume that the weekly sales over this

A	B	C	y
1	1	1	1888.92
-1	1	-1	1952.50
-1	-1	1	1387.50
1	-1	-1	1959.80
-1	-1	-1	1388.47
1	-1	1	1972.04
-1	1	1	1936.77
1	1	-1	1916.06

Analysis Step 1

reg<-lm(y~A*B*C, data=df)
summary(reg)

## 
## Call:
## lm.default(formula = y ~ A * B * C, data = df)
## 
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1800.257        NaN     NaN      NaN
## A            133.947        NaN     NaN      NaN
## B            123.305        NaN     NaN      NaN
## C             -3.950        NaN     NaN      NaN
## A:B         -155.020        NaN     NaN      NaN
## A:C            0.225        NaN     NaN      NaN
## B:C           -6.768        NaN     NaN      NaN
## A:B:C         -3.078        NaN     NaN      NaN
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 7 and 0 DF,  p-value: NA
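
With eight runs and eight coefficients the model is saturated, which is why every residual is zero and no standard errors can be estimated. In a two-level design coded -1/+1, each coefficient equals half the difference between the average response at the high and low levels of that term. A sketch that rebuilds the data from the table above and checks this for factor A (`dat`, `reg_full` and `effect_A` are illustrative names):

```r
# Data copied from the design table above
dat <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)

reg_full <- lm(y ~ A * B * C, data = dat)

# Main effect of A: mean sales at the high level minus mean at the low level
effect_A <- mean(dat$y[dat$A == 1]) - mean(dat$y[dat$A == -1])

c(coefficient = unname(coef(reg_full)["A"]), half_effect = effect_A / 2)
```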

Analysis Step 2

library(DoE.base)

hncoeff<-halfnormal(reg, alpha=0.95)$coef
pse<-ME.Lenth(hncoeff)$PSE
abline(a=0, b=1/pse)
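
Because the saturated model leaves no degrees of freedom for error, the half-normal plot relies on Lenth's pseudo standard error (PSE) to judge which effects stand out. As a sketch of what that quantity is, the PSE can be computed by hand in base R from the coefficients of the saturated model: take 1.5 times the median absolute coefficient as an initial scale, set aside coefficients at or above 2.5 times that scale, and take 1.5 times the median of the rest. The data frame `dat` is rebuilt from the table above so the block runs on its own. The code follows the standard definition of Lenth's method and is not taken from the `DoE.base` source.

```r
# Data copied from the design table above
dat <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)
reg_full <- lm(y ~ A * B * C, data = dat)

est <- abs(coef(reg_full)[-1])             # absolute coefficients, intercept dropped
s0  <- 1.5 * median(est)                   # initial robust scale estimate
pse <- 1.5 * median(est[est < 2.5 * s0])   # Lenth's pseudo standard error

pse
names(est)[est >= 2.5 * s0]                # terms set aside as unusually large
```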

Analysis Step 3

reg<-lm(y~A*B*C-A:B:C, data=df)
summary(reg)

## 
## Call:
## lm.default(formula = y ~ A * B * C - A:B:C, data = df)
## 
## Residuals:
##      1      2      3      4      5      6      7      8 
## -3.078 -3.078 -3.078 -3.078  3.078  3.078  3.078  3.078 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1800.257      3.078 584.974  0.00109 **
## A            133.947      3.078  43.525  0.01462 * 
## B            123.305      3.078  40.067  0.01589 * 
## C             -3.950      3.078  -1.284  0.42136   
## A:B         -155.020      3.078 -50.372  0.01264 * 
## A:C            0.225      3.078   0.073  0.95354   
## B:C           -6.768      3.078  -2.199  0.27171   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.704 on 1 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9988 
## F-statistic:  1007 on 6 and 1 DF,  p-value: 0.02411
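
Dropping the three-way interaction frees one degree of freedom, so the dropped A:B:C contrast becomes the residual. That explains two features of the output: every residual has magnitude 3.078 (the size of the A:B:C coefficient in Step 1), and, because the design is orthogonal, every coefficient shares the same standard error. A sketch verifying the arithmetic (`dat`, `rss`, `rse` and `se` are illustrative names, and `dat` is rebuilt from the table above):

```r
# Data copied from the design table above
dat <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)
reg <- lm(y ~ A * B * C - A:B:C, data = dat)

rss <- sum(resid(reg)^2)               # residual sum of squares
rse <- sqrt(rss / df.residual(reg))    # residual standard error (1 df)
se  <- rse / sqrt(nrow(dat))           # common standard error of each coefficient

c(rse = rse, se = se)
```

With a single residual degree of freedom, the p-values in this table rest on a very thin estimate of error.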

Analysis Step 4

Analysis Step 5

Analysis Step 6

Assumptions Plot 1

qqnorm(reg$residuals)
qqline(reg$residuals)

Assumptions Plot 2

plot(reg$fitted.values, reg$residuals )

Assumptions Plot 3

plot(reg$residuals)

Plots

library(sjPlot)
plot_model(reg, type="int")
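
An interaction plot is a picture of cell means. As a sketch, the four A-by-B cell means behind the large A:B term can be tabulated directly from the data table above (`dat` and `cell_means` are illustrative names):

```r
# Data copied from the design table above
dat <- data.frame(
  A = c( 1, -1, -1,  1, -1,  1, -1,  1),
  B = c( 1,  1, -1, -1, -1, -1,  1,  1),
  C = c( 1, -1,  1, -1, -1,  1,  1, -1),
  y = c(1888.92, 1952.50, 1387.50, 1959.80,
        1388.47, 1972.04, 1936.77, 1916.06)
)

# Mean weekly sales for each combination of pricing (A) and advertising (B)
cell_means <- with(dat, tapply(y, list(A = A, B = B), mean))
cell_means
```

In this table the discount (A = 1) raises mean sales by roughly 578 when only social media ads run (B = -1), but lowers them by roughly 42 when the email campaign is added (B = 1).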

Comcast Customer Service

Comcast customer service monitors the time it takes to service a customer. Below are the results of monitoring random daily samples of size 6 of the duration, in seconds, of customer service calls.

library(qcc)
q<-qcc(df, type="R")

q<-qcc(df, type="xbar")
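
The monitoring data are not reproduced here, so the following sketch uses simulated call times (hypothetical values, not Comcast data) to show how the center lines and control limits of an R chart and an x-bar chart are conventionally computed for subgroups of size 6. The constants A2 = 0.483, D3 = 0 and D4 = 2.004 are the standard tabulated values for n = 6. The number of days and the object names are assumptions.

```r
set.seed(42)
n_days <- 25                                   # hypothetical number of daily samples
calls  <- matrix(rnorm(n_days * 6, mean = 300, sd = 40),
                 ncol = 6)                     # one row per day, 6 calls per day

xbar <- rowMeans(calls)                              # daily sample means
R    <- apply(calls, 1, function(x) diff(range(x)))  # daily sample ranges

# Standard control chart constants for subgroups of size 6
A2 <- 0.483; D3 <- 0; D4 <- 2.004

xbarbar <- mean(xbar)   # grand mean (x-bar chart center line)
Rbar    <- mean(R)      # average range (R chart center line)

limits <- rbind(
  R    = c(LCL = D3 * Rbar,           CL = Rbar,    UCL = D4 * Rbar),
  xbar = c(LCL = xbarbar - A2 * Rbar, CL = xbarbar, UCL = xbarbar + A2 * Rbar)
)
limits
```

The R chart is usually read first, since the x-bar limits depend on the average range.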

Social Media Ad Effectiveness

Vuori tested ads on Instagram. They ran an A/B test with two different versions of an ad, (A) and (B). The response was a measure of user engagement on a scale of 0 to 100. The company also paid for user data from Instagram, covariates such as age and device type, so they could see who was engaging with the ad. A new analyst has run the following tests.

ad	engagement	age	device
A	85.51612	24	Apple
A	62.30687	45	Apple
A	69.48172	46	Apple
A	54.56918	31	Apple
A	98.10323	36	Apple
A	50.54667	17	Apple

Test 1

t.test(engagement~ad, data=df)

## 
##  Welch Two Sample t-test
## 
## data:  engagement by ad
## t = -2.6043, df = 1977.9, p-value = 0.009274
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -2.8685058 -0.4041082
## sample estimates:
## mean in group A mean in group B 
##        75.01723        76.65354

Test 2

library(dplyr)  # provides filter()
apple<-filter(df, device=="Apple")
t.test(engagement~ad, data=apple)

## 
##  Welch Two Sample t-test
## 
## data:  engagement by ad
## t = -2.2691, df = 423.78, p-value = 0.02377
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -4.0867848 -0.2929041
## sample estimates:
## mean in group A mean in group B 
##        75.01723        77.20708

Test 3

old<-filter(df, age>30)
t.test(engagement~ad, data=old)

## 
##  Welch Two Sample t-test
## 
## data:  engagement by ad
## t = -2.0636, df = 1181.5, p-value = 0.03927
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -3.23106493 -0.08156228
## sample estimates:
## mean in group A mean in group B 
##        74.72864        76.38495
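
Tests 2 and 3 rerun the comparison on subgroups chosen from the same data, so the three p-values do not come from a single planned test. One common, conservative way to account for running several tests is a Bonferroni adjustment. This is offered as an illustration only, not as the analysis the analyst performed. The p-values are copied from the outputs above and the labels are illustrative:

```r
# p-values copied from Test 1, Test 2 and Test 3 above
p_raw <- c(all_users = 0.009274, apple_only = 0.02377, age_over_30 = 0.03927)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

p_bonf
p_bonf < 0.05   # which comparisons remain below 0.05 after adjustment
```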

Last compiled on December 09, 2024
