A company wants to compare 4 email subject lines on open rate.
Treatments (subject lines):
A: “Quick question about your account…”
B: “Your January update is here”
C: “A tip to save you time today”
D: “New features you’ll like”
The company knows that its 4 different customer segments behave in different ways but they wish to find the treatment that works regardless of the segment.
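One way to act on the segment information is to treat each segment as a block and randomize the four subject lines separately within it, so that every subject line is sent to every segment. The sketch below illustrates that idea on a made-up customer list; the `customers` data frame, the segment labels `S1` to `S4`, and the block size of 8 are assumptions for illustration, not details from the company.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
subject_lines <- c("A", "B", "C", "D")

# Hypothetical customer list: 4 segments with 8 customers each
customers <- data.frame(
  customer_id = 1:32,
  segment = rep(paste0("S", 1:4), each = 8),
  subject_line = NA_character_
)

# Within each segment (block), shuffle a balanced set of subject lines
for (s in unique(customers$segment)) {
  idx <- which(customers$segment == s)
  customers$subject_line[idx] <- sample(rep(subject_lines, length.out = length(idx)))
}

# Rows are segments, columns are subject lines
assignment_counts <- table(customers$segment, customers$subject_line)
assignment_counts
```

With 8 customers per segment, every cell of `assignment_counts` is 2, so segment differences cannot be confused with subject-line differences.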
The company also knows that day of the week affects the open rate. They come up with the following plan.
Etsy ran an A/B/C test of its navigation, pitting traditional category
links (the A version) against two variations built from fly-out
categories and breadcrumbs. The B variation had both the fly-out
categories and the breadcrumbs; the C variation had only the breadcrumbs.
In this test, they measured the click-through rate on the “Add to Basket” button.
Below is the head() of the data.
## user_id variant clicked
## 1 U004497 B 0
## 2 U006653 B 0
## 3 U007402 B 0
## 4 U011386 C 0
## 5 U000557 A 1
## 6 U009756 C 0
## Df Sum Sq Mean Sq F value Pr(>F)
## variant 2 2.5 1.2461 11.58 9.48e-06 ***
## Residuals 11997 1291.2 0.1076
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Pearson's Chi-squared test
##
## data: table(df$variant, df$clicked)
## X-squared = 23.117, df = 2, p-value = 9.556e-06
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = clicked ~ variant, data = df)
##
## $variant
## diff lwr upr p adj
## B-A 0.01200 -0.005195014 0.029195014 0.2306023
## C-A -0.02275 -0.039945014 -0.005554986 0.0054841
## C-B -0.03475 -0.051945014 -0.017554986 0.0000065
##
## 0 1
## A -0.2419896 0.6464156
## B -1.0523736 2.8111561
## C 1.2943632 -3.4575717
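The last table above is not labeled. Its entries appear consistent with the Pearson residuals of the chi-squared test, because their squares sum to the reported X-squared statistic. The sketch below checks this, and the ANOVA F value, using only numbers copied from the output; the object names are illustrative.

```r
# Values copied from the unlabeled 3 x 2 table (rows A, B, C; columns 0, 1)
pearson_resid <- matrix(
  c(-0.2419896,  0.6464156,
    -1.0523736,  2.8111561,
     1.2943632, -3.4575717),
  nrow = 3, byrow = TRUE,
  dimnames = list(variant = c("A", "B", "C"), clicked = c("0", "1"))
)

# If these are Pearson residuals, their squares sum to the X-squared statistic
chi_sq_from_resid <- sum(pearson_resid^2)
round(chi_sq_from_resid, 3)

# The ANOVA F value is the ratio of the two Mean Sq entries
f_from_table <- 1.2461 / 0.1076
round(f_from_table, 2)
```

In R, the `residuals` component of a `chisq.test()` result holds Pearson residuals. The largest magnitudes here sit in the `clicked = 1` column for variants B and C, which lines up with the C-B row of the Tukey output.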
A streaming company runs an A/B test of a new “Recommended for You” module on the home page.
Treatment A: current module
Treatment B: new module
Outcome: converted = 1 if the user starts a paid trial within 24 hours, else 0
They also collected other variables, including device (Mobile/Desktop), prior_purchases (count), tenure_days, and email_member (0/1).
| user_id | treatment | device | prior_purchases | tenure_days | email_member | high_intent | converted |
|---|---|---|---|---|---|---|---|
| U009238 | B | Mobile | 0 | 14 | 0 | 0 | 0 |
| U005887 | A | Mobile | 1 | 11 | 0 | 0 | 0 |
| U007292 | B | Desktop | 1 | 37 | 1 | 0 | 0 |
| U004500 | A | Mobile | 2 | 31 | 1 | 0 | 0 |
| U006728 | A | Desktop | 1 | 15 | 1 | 0 | 0 |
| U010333 | A | Mobile | 1 | 12 | 1 | 0 | 0 |
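library(dplyr)  # provides %>%, filter(), group_by(), summarize(), mutate()
prior <- df %>% filter(prior_purchases == 1)  # users with exactly one prior purchase
prior %>% group_by(treatment, converted) %>% summarize(n = n()) %>% mutate(prop = n / sum(n))
## # A tibble: 4 × 4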
## # Groups: treatment [2]
## treatment converted n prop
## <chr> <int> <int> <dbl>
## 1 A 0 1917 0.918
## 2 A 1 172 0.0823
## 3 B 0 1950 0.922
## 4 B 1 166 0.0784
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(172, 166) out of c(1917 + 172, 1950 + 166)
## X-squared = 0.16541, df = 1, p-value = 0.6842
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.01302693 0.02079921
## sample estimates:
## prop 1 prop 2
## 0.08233605 0.07844991
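The test above can be re-run from the four counts in the summary table, without access to `df`. A minimal sketch, assuming the counts are typed in by hand (the object names are illustrative):

```r
# Converted users and group sizes for prior_purchases == 1, from the table above
conversions <- c(A = 172, B = 166)
group_sizes <- c(A = 1917 + 172, B = 1950 + 166)

# Two-sample test of equal proportions (continuity correction is the default)
prior_test <- prop.test(conversions, group_sizes)

prior_test$estimate   # conversion rates for A and B
prior_test$statistic  # X-squared
prior_test$p.value    # compare with the p-value reported above
prior_test$conf.int   # 95% interval for (rate A - rate B)
```

The interval is for treatment A minus treatment B, so an interval that spans zero is compatible with either module converting better in this subgroup.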
device <- df %>% filter(device == "Mobile")  # keep mobile users only
device %>% group_by(treatment, converted) %>% summarize(n = n()) %>% mutate(prop = n / sum(n))
## # A tibble: 4 × 4
## # Groups: treatment [2]
## treatment converted n prop
## <chr> <int> <int> <dbl>
## 1 A 0 3571 0.927
## 2 A 1 281 0.0729
## 3 B 0 3568 0.924
## 4 B 1 293 0.0759
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(281, 293) out of c(3571 + 281, 3568 + 293)
## X-squared = 0.20086, df = 1, p-value = 0.654
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.014911175 0.009035258
## sample estimates:
## prop 1 prop 2
## 0.07294912 0.07588708
## # A tibble: 4 × 4
## # Groups: email_member [2]
## email_member converted n prop
## <int> <int> <int> <dbl>
## 1 0 0 7226 0.923
## 2 0 1 606 0.0774
## 3 1 0 3758 0.902
## 4 1 1 410 0.0984
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(606, 410) out of c(7226 + 606, 3758 + 410)
## X-squared = 15.201, df = 1, p-value = 9.666e-05
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.03198292 -0.01000438
## sample estimates:
## prop 1 prop 2
## 0.07737487 0.09836852
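To see where the confidence interval in this last output comes from, it can be rebuilt by hand from the counts. The sketch below assumes the usual unpooled standard error plus the continuity correction that `prop.test()` applies by default; the variable names are illustrative.

```r
# Counts from the email_member table above
# (group 1: email_member == 0, group 2: email_member == 1)
x <- c(606, 410)
n <- c(7226 + 606, 3758 + 410)
p_hat <- x / n

# Unpooled standard error of the difference in proportions
se_diff <- sqrt(sum(p_hat * (1 - p_hat) / n))
# Continuity correction added to the interval half-width
# (assumes the full 0.5 correction applies, which holds for a gap this large)
cc <- 0.5 * sum(1 / n)

rate_diff <- p_hat[1] - p_hat[2]
half_width <- qnorm(0.975) * se_diff + cc
ci <- c(lower = rate_diff - half_width, upper = rate_diff + half_width)
ci

# Cross-check against the built-in test
builtin_ci <- prop.test(x, n)$conf.int
```

Because the difference is non-members minus members, an interval that lies entirely below zero corresponds to a higher conversion rate among email members. Note that this output compares two customer groups, not the two treatments.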
Poshmark tested three different footer designs that directed users to a customer help center page. One version was a long-form footer that included multiple paragraphs describing the company’s mission, background, and community values, while the other versions were shorter and more utilitarian.
The company ran the experiment across four geographic regions in the United States, but the primary objective was to identify a footer design that performed well consistently across regions.
User sales were measured for each footer variation and compared to determine which design led to the strongest overall performance.
## Df Sum Sq Mean Sq F value Pr(>F)
## Footer 2 507 253.60 3.91 0.0205 *
## Residuals 717 46500 64.85
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Region 1 30 30.35 0.464 0.496
## Residuals 718 46977 65.43
## Df Sum Sq Mean Sq F value Pr(>F)
## Footer 2 507 253.60 3.908 0.0205 *
## Region 1 30 30.35 0.468 0.4943
## Residuals 716 46470 64.90
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Footer 2 507 253.6 3.932 0.0200 *
## Region 3 450 150.0 2.325 0.0736 .
## Residuals 714 46050 64.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## diff lwr upr p adj
## B-A 0.5979167 -1.1238928 2.319726 0.69348414
## C-A 2.0024583 0.2806489 3.724268 0.01773421
## C-B 1.4045417 -0.3172678 3.126351 0.13491986
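The Region line shows 1 Df in two of the tables above and 3 Df in another. One plausible explanation is that Region was stored as a number in the first case and as a factor in the second. The sketch below reproduces that difference on simulated stand-in data; the `Sales` column name and all simulated values are assumptions, so the sums of squares will not match the output above.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in data: 60 users x 3 footers x 4 regions = 720 rows
sim <- expand.grid(rep = 1:60, Footer = c("A", "B", "C"), Region = 1:4)
sim$Sales <- rnorm(nrow(sim), mean = 50, sd = 8)  # hypothetical outcome

# Region as a number: aov() fits a single slope, so Region gets 1 Df
fit_numeric <- aov(Sales ~ Footer + Region, data = sim)

# Region as a factor: a block term with 4 - 1 = 3 Df
fit_block <- aov(Sales ~ Footer + factor(Region), data = sim)

summary(fit_numeric)
summary(fit_block)

# Pairwise footer comparisons after accounting for the region blocks
TukeyHSD(fit_block, which = "Footer")
```

With 720 rows, the numeric version leaves 716 residual Df and the factor version leaves 714, the same residual Df that appear in the two-term tables above.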
An online retailer wants to understand how pricing strategy and shipping offer affect average order value (AOV). They run an experiment with two factors, Price and Shipping, each set at a low level (-1) and a high level (1). The full experimental data are displayed below.
| price_level | shipping_level | average_order_value |
|---|---|---|
| -1 | -1 | 40.83 |
| -1 | 1 | 46.99 |
| 1 | -1 | 49.57 |
| 1 | 1 | 59.99 |
| -1 | -1 | 41.84 |
| -1 | 1 | 48.30 |
| 1 | -1 | 44.15 |
| 1 | 1 | 62.14 |
| -1 | -1 | 45.53 |
| -1 | 1 | 43.27 |
| 1 | -1 | 47.64 |
| 1 | 1 | 62.81 |
##
## Call:
## lm(formula = average_order_value ~ (price_level + shipping_level)^2,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9700 -1.7183 0.5067 1.4008 2.7967
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 49.4217 0.6869 71.949 1.55e-12 ***
## price_level 4.9617 0.6869 7.223 9.04e-05 ***
## shipping_level 4.4950 0.6869 6.544 0.00018 ***
## price_level:shipping_level 2.7683 0.6869 4.030 0.00379 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.379 on 8 degrees of freedom
## Multiple R-squared: 0.9329, Adjusted R-squared: 0.9077
## F-statistic: 37.08 on 3 and 8 DF, p-value: 4.85e-05
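The fit above can be reproduced from the twelve rows in the table. A minimal sketch, assuming the data are typed in by hand in table order; the names `orders`, `effects`, and `cells` are illustrative.

```r
# Data copied from the table above (three replicates of a 2 x 2 design)
orders <- data.frame(
  price_level    = rep(c(-1, -1, 1, 1), times = 3),
  shipping_level = rep(c(-1, 1, -1, 1), times = 3),
  average_order_value = c(40.83, 46.99, 49.57, 59.99,
                          41.84, 48.30, 44.15, 62.14,
                          45.53, 43.27, 47.64, 62.81)
)

fit <- lm(average_order_value ~ (price_level + shipping_level)^2, data = orders)
coefs <- coef(fit)
coefs

# With -1/1 coding, a factor's effect (high minus low) is twice its coefficient
effects <- 2 * coefs[-1]
effects

# Fitted mean AOV in each of the four price/shipping cells
cells <- expand.grid(price_level = c(-1, 1), shipping_level = c(-1, 1))
cells$fitted_aov <- predict(fit, newdata = cells)
cells
```

Because the model has four coefficients and the design has four cells, each fitted value equals the observed cell mean. The positive interaction coefficient means the gain from the high shipping level is larger at the high price level than at the low one.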