Maven Landing Page A/B Test Analysis
Analyzed a landing page experiment with traffic filtering, bounce-rate testing, guardrails, and treatment-effect cuts.
Published April 2026
Case Brief
The situation: You work as a Marketing Analyst for Maven Fuzzy Factory, an online retailer that sells children's toys.
The Assignment: The Website Manager noticed that the homepage bounce rate was unusually high (60%) and decided to test a new landing page for a segment of traffic.
The task is to analyze the results of the A/B test to see if the new landing page drove a statistically significant improvement.
The Objectives:
- Filter the website traffic
- Calculate the bounce rates
- Verify the significance
Case provided by Maven Analytics: mavenanalytics.io/guided-projects/landing-page-a-b-test
Data
The dataset provided by Maven Analytics is a simulated E-Commerce database based on real-world data from an online toy retailer. It includes detailed information on website sessions, pageviews, orders, and refunds, allowing for comprehensive analysis of user behavior and marketing performance.
Dataset: Toy Store E-Commerce Database or Kaggle
Data Model
Pre-Analysis
Objective: Improve bounce rate by testing a new landing page.
Hypothesis:
- Null Hypothesis: The new landing page does not lower the bounce rate relative to the current homepage.
- Alternative Hypothesis: The new landing page lowers the bounce rate relative to the current homepage.
Targeting:
Users coming from a specific campaign ('google - g_ad_1') were randomly assigned to either the control group (Home) or the treatment group (new landing page).
Evaluation Metrics:
- Primary Metric: user-level Bounce Rate (the percentage of visitors who navigate away from the site after viewing only one page).
- Guardrail Metrics: Conversion Rate (the percentage of visitors who complete a desired action, such as making a purchase) and AOV, Average Order Value (the average amount of money spent per order).
Rationale: The guardrails protect against a misleading win on the primary metric. If the new landing page improved bounce rate but caused a significant drop in conversion rate (e.g. through distraction or confusion) or in AOV (e.g. by promoting lower-value items), we would need to reconsider the decision to ship the change.
Analysis Overview
We assume the following experiment validity conditions hold:
- Users are randomly assigned to variants
- Sessions are independent
- No major external changes during the test period
- Tracking is consistent across variants
Experiment Identification & Traffic Filtering
As stated in the case brief, the first goal is to identify the A/B test in the data and filter the traffic accordingly. This step is necessary because no experiment design or data collection information was provided, so we need to reverse-engineer the test setup from the data itself.
To do that, we start by visualizing the campaign timeline (landing page × traffic source) to see how the different combinations of landing pages and traffic sources evolved over time.
This bird's-eye view of how Maven Fuzzy Factory ran its campaigns allows us to detect where an A/B test is hiding in the data and then apply the filters needed to isolate the relevant sessions. A valid A/B test requires three conditions to be met simultaneously:
- The same traffic source
- Two different landing pages
- An overlapping time window in which both pages were live
In the highlighted section of the chart above, we can see that g_ad_1 is the only traffic source that served two different landing pages at the same time (/home and /lander-1). The overlap window (when both pages were live simultaneously for the same traffic source) runs from 2012-06-19 to 2012-07-29.
The filtered dataset following these criteria is available in the data/processed folder, with the corresponding filtering code in the data_cleaning.ipynb notebook.
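The overlap detection described above can be sketched in a few lines. This is only an illustration, not the code from data_cleaning.ipynb: the record layout `(session_date, traffic_source, landing_page)`, the sample rows, and the helper names `find_overlap_windows` and `filter_to_test` are all assumptions. It also assumes each page was served continuously between its first and last observed date.

```python
from datetime import date

# Hypothetical session records: (session_date, traffic_source, landing_page).
# The layout and the values are illustrative, not the real schema or data.
sessions = [
    (date(2012, 6, 1), "g_ad_1", "/home"),
    (date(2012, 6, 19), "g_ad_1", "/lander-1"),
    (date(2012, 6, 20), "g_ad_1", "/home"),
    (date(2012, 7, 29), "g_ad_1", "/home"),
    (date(2012, 7, 29), "g_ad_1", "/lander-1"),
    (date(2012, 8, 5), "g_ad_1", "/lander-1"),
    (date(2012, 6, 25), "b_ad_1", "/home"),
]


def find_overlap_windows(rows):
    """Return {source: (start, end)} for sources whose landing pages overlap in time."""
    # First and last date on which each (source, page) pair was seen.
    spans = {}
    for day, source, page in rows:
        first, last = spans.get((source, page), (day, day))
        spans[(source, page)] = (min(first, day), max(last, day))

    # Collect the page spans per traffic source.
    by_source = {}
    for (source, _page), span in spans.items():
        by_source.setdefault(source, []).append(span)

    windows = {}
    for source, page_spans in by_source.items():
        if len(page_spans) < 2:
            continue  # a single landing page cannot form an A/B test
        start = max(first for first, _ in page_spans)  # latest first appearance
        end = min(last for _, last in page_spans)      # earliest last appearance
        if start <= end:
            windows[source] = (start, end)
    return windows


def filter_to_test(rows, source, window):
    """Keep only sessions from the given source inside the overlap window."""
    start, end = window
    return [row for row in rows if row[1] == source and start <= row[0] <= end]


windows = find_overlap_windows(sessions)
test_rows = filter_to_test(sessions, "g_ad_1", windows["g_ad_1"])
print(windows)
print(len(test_rows))
```

On real data, the first-seen and last-seen heuristic should be checked against the timeline chart, since a page that was paused and relaunched would make the window look wider than it was.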
Exploratory Data Analysis (EDA)
The filtered dataset contains 4,645 sessions from users who came through g_ad_1 during the overlap window.
We will start by checking the distribution of sessions between the two groups to confirm that the randomization worked as expected.
The distributions of sessions, bounce rate, and conversion rate over time are also plotted to check for temporal trends or anomalies that could indicate a novelty effect or pre-experiment bias.



Experiment Validity Checks
- Data Leakage: No users were found in both control and treatment groups simultaneously. Randomization unit is confirmed at user level.
- Sample Ratio Mismatch (SRM): We fail to reject the null hypothesis (p = 0.42, well above 0.05). The observed split is consistent with the expected 50/50 randomization, so no sample ratio mismatch is detected.
- Novelty Effect & Pre-Experiment Bias: Metric trendlines (bounce rate, conversion rate) were plotted over the test window for both groups. Both groups follow similar temporal trends, with no unusual spike in the treatment group at the start of the test, suggesting no novelty effect is present.
- Device Breakdown: Mobile sessions are fewer than desktop in both groups, but the ratio is consistent across groups. This is a traffic characteristic, not a bias introduced by the experiment.
✅ All validity checks passed. The experiment is clean and results can be trusted.
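As a rough sketch of how the leakage and SRM checks can be computed, the snippet below uses only the standard library. The function names and the counts in the usage lines are hypothetical (they are not the experiment's actual split), and the source does not state which test was used. A chi-square goodness-of-fit test with one degree of freedom is assumed here as a common choice.

```python
import math


def users_in_both_groups(assignments):
    """assignments: iterable of (user_id, variant). Returns users seen in 2+ variants."""
    seen = {}
    for user_id, variant in assignments:
        seen.setdefault(user_id, set()).add(variant)
    return {user for user, variants in seen.items() if len(variants) > 1}


def srm_test(n_control, n_treatment, expected_share_control=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) against the planned split."""
    total = n_control + n_treatment
    exp_c = total * expected_share_control
    exp_t = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_treatment - exp_t) ** 2 / exp_t
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical inputs for illustration only.
leaks = users_in_both_groups([(1, "control"), (2, "treatment"), (3, "control")])
chi2, p_value = srm_test(n_control=1000, n_treatment=1040)
print(leaks, round(chi2, 3), round(p_value, 3))
```

A small p-value (a strict threshold such as 0.001 is often used for SRM) would indicate that the observed split is unlikely under the planned allocation and that the assignment or logging should be investigated before reading any metric.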
Metric Calculation Methodology
- Bounce Rate:
To ensure the same randomization unit for metric calculation and hypothesis testing, we calculate bounce rate at the user level by first averaging within each user and then averaging across users.
$$\text{BR} = \frac{1}{N} \sum_{i=1}^{N} \text{BR}_i$$
Where:
$$\text{BR}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} b_{ij}$$
So substituting:
$$\text{BR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n_i} \sum_{j=1}^{n_i} b_{ij}$$
Notation:
- $N$: Total number of users in the group
- $n_i$: Number of sessions for user $i$
- $b_{ij}$: 1 if session $j$ of user $i$ bounced, 0 otherwise
- $\text{BR}_i$: Bounce rate for user $i$ across their sessions
The key insight is the two-step aggregation: first average within each user → $\text{BR}_i$ (each user gets one number regardless of how many sessions they had), then average across users → $\text{BR}$ (each user counts equally).
- Conversion Rate
$$\text{CR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n_i} \sum_{j=1}^{n_i} c_{ij}$$
Where $c_{ij}$ is 1 if session $j$ of user $i$ converted, 0 otherwise. The same logic applies to all other metrics as well.
- Average Order Value
Where $m_{ij}$ is the number of orders in session $j$ for user $i$, and $v_{ijk}$ is the value of order $k$.
In this dataset, $n_i = 1$ for all users, meaning each user has exactly one session. Therefore, $\text{BR}_i = b_{i1}$ and all metrics simplify to simple means. While the formulas appear complex, they collapse to basic averages under this constraint. However, the two-step aggregation framework above is essential for ensuring the same randomization unit (user-level) is used for both metric calculation and hypothesis testing, maintaining statistical validity.
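To make the two-step aggregation concrete, here is a minimal sketch that contrasts it with a plain session-level average. The input layout (a dict from user id to that user's list of 0/1 bounce flags) and the function names are assumptions chosen for illustration.

```python
def user_level_rate(flags_by_user):
    """Average within each user first, then average those per-user rates."""
    per_user = [sum(flags) / len(flags) for flags in flags_by_user.values() if flags]
    return sum(per_user) / len(per_user)


def session_level_rate(flags_by_user):
    """Pool every session together, so heavy users weigh more."""
    all_flags = [flag for flags in flags_by_user.values() for flag in flags]
    return sum(all_flags) / len(all_flags)


# Hypothetical data: user "u1" has four sessions, user "u2" has one.
bounces = {"u1": [1, 0, 0, 0], "u2": [1]}
print(user_level_rate(bounces))     # per-user rates are 0.25 and 1.0
print(session_level_rate(bounces))  # 2 bounces out of 5 sessions
```

The two numbers differ whenever users have unequal session counts. When every user has exactly one session, as in this dataset, both functions return the same value.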
Statistical Testing Results
The statistical results are summarized below:
Interpretation
- Bounce Rate 95% CI: (2.56pp, 8.26pp) absolute or (4.37%, 14.11%) relative improvement. The entire interval is above zero, confirming statistical significance, and the lower bound of 2.56pp supports practical significance as well.
- Guardrail metrics are clean: conversion rate trended positively, but the test was underpowered for it (too few conversions to detect a real effect). AOV showed zero variance, so spending behavior was completely unaffected by the page change. No guardrail was violated.
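For readers who want to reproduce this kind of interval, the sketch below shows one common approach: a two-sided two-proportion z-test with a Wald confidence interval for the difference. The source does not state which test produced the figures above, so treat the method, the function name `bounce_rate_test`, and the counts in the usage line as assumptions.

```python
import math
from statistics import NormalDist


def bounce_rate_test(bounces_c, n_c, bounces_t, n_t, alpha=0.05):
    """Two-sided two-proportion z-test plus a Wald CI for the bounce-rate reduction."""
    p_c = bounces_c / n_c
    p_t = bounces_t / n_t
    diff = p_c - p_t  # positive means the treatment bounces less

    # The test statistic uses the pooled rate (standard error under equal rates).
    pooled = (bounces_c + bounces_t) / (n_c + n_t)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The confidence interval uses the unpooled standard error.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci_abs = (diff - z_crit * se, diff + z_crit * se)
    # Relative CI: absolute bounds divided by the control rate
    # (a simplification that ignores uncertainty in the baseline).
    ci_rel = (ci_abs[0] / p_c, ci_abs[1] / p_c)
    return {"diff": diff, "z": z, "p_value": p_value, "ci_abs": ci_abs, "ci_rel": ci_rel}


# Hypothetical counts, not the experiment's actual numbers.
result = bounce_rate_test(bounces_c=600, n_c=1000, bounces_t=540, n_t=1000)
print(result)
```

The same function can be pointed at conversion counts for the guardrail check. A wide interval there is what "underpowered" looks like in practice.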
Heterogeneous Treatment Effect Analysis
To further understand the results, we broke down the bounce rate by device type (desktop vs mobile) to see whether the treatment effect is consistent across segments. The results are summarized in the table below:
Interpretation:
- The new landing page drives a strong, significant improvement for desktop users but has no meaningful effect on mobile users.
- This is a heterogeneous treatment effect; the two segments respond differently to the same intervention. Critically, the new page does not harm mobile users (the bounce rate is nearly identical), but it also does not help them.
- The mobile bounce rate of ~69% in both groups signals a separate, pre-existing UX problem on mobile that this test was not designed to solve and did not address.
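A per-segment breakdown like the one interpreted above can be sketched as follows. The row layout `(segment, variant, bounced)`, the helper names, and the counts are hypothetical and chosen only to show the mechanics. They are not the figures from the experiment.

```python
import math
from statistics import NormalDist


def effect_by_segment(rows, alpha=0.05):
    """rows: iterable of (segment, variant, bounced), variant in {'control', 'treatment'}."""
    counts = {}  # (segment, variant) -> [bounces, sessions]
    for segment, variant, bounced in rows:
        cell = counts.setdefault((segment, variant), [0, 0])
        cell[0] += int(bounced)
        cell[1] += 1

    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    results = {}
    for segment in sorted({seg for seg, _ in counts}):
        b_c, n_c = counts[(segment, "control")]
        b_t, n_t = counts[(segment, "treatment")]
        p_c, p_t = b_c / n_c, b_t / n_t
        diff = p_c - p_t  # positive means the treatment bounces less
        se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
        results[segment] = {
            "control": p_c,
            "treatment": p_t,
            "diff": diff,
            "ci": (diff - z_crit * se, diff + z_crit * se),
        }
    return results


def make_rows(segment, variant, bounces, sessions):
    """Expand aggregate counts into one row per session (illustration only)."""
    return [(segment, variant, 1)] * bounces + [(segment, variant, 0)] * (sessions - bounces)


# Hypothetical counts per device and variant.
rows = (
    make_rows("desktop", "control", 70, 100)
    + make_rows("desktop", "treatment", 60, 100)
    + make_rows("mobile", "control", 35, 50)
    + make_rows("mobile", "treatment", 35, 50)
)
by_device = effect_by_segment(rows)
for device, stats in by_device.items():
    print(device, stats)
```

Segment cuts made after the fact are exploratory: each extra slice is another comparison, so a per-segment result is best read as a hypothesis for a follow-up test unless the segmentation was planned in advance.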
Final Verdict
Rationale:
- Statistical significance: ✅ Strong (p=0.0002 overall, p=0.0001 desktop)
- Practical significance: ✅ 6.65pp desktop improvement — a reasonable positive effect (9.24% relative)
- Confidence interval: ✅ Entirely above zero (2.90pp, 10.41pp) on desktop
- Guardrail metrics: ✅ No negative movement in CR or AOV
- Mobile effect: ⚠️ Neutral — no harm, but no benefit
- Sample ratio mismatch: ✅ Clean 50/50 split confirmed
- Data leakage: ✅ Zero overlap between groups