ANDRES GARCIA
SENIOR PRODUCT MANAGER
Deep-Dive Case Study — AI Personalization Platform · 500K+ Users
Building a System
That Learns —
Not a Feature Set.
+65%
Learner Progression
System-driven outcome
+25%
Engagement Increase
Relevance at scale
+35%
Revenue Growth
Progression to conversion
500K+
Active Learners
Learning architecture
AI Personalization · ML Governance · Experimentation Infrastructure · EdTech
Case Study — Executive Summary
AI Personalization: At a Glance
THE PROBLEM
Platform delivered a one-size-fits-all experience. Engagement and progression were inconsistent, and the team had no experimentation capability to validate decisions at scale. Every user received the same content regardless of where they were in their learning journey.
MY ROLE
Reframed roadmap from "ship AI features" to "build a system that learns." Defined personalization requirements, experiment success metrics, and UX behavior. Partnered with ML and engineering to ship adaptive recommendations.
HOW I DID IT
Built experimentation infrastructure first — A/B testing across onboarding, content, and triggers — before expanding features. Balanced model complexity against latency. Delivered incrementally, not big-bang.
THE RESULT
+25% engagement, +65% learner progression, +35% revenue across 500K+ users. Product success came from building a learning system, not just shipping AI features.
System Outcome Scorecard
Building the learning system before shipping AI features was the most important product decision.
What Made This Uniquely Difficult
This was not a feature build. Four compounding constraints.
1
ML Models Make Probabilistic Decisions — Not Deterministic Ones
Wrong is not binary
"Did the recommendation work?" is not a yes/no question. Defining acceptable model behavior — across edge cases, new users, and changing content — is a product judgment call, not an engineering threshold.
2
Personalization That Feels Helpful vs. Intrusive — Separated by a Threshold the Model Cannot Define
Product owns this — not data science
The ML model optimizes for the signal you give it. If you give it engagement signals, it will maximize engagement. If that harms learning progression, the model will never know. That calibration point is a product decision with downstream user consequences.
3
Experimentation Infrastructure Had to Exist Before Any Model Could Be Validated
Sequencing is the product decision
Building the test infrastructure before expanding AI features is a sequencing decision most teams get backwards — they ship AI and then discover they cannot measure whether it worked.
4
A Learning System Compounds — Early Design Decisions Are Irreversible
Decisions at 500K users cannot be undone
The signal architecture you choose on day one determines what the model can learn across 500K users over months. Getting it wrong produces a model trained on noise that performs worse over time, not better.
Core Reframe — The Signature Move
Not "ship AI features." Build a system that learns faster than competitors can copy.
BEFORE — FEATURE MINDSET
"Ship AI personalization features to improve engagement." Every sprint asked: what AI feature should we ship next?
AFTER — SYSTEM MINDSET
"Build a system that learns faster than competitors can copy features." The question became: does the system recommend better content this week than last week?
Experimentation First
No feature expansion until measurement infrastructure existed. The test system came before the AI system.
Signal Architecture Governed
Defined what behavioral data to collect, weight, and act on — before 500K users began generating it.
Model Performance Owned as Product Requirements
Defined precision, latency, and retraining governance — not left to engineering judgment.
Learning Speed as the KPI
Does the system recommend better content this week than last week? This reframe prevented optimizing for short-term engagement at the expense of long-term retention.
The Shift: Feature vs. System Thinking
The Reframe Impact
Feature delivery → System design
Why This Mattered
Features are copied in months. A system that learns from 500K users takes years to replicate. The architecture decision — not the feature roadmap — created the compounding competitive advantage.
Case Study — Scaling AI Personalization to 500K+ Users
The challenge. The approach. The outcome.
THE CHALLENGE
The platform delivered a generic, one-size-fits-all experience. Engagement and progression were inconsistent, and the team lacked an experimentation engine to validate decisions at scale.
How I Approached It
Prioritized experimentation infrastructure before feature expansion — no more shipping blind
Infrastructure First
Balanced model complexity against latency and real-world usability — depth never at the cost of UX
Latency-First Design
Delivered incrementally, not big-bang launches — reduced risk, accelerated learning cycles
Incremental Rollout
Defined KPIs for engagement, progression, retention, and feature performance from day one
Metric-Governed
What I Did — Execution Moves
Reframed roadmap: from "add AI features" to "build a system that learns"
Defined personalization requirements, UX behavior, and experiment success metrics. Partnered with ML + engineering to ship adaptive recommendations and conversational flows.
Introduced A/B testing across onboarding, content, and triggers
Experimentation before expansion — no AI feature shipped without a defined experiment to validate it.
Key insight: Product success came from building a learning system, not just shipping AI features.
AI Personalization — System Architecture
The five-layer learning system I owned and governed — every decision traced back here.
L1
Behavioral Data
• User actions & clicks
• Content completion
• Time-on-content
• Drop-off signals
Signal Contract
L2
Signal Processing
• Normalization
• Quality scoring
• Weighting by recency
• Feature engineering
Quality SLA
L3
ML Engine
• Recommendation model
• Engagement prediction
• Cold-start handling
• Retraining governance
KPI Owned
L4
Decision Layer
• Content selection
• Onboarding routing
• Trigger logic
• A/B variant assignment
Threshold Owned
L5
Product Experience
• Recommendations UI
• Adaptive onboarding
• Engagement triggers
• Progression feedback
User Outcome
↻ Feedback Loop: product behavior feeds back into training data, with defined signal exclusions to prevent runaway model feedback loops
Signal Contract
Defined what behavioral data each layer required, what quality score was acceptable, and what happened when data was incomplete or noisy.
ML Model KPIs
Owned precision/recall targets, inference latency SLAs (<200ms), and biweekly retraining cadence with the data science team.
Cold-Start Design
Explicitly designed for new users with no behavioral history — defaulting to onboarding flows, not random recommendations.
Feedback Loop Governance
Defined how product behavior fed back into training data — and what signals to exclude to prevent model feedback loops.
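For illustration, a minimal sketch of what a signal contract with feedback-loop exclusions could look like in code. All field names, signals, and threshold values here are assumptions, not the production schema.

```python
from dataclasses import dataclass

# Hypothetical signal contract; names and values are illustrative only.
@dataclass
class SignalContract:
    signal: str                  # behavioral event a layer consumes
    min_quality_score: float     # below this, the event is dropped, not imputed
    recency_half_life_days: int  # recent behavior weighted more heavily
    exclude_from_training: bool  # True for model-caused signals

CONTRACTS = [
    SignalContract("content_completion", 0.9, 14, False),
    SignalContract("time_on_content", 0.7, 7, False),
    # Clicks on items the model itself surfaced are excluded from training
    # positives, so the model cannot simply reinforce its own output.
    SignalContract("recommended_item_click", 0.8, 7, True),
]

def training_signals(contracts: list[SignalContract]) -> list[str]:
    """Signals eligible to enter the training set under the contract."""
    return [c.signal for c in contracts if not c.exclude_from_training]
```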
AI Personalization — Operating Model | Specific Decisions I Owned
How I led this — not generic PM activity.
1
Sequenced Experimentation Infrastructure Before Any AI Expansion
Measurement before models
Built A/B testing across onboarding, content recommendations, and engagement triggers before expanding any personalization feature. No model shipped without a defined experiment to validate it. This was a product sequencing decision, not an engineering task.
2
Owned the Helpful vs. Intrusive Threshold Definition
Product owns this threshold
Defined the product criteria for what made a recommendation feel helpful vs. intrusive. Set guardrails: maximum recommendation frequency, minimum confidence threshold before showing a recommendation, and user control mechanisms. The model optimized within these constraints.
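A minimal sketch of how such a gate could sit between the model and the UI; the cap and threshold values are illustrative assumptions, not the actual guardrails.

```python
# Hypothetical guardrail values; the real thresholds were product decisions.
MAX_RECS_PER_DAY = 3
MIN_CONFIDENCE = 0.65

def should_surface(confidence: float, shown_today: int, opted_out: bool) -> bool:
    """Product guardrail: the model proposes, this gate decides."""
    if opted_out:                        # user control mechanisms always win
        return False
    if shown_today >= MAX_RECS_PER_DAY:  # frequency cap
        return False
    return confidence >= MIN_CONFIDENCE  # low-confidence recs stay hidden
```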
3
Governed ML Model Performance as Product Requirements
Model KPIs to product outcomes
Defined precision/recall targets, inference latency SLAs (<200ms), and retraining triggers. Ran biweekly model performance reviews with data science. When engagement improved but progression declined, I flagged it as a product failure — not a model success.
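As a sketch, the kind of check a biweekly review could encode, with the engagement/progression divergence flagged explicitly. Metric names and targets are assumptions; only the 200ms SLA comes from the text.

```python
# Illustrative product requirements on the model (assumed targets,
# except the <200ms latency SLA stated above).
REQUIREMENTS = {"precision_at_10": 0.30, "p95_latency_ms": 200}

def review_flags(current: dict, previous: dict) -> list[str]:
    flags = []
    if current["precision_at_10"] < REQUIREMENTS["precision_at_10"]:
        flags.append("precision below product requirement")
    if current["p95_latency_ms"] > REQUIREMENTS["p95_latency_ms"]:
        flags.append("latency SLA breached")
    # Engagement up while progression falls is a product failure, not a win.
    if (current["engagement"] > previous["engagement"]
            and current["progression"] < previous["progression"]):
        flags.append("engagement/progression divergence: progression wins")
    return flags
```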
4
Designed for Learning Speed — Not Just Engagement
Compounding value, not vanity metrics
Defined the North Star metric: does the system recommend better content to this user this week than last week? This single reframe prevented the team from optimizing for short-term engagement at the expense of the long-term value that drives subscription retention.
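The North Star can be made operational as a week-over-week quality comparison. A tiny sketch, assuming an offline quality metric such as hit-rate@10 computed per week (the actual metric is not specified here):

```python
def is_learning(quality_by_week: list[float], window: int = 4) -> bool:
    """True if recommendation quality has improved over the trailing window,
    i.e. the system recommends better content this week than last."""
    recent = quality_by_week[-window:]
    return recent[-1] > recent[0] and all(
        later >= earlier for earlier, later in zip(recent, recent[1:]))
```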
Every model decision, every experiment, every threshold — traced back to this operating model and the product outcomes I was accountable for.
AI Personalization — Critical Tradeoffs I Owned
Every AI product decision required balancing competing objectives simultaneously.
Model Complexity vs. Latency
Deeper model = higher accuracy
Deeper model = longer inference, broken UX
PRODUCT ANSWER
Selected signals by impact-to-latency ratio. Set hard <200ms SLA. Depth could never come at the cost of usability.
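One way to make a hard latency SLA enforceable at serve time is a budget-then-fallback pattern, sketched below. The function names are hypothetical stand-ins for the real services.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

LATENCY_BUDGET_S = 0.200  # the hard SLA from the tradeoff above
_pool = ThreadPoolExecutor(max_workers=8)  # shared pool; size is arbitrary here

def recommend(user_id, personalized_recs, popular_fallback):
    """Serve model output only if it lands inside the latency budget."""
    future = _pool.submit(personalized_recs, user_id)
    try:
        return future.result(timeout=LATENCY_BUDGET_S)
    except TimeoutError:
        # Depth never at the cost of usability: serve fallback content
        # while the slow call finishes in the background.
        return popular_fallback()
```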
Personalization Accuracy vs. Cold Start
New users have no behavioral data
Generic recs erode first-session trust
PRODUCT ANSWER
Progressive trust model: new users received onboarding-guided flows. Personalization activated incrementally as behavioral signal accumulated.
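A minimal sketch of the progressive trust model as a blending weight; the event-count thresholds are illustrative assumptions.

```python
MIN_EVENTS = 5    # below this: onboarding-guided flow only (assumed value)
FULL_EVENTS = 50  # above this: model fully in control (assumed value)

def personalization_weight(event_count: int) -> float:
    """0.0 = onboarding-guided flow, 1.0 = fully personalized.
    Personalization activates incrementally as behavioral signal accumulates."""
    if event_count < MIN_EVENTS:
        return 0.0
    return min(1.0, (event_count - MIN_EVENTS) / (FULL_EVENTS - MIN_EVENTS))
```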
Engagement Optimization vs. Progression
Max engagement drives short-term retention
Short-form content may undermine learning depth
PRODUCT ANSWER
Separated engagement KPIs from progression KPIs. When they diverged, progression won. Content recommendations had to improve learning outcomes, not just time-on-platform.
Experimentation Speed vs. Statistical Significance
Fast experiments drive faster learning
Underpowered experiments produce false signals
PRODUCT ANSWER
Defined minimum sample size and confidence thresholds before any experiment launched. A false positive at 500K users is expensive to reverse.
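The minimum-sample rule is standard power analysis for a two-proportion test. A self-contained sketch (the baseline rate and lift in the example are illustrative):

```python
from statistics import NormalDist

def min_sample_per_arm(p_base: float, mde: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per variant needed to detect an absolute lift of `mde`
    over baseline conversion rate `p_base`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p1, p2 = p_base, p_base + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_a + z_b) ** 2 * variance / mde ** 2) + 1

# Detecting a 1pp lift on a 20% baseline needs ~26k users per arm,
# which is why underpowered "quick reads" produce false signals.
print(min_sample_per_arm(0.20, 0.01))
```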
Every tradeoff resolved through defined criteria — not intuition. Product constraints governed the model, not the other way around.
Execution + Failure Scenario Design
Built for imperfect conditions — and real learning consequences.
Execution Model
Experimentation Governance
Ran weekly experiment reviews with data science. Every test had a defined hypothesis, minimum sample size, and success metric before launch.
ML Model Cadence
Biweekly model performance reviews. Monitored engagement, progression, and false recommendation rate. Triggered retraining when drift exceeded thresholds.
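Population Stability Index is one common way to quantify the drift that triggers retraining; the text does not name the actual metric, so this is an illustrative choice.

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over matched distribution buckets
    (each list holds bucket proportions summing to 1.0)."""
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

RETRAIN_THRESHOLD = 0.2  # common rule of thumb: PSI > 0.2 = significant drift
```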
Incremental Rollout
No big-bang AI launches. Every personalization surface released to 10% of users first, validated against KPIs, then expanded. Rollout was a product decision.
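Staged rollouts of this kind are typically implemented with deterministic hash bucketing, sketched below, so exposure can grow from 10% to 100% without reshuffling who already has the feature.

```python
import hashlib

def in_rollout(user_id: str, surface: str, pct: int) -> bool:
    """Deterministic percentage rollout: the same user always lands in the
    same bucket for a given surface, so expansion only adds users."""
    digest = hashlib.sha256(f"{surface}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < pct
```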
Signal Quality Reviews
Monthly audit of behavioral data feeding the model. Identified and removed noise signals — accidental clicks, bots, test accounts — that polluted recommendations.
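A sketch of the kinds of filters such an audit could apply; the event fields and the sub-second dwell cutoff are assumptions about the schema.

```python
ACCIDENTAL_CLICK_MAX_DWELL_S = 1.0  # assumed cutoff for accidental clicks

def is_noise(event: dict, bot_ids: set, test_ids: set) -> bool:
    """Events that must never enter the training set."""
    if event["user_id"] in bot_ids or event["user_id"] in test_ids:
        return True  # bots and test accounts pollute recommendations
    if event["type"] == "click" and event["dwell_s"] < ACCIDENTAL_CLICK_MAX_DWELL_S:
        return True  # sub-second clicks treated as accidental
    return False

def clean(events, bot_ids, test_ids):
    return [e for e in events if not is_noise(e, bot_ids, test_ids)]
```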
Failure Scenarios — Consequence Awareness
System optimizes for engagement signals → short-form content dominates → progression collapses → churn increases 3-6 months later. Invisible until it is expensive.
New users receive irrelevant recommendations in first session → 40-60% of churn happens in onboarding → acquisition costs wasted before personalization activates.
Recommendation service exceeds 200ms → UI shows fallback content → user interprets as broken product → drops off and does not return.
Noise data enters training set → model learns wrong patterns → recommendations degrade across entire 500K user base simultaneously.
At 500K users, a model failure is not a support ticket — it is a systemic product crisis.
AI Personalization — Impact + What This Demonstrates
What changed. What it proves. Why it compounds.
+25%
Engagement
Content relevance drove sustained session depth
+65%
Progression
Users advanced faster as system adapted
+35%
Revenue
Higher progression drove subscription conversion
All Three Metrics — The Learning Curve
+25% engagement. +65% progression. +35% revenue. Outcomes from the system — not from individual features.
Five Demonstrated Capabilities
1. AI / ML Product Governance
Defined model KPIs, latency SLAs, signal contracts, and retraining governance — not just feature requirements.
2. Experimentation Infrastructure
Built the measurement system before the AI system. Every product decision validated with real behavioral data.
3. System-Level AI Thinking
Reframed from "ship features" to "build a system that learns." The architecture decision created compounding advantage.
4. Tradeoff Mastery
Governed engagement vs. progression, accuracy vs. latency, and experimentation speed vs. validity simultaneously — through defined criteria, not intuition.
5. Cross-Functional AI Leadership
Aligned product, ML engineering, and data science under one model governance framework and shared product definition of success.
System Performance — All Metrics Visualized
From feature delivery to compounding learning advantage.
Learning Curve — Metrics Over Time
Engagement vs. Progression Tradeoff Resolved
Recommendation Quality by Cohort
Revenue Waterfall — Learning System Impact
AI Personalization — Product Impact Layer
Why this mattered beyond engagement metrics — the durable advantage.
User Impact
+65% learner progression — users reached their goals faster because the system adapted to how they actually learned, not how we assumed they would.
+25% engagement — content became relevant to each learner context, reducing abandonment from generic, one-size-fits-all experiences.
Onboarding friction reduced — adaptive flows met learners where they were, not at a fixed starting point.
Business Impact
+35% revenue growth driven by higher activation, retention, and subscription conversion from improved learner outcomes.
Experimentation infrastructure built — every future product decision can now be validated with data instead of assumption.
Platform shifted from content delivery to learning intelligence — a defensible capability competitors cannot quickly replicate.
Product Insight
Building the learning system before shipping AI features was the most important product decision. Features are copied in months. A system that learns from 500K users takes years to replicate.
A learning system compounds value over time. A feature list does not.
+65% progression · +25% engagement · +35% revenue
How I Drove Results — Execution Details
What this proves — translated to director-level signal.
Execution Moves
Defined product acceptance criteria for every personalization surface
Including what "working correctly" meant for the ML model, the recommendation layer, and the user experience simultaneously.
Introduced A/B testing across onboarding, content, and triggers
Experimentation before expansion — no feature launched without a defined experiment to validate it against real behavioral data.
Defined KPIs for engagement, progression, retention, and feature performance from day one
Two KPI tracks, not one — preventing engagement metrics from masking progression decline.
Delivered incrementally — reduced launch risk and accelerated learning cycles
Every surface released to 10% of users first, monitored, then expanded. Rollout was a product decision.
Tradeoffs Navigated
Model complexity vs. latency: hard <200ms SLA, depth never at the cost of usability
Accuracy vs. cold start: progressive trust model activated personalization as behavioral signal accumulated
Experimentation speed vs. significance: minimum sample sizes and confidence thresholds before any test launched
Engagement vs. progression: two KPI tracks prevented short-term optimization at the cost of long-term outcomes
What This Proves
Comfort at the intersection of product, data science, and engineering
Ability to translate AI/ML capability into measurable, real-world user value.
Hands-on execution across the full discovery-to-delivery-to-optimization lifecycle
Personalization designed to feel intuitive — never intrusive.
Experimentation capability first — no feature expansion until test infrastructure was in place
Balanced model complexity vs. latency — usability could not be sacrificed for sophistication.
AI Personalization Case Study — 500K+ Users
I build systems that learn —
not feature sets that ship.
+65%
Learner Progression
System adapted to how users actually learn.
+25% / +35%
Engagement & Revenue
Outcomes from the system — not the features.
500K+
Active Learners
Learning architecture, not feature list.
"Features are copied in months. A system that learns from 500K users takes years to replicate."
ANDRES GARCIA
SENIOR PRODUCT MANAGER
Full Portfolio · Thinkorswim Deep-Dive · Payments Deep-Dive · TDV Deep-Dive