William Gieng

Staff Data Scientist, Qualtrics

Portfolio

Decision science for ambiguous business questions.

Staff-level data scientist focused on causal inference, experimentation, and the translation of statistical evidence into senior leadership decisions.

Selected work

Case studies translating ambiguous business questions into defensible analytical decisions.

01 — Retention

Acquisition quality vs. durable value: a multi-method retention diagnosis

When a competitor disruption drove a 280K-subscriber spike, was it real growth or borrowed users? Three independent lenses on the same question.

Survival analysis · Segmentation · LLM VOC

Decision impact

Reframed the leadership conversation from acquisition volume to activation quality and retained LTV.

→ Read case study

02 — Prediction

Subscriber churn prediction: survival curves and early engagement signals

Can first-month behavior separate "binge and leave" users from habit builders? Acquisition channel, billing, and engagement decomposition.

Survival curves · Cohort analysis · Classification

Decision impact

Surfaced channel-level churn risk and "binge and leave" patterns that inform onboarding and acquisition spend.

→ Read case study

Published writing

Long-form essays on causal identification, measurement, and the discipline behind decision-grade analysis.

Towards Data Science Article

When customers churn at renewal: was it the price or the project?

Separating overlapping churn drivers when promo expiry and use-case completion arrive together.

Read article ↗

Towards Data Science Article

LLM summarizers skip the identification step

Connecting LLM summarization failures to causal identification discipline.

Read article ↗

Towards Data Science Article

LLM themes are not observations

Why LLM-extracted variables carry a selection, timing, and measurement footprint the downstream model never sees.

Read article ↗

Applied methods

Public notebooks demonstrating the methods behind the case studies and writing.

quasi-experimental-pricing

DiD, synthetic control, RDD, and ITS on a synthetic subscription pricing dataset, with LTV translation and breakeven price calculation.

Python · Causal inference · DiD

View on GitHub ↗
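
For readers unfamiliar with DiD, the core two-by-two comparison can be sketched in a few lines. This is a minimal illustration only, not the code in the repository: the function name `did_estimate`, the row layout, and the retention numbers are all hypothetical.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences on group means.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Hypothetical monthly retention rates around a price change.
rows = [
    (True, False, 0.80), (True, False, 0.82),   # treated, before
    (True, True, 0.74), (True, True, 0.76),     # treated, after
    (False, False, 0.78), (False, False, 0.80), # control, before
    (False, True, 0.77), (False, True, 0.79),   # control, after
]

effect = did_estimate(rows)
print(f"DiD estimate: {effect:+.3f}")
```

In this made-up data the treated group falls by 0.06 and the control group by 0.01, so the estimate attributes a 0.05 drop in retention to the price change. That attribution holds only if the two groups would have moved in parallel without the change.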

transcript-analysis-pipeline

Three-stage LLM pipeline (extract, synthesize, audit) for decision-grade meeting analysis with explicit grounding and bounded fabrication controls.

Python · LLM · Evaluation

View on GitHub ↗

llm-churn-reason-mining

Pipeline that extracts and categorizes churn reasons from unstructured text using weak supervision, transformer fine-tuning, and prompt-engineered LLM summaries.

Python · Transformers · LLM

View on GitHub ↗

Decision science notes

A LinkedIn series on the framing decisions that determine whether an analysis can support a causal claim.

View all ↗

Series · 07

Difference-in-differences is only as credible as its comparison group

Parallel trends is not a technical footnote. It is the assumption that makes the comparison believable. When it breaks, the estimate can still look precise while pointing strategy in the wrong direction.

Read on LinkedIn ↗
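
A small numeric sketch of the parallel-trends point, under stated assumptions: the series below are invented, the treatment has no true effect by construction, and the helper `slope` is a hypothetical name used only for illustration. The treated group simply starts on a steeper trend than the comparison group.

```python
from statistics import mean


def slope(ys):
    """Ordinary least-squares slope of ys against periods 0, 1, 2, ..."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


# Invented outcome series; the treated group keeps its pre-period trend,
# so the true treatment effect is zero by construction.
treated_pre, treated_post = [10.0, 12.0, 14.0], [16.0, 18.0, 20.0]
control_pre, control_post = [10.0, 11.0, 12.0], [13.0, 14.0, 15.0]

# A nonzero gap in pre-period slopes signals that parallel trends is suspect.
pre_trend_gap = slope(treated_pre) - slope(control_pre)

# The naive two-by-two DiD still returns a clean-looking number.
naive_did = (mean(treated_post) - mean(treated_pre)) - (
    mean(control_post) - mean(control_pre)
)

print(f"pre-period slope gap: {pre_trend_gap:.1f}")
print(f"naive DiD estimate:   {naive_did:.1f}")
```

Here the naive estimate is 3.0 even though nothing happened at the treatment date. The whole figure comes from the pre-existing slope gap of 1.0 per period, which is why checking pre-period trends belongs before the estimate rather than after it.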

Series · 06

The hard part is choosing an identification strategy the data can support

Not every business problem needs an experiment, but every impact claim needs the right identification strategy. Randomized experiments, DiD, ITS, and synthetic control each carry assumptions the data has to earn.

Read on LinkedIn ↗

Series · 05

Strong causal work does not start with a favorite method

The hardest part is rarely running the model. It is knowing what kind of answer the data can actually support. That is a judgment problem, not a tooling problem.

Read on LinkedIn ↗