Study Design

Paired comparisons: For each validation, we run the identical instrument and coding on live (panel) and synthetic datasets, then compare the results side by side.

Unit of analysis: We report metrics by question type and (where powered) by subgroup (e.g., age, gender, region).

Model freeze & provenance: Each study records model version, config, data timestamp, and random seed so results are reproducible.
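
To make that concrete, here is a minimal sketch of what such a provenance record could look like; the field names and values are illustrative assumptions, not our production schema.

```python
# Hypothetical provenance record; field names and values are illustrative.
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class RunProvenance:
    model_version: str    # frozen model identifier
    config_hash: str      # hash of the frozen generation config
    data_timestamp: str   # ISO-8601 snapshot of the panel data window
    random_seed: int      # seed fixed before generation

record = RunProvenance("model-v2.3", "9f2c1a", "2025-01-15T00:00:00Z", 42)
print(json.dumps(asdict(record), indent=2))  # attach to the study report
```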

Performance snapshot: Current models meet the published benchmarks on ~80–90% of questions across studies; strongest on single-choice and ranking; comparatively lower alignment on open text and highly sensitive topics.

Encoding Rules & Leakage Controls

Single-choice & Likert: Harmonize labels; allow pre-declared Likert collapsing (e.g., 5-pt → 3-pt) for stability checks.
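
A minimal sketch of the collapsing step, assuming a conventional disagree/neutral/agree mapping; the exact mapping is pre-declared per study.

```python
import numpy as np

# Pre-declared 5-pt -> 3-pt mapping (assumption: disagree/neutral/agree).
COLLAPSE_5_TO_3 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

def collapse_likert(responses):
    """Apply the pre-declared collapsing for a stability check."""
    return np.array([COLLAPSE_5_TO_3[r] for r in responses])

print(collapse_likert([1, 2, 3, 4, 5, 5]))  # -> [1 1 2 3 3 3]
```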

Multi-choice: Expand to per-option binaries for JS on option incidence; compute aggregate Spearman/Top-K on option frequencies.
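
The sketch below shows one way to implement this encoding with scipy; the option list and responses are illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import spearmanr

OPTIONS = ["A", "B", "C", "D"]  # illustrative option list

def to_binaries(answers):
    """List of selected-option sets -> (n_respondents, n_options) 0/1 matrix."""
    return np.array([[int(o in a) for o in OPTIONS] for a in answers])

live = to_binaries([{"A", "B"}, {"A"}, {"C"}])
synth = to_binaries([{"A"}, {"A", "B"}, {"D"}])

# JS divergence on each option's selected/not-selected incidence.
for j, name in enumerate(OPTIONS):
    p = np.array([live[:, j].mean(), 1 - live[:, j].mean()])
    q = np.array([synth[:, j].mean(), 1 - synth[:, j].mean()])
    print(name, jensenshannon(p, q, base=2) ** 2)  # distance squared = divergence

# Aggregate Spearman on option frequencies across the full option set.
print(spearmanr(live.mean(axis=0), synth.mean(axis=0)).correlation)
```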

Numeric (binned): Define validation bins from external rules or development data only—never from the live validation distribution (no peeking).
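
A minimal sketch, assuming illustrative pre-declared edges (e.g., age bands fixed from external rules):

```python
import numpy as np

# Pre-declared edges (illustrative age bands); frozen before validation runs.
PREDECLARED_EDGES = [0, 18, 35, 50, 65, 120]

def binned_distribution(values):
    """Histogram on frozen edges, normalized to a probability vector."""
    counts, _ = np.histogram(values, bins=PREDECLARED_EDGES)
    return counts / counts.sum()

print(binned_distribution([22, 41, 67, 19, 55]))
```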

Percent allocation: Normalize to sum=100; evaluate full-composition divergence; apply Top-K to dominant allocations.
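
A minimal sketch of the normalization and scoring steps; the allocations and K = 2 are illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def normalize_alloc(alloc):
    """Rescale a percent-allocation response so the shares sum to 100."""
    alloc = np.asarray(alloc, dtype=float)
    return 100 * alloc / alloc.sum()

live = normalize_alloc([48, 32, 12, 8])
synth = normalize_alloc([45, 35, 11, 9])

# Full-composition divergence on the normalized shares.
js_div = jensenshannon(live / 100, synth / 100, base=2) ** 2

# Top-K overlap on the dominant allocations (K = 2 here).
k = 2
overlap = len(set(np.argsort(live)[-k:]) & set(np.argsort(synth)[-k:])) / k
print(f"JS={js_div:.4f}, Top-{k} overlap={overlap:.2f}")
```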

Ranking: Convert to rank vectors (ties→average rank); compute Spearman and Top-K overlap for leading items.
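
A minimal sketch using scipy's rankdata, whose default method assigns tied items their average rank:

```python
from scipy.stats import rankdata, spearmanr

# One item's scores per respondent panel; lower score = ranked higher.
live_ranks = rankdata([3, 1, 2, 2])   # ties -> average rank: [4.0, 1.0, 2.5, 2.5]
synth_ranks = rankdata([4, 1, 2, 3])

rho = spearmanr(live_ranks, synth_ranks).correlation
top2_live = set(live_ranks.argsort()[:2])    # leading items
top2_synth = set(synth_ranks.argsort()[:2])
print(rho, len(top2_live & top2_synth) / 2)  # Spearman and Top-K overlap
```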

Text: Minimal cleaning (case/punctuation) before semantic scoring; report BERTScore F1 and Optimal Matching Score (OMS).
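
The BERTScore half of this check can be reproduced with the open-source bert-score package, as sketched below; OMS is computed separately and is not shown. The strings and cleaning rules are illustrative.

```python
from bert_score import score  # pip install bert-score

live = ["Price is the main reason I switched brands."]
synth = ["I switched mostly because of the price."]

# Minimal cleaning (case/punctuation) before semantic scoring.
clean = lambda s: s.lower().strip(".!? ")
P, R, F1 = score([clean(s) for s in synth], [clean(s) for s in live], lang="en")
print(float(F1.mean()))  # compared against the F1 > 0.75 standard
```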

Weighting parity: If live data are weighted, apply equivalent weights to synthetic comparisons.
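
A minimal sketch of weighted tabulation, assuming per-respondent weights are supplied with the live data; the codes and weights are illustrative.

```python
import numpy as np

codes = np.array([0, 1, 1, 2, 0])              # illustrative response codes
weights = np.array([1.2, 0.8, 1.0, 0.9, 1.1])  # illustrative panel weights

# Weighted option shares; apply the same weights on both sides of the comparison.
shares = np.bincount(codes, weights=weights, minlength=3) / weights.sum()
print(shares.round(3))
```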

Separation of roles: Thresholds are tuned in development; validation runs use frozen settings.

Metrics & Pass/Fail Standards (by Question Type)

These are the same thresholds we publish on the Validation Studies page to ensure consistency.

| Question Type | Metrics | Standards | Notes |
| --- | --- | --- | --- |
| Single-choice | KL-Divergence, JS-Divergence | KL < 0.10; JS < 0.05 | Likert collapsing permitted; include CI/error checks |
| Multi-choice | JS-Divergence, Spearman, Top-K | JS < 0.05; Spearman > 0.75; Top-K > 0.8 | JS per-option binaries; tighten thresholds with more options |
| Numeric (binned) | KL-Divergence, JS-Divergence | KL < 0.10; JS < 0.05 | Bins pre-declared; no peeking at live validation distribution |
| Percent-allocation | KL-Divergence, JS-Divergence, Top-K | KL < 0.10; JS < 0.05; Top-K > 0.8 | Emphasize dominant shares; normalize to sum=100 |
| Ranking | Spearman, Top-K | Spearman > 0.75; Top-K > 0.8 | Scale standards with list length; focus on order preservation |
| Text responses | BERTScore F1, Optimal Matching Score (OMS) | F1 > 0.75; OMS > 0.75 | Semantic similarity plus response pattern checks |
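
For reference, the divergence checks behind the single-choice row can be sketched as follows; the distributions are illustrative, and the natural-log base for KL is an assumption that should match the published convention.

```python
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

live = np.array([0.42, 0.31, 0.17, 0.10])   # live option shares
synth = np.array([0.40, 0.33, 0.16, 0.11])  # synthetic option shares

kl = entropy(live, synth)                     # KL(live || synth), natural log
js = jensenshannon(live, synth, base=2) ** 2  # distance squared = divergence
print(f"KL={kl:.4f} (pass: {kl < 0.10}), JS={js:.4f} (pass: {js < 0.05})")
```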

Uncertainty: Publish 95% CIs (or bootstrap intervals) where applicable; recommend n ≥ 300 per comparison for stable divergence estimates.
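
A minimal percentile-bootstrap sketch, assuming raw response codes are available for both samples:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)  # seed recorded, per the provenance rules

def js_divergence(a, b, n_options):
    p = np.bincount(a, minlength=n_options) / len(a)
    q = np.bincount(b, minlength=n_options) / len(b)
    return jensenshannon(p, q, base=2) ** 2

live = rng.integers(0, 4, size=300)   # n >= 300 per comparison
synth = rng.integers(0, 4, size=300)

# Percentile bootstrap: resample both sides with replacement.
boot = [js_divergence(rng.choice(live, live.size), rng.choice(synth, synth.size), 4)
        for _ in range(1000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"JS={js_divergence(live, synth, 4):.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```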

Legacy diagnostics: Chi-square and KS can be shown as secondary diagnostics when useful, but pass/fail decisions are made using the metric suite above (to match the Validation page).
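
A sketch of how these secondary diagnostics might be run alongside the primary suite; the counts and samples are illustrative.

```python
import numpy as np
from scipy.stats import chisquare, ks_2samp

live_counts = np.array([126, 93, 51, 30])   # observed live counts
synth_counts = np.array([120, 99, 48, 33])  # observed synthetic counts

# Chi-square: expected counts from synthetic shares, scaled to the live total.
expected = synth_counts / synth_counts.sum() * live_counts.sum()
print("chi-square p =", chisquare(live_counts, f_exp=expected).pvalue)

# KS applies to numeric items, compared on the raw (unbinned) values.
rng = np.random.default_rng(7)
print("KS p =", ks_2samp(rng.normal(0, 1, 300), rng.normal(0, 1, 300)).pvalue)
```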

Reporting Conventions

  • Side-by-side distributions: Live vs Synthetic (%) for each item
  • Metric block: KL/JS/Spearman/Top-K/BERTScore F1/OMS with pass/fail vs standards
  • Confidence intervals: 95% CIs on key metrics where applicable; show sample sizes per cut
  • Subgroups: Repeat comparisons for powered demographic segments
  • Provenance: Include model version, data window, seed, weighting, and encoding notes

Pointer to examples: Worked examples are available on our Validation Studies page.

Domain-Specific Models

Healthcare & HCP Research

Specialty and prescribing patterns; validated against medical panels.

Built on physician-level databases with prescription behavior patterns and clinical decision-making frameworks.

Consumer & Market Research

Behavioral/demographic correlations; validated against tier-one panels.

Trained on consumer behavior patterns and purchase intent correlations from validated market research studies.

Social & Political Research

Attitudes and vote intent with demographic/geographic structure; validated against major polls/census controls.

Incorporates voting behavior and opinion formation patterns with careful attention to demographic correlations.

Transparency & Limitations

Known limits: Lower alignment on highly emotional or sensitive topics; cultural coverage gaps; temporal drift that requires periodic recalibration.

Disclosure: Publish failures alongside successes; document appropriate use cases.

Intellectual Property: Technology protected by U.S. Patent Application No. 18/784,418; additional provisionals pending.

We are committed to transparent reporting of synthetic data limitations and boundary conditions to help researchers make informed decisions about appropriate applications.

Research Documentation

Explore the academic and industry research that supports our approach:

Validation Studies

Statistical validation studies, case studies, and benchmarking reports that demonstrate the accuracy and reliability of synthetic data.

View Validation Results →

Mode Effects Research

Historical analysis of how survey methods have evolved and validation studies comparing synthetic data to traditional panel approaches.

Read Mode Effects Studies →

AI vs Human Prediction

Cognitive psychology research and comparative studies examining how well AI models predict human survey responses.

Review Prediction Research →