How It Works Under the Hood
The Science Behind Your Atlagene Report
We don't make claims we can't source. Every variant we report on is backed by peer-reviewed research, classified against established databases, and reviewed by geneticists before reaching your dashboard.
The Pipeline
Upload to Insight in 6 Steps
- 1
Parse
Auto-detect file format (23andMe, Ancestry, MyHeritage, VCF, WGS) and parse genotype calls. We verify positions against GRCh37/GRCh38 references.
- 2
Normalize
Variants are LIST-partitioned by chromosome and indexed by rsID. WGS files get reference-allele backfill so coverage scores are accurate (homozygous-ref positions count as analyzed, not missing).
- 3
Annotate
Each variant is cross-referenced against ClinVar (2.65M annotations), GWAS Catalog (312K associations), PharmGKB (2.8K drug-gene pairs), gnomAD (population frequencies), and REVEL (77.5M pathogenicity scores).
- 4
Classify
Risk scoring uses evidence-weighted polygenic models, one per category. Variants of uncertain significance (VUS) go through an XGBoost classifier (current AUC 0.80; retraining with CADD and SpliceAI features is expected to reach about 0.84). Pharmacogenomic calls follow CPIC guidelines.
- 5
Review
The variant registry is curated by geneticists, and every entry carries a lifecycle status (proposed → reviewed → approved). Variants auto-discovered from GWAS Catalog or ClinVar updates go to the review queue, not directly to users.
- 6
Deliver
Results appear on your dashboard with disclaimers. Variants flagged for a physician (high-penetrance pathogenic findings) trigger an optional review. Helix AI explains findings without diagnosing.
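The reference-allele backfill in step 2 can be illustrated with a minimal sketch. The panel, the rsIDs, and the function names are hypothetical, and the sketch assumes a WGS file covered every panel site, which a real pipeline would need to confirm from sequencing-depth or callable-region data.

```python
from typing import Dict

# Hypothetical panel: rsID -> reference allele (placeholder values only).
PANEL: Dict[str, str] = {"rs_demo1": "A", "rs_demo2": "G", "rs_demo3": "C", "rs_demo4": "T"}


def backfill_reference(calls: Dict[str, str], is_wgs: bool) -> Dict[str, str]:
    """Return calls with homozygous-reference genotypes filled in for WGS input.

    WGS variant files often list only sites that differ from the reference,
    so a panel site absent from the file is assumed here to be homozygous-ref.
    Array files are left alone: an absent site there really is missing.
    """
    filled = dict(calls)
    if is_wgs:
        for rsid, ref in PANEL.items():
            filled.setdefault(rsid, ref + ref)
    return filled


def coverage(calls: Dict[str, str]) -> float:
    """Fraction of panel sites that have a genotype call."""
    analyzed = sum(1 for rsid in PANEL if rsid in calls)
    return analyzed / len(PANEL)


wgs_calls = {"rs_demo2": "GA"}  # only the non-reference site appears in the file
print(coverage(wgs_calls))                                   # 0.25 before backfill
print(coverage(backfill_reference(wgs_calls, is_wgs=True)))  # 1.0 after backfill
```

Without the backfill, three of the four panel sites would be reported as missing even though the sample simply matches the reference there.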
Evidence Sources
Where Our Calls Come From
Every variant we report on cites at least one of these. No proprietary "secret sauce."
ClinVar
2.65M annotations
NCBI's public archive of variant-condition relationships, with clinical significance ratings (pathogenic, likely pathogenic, uncertain, likely benign, benign).
GWAS Catalog
312K associations
EBI's curated database of trait-associated SNPs from genome-wide association studies, weighted by effect size and study sample size.
PharmGKB
2.8K drug-gene pairs
Pharmacogenomics knowledge base. CPIC level A/B guidelines drive our 200+ drug interactions.
REVEL
77.5M scores
Ensemble missense pathogenicity predictor. Used for variant effect scoring; CADD and SpliceAI integration is queued for VUS classifier retraining.
gnomAD
Population frequencies
Allele frequency data across major populations — used to calibrate risk scores and reduce false-positive findings on common variants.
Continuous updates
Weekly + bi-weekly
ClinVar weekly, GWAS Catalog weekly, PharmGKB bi-weekly. Reclassifications trigger user-facing alerts when applicable.
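How these sources might combine for a single variant can be sketched as follows. The lookup tables, field names, rsIDs, and the 5% frequency cutoff are hypothetical stand-ins, not Atlagene's schema or thresholds; the point is only that a population frequency can temper a clinical-significance label.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the source databases, keyed by rsID.
CLINVAR: Dict[str, str] = {"rs_demo1": "pathogenic", "rs_demo2": "pathogenic"}
GNOMAD_AF: Dict[str, float] = {"rs_demo1": 0.0002, "rs_demo2": 0.12}
REVEL: Dict[str, float] = {"rs_demo1": 0.91}

COMMON_AF_CUTOFF = 0.05  # assumed cutoff for "common"; illustrative only


@dataclass
class Annotation:
    rsid: str
    clinical_significance: Optional[str]
    allele_frequency: Optional[float]
    revel_score: Optional[float]

    @property
    def needs_scrutiny(self) -> bool:
        """True when a pathogenic label sits on a common variant."""
        return (
            self.clinical_significance == "pathogenic"
            and self.allele_frequency is not None
            and self.allele_frequency > COMMON_AF_CUTOFF
        )


def annotate(rsid: str) -> Annotation:
    """Collect whatever each source knows about the variant (None if absent)."""
    return Annotation(
        rsid=rsid,
        clinical_significance=CLINVAR.get(rsid),
        allele_frequency=GNOMAD_AF.get(rsid),
        revel_score=REVEL.get(rsid),
    )


for rsid in ("rs_demo1", "rs_demo2"):
    ann = annotate(rsid)
    print(rsid, ann.clinical_significance, ann.needs_scrutiny)
```

Keeping absent values as `None` rather than defaulting them to zero avoids treating "no data" as "rare" or "benign".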
Variant Discovery
Living Registry, Not a Static List
Our variant registry is database-driven. When ClinVar releases a weekly update, our automated discovery engine scans for newly significant variants and scores the evidence on a 5-factor rubric (clinical significance, study count, population coverage, effect size, gene-disease association). It then drafts a phenotype description using Claude.
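A five-factor rubric of this kind can be sketched as a simple average. The 0.0 to 1.0 scale per factor, the equal weighting, and the 0.6 queueing threshold are assumptions made for illustration; the actual scales and weights are not described here.

```python
from dataclasses import dataclass, astuple


@dataclass
class EvidenceFactors:
    """Each factor is assumed to be pre-scaled to the range 0.0 to 1.0."""
    clinical_significance: float
    study_count: float
    population_coverage: float
    effect_size: float
    gene_disease_association: float


REVIEW_THRESHOLD = 0.6  # hypothetical cutoff for queuing a draft


def evidence_score(factors: EvidenceFactors) -> float:
    """Equal-weight mean of the five factors (an assumed weighting)."""
    values = astuple(factors)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("every factor must lie between 0.0 and 1.0")
    return sum(values) / len(values)


def should_queue_for_review(factors: EvidenceFactors) -> bool:
    """Decide whether a candidate is strong enough to draft for reviewers."""
    return evidence_score(factors) >= REVIEW_THRESHOLD


candidate = EvidenceFactors(1.0, 0.5, 0.5, 0.0, 1.0)
print(evidence_score(candidate), should_queue_for_review(candidate))
```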
Drafted variants land in variant_suggestions for geneticist review. Nothing reaches a user's dashboard until a credentialed reviewer approves it. Reclassifications trigger before/after audit logs and (where applicable) user-facing alerts.
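The review gate and audit trail can be sketched as a small state machine. The class, field names, and audit tuple layout are hypothetical; only the three statuses (proposed, reviewed, approved) and the rule that nothing is shown before approval come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Allowed forward transitions in the lifecycle named in the text.
NEXT_STATUS = {"proposed": "reviewed", "reviewed": "approved"}


@dataclass
class RegistryEntry:
    rsid: str
    status: str = "proposed"
    # Audit log of (before, after, reviewer) tuples; this layout is assumed.
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def advance(self, reviewer: str) -> None:
        """Move one step forward, recording the before/after status."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.status!r} has no further transition")
        before, after = self.status, NEXT_STATUS[self.status]
        self.status = after
        self.audit_log.append((before, after, reviewer))

    @property
    def visible_to_users(self) -> bool:
        """Only approved entries may appear on a dashboard."""
        return self.status == "approved"


entry = RegistryEntry("rs_demo1")
entry.advance("reviewer_a")
print(entry.status, entry.visible_to_users)  # reviewed False
entry.advance("reviewer_a")
print(entry.status, entry.visible_to_users)  # approved True
```

Because each transition moves exactly one step, an entry cannot jump from proposed to approved without passing through review.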
See the public registry at /variants.
What Atlagene Does NOT Do
Honesty about scope is a clinical safety issue.
- We do not diagnose. Period. Findings get disclaimers; physician review is the paid product.
- We do not prescribe medications. Pharmacogenomics output is informational; your prescriber decides.
- We do not treat your genome as deterministic. Polygenic risk is probabilistic; lifestyle and environment matter.
- We do not sell or share your genetic data. Ever. (See Privacy Policy.)
- We do not do ancestry research the way 23andMe does; we focus on health analysis. Ancestry composition is an included extra, not the headline product.
See It Yourself
Browse the public variant registry to see exactly what we measure and the evidence behind each call.