The field guide

Your genome is 3.2 billion letters long. Here is how a lab reads them.

Whole genome sequencing is less one technology than a chain of them — each handing off to the next, each with its own failure modes. We explain them in the order your sample moves through the lab, with diagrams you can poke.

First, a distinction

WGS is not 23andMe.

Whole genome sequencing (WGS) reads (nearly) every position in your DNA. A genotyping array, the method used by 23andMe, AncestryDNA and most ancestry services, reads ~600,000 pre-selected positions known to vary between people.

Arrays are cheap because they ignore 99.98% of your genome. WGS is 5–10× more expensive because it doesn't. The consequences are cumulative: WGS can detect rare or novel variants, structural changes, and anything the array designer didn't think to include.
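The 99.98% figure is simple arithmetic on the two numbers quoted in this guide. A minimal sketch, in which the constant and function names are illustrative rather than taken from any tool:

```python
# Figures quoted in this guide; the names below are illustrative.
ARRAY_SITES = 600_000          # positions read by a typical genotyping array
GENOME_BASES = 3_200_000_000   # approximate length of the human genome


def fraction_read(sites: int, genome: int) -> float:
    """Return the share of the genome covered by `sites` positions."""
    return sites / genome


array_share = fraction_read(ARRAY_SITES, GENOME_BASES)
print(f"Array reads   {array_share:.2%} of the genome")
print(f"Array ignores {1 - array_share:.2%} of the genome")
```

The share comes out at roughly 0.02%, which is where the "ignores 99.98%" claim comes from.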

Positions read (scaled)

Genotyping array · 600k sites · ≈ 0.02% of the genome

WGS · 3.2B bases · ≈ 98% of the genome (mappable regions)

Array cost · ~$80

WGS cost, 2026 · ~$250–600

The pipeline

From spit tube to VCF, stage by stage.

Each stage takes 1–10 days in a commercial lab, with sequencing itself the longest queue.

The two technologies

Short-read vs long-read.

Short-read sequencers (Illumina) read DNA in ~150-letter chunks, accurately (about 0.1% error) and cheaply. Long-read sequencers (PacBio, Oxford Nanopore) read 10,000+ letters at a time, with higher error rates and higher prices.

The trade-off matters in repetitive regions: stretches of DNA where short reads can't tell this copy from that copy. About 8% of the human genome falls into this category. If those regions contain variants that matter to your family history, only long-read sequencing resolves them.
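The "this copy or that copy" problem can be shown with a toy reference. The sketch below is not a real aligner: it uses exact string matching on a made-up sequence, with lengths scaled down, and every name in it is hypothetical.

```python
import random


def count_placements(read: str, reference: str) -> int:
    """Count every position where `read` matches `reference` exactly."""
    count = 0
    start = reference.find(read)
    while start != -1:
        count += 1
        start = reference.find(read, start + 1)
    return count


rng = random.Random(0)  # fixed seed so the toy sequences are reproducible


def random_dna(length: int) -> str:
    """Return a random string of A, C, G and T (toy data, not real DNA)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# Toy reference: unique flank, repeat, unique spacer, same repeat, unique flank.
repeat = random_dna(300)
left, middle, right = random_dna(200), random_dna(200), random_dna(200)
reference = left + repeat + middle + repeat + right

short_read = repeat[75:225]        # 150 bases, entirely inside the repeat
long_read = reference[150:550]     # spans the first repeat copy plus both flanks

print("short read placements:", count_placements(short_read, reference))
print("long read placements: ", count_placements(long_read, reference))
```

The short read fits equally well in both repeat copies, so its origin is ambiguous. The long read carries unique flanking sequence on either side and fits in only one place.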

Reading a 3,000-base region

Short-read · 150 bases

Long-read · 15,000 bases

Error rate · 0.1% short-read vs 1–5% long-read

Cost / genome · ~$200 short-read vs ~$1,000 long-read

Coverage, explained

What “30×” actually means.

“30×” means each position in your genome is read, on average, 30 times. It's an average because the machine doesn't distribute reads evenly — some regions get 50×, some get 10×, some get 0×.
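Average coverage is just total bases sequenced divided by genome length. The sketch below works that out for 150-base reads and then uses a Poisson distribution as an idealised model of how depth varies from position to position. The Poisson assumption is ours, not the guide's, and real libraries are less even than it predicts, so treat the second number as a lower bound on how often shallow spots occur.

```python
import math

GENOME_BASES = 3_200_000_000   # approximate genome length used in this guide
READ_LENGTH = 150              # typical short-read length


def mean_coverage(n_reads: int, read_length: int = READ_LENGTH,
                  genome: int = GENOME_BASES) -> float:
    """Average depth: total bases sequenced divided by genome length."""
    return n_reads * read_length / genome


def reads_needed(target: float, read_length: int = READ_LENGTH,
                 genome: int = GENOME_BASES) -> int:
    """Number of reads required to reach `target` average depth."""
    return math.ceil(target * genome / read_length)


def poisson_prob_at_most(k: int, mean: float) -> float:
    """P(depth <= k) if depth followed a Poisson distribution with `mean`."""
    term = math.exp(-mean)      # P(depth == 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i        # P(depth == i) from P(depth == i - 1)
        total += term
    return total


n = reads_needed(30)
print(f"reads needed for 30x: {n:,}")
print(f"check: {mean_coverage(n):.1f}x")
print(f"idealised share of positions at 10x or less: "
      f"{poisson_prob_at_most(10, 30):.2e}")
```

At 150 bases per read, 30× works out to 640 million reads.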

The deeper the coverage, the more confident the variant caller can be that a difference is real and not a sequencing error. 30× is the consumer standard. 100× is used for clinical variant confirmation. 500× is the floor for some cancer assays.
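Why depth buys confidence can be sketched with a binomial model. Assume a heterozygous variant shows up in about half the reads at its position, and assume a caller that wants at least three supporting reads before it reports anything. Both the threshold and the model are illustrative assumptions: real variant callers are considerably more sophisticated, and this toy does not reproduce the confidence percentages quoted in this guide.

```python
from math import comb

MIN_ALT_READS = 3     # hypothetical calling threshold
HET_FRACTION = 0.5    # share of reads expected to carry a heterozygous variant
ERROR_RATE = 0.001    # the 0.1% short-read error rate quoted in this guide


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    below = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min))
    return 1.0 - below


for depth in (5, 15, 30, 60):
    # Chance that a real heterozygous variant gathers enough supporting reads.
    detect = prob_at_least(MIN_ALT_READS, depth, HET_FRACTION)
    # Upper bound on the chance that errors alone reach the same threshold
    # (it counts any error, not only errors that agree on the same wrong base).
    spurious = prob_at_least(MIN_ALT_READS, depth, ERROR_RATE)
    print(f"{depth:>3}x  detect={detect:.6f}  spurious<={spurious:.2e}")
```

In this model a real variant is caught only half the time at 5× but almost always at 30×, while three coincidental errors at one position stay very unlikely.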

Confidence at depth

1× · 30% · Ancestry-only
5× · 55% · Trait survey
15× · 82% · Most SNPs confident
30× · 96% · Consumer standard
60× · 99% · Clinical confirmation
100× · 99.7% · Variant discovery

Glossary

Words the industry uses without defining.

ACMG

American College of Medical Genetics and Genomics. ACMG SF (Secondary Findings) v3.3 is the standard list of medically actionable genes labs are expected to report back on when they turn up.

BAM

Binary Alignment Map. Sequencing reads aligned to a reference genome; the typical intermediate file.

CLIA

US regulatory certification for clinical laboratories. A marker of clinical-grade testing.

coverage

How many times, on average, each base is read. 30× is the consumer standard; 100× is used for hard-to-call variants and some cancer assays; 1× (low-pass) is suitable for genealogy but not for clinical variant calls.

CRAM

Newer compressed alignment format — stores only differences from the reference. ~4–10× smaller than BAM for the same data. MyHeritage's LP-WGS ships CRAM files.

FASTQ

Raw sequencing read file. The unprocessed output of the sequencer.

ISO 15189

International medical-laboratory quality standard. Some labs (Dante Labs among them) cite ISO 15189 accreditation as a clinical-quality signal.

long-read

Sequencing method (e.g. PacBio, Oxford Nanopore) producing reads of 10,000+ bases. Better for structural variants; more expensive; higher per-read error rates.

LP-WGS

Low-pass whole genome sequencing — WGS at ~1× average depth. Covers the whole genome but with much lower per-position confidence; used today for cost-efficient genealogy applications, not for clinical variant calling.

pharmacogenomics

How your genetics influence your response to specific drugs — one of WGS's most actionable outputs.

PRS

Polygenic Risk Score — a statistical estimate of disease risk summed across many small-effect variants. Useful as a probability estimate; not a medical diagnosis.

reference genome

A consensus human genome (e.g. GRCh38, or the newer T2T-CHM13) that your reads are compared against.

short-read

Sequencing method (e.g. Illumina) that reads DNA in fragments of ~150 bases. Cheap, accurate, but weak in repetitive regions.

variant

A position where your genome differs from the reference. Most are harmless; a few are medically meaningful.

VCF

Variant Call Format — the compact list of positions where your genome differs from the reference.

WGS

Whole Genome Sequencing — reading (nearly) all 3 billion base pairs of your DNA, as opposed to genotyping arrays which sample ~600,000 known positions.
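To make two of the file formats above concrete, here is a minimal sketch that reads one FASTQ record and one VCF data line. The sample record, the sample variant and the function names are invented for illustration. The layout follows the usual conventions: four lines per FASTQ record with Phred+33 quality characters, and eight tab-separated fixed columns per VCF line. Real files also carry headers, multiple records and optional columns, so a production pipeline would use a dedicated parser.

```python
# Made-up example data, not from a real sample.
FASTQ_RECORD = "@read_001\nGATTACA\n+\nIIIIII5\n"
VCF_LINE = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30"


def parse_fastq_record(text: str) -> dict:
    """Split one 4-line FASTQ record into name, bases and Phred scores."""
    name, bases, _plus, quality = text.strip().split("\n")
    scores = [ord(ch) - 33 for ch in quality]      # Phred+33 encoding
    return {"name": name[1:], "bases": bases, "phred": scores}


def phred_to_error(q: int) -> float:
    """Convert a Phred score to the probability that the base is wrong."""
    return 10 ** (-q / 10)


def parse_vcf_line(line: str) -> dict:
    """Read the eight fixed VCF columns from one data line."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.split("\t")[:8]
    return {"chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
            "alt": alt, "qual": float(qual), "filter": flt, "info": info}


read = parse_fastq_record(FASTQ_RECORD)
variant = parse_vcf_line(VCF_LINE)

print(read["bases"], read["phred"])
print(f"error chance of last base: {phred_to_error(read['phred'][-1]):.2f}")
print(f"{variant['chrom']}:{variant['pos']} {variant['ref']}>{variant['alt']}")
```

The FASTQ record is what the sequencer emits for a single read. The VCF line is what remains once millions of such reads have been aligned and compared against the reference.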