Your genome is 3.2 billion letters long. Here is how a lab reads them.
Whole genome sequencing is less one technology than a chain of them — each handing off to the next, each with its own failure modes. We explain them in the order your sample moves through the lab, with diagrams you can poke.
WGS is not 23andMe.
A whole genome sequenceWGSWhole Genome Sequencing — reading (nearly) all 3 billion base pairs of your DNA, as opposed to genotyping arrays which sample ~600,000 known positions. reads (nearly) every position in your DNA. A genotyping array — the method used by 23andMe, AncestryDNA and most ancestry services — reads ~600,000 pre-selected positions known to vary between people.
Arrays are cheap because they ignore 99.98% of your genome. WGS is 5–10× more expensive because it doesn't. The consequences are cumulative: WGS can detect rare or novel variantsvariantA position where your genome differs from the reference. Most are harmless; a few are medically meaningful., structural changes, and anything the array designer didn't think to include.
From spit tube to VCF, stage by stage.
Click a stage to open it. Each one takes 1–10 days in a commercial lab, with sequencing itself the longest queue.
Short-read vs long-read.
Short-read sequencers (Illumina) read DNA in ~150-letter chunks, perfectly and cheaply. Long-read sequencers (PacBio, Oxford Nanopore) read 10,000+ letters at a time, with higher error rates and higher prices.
The trade-off matters in repetitive regions — stretches of DNA where short reads can't tell this copy from that copy. About 8% of the human genome falls into this category. If it contains variants that matter to your family history, only long-read resolves them.
What “30×” actually means.
“30×” means each position in your genome is read, on average, 30 times. It's an average because the machine doesn't distribute reads evenly — some regions get 50×, some get 10×, some get 0×.
The deeper the coverage, the more confident the variant callervariantA position where your genome differs from the reference. Most are harmless; a few are medically meaningful. can be that a difference is real and not a sequencing error. 30× is the consumer standard. 100× is used for clinical variant confirmation. 500× is the floor for some cancer assays.