How do we know the genome sequence?

Imagine someone asked you to explain how a car works. Even if you knew nothing about cars, you could take the car apart piece by piece, inspect each piece in your hand and probably draw a pretty good diagram of how a car is put together.  You wouldn’t understand how it works, but you’d have a good start in trying to figure it out.

Now what if someone asked you to figure out how the genome works? You know it’s made of DNA, but it’s the ORDER of the nucleotides that helps to understand how the genome works (remember genes and proteins?). All the time in the news, you hear about a scientist or a doctor who looked at the sequence of the human genome and from that information could identify possible causes of a disease or a way to target its treatment. DNA sequencing forms a cornerstone of personalized medicine, but how does this sequencing actually work? How do you take apart the genome like a car so you can start to understand how it works?

As a quick reminder – DNA is made out of four different nucleotides, A, T, G, and C, that are lined up in a specific order to make up the 3 billion nucleotides in the human genome.  DNA looks like a ladder where the rungs are made up of bases that stick to one another: A always sticking to T and G always sticking to C.  Since A always sticks to T and G always sticks to C, if you know the sequence that makes up one side of the ladder, you also know the sequence of the other side.
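To make the pairing rule concrete, here is a minimal Python sketch. The example sequence is made up, and the sketch only applies the A-T and G-C rule position by position (it ignores the direction in which each strand is normally read).

```python
# Base-pairing rule from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base on the opposite side of the ladder for each position."""
    return "".join(PAIRS[base] for base in strand.upper())

one_side = "ATGCCGTA"  # made-up example sequence
other_side = complement(one_side)
print(one_side)    # ATGCCGTA
print(other_side)  # TACGGCAT
```

Applying the rule twice gets you back to the original side, which is why knowing one side of the ladder is enough.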

DNA_ladder

The first commonly used sequencing method is called Sanger sequencing, named after Frederick Sanger, who invented the method in 1977. Sanger sequencing takes advantage of this DNA ladder: the method splits the ladder in half and, using glowing (fluorescent) nucleotides of different colors, rebuilds the other side one nucleotide at a time. A detector that can distinguish the different fluorescent colors creates an image of these colors that a program then “reads” to give the researcher the sequence of the nucleotides (see image below to see what this looks like).  These sequences are just long strings of As, Ts, Gs, and Cs that the researcher can analyze for their experiments.

sanger_sequencing

This was a revolutionary technique, and when the Human Genome Project started in 1990, Sanger sequencing was the only technique available to scientists. However, this method can only sequence about 700 nucleotides at one time, and even the most advanced machine in 2015 only runs 96 sequencing reactions at one time.  In 1990, using Sanger sequencing, scientists planned on running lots and lots of sequencing reactions at one time, and they expected this effort would take 15 years and cost $3 billion. The first draft of the human genome was announced in 2000 (and published in early 2001) through a public effort and a parallel private effort by Celera Genomics, which cost only $300 million and took only 3 years once Celera jumped into the ring in 1998 (why was it cheaper and faster, you ask? They developed a fast “shotgun” method and analysis techniques that sped up the process).
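The general idea behind a shotgun approach is to break the DNA into many overlapping fragments, sequence each fragment, and let a computer stitch the pieces back together by finding where they overlap. The toy Python sketch below illustrates that stitching step only. It is not Celera's actual software: the fragments are made up, and the greedy "merge the biggest overlap first" strategy is a simplification chosen for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest end of `a` that matches the start of `b`."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def assemble(fragments: list[str]) -> str:
    """Greedily merge the two pieces with the largest overlap until one remains."""
    pieces = list(fragments)
    while len(pieces) > 1:
        best_size, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j and overlap(a, b) > best_size:
                    best_size, best_i, best_j = overlap(a, b), i, j
        # Join the best pair, keeping the shared stretch only once.
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces[0]

# Made-up fragments, deliberately listed out of order.
fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(assemble(fragments))  # ATGCGTACGTTAGC
```

Real genomes contain long repeated stretches that make this stitching far harder than the toy example suggests, which is where the analysis techniques mentioned above come in.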

As you may imagine, for personalized medicine, where sequencing a huge part of the genome may be necessary for every man, woman, and child, 3-15 years and $300 million to $3 billion per genome is not feasible. Fortunately, genome sequencing technology advanced in the 2000s to what’s called Next Generation Sequencing. There are a lot of different versions of Next Generation Sequencing (often abbreviated as NGS), but basically all of them run thousands and thousands of sequencing reactions all at the same time. Instead of reading 700 nucleotides at one time as in Sanger sequencing, NGS methods can read up to 3 billion bases in one experiment.
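To get a feel for the gap, here is a rough back-of-the-envelope calculation in Python using the numbers from the text (3 billion nucleotides, about 700 nucleotides per Sanger read, 96 reactions per machine run). It assumes every nucleotide is read exactly once with no overlap between reads, so it is a lower bound; real projects read each position several times.

```python
import math

GENOME_SIZE = 3_000_000_000  # nucleotides in the human genome
READ_LENGTH = 700            # nucleotides per Sanger reaction
REACTIONS_PER_RUN = 96       # reactions one machine runs at a time

# Minimum number of reads to touch every nucleotide once (no overlap assumed).
reads_needed = math.ceil(GENOME_SIZE / READ_LENGTH)

# Minimum number of machine runs to perform that many reactions.
runs_needed = math.ceil(reads_needed / REACTIONS_PER_RUN)

print(f"{reads_needed:,} reads")      # 4,285,715 reads
print(f"{runs_needed:,} machine runs")  # 44,643 machine runs
```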

How does this work? Short DNA sequences are stuck to a slide and replicated over and over. This makes dots of the exact same sequence, and thousands and thousands of these dots are created on one slide. Then, as in Sanger sequencing, glowing nucleotides build the other side of the DNA ladder one nucleotide at a time. In this case, though, the surface looks like a confetti of dots that have to be read by a sophisticated computer program to determine the millions of sequences.
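The "reading" step can be pictured with a small Python sketch. Everything in it is hypothetical: the color-to-base assignments, the dot names, and the three imaging cycles are invented for illustration, and real instruments work from actual images rather than tidy labels. The idea is simply that each dot flashes one color per cycle, and stringing those colors together dot by dot gives one sequence per dot.

```python
# Hypothetical color code; real instruments use their own dye assignments.
COLOR_TO_BASE = {"green": "A", "red": "T", "yellow": "G", "blue": "C"}

# One entry per imaging cycle: which color each dot flashed in that cycle.
cycles = [
    {"dot1": "green", "dot2": "blue"},
    {"dot1": "red", "dot2": "blue"},
    {"dot1": "yellow", "dot2": "green"},
]

def read_dots(cycles: list[dict[str, str]]) -> dict[str, str]:
    """Build one sequence per dot by appending one base per cycle."""
    reads: dict[str, str] = {}
    for image in cycles:
        for dot, color in image.items():
            reads[dot] = reads.get(dot, "") + COLOR_TO_BASE[color]
    return reads

print(read_dots(cycles))  # {'dot1': 'ATG', 'dot2': 'CCA'}
```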

NGS

So what has this new technology allowed scientists to do? It has decreased the cost of sequencing a genome to around $1000. It has also allowed researchers to sequence large numbers of genomes to better understand the genetic differences between people, to better understand other species’ genomes (including the bacteria that colonize us or the viruses that infect us), and to help determine the genetic changes in tumors to better detect and treat these diseases. Next Generation Sequencing allows doctors to actually use genome sequencing in the clinic. A version of genome sequencing called “exome sequencing” has been developed that only sequences the genes.  Since genes only make up about 1-2% of the genome, NGS of the exome takes less time and money but provides lots of information about what some argue is the most important part of the genome: the part that encodes proteins.  Much of the promise of personalized medicine can be found through this revolutionary DNA sequencing technique, and with the cost getting lower and lower, there may be a day soon when you too will have your genome sequence as part of your medical record.


For more information about the history of sequencing, check out the article “DNA Sequencing: From Bench to Bedside and Beyond” in the journal Nucleic Acids Research.

Here is an amusing short video about how Next Generation Sequencing works described by the most interesting pathologist in the world.