Sequencing: how is DNA read?
April 2009
By Olena Morozova
Tags: basics, techniques
The nucleus of a typical human cell holds 46 DNA molecules, and each of these molecules is essentially a long string of letters. The information encoded in these letters is used as a set of instructions that determine how a cell and the whole organism function. Mistakes in these letters, or “mutations”, underlie many genetic diseases, such as cancer.
DNA is a long chain of letters
A DNA molecule is a double helix composed of two strands that are held together by complementary base pairing. Each DNA strand inside the double helix is a chain of repeating subunits, called deoxyribonucleotides (dNTPs). An individual dNTP is composed of one of four bases, adenine (A), cytosine (C), guanine (G) or thymine (T), attached to a sugar and a phosphate. Complementary base pairing inside the double helix means that A only pairs with T, while C only pairs with G. Therefore, if the letters on one strand are known, the letters on the other strand can be easily inferred.
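The pairing rule can be written out as a tiny lookup table. The sketch below is only an illustration: the names `PAIRS` and `complement` are made up for this example, and the example sequence is arbitrary.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the letter that pairs with each position of `strand`."""
    return "".join(PAIRS[base] for base in strand)


one_strand = "ATGCCGTA"          # arbitrary example letters
print(complement(one_strand))    # TACGGCAT
```

Applying `complement` twice gives back the original letters, which is another way of saying that either strand fully determines the other.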
The double-stranded helix is tightly packaged inside the small space of the cell nucleus. Did you know that if you took all the DNA from a single human cell and stretched it out end to end, it would be about 2 meters long? And yet you cannot even see it with the naked eye when it is packaged in the cell!
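The 2-meter figure can be checked with rough arithmetic on the total DNA in one cell nucleus. The numbers below are approximate textbook values that are not given in this article (about 3.2 billion letters per copy of the human genome, two copies per cell, and roughly 0.34 nanometers of helix length per letter pair), so treat the result as an estimate.

```python
# Approximate values, assumed for this estimate only.
LETTER_PAIRS_PER_GENOME_COPY = 3.2e9   # one copy of the human genome
GENOME_COPIES_PER_CELL = 2             # one copy from each parent
LENGTH_PER_LETTER_PAIR_M = 0.34e-9     # meters of double helix per letter pair

total_length_m = (
    LETTER_PAIRS_PER_GENOME_COPY
    * GENOME_COPIES_PER_CELL
    * LENGTH_PER_LETTER_PAIR_M
)
print(f"{total_length_m:.1f} m")       # 2.2 m
```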
DNA letters are organized into a genome
The total DNA content of a cell is called the genome. In humans, the genome is distributed over 23 pairs of chromosomes. The identity and the order of the bases in the genome contain the information needed to make an organism. This information is organized into units, or genes, each of which determines a particular property or function of a cell or the organism.
Thus, the genome sequence holds a key to understanding an organism, including any genetic diseases that are associated with alterations of the DNA sequence. Because of this, learning to “read” DNA has been of prime interest to scientists for many decades.
A DNA molecule is “read” by making its complementary copy
While DNA seems to be “read” instantaneously by a fancy machine in many science fiction movies and TV shows, determining the identity and order of the letters in DNA is still not a trivial task in the real world. The main challenges are the sheer length of the DNA (the human genome runs to billions of letters!) and its tight packaging inside the nucleus. In fact, before a DNA molecule can be “read”, it needs to be separated from its associated proteins. In addition, because the four DNA bases are structurally similar, it is very difficult to determine their exact identity and order in an intact molecule. In practice, DNA sequencing is accomplished via a chemical reaction, most commonly by synthesizing a new DNA strand.
DNA sequencing, as we know it today, dates back to 1977, when a British scientist, Frederick Sanger, introduced the first efficient method, known as “dideoxy sequencing” or “Sanger sequencing”. While other methods were proposed at the time, the Sanger method gained the most popularity because of its relative simplicity and its capacity for being automated. This method relies on DNA synthesis in the presence of deoxyribonucleotide (dNTP) analogs, or copycats, called dideoxyribonucleotide terminators (ddNTPs), which, just like dNTPs, come in four flavours: A, C, T and G.
To make a new DNA chain, dNTPs are added one at a time using the letters in the complementary strand to guide which of the four dNTPs should be added at a particular time. Remember, A can only pair with T, while G only pairs with C! One analogy to this process would be to think of the four dNTPs as four Lego blocks of different colors that are stacked on top of each other using tops that are found on each Lego block.
The chemical difference between a dNTP and a ddNTP is such that once a ddNTP is added to the growing chain, the chain cannot be made any longer, and the result is a chain ending with the ddNTP. In the Lego block analogy, if a Lego block without tops is accidentally added, no block can be stacked on top of it, which results in a shorter structure ending with the defective block. The Lego blocks without tops are the ddNTPs.
By labelling the four different ddNTPs with different colours, scientists can tell which one of the four has resulted in the termination event.
In a Sanger sequencing reaction, a new DNA strand is synthesized by matching the original strand using the principle of complementarity. The synthesis occurs in the presence of fluorescently labelled ddNTPs that, when incorporated into the growing DNA strand, cause synthesis to stop. The length and colour of the truncated DNA products can be used to infer which bases were present at each position in the original DNA sequence.
Defective Lego blocks, just like ddNTPs, can be added at any position in the growing Lego chain. If enough such chains are made, we can expect to have a chain ending with the defective block (or ddNTP) at every possible position. Remember that the ddNTPs are coloured, and the colour of the ddNTP at the end of a chain tells us which of the four letters, A, C, T or G, is present at that position. By sorting the shortened products by length and reading their colours in order, the whole DNA sequence can be inferred!
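The logic of the Lego analogy can be imitated with a toy simulation. This is a simplified sketch, not a model of a real sequencing instrument: the colour assignments, the termination probability, the number of chains and all function names are assumptions made up for the example, and real-world details such as strand direction are ignored.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical colour labels, one per ddNTP flavour.
COLOURS = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in COLOURS.items()}


def make_one_chain(template, rng, p_terminate=0.2):
    """Copy `template` one letter at a time.

    At every position there is a chance that the added letter is a
    coloured terminator (ddNTP). Returns (chain length, colour) for a
    terminated chain, or None if the chain reached the end unlabelled.
    """
    for length, base in enumerate(template, start=1):
        new_base = PAIRS[base]              # letter that pairs with the template
        if rng.random() < p_terminate:
            return length, COLOURS[new_base]
    return None


def read_new_strand(template, n_chains=5000, seed=0):
    """Make many chains, then read the end colours in order of chain length."""
    rng = random.Random(seed)
    colour_at_length = {}
    for _ in range(n_chains):
        chain = make_one_chain(template, rng)
        if chain is not None:
            length, colour = chain
            colour_at_length[length] = colour
    # "N" marks any length for which no terminated chain happened to be made.
    return "".join(
        COLOUR_TO_BASE.get(colour_at_length.get(length), "N")
        for length in range(1, len(template) + 1)
    )


def infer_template(new_strand):
    """The original strand is the complement of the strand that was read."""
    return "".join(PAIRS.get(base, "N") for base in new_strand)


template = "ATGCCGTAAGCT"                   # arbitrary example sequence
new_strand = read_new_strand(template)
print(new_strand)
print(infer_template(new_strand))
```

With enough chains, every length from 1 up to the full template is represented, so the colours read in order of length spell out the new strand, and its complement gives back the original letters.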
The era of DNA sequencing
The Sanger technology has been the predominant DNA sequencing method for the past thirty years. Three years ago, a group of new instruments, known as “next-generation sequencing technologies”, was introduced to the market. Most of these instruments still use DNA synthesis as the basis for reading DNA; however, due to technological improvements, they promise to be faster and cheaper than Sanger sequencing.
We are fortunate to be living in an era of DNA sequencing that has already begun to deliver great discoveries resulting from the whole genome sequencing of humans and other animals.



