by Yasemin Cole
Technique Name: Next Generation Sequencing
Fun Rating: 4/5
Difficulty Rating: 4/5
What is the general purpose? Scientists are interested in looking at an entire picture of the DNA (the blueprint of life) in a sample (e.g., a piece of tumor or a blood sample from an individual). Genes are among the most valuable pieces of genetic information that DNA contains.
Why do we use it? We are interested in the nucleotide bases (adenine = A, thymine = T, cytosine = C, guanine = G) in the sample and how they are arranged. Strings of A’s, T’s, C’s, and G’s form genes, and together, the genes form the recipe book for life. Scientists may want to know whether a certain gene is present or absent in the sample and whether the DNA carries any changes (called mutations). For example, in sickle cell anemia, a single nucleotide change in the HBB gene causes patients’ red blood cells to deform into a crescent shape resembling a sickle (hence the name). We can use next-generation sequencing on a patient’s blood sample to check whether they carry the mutation responsible for their sickle-shaped cells.
How does it work? You may have heard the terms “precision medicine,” “genomic medicine,” and “genomics.” These fields grew rapidly after the first human genome was sequenced in the early 2000s, driven largely by the next-generation sequencing technologies that followed. Companies developed instruments that sequence DNA massively in parallel: instead of reading one strand of DNA at a time, they can read millions of strands simultaneously. Next-generation sequencing builds on the same ingredients as the polymerase chain reaction (PCR): a DNA polymerase copying a template strand one nucleotide at a time. The difference is that in sequencing, each nucleotide added is identified, across millions of DNA fragments at once.
Step 1: DNA Isolation & Library Preparation
As described previously in our blog, DNA can be isolated from a sample through DNA extraction. Next, the sample must be prepared for sequencing: the DNA is cut into smaller pieces of around 200 base pairs (bp), and short fragments of known sequence (called adaptors) are attached to their ends. The DNA is fragmented because sequencing relies on an enzyme adding the correct nucleotides one at a time, and it can only do so accurately for a limited stretch of roughly 200 bp.
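For readers who like to see the idea in code, here is a minimal sketch of library preparation as plain string manipulation. It assumes random cutting into pieces of roughly 200 bp (each piece varies by up to 25% in length), and the adaptor sequences are made-up placeholders, not the sequences used by any real kit. The helpers `fragment` and `add_adaptors` are hypothetical names chosen for illustration; real adaptor attachment is a chemical reaction on double-stranded DNA, not string concatenation.

```python
import random

# Placeholder adaptor sequences (hypothetical; real kits use their own).
LEFT_ADAPTOR = "AAAAACCCCC"
RIGHT_ADAPTOR = "GGGGGTTTTT"

def fragment(dna: str, mean_size: int = 200, seed: int = 0) -> list[str]:
    """Cut a DNA string into consecutive pieces of roughly mean_size bases.
    Each piece length is drawn at random within +/-25% of mean_size to mimic
    random shearing; the final piece may be shorter."""
    rng = random.Random(seed)
    pieces, start = [], 0
    while start < len(dna):
        size = rng.randint(int(mean_size * 0.75), int(mean_size * 1.25))
        pieces.append(dna[start:start + size])
        start += size
    return pieces

def add_adaptors(pieces: list[str]) -> list[str]:
    """Attach the known adaptor sequences to both ends of every fragment."""
    return [LEFT_ADAPTOR + p + RIGHT_ADAPTOR for p in pieces]

# A random 1,000-base "genome" stands in for extracted DNA.
genome = "".join(random.Random(1).choice("ACGT") for _ in range(1000))
pieces = fragment(genome)
library = add_adaptors(pieces)
print(len(library), "fragments; first one starts with", library[0][:15])
```

Because every fragment now starts and ends with a known sequence, the sequencer can grab any fragment by its adaptor without knowing anything about the DNA in between.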
The DNA “library” is now loaded onto a flow cell, a glass slide divided into channels (called lanes), which is placed in the sequencing machine. The surface of the flow cell is coated with short DNA fragments that catch the library fragments by their adaptor sequences (added in Step 1). As DNA flows through the flow cell, it sticks wherever it meets complementary DNA. For example, a library sequence of ATTGCATA will attach to the complementary sequence TAACGTAT on the flow cell.
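The base-pairing rule behind the ATTGCATA / TAACGTAT example can be written in a few lines. This is a toy model under a simplifying assumption: a fragment “attaches” only when every base pairs perfectly, whereas real hybridization tolerates some mismatches. The function names are hypothetical. It also compares bases position by position, as in the example; because DNA strands are antiparallel, bioinformatics tools usually work with the reverse complement instead.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ATCG", "TAGC")

def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return seq.translate(PAIRS)

def will_attach(library_seq: str, flow_cell_probe: str) -> bool:
    """Toy model: a library fragment attaches to a probe on the flow cell
    only if the two sequences are perfectly complementary."""
    return complement(library_seq) == flow_cell_probe

print(complement("ATTGCATA"))               # TAACGTAT
print(will_attach("ATTGCATA", "TAACGTAT"))  # True
print(will_attach("ATTGCATA", "TAACGTTT"))  # False
```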
Step 2: Amplification & Sequencing
Once the DNA library has been loaded onto the flow cell, the next step is to amplify each DNA fragment. As with the polymerase chain reaction, the signal from a single piece of DNA is too faint to detect, so each fragment is copied many times in place, forming a cluster of identical copies; across the whole flow cell, this produces millions of clusters. Next, sequencing begins through a method called “sequencing by synthesis”: nucleotides carrying fluorescent labels are added to the flow cell, and a DNA polymerase incorporates the one that is complementary to the next base of each fragment. Each type of nucleotide carries a different color, and when a labeled nucleotide is incorporated, a laser makes it glow so that a camera can record its color and its location on the flow cell.
This cycle is repeated one base at a time: after each image is taken, the fluorescent label is removed so the next labeled nucleotide can be added, until 100-200 bases of each fragment have been read.
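The sequencing-by-synthesis loop for a single cluster can be sketched as a simulation. The color assignments are illustrative assumptions: red = A, yellow = G and blue = C follow the color example used in Step 3 of this post, and green = T is made up; real instruments use their own dye schemes. `sequence_by_synthesis` is a hypothetical name, and the sketch ignores real-world complications such as copies in a cluster falling out of step with one another.

```python
# Illustrative dye color for each incoming nucleotide (green for T is assumed).
BASE_TO_COLOR = {"A": "red", "G": "yellow", "C": "blue", "T": "green"}
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template: str, cycles: int) -> list[str]:
    """Simulate one cluster: in each cycle the polymerase adds the single
    nucleotide complementary to the next template base, the camera records
    that nucleotide's color, and the label is then removed (implicitly, by
    moving on to the next cycle)."""
    images = []
    for position in range(min(cycles, len(template))):
        incoming = PAIR[template[position]]     # base added to the new strand
        images.append(BASE_TO_COLOR[incoming])  # color recorded this cycle
    return images

print(sequence_by_synthesis("TCCG", cycles=4))
# ['red', 'yellow', 'yellow', 'blue']
```

The `cycles` parameter plays the role of read length: running 100-200 cycles reads 100-200 bases of each fragment.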
Step 3: Alignment & Data Analysis
The DNA sequencing instrument takes the pictures obtained during sequencing and converts them into a human-readable format. For each of the millions of DNA clusters on the flow cell, the machine reads the series of colors recorded at that location and converts each one into a nucleotide call. For example, if a spot in the top left corner of the flow cell shows red, yellow, yellow, blue, and so on, the computer algorithm converts this to AGGC, and so on. Next, a bioinformatician performs quality control on the raw data to ensure that the base calls are reliable. The analysis involves many steps, but in short, the sequencing reads are aligned to (compared against) a known reference sequence so that changes in the DNA (mutations) can be identified. As a result, scientists can check whether a gene of interest is present and whether it carries a mutation.
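The three analysis steps (base calling, quality control, alignment) can be sketched together in a toy pipeline. Several parts are assumptions: the color table follows the red/yellow/blue example above with green = T made up; quality is expressed as a Phred score, Q = -10 × log10(probability of error), so Q30 means roughly 1 wrong call in 1,000; the Q30 cutoff and the reference sequence are hypothetical; and all function names are illustrative. The aligner is a brute-force slide with no gaps allowed, whereas real pipelines use dedicated aligners such as BWA or Bowtie2.

```python
import math

# Hypothetical color-to-base table (red=A, yellow=G, blue=C; green=T assumed).
COLOR_TO_BASE = {"red": "A", "yellow": "G", "blue": "C", "green": "T"}

def call_bases(colors: list[str]) -> str:
    """Convert the color recorded in each cycle into a nucleotide letter."""
    return "".join(COLOR_TO_BASE[c] for c in colors)

def phred_quality(error_probability: float) -> float:
    """Phred quality score: Q = -10 * log10(P(error)). Q30 = 1 error in 1,000."""
    return -10 * math.log10(error_probability)

def passes_qc(error_probs: list[float], min_mean_q: float = 30.0) -> bool:
    """Keep a read only if its average base quality reaches min_mean_q."""
    scores = [phred_quality(p) for p in error_probs]
    return sum(scores) / len(scores) >= min_mean_q

def align(read: str, reference: str) -> tuple[int, list[tuple[int, str, str]]]:
    """Slide the read along the reference (no gaps) and keep the offset with
    the fewest mismatches. Returns (offset, [(ref_pos, ref_base, read_base)]),
    using 0-based positions."""
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_offset, best_mismatches = 0, None
    for offset in range(len(reference) - len(read) + 1):
        mismatches = [(offset + i, reference[offset + i], base)
                      for i, base in enumerate(read)
                      if reference[offset + i] != base]
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches

read = call_bases(["red", "yellow", "yellow", "blue"])
print(read)                                        # AGGC
print(passes_qc([0.001, 0.001, 0.0005, 0.001]))    # True
reference = "TTAGTCAA"  # hypothetical reference sequence
print(align(read, reference))                      # (2, [(4, 'T', 'G')])
```

In this toy run the read matches the reference best at offset 2, with one difference at reference position 4 (T in the reference, G in the read). In a real analysis, a difference seen consistently across many high-quality reads covering the same position is what would flag a candidate mutation.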
Before the 2000s, progress in identifying genes and linking them to diseases was slow. Next-generation sequencing technologies have brought major advancements to the study of DNA: within hours to days, we can now obtain a picture of all the genetic material in an organism or sample of interest.