LII:Structural Biochemistry/DNA recombinant techniques/DNA sequencing

DNA sequencing determines the order of the four bases adenine, guanine, cytosine, and thymine within a DNA molecule. Knowing the sequence gives access to a wide variety of genetic information. It is one of the most fundamental techniques in DNA manipulation.

History of Genome Sequencing

About DNA sequencing: Restriction endonucleases led to recombinant DNA. Bacteria make restriction endonucleases, enzymes that cut DNA at positions determined by specific short base sequences. In nature, restriction endonucleases protect bacteria from the foreign DNA of viruses. In the test tube, purified restriction endonucleases were used to "cut and paste" DNA from two unrelated organisms, generating recombinant DNA. The construction of artificially recombinant DNA, by "gene cloning," ultimately made it possible to transfer genes between the genomes of virtually all types of organisms, using processes derived from natural phenomena of bacterial transformation.

The first successful method of sequencing is known as Sanger sequencing after its creator, Frederick Sanger. The method was revolutionary and opened up untold avenues of research. However, it was and remains inefficient in terms of time, cost, and materials. Bacteriophage φX174, a virus with only 5,386 base pairs, was the first genome to be sequenced, in 1977, with many more to follow. Then, in 1983, the PCR method was developed, which allows for amplification of specific DNA fragments. Later, in 1986, Leroy E. Hood's laboratory at the California Institute of Technology, together with Lloyd Smith, announced the first semi-automated DNA sequencing machine. Next, the shotgun method was introduced by Craig Venter in the 1990s. This method allowed for much faster sequencing.

Another method of sequencing uses fluorescent detection. In this method, a fluorescent tag labels each dideoxy analog with a different color. Electrophoresis is then performed, and the separated bands of DNA are detected as they come through. Because the bases are labeled in different colors, their identities are obvious and the sequence is easy to determine.

Perhaps the most ambitious proposal came in 1990, when the Human Genome Project was launched in an effort to catalog the entire human genome. Later, a team led by Craig Venter sequenced a complete bacterial genome of 1.8 Mb, the largest genome sequenced up to that point, at TIGR's facilities. Such a feat was possible because of improvements in the computational software written to handle the extremely large data sets that the shotgun method produces. Instead of shotgunning very small portions of the genome one at a time, they applied the method to the entire genome at once. The TIGR Assembler software then assembled roughly 24,000 sequenced fragments into one whole genome. This new approach cut the cost and time required nearly in half. The open-access policy of the Human Genome Project allowed any academic facility to use all the data collected and helped transform our working knowledge of the genome. Celera, Craig Venter's company, initially refused to release its data and did so only under pressure. The group remains one of the most influential in the industry. In 2003, save for a few gaps, the entire human genome was announced as complete, heralding a new era of science.

Fundamental Method

A DNA sequence can be determined by chemical or enzymatic methods. The fundamental enzymatic method is based on the ability of a DNA polymerase to extend a primer, hybridized to the single-stranded DNA to be sequenced, until a chain-terminating dideoxyribonucleoside triphosphate (ddNTP) is incorporated. A ddNTP lacks the hydroxyl group on the 3' carbon, to which the next dNTP of the growing DNA chain would be added. This makes ddNTPs useful as chain terminators: without the 3' hydroxyl group no more nucleotides can be added, DNA polymerase falls off, and the chain stops at a labeled G, A, T, or C. The resulting fragments share a common origin but terminate at different nucleotides, giving a mixture of DNA chains of different lengths. Each fragment is labeled because each dideoxy analog (terminator) is tagged with a different fluorescent color. The fragments are separated by high-resolution gel electrophoresis, which sorts them by size; the shorter strands move faster through the gel. As the strands move through the gel, their terminal bases are identified by fluorescence measurement of their colors. The techniques described below are used in many different methods and are considered general knowledge.
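The chain-termination idea can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the template is written in the order the polymerase reads it, every incorporated ddNTP is detected perfectly, and the function names and the 20% termination probability are invented for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_fragments(template, n_molecules=2000, dd_fraction=0.2, seed=0):
    """Simulate many primer extensions on the same template.

    At every position a ddNTP is incorporated with probability dd_fraction,
    which stops the chain. Each terminated chain is recorded as
    (length after the primer, labeled terminal base).
    Chains that reach the end without a ddNTP carry no label and are ignored.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            if rng.random() < dd_fraction:
                fragments.append((position, COMPLEMENT[template_base]))
                break
    return fragments

def read_sequence(fragments):
    """Read the new strand by sorting fragments from shortest to longest,
    the order in which they emerge from the gel."""
    terminal_base_by_length = {}
    for length, base in fragments:
        terminal_base_by_length[length] = base
    return "".join(terminal_base_by_length[n] for n in sorted(terminal_base_by_length))

print(read_sequence(synthesize_fragments("TACGGTCA")))
```

The sequence that is read is that of the newly synthesized strand, which is complementary to the template.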

Fluorescence detection

An alternative to autoradiography is fluorescence detection. Naturally fluorescent proteins, which are also used as tags, are found in the light- and color-producing cells of many coelenterates such as corals and jellyfish. A fluorescent tag is attached to each dideoxy analog, so that each of the four chain terminators has a different color. Fluorescence is the process by which light is absorbed by a molecule and re-emitted at a longer wavelength (which sometimes falls in the visible range and can therefore be seen by the human eye), producing a particular color. When excited by light, the fluorophore fluoresces without any further input of energy such as ATP or any other cofactor. With a mixture of differently colored terminators, the reaction can be performed in a single tube, and the resulting mixture of fragments is separated by gel electrophoresis. The bands of DNA are detected by their color as they emerge. For example, if all of the adenine dideoxy analogs have been labeled with a green tag and a fragment 5 bases longer than the primer comes out green, the fifth base after the primer is known to be an adenine. This method can determine sequences of up to about 500 bases. It is a viable and competitive approach because it eliminates radioactive components and can be automated. As a result, more than 1 million bases can be sequenced per day.

Fluorescence detectors are probably the most sensitive of the modern HPLC detectors. It is possible to detect the presence of even a single analyte molecule in the flow cell. Typically, fluorescence sensitivity is 10 to 1,000 times higher than that of a UV detector for strongly UV-absorbing materials. Fluorescence detectors are also very specific and selective compared with other optical detectors. This is normally used to advantage when measuring specific fluorescent species in samples.

Fluorescence occurs when compounds having specific functional groups are excited by shorter-wavelength energy and emit radiation of a longer wavelength. Usually, the emission is measured at right angles to the excitation.

Roughly 15% of all compounds have a natural fluorescence. The presence of conjugated pi-electrons, especially in aromatic components, gives the most intense fluorescent activity. Aliphatic and alicyclic compounds with carbonyl groups and compounds with highly conjugated double bonds also fluoresce, but usually to a lesser degree. Most unsubstituted aromatic hydrocarbons fluoresce, with quantum yield increasing with the number of rings, their degree of condensation, and their structural rigidity.

Fluorescence intensity depends on both the excitation and the emission wavelength, which makes it possible to detect some components selectively while suppressing the emission of others. The detection of any component depends strongly on the chosen wavelengths: one component may be detected with excitation at 280 nm and emission at 340 nm while another is missed. Most modern detectors allow fast switching of the excitation and emission wavelengths, which offers the possibility of detecting all components in a mixture. For example, in an important chromatogram of polynuclear aromatic compounds, the excitation and emission wavelengths were 280 and 340 nm, respectively, for the first 6 components and were then changed to 305 and 430 nm; the latter values represent the best compromise for sensitive detection of the remaining compounds.

Fluorescent Protein

Fluorescent protein is one kind of label used in fluorescence detection. In a fluorescent protein, the fluorophore is typically a double-ring structure formed by three amino acids. The protein fold is called a beta barrel because it is formed by 11 strands of beta-pleated sheet, with additional amino acid sequence closing the top and bottom. Fluorescent proteins can often be attached to other proteins without changing those proteins' structure or function. Hence, they are often used in the laboratory to label and detect molecules, cells, and organisms. They are also used for screening drugs, evaluating viral vectors for human gene therapy, monitoring genetically altered microbes in the environment, and biological pest control.

Green Fluorescent Protein (GFP) has existed for more than one hundred and sixty million years in one species of jellyfish, Aequorea victoria. The protein is found in the photoorgans of Aequorea. GFP is not responsible for the glow often seen in pictures of jellyfish - that "fluorescence" is actually due to the reflection of the flash used to photograph the jellies.

Because of the unique β-barrel fold of fluorescent proteins (FPs), mutations of residues throughout the entire protein have the potential to significantly change their fluorescent properties. The most striking result of such mutations is the wide range of different emission colors that is currently available, which greatly increases the usefulness of these proteins as molecular probes. However, most single mutations have a negative impact on the tight packing of the FP β-barrel and therefore result in greater environmental sensitivity and reduced brightness. Although some of these defects can be compensated for by additional mutations, derivative FPs are often less bright and/or more sensitive to the environment than the original protein. This phenomenon has been especially evident during the search for truly monomeric versions of the tetrameric red fluorescent protein of the coral Discosoma sp.

Sequencing Methods

An abundance of sequencing methods have been developed over time and a number of them are listed below with a brief description.

Pyrosequencing

Pyrosequencing was developed in 1996 by Pål Nyrén and Mostafa Ronaghi at the Royal Institute of Technology in Stockholm. The sequencing is done by enzymatically synthesizing a single strand of DNA complementary to the template. The bases A, T, G, and C are added one at a time and then removed during the process. When DNA polymerase adds a nucleotide to the growing chain, pyrophosphate is released; ATP sulfurylase converts it to ATP, which luciferase uses to produce light. The light intensity is then detected and recorded for that base. If a base was added and no light was produced, that base was not the next one in the sequence, and the next base is tried. The process is repeated until the entire strand of DNA has been synthesized and sequenced.
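The cycle of adding one base at a time and recording light can be sketched as follows. This is an idealized illustration, not a model of a real instrument: the dispensation order "ACGT", the function names, and the assumption that light is exactly proportional to the number of bases incorporated are all choices made for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pyrosequence(template, dispensation_order="ACGT", max_cycles=100):
    """Return the flows as (nucleotide dispensed, light units).

    The template is written in the order the polymerase reads it.
    One light unit is produced per incorporated base, so a run of
    identical bases gives a proportionally stronger signal and a
    non-matching nucleotide gives zero light.
    """
    to_synthesize = [COMPLEMENT[base] for base in template]
    position = 0
    flows = []
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            if position >= len(to_synthesize):
                return flows
            incorporated = 0
            while position < len(to_synthesize) and to_synthesize[position] == nucleotide:
                incorporated += 1
                position += 1
            flows.append((nucleotide, incorporated))
    return flows

def call_bases(flows):
    """Turn the recorded light signals back into the synthesized sequence."""
    return "".join(nucleotide * light for nucleotide, light in flows)

flows = pyrosequence("GT")
print(flows)
print(call_bases(flows))
```

In the printed flows, entries with zero light correspond to dispensed bases that were not the next base in the sequence.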

Sanger Sequencing

Developed in 1977, the Sanger sequencing method at its most basic uses chain terminators to stop the extension of a primer that is complementary to a specific site on the DNA template. DNA polymerase, the four deoxynucleotides, the primer, and a chain-terminating nucleotide are combined, producing DNA fragments of different sizes; the size of each fragment depends on where that particular chain terminator was incorporated. In the original method the fragments are detected by autoradiography. The fragments are separated by size using polyacrylamide gel electrophoresis (PAGE). An easier variant is fluorescent dye-terminator sequencing, in which each chain terminator is labeled with a dye of a different color. PAGE is performed and all of the chain terminators can be identified after a single reaction, as opposed to the four reactions needed in the labeled-primer method. However, the Sanger method is demanding in time and labor because of the gel preparation, and it requires a large amount of sample.

Prior to sequencing, the DNA must be denatured into single strands, and a primer must be annealed to the template strands so that its 3' end lies next to the DNA region to be sequenced. The products are labeled either radioactively (for autoradiography) or with fluorescence. The solution is then divided into four vessels, and one of the four terminators (ddATP, ddTTP, ddCTP, or ddGTP) is added to each, along with DNA polymerase and all four dNTPs. The method works because all the chains begin at the same nucleotide and end at the chosen base. The new chain terminates at every position where that terminator can be added, so bands of different lengths are created. Afterwards the DNA is again denatured, and the contents of the four vessels are run side by side on a polyacrylamide gel to separate the bands from each other.

Because large fragments are difficult to sequence, it is necessary to work with small fragments. Frederick Sanger designed a method based on controlled termination of replication. In this remarkably simple technique, a 2',3'-dideoxy analog is used to cause chain termination. Each of the four reaction tubes contains one of the A, G, T, or C dideoxy analogs along with regular, radioactively labeled dNTPs. After the controlled termination of replication, each reaction set is run in its own lane of an electrophoresis gel. Since replication terminates after the random incorporation of a dideoxy analog, the shortest fragments, which run the farthest, correspond to the first bases in the sequence.
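Reading a four-lane gel can be sketched in a few lines. The example below is a simplified illustration with invented function names: it assumes one lane per dideoxy analog, represents each band only by its fragment length, and assumes every band is resolved.

```python
def run_four_lane_gel(new_strand):
    """Return the band positions per lane for a given synthesized strand.

    Lane "A" holds the lengths of all fragments that ended in ddA,
    lane "C" those that ended in ddC, and so on.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the sequence from the shortest fragment (farthest migration)
    to the longest, noting which lane each band sits in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = run_four_lane_gel("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Each band appears in exactly one lane, so walking up the gel from the bottom and noting the lane of each successive band spells out the sequence.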

The Future of Analyzing DNA

Genome sequencing will greatly advance our understanding of genetic biology and has vast potential for medical diagnosis and treatment. One obstacle is cost: the first human genome sequence, completed in 2003, is estimated to have cost 3 billion dollars. Nanopore devices may reduce this cost to a couple of hundred dollars, making personal genome sequencing available to everyone. The idea of threading DNA through a tiny pore (a nanopore) was envisioned by David Deamer of the University of California, Santa Cruz, in the mid-1990s.

A nanopore is a minuscule hole that a molecule of DNA can be threaded through and read. Currently, the method for nanopore DNA analysis involves inserting proteins into a membrane of lipids. A DNA molecule can be dragged through the nanopore when an electrical voltage is applied. A major drawback of this method is that the lipid membrane is quite fragile.

Researchers from the Delft University of Technology and Oxford University propose a new method that combines artificial and biological materials to create a nanopore on a chip, which can analyze single DNA molecules. The method involves attaching an individual protein to a larger piece of DNA, then threading it through a premade opening on a silicon nitride membrane.

However, the silicon nitride material is a bit too thick, so more than one nucleotide may enter the pore at the same time. Researchers from Delft, Pennsylvania, and Harvard University are working with graphene (one atom thick sheets of carbon), and have drawn DNA through a nanopore drilled into graphene. Graphene could indeed be the future of genome sequencing since it is strong, thin, and a great electrical conductor.

True Single Molecule Sequencing

This is a method of directly sequencing a DNA or RNA molecule quickly and at low cost, and it allows many single DNA strands to be sequenced in parallel. Labeled dNTPs (dATP, dGTP, dCTP, dTTP), each tagged with a fluorescent indicator, are added together with DNA polymerase to a flow cell that holds the DNA templates, and synthesis of the complementary strands begins. Single-molecule sequencers are able to correct errors during sequencing. Wash steps are applied to remove excess nucleotides. The nucleotides that remain, having been incorporated, are then detected and analyzed.

Sequencing by Hybridization (SBH)

This refers to an entire class of sequencing methods. Normally used to find a small change in a known DNA sequence, it is sensitive to even single-base mismatches. An SBH chip carrying an array of short nucleotide sequences (probes) is placed in a solution of the DNA to be sequenced. The probes to which the DNA binds, each a single DNA fragment of known sequence, are identified and used to work out the entire sequence. The difficulty with this method is finding the smallest number of probes that can sequence the largest amount of DNA. The technique was first proposed in 1988 and was used in 1991 to reconstruct a 100 base pair DNA sequence. There are two broad steps: first the DNA is hybridized with the microarray, and then the sequence is reconstructed algorithmically from the set of k-mers that hybridized.
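The reconstruction step can be sketched as a walk through a graph whose edges are the k-mers that hybridized. This is one common textbook formulation, not necessarily the approach used in the original experiments; it assumes an error-free spectrum in which each k-mer occurs once, and the function names are invented for the example.

```python
from collections import defaultdict

def kmer_spectrum(sequence, k):
    """The set of k-mers an ideal array would report for this sequence."""
    return sorted({sequence[i:i + k] for i in range(len(sequence) - k + 1)})

def reconstruct(spectrum):
    """Rebuild a sequence by following every k-mer exactly once.

    Each k-mer is an edge from its first k-1 letters to its last k-1
    letters; a path using all edges (an Eulerian path) spells a sequence
    with exactly this spectrum.
    """
    graph = defaultdict(list)
    balance = defaultdict(int)  # outgoing minus incoming edges per node
    for kmer in spectrum:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((n for n, b in balance.items() if b == 1), spectrum[0][:-1])
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])

spectrum = kmer_spectrum("ACGTTGCA", 3)
print(spectrum)
print(reconstruct(spectrum))
```

When the sequence contains repeated (k-1)-mers, more than one path may fit the same spectrum, which is one reason the choice of probes matters.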

Mass Spectrometry

This method can sequence approximately 100 base pairs at a time; however, its resolution still needs improvement. MALDI and ESI are the most widely used ionization methods, and the method's competitive edge is that it can be completed in hours rather than days. Mass spectrometry sequencing is advantageous because frameshift mutations and heterozygous mutations can be identified. Unlike other methods such as Sanger sequencing, it can distinguish DNA fragments of the same length but different base composition. MALDI-TOF MS can be performed on these DNA sequencing fragments, and the bases can be determined very accurately.
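The way masses identify bases can be illustrated with a ladder of nested fragments: the mass difference between successive fragments points to the base that was added. The sketch below uses approximate average masses for nucleotide residues within a DNA chain; the primer mass, the tolerance, and the function names are assumptions made for the example, and the slightly lower mass of a terminal dideoxy residue is ignored.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

def ladder_masses(sequence, primer_mass=5000.0):
    """Masses of the nested fragments primer+1, primer+2, ... (hypothetical primer mass)."""
    masses = []
    total = primer_mass
    for base in sequence:
        total += RESIDUE_MASS[base]
        masses.append(round(total, 2))
    return masses

def call_from_ladder(masses, primer_mass=5000.0, tolerance=2.0):
    """Assign a base to each step of the ladder from its mass difference."""
    calls = []
    previous = primer_mass
    for mass in sorted(masses):
        delta = mass - previous
        base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - delta))
        if abs(RESIDUE_MASS[base] - delta) > tolerance:
            raise ValueError(f"mass step {delta:.2f} Da matches no base")
        calls.append(base)
        previous = mass
    return "".join(calls)

print(call_from_ladder(ladder_masses("GATC")))
```

The smallest difference between two residue masses here is about 9 Da (A versus T), which gives a sense of the resolution needed as fragments get heavier.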

Direct Visualization of Single DNA Molecules by Atomic Force Microscopy (AFM)

This method uses an atomic force microscope to scan a probe across the surface of DNA, achieving up to nanometer resolution of the fragment being studied. It requires only very small quantities of DNA, on the order of a few nanograms. With this method you can analyze different forms of DNA, such as supercoiled, linear, or relaxed DNA. An advantage is that it does not require a staining or radioactive agent, and so does less damage to the fragment being studied. It also provides 3-D images as opposed to 2-D ones.

Shotgun Sequencing

Created by GNN president J. Craig Venter in 1996, this method relies on isolating random pieces of DNA from a genome, and doing so many times over to obtain redundant, overlapping copies. The sequenced fragments are then assembled by their overlapping regions into continuous stretches of sequence (contigs), normally by computer. Finally, custom primers are used to close the gaps between these contigs, giving the sequenced genome. This method allows for much faster sequencing and was used, for example, to determine the genome of smallpox. Whole-genome shotgun sequencing is done in a few steps. First the DNA is isolated and sheared into random pieces in a blender, by passage through a narrow-gauge syringe, or by sonication; each fragment is normally about 2,000 base pairs long. The DNA is then loaded onto a gel and compared with marker DNA of known size. Fragments of the desired size are recovered and ligated into a cloning vector, which amplifies them. Primers flanking the inserts are then annealed, and the inserts are read by a sequencer. Finally, all of the reads, normally about 500 base pairs each, are compared by sophisticated computer algorithms that find the largest possible continuous fragments and then put together the entire genome. The most prevalent criticism of this method is that it is not accurate enough; however, in 2000 Celera used it to sequence the genome of Drosophila melanogaster successfully.
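The assembly step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest overlap. This is a minimal sketch, not the algorithm used by any production assembler: it assumes error-free reads from one strand, no repeats longer than the minimum overlap, and no read fully contained in another, and the function names are invented.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if it is shorter than min_length."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_length=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_length, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_i, best_j = length, i, j
        if best_length == 0:
            break  # remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["AGTCCGAT", "ATGGCCTA", "CCTAAGTC"]
print(greedy_assemble(reads))
```

If the reads do not overlap enough, the function returns several contigs, which corresponds to the gaps that are later closed with custom primers.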

BAC to BAC Sequencing

This type of sequencing was used in the Human Genome Project. It is much slower than the shotgun approach but is a more certain way of sequencing. It starts by creating a map of the genome before any sequencing is done. Each chromosome is cut into pieces, and the order of these pieces is worked out before the actual sequencing begins. First, the genome is cut into pieces about 150 kb long. These 150 kb pieces are then inserted into BACs (bacterial artificial chromosomes). Together the pieces make up a BAC library, much like an actual library, in which each BAC is like a book that can be chosen from the shelves. The next step is to fingerprint the pieces, which is done by cutting each BAC into smaller pieces with an enzyme. Finding common sequences where BACs overlap then allows the location of each BAC on the chromosome to be worked out, so the researcher can determine the order of the fragments along the chromosome. The next step is to break each BAC into even smaller pieces, about 1.5 kb long, which are inserted into an artificial DNA vector called M13; these pieces make up the M13 library, much like the BAC library. The M13 library is then sequenced, and all the pieces are put together to find their order. This is usually done by computer because of the complexity of the task. Compared with shotgun sequencing, BAC sequencing takes much longer and is more useful for larger genomes.
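The ordering of BACs from their fingerprints can be sketched by treating each fingerprint as a set of restriction-fragment sizes and chaining together clones that share bands. This is a deliberately simple illustration: the clone names, fragment sizes, and the threshold of two shared bands are hypothetical, and the sketch assumes the clones form a single unbranched chain.

```python
def shared_bands(fingerprint_a, fingerprint_b):
    """Number of restriction-fragment sizes two clones have in common."""
    return len(set(fingerprint_a) & set(fingerprint_b))

def order_clones(fingerprints, min_shared=2):
    """Order clones along the chromosome by walking from one end clone
    to its best-overlapping unvisited neighbor."""
    names = list(fingerprints)
    neighbors = {
        name: [
            other for other in names
            if other != name
            and shared_bands(fingerprints[name], fingerprints[other]) >= min_shared
        ]
        for name in names
    }
    # An end clone overlaps only one other clone.
    start = next(name for name in names if len(neighbors[name]) == 1)
    order = [start]
    while True:
        candidates = [n for n in neighbors[order[-1]] if n not in order]
        if not candidates:
            return order
        order.append(max(
            candidates,
            key=lambda n: shared_bands(fingerprints[order[-1]], fingerprints[n]),
        ))

# Hypothetical fingerprints: fragment sizes in kb for four BAC clones.
fingerprints = {
    "BAC_C": [18, 9, 22, 5],
    "BAC_A": [12, 7, 30, 4],
    "BAC_D": [22, 5, 15, 11],
    "BAC_B": [30, 4, 18, 9],
}
print(order_clones(fingerprints))
```

The walk could equally start from the other end clone, in which case the same order would be reported in reverse.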

Southern Blotting

This method is used to check for the presence of a specific DNA sequence in a DNA sample. It combines agarose gel electrophoresis with transfer of the separated DNA to a filter membrane for probe hybridization. First, restriction endonucleases cut the DNA into small fragments, and the fragments are electrophoresed on an agarose gel. If necessary, the DNA in the gel is broken into suitably small pieces, and it is then denatured in an alkaline solution. Next, a sheet of nitrocellulose is placed on the gel with even pressure so that the DNA transfers to it. The membrane is then baked in a high-temperature oven or exposed to ultraviolet radiation to fix the DNA permanently to the membrane. After this, a labeled hybridization probe is applied to the membrane, excess probe is washed away, and the pattern is visualized on X-ray film by autoradiography. ATP containing P-32 is used to label the probe, which is what makes autoradiography possible.
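The logic of the procedure can be mimicked on a sequence in software. The sketch below digests a made-up DNA sequence at the EcoRI recognition site (GAATTC, cut after the G), sorts the fragments by size as a gel would, and marks the fragments a probe could hybridize to. It assumes perfect, exact-match hybridization to either strand, and the function names and example sequence are invented.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut the top strand at every occurrence of the recognition site."""
    fragments, start = [], 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + cut_offset])
        start = position + cut_offset
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments

def reverse_complement(sequence):
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def southern_blot(dna, probe):
    """Return (fragment length, probe binds?) from largest to smallest fragment."""
    bands = []
    for fragment in sorted(digest(dna), key=len, reverse=True):
        # After denaturation both strands are on the membrane, so the probe
        # can pair with either the fragment or its complementary strand.
        binds = probe in fragment or reverse_complement(probe) in fragment
        bands.append((len(fragment), binds))
    return bands

sample = "AAAAGAATTCCCGGGTTTGAATTCTTTT"
probe = "AAACCCGG"
for size, binds in southern_blot(sample, probe):
    print(size, "band on film" if binds else "no signal")
```

Only the fragments that bind the probe would appear as bands on the X-ray film; the others are present on the membrane but remain invisible.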

Dye-terminator sequencing method

Another method of DNA sequencing is the dye-terminator sequencing method. This method can be performed in only one reaction, which is quite advantageous. Each of the four dideoxynucleotide chain terminators is labeled with a different fluorescent dye, and each dye emits at a different wavelength. The sequencing can then be done using computer-controlled sequence analyzers. One problem with this method is that there may be unequal peak heights and shapes in the electronic DNA sequence trace chromatogram. A way around this problem, however, is to use newer DNA polymerase enzymes and dyes that limit variability and dye blobs. This sequencing method is very commonly used, especially since it is quicker and less costly than other methods.

Methods of Sequencing

The Sanger dideoxy method is used to sequence DNA. It is a fast and simple process in which DNA polymerase synthesizes complementary strands that incorporate fluorescently tagged versions of the four chain-terminating nucleotides. The DNA fragments carrying the fluorescent bases are then separated by electrophoresis or chromatography and passed through a detector. Another method for sequencing genomic DNA is the shotgun method.

Edman degradation is used to sequence proteins. Phenyl isothiocyanate reacts with the amino group of the N-terminal amino acid, and the mixture is then acidified to remove the derivatized residue. High-pressure liquid chromatography (HPLC) is used to identify the amino acid. The process is repeated for each of the following amino acids in the chain.

DNA Replication

The sequential assembly and reorganization of arrays of proteins is crucial for the coordinated execution of the initiation, elongation, and termination steps of DNA replication. The physiological significance of this process is shown by the fact that defects in proteins associated with the assembly and monitoring of the replication fork can cause genomic instability, which can in turn result in carcinogenesis. Such defects can also lead to a series of diseases known as "chromosome instability syndromes".

A key characteristic of DNA replication in eukaryotic cells is that it is highly adaptable, or plastic. This plasticity is demonstrated by the way fork rate and origin selection regulate each other, and by the activation of otherwise inactive origins when forks are stalled. During replication, DNA helicase unwinds the double helix and DNA polymerase elongates the DNA chain.

Errors or defects in DNA replication can often result in harmful effects such as genomic instability. This type of error can cause mutations and diseases with abnormal tissue growth such as cancer and can also give rise to groups of diseases known as ‘chromosome instability syndromes’.

References

Hall, A.R.; Scott, A.; Rotem, D.; Mehta, K.K.; Bayley, H.; Dekker, C. (2010). Nature Nanotechnology. 5 (12): 874–7. doi:10.1038/nnano.2010.237. PMC 3137937. PMID 21113160. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3137937

Lippincott-Schwartz, J.; Patterson, G.H. (2003). "Development and use of fluorescent protein markers in living cells". Science. 300 (5616): 87–91. doi:10.1126/science.1082520. PMID 12677058.

Masai, H.; Matsumoto, S.; You, Z.; Yoshizawa-Sugata, N.; Oda, M. (2010). "Eukaryotic chromosome DNA replication: Where, when, and how?". Annual Review of Biochemistry. 79: 89–130. doi:10.1146/annurev.biochem.052308.103205. PMID 20373915.