

Sequencing the Human Genome: A Historical Perspective on Challenges for Systems Integration

Lee Rowen

Multimegabase Sequencing Center, Institute for Systems Biology, 1441 N. 34th Street, Seattle WA 98103

11.1. OVERVIEW

The sequence of the human genome was declared finished on April 14, 2003.¹ Analyses have been published in the journal Nature for chromosomes 6, 7, 14, 20, 21, 22 and Y, with the other chromosomes to follow in 2004. Although the Human Genome Project officially began in 1990, most of the publicly accessible sequence data were produced by 20 genome centers in six countries between 1999 and 2002. This group of centers, called the International Human Genome Sequencing Consortium, coordinated their mapping and sequencing efforts and freely shared materials, data and procedures [23]. The International Consortium in turn was supported by a network of funding agency program directors, database managers, resource providers, instrumentation/protocol developers, and conference organizers. In all, several thousand people made sure that the human genome got sequenced, and the world is rightly celebrating their accomplishment.

This being said, it is unlikely that anyone would claim that the effort to obtain a complete sequence of the human genome was efficient. Genomes are being sequenced now at significantly lower cost and higher efficiency thanks to strategies that matured during the course of the human genome project. Efficiency is possible when the requirements for coordination of personnel are low; techniques are robust, automated, and scalable; and the integration of cost-effective procedures into a high-throughput and streamlined system has already been achieved. When the project began in 1990, however, the most cost-effective paths to a finished sequence of the human genome were unclear and remained to be determined.

¹ See “International consortium completes human genome project” http://www.genome.gov/11006929.

The following historical perspective on how the human genome got sequenced is “internal,” meaning that it is written from a participant’s point of view. As the coordinator of a mid-sized genome center, the author personally experienced many of the developments that occurred during the course of the human genome project from 1990 to 2003. Like any good story now passing into legend, there are different ways to tell the tale, and there may not be universal agreement on what really happened in terms of facts and rationales for decisions. As for heroes and villains, it is this author’s charitable belief that the many characters engaged in the sequencing of the human genome were doing the best they could as they navigated the technological and political twists and turns of the undertaking. The story’s ultimate moral is that the genome project embodies a triumph of the human spirit along with a testimony to technological ingenuity and persistence.

In order to provide background for a subsequent discussion of the issues faced by the International Consortium, this review begins with a generic description of the approach used to sequence the human genome. A discussion of speciﬁc challenges for systems integration follows, using examples from various phases of the effort. Finally, a brief retrospective consideration of lessons learned that might be applicable to other large-scale technology development endeavors is offered.

11.2. APPROACHES USED TO SEQUENCE THE HUMAN GENOME

11.2.1. Overview

Looking at the big picture, the overarching design for sequencing the human genome entailed dividing individual donors’ genomic DNA (genomes) into manageable pieces (cloned genomic inserts; source clones), determining the sequences of the pieces (sequence reads; contigs; source clone sequences), and then reconstructing the sequence of an entire representative human genome from the sequences of overlapping pieces (overlapping source clone sequences), creating at the end one master sequence for each of the 24 chromosomes (Figure 11.1). The International Consortium used a hierarchical sequencing strategy whereby the genome was fragmented into source clones around 150 kilobases (kb) in size which were sequenced using a set of procedures to be described below [23]. The privately funded genome project led by Celera Genomics fragmented the genome into smaller 2 kb, 10 kb, and 50 kb cloned inserts and the overall sequence was assembled from sequence reads derived from the two ends of the cloned inserts using a “whole genome shotgun” approach [48]. For reasons of space, this review primarily covers the strategies used by the publicly funded human genome project.

In the hierarchical approach, the acquisition of sequence data encompassed four major processes:

mapping: determining the chromosomal location of source clones;

sequencing: obtaining raw sequence data for source clones;

assembly: reconstructing sequences of source clones from raw sequence data and the sequences of chromosomes from source clone sequences;

finishing: filling gaps, resolving assembly errors, obtaining high quality contiguous sequence over long stretches of genomic terrain.

[Figure 11.1 panels]
A. genomic DNA, followed by DNA fragmentation and subcloning into a suitable source clone vector
B. source clone DNA
C. source clone sequence
........gggctctcagagcatactactacagctacgacatacagcatac........
D. overlapping source clone sequences
........gggctctcagagcatactactacagctacgacatacagcatac........
                                 ctacgacatacagcatactttgcgcgctactacgacatacagactac......
E. chromosome sequence
..............gggctctcagagcatactactacagctacgacatacagcatactttgcgcgctactacgacatacagactac

FIGURE 11.1. Strategy for sequencing the human genome. A. Total genomic DNA is obtained from the sperm or cells of an individual, fragmented, subcloned into a suitable cloning vector, and stored as libraries of random source clones. B. A source clone is chosen for sequencing and, C, the sequence of a source clone is obtained. D. Overlapping source clone sequences are identified and, E, merged to create a chromosomal sequence.

While the substrate for mapping and sequencing was a source clone (to be described further below), assembly and ﬁnishing pertained both to source clones, that is, to small 150 kb pieces of the genome, and to chromosomes, whose sequences were constructed by the merging of overlapping source clone sequences (Figure 11.1).
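The merging step of Figure 11.1 (panels D and E) can be illustrated with a minimal sketch that uses the toy sequences printed in the figure. The function names `longest_overlap` and `merge` and the `min_len` threshold are illustrative assumptions, and exact suffix-to-prefix matching is a simplification: real sequence data contain errors and repeats, so production assemblers rely on error-tolerant alignments instead.

```python
def longest_overlap(left: str, right: str, min_len: int = 10) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(left: str, right: str, min_len: int = 10) -> str:
    """Merge two overlapping sequences into one, keeping the overlap once."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        raise ValueError("no exact overlap of at least min_len bases")
    return left + right[k:]


# Toy source clone sequences copied from panel D of Figure 11.1.
clone_1 = "gggctctcagagcatactactacagctacgacatacagcatac"
clone_2 = "ctacgacatacagcatactttgcgcgctactacgacatacagactac"

overlap = longest_overlap(clone_1, clone_2)
chromosome = merge(clone_1, clone_2)
print(overlap)
print(chromosome)
```

With these inputs the shared stretch is the 18-base string `ctacgacatacagcatac`, and the merged result reproduces the chromosome sequence shown in panel E.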

Systems integration pertains to the overall coordination of the mapping, sequencing, assembling and ﬁnishing of the genome, to the organization of the component steps for each of these processes, and to the various quality controls, validation procedures, and feedback mechanisms that ensured accuracy in the ﬁnal product. In the initial stages of the project (1990–1996), systems integration focussed primarily on procedures for sequencing source clones, with an effort to maximize efﬁciency and throughput. Here, the “pipeline” analogy prevailed. The idea is that ﬂuid ﬂows smoothly from point A to point B when things are working properly and leaks, blockages, backﬂows or diversions disrupt the ﬂow and, therefore, must be anticipated, attended to or prevented. In the later stages (2000–2003), the focus shifted to ﬁnishing a representative sequence for each human chromosome. The systems integration analogy at work in building chromosome sequences out of source clone sequences was more like conducting a symphony orchestra. Immense coordination and cooperation among the sequence centers was required to turn cacophony into glorious music.

11.2.2. Strategy Used for Sequencing Source Clones

Figure 11.2 summarizes the four major processes through which samples pertaining to source clones “ﬂowed” in the genome sequencing “pipeline.” The samples are of three types: biological/chemical materials (clones, DNA templates, sequencing reactions), images (ﬁngerprint patterns, dye peaks) and words (strings of A,G,C,T representing DNA sequences). These three data types challenged prevailing procedures for acquiring, storing, accessing, analyzing and sharing data throughout the course of the human genome project. Because of the scale involved, integrating the various processes of data acquisition put signiﬁcant pressure on the laboratory information management systems in place at each genome center.

11.2.2.1. Mapping (Source Clone Acquisition). In terms of source material for sequencing, the starting point for the publicly funded genome centers was not the genome-as-a-whole, as with the Celera effort, but rather pieces of representative genomes embedded in large-insert cloning vectors that enable propagation of the cloned insert in the bacterium Escherichia coli. Use of cloning vectors was required for the physical isolation of pieces of chromosomal DNA in a quantity sufficient for sequencing. Prior to 1995, cosmid vectors, which hold an average insert size of about 40 kilobases (kb), were typically used. After 1995, cosmids were replaced by PACs (P1 artificial chromosomes), which hold inserts around 80 kb [24], and then BACs (bacterial artificial chromosomes), which hold inserts up to about 250 kb [36, 43]. Given a human genome size of 3 gigabases, it would require 20,000 BAC clones with an average insert size of 150 kb to cover the entire genome once if they were laid end-to-end. Since genomic inserts are generated by semi-random processes, ensuring adequate clone coverage of the entire genome required the construction of clone libraries containing several “genome equivalents” of DNA inserts [8]. These clone libraries consisted of several hundred 384-well plates of frozen bacterial cultures from which any individual large-insert clone could be propagated, once its clone ID and plate address were determined.

Genome Sequencing Pipeline

Mapping
Steps: genetic marker identification; source clone library screening; fingerprinting and/or end sequencing; fluorescence in situ hybridization (used for validation)
Output: sequence-ready large-insert source clone

Sequencing
Steps: DNA extraction from source clone; shotgun library construction; DNA template preparation; sequencing reactions; machine loading and detection; base-calling
Output: sequence reads derived from clone fragments

Assembly
Steps: pairwise alignments; determining best overlaps; building contigs
Output: contiguous stretches of sequence ("contigs")

Finishing
Steps: gap-filling; resolution of low-quality regions; validation
Output: complete and accurate sequence of the source clone

FIGURE 11.2. Processing of samples through the genome sequencing pipeline. Each box represents the output of an overall process (arrow) that entailed several steps (see text).
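The library arithmetic in Section 11.2.2.1 can be checked with a short sketch. The genome size, insert size and plate format come from the text; the 10-fold library depth used in the example is a hypothetical value, and the probability estimate assumes clones are placed uniformly at random, which is an idealization of the "semi-random" processes the text describes.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # 3 gigabases (from the text)
INSERT_SIZE_BP = 150_000        # average BAC insert (from the text)
WELLS_PER_PLATE = 384           # plate format (from the text)


def clones_for_one_pass(genome: int = GENOME_SIZE_BP,
                        insert: int = INSERT_SIZE_BP) -> int:
    """Clones needed to cover the genome once if laid end-to-end."""
    return genome // insert


def plates_needed(genome_equivalents: float) -> int:
    """384-well plates needed to hold a library of the given depth."""
    clones = genome_equivalents * clones_for_one_pass()
    return math.ceil(clones / WELLS_PER_PLATE)


def prob_locus_missed(genome_equivalents: float) -> float:
    """Chance that a given locus is in no clone, under random placement.

    Each clone covers a fraction f of the genome, so N independent clones
    all miss a fixed locus with probability (1 - f) ** N.
    """
    f = INSERT_SIZE_BP / GENOME_SIZE_BP
    n = genome_equivalents * clones_for_one_pass()
    return (1.0 - f) ** n


one_pass = clones_for_one_pass()
plates_10x = plates_needed(10)       # hypothetical 10-fold library
missed_10x = prob_locus_missed(10)
print(one_pass, plates_10x, missed_10x)
```

Under these assumptions a 10-fold library occupies 521 plates, consistent with the "several hundred" plates mentioned above, and the chance of missing any particular locus is close to e^-10.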

Retrieving sequence-ready clones from a clone library was easy: one could buy a copy of the library for several thousand dollars or order individual clones from a distributor. Figuring out which clones to retrieve, however, was non-trivial. During the mid-90s, genome centers claimed entire chromosomes or portions of chromosomes for their sequencing targets. Regional sequencing required centers to identify large-insert clones containing genomic DNA from their chosen territory and not some other center’s real estate: thus a need for physical mapping. In physical mapping, information is obtained from the DNA insert of a source clone that allows inferences to be made about chromosomal location.

The major strategies employed for mapping source clones prior to sequencing include:

Library screening: Genetic markers, which are short stretches of sequence known to map to a specific chromosomal position, are used to make probes for finding matching sequences among the human genomic inserts in the clone library. This is done by hybridizing the probe to filters onto which a tiny amount of DNA from each of the clones in the library has been spotted in a known location. Matches between the probe and the filter are detected as small black circles on a film. From the position of the “positive” clones, the library addresses are “read” from the film using a location schema provided by the library manufacturer. Clone candidates are then retrieved from the library and retested with the probe to ensure that the genetic marker of interest is present. The overall screening process starting from probe design and ending with clone validation generally required at least a couple of weeks. Screens could be multiplexed (i.e., several probes combined into one hybridization) and the positive clones sorted into clusters at the validation stage.

Restriction digest ﬁngerprinting: Restriction enzymes (enzymes that cut DNA whenever they encounter a speciﬁc short recognition sequence such as AAGCTT) are used to generate “ﬁngerprints.” Fingerprints are distinctive patterns of DNA fragment sizes that reﬂect the frequency of the enzyme’s recognition sequence in the region of interest. After restriction enzyme digestion of the DNA of a source clone, the fragments are separated according to size using agarose gel electrophoresis. Overlap between clones is inferred from a subset of shared fragment sizes. While this approach in and of itself does not point to chromosome location, it is used to determine the order of clones in a cluster—if any one clone in an ordered cluster is positioned on a chromosome, then the chromosomal position of the whole cluster is known.

Clone insert end sequencing: Obtaining short sequence reads (~500 bases) from each of the two vector-insert joints of a clone, and looking for sequence matches with already-sequenced clones or known markers. End sequence matches allow more precise positioning of overlapping clones, so long as one of the clones has already been sequenced.

Fluorescence in situ hybridization (FISH): Labeling a large-insert clone with dye and hybridizing it to a metaphase chromosome spread and seeing which chromosomal band lights up under a microscope. This method was used primarily for validating a chromosomal location inferred from other mapping procedures [2]. The procedure is slow and requires a skilled technician to interpret the results.
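The restriction digest fingerprinting strategy listed above can be sketched in silico. This is a toy illustration, not the software the genome centers used: the functions `digest` and `shared_fragments`, the `tolerance` parameter and the synthetic test region are all hypothetical. The sketch treats each insert as a linear molecule, ignores vector sequence, and assumes the enzyme cuts one base into the AAGCTT site (the HindIII pattern).

```python
SITE = "AAGCTT"  # recognition sequence quoted in the text


def digest(seq: str, site: str = SITE, cut_offset: int = 1) -> list:
    """Sorted fragment lengths after cutting `seq` at every `site`.

    `cut_offset` is the cut position inside the site (assumed 1 here).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


def shared_fragments(fp_a: list, fp_b: list, tolerance: int = 0) -> int:
    """Count fragment sizes common to two fingerprints.

    Sizes match when they differ by at most `tolerance`, a stand-in for
    the limited sizing precision of a gel. Each fragment is used once.
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Synthetic 77-base "genomic region" with four sites (made up for the demo).
region = "C" * 5 + SITE + "G" * 7 + SITE + "C" * 11 + SITE + "G" * 13 + SITE + "C" * 17
clone_a = region[:60]                 # left part of the region
clone_b = region[15:]                 # right part, overlapping clone_a
clone_c = "G" * 9 + SITE + "C" * 21   # unrelated insert

fp_a, fp_b, fp_c = digest(clone_a), digest(clone_b), digest(clone_c)
print(fp_a, fp_b, shared_fragments(fp_a, fp_b), shared_fragments(fp_a, fp_c))
```

Only internal fragments are shared between overlapping clones; the end fragments differ because each insert starts and stops at a different position. With real gel data, chance matches between unrelated fragments of similar size are common, which is why redundant fingerprint data were needed to separate true from spurious overlaps.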

Most genome centers or their mapping collaborators initially employed some variation of the following mapping strategy (Figure 11.3). After performing a round of library screening and identifying a cluster of clones containing a genetic marker of interest, one of the clones, called a “seed,” would be sequenced. From unique sequence at the ends of the seed clone insert, new probes for screening were designed for the purpose of procuring a new batch of clones that would overlap the seed clone. Because the average distance between mapped genetic markers was greater than the average length of the clone insert, multiple probes were necessary for obtaining contiguous clone coverage of a megabase-sized region. Long stretches of overlapping clones and clone sequences were thereby obtained through an iterative screen-sequence-screen-sequence approach. Local ﬁngerprinting and end sequencing were used to make the ordering of the clones in a cluster or region precise. In order to generate a steady supply of mapped clones for sequencing, multiple library screenings had to be done in parallel.

[Figure 11.3. Labels in the original image: chr15 marker 1; chr15 marker 2; new probe; seed clone; extending clone; BAC end; restriction digest fingerprints (fragments ordered by size); BAC ends from central resource; sequence; overlapping clone.]

FIGURE 11.3. Construction of tiling paths. A. Clusters of clones (grey boxes) are identified from a screen of a BAC library using markers from human chromosome 15. One of the clones in each cluster is then sequenced (seed clone). From the sequences of the seed clones, new probes for screening are designed (small black boxes) and new clusters of overlapping BACs identified (blank boxes), one of which is sequenced (extending clone). The extent of the overlaps is estimated using either BAC end sequencing (arrows, hatched boxes) or restriction digest fingerprinting. B. Regional tiling paths are generated from minimally overlapping clones. C. The sequences of the overlapping seed and extending clones are merged. An even longer sequence would be produced after determining the sequence of the clone joining the two clusters.

As the sequencing phase of the genome project scaled up in the mid to late '90s, it became patently clear that the slow and laborious library screening approach could not supply enough mapped clones to feed the machines, and that large-scale and centralized resources for mapping were required [38, 47]. Between 1997 and 1999, the University of Washington and The Institute for Genomic Research generated BAC end sequences from several thousand clones in two BAC libraries, RPCI11 and Caltech D [26, 51]. With the BAC end sequence resource in hand, clones with minimal overlaps to sequenced BACs could be identified by searching the genome sequence sampling database in GenBank for unique matches to BAC ends. In an independent effort, Washington University at St. Louis built a mapping resource by fingerprinting thousands of BACs from the same two libraries [22, 27]. When large numbers of fingerprinted BACs became available in 1999–2000, clones could be clustered by fragment patterns, and their approximate order within the cluster inferred. Because the method is imprecise, a highly redundant supply of fingerprinted clones was required for distinguishing true from spurious overlaps.

Use of a combination of mapping procedures enabled genome centers to construct “tiling paths,” that is, ordered arrays of clones containing inserts from overlapping portions of the genome (Figure 11.3). From these tiling paths, long stretches of chromosomal sequence were reconstructed by merging the sequences of overlapping clones in the tiling path.
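Once clones have been positioned, choosing a tiling path of minimally overlapping clones amounts to covering an interval with as few clones as possible. The following is a minimal sketch under strong assumptions: every clone is taken to have known start and end coordinates on the region (in practice these were only estimated from fingerprints and end sequences), and the `Clone` type, the function `greedy_tiling_path`, the `min_overlap` parameter and the example coordinates are all hypothetical.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # position of the insert on the region, in kb
    end: int


def greedy_tiling_path(clones, region_start, region_end, min_overlap=0):
    """Pick clones left to right, always taking the one that reaches farthest.

    Raises ValueError when no clone can extend the path, i.e. when there
    is a gap in clone coverage.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        # The first clone only has to reach the region start; later clones
        # must overlap the previous one by at least `min_overlap`.
        latest_start = covered_to - (min_overlap if path else 0)
        best = None
        while i < len(ordered) and ordered[i].start <= latest_start:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best.end
    return path


# Hypothetical clones on a 420 kb region.
clones = [
    Clone("seed", 0, 150),
    Clone("ext1", 40, 190),
    Clone("ext2", 120, 280),
    Clone("ext3", 140, 260),
    Clone("ext4", 250, 400),
    Clone("ext5", 270, 430),
]
path = greedy_tiling_path(clones, 0, 420, min_overlap=5)
print([c.name for c in path])
```

In this example the path is seed, ext2, ext5. Removing ext2 and ext3 would leave nothing that overlaps the end of the best remaining clone, and the function would report a gap, which mirrors the tiling-path gaps described in the text.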

Even though mapping procedures were slow, labor-intensive, and tedious, they usually worked. Some problems did on occasion occur:

The genetic marker used for a library screen turned out to map to the wrong chromosome or to more than one chromosome. This problem could be detected by genome centers fortunate enough to have in-house FISH capacity.

Library screens, or searches of the centralized mapping resources, yielded no positive clones, thereby leaving gaps in the chromosomal tiling path.

Mapping data based on ﬁngerprints and end sequences gave conﬂicting results, meaning that the region of interest in the genome was duplicated, or was signiﬁcantly different among individuals due to polymorphic variations.

Resolution, or attempts at resolution, of these problems generally occurred late in the game for the genome project, i.e., after year 2000.

11.2.2.2. Sequencing (Accumulation of Sequence Reads) As will be discussed later in this review, there were heated debates in the early ‘90s over sequencing strategies, yet by about 1995, the approach called “shotgun sequencing” [10] had become widely accepted. As the size of clonable genomic inserts increased (e.g., from 35 kb cosmids to 150 kb BACs), the ratio between obtainable sequence read length (only about 400 to 1000 bases) and clone insert length decreased, meaning that large numbers of overlapping sequence reads were required to reconstruct a contiguous and accurate sequence of a source clone. In shotgun sequencing, a source clone is fragmented such that positional information of the fragments is lost, and only regained after assembly of the sequence reads generated from the subcloned fragments (Figure 11.4). The “shotgun” analogy is that prior to assembly, sequence reads splatter across a virtual consensus sequence of the source clone. Randomly generated fragments sufﬁcient to cover the source clone many times over must be sequenced in order to ensure adequate coverage from the overlapping reads (redundancy).
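The need for redundancy can be made concrete with a rough estimate. The sketch below uses the Lander-Waterman model, which is not mentioned in the text and is brought in here as an assumption: reads are taken to start uniformly at random along the clone, and any overlap at all is assumed sufficient to join two reads. The 150 kb clone and 500-base read length follow the text and Figure 11.4; the 8-fold redundancy and the function names are hypothetical.

```python
import math


def reads_for_redundancy(clone_len: int, read_len: int, redundancy: float) -> int:
    """Number of reads whose combined length is `redundancy` clone lengths."""
    return math.ceil(redundancy * clone_len / read_len)


def shotgun_stats(clone_len: int, read_len: int, n_reads: int):
    """Idealized coverage statistics for a shotgun project.

    Returns (coverage, expected fraction of bases left uncovered,
    expected number of contigs), using the Lander-Waterman estimates
    exp(-c) and N * exp(-c) with no minimum-overlap requirement.
    """
    coverage = n_reads * read_len / clone_len
    frac_uncovered = math.exp(-coverage)
    expected_contigs = n_reads * math.exp(-coverage)
    return coverage, frac_uncovered, expected_contigs


CLONE_LEN = 150_000  # BAC insert, in bases
READ_LEN = 500       # read length, in bases

n_reads = reads_for_redundancy(CLONE_LEN, READ_LEN, 8)  # hypothetical 8-fold
coverage, frac_uncovered, expected_contigs = shotgun_stats(CLONE_LEN, READ_LEN, n_reads)
print(n_reads, coverage, frac_uncovered * CLONE_LEN, expected_contigs)
```

At 1-fold redundancy the same model leaves roughly 37% of the clone uncovered, which is why reads covering the clone many times over were needed. Real projects fall short of these idealized figures because cloning is not perfectly random and because usable overlaps must be longer than a single base.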

[Figure 11.4. Stages shown in the original image, top to bottom: source clone insert (~150 kb); short randomly generated subcloned fragments (~1.5-4 kb); assembled sequence reads (reads, contigs and gaps); assembled finishing reads; high quality consensus sequence (148,753 bases).]

FIGURE 11.4. Strategy for shotgun sequencing. A source clone is fragmented, and the fragments of an optimal size range are subcloned into a phage or plasmid vector. After preparation of DNA from the subclones, 500-base sequence reads are generated from one or both ends of the insert, and assembled using pairwise alignments to generate contigs. Gaps between contigs and low quality regions are resolved by obtaining additional sequence (finishing reads), after which a high quality consensus sequence for the source clone is determined from the best set of reads.

Most genome centers used variations of the following generic procedures for generating the shotgun sequence reads from a source clone [39].

Source clone DNA preparation: Preparing DNA from mapped cosmid, PAC, or BAC clones with a minimal amount of contaminating E. coli chromosomal DNA.

Fragmentation: Randomly shearing the source clone DNA into short fragments using sonication, nebulization, or mechanical shearing by passage through a needle at high pressure.

Size selection: Purifying fragments of an optimal size range (usually 1.5–4 kb) suitable for subcloning and sequencing.

Subcloning: Ligating fragments into a viral (phage M13) vector or a plasmid (typically pUC18) vector, and transforming the ligation mixture into E. coli to generate single recombinant plaques or colonies, each harboring a subclone containing a fragment of the source clone.

Template DNA preparation: Isolating the recombinant plasmid or viral DNA from a single plaque or colony culture of E. coli.

Sequencing: Performing sequencing reactions on the puriﬁed template DNA using premixes of primer (required for DNA replication), deoxynucleoside triphosphate (dNTP) substrates (DNA building blocks), dideoxynucleoside triphosphate substrates (ddNTPs) to terminate DNA replication at random locations, buffers, and a suitable DNA polymerase [42]. Primers are designed to be complementary to a portion of the cloning vector sequence several bases short of the vector-insert joint, so that the same
