Determination of Complex Reaction Mechanisms.pdf

MINI-INTRODUCTION TO BIOINFORMATICS

Fig. 13.4 A simple Boolean network. (a) Wiring diagram. (b) Logical Boolean rules. (c) State transition table which defines the network. The input column corresponds to time t, the output column to time t + 1. (Figure and caption taken from [8], with permission.)

We have already introduced that subject in chapter 9 (see eqs. (9.2)–(9.5)). There we chose a new measure of a correlation distance, the mutual information. A natural measure of such a distance between the probability distributions of two variables is the number of states jointly available to them (the size of the support set) compared to the number of states available to them individually. Thus, two probability distributions are close and the support set is small, if the knowledge of one predicts the most likely state of the other, even if there exists simultaneously a substantial number of other states. Hence from the gene expression distribution calculated in the transition table, and the measured expression distribution, a mutual information can be calculated and the search for an appropriate network proceeds until a minimum mutual information is obtained.
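As a rough illustration of the quantity involved, the sketch below estimates the standard Shannon mutual information between two discretized expression series from their observed frequencies. It is only a minimal example under stated assumptions: the function name `mutual_information` and the sample data are hypothetical, and the specific correlation distance built from the mutual information in chapter 9 is not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Shannon mutual information (in bits) between two equally long
    sequences of discrete states, estimated from observed frequencies."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint_counts = Counter(zip(xs, ys))  # counts of joint states (x, y)
    x_counts = Counter(xs)               # marginal counts of x
    y_counts = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint_counts.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (x_counts[x] * y_counts[y]))
    return mi


# Hypothetical on/off expression states (1 = expressed, 0 = not expressed).
predicted = [0, 0, 1, 1]
measured_same = [0, 0, 1, 1]    # fully determined by `predicted`
measured_indep = [0, 1, 0, 1]   # statistically independent of `predicted`

print(mutual_information(predicted, measured_same))
print(mutual_information(predicted, measured_indep))
```

With these toy series, knowledge of one variable either fixes the other completely or tells nothing about it, which are the two limiting cases of the measure.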

Other methods are available (see [9]). In that paper, calculated illustrations are given on the number of calculations necessary for Boolean network identification. For example, for a connectivity of 2 genes to each gene only 50 input–output pairs in the transition table had to be calculated from random guesses to identify a gene network of 320 genes.
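To make the notion of a state transition table concrete, the following sketch tabulates the table for a small three-gene Boolean network with synchronous updating. The update rules in `RULES` are hypothetical and chosen only for illustration; they are not read off fig. 13.4, and the names `step` and `transition_table` are likewise assumptions of this example.

```python
from itertools import product

GENES = ("A", "B", "C")

# Hypothetical Boolean update rules (an assumption, not taken from fig. 13.4).
# Each rule maps the state of all genes at time t to one gene's state at t + 1.
RULES = {
    "A": lambda s: s["B"],
    "B": lambda s: s["A"] or s["C"],
    "C": lambda s: (s["A"] and s["B"]) or (s["B"] and s["C"]) or (s["A"] and s["C"]),
}


def step(state):
    """Apply all rules synchronously: tuple of 0/1 at time t -> tuple at t + 1."""
    s = dict(zip(GENES, state))
    return tuple(int(bool(RULES[g](s))) for g in GENES)


def transition_table():
    """Input state (time t) -> output state (time t + 1) for all 2^n inputs."""
    return {state: step(state) for state in product((0, 1), repeat=len(GENES))}


for inp, out in transition_table().items():
    print(inp, "->", out)
```

Network identification runs this logic in reverse: given observed input–output pairs, one searches for wiring and rules that reproduce them.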

13.5 Correlation Metric Construction for Genetic Networks

An application of time-lagged correlation analysis (CMC), following the work presented in chapter 7, to a gene network of 259 genes from a photosynthetic cyanobacterium has been reported by Schmitt et al. [10]. The gene network is perturbed with a flux of light, and the responses of the genes (the gene transcription) are recorded on DNA microarrays at 20 min intervals for a period of 8–16 hours. An idealized example is shown in fig. 13.5; the presence of time lags indicates a cascade of biochemical reactions. The experimental setup and the two types of light intensity profiles used are shown in fig. 13.6. Each of the time measurements of the transcription responses is used to form time-lagged correlations with the time-varying light flux. Some results are shown in fig. 13.7. The first subdivision of the responses to the light flux is obtained from the maximum of the correlations; in group I that maximum occurred after a 20 min time lag, in group II after a 40 min time lag, and so on. Within each time group, I–IV, further groupings are obtained by an analysis based on the profiles of the responses in the transcription. A listing of the genes in each group in fig. 13.7 is given in table 13.1.

Fig. 13.5 Idealized gene expression of expected experimental results. The influx of light causes transcription in gene 1 with a time lag τ1. (Taken from [10], with permission.)

Fig. 13.6 Experimental setup and the two types of light intensity profiles used in the experiments. (Taken from [10], with permission.)

Fig. 13.7 Groupings of genes obtained from the time-lagged correlation analysis, groups I–IV, and further separations into numbered groups 2–50 obtained from an analysis of the profiles of the transcription responses. The genes in each numbered group are listed in table 13.1. Dark arrows indicate close correlation, a dotted line indicates inverse correlation, and a straight line indicates zero time lag. (Taken from [10], with permission.)

For a summary of this work, we quote from [10]:

Although substantial effort is required to plan and perform this type of experiment, an enormous amount of information is obtained. The directionality of the resulting networks provides more information than clustering alone, and therefore allows the researcher to generate hypotheses based on the system structure. Additionally, it is important to consider similarly expressed genes as potential regulon members. Regulons are sets of coregulated genes with common promoter regions differing from operons in that they are not necessarily sequentially oriented in the genome. To this end, genes with the same time-lagged correlations may be considered as good regulon candidates. . . . This suggests that dynamic studies of transcriptional behavior with significant number of time points can play a key role in understanding cellular regulation.

We cheerfully agree.

See also [23] for another correlation analysis of a gene network.
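The grouping by time lag described in this section can be sketched as follows. This is a minimal illustration, not the procedure of [10]: it assumes a plain Pearson correlation between the light flux at time t and the expression at time t + lag, equally spaced samples, and made-up data; the function names `pearson`, `lagged_correlation`, and `best_lag` are hypothetical.

```python
from math import sqrt

SAMPLING_INTERVAL_MIN = 20  # microarray samples every 20 min, as in the text


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / sqrt(sxx * syy)


def lagged_correlation(light, expr, lag):
    """Correlation between light(t) and expr(t + lag); lag in sample steps."""
    if lag == 0:
        return pearson(light, expr)
    return pearson(light[:-lag], expr[lag:])


def best_lag(light, expr, max_lag):
    """Lag (in sample steps) at which the lagged correlation is largest."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(light, expr, k))


# Made-up light profile and a gene that follows it two samples later.
light = [0, 1, 3, 2, 5, 4, 1, 0, 2, 6, 3, 1, 0, 4, 2, 1]
gene = [0, 0] + light[:-2]

lag = best_lag(light, gene, max_lag=4)
print("time lag of maximum correlation:", lag * SAMPLING_INTERVAL_MIN, "min")
```

In this toy case the maximum falls at two sampling intervals, so the gene would be placed with those responding after a 40 min lag.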

13.6 Bayesian Networks

In chapter 7 we introduced concentration correlation functions of two species. In chapter 9 that concept was generalized to more than two species. Here we go all the way to the consideration of n species, with some limitations, by means of Bayesian networks [11–13]; we follow the presentation in [12] and [13], but much detail needs to be omitted. There is a very large literature on this subject and some of it is cited in [11–13] and in the review articles cited at the end of this chapter.

Table 13.1 Listing of the genes in each numbered group shown in fig. 13.7. (Taken from [10], with permission.)

A Bayesian network over a set of n species represents a joint probability distribution over all the species. The network is restricted to a directed acyclic graph; there are no feedforward or feedback loops, and the word “directed” dictates that the flow from one species to another is in one direction only. These are limitations to which we shall return later. A further specification is a conditional probability distribution for each variable given its parents in the graph G.

Fig. 13.8 Example of a Bayesian network consisting of five species. (Taken from [12], with permission.)

An example of a Bayesian network is given in fig. 13.8. The concentrations of the five species are random variables; each depends on the concentrations of its parents, but not on other species. The parent of C is B; the parents of B are A and E. The connectivity of the network structure in fig. 13.8 states that there is a probability P(A) that the concentration of A has a given value; a probability P(B|A,E) that B has a given value given the concentrations of A and E, the parents; and similarly for P(C|B), P(D|A), and P(E). The joint probability distribution, with the cited restrictions, is a product of these distributions, that is,

$$P(A, B, C, D, E) = P(A)\,P(B \mid A, E)\,P(C \mid B)\,P(D \mid A)\,P(E) \tag{13.1}$$
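The factorization in eq. (13.1) can be checked numerically with a small sketch. It assumes, purely for illustration, that each concentration is discretized to two levels (0 = low, 1 = high); all conditional probability values and the names `bern` and `joint` are hypothetical and not taken from [12].

```python
from itertools import product


def bern(p_one, value):
    """Probability that a binary variable takes `value`, given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


# Hypothetical conditional probability tables: each entry is P(variable = 1 | parents).
P_A1 = 0.3
P_E1 = 0.6
P_B1_GIVEN_AE = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9}  # key: (a, e)
P_C1_GIVEN_B = {0: 0.2, 1: 0.8}                                      # key: b
P_D1_GIVEN_A = {0: 0.4, 1: 0.9}                                      # key: a


def joint(a, b, c, d, e):
    """P(A, B, C, D, E) as the product of the five factors in eq. (13.1)."""
    return (bern(P_A1, a)
            * bern(P_B1_GIVEN_AE[(a, e)], b)
            * bern(P_C1_GIVEN_B[b], c)
            * bern(P_D1_GIVEN_A[a], d)
            * bern(P_E1, e))


# The 32 joint states must carry total probability 1.
total = sum(joint(*state) for state in product((0, 1), repeat=5))
print(round(total, 12))
```

Only 1 + 1 + 4 + 2 + 2 = 10 numbers specify the distribution here, compared with 31 for an unrestricted joint distribution over five binary variables; this economy is what the graph structure buys.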

For the analysis of a gene network, gene expression is taken to be a probabilistic process and the level of gene expression of each gene is a random variable. The object of the analysis is the calculation of the joint probability distribution over the set of genes, not necessarily of all genes at a time but of a limited number, and to estimate its structure. There may be different but equivalent graphs that represent the observed distribution.

A generalization of eq. (13.1) is given by

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}_i^G\right) \tag{13.2}$$

where X_1, ..., X_n are random variables, and Pa_i^G is the set of parents of X_i in the graph G. A Bayesian graph implies certain properties of conditional independence (called Markov independence): each variable X_i is independent of its nondescendants, given its parents in G. Two Bayesian graphs may have the same set of dependences and independences. For example, in the graphs X → Y and Y → X the variables X and Y are dependent; the two graphs are not the same but they are equivalent. In comparing with experiments one cannot distinguish among equivalent graphs.
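The equivalence of the two-variable graphs follows from the product rule alone, as the short derivation below shows: either direction of the arrow factorizes the same joint distribution, so any distribution representable by one graph is representable by the other.

```latex
\begin{aligned}
X \rightarrow Y:&\quad P(X, Y) = P(X)\,P(Y \mid X) \\
Y \rightarrow X:&\quad P(X, Y) = P(Y)\,P(X \mid Y) \\
\text{and, by the product rule,}&\quad P(X)\,P(Y \mid X) = P(Y)\,P(X \mid Y)
\end{aligned}
```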

The comparison of an assumed Bayesian network with experiments proceeds by a learning process of the network; any prior knowledge of the system is entered deterministically, the other variables are random. To judge the progress of any learning, a statistically motivated scoring function needs to be introduced and evaluated. The search for a suitable network seeks an optimal score in this function. There are several ways of doing this; in [13] a score S(G : D) is defined as the logarithm of the posterior probability of a graph G, given the data D (of the assumed network), P(G|D):

$$S(G : D) = \log P(G \mid D) \tag{13.3}$$

The Bayesian probability formula is now invoked:

$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D)} \tag{13.4}$$

which relates the posterior probability P(G|D) to the likelihood P(D|G) and the prior probability P(G). We thus have

$$S(G : D) = \log P(D \mid G) + \log P(G) + C \tag{13.5}$$

where C is a constant independent of G and

$$P(D \mid G) = \int P(D \mid G, \Theta)\,P(\Theta)\,d\Theta \tag{13.6}$$
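For a concrete feel for eqs. (13.5) and (13.6), the sketch below scores two candidate graphs over two binary variables. It rests on assumptions that are not in the text: every variable is binary, each conditional probability has an independent uniform prior (so the integral in eq. (13.6) reduces to a product of Beta functions), and the graph prior and the constant C are dropped because they do not affect the comparison here. The function names and the data are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_marginal_binary(n0, n1):
    """log of the integral of theta^n1 (1 - theta)^n0 over a uniform prior
    on theta, i.e. log[ n0! n1! / (n0 + n1 + 1)! ]."""
    return lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)


def log_p_data_given_graph(data, parents):
    """log P(D|G) for binary data.

    data:    list of dicts mapping variable name -> 0 or 1
    parents: dict mapping variable name -> tuple of its parents in G
    """
    total = 0.0
    for var, pa in parents.items():
        # Count outcomes of `var` separately for each parent configuration.
        counts = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        for cfg in {cfg for cfg, _ in counts}:
            total += log_marginal_binary(counts[(cfg, 0)], counts[(cfg, 1)])
    return total


def score(data, parents, log_graph_prior=0.0):
    """S(G : D) of eq. (13.5) up to the G-independent constant C."""
    return log_p_data_given_graph(data, parents) + log_graph_prior


# Hypothetical data in which Y always copies X.
data = [{"X": v, "Y": v} for v in [0] * 10 + [1] * 10]

with_edge = {"X": (), "Y": ("X",)}  # graph X -> Y
no_edge = {"X": (), "Y": ()}        # X and Y unconnected

print("score with edge X -> Y:", score(data, with_edge))
print("score without edge:    ", score(data, no_edge))
```

Because Y is fully determined by X in these data, the graph with the edge receives the higher score; a search over graphs would retain it over the unconnected alternative.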

Fig. 13.9 Two subnetworks that show discovered features: (a) iron homeostasis; (b) mating response. The widths of the arcs correspond to the confidence in the feature. An edge is directed only if there is a high confidence in its orientation. Nodes circled with a dashed line correspond to genes that have been mutated in some of the samples. Arcs marked with a + sign are activators, with the size indicative of the confidence in this feature. (Figure and caption, with minor rewording, are taken from [13], with permission.)

Fig. 13.10 Global view and higher-order organization of modules. The graph depicts inferred modules (middle; numbered squares), their significantly enriched cis-regulatory motifs (right), and their associated regulators (left; ovals with black border for transcription factors or with green border for signal transduction molecules). Modules are connected to their significantly enriched motifs by solid blue lines. Module groups consisting of sets of modules that share a common motif, and their associated motifs, are enclosed in bold boxes. Only connected components that include two or more modules are shown. Motifs connected to all modules of their component are marked in bold. Modules are also connected to their predicted regulators. Red edges between a regulator and module are supported in the literature: either the module contains genes that are known targets of the regulator; or upstream regions of genes in the module are enriched for the cis-regulatory motif known to be bound by the regulator. Regulators tested experimentally are marked in yellow. Module groups are defined as sets of modules that share a single significant cis-regulatory motif. Module groups whose modules are functionally related are labeled (right). Modules belonging to the same module group seem to share regulators and motifs, with individual modules having different combinations of these regulatory elements. (Figure and caption, with minor changes, taken from [14], with permission.) (See color insert.)
