
9 Density Estimation

9.1 Entropy Metric Construction (EMC)

In this chapter (based on [1]) we develop further the theory of the correlation method introduced in chapter 7. Consider the expression of the pair correlation function

$$S_{ij}(\tau) = \iint \left[x_i(t) - \bar{x}_i\right]\left[x_j(t+\tau) - \bar{x}_j\right] p\!\left(x_i(t),\, x_j(t+\tau)\right) dx_i\, dx_j \qquad (9.1)$$

where an ensemble average replaces the average over a time series of experiments on a single system. The pair correlation function defined in eq. (7.1) is the second moment of the pair distribution function and is obtained from eq. (9.1) by integration.

We choose a new measure of the correlation distance, one based on an information theoretical formulation [2,3]. A natural measure of the correlation distance between two variables is the number of states jointly available to them (the size of the support set) compared to the number of states available to them individually. We therefore require that the measure of the statistical closeness between variables X and Y be the fraction of the number of states jointly available to them versus the total possible number of states available to X and Y individually. Further, we demand that the measure of the support sets weighs the states according to their probabilities. Thus, two variables are close and the support set is small if the knowledge of one predicts the most likely state of the other, even if there exists simultaneously a substantial number of other states [4–6].

The information entropy gives the distance we demand in these requirements. The effective size of the support set of a continuous variable is [7]

$$S(X) = e^{h(X)} \qquad (9.2)$$


in which the entropy h(X ) is defined by

$$h(X) = -\int_S p(x) \log p(x)\, dx \qquad (9.3)$$

where $S(X)$ is the support set of $X$ and $p(x)$ is the probability density of $X$. Similarly, we denote the entropy of a pair of continuous variables $X, Y$ as $h(X, Y)$, which is related to the pair distribution function $p(x, y)$ by an equation analogous to eq. (9.3). The mutual information $I(X;Y)$ between two stochastic variables is

$$I(X;Y) = h(X) + h(Y) - h(X,Y) \qquad (9.4)$$

or, in terms of the effective sizes of the support sets,

$$\frac{S(X,Y)}{S(X)\,S(Y)} = \frac{e^{h(X,Y)}}{e^{h(X)}\, e^{h(Y)}} = e^{h(X,Y) - h(X) - h(Y)} = e^{-I(X;Y)} \qquad (9.5)$$
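As a rough numerical illustration (not from the book), the entropies and support-set ratio of eqs. (9.2)–(9.5) can be estimated from data with simple histograms. The sample size, bin width, and the synthetic correlated pair below are assumptions made for this sketch:

```python
import math
import random

random.seed(0)
n, w = 20000, 0.25              # sample size and bin width (assumed)
xs, ys = [], []
for _ in range(n):              # synthetic correlated pair: Y = X + noise
    x = random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(x + random.gauss(0.0, 0.5))

def discrete_entropy(counts, total):
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def h1(data, w):
    """Histogram estimate of the differential entropy h(X), eq. (9.3)."""
    counts = {}
    for v in data:
        k = math.floor(v / w)
        counts[k] = counts.get(k, 0) + 1
    return discrete_entropy(counts, len(data)) + math.log(w)

def h2(dx, dy, w):
    """Histogram estimate of the joint entropy h(X, Y)."""
    counts = {}
    for x, y in zip(dx, dy):
        k = (math.floor(x / w), math.floor(y / w))
        counts[k] = counts.get(k, 0) + 1
    return discrete_entropy(counts, len(dx)) + 2.0 * math.log(w)

hx, hy, hxy = h1(xs, w), h1(ys, w), h2(xs, ys, w)
mi = hx + hy - hxy                                       # eq. (9.4)
ratio = math.exp(hxy) / (math.exp(hx) * math.exp(hy))    # eq. (9.5)
print(mi, ratio)   # small ratio <=> strong correlation
```

The ratio equals $e^{-I(X;Y)}$ by construction, so a strongly correlated pair yields a joint support set much smaller than the product of the individual support sets.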

We define an EMC correlation distance based on information entropy as the minimum of eq. (9.5) over the delay $\tau$:

$$d_{xy} = \min_{\tau} \frac{S(X_\tau, Y)}{S(X_\tau)\, S(Y)} = \min_{\tau} e^{-I(X_\tau; Y)} \qquad (9.6)$$

If the correlation of the species X and Y is Gaussian, then the expression (9.6) for the EMC distance usually leads to a result similar to that of the CMC distance (eqs. (7.3) and (7.4)).
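This can be made concrete: for jointly Gaussian variables the mutual information has the closed form $I(X;Y) = -\tfrac{1}{2}\ln(1-\rho^2)$, so the EMC distance $e^{-I}$ reduces to $\sqrt{1-\rho^2}$, a monotone function of $|\rho|$, which is why EMC and CMC order species similarly in the Gaussian case. A minimal sketch (the sample correlation values are illustrative):

```python
import math

def emc_distance_gaussian(rho):
    """EMC distance e^{-I} for a bivariate normal with correlation rho."""
    mi = -0.5 * math.log(1.0 - rho * rho)   # closed-form I(X;Y)
    return math.exp(-mi)                     # equals sqrt(1 - rho**2)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, emc_distance_gaussian(rho))
# the distance falls from 1 (independent) toward 0 (perfect correlation),
# a monotone function of |rho|, like the CMC distance of eqs. (7.3)-(7.4)
```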

The CMC and EMC distances satisfy the first three requirements of a metric:

$$\begin{aligned}
(1)\quad & d_{xy} \ge 0\\
(2)\quad & d_{xy} = d_{yx}\\
(3)\quad & d_{xy} = 0 \ \text{iff}\ x = y\\
(4)\quad & d_{xz} + d_{zy} \ge d_{xy}
\end{aligned} \qquad (9.7)$$

but not the triangle inequality, condition (4) in eq. (9.7). This situation can be remedied by the technique of stress minimization [8]. In practice, the predicted reaction pathway frequently differs little whether or not the stress minimization technique is applied.
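Condition (4) is straightforward to test on a computed distance matrix; a small sketch, where the two 3×3 matrices are hypothetical and not actual EMC output:

```python
# Check the triangle inequality, condition (4) of eq. (9.7), over all triples.
def violates_triangle(d):
    """Return True if some triple breaks d_xz + d_zy >= d_xy."""
    n = len(d)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if d[x][z] + d[z][y] < d[x][y] - 1e-12:
                    return True
    return False

ok  = [[0.0, 0.4, 0.5], [0.4, 0.0, 0.3], [0.5, 0.3, 0.0]]   # a true metric
bad = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]   # violates (4)
print(violates_triangle(ok), violates_triangle(bad))   # False True
```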

It is helpful to treat a specific example of a reaction mechanism to illustrate the calculation of EMC distances and the prediction of a reaction pathway from simulated time series of concentrations. We compare these results with predictions of the CMC method. The example is chosen to show the differences between these two approaches, and to indicate the origin of these differences. The example is the mechanism

$$A \;\underset{k_{-1}}{\overset{k_A}{\rightleftharpoons}}\; X_1 \;\underset{k_{-2}}{\overset{k_1}{\rightleftharpoons}}\; X_2 \;\underset{k_{-3}}{\overset{k_2}{\rightleftharpoons}}\; X_3 \;\underset{k_{-4}}{\overset{k_3}{\rightleftharpoons}}\; X_4 \;\underset{k_{-5}}{\overset{k_4}{\rightleftharpoons}}\; X_5 \;\underset{k_{-6}}{\overset{k_5}{\rightleftharpoons}}\; X_6 \;\underset{k_{-7}}{\overset{k_6}{\rightleftharpoons}}\; X_7 \;\underset{k_{-8}}{\overset{k_7}{\rightleftharpoons}}\; B \qquad (9.8)$$

All of the reactions are first order with constant rate coefficients, except for the A to X1 step, which is enzyme catalyzed. The forward rate coefficients inside the chain are $k_i = 0.7$ for all $i$. The backward rate coefficients inside the chain are $k_{-2} = 0.3$, $k_{-5} = 0.2$, and $k_{-i} = 0.1$ otherwise. The species A and B are held at constant concentrations, with A = B = 1, and X8 is varied in a prescribed way (i.e., X8 is chosen to be the input species into the system, with unit influx coefficient). Finally, the enzyme-catalyzed step has the effective rate coefficient $k_A = 60.0\,X_8/[(40.0 + X_8)(60.0 + X_8)]$.

The difficulties in this example arise from the self-inhibition of the enzyme catalysis by X8. The rate coefficient kA first increases with increasing concentration of X8 and then decreases. The hypersurface formed by eliminating the time dependence from the set of deterministic rate equations, by dividing the equation for each but one of the species by the equation for that one species, is folded over due to the quadratic dependence of kA on X8. In the simulation the concentration of X8 is varied randomly and the responses of the other species are calculated to give time series of 2,000 data points; these series are the starting point for both the EMC and the CMC analysis.
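A minimal simulation sketch of this mechanism follows, using the rate constants quoted in the text. The Euler step size, the bounded random walk driving X8, and the initial concentrations are illustrative assumptions, not the book's exact protocol:

```python
import random

random.seed(1)
kf = [0.7] * 7                        # forward rates k1..k7 inside the chain
kb = {i: 0.1 for i in range(1, 9)}    # backward rates k-1..k-8
kb[2], kb[5] = 0.3, 0.2               # the two exceptions quoted in the text
A = B = 1.0                           # held at constant concentrations
dt, steps = 0.05, 2000                # Euler step and series length (assumed)

def kA(x8):
    """Self-inhibited enzymatic coefficient of the A -> X1 step."""
    return 60.0 * x8 / ((40.0 + x8) * (60.0 + x8))

x = [1.0] * 7                         # X1..X7 (initial values assumed)
x8 = 5.0
series = []
for _ in range(steps):
    x8 = max(0.1, x8 + random.uniform(-0.5, 0.5))   # prescribed variation of X8
    dx = [0.0] * 7
    # X1: enzymatic inflow from A, reverse inflow from X2, forward/back outflow
    dx[0] = kA(x8) * A + kb[2] * x[1] - (kf[0] + kb[1]) * x[0]
    for j in range(1, 6):             # X2..X6 (zero-based index j)
        dx[j] = kf[j - 1] * x[j - 1] + kb[j + 2] * x[j + 1] \
                - (kf[j] + kb[j + 1]) * x[j]
    # X7: inflow from X6 and back from B, outflow toward B and toward X6
    dx[6] = kf[5] * x[5] + kb[8] * B - (kf[6] + kb[7]) * x[6]
    x = [xi + dt * d for xi, d in zip(x, dx)]
    series.append((x8, *x))
print(len(series))    # 2,000 points of (X8, X1..X7), as in the text
```

The resulting time series of 2,000 points per species is the kind of raw input on which the EMC and CMC analyses operate.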

The diamonds in fig. 9.1 show the values of X8 versus X1 obtained from these time series. The folding of the hypersurface in the concentration space is shown in this projection onto the X1X8 plane. The space in this plane is divided into rectangles of varying size so that the distribution of points is uniform in each rectangle to within a given accuracy [9]. The density in each rectangle (i, j), where i and j are indices of the discretization of the continuous ranges of values of X1 and X8, is the pair probability distribution:

$$p_{ij}(x, y) = \frac{N_{ij}}{N_{\text{tot}}\, A_{ij}} \qquad (9.9)$$

where $N_{ij}$ is the number of points in the particular rectangle labeled $\{ij\}$, $N_{\text{tot}}$ is the total number of points, and $A_{ij}$ is the area of the rectangle. We use the pair distribution p(X1, X8) to calculate the singlet distributions p(X1) and p(X8) by integration. Then, with eq. (9.3) and its analog for h(X, Y), we calculate EMC distances (see eq. (9.6)) for

Fig. 9.1 The diamonds plot the values of X1 versus X8 obtained from the simulated time series. The rectangles are the result of a partitioning algorithm (see the text). (From [1].)


Fig. 9.2 EMC construction of the reaction mechanism of the example. (From [1].)

pairs of species. The primary connections among the species are thus obtained and their corresponding distances, as derived from a multidimensional scaling analysis, are shown in fig. 9.2.
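The pipeline just described, cell counts via eq. (9.9), marginals by summation, entropies via eq. (9.3), and the EMC distance of eq. (9.6) at τ = 0, can be sketched on a uniform grid. Note that the book uses an adaptive partition with variable rectangle sizes [9]; the fixed grid and the synthetic data below are assumptions that keep the sketch short:

```python
import math
import random

random.seed(2)
n, nbins = 5000, 20                   # sample size and grid (assumed)
w = 1.0 / nbins                       # cell side, so the area A_ij = w * w
pts = []
for _ in range(n):                    # synthetic correlated (x, y) cloud
    x = random.random()
    y = min(1.0 - 1e-9, max(0.0, x + random.gauss(0.0, 0.1)))
    pts.append((x, y))

N = [[0] * nbins for _ in range(nbins)]
for x, y in pts:
    N[min(int(x / w), nbins - 1)][min(int(y / w), nbins - 1)] += 1

# eq. (9.9): pair density p_ij = N_ij / (N_tot * A_ij)
p = [[N[i][j] / (n * w * w) for j in range(nbins)] for i in range(nbins)]

def h(probs, log_area):
    """Entropy of eq. (9.3) from cell probabilities plus the area term."""
    return -sum(q * math.log(q) for q in probs if q > 0) + log_area

pij = [N[i][j] / n for i in range(nbins) for j in range(nbins)]
pi  = [sum(N[i]) / n for i in range(nbins)]               # marginal of x
pj  = [sum(N[i][j] for i in range(nbins)) / n for j in range(nbins)]
hxy = h(pij, 2 * math.log(w))
hx, hy = h(pi, math.log(w)), h(pj, math.log(w))
d_xy = math.exp(hxy - hx - hy)        # e^{-I(X;Y)}, eqs. (9.5)-(9.6) at tau = 0
print(d_xy)                           # well below 1 for correlated data
```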

The calculation of the CMC distances by means of eqs. (7.3) and (7.4), and subsequent multidimensional scaling analysis, yields a 2D projection (see fig. 9.3) that differs from the EMC analysis in two respects. The reaction pathway predicted by EMC correctly shows the close correlation of the enzymatic catalyst X8 with species X1, and it correctly shows the chain of linear reactions from X1 to X7. The CMC method also shows that X8 is correlated with X1, but more weakly. Less importantly, the CMC method yields a wrap-around in the placing of species X7. However, the 3D representation of the multidimensional CMC object, obtained by stressing the original CMC distances into three rather than two dimensions, results in the correct sequencing of the species: the species points lie on a 3D spiral, which when projected onto two dimensions produces the wrap-around effect.
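One standard way to realize the multidimensional scaling step is classical (Torgerson) MDS, which embeds a matrix of pairwise distances into a low-dimensional space via a double-centered Gram matrix. The 4×4 distance matrix below is hypothetical, and the stress minimization the book additionally applies is not shown:

```python
import numpy as np

# hypothetical symmetric distance matrix for four species along a chain
D = np.array([[0.0, 0.3, 0.6, 0.9],
              [0.3, 0.0, 0.3, 0.6],
              [0.6, 0.3, 0.0, 0.3],
              [0.9, 0.6, 0.3, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
idx = vals.argsort()[::-1][:2]           # keep the two largest
coords = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
print(coords)                            # 2D embedding of the four species
```

For this chain-like matrix the embedding is essentially one-dimensional, and the pairwise Euclidean distances between the embedded points reproduce D.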

The differences between the CMC and EMC predictions can be traced to the different pair probability densities estimated by these methods from the given time series. In fig. 9.4 we show a contour plot of the pair distribution function p(X1, X8) as calculated by a semi-nonparametric (SNP) method [10], with the time-series simulations for these two species superimposed. For comparison, fig. 9.5 shows a Gaussian pair probability distribution, as is consistent with CMC, with the same means and variances as those of the EMC distribution. The deviations of the EMC distribution from the Gaussian show that moments higher than the second contribute. Since the information entropy for

Fig. 9.3 CMC construction of the reaction mechanism given in eq. (9.8). (From [1].)

Fig. 9.4 Contour plot of the pair distribution function p(X1, X8); the values of the concentrations of X1 versus X8, as obtained from the simulations, are also shown. (From [1].)

Fig. 9.5 Joint Gaussian (normal) distribution of the variables X1 and X8 with the means and covariances of the distribution in fig. 9.4. (From [1].)
