
9 Density Estimation

9.1 Entropy Metric Construction (EMC)

In this chapter (based on [1]) we develop further the theory of the correlation method introduced in chapter 7. Consider the expression of the pair correlation function

$$S_{ij}(\tau) = \iint \left[x_i(t) - \bar{x}_i\right]\left[x_j(t+\tau) - \bar{x}_j\right] p\!\left(x_i(t),\, x_j(t+\tau)\right) dx_i\, dx_j \qquad (9.1)$$

where an ensemble average replaces the average over a time series of experiments on a single system. The pair correlation function defined in eq. (7.1) is the second moment of the pair distribution function and is obtained from eq. (9.1) by integration.

We choose a new measure of the correlation distance, one based on an information theoretical formulation [2,3]. A natural measure of the correlation distance between two variables is the number of states jointly available to them (the size of the support set) compared to the number of states available to them individually. We therefore require that the measure of the statistical closeness between variables X and Y be the fraction of the number of states jointly available to them versus the total possible number of states available to X and Y individually. Further, we demand that the measure of the support sets weighs the states according to their probabilities. Thus, two variables are close and the support set is small if the knowledge of one predicts the most likely state of the other, even if there exists simultaneously a substantial number of other states [4–6].

The information entropy gives the distance we demand in these requirements. The effective size of the support set of a continuous variable is [7]

$$S(X) = e^{h(X)} \qquad (9.2)$$


in which the entropy h(X ) is defined by

$$h(X) = -\int_S p(x) \log p(x)\, dx \qquad (9.3)$$

where $S(X)$ is the support set of $X$ and $p(x)$ is the probability density of $X$. Similarly, we denote the entropy of a pair of continuous variables $X, Y$ as $h(X, Y)$, which is related to the pair distribution function $p(x, y)$ by an equation analogous to eq. (9.3). The mutual information $I(X;Y)$ between two stochastic variables is

$$I(X;Y) = h(X) + h(Y) - h(X,Y) \qquad (9.4)$$

or, in terms of the effective sizes of the support sets,

$$\frac{S(X,Y)}{S(X)\,S(Y)} = \frac{e^{h(X,Y)}}{e^{h(X)}\, e^{h(Y)}} = e^{h(X,Y) - h(X) - h(Y)} = e^{-I(X;Y)} \qquad (9.5)$$
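As a rough numerical illustration (not from the book), the entropies and support-set ratio of eqs. (9.2)–(9.5) can be estimated from data with simple histograms. The sample size, bin width, and the synthetic correlated pair below are assumptions made for this sketch:

```python
import math
import random

random.seed(0)
n, w = 20000, 0.25              # sample size and bin width (assumed)
xs, ys = [], []
for _ in range(n):              # synthetic correlated pair: Y = X + noise
    x = random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(x + random.gauss(0.0, 0.5))

def discrete_entropy(counts, total):
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def h1(data, w):
    """Histogram estimate of the differential entropy h(X), eq. (9.3)."""
    counts = {}
    for v in data:
        k = math.floor(v / w)
        counts[k] = counts.get(k, 0) + 1
    return discrete_entropy(counts, len(data)) + math.log(w)

def h2(dx, dy, w):
    """Histogram estimate of the joint entropy h(X, Y)."""
    counts = {}
    for x, y in zip(dx, dy):
        k = (math.floor(x / w), math.floor(y / w))
        counts[k] = counts.get(k, 0) + 1
    return discrete_entropy(counts, len(dx)) + 2.0 * math.log(w)

hx, hy, hxy = h1(xs, w), h1(ys, w), h2(xs, ys, w)
mi = hx + hy - hxy                                       # eq. (9.4)
ratio = math.exp(hxy) / (math.exp(hx) * math.exp(hy))    # eq. (9.5)
print(mi, ratio)   # small ratio <=> strong correlation
```

The ratio equals $e^{-I(X;Y)}$ by construction, so a strongly correlated pair yields a joint support set much smaller than the product of the individual support sets.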

We define an EMC correlation distance based on information entropy as the minimum of eq. (9.5) over the delay $\tau$:

$$d_{xy} = \min_{\tau} \frac{S(X_\tau, Y)}{S(X_\tau)\, S(Y)} = \min_{\tau} e^{-I(X_\tau; Y)} \qquad (9.6)$$

If the correlation of the species X and Y is Gaussian, then the expression (9.6) for the EMC distance usually leads to a result similar to that of the CMC distance (eqs. (7.3) and (7.4)).
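This can be made concrete: for jointly Gaussian variables the mutual information has the closed form $I(X;Y) = -\tfrac{1}{2}\ln(1-\rho^2)$, so the EMC distance $e^{-I}$ reduces to $\sqrt{1-\rho^2}$, a monotone function of $|\rho|$, which is why EMC and CMC order species similarly in the Gaussian case. A minimal sketch (the sample correlation values are illustrative):

```python
import math

def emc_distance_gaussian(rho):
    """EMC distance e^{-I} for a bivariate normal with correlation rho."""
    mi = -0.5 * math.log(1.0 - rho * rho)   # closed-form I(X;Y)
    return math.exp(-mi)                     # equals sqrt(1 - rho**2)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, emc_distance_gaussian(rho))
# the distance falls from 1 (independent) toward 0 (perfect correlation),
# a monotone function of |rho|, like the CMC distance of eqs. (7.3)-(7.4)
```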

The CMC and EMC distances satisfy the first three requirements of a metric:

$$\begin{aligned}
(1)\quad & d_{xy} \ge 0\\
(2)\quad & d_{xy} = d_{yx}\\
(3)\quad & d_{xy} = 0 \ \text{iff}\ x = y\\
(4)\quad & d_{xz} + d_{zy} \ge d_{xy}
\end{aligned} \qquad (9.7)$$

but not the triangle inequality, condition (4) in eq. (9.7). This situation can be remedied by the technique of stress minimization [8]. In practice, the predicted reaction pathway frequently differs little whether or not the stress minimization technique is applied.
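Condition (4) is straightforward to test on a computed distance matrix; a small sketch, where the two 3×3 matrices are hypothetical and not actual EMC output:

```python
# Check the triangle inequality, condition (4) of eq. (9.7), over all triples.
def violates_triangle(d):
    """Return True if some triple breaks d_xz + d_zy >= d_xy."""
    n = len(d)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                if d[x][z] + d[z][y] < d[x][y] - 1e-12:
                    return True
    return False

ok  = [[0.0, 0.4, 0.5], [0.4, 0.0, 0.3], [0.5, 0.3, 0.0]]   # a true metric
bad = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]]   # violates (4)
print(violates_triangle(ok), violates_triangle(bad))   # False True
```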

It is helpful to treat a specific example of a reaction mechanism to illustrate the calculation of EMC distances and the prediction of a reaction pathway from simulated time series of concentrations. We compare these results with predictions of the CMC method. The example is chosen to show the differences between these two approaches, and to indicate the origin of these differences. The example is the mechanism

$$A \;\underset{k_{-1}}{\overset{k_A}{\rightleftharpoons}}\; X_1 \;\underset{k_{-2}}{\overset{k_1}{\rightleftharpoons}}\; X_2 \;\underset{k_{-3}}{\overset{k_2}{\rightleftharpoons}}\; X_3 \;\underset{k_{-4}}{\overset{k_3}{\rightleftharpoons}}\; X_4 \;\underset{k_{-5}}{\overset{k_4}{\rightleftharpoons}}\; X_5 \;\underset{k_{-6}}{\overset{k_5}{\rightleftharpoons}}\; X_6 \;\underset{k_{-7}}{\overset{k_6}{\rightleftharpoons}}\; X_7 \;\underset{k_{-8}}{\overset{k_7}{\rightleftharpoons}}\; B \qquad (9.8)$$

All of the reactions are first order with constant rate coefficients, except for the A to X1 step, which is enzyme catalyzed. The forward rate coefficients inside the chain are $k_i = 0.7$ for all $i$. The backward rate coefficients inside the chain are $k_{-2} = 0.3$, $k_{-5} = 0.2$, and $k_{-i} = 0.1$ otherwise. The species A and B are held at constant concentrations, with A = B = 1, and X8 is varied in a prescribed way (i.e., X8 is chosen to be the input species into the system, with unit influx coefficient). Finally, the enzyme-catalyzed step has the effective rate coefficient $k_A = 60.0\,X_8/[(40.0 + X_8)(60.0 + X_8)]$.

The difficulties in this example arise from the self-inhibition of the enzyme catalysis by X8. The rate coefficient kA first increases with increasing concentration of X8 and then decreases. The hypersurface formed by eliminating the time dependence from the set of deterministic rate equations, by dividing the equation for each but one of the species by the equation for that one species, is folded over due to the quadratic dependence of kA on X8. In the simulation the concentration of X8 is varied randomly and the responses of the other species are calculated to give time series of 2,000 data points; these series are the starting point for both the EMC and the CMC analysis.
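A minimal simulation sketch of this mechanism follows, using the rate constants quoted in the text. The Euler step size, the bounded random walk driving X8, and the initial concentrations are illustrative assumptions, not the book's exact protocol:

```python
import random

random.seed(1)
kf = [0.7] * 7                        # forward rates k1..k7 inside the chain
kb = {i: 0.1 for i in range(1, 9)}    # backward rates k-1..k-8
kb[2], kb[5] = 0.3, 0.2               # the two exceptions quoted in the text
A = B = 1.0                           # held at constant concentrations
dt, steps = 0.05, 2000                # Euler step and series length (assumed)

def kA(x8):
    """Self-inhibited enzymatic coefficient of the A -> X1 step."""
    return 60.0 * x8 / ((40.0 + x8) * (60.0 + x8))

x = [1.0] * 7                         # X1..X7 (initial values assumed)
x8 = 5.0
series = []
for _ in range(steps):
    x8 = max(0.1, x8 + random.uniform(-0.5, 0.5))   # prescribed variation of X8
    dx = [0.0] * 7
    # X1: enzymatic inflow from A, reverse inflow from X2, forward/back outflow
    dx[0] = kA(x8) * A + kb[2] * x[1] - (kf[0] + kb[1]) * x[0]
    for j in range(1, 6):             # X2..X6 (zero-based index j)
        dx[j] = kf[j - 1] * x[j - 1] + kb[j + 2] * x[j + 1] \
                - (kf[j] + kb[j + 1]) * x[j]
    # X7: inflow from X6 and back from B, outflow toward B and toward X6
    dx[6] = kf[5] * x[5] + kb[8] * B - (kf[6] + kb[7]) * x[6]
    x = [xi + dt * d for xi, d in zip(x, dx)]
    series.append((x8, *x))
print(len(series))    # 2,000 points of (X8, X1..X7), as in the text
```

The resulting time series of 2,000 points per species is the kind of raw input on which the EMC and CMC analyses operate.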

The diamonds in fig. 9.1 show the values of X8 versus X1 obtained from these time series. The folding of the hypersurface in the concentration space is shown in this projection onto the X1X8 plane. The space in this plane is divided into rectangles of varying size so that the distribution of points is uniform in each rectangle to within a given accuracy [9]. The density in each rectangle (i, j), where i and j are indices of the discretization of the continuous ranges of values of X1 and X8, is the pair probability distribution:

$$p_{ij}(x, y) = \frac{N_{ij}}{N_{\text{tot}}\, A_{ij}} \qquad (9.9)$$

where $N_{ij}$ is the number of points in the particular rectangle labeled $\{ij\}$, $N_{\text{tot}}$ is the total number of points, and $A_{ij}$ is the area of the rectangle. We use the pair distribution p(X1, X8) to calculate the singlet distributions p(X1) and p(X8) by integration. Then, with eq. (9.3) and its analog for h(X, Y), we calculate EMC distances (see eq. (9.6)) for

Fig. 9.1 The diamonds plot the values of X1 versus X8 obtained from the simulated time series. The rectangles are the result of a partitioning algorithm (see the text). (From [1].)


Fig. 9.2 EMC construction of the reaction mechanism of the example. (From [1].)

pairs of species. The primary connections among the species are thus obtained and their corresponding distances, as derived from a multidimensional scaling analysis, are shown in fig. 9.2.
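The pipeline just described, cell counts via eq. (9.9), marginals by summation, entropies via eq. (9.3), and the EMC distance of eq. (9.6) at τ = 0, can be sketched on a uniform grid. Note that the book uses an adaptive partition with variable rectangle sizes [9]; the fixed grid and the synthetic data below are assumptions that keep the sketch short:

```python
import math
import random

random.seed(2)
n, nbins = 5000, 20                   # sample size and grid (assumed)
w = 1.0 / nbins                       # cell side, so the area A_ij = w * w
pts = []
for _ in range(n):                    # synthetic correlated (x, y) cloud
    x = random.random()
    y = min(1.0 - 1e-9, max(0.0, x + random.gauss(0.0, 0.1)))
    pts.append((x, y))

N = [[0] * nbins for _ in range(nbins)]
for x, y in pts:
    N[min(int(x / w), nbins - 1)][min(int(y / w), nbins - 1)] += 1

# eq. (9.9): pair density p_ij = N_ij / (N_tot * A_ij)
p = [[N[i][j] / (n * w * w) for j in range(nbins)] for i in range(nbins)]

def h(probs, log_area):
    """Entropy of eq. (9.3) from cell probabilities plus the area term."""
    return -sum(q * math.log(q) for q in probs if q > 0) + log_area

pij = [N[i][j] / n for i in range(nbins) for j in range(nbins)]
pi  = [sum(N[i]) / n for i in range(nbins)]               # marginal of x
pj  = [sum(N[i][j] for i in range(nbins)) / n for j in range(nbins)]
hxy = h(pij, 2 * math.log(w))
hx, hy = h(pi, math.log(w)), h(pj, math.log(w))
d_xy = math.exp(hxy - hx - hy)        # e^{-I(X;Y)}, eqs. (9.5)-(9.6) at tau = 0
print(d_xy)                           # well below 1 for correlated data
```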

The calculation of the CMC distances by means of eqs. (7.3) and (7.4), and subsequent multidimensional scaling analysis, yields a 2D projection (see fig. 9.3) that differs from the EMC analysis in two respects. The reaction pathway predicted by EMC correctly shows the close correlation of the enzymatic catalyst X8 with species X1, and it correctly shows the chain of linear reactions from X1 to X7. The CMC method also shows that X8 is correlated with X1, but more weakly. Less importantly, the CMC method yields a wrap-around in the placing of species X7. However, the 3D representation of the multidimensional CMC object, obtained by stressing the original CMC distances into three rather than two dimensions, results in the correct sequencing of the species: the species points lie on a 3D spiral, which when projected onto two dimensions produces the wrap-around effect.
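One standard way to realize the multidimensional scaling step is classical (Torgerson) MDS, which embeds a matrix of pairwise distances into a low-dimensional space via a double-centered Gram matrix. The 4×4 distance matrix below is hypothetical, and the stress minimization the book additionally applies is not shown:

```python
import numpy as np

# hypothetical symmetric distance matrix for four species along a chain
D = np.array([[0.0, 0.3, 0.6, 0.9],
              [0.3, 0.0, 0.3, 0.6],
              [0.6, 0.3, 0.0, 0.3],
              [0.9, 0.6, 0.3, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
idx = vals.argsort()[::-1][:2]           # keep the two largest
coords = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
print(coords)                            # 2D embedding of the four species
```

For this chain-like matrix the embedding is essentially one-dimensional, and the pairwise Euclidean distances between the embedded points reproduce D.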

The differences between the CMC and EMC predictions can be traced to the different pair probability densities estimated by these methods from the given time series. In fig. 9.4 we show a contour plot of the pair distribution function p(X1, X8) as calculated by a semi-nonparametric (SNP) method [10], with the time-series simulations for these two species superimposed. For comparison, fig. 9.5 shows a Gaussian pair probability distribution, as is consistent with CMC, with the same means and variances as those of the EMC distribution. The deviations of the EMC distribution from the Gaussian show that moments higher than the second contribute. Since the information entropy for

Fig. 9.3 CMC construction of the reaction mechanism given in eq. (9.8). (From [1].)

Fig. 9.4 Contour plot of the pair distribution function p(X1, X8); the values of the concentrations of X1 versus X8, as obtained from the simulations, are also shown. (From [1].)

Fig. 9.5 Joint Gaussian (normal) distribution of the variables X1 and X8 with the means and covariances of the distribution in fig. 9.4. (From [1].)
