Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Казанский национальный исследовательский технологический университет

Предмет:

Химия

Файл:

Solid-Phase Synthesis and Combinatorial Technologies

.pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

7 Мб

Скачать

☆

<<< < Предыдущая 10 11 12 13 14 15 16 17 18 19 20 2122 / 6522 23 24 25 26 27 28 29 30 31 32 33 34 > Следующая >>>

5.4 LIBRARY DESIGN VIA COMPUTATIONAL TOOLS 183

	DISTANCES	4-points
	DISTANCES	pharmacophore
		pharmacophore

3-points pharmacophore

ANGLES

CIRCLES MAY BE positive or negative charges, hydrogen bond donors or acceptors, lipophilicity centres, and so on

DISTANCES are in Angstroem; increments of distances/pharmacophore distances define the bins to be filled

ANGLES are in degrees; increments of angles/pharmacophore angles define the bins to be filled (see example below)

angle: 90°

angle: 120°

Figure 5.13 Three-dimensional molecular descriptors: several examples.

overall evaluation of the molecules and would overestimate the contribution of this property to the determination of global diversity for a virtual library. All classes of descriptors have proven useful to select diverse sets of compounds when compared to random selections (29), but the use of flexible 3D descriptors/pharmacophores for huge virtual libraries is, as of today, prevented by the computational time and memory required.

Any type of selected descriptor will provide a more or less complex characterization of each virtual library component. The use of similarity indices offers a straightforward method to evaluate similarities between virtual compounds. These indices use a bit-string representation for any descriptor (distances, fingerprints, pharmacophores, and so on) and, by simply counting the presence or the absence of specific bins and comparing the bit strings of virtual compounds, provide a numerical similarity index. The formula for the commonly used Tanimoto similarity index (71, 43), which can readily be transformed into the complementary diversity index, is reported in Fig. 5.14

184 SYNTHETIC ORGANIC LIBRARIES: LIBRARY DESIGN AND PROPERTIES

MOLECULE A (STANDARD)

MOLECULE B

C C

Tanimoto similarity index:

Ncommon

= 0.2

Ncommon + Ndifferent

Ndifferent

Tanimoto dissimilarity index:

= 0.8

Ncommon + Ndifferent

Figure 5.14 Similarity/dissimilarity comparison of two molecules: the Tanimoto index.

(molecules A and B are highly dissimilar in this hypothetical example). Such an index, even if questionable when applied to dissimilarity evaluations (54), allows fast comparisons between structures or subsets of molecules.

Compound selection is a core process of library design, and three main methods can be mentioned. Dissimilarity-based methods select compounds in terms of similarity/distance between individuals in chemical space. Clustering methods first group compounds into clusters based on similarity/distance and then choose representative compounds from different clusters. Partitioning methods first create a uniform cell space that subdivides the “chemical space,” then assign all virtual compounds to the relative cells according to their properties, and finally choose representative compounds from different cells.

Dissimilarity-based methods are the fastest selection techniques, as can be easily seen from the hypothetical example reported in Fig. 5.15. This method, based on the maximum dissimilarity approach (72–74), must first define a minimum acceptable dissimilarity threshold R and a maximum subset M of compounds to be selected. The paradigm starts randomly from any virtual component (A, Fig. 5.15) and then searches for the most dissimilar individual in the virtual population (B, Fig. 5.15). The following selection rounds choose the individual most dissimilar from all those already selected, and this process is repeated (A–E, Fig. 5.15) until either the last selection produces a candidate whose dissimilarity to the other selected components is less than R or until M number of virtual components (five in the example) have been selected. Such a method is easy and fast, does not require significant computing time, and, while intrinsically biased toward picking extreme compounds from a virtual library (73), allows reasonably diverse selections even in very large virtual libraries. Other dissimilarity-based selection methods, which can complement nicely the maximum dissimilarity approach, have been reported in the past few years (75–77); their performance in

	5.4	LIBRARY DESIGN VIA COMPUTATIONAL TOOLS 185
x		x		x	x
x		x		x	B
	D				B
	D				x
	x	x			x
		x			x
x	x			x
	x			x	x
					x
x		x
			x	E
x				E	x
x	x				x
	x	x		x	x
		x			x

		x	x		x
x	A		x		x
	A				C

CHEMICAL SPACE: maximum dissimilarity method

Figure 5.15 Dissimilarity-based selection methods: maximum dissimilarity approach.

selecting active compounds has been recently compared using available databases of drug molecules (78).

The clustering method (79–81) first groups the virtual library components in clusters, such that all the individuals belonging to the same cluster have a high similarity and the individuals in different clusters have a high degree of dissimilarity. Each cluster may contain many compounds or even just one compound, called a singleton. The purpose of clustering is to select the most representative set of diverse compounds, composed of a relatively small number of individuals: A method that reduces the number of singleton clusters is highly recommended. One class of clustering methods is called nonhierarchical, an example of which is depicted in Fig. 5.16. To begin with, a search of the nearest neighbors (the most similar compounds to the selected structure in the virtual set) is performed for two compounds (A and B, Fig. 5.16) and a previously defined number X of nearest neighbors (8, included in circles, for A and B). Both compounds must be in each others nearest-neighbor sets (B in A’s neighbors and A in B’s neighbors), and a predefined minimum number of virtual compounds must be present in both neighbors sets (5 out of 8, including A and B, in Fig. 5.16: compounds in bold characters). If these conditions are satisfied, as for A and B in the example, the two compounds are clustered together. If a compound is not among the nearest-neighbor set of another (as C for B, Fig. 5.16), or if the minimum number of shared nearest neighbors is not satisfied (as C for A, being only 4 out of 8 in common), the compounds are considered members of two different clusters. This strategy does not take into account the relationship between different clusters, limiting its usefulness, and usually gives high numbers of singletons, which makes the selection of small sets more complex, but it is fast and useful for very large sets of virtual libraries. The so-called Jarvis–Patrick (82, 83) clustering technique is the most commonly used and has been applied to several compound selections (84, 40). Another

186 SYNTHETIC ORGANIC LIBRARIES: LIBRARY DESIGN AND PROPERTIES

x	x x		x	x x
x	x	x	x	x		x x
x			x	x		x x
x x x		x	x	x		C	x
x	x	x x		x	x
x x	x	x	x	x x A
x x	x	x	x	x		B x
x	x	x x		x x		x

CHEMICAL SPACE: non-hierarchical clustering

Figure 5.16 Clustering selection methods: nonhierarchical approaches.

class of clustering methods, called hierarchical methods, provides parent–daughter relationships between clusters and allows clustering according to the selection needs; an example of these methods is shown in Fig. 5.17. The treelike dendrogram can be cut at any level to give clusters of different number and size as necessary (cut 1, 10 clusters; cut 2, 32 clusters; Fig. 5.17). The dendrograms can be generated from the first huge cluster either by going to the highest fragmentation (hierarchical) or by working

Figure 5.17 Clustering selection methods: hierarchical approaches.

5.4 LIBRARY DESIGN VIA COMPUTATIONAL TOOLS 187

from the bottom up (agglomerative). These methods are more information rich than the nonhierarchical techniques but are also significantly more computationally time consuming. A popular hierarchical technique, the Ward method (85), has been used frequently for compound selections (84, 86, 87). Clustering methods, while outperforming dissimilarity-based methods, are generally computationally time consuming and arbitrary, measuring diversity among a given virtual set without framing it in the chemical space generated by the selection descriptors; nevertheless successful applications of clustering protocols to medium–large databases are made by known druglike compounds and library individuals (88). It is difficult to add new virtual compounds designed to increase the set diversity using clustering because a total reclustering process should be performed after each compound addition.

Partitioning methods (89–91) first subdivide chemical space into predefined bins (Fig. 5.18) by simply defining incremental values of the descriptors used and then assign each virtual library component to its proper bin. The schematic examples in Fig. 5.18 consider two properties/descriptors Y (8 increments, A–H) and X (12 different increments, 1–12) that generate 96 different bins, or cells. The library partitioned in Fig. 5.18, top, is clearly diverse, spanning the large majority of the bins, while underneath we have a library containing a large number of compounds in a small area of chemical space producing a nondiverse library. These methods, compared to clustering, are less computationally demanding and allow a better evaluation of a library in terms of diversity. In fact, the voids in the space are evident (see Fig. 5.18), and additional virtual compounds can be designed to fit into the desired bin. As a drawback, very similar compounds can sometimes fall into two different cells: for example, if the molecular weight is one of the considered properties and two increments are 200 < MW < 300 and 300 < MW < 400, two compounds weighing respectively 299 and 301 daltons would occupy two different bins. Recent examples of large library diversity analysis by partitioning techniques have been reported (92, 93).

In general, dissimilarity-based selections are less time and computationally consuming but are also less accurate in predicting and/or selecting diverse sets; clustering methods are good when large agglomerations of similar compounds are present in the virtual set, as they discriminate well among them; partitioning methods are reasonably fast and effectively represent large varied sets of virtual candidates. It is very difficult to give absolute rules regarding the best choice in terms of descriptors, indices, and selection methods to select the most diverse libraries: This process is highly project specific, and the best solution readily changes from case to case. However, as mentioned earlier, it has been thoroughly proven that a selection using computational diversity/similarity analysis (see also the next Section) is more successful than a random selection in producing a more meaningful library.

5.4.2 Focused Libraries

We could say that it is enough to substitute the concept of dissimilarity with that of similarity in the previous section to determine the application of computational library design to focused libraries. In fact, the tools used to describe and to select among virtual

188 SYNTHETIC ORGANIC LIBRARIES: LIBRARY DESIGN AND PROPERTIES

CHEMICAL SPACE

PARTITIONING PROTOCOLS:

a diverse set (top) and a similar set (bottom

CHEMICAL SPACE

x x

x x x

x x

Figure 5.18 Partitioning selection methods: diverse (top) and similar (bottom) sets of compounds.

molecules may be the same as seen previously, while the goal is now to find the most similar sets of reagents/products to produce a library similar in respect to a known structure (compound A, Fig. 5.19), or to a structural hypothesis (e.g., fit into a pharmacophore). We should always remember, though, that chemical and biological diversity, or similarity, are not the same thing. Very similar structures may have a

5.4

LIBRARY DESIGN VIA COMPUTATIONAL TOOLS 189

x x

x A x x

x x x

x x

CHEMICAL SPACE:

diverse library

focused library

Figure 5.19 Chemical similarity in the chemical space.

Tanimoto index > 0.9 but may be very different in terms of activity (chemically similar, biologically diverse), while completely different structures are known to have the same biological activity (chemically diverse, biologically similar). This intrinsic drawback to the computational screening of virtual libraries should always be considered when interpreting screening results of a computationally designed library, and real data should be used to refine any virtual SAR information based on chemical similarity or dissimilarity.

The number of individuals in a focused library is usually small so that it is easy to rapidly design a small set, then prepare and test it. Once this preliminary quantitative structure activity relationship (QSAR) is known, the available information can be used to produce a more significant second focused library, and the iterative cycle can continue until an active product with the desired potency is obtained. A few other points are worth mentioning. The significantly smaller set of virtual library components usually allows the use of 3D descriptors, especially as threeor four-point pharmacophores. Rather than using similarity/dissimilarity indices, such as Tanimoto’s, the comparison of fingerprints/pharmacophores can be carried out and even stored in most cases. The virtual focused library will contain mostly clusters of very similar individuals, and clustering becomes the selection method of choice where the similarity parameters can be tailored so as to obtain highly diversified subsets.

Four recent examples of computational selection of the most similar compounds in a virtual library set will be now reviewed. Either a few different lead structures were available as starting points for the selection or a detailed knowledge of the target allowed to design and select a medium/small focused library.

The first two examples, which imply that similarity between compounds is related to similar activity profiles, use two popular selection methods: genetic algorithms (GA; 96–100) and simulated annealing (SA; 101, 102).

Pentapeptides were used (94) as a training set for the design of bradykinin-poten- tiating peptides. The two known active sequences 5.1 and 5.2 (103, 104) were used to

190 SYNTHETIC ORGANIC LIBRARIES: LIBRARY DESIGN AND PROPERTIES

design and compare virtual focused libraries, as shown in Fig. 5.20. The authors randomly selected 100 pentapeptides, calculated their similarity with the two lead peptides using either amino acid–based descriptors (reagent sets) or topological descriptors (product sets) developed by Hall and Kier (35), and used a GA to select the most similar compounds out of the 3,200,000 virtual pentapeptides obtained by using the 20 natural L-amino acids as building blocks. The GA-based techniques apply evolutionary concepts to library design and optimization, as shown schematically in Fig. 5.21. Two parent sequences (27 and 74 in the example) are randomly selected from the first 100-membered random pool (a, Fig. 5.21), and submitted to crossover by mixing the two sequences (b, Fig. 5.21). One of them is then mutated by randomly replacing one building block with all the other 19 L-amino acids of the monomer set (c, Fig. 5.21). Each newly generated sequence is compared, in terms of similarity, with the two lead peptides, then with the two parent compounds, and the two highest scoring peptides are stored in the 100-membered pool in place of the two parent sequences: After this selection cycle either the two parent peptides are more similar to the leads and the initial set is unchanged (d, Fig. 5.21) or two of the newly generated sequences are better and replace their parents in the 100-membered set (e, Fig. 5.21). This extremely rapid iterative process resembles evolution in producing negative (d, Fig. 5.21) or positive (e, Fig. 5.21) mutations, and its iteration (f, Fig. 5.21) leads to a

Glu		Ala	Lys		Ala
Val	Trp	Lys	Val	Trp	Pro
	5.1			5.2
	5.1

STRUCTURAL INFORMATION

focused virtual library design

AA2 AA4

AA1

AA3

AA5

AA1-5 = 20 L-amino acids 3,200,000 virtual compounds

Genetic Algorithm-

based selection

AA2 AA4

AA1

AA3

AA5

100 best scoring, most similar pentapeptides

Figure 5.20 Virtual screening of a 3,200,000-membered pentapeptide virtual library inspired by structures 5.1 and 5.2 using GA-based selection.

5.4 LIBRARY DESIGN VIA COMPUTATIONAL TOOLS

191

AA2

AA4

AAx

AA1

AA3

AA5

AAx

101

1-100

randomly

AA7

AA9

selected pentapeptides

AA6

AA8

AA10

102

AAx

AA2

AA9

1-100

negative evolution

AA1

AA3

AA10

101

AAx

AA4

AAx

AA6

AA8

AA5

AAx

103-121

1-26, 28-73, 75-100, 102, 114

positive evolution

AAx AAx

AAx

genetically selected, most similar pentapeptides

a: random selection; b: crossover; c: single point mutation; d,e: selection of the two best scoring sequences among 27, 74 and 101-121; f: repeat a-e (2000 times).

Figure 5.21 GA-based selection: a detailed protocol for a 3,200,000-membered pentapeptide virtual library.

100-membered population of similarity-based pentapeptides. In the reported example, the 100 most similar pentapeptides generated after 2000 iterations bore a striking resemblance to the 100-membered set generated after exhaustive comparison of all the 3,200,000 virtual pentapeptides. The amino acid composition for each position was almost identical for the final GA iteration and for the exhaustive search of the virtual set. This impressive result, which derived from the use of topological descriptors related to the properties of the whole pentapeptides for similarity comparisons (the reagent-based descriptors performance was significantly worse), was possible because only several thousands of structures were considered and evaluated during the 2000

192 SYNTHETIC ORGANIC LIBRARIES: LIBRARY DESIGN AND PROPERTIES

iterative cycles, allowing a significant reduction of the necessary computational efforts for the 3,200,000 total virtual set. This fact, along with the possibility to simultaneously optimize all the properties of virtual library individuals, makes GA methods very useful. The same procedure, using dissimilarity comparisons, can be applied for the selection of diverse subsets of compounds from extremely large databases: The size of the actual library can also be tailored by the organic chemist that will eventually perform the synthesis. Recent examples of genetic algorithm-driven library design and screening have appeared (105, 106), and the application of this technique to combinatorial technologies has been reviewed (107, 108).

Another example from the same group (95) used an alternative stochastic method, simulated annealing (101, 102), to rapidly select the most similar virtual library individuals when a lead compound structure was available. A virtual library of tripeptoids was considered for the generation of compounds with α1-adrenergic activity, in accordance with a previous report (109). The structure of the virtual library (15,625 individuals), the building blocks used, and the structure of one of the three original tripeptoid hits, 5.3, are reported in Fig. 5.22. Structure 5.3 was used as a lead compound in a validation study whose strategy is reported in Fig. 5.23. A random starting point was chosen (a, Fig. 5.23), its similarity to the lead calculated (b, Fig. 5.23), and an iterative cycle of substitution of one or two of the substituents started (c, Fig. 5.23). If the similarity of the new compound to the lead was higher than that of the parent, the structure was carried forward (d, Fig. 5.23), while the parent was kept if the new compound was less similar to the original lead (e, Fig. 5.23). Four series of cycles were performed, from which three identified the tripeptoid lead compound after a few tens of iterative cycles, while the fourth required a longer path. The validation having been successful, the authors used both a natural peptide opiate, met-enkephalin (5.4, Fig. 5.24), and morphine (5.5, Fig. 5.24) as lead compounds for µ-opiate receptor

R3 O

NH2

R1-3 = 25 different alkyls or aryls

15,625 virtual library components

5.3, known lead compound, Ki=310 nM

Figure 5.22 Virtual library of 15,625 tripeptoids with potential -opiate activity containing the active lead compound 5.3.

<<< < Предыдущая 10 11 12 13 14 15 16 17 18 19 20 2122 / 6522 23 24 25 26 27 28 29 30 31 32 33 34 > Следующая >>>

Соседние файлы в предмете Химия

#
15.08.201320 Мб12Smith M.B., March J. - March[ap]s Organic Chemistry 5th ed. (2001).djvu
#
15.08.20133 Мб14Sodium Channels and Neuronal Hyperexcitability.pdf
#
15.08.20135 Мб16Solid Support Oligosaccharide Synthesis.pdf
#
15.08.20131 Мб27Solid-Phase Organic Syntheses.pdf
#
15.08.201328 Мб70Solid-Phase Organic Synthesis.pdf
#
15.08.20137 Мб21Solid-Phase Synthesis and Combinatorial Technologies.pdf
#
15.08.20136 Мб22Springer Science - 2005 - Reverse Engineering of Object Orie.pdf
#
15.08.20139 Мб85Structure Elucidation by NMR in Organic Chemistry.pdf
#
15.08.20132 Мб39Structure-Activity relationships of the cannabinoids Edited by R.S. Rapaka, A. Makriyannis [1987].pdf
#
15.08.20132 Мб20Swartz Analytical Techniques in Combinatorial Chemistry.pdf
#
15.08.2013142 Кб9Synthesis and cytostatic activity of substituted 6-phenylpurine bases and nucleosides[c] Application of the Suzuki-Miyaura Cross-Coupling reactions of 6-chloropurine derivatives with phenylboronic acid.pdf