Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
искусственный интеллект.pdf
Скачиваний:
26
Добавлен:
10.07.2020
Размер:
27.02 Mб
Скачать

suai.ru/our-contacts

quantum machine learning

36

A. V. Platonov et al.

quantum measurement paradigm [10]. As a result, it should lead to a more accurate formulation of requests context and, hence, to Þnding higher relevance of issued documents. In this regard, the model of semantic space looks very limited, as it is based on the so-called bag-of-words approach without word order consideration, when the meaning is coded by the counters of words included in the context of the object.

Quantum cognition represents one of the modern approaches taking into account the contextuality of interaction of the user and smart information system [2, 5]. Interestingly, in Ref. [4] there were the Þrst attempts to describe psychological aspects in decision-making via quantum probabilistic methods and quantum measurement theory approaches.

Nowadays, there is an increased interest in the application of quantum formalism to information retrieval problems, see, e.g., [6, 8, 9, 12, 13]. In particular, it is shown that information retrieval models such as logical, probabilistic, and vector ones, can be described with quantum formalism in Hilbert space. As a result, it is possible to take into account the contextuality of queries [8]. Notably, in [13] the authors formulate the quantum probability ranking principle, which is a generalization of a well-known probability ranking principle used to assess the criteria for ranking of issued documents considering links between documents. At the same time, to obtain the best overall search efÞciency, the information system ranks documents not only in descending order of the probability of their relevance to the user, but also considering the effects of Òquantum interference.Ó As shown in [11, 12], the analogies with quantum measurement of photon polarization considering in the framework of quantum optics paradigm are relevant here, cf. [1].

The aim of this paper is to demonstrate a quantum-like ranking algorithm using a Bell test performed with text samples given in Russian taking into account the contextuality and compatibility of different queries.

2Bell Test in the Problem of Cognitive Semantic Information Retrieval

2.1 Bell Inequality and Its Interpretation

In the problem under consideration, the most suitable toolkit is quantum theory, which originally aimed to simulate incompatible physical experiments. In such experiments a speciÞc experimental conÞguration (context) allows to consistently determine a certain set of physical quantities, which become undeÞned in another experimental conÞguration (context). In quantum cognitive science, such incompatible experimental situations correspond to incompatible cognitive contexts, the use of which in decision-making leads to violations of classical (Boolean) logic. A consequence of quantum theory is the possible occurrence of correlations between measurement results (conducted in physics over remote systems), which are stronger than allowed by classical theories with hidden parameters [7, 10].

suai.ru/our-contacts

quantum machine learning

Non-separability Effects in Cognitive Semantic Retrieving

37

Suppose there are two subsystems A and B, and experiments are performed, respectively, each having two possible outcomes coded by values of ±1. This condition is satisÞed, for example, by the system of electron pairs, whose spins are measured. Each of experiments A and B is performed in two versions A and A , B and B , which differ in the direction of spin measurement: 0, 90, 45, and 135, respectively. These directions are combined in four possible ways: {A, B}, A, B , A , B , A , B in each conÞguration to obtain a statistically signiÞcant set of (probabilistic) outcomes. The so-called ClauserÐHorneÐShimonyÐ Holt (CHSH) type Bell inequality

S = AB AB + A B + A B 2

(1)

is used to distinguish purely quantum and classical correlations in theory. Inequality

(1) holds for separable states, if factorization for statistically average values is true, i.e., . . . = . . . . . . .

In quantum theory, TsirelsonÕs bound predicts the existence of quantum states of

two subsystems for which (see [3])

 

 

 

 

2 < S 2 2.

(2)

In physics, these states are called entangled or non-separable [7]. Violation of CHSH inequality provides evidence for contextual interdependence for a given set of systems and experimental procedures.

2.2 Bell Test in Semantic Retrieving

In this paper, we use a Bell test, the experiment that checks inequalities discussed above, and conducted with objects in the semantic Hilbert space. In our experiment we use hyperspace analogue to language (HAL) algorithm for obtaining the vector representation of words, which unlike the popular bag of words allows to consider the word order in a sentence and thereby increasing the accuracy of determining dependencies between words. The HAL algorithm also gives the vector representation of words based on the vocabulary index (the word matches some unique numeric identiÞer) and the set of documents processed. To obtain such a representation, the matrix is built with cells containing sums of distances between the word corresponding to a row and the word corresponding to a column in the frame of the text corpus. At the same time, the relationship distance from a word in

ú

a row to a word in a column if the word in the column is to the rightI is modeled. Thus, the word order in the sentence is taken into account. Moreover, the distance is not calculated between all the words in the matrix, but only between those pairs of words that are closer to each other than a predetermined distance, which is called the HAL window size.

suai.ru/our-contacts

quantum machine learning

38 A. V. Platonov et al.

To perform the Bell test we use text Þles obtained from several Wikipedia articles in Russian on Programming Language: Programming Language, Programming, Java, C++. The content of these articles without the layout was subjected to preprocessing. While reading the Þles, a text index is constructed. It contains the normalized forms of words and their correspondence to a unique numeric identiÞer. This numeric identiÞer is used to determine one vector coordinate in the vector space of words obtained by HAL algorithm. After obtaining the index of the words for the document being processed, HAL matrix is built (see the next subsection). This allows to get a vector representation of the words of the document, and the average value of the sum of these vectors is the document vector. Query word vectors are derived from the same HAL matrix. The resulting document and query vectors are used to calculate a Bell parameter.

Thus, as far as the index and vector representations of words are obtained, the words of the user query, such as Programming Language, are reduced to a vector form and then normalized. On these words two bases are constructed deÞning the subspaces of vectors corresponding to the two query words. To obtain a basis, the Schmidt orthogonalization algorithm is used. The document vector decomposes

according to these bases:

 

Dw1 = a |+ A + b |− A ; Dw2 = c |+ B + d |− B ,

(3)

where Dw1 is the document vector; |+ A (|− A) and |+ B (|− B ) are bases in which queries A and B are fully relevant (not relevant) to the document, respectively. Next, we deÞne (projective) measurement operators for queries A and B, respectively:

Ax = |+ AA −| + |− AA +| ; Az = |+ AA +| − |− AA −| ;

(4)

Bx = |+ BB −| + |− BB +| ; Bz = |+ BB +| − |− BB −| .

The operators related to the same queries (for example, to A or to B) do not commute with each other and correspond to operators of measurement of spin projections in quantum physics [7, 10]. Operator Az allows us to determine the degree of the document relevance with the Þrst word of the query. In this case, if the document vector in the basis of the Þrst word is represented by the vector [a, b] (3), then the probability that the document relates to the subject of the Þrst word can

2

2

.

be obtained by the Born rule Az D = D |Az| D = ab

We also use the combinations B+ = (Bz Bx )/

2, B= −(Bz + Bx )/

 

2

which determine the degree of correspondence of the document to the second word in the rotated basis. Note that the calculations can be performed in the basis of one of the words, for example, A. The resulting set of operators is used to calculate the Bell parameter of queries:

Sq = | AzB+ + Ax B+ | + | AzBAx B| .

(5)