Harman [Har92c] suggests two possible methods for implementing RF on Boolean systems. The first is 
to present the user with a list of possible new query terms. These can be chosen, for example, by the 
term distribution in the relevant documents. This means selecting those terms that appear more often in 
the relevant than non relevant documents and which would be useful to include in a new query. The 
second approach is for the system to automatically modify Boolean queries. An example of the latter 
type of query modification can be found in the system proposed by Khoo and Poo, [KP94], which is 
intended to automatically modify both the terms and the Boolean connectives of queries based on the 
documents marked relevant by a user. 
An alternative to exact match systems, such as the Boolean model, are best match systems. These 
systems use term weights, such as tf and idf, to rank documents in decreasing order of matching score 
or estimation of relevance. The two most common best match models are the vector space model, 
which orders documents in decreasing similarity of query and document, [Sal71], and the probabilistic 
model, [RSJ76], which orders documents based on an estimate of the probability of relevance of a 
document to a query. In section 2.2.2 we discuss the vector space model, in section 2.2.3 we discuss the 
probabilistic model. 
2.2.2 Vector space model 
In the vector space model, a document is represented by a vector of n weights, where n is the number of 
unique terms in the document collection. Figure 4 shows an example vector where xi is the weight
3
 of 
the ith term in document x if x contains the term, and 0 if the term is not present in x. 
x = (x , x ,..., x )
1
2
n
Figure 4: Document vector 
Queries are also represented as a vector of length n, and the similarity of the document vectors to a 
query vector gives a retrieval score to each document, allowing comparison and ranking of documents. 
A range of similarity measures exists to calculate this similarity, e.g. DICE, inner product, cosine 
correlation, [VR79, Chap 3]. Equation 3 shows the cosine correlation, one of the more common vector 
space matching functions. 
n
(term
  
ik
  qterm
jk
)
k =1
cos(doc
i
, query
j
) =
  
n
n
(term
  
ik
)
2
(qterm
  
k=1
jk
)
2
k=1
Equation 3: Cosine correlation between document doci and queryj 
Unlike the Boolean model, which retrieves documents according to the query terms and query 
connectives, in the best match models all documents that contain at least one query term will receive a 
non zero score; the highest score going to documents that contain all the query terms. Documents that 
contain only some of the query terms will be ranked according to the sum of the weights of the query 
terms they contain. The documents that contain more query terms or contain query terms with a higher 
discriminatory power (term weight) will be retrieved above those that contain fewer query terms or 
query terms with lower weights. Similarity is then a function of term overlap between query and 
document, and the weights assigned to the terms. 
Rocchio [Roc71] is generally credited with the first formalisation of a RF technique, developed on the 
vector space model. In [Roc71] he defines the problem of retrieval as that of defining an optimal query; 
one that maximises the difference between the average vector of the relevant documents and the 
average vector of the non relevant documents. As discussed in section 1, it may not always be possible 
for a user to submit such an optimal query, so RF is required to bring the query vector closer to the 
mean of the relevant documents, and further from the mean of the non relevant documents. This is 
                                                           
3
Some implementations of the vector space model use 1 if a term occurs in a document, 0 if it does not occur. 
Most implementations will use some form of tf*idf weighting and some form of length normalisation will usually 
be performed to avoid retrieval bias towards long documents. 
 6 
<





New Page 1








Home : About Us : Network : Services : Support : FAQ : Control Panel : Order Online : Sitemap : Contact : Terms Of Service

 

Our web partners:  Jsp Web Hosting  Unlimited Web Hosting  Cheapest Web Hosting  Java Web Hosting  Web Templates  Best Web Templates  Web Design Templates  Interland Web Hosting  Cheap Web Hosting  Filemaker Web Hosting  Tomcat Web Hosting  Quality Web Hosting  Best Web Hosting  Mac Web Hosting

 
 

Virtualwebstudio. Business web hosting division of Vision Web Hosting Inc. All rights reserved

UK Web Hosting