accomplished by the addition of query terms and by the reweighting of query terms to reflect their 
utility in discriminating relevant from non-relevant documents. 
Rocchio's original formula for defining a new query vector in the vector space model is shown in 
Equation 4. 
$$
Q_1 = Q_0 + \frac{1}{n_1}\sum_{i=1}^{n_1} R_i - \frac{1}{n_2}\sum_{i=1}^{n_2} S_i
$$
Equation 4: Rocchio's original formula for modifying a query 
based on relevance information 
where $Q_0$ = initial query vector, $Q_1$ = new query vector, $n_1$ = number of 
relevant documents, $n_2$ = number of non-relevant documents, $R_i$ = vector for the 
ith relevant document, $S_i$ = vector for the ith non-relevant document 
The new query vector is the original query vector plus the terms that best differentiate the relevant 
documents from the non-relevant documents. A modified query contains new terms (from the relevant 
documents) and has new weights attached to the query terms. If the weight of a query term drops to 
zero or below, it is removed from the query. 
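As an illustration of Equation 4 and the pruning rule just described, the following is a minimal sketch rather than Rocchio's or SMART's actual implementation. It assumes vectors are stored as sparse dictionaries mapping a term to its weight; the function name `rocchio_original` and the example terms are hypothetical.

```python
from collections import defaultdict


def rocchio_original(q0, relevant, non_relevant):
    """Sketch of Equation 4: Q1 = Q0 + mean(R_i) - mean(S_i).

    Vectors are dicts of term -> weight (an assumed representation).
    Terms whose new weight is zero or below are removed from the query.
    """
    q1 = defaultdict(float, q0)
    for docs, sign in ((relevant, 1.0), (non_relevant, -1.0)):
        if not docs:  # an empty feedback set contributes nothing
            continue
        for doc in docs:
            for term, weight in doc.items():
                # Each document contributes 1/n of its weight (the mean).
                q1[term] += sign * weight / len(docs)
    return {term: w for term, w in q1.items() if w > 0}


# Hypothetical toy data, for illustration only.
q0 = {"retrieval": 1.0, "feedback": 1.0}
relevant = [{"retrieval": 1.0, "query": 1.0}, {"feedback": 1.0, "query": 1.0}]
non_relevant = [{"feedback": 2.0, "weather": 1.0}]
q1 = rocchio_original(q0, relevant, non_relevant)
print(q1)
```

In this toy example the term "query" enters the new query from the relevant documents, while "feedback" and "weather" end up with negative weights and are dropped.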
This formula can be constrained further, e.g. by weighting the original query vector so that the 
original query terms contribute more to the modified query than the new query terms, or by varying 
the amount of feedback considered. A variation of this formula was tested experimentally with positive 
results on the SMART retrieval system [Roc71]. The small size of the document collection used in 
Rocchio's experiments meant that certain modifications had to be made to the formula. For example, 
although Rocchio tried to keep the size of the relevant and non-relevant feedback sets identical, this 
was not always possible. In addition, a term was only considered if it was one of the original query 
terms or if it appeared in more relevant than non-relevant documents and in more than half the relevant 
documents. These modifications highlight the recurring difficulty of aligning theory with experimental 
practice. 
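The term-eligibility rule described above can be sketched as follows. This is one reading of the rule, not the original experimental code: it assumes each document is reduced to the set of terms it contains and that "more relevant than non-relevant documents" compares document counts. The name `eligible_terms` and the example data are hypothetical.

```python
def eligible_terms(query_terms, relevant_docs, non_relevant_docs):
    """Return the terms allowed in the modified query.

    A term qualifies if it is an original query term, or if it occurs in
    more relevant than non-relevant documents AND in more than half of the
    relevant documents. Documents are sets of terms (an assumption).
    """
    eligible = set(query_terms)
    vocabulary = set().union(*relevant_docs)
    for term in vocabulary:
        in_rel = sum(term in doc for doc in relevant_docs)
        in_non = sum(term in doc for doc in non_relevant_docs)
        if in_rel > in_non and in_rel > len(relevant_docs) / 2:
            eligible.add(term)
    return eligible


# Hypothetical toy data, for illustration only.
query_terms = {"rocchio"}
relevant_docs = [
    {"query", "vector"},
    {"query", "feedback"},
    {"query", "vector", "noise"},
]
non_relevant_docs = [{"vector", "noise"}, {"vector"}]
terms = eligible_terms(query_terms, relevant_docs, non_relevant_docs)
print(sorted(terms))
```

Here "vector" is rejected because it occurs in as many non-relevant as relevant documents, and "feedback" is rejected because it occurs in only one of the three relevant documents.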
Ide [Ide71] extended the SMART relevance feedback experiments, examining different aspects of RF, 
such as only using relevant documents for feedback, varying the number of documents used for RF, and 
using non-relevant documents. She found that using only relevant documents for feedback or varying 
the number of documents used at each iteration of feedback gave inconclusive or poor results. 
  
Her third strategy was a variation of Rocchio's original formula, using only the first non-relevant 
document found, $s_i$. The formula used by Ide is shown in Equation 5. This was compared against 
Rocchio's original formula. Although this technique, the Ide dec-hi formula, did not improve results 
greatly, it was more consistent, improving the performance of more queries. 
$$
Q_1 = Q_0 + \left(\sum_{i=1}^{n_r} r_i\right) - s_i
$$
Equation 5: Ide dec-hi formula for modifying a query based on relevance information 
where $Q_0$ = initial query vector, $Q_1$ = new query vector, $n_r$ = number of relevant 
documents, $r_i$ = vector for the ith relevant document, $s_i$ = vector for the first 
non-relevant document 
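A minimal sketch of the Ide dec-hi update in Equation 5 follows, under the same assumed sparse-dictionary representation of vectors. The function name `ide_dec_hi` and the example data are hypothetical. Unlike Equation 4, the relevant vectors are summed without normalising by their number, and only a single non-relevant vector is subtracted. The sketch leaves negative weights in place, since the text does not say how they were handled in this case.

```python
def ide_dec_hi(q0, relevant, first_non_relevant=None):
    """Sketch of Equation 5: Q1 = Q0 + sum(r_i) - s.

    `first_non_relevant` is the vector of the first non-relevant document
    found, or None if there is none. Vectors are dicts of term -> weight.
    """
    q1 = dict(q0)
    for doc in relevant:
        for term, weight in doc.items():
            q1[term] = q1.get(term, 0.0) + weight
    for term, weight in (first_non_relevant or {}).items():
        q1[term] = q1.get(term, 0.0) - weight
    return q1


# Hypothetical toy data, for illustration only.
q0 = {"a": 1.0}
relevant = [{"a": 1.0, "b": 2.0}, {"b": 1.0}]
first_non_relevant = {"b": 1.0, "c": 1.0}
q1 = ide_dec_hi(q0, relevant, first_non_relevant)
print(q1)
```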
A common modification to the vector space RF formulae, e.g. [IdS71], is to weight the relative 
contribution of the original query, relevant and non-relevant documents to the RF process. In Equation 
6, the $\alpha$, $\beta$ and $\gamma$ values specify the degree of effect of each component on RF. 
$$
Q_1 = \alpha \cdot Q_0 + \frac{\beta}{n_1}\sum_{i=1}^{n_1} R_i - \frac{\gamma}{n_2}\sum_{i=1}^{n_2} S_i
$$
Equation 6: Rocchio modified relevance feedback formula 
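The weighted form in Equation 6 can be sketched in the same way. This is an illustrative sketch under the assumption that the three weights scale the original query, the mean relevant vector and the mean non-relevant vector respectively; the name `rocchio_weighted` and the weight values in the example are hypothetical choices, not values recommended by the source.

```python
def rocchio_weighted(q0, relevant, non_relevant, alpha, beta, gamma):
    """Sketch of Equation 6: Q1 = alpha*Q0 + beta*mean(R_i) - gamma*mean(S_i).

    Vectors are dicts of term -> weight (an assumed representation).
    """
    q1 = {term: alpha * weight for term, weight in q0.items()}
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:  # an empty feedback set simply contributes nothing
            for term, weight in doc.items():
                q1[term] = q1.get(term, 0.0) + coeff * weight / len(docs)
    return q1


# Hypothetical toy data and weights, for illustration only.
q0 = {"x": 1.0}
relevant = [{"x": 1.0, "y": 2.0}, {"y": 2.0}]
non_relevant = [{"y": 1.0, "z": 4.0}]
q1 = rocchio_weighted(q0, relevant, non_relevant, alpha=1.0, beta=0.5, gamma=0.25)
print(q1)
```

Setting all three weights to 1 gives the same vector as Equation 4 before any non-positive terms are removed.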