normalisation of the relevant and non-relevant documents by the number of relevant/non-relevant 
documents. Equation 15 shows the Ide regular formula. 
\[
Q_1 = Q_0 + \sum_{i=1}^{n_1} R_i - \sum_{i=1}^{n_2} S_i
\]
Equation 15: Ide regular 
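As an illustration only, the Ide regular update can be sketched in a few lines of Python. The sketch assumes that the query and documents are dense term-weight vectors of equal length; the function name `ide_regular` and the toy vectors are hypothetical and do not come from the original experiments.

```python
def ide_regular(query, relevant, non_relevant):
    """Return Q1 = Q0 + sum(relevant vectors) - sum(non-relevant vectors)."""
    updated = list(query)
    # Add every relevant document vector, term by term, without normalisation.
    for doc in relevant:
        updated = [q + d for q, d in zip(updated, doc)]
    # Subtract every non-relevant document vector, term by term.
    for doc in non_relevant:
        updated = [q - d for q, d in zip(updated, doc)]
    return updated


# Toy data (hypothetical): three index terms, two relevant documents,
# one non-relevant document.
q0 = [1.0, 0.0, 1.0]
rel = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
non = [[0.0, 0.0, 1.0]]
q1 = ide_regular(q0, rel, non)
```

The second term enters the modified query because it occurs in both relevant documents, while the third term's weight drops to zero because it occurs in the non-relevant document.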
Two of the other algorithms were modifications of F4. The first used the ratio $n_i/N$ [Rob86] to replace 
the 0.5 correction factor introduced to cope with the case where no relevant documents were retrieved 
($R = 0$) or when no relevant documents contain an individual term ($r_i = 0$), Equation 16. 
    
\[
w_i = \log \frac{\dfrac{r_i + \frac{n_i}{N}}{R - r_i - \frac{n_i}{N} + 1}}{\dfrac{n_i - r_i + \frac{n_i}{N}}{N - n_i - R + r_i - \frac{n_i}{N} + 1}}
\]
Equation 16: Modified F4 function using ni/N 
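A minimal sketch of this weighting follows. It assumes the modified function arises from the estimates p_i = (r_i + n_i/N) / (R + 1) and u_i = (n_i - r_i + n_i/N) / (N - R + 1), combined as log[p_i (1 - u_i) / (u_i (1 - p_i))]; that derivation, the function name `modified_f4` and the example counts are assumptions made for illustration.

```python
import math


def modified_f4(r_i, n_i, R, N):
    """Modified F4 weight using n_i/N in place of the 0.5 correction.

    r_i: relevant documents containing the term
    n_i: documents in the collection containing the term
    R:   known relevant documents
    N:   documents in the collection
    Assumes 0 < n_i < N so that no numerator or denominator is zero.
    """
    ratio = n_i / N
    numerator = (r_i + ratio) / (R - r_i - ratio + 1)
    denominator = (n_i - r_i + ratio) / (N - n_i - R + r_i - ratio + 1)
    return math.log(numerator / denominator)


# With no relevance information (R = 0, r_i = 0) the weight is defined.
w_no_feedback = modified_f4(r_i=0, n_i=10, R=0, N=1000)
# A term found in 5 of 10 relevant documents receives a positive weight.
w_feedback = modified_f4(r_i=5, n_i=10, R=10, N=1000)
```

Under these assumptions the weight remains computable when R = 0 or r_i = 0, which is the case the correction factor is meant to handle.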
The second modified F4 scheme placed extra emphasis on terms that appeared in the query. Specifically, 
this was achieved by assuming that a term's appearance in the query is equivalent to an occurrence in 3 
relevant documents (i.e. $r_i = r_i + 3$, $R = R + 3$).  
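The query-term adjustment can be sketched as below. For simplicity the sketch applies the adjustment to the standard F4 weight with its 0.5 correction factor, and it increments both r_i and R only when the term being weighted appears in the query; both choices, along with the names `f4` and `f4_query_boost`, are assumptions rather than details given in the text.

```python
import math


def f4(r_i, n_i, R, N):
    """Standard F4 term weight with the 0.5 correction factor."""
    return math.log(((r_i + 0.5) / (R - r_i + 0.5)) /
                    ((n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)))


def f4_query_boost(r_i, n_i, R, N, in_query, boost=3):
    """Treat a query occurrence as `boost` extra relevant documents.

    No guard is included for the case where the boosted r_i exceeds n_i,
    which would make the logarithm undefined.
    """
    if in_query:
        r_i, R = r_i + boost, R + boost
    return f4(r_i, n_i, R, N)


# Hypothetical counts: the term occurs in 2 of 10 relevant documents
# and in 50 of 1000 documents overall.
plain = f4_query_boost(2, 50, 10, 1000, in_query=False)
boosted = f4_query_boost(2, 50, 10, 1000, in_query=True)
```

With these counts the boosted weight exceeds the plain weight, which is the extra emphasis on query terms that the scheme is intended to provide.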
Salton and Buckley found that, for all collections except the NPL collection¹⁹, the models performed 
fairly consistently with respect to each other, with the Ide dec-hi performing best overall. In general, 
although the probabilistic model performed well, it did not quite reach the performance level set by the 
vector space models. This was advantageous as the vector space Ide dec-hi RF technique is 
computationally very efficient.  
Salton and Buckley also provide some general guidelines for predicting RF performance. For 
example, short queries, on the whole, do better with RF than longer queries. Longer queries, or those 
queries with more terms that appear in the relevant documents, will tend to achieve better initial 
rankings. This means that there is greater potential improvement to be gained from RF on short initial 
queries. For a similar reason, queries that do poorly on initial runs tend to obtain greater improvements 
with RF than those with good initial retrieval runs. 
Finally, domain-specific collections also perform better with RF than domain-independent collections. 
This may be because it is easier to select good expansion terms from a domain-specific collection, or 
because the ambiguity of search terms is less significant.  
As well as considering variations on the probabilistic and vector space models, Salton and Buckley 
investigated weighting document terms (as opposed to binary weighting based on term 
presence/absence in each document) and three variations on query expansion: no expansion (only 
reweighting), full expansion by all the terms in the relevant documents, and partial expansion, adding 
only some of the relevant terms to the query. For all collections, again except the NPL, weighting 
document terms gives a considerable improvement in feedback, as does full expansion by all terms in 
the relevant set²⁰. Queries should be expanded by those terms that appear with the highest frequency in 
the relevant documents rather than those with the highest feedback weight.  
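As a rough sketch of partial expansion, the selection of expansion terms by frequency in the relevant documents could look as follows. The sketch assumes that "frequency" means the number of relevant documents containing a term, that documents are lists of tokens, and that ties are broken alphabetically; the function name `select_expansion_terms` and the toy documents are hypothetical.

```python
from collections import Counter


def select_expansion_terms(relevant_docs, query_terms, k):
    """Return the k non-query terms found in the most relevant documents."""
    counts = Counter()
    for doc in relevant_docs:
        counts.update(set(doc))  # count each term once per document
    # Keep only terms that are not already in the query.
    candidates = [(term, c) for term, c in counts.items()
                  if term not in query_terms]
    # Highest document count first; alphabetical order breaks ties.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return [term for term, _ in candidates[:k]]


relevant_docs = [
    ["feedback", "query", "vector"],
    ["feedback", "vector", "weight"],
    ["feedback", "term"],
]
expansion = select_expansion_terms(relevant_docs, {"query"}, k=2)
```

Setting k to the number of distinct non-query terms in the relevant set corresponds to full expansion, while smaller values give partial expansion.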
Rocchio's original formula and the Ide dec-hi variant perform the joint function of modifying query 
terms and query term weights. These and the other vector space RF techniques use the original 
                                                           
¹⁹ The NPL collection differed in a number of ways from the other collections investigated. It had much shorter 
query and document vectors, and lower term frequency. For this collection, although the same relative ordering 
was found between algorithms, binary document weighting was better than weighting document terms. This may 
result in the vector space length normalisation procedure being ineffective for this collection. 
²⁰ Although full expansion is preferable, partial expansion also gives good results and can be used to reduce 
storage. In larger collections than the ones tested here partial expansion may actually perform better than full 
expansion. 