probability is estimated based only on the presence of query terms within a document or the presence 
and absence of terms. 
Ordering principle O1: That probable relevance is based on the presence of search terms in
documents
Ordering principle O2: That probable relevance is based both on the presence of search terms in
documents and their absence from documents
Four weighting schemes, F1 to F4, can be derived from the combination of the two variants of the
independence assumption and the ordering principle (Table 1).
                         Independence      Independence
                         assumption I1     assumption I2
Ordering principle O1    F1                F2
Ordering principle O2    F3                F4
Table 1: Term weighting functions derived from the combination of independence 
assumptions and ordering principles  
In [RSJ76] each of these possible strategies was instantiated to give an actual method for weighting a 
query term, summarised in Figure 5. The weighting methods themselves are based on a contingency 
table, Table 2, which converts the probability values into values that can be calculated from term 
occurrence information. 
             rel       non-rel          total
x_i = 1      r         n - r            n
x_i = 0      R - r     N - n - R + r    N - n
total        R         N - R            N
Table 2: Contingency table to calculate term weights 
where r = the number of relevant documents containing term x_i
n = the number of documents containing term x_i
R = the number of relevant documents for query q 
N = the number of documents in the collection 
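As a minimal sketch, the four inner cells of Table 2 can be derived from the counts r, n, R and N defined above. The function name contingency_table, the dictionary keys and the consistency check are assumptions made for illustration; they do not come from [RSJ76].

```python
def contingency_table(r, n, R, N):
    """Return the four inner cells of the contingency table.

    r: relevant documents containing the term
    n: documents containing the term
    R: relevant documents for the query
    N: documents in the collection
    """
    # Reject counts that cannot describe a real collection
    # (for example, more relevant documents with the term than relevant documents).
    if not (0 <= r <= R <= N and r <= n <= N and n - r <= N - R):
        raise ValueError("counts are inconsistent")
    return {
        ("present", "rel"): r,
        ("present", "nonrel"): n - r,
        ("absent", "rel"): R - r,
        ("absent", "nonrel"): N - n - R + r,
    }
```

The four cells always sum to N, which gives a quick sanity check on counts gathered from a relevance sample.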
Each of the four term weighting functions is a ratio of two proportions (see footnote 8):
   
F1 is the ratio of the proportion of relevant documents in which the query term t occurs (ordering
principle O1) to the proportion of all documents in which t occurs (independence assumption I1).
F2 is the ratio of the proportion of relevant documents in which the query term t occurs (ordering
principle O1) to the proportion of non-relevant documents in which t occurs (independence
assumption I2).
F3 and F4 both use odds:
F3 is the ratio of the `relevance odds' (the ratio of relevant documents containing term t to relevant
documents not containing t; ordering principle O2) to the `collection odds' (the ratio of documents
containing t to documents not containing t; independence assumption I1).
F4 is the ratio of the `relevance odds' (ordering principle O2) to the `non-relevance odds' (the ratio of
non-relevant documents containing t to non-relevant documents not containing t;
independence assumption I2).
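The four descriptions above can be sketched in code. This is an illustrative Python sketch, not the formulation printed in [RSJ76]. The function name rsj_weights and the parameter k are hypothetical. Two further assumptions are made: the natural logarithm is used (the text only says that logs are taken), and the zero-count correction is applied by adding k to each inner cell of the contingency table before the ratios are formed, which is one reading of the 0.5 correction.

```python
import math


def rsj_weights(r, n, R, N, k=0.5):
    """Compute the weights F1..F4 for one query term.

    k is added to each inner cell of the contingency table
    (an assumed reading of the 0.5 correction); k=0 gives the raw ratios.
    """
    # Inner cells of the contingency table, with the correction added.
    a = r + k              # relevant, term present
    b = n - r + k          # non-relevant, term present
    c = R - r + k          # relevant, term absent
    d = N - n - R + r + k  # non-relevant, term absent

    # Marginal totals rebuilt from the corrected cells.
    rel = a + c
    nonrel = b + d
    present = a + b
    absent = c + d
    total = rel + nonrel

    return {
        # O1 with I1: proportion of relevant vs proportion of all documents
        "F1": math.log((a / rel) / (present / total)),
        # O1 with I2: proportion of relevant vs proportion of non-relevant
        "F2": math.log((a / rel) / (b / nonrel)),
        # O2 with I1: relevance odds vs collection odds
        "F3": math.log((a / c) / (present / absent)),
        # O2 with I2: relevance odds vs non-relevance odds
        "F4": math.log((a / c) / (b / d)),
    }
```

For example, with r = 2, n = 10, R = 5, N = 100 and k = 0, F1 is log 4 and F4 is log 7.25. With k = 0 and r = 0 the ratios are zero and math.log raises an error, which is the situation the 0.5 correction is meant to avoid.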
                                                           
8. It may be the case, especially when using small samples, that some of the values in the weights are zero,
resulting in an error when taking logs. The solution is to add 0.5 to each cell in the numerator and denominator of
each function.