\[ w_i = \log \frac{p_i (1 - q_i)}{q_i (1 - p_i)} \]
Equation 11: Term weighting function based on a term's distribution
in relevant and non-relevant documents
where $w_i$ = the weight of term $x_i$, $p_i = P_q(x_i \mid rel)$ and $q_i = P_q(x_i \mid \overline{rel})$
The function in Equation 11 was examined as a basis for ranking terms for query expansion. Robertson
[Rob90] argued that a weighting function that ranks terms for matching (as in Equation 10) may not be
appropriate for term selection⁹. That is, the degree to which a term indicates relevant material
(matching) is not necessarily related to how well the term will improve retrieval effectiveness if added to
a query (term selection). For term selection, Robertson proposed the formula in Equation 12, which
provides a better estimate of how much a term will increase a search's effectiveness. Terms should be
chosen for expansion based on the value given by Equation 12 rather than the w value from Equation
11. Equation 12 incorporates the w value of a term but also takes into account the difference between the
term's distributions in the relevant and non-relevant documents.
\[ a_i = w_i (p_i - q_i) \]
Equation 12: Formula for ranking expansion terms based on term i's distribution
in relevant and non-relevant documents
where $a_i$ = the value of term i for query expansion, $w_i$ = the weight of term i given by Equation 11,
$p_i = P_q(x_i \mid rel)$ and $q_i = P_q(x_i \mid \overline{rel})$
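The difference between ranking for matching and ranking for term selection can be illustrated with a small sketch. The Python below is an illustration under assumed values only: the two terms and their p and q estimates are hypothetical, and the function names are chosen here rather than taken from [Rob90].

```python
import math

def relevance_weight(p: float, q: float) -> float:
    """Equation 11: w = log(p(1 - q) / (q(1 - p))). Requires 0 < p, q < 1."""
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))

def selection_value(p: float, q: float) -> float:
    """Equation 12: a = w * (p - q)."""
    return relevance_weight(p, q) * (p - q)

# Hypothetical (p, q) estimates, invented for illustration.
# "rare_term" is a strong indicator of relevance but occurs in few relevant documents;
# "common_term" is a weaker indicator but occurs in most relevant documents.
terms = {
    "rare_term": (0.05, 0.001),
    "common_term": (0.60, 0.10),
}

by_w = sorted(terms, key=lambda t: relevance_weight(*terms[t]), reverse=True)
by_a = sorted(terms, key=lambda t: selection_value(*terms[t]), reverse=True)

print("ranked by w (matching):      ", by_w)
print("ranked by a (term selection):", by_a)
```

With these assumed values the two orderings disagree: the rare term has the higher w, but the common term has the higher a, because the factor (p - q) rewards terms that separate the two distributions across many documents.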
The formula in Equation 12, with the appropriate substitutions for pi and qi, becomes the term ranking
function in Equation 13. This allows Equation 12 to be calculated from the distribution of terms
within the relevant documents and the collection. It should be made clear here that, although the same
calculations take place at each iteration of RF (the weighting functions are identical even if the
values are not), theoretically different probabilities are being calculated at each iteration: the
distributions from which $P_q(rel \mid x)$ and $P_q(\overline{rel} \mid x)$ are calculated
are different at each iteration [VR86].
\[ a_i = \log \left( \frac{r_i / (R - r_i)}{(n_i - r_i) / (N - n_i - R + r_i)} \right) \left( \frac{r_i}{R} - \frac{n_i - r_i}{N - R} \right) \]
Equation 13: Term expansion ranking function 
where ri = the number of relevant documents containing term i
ni = the number of documents containing term i
R = the number of relevant documents for query q
N = the number of documents in the collection
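As a rough illustration, the ranking function can be computed directly from the four counts. Equation 13 corresponds to estimating pi as ri / R and qi as (ni - ri) / (N - R). The sketch below is not a definitive implementation: the function names, the example terms and all counts are hypothetical, and it assumes 0 < ri < R and ri < ni so that no count in the logarithm is zero (in practice a correction such as adding 0.5 to each entry is commonly used to avoid this).

```python
import math

def expansion_value(r: int, n: int, R: int, N: int) -> float:
    """Equation 13 from raw counts. Assumes 0 < r < R and r < n < N - R + r."""
    w = math.log((r / (R - r)) / ((n - r) / (N - n - R + r)))
    return w * (r / R - (n - r) / (N - R))

def rank_expansion_terms(stats: dict, R: int, N: int, k: int) -> list:
    """Return the k candidate terms with the highest Equation 13 value."""
    scored = {term: expansion_value(r, n, R, N) for term, (r, n) in stats.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Hypothetical collection: N documents, of which R were judged relevant.
N, R = 1000, 10
# term -> (r, n): relevant documents containing the term, all documents containing it
stats = {
    "feedback": (6, 50),
    "retrieval": (8, 400),
    "zebra": (1, 2),
}

print(rank_expansion_terms(stats, R, N, k=2))
```

In this invented example "zebra" has the largest log term, since almost every document containing it is relevant, but it appears in only one relevant document, so its overall value is the lowest of the three.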
The F4 reweighting function calculates weights for terms based on their distribution in the relevant and
non-relevant documents. The probabilistic model is thus a retrieval model that is specifically designed
for RF. At the start of a search, of course, there is no relevance information with which to estimate the
probabilities in Equation 10. One standard solution to this problem is to use a weighting function that
does not depend on relevance information, such as idf. After an initial ranking of documents has been
produced and relevance information has been obtained, a function such as F4 can be used to provide
improved term weights. The use of idf comes from substitution of appropriate values for r, R, and n into
the F4 weight in Figure 6.
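The link between F4 and idf can be checked numerically. The sketch below assumes the commonly cited form of F4 with 0.5 added to each entry, which is not reproduced in this section, and assumes that "no relevance information" means r = R = 0. The function names and the collection size are hypothetical.

```python
import math

def f4_weight(r: int, n: int, R: int, N: int) -> float:
    """F4 with the 0.5 correction (assumed standard form, not quoted from this text)."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def idf(n: int, N: int) -> float:
    """A simple inverse document frequency, log(N / n)."""
    return math.log(N / n)

N = 100_000  # hypothetical collection size
for n in (10, 1_000):
    # With r = R = 0 the weight reduces to log((N - n + 0.5) / (n + 0.5)),
    # which is close to log(N / n) whenever n is much smaller than N.
    print(n, f4_weight(0, n, 0, N), idf(n, N))
```

Under these assumptions the two values differ only slightly for terms that are rare relative to the collection, which is the sense in which F4 without relevance information behaves like an idf weight.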
It is possible to treat the query as an additional, and relevant, document and use the F4 weight; however,
this will turn into something very like an idf weight [RWH+93]. An alternative to this was proposed by
                                                           
⁹ In [Rob86] Robertson also discussed the appropriateness of the 0.5 addition to the entries in the F4 calculation,
arguing that better estimations are more suitable for selecting new query terms.