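\[
w_{x_i} = \log \frac{P_q(x_i \mid rel)}{P_q(x_i)}
        = \log \frac{r / R}{n / N}
\qquad \text{(F1)}
\]

\[
w_{x_i} = \log \frac{P_q(x_i \mid rel)}{P_q(x_i \mid \overline{rel})}
        = \log \frac{r / R}{(n - r) / (N - R)}
\qquad \text{(F2)}
\]

\[
w_{x_i} = \log \frac{P_q(x_i \mid rel) \,/\, P_q(\bar{x}_i \mid rel)}{P_q(x_i) \,/\, P_q(\bar{x}_i)}
        = \log \frac{r / (R - r)}{n / (N - n)}
\qquad \text{(F3)}
\]

\[
w_{x_i} = \log \frac{P_q(x_i \mid rel) \,/\, P_q(\bar{x}_i \mid rel)}{P_q(x_i \mid \overline{rel}) \,/\, P_q(\bar{x}_i \mid \overline{rel})}
        = \log \frac{r / (R - r)}{(n - r) / (N - n - R + r)}
\qquad \text{(F4)}
\]

Figure 5: Term weighting functions F1 to F4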
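As a rough illustration of how the four functions in Figure 5 differ, the following Python sketch evaluates them from raw counts. It assumes the usual reading of the symbols, which is not restated in this section: N is the number of documents in the collection, R the number of known relevant documents, n the number of documents containing the term, and r the number of known relevant documents containing the term. The function name `rsj_weights` and the example counts are hypothetical. The sketch applies no correction for zero counts, so every cell of the contingency table must be non-zero.

```python
import math


def rsj_weights(r, R, n, N):
    """Return (F1, F2, F3, F4) for a single term.

    Assumed meaning of the counts (not defined in this section):
      N: documents in the collection
      R: known relevant documents
      n: documents containing the term
      r: known relevant documents containing the term
    Requires 0 < r < R, r < n, and N - n - R + r > 0 (no smoothing is applied).
    """
    # F1: proportion of relevant documents with the term vs. proportion of all documents with it
    f1 = math.log((r / R) / (n / N))
    # F2: proportion of relevant vs. proportion of non-relevant documents with the term
    f2 = math.log((r / R) / ((n - r) / (N - R)))
    # F3: odds of the term in relevant documents vs. odds of the term in the collection
    f3 = math.log((r / (R - r)) / (n / (N - n)))
    # F4: odds of the term in relevant documents vs. odds in non-relevant documents
    f4 = math.log((r / (R - r)) / ((n - r) / (N - n - R + r)))
    return f1, f2, f3, f4


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from [RSJ76].
    for name, weight in zip(("F1", "F2", "F3", "F4"), rsj_weights(r=8, R=10, n=100, N=1000)):
        print(f"{name} = {weight:.4f}")
```

For a term that is common in the known relevant set and rare elsewhere, as in the illustrative counts above, the odds-based functions F3 and F4 assign a larger weight than F1 and F2, because they also take the absence of the term into account.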
In [RSJ76], Robertson and Sparck Jones used the four term weighting schemes to carry out two sets of
experiments. The first set was based on retrospective weighting, which derives optimal weights for
retrieving the relevant documents already found, i.e. the known relevant set. The second set was based
on predictive weighting, which uses the weights from the retrospective stage to retrieve new
documents. If the known relevant set is a representative sample of all relevant documents, then
predictive weighting should be better at retrieving unseen relevant documents than the original term
weights. Naturally, it is the latter, predictive, case that is mainly of interest, as RF is intended to
retrieve relevant documents that the user has not yet seen.
All four functions outperformed both no relevance weighting and the idf function. F1 and F2 performed
within the same range as each other, as did F3 and F4, with F3 and F4 outperforming F1 and F2, and F4
slightly outperforming F3. This confirms Robertson and Sparck Jones' intuition that ordering principle
O2 is correct and that it is necessary to consider both the presence and the absence of query terms. No
conclusive evidence was provided to distinguish between the two versions of the independence
assumption; however, Robertson and Sparck Jones favoured the second assumption, I2, as the more
realistic.
Given that the preferred weighting scheme is F4, the odds function in Figure 6 (Equation 10a) can be
converted to that of Equation 10b by eliminating the division operators. By noting that
$P_q(\bar{x}_i \mid rel) = 1 - P_q(x_i \mid rel)$ and
$P_q(\bar{x}_i \mid \overline{rel}) = 1 - P_q(x_i \mid \overline{rel})$, it is possible to convert the
representation of F4 in Figure 6 to that in Equation 10c.
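\begin{align*}
w_{x_i}
  &= \log \frac{P_q(x_i \mid rel) \,/\, P_q(\bar{x}_i \mid rel)}
               {P_q(x_i \mid \overline{rel}) \,/\, P_q(\bar{x}_i \mid \overline{rel})}
  && \text{(a)} \\
  &= \log \frac{P_q(x_i \mid rel)\, P_q(\bar{x}_i \mid \overline{rel})}
               {P_q(x_i \mid \overline{rel})\, P_q(\bar{x}_i \mid rel)}
  && \text{(b)} \\
  &= \log \frac{P_q(x_i \mid rel)\,\bigl(1 - P_q(x_i \mid \overline{rel})\bigr)}
               {P_q(x_i \mid \overline{rel})\,\bigl(1 - P_q(x_i \mid rel)\bigr)}
  && \text{(c)}
\end{align*}

Equation 10: Term weighting function based on a term's distribution in relevant and non-relevant
documents, where $w_{x_i}$ = the weight of term $x_i$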
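The equivalence of the three forms of Equation 10 can be checked numerically. The sketch below is only an illustration: the function name `f4_forms` and the probability values are hypothetical, and `p_rel` and `p_nonrel` stand for the probability that the term is present in relevant and in non-relevant documents respectively, both assumed to lie strictly between 0 and 1.

```python
import math


def f4_forms(p_rel, p_nonrel):
    """Evaluate forms (a), (b) and (c) of Equation 10 for one term.

    p_rel:    probability that the term is present in a relevant document
    p_nonrel: probability that the term is present in a non-relevant document
    """
    # Probabilities that the term is absent, by the complement rule
    q_rel = 1.0 - p_rel
    q_nonrel = 1.0 - p_nonrel

    # (a) ratio of the odds in relevant documents to the odds in non-relevant documents
    a = math.log((p_rel / q_rel) / (p_nonrel / q_nonrel))
    # (b) the same ratio with the inner divisions multiplied out
    b = math.log((p_rel * q_nonrel) / (p_nonrel * q_rel))
    # (c) absence probabilities replaced by one minus the presence probabilities
    c = math.log((p_rel * (1.0 - p_nonrel)) / (p_nonrel * (1.0 - p_rel)))
    return a, b, c


if __name__ == "__main__":
    a, b, c = f4_forms(p_rel=0.8, p_nonrel=0.1)  # illustrative values only
    print(math.isclose(a, b) and math.isclose(b, c))
```

Form (c) is the convenient one in practice, since it needs only the two presence probabilities for each term.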
This equation (Equation 10c), which expresses the F4 function solely in terms of the presence of a
term in the relevant and non-relevant documents, can alternatively be represented as in Equation 11.
The probability of relevance of a document, then, is measured as the sum of the term weights of the
query terms in the document, i.e. the sum of the F4 weights of each query term in the document.
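The scoring rule just described, summing the F4 weights of the query terms that occur in a document, can be sketched as follows. The function names `score_document` and `rank_documents`, the document identifiers, and the weights are all hypothetical; documents are treated simply as sets of terms, which is an assumption of this sketch rather than something stated in the text.

```python
def score_document(doc_terms, query_weights):
    """Sum the weights of the query terms that are present in the document.

    doc_terms:     set of terms occurring in the document
    query_weights: mapping from query term to its (e.g. F4) weight
    """
    return sum(weight for term, weight in query_weights.items() if term in doc_terms)


def rank_documents(docs, query_weights):
    """Return (doc_id, score) pairs ordered by decreasing score."""
    scored = [(doc_id, score_document(terms, query_weights)) for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative weights and documents only.
    weights = {"retrieval": 2.0, "feedback": 1.5, "the": -0.25}
    docs = {
        "d1": {"retrieval", "feedback"},
        "d2": {"retrieval", "the"},
        "d3": {"cooking"},
    }
    for doc_id, score in rank_documents(docs, weights):
        print(doc_id, score)
```

Query terms absent from a document contribute nothing to its score, so a document containing no query terms scores zero.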