2.2.3 Probabilistic model 
In the probabilistic model, suggested by Maron and Kuhns [MK60] and developed by, amongst others,
Robertson and Sparck Jones [RSJ76] and Van Rijsbergen [VR79], documents and queries are also
viewed as vectors, but the vector space similarity measure is replaced by a probabilistic matching
function. The probabilistic model is based on estimating the probability that a document will be
relevant to a user, given a particular query. The higher this estimated probability, the more likely the
document is to be relevant to the user⁴. This idea is formalised in the probability ranking principle
[Rob77]:
  "If a reference retrieval system's response to each request is a ranking of the
  documents in the collection in order of decreasing probability of relevance to the
  user who submitted the request, where the probabilities are estimated as
  accurately as possible on the basis of whatever data have been made available to
  the system for this purpose, the overall effectiveness of the system to its user will
  be the best that is obtainable on the basis of those data."
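As a minimal sketch of the principle, assume that each document already carries an estimated probability of relevance to the query; how that estimate is produced is not shown. A system following the principle then simply returns the documents sorted by that estimate in decreasing order. The function name, document identifiers and probabilities below are hypothetical.

```python
from typing import Dict, List


def rank_by_probability(p_rel: Dict[str, float]) -> List[str]:
    """Return document ids in order of decreasing estimated probability of relevance.

    p_rel maps a (hypothetical) document id to its estimated probability of
    relevance for one fixed query q; the estimation step itself is assumed.
    """
    # sorted() is stable even with reverse=True, so equal estimates keep input order.
    return sorted(p_rel, key=lambda doc: p_rel[doc], reverse=True)


# Hypothetical estimates for a single query.
estimates = {"d1": 0.20, "d2": 0.75, "d3": 0.05, "d4": 0.40}
print(rank_by_probability(estimates))  # ['d2', 'd4', 'd1', 'd3']
```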
The estimated probability of relevance can be expressed as $P_q(rel \mid x)$, the probability of relevance
given a document x and a query q. This probability can be used to decide whether or not to retrieve a
document: if $P_q(rel \mid x) = 0$, then the probability of relevance given x is 0, and x should not be
retrieved⁵.
This can be refined by also considering the probability of non-relevance given x and q,
$P_q(\overline{rel} \mid x)$. If $P_q(rel \mid x) > P_q(\overline{rel} \mid x)$, then the probability of relevance is greater than the
probability of non-relevance, and hence x should be retrieved⁶. A threshold may also be used: the
difference between the probability of relevance and the probability of non-relevance must exceed
some threshold value before x is retrieved, i.e. $P_q(rel \mid x) - P_q(\overline{rel} \mid x) > \mathit{threshold}$. Here
threshold is a value set by the user or the system in order to further restrict the retrieval function.
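The three retrieval decisions described above can be written as small predicates. This is a minimal sketch: the function names are hypothetical, "non-rel" in the comments stands for $\overline{rel}$, and both probabilities are assumed to be supplied by an estimation step that is not shown. Equal probabilities are treated as non-relevant, following the convention in footnote 6.

```python
def retrieve_nonzero(p_rel: float) -> bool:
    """Retrieve x unless its estimated P_q(rel | x) is exactly 0."""
    return p_rel > 0.0


def retrieve_if_more_likely_relevant(p_rel: float, p_nonrel: float) -> bool:
    """Retrieve x only if P_q(rel | x) > P_q(non-rel | x).

    Equal probabilities return False, i.e. x is treated as non-relevant.
    """
    return p_rel > p_nonrel


def retrieve_with_threshold(p_rel: float, p_nonrel: float, threshold: float) -> bool:
    """Retrieve x only if P_q(rel | x) - P_q(non-rel | x) exceeds the threshold."""
    return (p_rel - p_nonrel) > threshold


# Hypothetical document with P_q(rel | x) = 0.6 and P_q(non-rel | x) = 0.4.
print(retrieve_nonzero(0.6))                       # True
print(retrieve_if_more_likely_relevant(0.6, 0.4))  # True
print(retrieve_with_threshold(0.6, 0.4, 0.1))      # True
print(retrieve_with_threshold(0.6, 0.4, 0.3))      # False
```

With a difference of about 0.2 between the two probabilities, the document passes a threshold of 0.1 but is rejected by a threshold of 0.3, which shows how raising the threshold restricts the retrieved set.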
Having decided which documents to retrieve, the odds of relevance to non-relevance (Equation 7) can
be used as a document ranking function: the higher the ratio of the probability of relevance to that of
non-relevance given x, the more likely document x is to be relevant to the user.

$$\frac{P_q(rel \mid x)}{P_q(\overline{rel} \mid x)}$$

Equation 7: Odds of relevance to non-relevance for document x and query q
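A minimal sketch of Equation 7 used as a ranking function follows. It assumes that, for each document, both $P_q(rel \mid x)$ and $P_q(\overline{rel} \mid x)$ are already available; the function names, document identifiers and probabilities are hypothetical. A zero probability of non-relevance is mapped to infinite odds so that such a document ranks first.

```python
from typing import Dict, List, Tuple


def odds_of_relevance(p_rel: float, p_nonrel: float) -> float:
    """Equation 7: the ratio P_q(rel | x) / P_q(non-rel | x)."""
    if p_nonrel == 0.0:
        # x is certainly relevant; give it infinite odds so it ranks first.
        return float("inf")
    return p_rel / p_nonrel


def rank_by_odds(probs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order document ids by decreasing odds of relevance.

    probs maps a (hypothetical) document id to the pair
    (P_q(rel | x), P_q(non-rel | x)) for one fixed query q.
    """
    return sorted(probs, key=lambda doc: odds_of_relevance(*probs[doc]), reverse=True)


# Hypothetical estimates, assuming the two probabilities sum to one.
p = {"d1": 0.2, "d2": 0.75, "d3": 0.05, "d4": 0.4}
probs = {doc: (pr, 1.0 - pr) for doc, pr in p.items()}
print(rank_by_odds(probs))  # ['d2', 'd4', 'd1', 'd3']
```

When the two probabilities sum to one, the odds $p/(1-p)$ increase strictly with $p$, so this ordering coincides with ranking by $P_q(rel \mid x)$ alone.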
Bayes' theorem [Bay63] can be used to calculate $P_q(rel \mid x)$ and $P_q(\overline{rel} \mid x)$. Equation 8 demonstrates
this for the relevance case.

$$P_q(rel \mid x) = \frac{P_q(x \mid rel)\, P_q(rel)}{P(x)}$$

Equation 8: Calculation of $P_q(rel \mid x)$ through Bayesian inversion
where
$P_q(rel)$ is the prior probability that any document in the collection is relevant to q,
$P_q(x \mid rel)$ is the probability of observing document x among the documents relevant to q, and
$P(x)$ is the probability of observing document x irrespective of relevance.
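As a worked sketch of Equation 8, assume binary relevance, so that $P_q(\overline{rel}) = 1 - P_q(rel)$, and obtain $P(x)$ from the law of total probability, $P(x) = P_q(x \mid rel)P_q(rel) + P_q(x \mid \overline{rel})P_q(\overline{rel})$. Both are assumptions of this example rather than statements from the text, and the function name and all probabilities are hypothetical. Dividing Equation 8 by its non-relevance counterpart also shows that $P(x)$ cancels from the odds of Equation 7, leaving $P_q(x \mid rel)P_q(rel) / (P_q(x \mid \overline{rel})P_q(\overline{rel}))$; the last lines of the sketch check this numerically.

```python
from typing import Tuple


def bayes_posterior(p_x_given_rel: float, p_x_given_nonrel: float,
                    prior_rel: float) -> Tuple[float, float, float]:
    """Return (P_q(rel | x), P_q(non-rel | x), P(x)) using Equation 8.

    Assumes binary relevance, so P_q(non-rel) = 1 - P_q(rel), and computes
    P(x) with the law of total probability.
    """
    prior_nonrel = 1.0 - prior_rel
    joint_rel = p_x_given_rel * prior_rel           # P_q(x | rel) P_q(rel)
    joint_nonrel = p_x_given_nonrel * prior_nonrel  # P_q(x | non-rel) P_q(non-rel)
    p_x = joint_rel + joint_nonrel                  # P(x)
    if p_x == 0.0:
        raise ValueError("P(x) is zero; the posterior is undefined")
    return joint_rel / p_x, joint_nonrel / p_x, p_x


# Hypothetical figures: 1% of the collection is relevant to q, and x is
# 15 times more likely to be observed among relevant documents.
p_rel, p_nonrel, p_x = bayes_posterior(0.3, 0.02, 0.01)
print(f"{p_x:.4f} {p_rel:.4f} {p_nonrel:.4f}")  # 0.0228 0.1316 0.8684

# The odds of Equation 7 computed without P(x) give the same value.
odds_without_p_x = (0.3 * 0.01) / (0.02 * 0.99)
print(f"{p_rel / p_nonrel:.4f} {odds_without_p_x:.4f}")  # 0.1515 0.1515
```

Even though x is 15 times more likely among relevant documents, the low prior keeps $P_q(rel \mid x)$ at about 0.13 in this example, which is why the prior term in Equation 8 matters.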
                                                           
⁴ The probabilistic model measures the probability of relevance, i.e. the probability that a document will be
relevant, not the degree of relevance as is sometimes suggested. A good discussion of the difference between these
two notions is found in [RB78].
⁵ In an operational system $P_q(rel \mid x)$ will generally only equal 0 if x does not contain any query terms. This rule
then decides only to retrieve those documents that contain at least one query term.
⁶ In the case where the two probabilities are equal, it is arbitrarily decided that x is non-relevant [VR79].