After Bayesian inversion and deletion of $P(x)$ (which is identical for both the relevance and non
relevance case), the odds function from Equation 7 turns into Equation 9a.
The probability of relevance, $P_q(rel)$, and the probability of non relevance, $P_q(\overline{rel})$, are
identical for all x's. That is, when we use the odds in Equation 7 to rank documents, the ranking depends
on the values of the probabilities $P_q(x \mid rel)$ and $P_q(x \mid \overline{rel})$, not on the values
$P_q(rel)$ and $P_q(\overline{rel})$. We can therefore eliminate these elements and arrive at the odds in
Equation 9b. This is then the odds of observing x given relevance or non relevance.
$$
\text{(a)}\quad \frac{P_q(x \mid rel)\,P_q(rel)}{P_q(x \mid \overline{rel})\,P_q(\overline{rel})}
\qquad
\text{(b)}\quad \frac{P_q(x \mid rel)}{P_q(x \mid \overline{rel})}
$$

Equation 9: Odds of relevance, or non relevance, having observed document x
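As an illustration of why the prior terms can be dropped, the following minimal sketch ranks three hypothetical documents by the odds in Equation 9a and again by the odds in Equation 9b. The document names, the probability values and the function names are assumptions made for this example only; the probability of non relevance is taken as one minus the probability of relevance.

```python
def odds_9a(p_x_rel, p_x_nonrel, p_rel):
    """Equation 9a: likelihood ratio multiplied by the prior odds of relevance."""
    p_nonrel = 1.0 - p_rel  # non relevance is the complement of relevance
    return (p_x_rel * p_rel) / (p_x_nonrel * p_nonrel)


def odds_9b(p_x_rel, p_x_nonrel):
    """Equation 9b: likelihood ratio only."""
    return p_x_rel / p_x_nonrel


# Hypothetical values of (P_q(x | rel), P_q(x | non rel)) for three documents.
docs = {
    "d1": (0.30, 0.10),
    "d2": (0.05, 0.20),
    "d3": (0.40, 0.40),
}
p_rel = 0.02  # hypothetical prior, the same for every document

rank_a = sorted(docs, key=lambda d: odds_9a(*docs[d], p_rel), reverse=True)
rank_b = sorted(docs, key=lambda d: odds_9b(*docs[d]), reverse=True)

print(rank_a)
print(rank_b)
```

Because the prior odds multiply every document's score by the same positive constant, the two orderings coincide, whatever value is chosen for `p_rel` between 0 and 1.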
The odds in Equation 9 refer to the probability of relevance, and non relevance, after viewing the actual 
document text rather than the vector representation of the document. That is, it measures the odds of 
relevance to non relevance based on the content of the document and is independent of the document 
representation. This means that the model can be used for many different types of document indexing 
but it also means that Equation 9 must ultimately be expressed as a retrieval function based on the 
specific document indexing technique used to represent the documents.  
There are many probabilistic models based on the model outlined so far in this section. In the remainder 
of this section we shall describe the transformation from Equation 9 to a function based on the term 
based representation outlined in section 2.1. Specifically, the discussion will be based on the 
probabilistic model known as the Binary Independence Model, as this is the most traditional variant of 
the overall probabilistic approach. This model was one of the first probabilistic models of IR, and will 
be used as an example of how the theoretical model is transformed into an actual retrieval model. 
Before converting Equation 9 into an equation that can be estimated based on the probability of 
relevance and non relevance of the terms in document x, it is necessary to consider how the 
probabilities of relevance and non relevance interact. In particular, two aspects of retrieval are 
important: the independence of terms and what information is used to order documents. 
The probabilistic model assumes that terms are distributed independently of other terms; that is, the 
probability of seeing term t in a document is not affected by seeing term s in the same document. This is 
a simplifying assumption that reduces the computational complexity of the model. However, it is 
necessary to define over what sets the independence holds. Two versions of the independence 
assumption were proposed in [RSJ76]. Both term independence assumptions assume that terms, query 
terms in particular, are distributed independently in the set of relevant documents: the probability of a 
term appearing in the relevant documents is not dependent on the probabilities of other terms appearing 
in the relevant documents. The two assumptions differ in whether the relevant document set should be 
distinguished from the whole document collection or only from the set of non relevant documents. 
 Independence assumption I1: The distribution of terms in relevant documents is independent and their 
distribution in all documents is independent.
 Independence assumption I2: The distribution of terms in relevant documents is independent and their 
distribution in irrelevant⁷ documents is independent.
These two versions of the independence assumption are important in distinguishing whether we should 
measure the difference in the probability of a term's occurrence against the non relevant documents (I2) 
or against its probability of occurrence in the collection as a whole (I1). 
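The difference between I1 and I2 can be sketched with toy counts. The example below is an illustration under stated assumptions, not the estimation procedure of [RSJ76]: the counts, the term names and the helper functions are hypothetical, probabilities are estimated as simple proportions, and documents are treated as binary term vectors. Term independence is what allows the per-term factors to be multiplied.

```python
from math import prod


def term_probabilities(r, R, n, N):
    """Proportion estimates for one term (hypothetical helper).

    r: relevant documents containing the term, R: relevant documents,
    n: all documents containing the term,      N: documents in the collection.
    """
    p_rel = r / R                 # occurrence in relevant documents
    p_all = n / N                 # background under I1: whole collection
    p_nonrel = (n - r) / (N - R)  # background under I2: non relevant documents
    return p_rel, p_all, p_nonrel


def likelihood_ratio(doc_terms, p_fore, p_back):
    """Ratio P(x | foreground) / P(x | background) for a binary document.

    Each term contributes one factor, for presence or for absence, and the
    factors are multiplied because terms are assumed independent.
    Probabilities of exactly 0 or 1 would need smoothing (not handled here).
    """
    return prod(
        p_fore[t] / p_back[t] if t in doc_terms
        else (1 - p_fore[t]) / (1 - p_back[t])
        for t in p_fore
    )


# Toy collection: N = 100 documents, R = 10 of them relevant.
counts = {"retrieval": (8, 20), "model": (5, 50)}  # term -> (r, n)
p_rel, p_all, p_nonrel = {}, {}, {}
for term, (r, n) in counts.items():
    p_rel[term], p_all[term], p_nonrel[term] = term_probabilities(r, 10, n, 100)

doc = {"retrieval"}  # document containing "retrieval" but not "model"
ratio_i1 = likelihood_ratio(doc, p_rel, p_all)     # relevant vs. whole collection
ratio_i2 = likelihood_ratio(doc, p_rel, p_nonrel)  # relevant vs. non relevant
print(ratio_i1, ratio_i2)
```

With these counts the term "model" is equally likely in every set, so it contributes a factor of 1 under both assumptions, while "retrieval" contrasts more sharply with the non relevant documents (I2) than with the collection as a whole (I1), since the collection includes the relevant documents themselves.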
The probabilistic model ranks documents according to their probability of being relevant to a query: 
the ordering principle. Two versions of this principle distinguish between the case where this 
                                                           
⁷ The labels irrelevant and non relevant are treated as synonymous in this paper. 