Croft and Harper [CH79] based on the formula in Equation 8. This approach ranks documents by a 
function such as idf, assumes the top n documents are relevant, and then uses these so-called
pseudo-relevance assessments to estimate values for $p_i$ and $q_i$ in Equation 11. This will be
discussed more fully in section 3.5. 
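As a rough illustration of the pseudo-relevance step described above, the following Python sketch ranks a toy collection by summed idf, treats the top n documents as relevant, and estimates $p_i$ and $q_i$ for each query term. Equation 11 is not reproduced here, so the 0.5 smoothing constants, the function names and the toy data are all assumptions made for the example and should not be read as the formulation used in [CH79].

```python
import math
from collections import Counter

# Toy collection (hypothetical): each document is a list of terms.
DOCS = [["cat", "dog"], ["cat"], ["fish"], ["bird"]]
QUERY = ["cat", "dog"]


def idf_scores(docs, query):
    """Score each document by the summed idf of the query terms it contains."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    idf = {t: math.log(n_docs / df[t]) for t in query if df[t] > 0}
    return [sum(idf[t] for t in set(d) if t in idf) for d in docs]


def pseudo_relevance_estimates(docs, query, n):
    """Assume the top-n idf-ranked documents are relevant and estimate,
    for each query term, p (presence in relevant documents) and
    q (presence in the remaining documents).

    The +0.5 / +1.0 smoothing is an assumption of this sketch.
    """
    scores = idf_scores(docs, query)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    estimates = {}
    for t in query:
        r = sum(1 for i in top if t in docs[i])   # pseudo-relevant docs with t
        s = sum(1 for i in rest if t in docs[i])  # remaining docs with t
        p = (r + 0.5) / (len(top) + 1.0)
        q = (s + 0.5) / (len(rest) + 1.0)
        estimates[t] = (p, q)
    return estimates


estimates = pseudo_relevance_estimates(DOCS, QUERY, n=2)
```

In this toy run both query terms receive a higher estimate for p than for q, simply because the documents that contain them are the ones ranked highest by idf and therefore assumed relevant.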
This fundamental approach to probabilistic modelling has been extended in many ways, in particular to 
incorporate within-document frequency information [RW94]. Pertinent additions or modifications will 
be described, where appropriate, in later sections of this paper. An historical overview of the 
probabilistic model can be found in [SSJ+00a, SSJ+00b]. 
2.2.4 Logical model 
In [Mar64], Maron hinted at a potentially useful difference between the Boolean logic exact match 
process and the process of logical implication. This difference distinguishes between the Boolean 
matching of text representations, in which the system is restricted to an exact formula, and the inference 
of information needs, by which process the system can infer more about what may be relevant than is 
stated in the query. 
The advantages of implication or inference as the basis for a retrieval algorithm are demonstrated in the 
logical modelling approach to retrieval. This class of models originates from a proposal by Van 
Rijsbergen [VR86] that relevance can be modelled as a process of uncertain inference. More precisely, 
the relevance of a document representation can be measured by the probability that the information in a 
document implies the information in a query¹⁰ (Equation 14). 
$$P(d \rightarrow q)$$
Equation 14: Relevance measured as uncertain inference 
This view was encapsulated in the logical uncertainty principle [VR86]: 
"Given any two sentences x and y; a measure of the uncertainty of $y \rightarrow x$ related to a given 
data set is determined by the minimal extent to which we have to add information to the 
data set, to establish the truth of $y \rightarrow x$." 
That is, if the information in a document d does not imply the information in a query q, how much would 
d have to be changed to be relevant to q? The degree of change needed to d allows the probability of the 
inference to be calculated. 
As a simple example, if the query is about animals and a document mentions dogs, ponies, and cats but 
does not explicitly mention animals, then the document would not be retrieved by standard term 
matching retrieval algorithms. By including the information that dogs, ponies, and cats are kinds of 
animals, it can be asserted that the document may be relevant and should be retrieved. Such an 
approach was taken by Lalmas [Lal96], who used ontological relationships to express how many 
transformations or substitutions of this type would be necessary before a document's content implied a 
query. In Lalmas's model, the number of substitutions gave a measure of the uncertainty associated with 
the inference. 
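The idea of counting substitutions can be made concrete with a small Python sketch. It is only an illustration of the animals example above and not a reconstruction of the model in [Lal96]: the `IS_A` table, the extra `poodle` entry and both function names are hypothetical, and a single parent per term is assumed for simplicity.

```python
# Hypothetical is-a ontology: each term maps to its (single) broader term.
IS_A = {
    "dog": "animal",
    "pony": "animal",
    "cat": "animal",
    "poodle": "dog",  # extra level, added only to show a two-step case
}


def substitutions_needed(term, target, is_a):
    """Count the is-a substitutions that turn `term` into `target`.

    Returns None when `target` cannot be reached from `term`.
    """
    steps = 0
    seen = set()
    while term != target:
        if term in seen or term not in is_a:
            return None  # cycle, or no broader term left to substitute
        seen.add(term)
        term = is_a[term]
        steps += 1
    return steps


def min_substitutions(doc_terms, query_term, is_a):
    """Smallest number of substitutions over all terms in the document."""
    counts = [substitutions_needed(t, query_term, is_a) for t in doc_terms]
    counts = [c for c in counts if c is not None]
    return min(counts) if counts else None


print(min_substitutions(["dog", "pony", "cat"], "animal", IS_A))  # prints 1
print(min_substitutions(["poodle"], "animal", IS_A))              # prints 2
```

A document needing fewer substitutions would be treated as a more certain match. How such a count is turned into a probability is left open here.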
The core logical models are based on non-classical logics, as the classical notion of inference has 
several undesirable properties for retrieval; e.g. in classical logic the inference $d \rightarrow q$ would 
hold if d did not contain any information. The majority of logical models of IR are based on a 
possible-worlds semantics, in which each possible world represents a possible combination of events. One 
possible representation is one in which a possible world represents a possible combination of terms. For 
example, given a set of indexing terms $\{t_1, t_2, t_3, \ldots, t_{10}\}$, there would be $2^{10}$ worlds: a 
world in which all terms are true, one in which all terms except $t_1$ are true, one in which all terms 
except $t_1$ and $t_2$ are true, and so on. In this representation each document, and the query, is 
associated with a world. The similarity of a document to the query is given by the distance between the 
document world and the query world¹¹. 
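A minimal Python sketch of this term-based representation follows. A world is encoded as a tuple of truth values, one per indexing term, and terms not listed for a document or query are taken to be false. The text does not say which distance is used, so the Hamming distance here is an assumption of the sketch, as are the function names.

```python
from itertools import product

# Ten indexing terms t1..t10, as in the example above.
TERMS = [f"t{i}" for i in range(1, 11)]


def world_of(present_terms, terms=TERMS):
    """Encode a world as a tuple of booleans, one per indexing term.

    Terms not listed are false in that world.
    """
    present = set(present_terms)
    return tuple(t in present for t in terms)


def world_distance(w1, w2):
    """Hamming distance: the number of terms whose truth value differs.

    This choice of distance is an assumption made for illustration.
    """
    return sum(a != b for a, b in zip(w1, w2))


def count_worlds(terms=TERMS):
    """Enumerate every true/false assignment over the terms and count them."""
    return sum(1 for _ in product([False, True], repeat=len(terms)))


doc_world = world_of(["t1", "t2"])
query_world = world_of(["t2", "t3"])
print(count_worlds())                          # prints 1024
print(world_distance(doc_world, query_world))  # prints 2
```

Under this encoding a smaller distance corresponds to a greater similarity between the document and the query.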
                                                           
¹⁰ This is the most common version of the principle. Some authors have tried modelling the inverse, the degree to 
which the information in the query implies the information in the document, $P(q \rightarrow d)$, or a combination 
of both measures, e.g. [Nie89]. 
¹¹ This assumes the Closed World Assumption, i.e. any fact not known to be true is assumed false. 