Consider the example in Figure 6, which contains two documents and a query, each indexed by terms drawn
from the set of indexing terms {t1, t2, t3, t4, t5}. d1 is indexed by the conjunction of terms t1 and t2, d2
by the conjunction of terms t1, t2 and t3,¹² and the query, q, by t1 and t5.
d1 = <1, 1, 0, 0, 0>
d2 = <1, 1, 1, 0, 0>
q = <1, 0, 0, 0, 1>
Figure 6: Possible worlds representation of d1, d2 and q
A simple retrieval model can be defined by asserting that every world (document) has a distance of 1
from a query, q, if the intersection between the world and q is non-empty, and a distance of 0 if the
intersection is empty. This model would retrieve both d1 and d2 for q, and corresponds to a Boolean
disjunction of the query terms. A Boolean conjunction of terms would be modelled by requiring the
intersection of a world w and q to be identical to q; since neither d1 nor d2 contains t5, such a model
would retrieve neither document for q.
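Both Boolean readings can be checked directly on Figure 6. The following is a minimal Python sketch, assuming each world is simply the set of terms marked 1 in its vector; the function names are hypothetical, and the 1/0 value plays the role the text calls a distance, with 1 meaning the world is retrieved.

```python
# Indexing vocabulary of Figure 6; position i of a vector stands for TERMS[i].
TERMS = ["t1", "t2", "t3", "t4", "t5"]


def to_world(vec):
    """Turn a 0/1 vector into the set of terms whose proposition is true."""
    return {t for t, bit in zip(TERMS, vec) if bit}


d1 = to_world([1, 1, 0, 0, 0])
d2 = to_world([1, 1, 1, 0, 0])
q = to_world([1, 0, 0, 0, 1])


def or_match(world, query):
    """Disjunctive model: 1 if the world shares at least one term with q, else 0."""
    return 1 if world & query else 0


def and_match(world, query):
    """Conjunctive model: 1 only if the world's intersection with q is q itself."""
    return 1 if (world & query) == query else 0


docs = {"d1": d1, "d2": d2}
print([n for n, w in docs.items() if or_match(w, q)])   # ['d1', 'd2']
print([n for n, w in docs.items() if and_match(w, q)])  # []
```

Sets are used rather than the raw vectors because the models are stated in terms of intersections, which Python's `&` operator expresses directly.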
Replacing the 1s and 0s in Figure 6 by term weights, such as idf or tf, gives the representation used by the
vector space and probabilistic models described previously. The distance between the query and
document worlds is then given by the similarity or probability functions described before. Thus the logical
model can be used to encapsulate the three retrieval models outlined previously (see [Hui96]).
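To illustrate the weighted reading, the sketch below replaces each 1 in Figure 6 with a term weight and uses cosine similarity as the distance function. The idf values are invented purely for illustration, and cosine is only one of the vector space similarity functions that could be swapped in.

```python
import math

# Hypothetical idf weights for t1..t5; any tf or idf scheme could be used instead.
IDF = [0.2, 0.5, 1.1, 1.4, 2.0]


def weight(vec):
    """Replace each 1 in a Figure 6 vector with the corresponding term weight."""
    return [bit * w for bit, w in zip(vec, IDF)]


def cosine(a, b):
    """Cosine similarity between two weighted vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


d1 = weight([1, 1, 0, 0, 0])
d2 = weight([1, 1, 1, 0, 0])
q = weight([1, 0, 0, 0, 1])

# Both documents share only t1 with q, but d2 carries extra weight on t3,
# so its longer vector gives it the lower cosine score.
print(round(cosine(q, d1), 3), round(cosine(q, d2), 3))
```

Unlike the Boolean version, both documents are now retrieved with graded scores, and d1 ranks above d2.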
As in the example above, the principle of transforming documents and queries can be used to
incorporate semantic information into the retrieval process. For example, consider a query t3 and the
information (from a thesaurus or dictionary) that t2 is a synonym of t3. We can then assert that d1 and d2
should both be retrieved, but that d2 should be retrieved first, as it needs fewer transformations than d1
to become relevant: d2 contains t3 directly, whereas d1 must first have t2 transformed into t3. We can
also use representations based on different transformation principles, definitions of similarity, or
definitions of possible worlds to give different retrieval models. [LaBr98] give a more detailed
introduction to logical modelling of IR.
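A minimal sketch of ranking by number of transformations, assuming a query containing only t3, a hypothetical thesaurus in which t2 and t3 are synonyms, and a cost of one per synonym substitution. Real logical models allow richer transformations than this single-step lookup.

```python
# Hypothetical thesaurus: each term maps to the set of its synonyms.
SYNONYMS = {"t2": {"t3"}, "t3": {"t2"}}


def transformations(doc, query):
    """Count the synonym substitutions needed for doc to cover every query term.

    Returns None if some query term is neither in doc nor reachable
    through a single synonym substitution.
    """
    cost = 0
    for term in query:
        if term in doc:
            continue  # no transformation needed for this term
        if SYNONYMS.get(term, set()) & doc:
            cost += 1  # one substitution turns a synonym into the query term
        else:
            return None
    return cost


docs = {"d1": {"t1", "t2"}, "d2": {"t1", "t2", "t3"}}
query = {"t3"}

scored = {name: transformations(terms, query) for name, terms in docs.items()}
ranking = sorted((n for n, c in scored.items() if c is not None), key=lambda n: scored[n])
print(ranking)  # ['d2', 'd1']
```

Documents that cannot be made relevant are dropped, and the rest are ordered by how few transformations they required.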
These models have the potential to be very powerful models in IR, as they attempt to model the
semantics of information and can incorporate retrieval tools such as thesauri within a single framework.
In addition, they allow multiple relations to hold and can be used to specify which relations cause
relevance (see [VR86]). The formal nature of logical models also allows formal comparisons between
IR systems, e.g. [Hui96]. Crestani et al. [CLVR98] give an overview of current models and approaches
in logic-based information retrieval.
RF has, so far, not been a major concern of existing logical models, but it is possible to imagine several
approaches to the problem. We describe these using the concept in Figure 7, adapted from an example
given in [Seb94], which describes the class of papers that appeared in the proceedings of SIGIR93,
whose authors are members of the institution IEI CNR, and which deal with logic.
(and paper
     (func appears_in (sing SIGIR93))
     (all author (func affiliation (sing IEI_CNR)))
     (c-some deals_with logic))
Figure 7: Terminological representation of a concept
The operators and, func, sing, all and c-some are features of the representation language.
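As a rough illustration of what the Figure 7 concept picks out, the sketch below encodes papers as plain records and tests membership with set-style checks. It is not the semantics of the terminological logic used in [Seb94]: the papers, the affiliation "Other_Univ", and the topic taxonomy (modal_logic and fuzzy_logic below logic) are all hypothetical. Passing modal_logic as the topic mimics the content-modification refinement discussed in item i.

```python
from dataclasses import dataclass, field

# Hypothetical topic taxonomy: child topic -> parent topic (None marks a root).
PARENT = {"logic": None, "modal_logic": "logic", "fuzzy_logic": "logic", "databases": None}


def is_a(topic, concept):
    """True if `topic` is `concept` or lies below it in PARENT."""
    while topic is not None:
        if topic == concept:
            return True
        topic = PARENT.get(topic)
    return False


@dataclass
class Paper:
    title: str
    appears_in: str                  # single filler of the functional role appears_in
    author_affiliations: list        # one affiliation per author
    deals_with: list = field(default_factory=list)


def matches(paper, topic="logic"):
    """Rough reading of the Figure 7 concept, with `topic` as the c-some filler."""
    return (
        paper.appears_in == "SIGIR93"                                # (func appears_in (sing SIGIR93))
        and all(a == "IEI_CNR" for a in paper.author_affiliations)  # (all author (func affiliation ...))
        and any(is_a(t, topic) for t in paper.deals_with)           # (c-some deals_with topic)
    )


papers = [
    Paper("A", "SIGIR93", ["IEI_CNR"], ["modal_logic"]),
    Paper("B", "SIGIR93", ["IEI_CNR", "IEI_CNR"], ["fuzzy_logic"]),
    Paper("C", "SIGIR93", ["IEI_CNR", "Other_Univ"], ["modal_logic"]),
    Paper("D", "SIGIR94", ["IEI_CNR"], ["logic"]),
]

print([p.title for p in papers if matches(p, "logic")])        # ['A', 'B']
print([p.title for p in papers if matches(p, "modal_logic")])  # ['A']
```

Narrowing the c-some filler from logic to modal_logic shrinks the answer set, because a paper on fuzzy_logic still falls under logic but not under modal_logic.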
i. content modification. This approach is the most similar to that taken by the statistical RF models
described previously. Here, the content of the query is modified, e.g. by adding or deleting terms, or
perhaps by altering connectives. For example, we could refine the query in Figure 7 to retrieve only
those papers that deal with modal_logic. This would retrieve only papers that specifically mention
modal_logic (Figure 8), rather than those dealing with the more general concept logic.
¹² A 1 signifies that the proposition "term t indexes the document" is true; a 0 signifies that the
proposition is false.