[Diagram: hyper index with nodes labelled "theft of bicycles in the netherlands", "theft in the netherlands", "theft of bicycles", "bicycles", "theft" and "netherlands"]
Figure 14: Hyper index
Each node in the hyper index corresponds to a potential query and is associated with a set of
documents. The user can browse the hyper index to select a query formulation for a search, and can
move between documents and index descriptions at will. The nodes correspond to document
descriptors: the more descriptors of a document that have been visited by a user, the more likely a
document is to be relevant to the user. The more important a document descriptor is, the more it
counts towards document retrieval and document ranking. The concept of descriptor importance is
analogous to term weighting in the traditional document retrieval models presented in section 2.
Relevance information is used to alter the importance of the document descriptors. In particular, recency
information is used to increase the importance of recently visited descriptors and to lower the importance
of descriptors visited earlier in the search.
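The recency idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exponential decay, the `decay` parameter, the function names and the additive scoring rule are all hypothetical choices, not the weighting scheme of the system described.

```python
def descriptor_importance(visits, decay=0.5):
    """Map each visited descriptor to an importance in (0, 1].

    visits: descriptors in the order the user visited them (oldest first).
    decay:  assumed factor by which importance shrinks per later visit.
    """
    importance = {}
    last = len(visits) - 1
    for position, descriptor in enumerate(visits):
        # The most recent visit gets weight 1.0; each step back multiplies by decay.
        # A revisited descriptor keeps the weight of its most recent visit.
        importance[descriptor] = decay ** (last - position)
    return importance


def rank_documents(doc_descriptors, importance):
    """Rank documents by the summed importance of their visited descriptors."""
    scores = {
        doc: sum(importance.get(d, 0.0) for d in descriptors)
        for doc, descriptors in doc_descriptors.items()
    }
    # Highest score first; ties broken by document identifier.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical browsing session and document descriptions.
visits = ["theft", "bicycles", "theft of bicycles"]
docs = {
    "d1": {"theft"},
    "d2": {"bicycles", "theft of bicycles"},
    "d3": {"netherlands"},
}
importance = descriptor_importance(visits)
ranking = rank_documents(docs, importance)
```

In this sketch a document described only by an early-visited descriptor (`d1`) ranks below one described by recently visited descriptors (`d2`), and a document with no visited descriptors (`d3`) ranks last.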
Dynamic information needs also present a new problem for evaluation. If we assume a changing
information need we can no longer rely on existing test collection methods as they also rely on the
notion of a fixed information need. The assessment of recall in an interactive situation is especially
problematic, as the desired set of relevant documents (see footnote 22) will change from one search iteration to
another. One further problem of RF evaluation in this context is what to measure: the quality of the
feedback (how well does the system improve the user's query) or the quality of the adaptation to the
information need (how well does the algorithm track how the query is changing)? These are not
necessarily the same thing: an RF algorithm could be good at describing the known relevant
documents but poor at detecting how the user's relevance assessments are changing.
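To make the recall problem concrete, the following Python sketch computes recall separately at each iteration against whatever set of documents is relevant at that point. The data, the function names and the convention of returning 0.0 for an empty relevant set are assumptions made for illustration only.

```python
def recall(retrieved, relevant):
    """Fraction of the currently relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention when nothing is currently relevant
    return len(set(retrieved) & set(relevant)) / len(relevant)


def recall_per_iteration(iterations):
    """iterations: (retrieved, relevant_now) pairs, one per search iteration."""
    return [recall(retrieved, relevant_now) for retrieved, relevant_now in iterations]


# Hypothetical session: the retrieved set is identical in both iterations,
# but the set the user would regard as relevant has shifted.
session = [
    ({"d1", "d2"}, {"d1", "d3"}),
    ({"d1", "d2"}, {"d2"}),
]
recalls = recall_per_iteration(session)
```

The same retrieved set scores differently at each iteration, so a single recall figure measured against one fixed relevant set would not describe the session.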
3.3 Negative RF
The majority of RF techniques are based on capturing the difference between the content of the relevant
documents, those documents that the user has marked as containing relevant information, and the
content of the non relevant documents. The label "non relevant", however, is often used to refer to two
groups of documents:
i. those that have been explicitly marked non relevant by the user. In small test collections we can
assume that the documents that have not been explicitly marked relevant by an assessor have
nevertheless been assessed and judged non relevant. In larger collections, such as those provided by the
TREC initiative [Har93], a small set of documents is explicitly marked non relevant, meaning they have
been assessed as not containing relevant information.
ii. those that have not been assessed by the user. These documents may not have been retrieved, or the
user may not have assessed them, or the user may have implicitly rejected them without providing an
explicit relevance assessment. It is common to assume, for any query, that these documents will
comprise the majority of the documents in the collection. The probabilistic model and vector space
model both make use of this assumption, in that they do not differentiate between assessed non relevant
documents and unassessed documents. However, some of these documents, if they had been viewed by
a user, might have been assessed as relevant.
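The effect of treating the two groups alike can be illustrated with a Rocchio-style query update, a standard vector space formulation used here purely as an example. The parameter values (`alpha`, `beta`, `gamma`), the function names and the toy term vectors are assumptions, not taken from any of the systems discussed.

```python
def centroid(vectors):
    """Mean of sparse term-weight vectors given as dicts; empty input gives {}."""
    if not vectors:
        return {}
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query towards the relevant centroid and away from the non relevant one."""
    pos = centroid(relevant)
    neg = centroid(non_relevant)
    terms = set(query) | set(pos) | set(neg)
    return {
        t: alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        for t in terms
    }


# Hypothetical term vectors.
query = {"theft": 1.0}
relevant = [{"theft": 1.0, "bicycles": 1.0}]
explicit_non_relevant = [{"theft": 1.0, "cars": 1.0}]        # group i
unassessed = [{"bicycles": 1.0, "netherlands": 1.0}]         # group ii

# Negative evidence from explicitly rejected documents only.
strict = rocchio(query, relevant, explicit_non_relevant)
# Negative evidence from both groups, with no distinction between them.
pooled = rocchio(query, relevant, explicit_non_relevant + unassessed)
```

In this toy case the pooled variant lowers the weight of "bicycles" from 0.75 to 0.625 only because an unassessed document contains it, even though that document might have been judged relevant had the user seen it.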
22 That is, the set of documents that the user would regard as being relevant if shown them at the current iteration,
not the set of relevant documents used for feedback.