reasons. First, as indicated by Belkin et al.'s experiments, the effects of negative feedback are not clear
to users. In a positive feedback situation, it is easier to see what kind of documents are being retrieved,
and infer the change(s) that have been made by the system. The potential harm that a negative
assessment may do to a search is not apparent because the user cannot see what documents have been
suppressed by the feedback action.
Second, it may be the case that assessing non relevance is a harder task than assessing relevance. That
is, in practice, relevance and non relevance are not opposite assessments. A user making a positive
relevance assessment can often give detailed reasons for why a document is relevant, e.g. [BS98], but
the reasons for non relevance are likely to be based on what is lacking from the document, rather than
what is present. The assessment of non relevance, therefore, often requires reasoning about what is not
contained within the document. An alternative to negative assessments, in this case, may be to use
partial relevance assessments, e.g. [BI99]; rather than asking users to make binary, relevant or non
relevant, assessments on a document, the system allows the user to make a scalar or non binary
assessment of the document's relevance. We shall return to this point in section 7.3.
iii. Usability. The mechanisms for making relevance assessments are important. We shall discuss
this in more detail in section 7.3.3 but a general point is that, even though RF techniques can improve a
search, it is not always the case that users will make relevance assessments. Partly this may be due to a
lack of awareness, on the part of the user, as to the function of RF; it may also derive from a fear of
having an unknown effect on the search. The ease of making assessments can affect how likely users
are to make them. It may be the case, for example, that the more complex the relevance assessment
is as a task, the less likely users are to make further assessments. Similarly, if
the process of making relevance assessments (operating the system) detracts from gaining relevant
information (the task of using the system) then, again, the users may be less willing to explicitly assess
documents. Asking users to spend time marking documents that are not relevant to their search may be
difficult to achieve practically.
Dunlop [Dun97] presents a more specific argument against negative RF: namely that negative feedback,
as implemented in the major models, is not only inconsistent across models but is often not performing
the correct task. His paper is based on an intuitive view of what positive and negative RF should do:
positive RF on a document at the top of the document ranking, or negative RF on a document at the
bottom of a ranking, should have little effect on the query, as both confirm the retrieval decision. In contrast,
positive feedback on a document at the bottom of the ranking, or negative feedback on a document at
the top of a ranking should have a greater effect on the query, as these feedback cases contradict the
retrieval decision made by the system.
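This intuitive view can be made concrete with a small sketch. The function name `expected_effect` and the linear scaling by rank are assumptions made purely for illustration; they are not Dunlop's formulation. The only property taken from the text is the direction of the effect.

```python
def expected_effect(rank: int, n_ranked: int, positive: bool) -> float:
    """Return a hypothetical effect size in [0, 1] for feedback on one document.

    rank is 1-based (1 = top of the ranking). The linear scale is an
    assumption made for illustration only.
    """
    if not 1 <= rank <= n_ranked:
        raise ValueError("rank must lie between 1 and n_ranked")
    if n_ranked == 1:
        # A single document is at the top: positive confirms, negative contradicts.
        return 0.0 if positive else 1.0
    # 0.0 at the top of the ranking, 1.0 at the bottom.
    depth = (rank - 1) / (n_ranked - 1)
    # Positive feedback contradicts the system most at the bottom;
    # negative feedback contradicts it most at the top.
    return depth if positive else 1.0 - depth
```

Under these assumptions, positive feedback on the top-ranked document and negative feedback on the bottom-ranked document both yield an effect of zero, while the two contradicting cases yield the maximum effect.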
Dunlop compared this intuitive view against three models: vector space (using a modified Rocchio
formula), probabilistic (F4) and a query expansion technique (one in which negative RF reduces term
weights). The data he used was not identical in all cases, so the results are not strictly comparable;
however, the general trends are important. He found that, in general, all models behave as expected for
positive RF: the effect on the query is inversely proportional to how well the document matches the
query. However, for negative RF the systems differ. For the vector space implementation, the effect of
negative RF on a poorly matching document is greater than on a highly matching document, although
certain scaling factors can reduce this problem somewhat. The normalisation by document length in the
vector space model also means that the effect of negative RF is not reversible in this model.
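Dunlop's modified Rocchio formula is not reproduced in this section, so the following is a minimal sketch of the textbook Rocchio update instead, written to show where document length normalisation and the negative term enter. The names `normalise` and `rocchio`, the dictionary representation of vectors, and the default values of `alpha`, `beta` and `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def normalise(vec):
    """Scale a term -> weight dict to unit Euclidean length."""
    length = sqrt(sum(w * w for w in vec.values()))
    return {t: w / length for t, w in vec.items()} if length else dict(vec)


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Textbook Rocchio update over term -> weight dicts (illustrative only)."""
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    # Positive feedback: move the query towards the mean relevant document.
    for doc in relevant:
        for term, w in normalise(doc).items():
            new_query[term] += beta * w / len(relevant)
    # Negative feedback: move the query away from the mean non-relevant document.
    for doc in non_relevant:
        for term, w in normalise(doc).items():
            new_query[term] -= gamma * w / len(non_relevant)
    return dict(new_query)
```

Note that in this sketch the amount subtracted for a term depends on the normalised document weight, not on how well the document matched the query, and that terms absent from the original query can end up with negative weights.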
The probabilistic model also does not behave intuitively for negative feedback, primarily because the
F4 scheme does not differentiate between documents that have been explicitly marked non relevant and
those that have simply not been marked relevant.
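The point can be seen from the form of the F4 weight itself. The sketch below uses the F4 formula as it is usually stated, with the common 0.5 correction; the function name `f4_weight` and its argument names are illustrative assumptions rather than anything taken from Dunlop's paper.

```python
from math import log


def f4_weight(r: int, R: int, n: int, N: int) -> float:
    """F4 term weight with the usual 0.5 correction.

    r: documents marked relevant that contain the term
    R: documents marked relevant
    n: documents in the collection that contain the term
    N: documents in the collection
    """
    # Every document outside the R relevant ones is treated as non-relevant,
    # so there is no argument for "explicitly marked non-relevant".
    rel_odds = (r + 0.5) / (R - r + 0.5)
    non_rel_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return log(rel_odds / non_rel_odds)
```

Marking a document as non-relevant changes none of the four arguments, so in this form the weight is unaffected by an explicit negative assessment.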
The third model, query expansion, will behave in line with Dunlop's intuitive view if the system
allows negative weights to be attached to terms; most systems instead remove a term if its
weight falls to zero or becomes negative.
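The difference between the two policies can be sketched as follows. This is a hypothetical illustration: the function `apply_negative_feedback`, the fixed `penalty`, and the restriction to terms already in the query are assumptions, not a description of any particular system.

```python
def apply_negative_feedback(weights, doc_terms, penalty=1.0, allow_negative=False):
    """Reduce the weight of each query term that occurs in a rejected document.

    weights: term -> weight dict for the current query
    doc_terms: set of terms in the document marked non-relevant
    """
    updated = {}
    for term, w in weights.items():
        new_w = w - penalty if term in doc_terms else w
        if new_w <= 0 and not allow_negative:
            # Common policy: the term is removed from the query altogether.
            continue
        updated[term] = new_w
    return updated
```

With `allow_negative=False` a penalised term simply disappears from the query, whereas with `allow_negative=True` it is retained and can count against documents that contain it.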
Dunlop's investigation demonstrates the difficulty of incorporating negative assessments into RF. The
further difficulty of incorrect contexts, identified by Belkin et al., remains a problem for both positive
and negative feedback. It may be the case that the keyword-based algorithms that we have examined so far
require more complex mechanisms to make a fine-grained analysis of keyword contexts for feedback.