Information retrieval is a wide, often loosely-defined term but in these pages I shall be concerned only with automatic information retrieval systems. Automatic as opposed to manual and information as opposed to data or fact. Unfortunately the word information can be very misleading. In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon and Weaver). In fact, in many cases one can adequately describe the kind of retrieval by simply substituting 'document' for 'information'. Nevertheless, 'information retrieval' has become accepted as a description of the kind of work published by Cleverdon, Salton, Sparck Jones, Lancaster and others. A perfectly straightforward definition along these lines is given by Lancaster: 'Information retrieval is the term conventionally, though somewhat inaccurately, applied to the type of activity discussed in this volume. An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.' This specifically excludes Question-Answering systems as typified by Winograd and those described by Minsky]. It also excludes data retrieval systems such as used by, say, the stock exchange for on-line quotations....
- Chapter 2: Automatic Text Analysis - contains a straightforward discussion of how the text of a document is represented inside a computer. This is a superficial chapter but I think it is adequate in the context of this book.
- Chapter 3: Automatic Classification - looks at automatic classification methods in general and then takes a deeper look at the use of these methods in information retrieval.
- Chapter 4: File Structures - here we try and discuss file structures from the point of view of someone primarily interested in information retrieval.
- Chapter 5: Search Strategies - gives an account of some search strategies when applied to document collections structured in different ways. It also discusses the use of feedback.
- Chapter 6: Probabilistic Retrieval - describes a formal model for enhancing retrieval effectiveness by using sample information about the frequency of occurrence and co-occurrence of index terms in the relevant and non-relevant documents.
- Chapter 7: Evaluation - here I give a traditional view of the measurement of effectiveness followed by an explanation of some of the more promising attempts at improving the art. I also attempt to provide foundations for a theory of evaluation.
- Chapter 8: The Future - contains some speculation about the future of IR and tries to pinpoint some areas of research where further work is desperately needed.