Free Data Structure & Algorithms Ebooks & Reviews

Information Retrieval

By C. J. van Rijsbergen

Introduction
Information retrieval is a wide, often loosely defined term but in these pages I shall be concerned only with automatic information retrieval systems. Automatic as opposed to manual and information as opposed to data or fact. Unfortunately the word information can be very misleading. In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon and Weaver). In fact, in many cases one can adequately describe the kind of retrieval by simply substituting 'document' for 'information'. Nevertheless, 'information retrieval' has become accepted as a description of the kind of work published by Cleverdon, Salton, Sparck Jones, Lancaster and others. A perfectly straightforward definition along these lines is given by Lancaster: 'Information retrieval is the term conventionally, though somewhat inaccurately, applied to the type of activity discussed in this volume. An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.' This specifically excludes question-answering systems as typified by Winograd and those described by Minsky. It also excludes data retrieval systems such as those used by, say, the stock exchange for on-line quotations....
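To make Lancaster's distinction concrete, here is a minimal sketch (not taken from the book) of retrieval in that narrow sense: a toy inverted index that answers a query only with the IDs of matching documents, never with an answer to the question itself. The helper names `build_index` and `search`, whitespace tokenization and Boolean AND matching are all simplifying assumptions for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return sorted IDs of documents containing every query word (Boolean AND).

    The result only reports which documents exist and where they are;
    it does not change the user's knowledge of the subject itself.
    """
    words = query.lower().split()
    if not words:
        return []
    # dict.get does not insert missing keys, even on a defaultdict
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


docs = {
    "d1": "shannon theory of communication",
    "d2": "automatic classification of documents",
    "d3": "automatic retrieval of documents",
}
index = build_index(docs)
print(search(index, "automatic documents"))  # ['d2', 'd3']
print(search(index, "stock quotations"))     # []
```

An empty result is itself useful output here: it tells the user that no relevant document exists in the collection.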

The structure of the book

The introduction presents some basic background material, demarcates the subject and discusses loosely some of the problems in IR. The chapters that follow cover topics in the order in which I would think about them were I about to design an experimental IR system. They begin by describing the generation of machine representations for the information, and then move on to an explanation of the logical structures that may be arrived at by clustering. There are numerous methods for representing these structures in the computer, or in other words, there is a choice of file structures to represent the logical structure, so these are outlined next. Once the information has been stored in this way we are able to search it, hence a discussion of search strategies follows. The chapter on probabilistic retrieval is an attempt to create a formal model for certain kinds of search strategies. Lastly, in an experimental situation all of the above will have been futile unless the results of retrieval can be evaluated. Therefore a large chapter is devoted to ways of measuring the effectiveness of retrieval. In the final chapter I have indulged in a little speculation about the possibilities for IR in the next decade.

The two major chapters are those dealing with automatic classification and evaluation. I have tried to write them in such a way that each can be read independently of the rest of the book (although I do not recommend this for the non-specialist).

Outline

Chapter 2: Automatic Text Analysis - contains a straightforward discussion of how the text of a document is represented inside a computer. This is a superficial chapter but I think it is adequate in the context of this book.
Chapter 3: Automatic Classification - looks at automatic classification methods in general and then takes a deeper look at the use of these methods in information retrieval.
Chapter 4: File Structures - here we try to discuss file structures from the point of view of someone primarily interested in information retrieval.
Chapter 5: Search Strategies - gives an account of some search strategies when applied to document collections structured in different ways. It also discusses the use of feedback.
Chapter 6: Probabilistic Retrieval - describes a formal model for enhancing retrieval effectiveness by using sample information about the frequency of occurrence and co-occurrence of index terms in the relevant and non-relevant documents.
Chapter 7: Evaluation - here I give a traditional view of the measurement of effectiveness followed by an explanation of some of the more promising attempts at improving the art. I also attempt to provide foundations for a theory of evaluation.
Chapter 8: The Future - contains some speculation about the future of IR and tries to pinpoint some areas of research where further work is desperately needed.
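The "traditional view of the measurement of effectiveness" mentioned for Chapter 7 is built on precision and recall. The sketch below is a minimal illustration for a single query, assuming relevance judgments are given as a set of document IDs; the function name `precision_recall` and the sample judgments are hypothetical, not data from the book.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty denominators are treated as 0.0 (an assumption; conventions vary).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: the system returned 4 documents, 3 of which are
# among the 6 documents judged relevant to the query.
p, r = precision_recall(["d1", "d2", "d3", "d7"], ["d1", "d2", "d3", "d4", "d5", "d6"])
print(p, r)  # 0.75 0.5
```

The two measures pull in opposite directions: retrieving more documents can only keep or raise recall, but usually lowers precision.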

Computational Modeling and Complexity Science

By Allen Downey

This book is about data structures and algorithms, intermediate programming in Python, complexity science and the philosophy of science:

Data structures and algorithms: A data structure is a collection that contains data elements organized in a way that supports particular operations. For example, a dictionary organizes key-value pairs in a way that provides fast mapping from keys to values, but mapping from values to keys is generally slower. An algorithm is a mechanical process for performing a computation. Designing efficient programs often involves the co-evolution of data structures and the algorithms that use them. For example, the first few chapters are about graphs, a data structure (nested dictionaries) that is a good implementation of a graph, and several graph algorithms that use this data structure.
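Both points above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book: `add_edge`, `bfs_distances` and `reverse_lookup` are hypothetical helpers, and breadth-first search is simply one example of a graph algorithm that runs on a nested-dictionary graph.

```python
from collections import deque


# A graph as nested dictionaries: graph[v][w] holds the label of the edge
# between v and w. Storing each undirected edge in both directions makes
# "is there an edge from v to w?" a single dictionary lookup.
def add_edge(graph, v, w, label=None):
    graph.setdefault(v, {})[w] = label
    graph.setdefault(w, {})[v] = label


def bfs_distances(graph, start):
    """Breadth-first search: number of edges from start to each reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:  # iterating a dict yields its keys (the neighbors)
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def reverse_lookup(d, value):
    """Value -> key needs a linear scan; key -> value is one hash lookup."""
    for key, val in d.items():
        if val == value:
            return key
    raise LookupError(value)


g = {}
for v, w in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]:
    add_edge(g, v, w)
print(bfs_distances(g, "a"))  # {'a': 0, 'b': 1, 'e': 1, 'c': 2, 'd': 3}

phone = {"alice": "555-0101", "bob": "555-0199"}
print(phone["bob"])                       # fast: key -> value
print(reverse_lookup(phone, "555-0199"))  # slower: value -> key
```

Using dictionaries for both levels keeps the sketch close to the "nested dictionaries" description; an adjacency list of plain lists would also work but makes edge-existence checks linear.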

Python programming: This book picks up where Think Python leaves off. I assume that you have read that book or have equivalent knowledge of Python. As always, I will try to emphasize fundamental ideas that apply to programming in many languages, but along the way you will learn some useful features that are specific to Python.

Computational modeling: A model is a simplified description of a system that is useful for simulation or analysis. Computational models are designed to take advantage of cheap, fast computation.

Philosophy of science: The models and results I will present raise a number of questions relevant to the philosophy of science, including the nature of scientific laws, theory choice, realism and instrumentalism, holism and reductionism, and Bayesian epistemology.

There are two kinds of computational models:
Continuous: Many computational models compute discrete approximations of equations that are continuous in space and time. For example, to compute the trajectory of a planet, you could describe planetary motion using differential equations and then compute a numerical approximation of the position of the planet at discrete points in time. The fields of numerical methods and scientific computing tend to focus on these kinds of models.

Discrete: Discrete models include graphs, cellular automata, and agent-based models. They are often characterized by structure, rules and transitions rather than by equations. They tend to be more abstract than continuous models; in some cases there is no direct correspondence between the model and a physical system. Complexity science is an interdisciplinary field, at the intersection of mathematics, computer science and physics, that focuses on these kinds of models.

And that’s what this book is about.
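The two kinds of computational models described above can be contrasted in a short sketch. The choices here are assumptions for illustration: the continuous model is a planet on a circular orbit (units chosen so that GM = 1) stepped with the semi-implicit Euler method, and the discrete model is an elementary cellular automaton with wrap-around edges. Neither function (`simulate_orbit`, `step_ca`) is taken from the book.

```python
import math


def simulate_orbit(steps, dt):
    """Continuous model, discretized in time.

    Approximates d^2r/dt^2 = -GM * r / |r|^3 (with GM = 1) at discrete
    time steps using semi-implicit Euler: update velocity, then position.
    Starts on a circular orbit of radius 1 and returns the final (x, y).
    """
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0
    for _ in range(steps):
        r3 = (x * x + y * y) ** 1.5
        vx -= x / r3 * dt
        vy -= y / r3 * dt
        x += vx * dt
        y += vy * dt
    return x, y


def step_ca(cells, rule):
    """Discrete model: one update of an elementary cellular automaton.

    Each cell's next state is the bit of `rule` indexed by its
    (left, self, right) neighborhood read as a 3-bit number; edges wrap.
    """
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


# One full period of the orbit in 1000 steps; the radius should stay near 1.
x, y = simulate_orbit(1000, 2 * math.pi / 1000)
print(math.hypot(x, y))

# Rule 90 grown from a single live cell.
row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print("".join(".#"[c] for c in row))
    row = step_ca(row, 90)
```

The contrast is the point: the orbit code approximates an equation that is continuous in time, while the automaton has no underlying equation at all, only a rule applied to discrete cells.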

Global Optimization Algorithms – Theory and Application

By Thomas Weise

This e-book is devoted to global optimization algorithms, which are methods for finding optimal solutions to given problems. It especially focuses on Evolutionary Computation by discussing evolutionary algorithms, genetic algorithms, Genetic Programming, Learning Classifier Systems, Evolution Strategy, Differential Evolution, Particle Swarm Optimization, and Ant Colony Optimization. It also elaborates on other metaheuristics like Simulated Annealing, Extremal Optimization, Tabu Search, and Random Optimization. This is not a book in the conventional sense: because of frequent updates and changes, it is not really intended for sequential reading but rather as a material collection, encyclopedia, or reference work where you can look things up, find the correct context, and get the fundamentals. The book addresses two major audience groups:

It can help students, since we try to describe the algorithms in an understandable, consistent way and, perhaps even more importantly, include much of the background knowledge needed to understand them. Thus, you can find summaries of stochastic theory and theoretical computer science in Part IV on page 455. Additionally, application examples are provided which give an idea of how problems can be tackled with the different techniques and what results can be expected.
Fellow researchers and PhD students may find the application examples helpful too. For them, the book includes in-depth discussions of the individual methodologies, supported by a large set of useful literature references.

If this book contains something you want to cite or reference in your work, please use the citation suggestion provided in Chapter D on page 591.

Topics

Evolutionary Algorithms
Genetic Algorithms
Genetic Programming
Learning Classifier Systems
Hill Climbing
Simulated Annealing
Example Applications
Sigoa – Implementation in Java
Background (Mathematics, Computer Science)
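To give a flavor of two of the listed topics, here is a minimal sketch of simulated annealing, which turns into plain hill climbing as its temperature approaches zero. It is a generic textbook variant with a geometric cooling schedule, written in Python rather than Java; the function name, parameters, defaults, and the test function are assumptions for illustration and are not taken from Weise's book or the Sigoa implementation.

```python
import math
import random


def simulated_annealing(cost, x0, neighbor, t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Minimize `cost` starting from x0; return the best (x, cost) seen.

    A worse candidate is accepted with probability exp(-delta / T). While T
    is high the search can climb out of local minima; as T shrinks it
    accepts almost only improvements, i.e. it behaves like hill climbing.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x, rng)
        fy = cost(y)
        delta = fy - fx
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t = max(t * cooling, 1e-12)  # geometric cooling, kept above zero


    return best, fbest


def f(x):
    # Hypothetical test function with a local minimum near x = 3.8
    # and the global minimum near x = -1.3.
    return x * x + 10 * math.sin(x)


def step(x, rng):
    # Hypothetical neighborhood: a uniform random move of at most 1.
    return x + rng.uniform(-1.0, 1.0)


best, fbest = simulated_annealing(f, 5.0, step)
print(best, fbest)
```

Starting at x = 5, a pure hill climber would tend to settle in the nearby local minimum; the temperature-controlled acceptance of uphill moves is what gives annealing a chance to reach the global one.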
