Galago

Galago is a toolkit for experimenting with text search. It is based on small, pluggable components that are easy to replace and change, both during indexing and during retrieval.

The Galago Search Engine was originally released as part of the book "Search Engines: Information Retrieval in Practice". To find the book version of Galago, please see http://www.search-engines-book.com/.

The Lemur project extends the Galago Search Engine for research purposes. As such, many components have been significantly modified to support further extensions and experimentation. Please note that the wiki pages and support on this website do not apply to the book version of Galago.

Features

Galago includes TupleFlow, a distributed computation framework like MapReduce or Dryad. TupleFlow manages the difficult parts of processing text: serializing data, sorting it, and distributing the processing. The IndexReader and IndexWriter classes manage the storage of key/value pairs such as inverted lists, which makes it possible to build your own kinds of index structures without starting from scratch. The retrieval system supports a variant of the Indri query language, redesigned to be more flexible. You can add your own query operators without recompiling the core libraries; just put your new operator on the classpath and reference it in a query.
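
To make the idea of "key/value pairs like inverted lists" concrete, here is a minimal, language-neutral sketch written in Python. It does not use Galago, TupleFlow, IndexReader, or IndexWriter; every name in it (tokenize, emit_postings, build_index, lookup) is hypothetical and chosen only for illustration. It assumes a whitespace tokenizer and an in-memory dictionary, and shows the general pattern of emitting tuples, sorting them, and grouping them into one value per term key.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()


def emit_postings(docs):
    """Yield one (term, doc_id, position) tuple per token occurrence."""
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            yield (term, doc_id, position)


def build_index(docs):
    """Sort the tuples, then group them into term -> {doc_id: [positions]}.

    Sorting first means all tuples for one term arrive together, so each
    inverted list can be built in a single pass over the sorted stream.
    """
    index = {}
    for term, doc_id, position in sorted(emit_postings(docs)):
        index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return index


def lookup(index, term):
    """Return the inverted list for a term, or an empty dict if absent."""
    return index.get(term.lower(), {})


# Illustrative data only, not taken from any Galago collection.
docs = {
    1: "galago is a search toolkit",
    2: "a toolkit for text search",
}
index = build_index(docs)
print(lookup(index, "search"))
```

In this sketch the term is the key and the per-document position lists are the value. A real index would serialize both to disk and distribute the sort across machines, which are the parts the paragraph above attributes to TupleFlow and the index classes.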

Download

Galago can be obtained from the SourceForge Lemur Project Page.

Release History

The first binary version (3.14159) of Galago was released in December 2011. Subsequent releases are made twice per year, in June and December. The current Galago version is 3.9. Release notes for the current version can be found on SourceForge.

Related Links