Okapi-Pack
Centre For Interactive Systems Research
City University London EC1V 0BH
"Okapi-Pack" Introduction.
Okapi-Pack is a complete implementation of the Okapi system. It
is available from the Centre For Interactive Systems Research (CISR)
under the BSD license.
The distributed system requires around 100 Mbytes of disc space. There are two versions of
Okapi-Pack, one for Solaris and one for Linux. The Solaris version
is designed to run on a Sun SPARCstation with a minimum of
16 Mbytes of memory running Solaris 2.5/2.6.
The Linux version runs on Red Hat Linux 6.0/6.1.
The graphical user interfaces provided are written in a combination of
C/C++ and Tcl/Tk. All binaries were compiled with
gcc V2.7. The GUIs have been tested with Tcl-7.4/Tk-4.0
and Tcl-7.6/Tk-4.2.
The package comprises:
1. Indexing Software.
Software to enable users to create and index Okapi-type
databases. A graphical user interface, indexer, is included to
provide a basic introduction to the process. indexer allows the
creation and indexing of both text and abstracting and indexing (ai)
databases. Although the interface only handles databases that fit on
a single disc volume, the programs it calls can create and index
larger databases that span several volumes. These programs are
documented in Appendix E and Appendix F.
Two sample databases, each just over 1000 records in size, are
provided with the system:
- med.sample : a small text database generated from the
Medlars collection.
- cacm.sample : a small ai database generated from the CACM
collection.
Both sample databases were downloaded from Cornell University (ftp
to ftp.cs.cornell.edu
and change to directory pub/smart). We are trying to obtain a
current database closer in size to parts of the TREC
collection.
For text databases made up of larger records, positional information
can be generated for paragraphs so that passage searching is possible.
An Okapi search can then attempt to find, for each document in the
ranked hitlist, the "best" sub-passage within that document.
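How Okapi-Pack chooses the "best" sub-passage is not described here. As a minimal sketch, assuming each paragraph already carries a score (for example, the summed weights of the query terms it contains) and that a passage is a run of at most max_len consecutive paragraphs, a brute-force search over all such runs could look like this. The passage type and the best_passage function are hypothetical illustrations, not part of the BSS library.

```c
#include <stddef.h>

/* Hypothetical result type, for illustration only (not an Okapi API). */
typedef struct {
    int start;     /* zero-based index of the first paragraph in the passage */
    int length;    /* number of consecutive paragraphs in the passage */
    double score;  /* sum of the paragraph scores in the passage */
} passage;

/* Return the highest-scoring run of 1..max_len consecutive paragraphs.
 * para_score[i] is an assumed per-paragraph score. On ties the earliest,
 * shortest run found first is kept. With n_paras == 0 the result is
 * {0, 0, 0.0}. */
passage best_passage(const double *para_score, int n_paras, int max_len)
{
    passage best = {0, 0, 0.0};
    int found = 0;

    for (int start = 0; start < n_paras; start++) {
        double sum = 0.0;
        /* Grow the window one paragraph at a time from this start point. */
        for (int len = 1; len <= max_len && start + len <= n_paras; len++) {
            sum += para_score[start + len - 1];
            if (!found || sum > best.score) {
                best.start = start;
                best.length = len;
                best.score = sum;
                found = 1;
            }
        }
    }
    return best;
}
```

For paragraph scores {0.0, 1.5, 2.0, 0.0, 3.0, 0.5} and max_len = 3, this picks paragraphs 2 to 4 (zero-based) with a score of 5.0. Capping the passage length keeps the cost proportional to the number of paragraphs times max_len.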
2. The Basic Search System (BSS).
The BSS consists of a set of low-level commands, implemented as a C
library, that enables users to build their own interfaces based around
it. The BSS commands are documented in
Appendix J.
Corresponding to i0+ is an executable, i1+, which can be used as a
command-line interface, either for trying things out interactively or in shell scripts.
3. The Okapi Interactive Interface.
okapi is a
configurable interface that calls BSS commands. It allows users to
conduct relevance feedback searches of both text and ai databases. The
system allows users to:
- Build an initial query by entering single terms, phrases, or
both.
- Conduct a search on a given query formulation.
- View full documents and make relevance judgements.
- Incrementally expand the query as relevance judgements are made.
- Modify the current state of the query by adding/removing terms
and clearing relevance feedback information.
- Change some interface parameters interactively.
Last modified: 12th November 2001