Okapi-Pack

Centre For Interactive Systems Research
City University
London EC1V 0BH


Appendix M: "okapi" Log Files.

All user interaction and system responses are logged in detail by "okapi" for potential analysis by researchers. The information logged for each search is stored in three files:

  1. history
     A complete, chronological history of the search, including all user-entered and relevance feedback (RF) terms at each iteration.
  2. termset
     The complete set of user-entered and RF terms at the last iteration.
  3. relsfile
     The complete set of relevance judgements made by the searcher.

The last two files are used by the application while running; the information contained in these two files is also contained in the history file. For this reason, only the history file will be documented here.

1. The "okapi" history file.

During a search the system keeps a complete, chronological record of every command executed (whether issued explicitly by the user or implicitly by the system) and, where applicable, the results it returns. Data written to the history file is structured so that it can be loaded into an Oracle database after minimal conversion into suitable batch files. The relational data model for the transaction logs is described in detail in the Journal of Documentation, Okapi Special Edition.

The commands, and the results they return, are described in the following section.

2. Okapi Commands and Results.

The History File Entry Types table below lists the entries kept in the history file; the Type column distinguishes commands from results. Each command is described further in the relevant section of "APPENDIX G, The Graphical User Interface to the BSS".


History File Entry Types.
Name                 Type              Action
-------------------  ----------------  --------------------------------------------------
open_database        Implicit command  Called implicitly when "okapi" is run.
                                       The system is initialised.
define               Explicit command  Modifies the TERMSET.
                                       Modifies the WORKING QUERY.
query                Result            Records information about all terms in the
                                       current TERMSET.
search               Explicit command  Creates a new DOCUMENT SET.
                                       Generates a new HITLIST.
                                       Modifies the set of USER_RELS.
docset               Result            Records information about the current
                                       DOCUMENT SET.
hl_title             Result            Header information displayed about each
                                       HITLIST entry.
hl_info              Result            Approximately the first 200 characters
                                       (corresponding to a "title") from the start of
                                       the document. One, two or three entries are made
                                       per document, depending on the length of the
                                       "title".
hl_terms             Result            Query term occurrence information for each
                                       document.
show                 Explicit command  Forces user to make a relevance judgement.
                                       It calls expand if a positive judgement is made.
expand               Implicit command  Modifies the current TERMSET by merging it with
                                       the terms extracted from the appropriate section
                                       of the last document judged as relevant by the
                                       user.
                                       Modifies the WORKING QUERY (possibly).
remove               Explicit command  Modifies TERMSET.
                                       Modifies the WORKING QUERY.
restore              Explicit command  Modifies TERMSET.
                                       Modifies the WORKING QUERY (possibly).
clear_rf             Explicit command  Modifies TERMSET.
                                       Modifies the WORKING QUERY (possibly).
                                       Modifies the set of USER_RELS.
clear_working_query  Explicit command  Modifies TERMSET.
                                       Modifies the WORKING QUERY (possibly).
                                       Modifies the set of USER_RELS.
adjust_rf            Explicit command  Modifies the WORKING QUERY.
quit                 Explicit command  Closes the database.
                                       Closes all open files.
                                       Deletes all temporary files.
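For analysis it is often useful to separate commands from results. The following is a minimal Python sketch, assuming each line follows the colon-delimited layout described in section 3 (so the entry name is the fourth field). The names ENTRY_TYPES, entry_type and count_types are hypothetical helpers, not part of "okapi".

```python
from collections import Counter

# Hypothetical lookup built from the History File Entry Types table:
# entry name -> Type column value.
ENTRY_TYPES = {
    "open_database": "Implicit command",
    "define": "Explicit command",
    "query": "Result",
    "search": "Explicit command",
    "docset": "Result",
    "hl_title": "Result",
    "hl_info": "Result",
    "hl_terms": "Result",
    "show": "Explicit command",
    "expand": "Implicit command",
    "remove": "Explicit command",
    "restore": "Explicit command",
    "clear_rf": "Explicit command",
    "clear_working_query": "Explicit command",
    "adjust_rf": "Explicit command",
    "quit": "Explicit command",
}


def entry_type(line: str) -> str:
    """Return the Type of one history-file line, or "Unknown"."""
    # The name is the fourth colon-delimited field; maxsplit=4 keeps any
    # later free text (which may itself contain colons) in one piece.
    parts = line.split(":", 4)
    if len(parts) < 4:
        return "Unknown"
    return ENTRY_TYPES.get(parts[3], "Unknown")


def count_types(lines) -> Counter:
    """Count how many entries of each Type appear in a history file."""
    return Counter(entry_type(line) for line in lines)
```

For example, entry_type("0:213:0:open_database:cacm.sample:OK") returns "Implicit command".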


3. The Structure of History File Entries.


Entries are made to the history file in chronological order so that a complete processing history is kept. Successive fields in each history file entry are delimited by colons (:). The first three fields of all entries in the history file are always:

    <command_no>:<topic_no>:<elapsed_time>

Where:

command_no A sequential number allocated to each command issued. Results generated by a given command have the same command_no as the command entry. For example, a "search" command generates a new document set and a new hitlist, so the "docset", "hl_title", "hl_info" and "hl_terms" entries have the same command_no as the "search" entry.

"open_database" will always be command_no 0.

topic_no A number assigned to the search.
elapsed_time The time in seconds from the beginning of the search at which the entry was written to the history file. This will be the time the command was issued or the time at which the results were generated.
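Because a command and its results share one command_no, a log can be regrouped into one block per command, and the elapsed_time field then gives the time spanned by each block. The following is a minimal sketch, assuming well-formed lines whose first field is an integer command_no and whose third field is a number of seconds; group_by_command and command_spans are hypothetical helpers.

```python
from collections import defaultdict


def group_by_command(lines):
    """Group history-file lines by command_no, the first field.

    A command and the results it generated share one command_no, so each
    group holds that command's entries in file order.
    Returns {command_no: [line, ...]} in order of first appearance.
    """
    groups = defaultdict(list)
    for line in lines:
        # Only the first field is needed; split once and convert it.
        command_no = int(line.split(":", 1)[0])
        groups[command_no].append(line)
    return dict(groups)


def command_spans(lines):
    """Seconds between the first and last entry written for each command_no."""
    spans = {}
    for command_no, group in group_by_command(lines).items():
        # elapsed_time is the third field (index 2).
        times = [float(line.split(":", 3)[2]) for line in group]
        spans[command_no] = max(times) - min(times)
    return spans
```

For a search issued as command 3, group_by_command(lines)[3] would hold the search entry followed by its docset, hl_title, hl_info and hl_terms entries.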


The next (fourth) field is always the command or result name, corresponding to an entry in the Name column of the History File Entry Types table above.


These four fields are followed by zero or more further fields, depending on the command or result. For example, a complete open_database entry might look like:

    0:213:0:open_database:cacm.sample:OK

Note: open_database is always command_no zero and is issued at time zero. The fields after the first four are described in the History File Field Types table below.
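Since every entry begins with the same four fields, they can be split off uniformly. The following is a minimal sketch; HistoryEntry and parse_entry are hypothetical names, and elapsed_time is parsed as a float on the assumption that fractional seconds might occur. Everything after the name is kept unsplit in rest, because some trailing fields (such as hl_info text) can themselves contain colons.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    command_no: int
    topic_no: int
    elapsed_time: float  # seconds from the start of the search
    name: str            # command/result name, e.g. "open_database"
    rest: str            # remaining fields, still colon-delimited


def parse_entry(line: str) -> HistoryEntry:
    """Split one history-file line into its four fixed fields and the rest."""
    # maxsplit=4 yields the four fixed fields plus, if present, everything after.
    parts = line.rstrip("\n").split(":", 4)
    if len(parts) < 4:
        raise ValueError(f"expected at least four fields: {line!r}")
    command_no, topic_no, elapsed_time, name = parts[:4]
    rest = parts[4] if len(parts) == 5 else ""
    return HistoryEntry(int(command_no), int(topic_no), float(elapsed_time), name, rest)
```

Applied to the example above, parse_entry("0:213:0:open_database:cacm.sample:OK") gives command_no 0, topic_no 213, elapsed_time 0.0, name "open_database" and rest "cacm.sample:OK".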

History File Field Types.

open_database     <database_name>:<success>
    open_database:trec23_95:OK
define     <term>:[<operation>]
    define:stock market:A
query     <termset_no>:<term_no>:<bss_set>:<np>:<r>:<wgt>:<rsv>:<source>:<parsed>:<operation>
    query:0:0:2:14575:0:72:58:U:S:computerisation:computer:N
search     <termset_no>:<bss_set>:<weight>:<op_code>

where <op_code> is one of A, B, S, G or N. G is a GSL phrase, N is a single indexed term, and A, B and S describe the different types of user-entered phrases recognised by the system.


    search:4:5:198:10:105:12:72:6:99:3:74:2:72:4:28769
docset     <bss_set>:<np>:<maxwt>:<nmaxwt>:<ngw>:<mpw>:<nmpw>
    docset:4:28769:450:1:28769:1206:0
hl_title     <iteration_no>:<set_recno>:<internal_recno>:<docid>:<weight>:<passage_offset>:<passage_length>:<fulldoc_offset>:<fulldoc_length>
    hl_title:0:3:78668:FT931-11306:21.133:12:1569:12:7483
hl_info     <iteration_no>:<set_recno>:<line_no>:<line>
Entries for one document might be:

    hl_info:0:3:0:FT 17 JUL 92 / Fraud trials to come More than two dozen
    hl_info:0:3:1:serious fraud prosecutions are awaiting trial or making their
    hl_info:0:3:2:way through the courts. Those involved include: Mr Kevi.....

hl_terms     <iteration_no>:<set_recno>:<term_no>:<source>:<document_tf>
Entries for one document might be:

    hl_terms:0:3:0:Fraud:4
    hl_terms:0:3:1:theft:4
    hl_terms:0:3:2:deception:1

show     <iteration_no>:<set_recno>:<docid>:<weight>:<rel_length>:<relj>

Note: <rel_length> is the length of the full document.


    show:0:1:FT921-3159:21.821:2929:F
expand     <termset_no>
    expand:5
remove     <termset_no>:<term_no>:<source>:<opcode>
    remove:5:3:Bloggs:n
restore     <termset_no>:<term_no>:<source>:<opcode>
    restore:5:3:Bloggs:n
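The per-entry fields above can be named mechanically. The following is a minimal sketch, assuming the entry name has already been split off (so rest is the text after "show:", "hl_info:" and so on); FIELD_LAYOUTS, parse_fields and document_titles are hypothetical helpers. Only layouts whose examples line up with the listed fields are included, so query, search and hl_terms are left out. The last field of each layout absorbs any remaining colons so that hl_info text survives intact, and joining hl_info lines with a single space assumes the title was wrapped at word boundaries.

```python
from itertools import zip_longest

# Hypothetical field layouts transcribed from the History File Field Types
# table. query, search and hl_terms are omitted because their examples do not
# line up unambiguously with the listed fields.
FIELD_LAYOUTS = {
    "open_database": ["database_name", "success"],
    "define": ["term", "operation"],  # operation is optional
    "docset": ["bss_set", "np", "maxwt", "nmaxwt", "ngw", "mpw", "nmpw"],
    "hl_title": ["iteration_no", "set_recno", "internal_recno", "docid",
                 "weight", "passage_offset", "passage_length",
                 "fulldoc_offset", "fulldoc_length"],
    "hl_info": ["iteration_no", "set_recno", "line_no", "line"],
    "show": ["iteration_no", "set_recno", "docid", "weight",
             "rel_length", "relj"],
    "expand": ["termset_no"],
    "remove": ["termset_no", "term_no", "source", "opcode"],
    "restore": ["termset_no", "term_no", "source", "opcode"],
}


def parse_fields(name, rest):
    """Name the fields that follow an entry name; values stay as strings.

    The last field absorbs any remaining colons, so free text such as an
    hl_info line is kept whole. Missing optional fields map to None.
    """
    names = FIELD_LAYOUTS[name]
    values = rest.split(":", len(names) - 1)
    return dict(zip_longest(names, values))


def document_titles(entries):
    """Rebuild each hitlist "title" from its hl_info lines.

    entries: iterable of (name, rest) pairs in file order. Returns
    {(iteration_no, set_recno): title}, joining lines in line_no order.
    """
    pieces = {}
    for name, rest in entries:
        if name != "hl_info":
            continue
        f = parse_fields(name, rest)
        key = (int(f["iteration_no"]), int(f["set_recno"]))
        pieces.setdefault(key, []).append((int(f["line_no"]), f["line"]))
    # Assumes lines were wrapped at word boundaries, so join with one space.
    return {key: " ".join(text for _, text in sorted(lines))
            for key, lines in pieces.items()}
```

For example, parse_fields("show", "0:1:FT921-3159:21.821:2929:F") maps docid to "FT921-3159" and relj to "F", and passing the three hl_info example lines to document_titles yields a single title keyed (0, 3).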





Last modified:   12th November 2001