Inverse Document Frequency
The Spärck Jones / Robertson IDF page
In 1972, Karen Spärck Jones published a paper in the Journal of Documentation that defined the term weighting scheme now known as inverse document frequency (IDF). The paper was reprinted in 2004 as part of a celebration of 60 years of the Journal. In the same 2004 issue, Stephen Robertson wrote an analysis of the theoretical basis for IDF, and Karen wrote a reply.
This page links to copies of all the above, plus an interchange by letter shortly after the first publication. We are grateful to Emerald for permission to make these items available.
- The original paper: K. Spärck Jones, A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28, 11-21, 1972 and 60, 493-502, 2004
- Letter and reply, Journal of Documentation 28, 164-165, 1972
- S. Robertson, Understanding inverse document frequency: on theoretical arguments for IDF. Journal of Documentation 60, 503-520, 2004
- K. Spärck Jones, IDF term weighting and IR research lessons. Journal of Documentation 60, 521-523, 2004
The original exchange in 1972 was part of the stimulus for the development (via a short paper [1] in 1974) of the Robertson/Spärck Jones (RSJ) relevance weighting model of 1976 [2]. However, the circle was not fully closed until the Croft/Harper paper of 1979 [3], which showed IDF to be an approximation to RSJ relevance weighting, together with a much later paper [4], which clarified the difference between the Croft/Harper approximation and the original formula. A short technical report [5] summarises the text retrieval methods developed in this framework, and a comprehensive paper [6] covers the combination of IDF weighting with other weighting factors and reports extensive experimental results.
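The relationship between these weights can be illustrated with a minimal sketch. It is not taken from any of the papers listed here: the function names are hypothetical, natural logarithms are assumed, and the formulas are the commonly quoted forms, namely plain IDF as log(N/n), the RSJ weight with the usual 0.5 corrections, and the Croft/Harper form log((N-n)/n) with its constant term dropped. Here N is the collection size, n the number of documents containing the term, R the number of known relevant documents, and r the number of those that contain the term.

```python
import math


def idf(N: int, n: int) -> float:
    """Plain inverse document frequency, log(N / n). Hypothetical helper."""
    return math.log(N / n)


def rsj_weight(N: int, n: int, R: int = 0, r: int = 0) -> float:
    """RSJ relevance weight in its commonly quoted 0.5-corrected form.

    With R = r = 0 (no relevance information) the expression reduces to
    log((N - n + 0.5) / (n + 0.5)).
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def croft_harper(N: int, n: int) -> float:
    """Croft/Harper-style weight log((N - n) / n), constant term omitted."""
    return math.log((N - n) / n)


if __name__ == "__main__":
    # Illustrative figures only: a rare term and a very common term.
    for N, n in [(100_000, 10), (100, 80)]:
        print(N, n, idf(N, n), rsj_weight(N, n), croft_harper(N, n))
```

Under these assumptions the three weights are close for a rare term, while for a term occurring in more than half of the collection the second and third become negative and plain IDF stays positive.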
[1] S.E. Robertson, Specificity and weighted retrieval. Journal of Documentation 30, 41-46, 1974
[2] S.E. Robertson and K. Spärck Jones, Relevance weighting of search terms. Journal of the American Society for Information Science 27, 129-146, 1976. Reprinted in: P. Willett (ed.), Document Retrieval Systems. Taylor Graham, 1988 (pp 143-160)
[3] W. Croft and D. Harper, Using probabilistic models of information retrieval without relevance information. Journal of Documentation 35, 285-295, 1979
[4] S.E. Robertson and S. Walker, On relevance weights with little relevance information. Presented at SIGIR 97, Philadelphia, 1997. In: N.J. Belkin, A.D. Narasimhalu and P. Willett (eds), SIGIR '97. ACM, 1997 (pp 16-24)
[5] S.E. Robertson and K. Spärck Jones, Simple, proven approaches to text retrieval. University of Cambridge Computer Laboratory Technical Report no. 356, 1994 (updated 1996, 1997, 2006)
[6] K. Spärck Jones, S. Walker and S.E. Robertson, A probabilistic model of information retrieval: development and comparative experiments. Information Processing and Management 36, Part 1 779-808; Part 2 809-840, 2000
Stephen Robertson
February 2005; revised March 2006