Inverse Document Frequency

The Spärck Jones / Robertson IDF page

In 1972, Karen Spärck Jones published a paper in the Journal of Documentation that defined the term-weighting scheme now known as inverse document frequency (IDF). The paper was reprinted in 2004 as part of a celebration of 60 years of the Journal. In the same 2004 issue, Stephen Robertson wrote an analysis of the theoretical basis for IDF, and Karen wrote a reply.

This page links to copies of all of the above, plus an exchange of letters shortly after the original publication. We are grateful to Emerald for permission to make these items available.

The original exchange in 1972 was part of the stimulus for the development of the Robertson/Spärck Jones (RSJ) relevance weighting model of 1976 [2], by way of a short paper in 1974 [1]. However, the circle was not fully closed until the Croft/Harper paper of 1979 [3], which showed IDF to be an approximation to RSJ relevance weighting, and a much later paper [4], which clarified the difference between the Croft/Harper approximation and the original IDF formula. A short technical report [5] summarises the text retrieval methods developed in this framework, and a comprehensive paper [6] covers the combination of IDF weighting with other weighting factors and reports extensive experimental results.

  1. S.E. Robertson, Specificity and weighted retrieval. Journal of Documentation 30, 41-46 (1974).
  2. S.E. Robertson and K. Spärck Jones, Relevance weighting of search terms. Journal of the American Society for Information Science 27, 129-146 (1976). Reprinted in: P. Willett (ed.), Document Retrieval Systems. Taylor Graham, 1988 (pp. 143-160).
  3. W. Croft and D. Harper, Using probabilistic models of information retrieval without relevance information. Journal of Documentation 35, 285-295 (1979).
  4. S.E. Robertson and S. Walker, On relevance weights with little relevance information. Presented at SIGIR '97, Philadelphia, 1997. In: N.J. Belkin, A.D. Narasimhalu and P. Willett (eds), SIGIR '97. ACM, 1997 (pp. 16-24).
  5. S.E. Robertson and K. Spärck Jones, Simple, proven approaches to text retrieval. University of Cambridge Computer Laboratory Technical Report no. 356, 1994 (updated 1996, 1997, 2006).
  6. K. Spärck Jones, S. Walker and S.E. Robertson, A probabilistic model of information retrieval: development and comparative experiments. Information Processing and Management 36, Part 1: 779-808; Part 2: 809-840 (2000).

Stephen Robertson
February 2005; revised March 2006