Relevance Scores

Default scoring mechanism (TF-IDF)
When documents were ranked using the default scoring method (TF-IDF), two judges assigned relevance scores to the top 10 documents for each of the five information needs described in Information Needs.
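The agreement and precision figures below were derived from the two judges' binary relevance labels. As a rough sketch of how they can be computed (the judgment lists here are made up for illustration, not the actual assessments):

```python
# Illustrative sketch: Cohen's kappa and the P@10 variants reported below,
# computed from two hypothetical lists of binary relevance judgments.

def cohens_kappa(a, b):
    """Cohen's kappa for two binary (0/1) judgment lists of equal length."""
    n = len(a)
    # Observed agreement: fraction of documents the judges agree on.
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement, from each judge's marginal relevant/non-relevant rates.
    p1, p2 = sum(a) / n, sum(b) / n
    p_e = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_o - p_e) / (1 - p_e)

def p_at_10(judgments):
    """Precision at 10: fraction of the top 10 documents judged relevant."""
    return sum(judgments[:10]) / 10

judge1 = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1]  # hypothetical judgments
judge2 = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]  # hypothetical judgments

both = [x & y for x, y in zip(judge1, judge2)]    # relevant for both judges
either = [x | y for x, y in zip(judge1, judge2)]  # relevant for at least one

print(round(cohens_kappa(judge1, judge2), 4))
print(p_at_10(judge1), p_at_10(judge2), p_at_10(both), p_at_10(either))
```

The "both" and "one" P@10 variants below correspond to the intersection and union of the two judges' relevant sets.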

Statistics (information need 1)

 * Cohen's kappa: 0.6078.
 * P@10 for judge 1: 0.8.
 * P@10 for judge 2: 0.9.
 * P@10 (when both judges consider a document relevant): 0.8.
 * P@10 (when at least one judge considers a document relevant): 0.9.

Statistics (information need 2)

 * Cohen's kappa: 0.375.
 * P@10 for judge 1: 0.3.
 * P@10 for judge 2: 0.1.
 * P@10 (when both judges consider a document relevant): 0.1.
 * P@10 (when at least one judge considers a document relevant): 0.3.

Statistics (information need 3)

 * Cohen's kappa: 1.
 * P@10 for judge 1: 0.4.
 * P@10 for judge 2: 0.4.
 * P@10 (when both judges consider a document relevant): 0.4.
 * P@10 (when at least one judge considers a document relevant): 0.4.

Statistics (information need 4)

 * Cohen's kappa: 0.3939.
 * P@10 for judge 1: 0.6.
 * P@10 for judge 2: 0.5.
 * P@10 (when both judges consider a document relevant): 0.4.
 * P@10 (when at least one judge considers a document relevant): 0.7.

Statistics (information need 5)

 * Cohen's kappa: 0.6078.
 * P@10 for judge 1: 0.9.
 * P@10 for judge 2: 0.8.
 * P@10 (when both judges consider a document relevant): 0.8.
 * P@10 (when at least one judge considers a document relevant): 0.9.

BM25
When documents were ranked using BM25 scoring, two judges assigned relevance scores to the top 10 documents for each of the five information needs described in Information Needs.
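The report does not state which BM25 variant or parameters the system used; as a sketch, the standard Okapi BM25 formula with the commonly used defaults k1 = 1.2 and b = 0.75 (an assumption) looks like this:

```python
import math

# Sketch of the standard Okapi BM25 scoring formula. The parameter values
# k1 = 1.2 and b = 0.75 are common defaults, assumed here.

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """BM25 score of one document for a query.

    doc_tf: term -> frequency of that term in the document
    df:     term -> number of documents in the collection containing it
    """
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        # Smoothed IDF; the +1 keeps it positive even for very common terms.
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
        # Saturating term-frequency component with document-length normalization.
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

Unlike raw TF-IDF, the term-frequency component saturates as tf grows, and b controls how strongly long documents are penalized.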

Statistics (information need 1)

 * Cohen's kappa: 0.6078.
 * P@10 for judge 1: 0.9.
 * P@10 for judge 2: 0.8.
 * P@10 (when both judges consider a document relevant): 0.8.
 * P@10 (when at least one judge considers a document relevant): 0.9.

Statistics (information need 2)

 * Cohen's kappa: 0.375.
 * P@10 for judge 1: 0.3.
 * P@10 for judge 2: 0.1.
 * P@10 (when both judges consider a document relevant): 0.1.
 * P@10 (when at least one judge considers a document relevant): 0.3.

Statistics (information need 3)

 * Cohen's kappa: 0.7980.
 * P@10 for judge 1: 0.5.
 * P@10 for judge 2: 0.4.
 * P@10 (when both judges consider a document relevant): 0.4.
 * P@10 (when at least one judge considers a document relevant): 0.5.

Statistics (information need 4)

 * Cohen's kappa: 0.3939.
 * P@10 for judge 1: 0.6.
 * P@10 for judge 2: 0.5.
 * P@10 (when both judges consider a document relevant): 0.4.
 * P@10 (when at least one judge considers a document relevant): 0.7.

Statistics (information need 5)

 * Cohen's kappa: 0.375.
 * P@10 for judge 1: 0.9.
 * P@10 for judge 2: 0.7.
 * P@10 (when both judges consider a document relevant): 0.7.
 * P@10 (when at least one judge considers a document relevant): 0.9.

Language model (with JM smoothing)
When documents were ranked using a language model with Jelinek-Mercer smoothing for scoring, two judges assigned relevance scores to the top 10 documents for each of the five information needs described in Information Needs.
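As a sketch of this approach, query-likelihood scoring with Jelinek-Mercer smoothing interpolates the document's maximum-likelihood term probabilities with the collection model. The mixing weight lam = 0.5 is an assumed value; the report does not state which weight the actual system used:

```python
import math

# Sketch of query-likelihood scoring with Jelinek-Mercer smoothing:
#   P(t|d) = lam * tf(t,d)/|d| + (1 - lam) * cf(t)/|C|
# The mixing weight lam = 0.5 is an assumption, not a value from the report.

def jm_score(query_terms, doc_tf, doc_len, coll_tf, coll_len, lam=0.5):
    """Log query likelihood of a document under a JM-smoothed language model.

    doc_tf:  term -> frequency in the document
    coll_tf: term -> frequency in the whole collection
    """
    score = 0.0
    for t in query_terms:
        p_doc = doc_tf.get(t, 0) / doc_len
        p_coll = coll_tf.get(t, 0) / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p > 0:  # terms unseen in the whole collection contribute nothing
            score += math.log(p)
    return score
```

The collection component keeps the probability nonzero for query terms missing from a document, so a single absent term does not zero out the whole score.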

Statistics (information need 1)

 * Cohen's kappa: 0.6078.
 * P@10 for judge 1: 0.8.
 * P@10 for judge 2: 0.9.
 * P@10 (when both judges consider a document relevant): 0.8.
 * P@10 (when at least one judge considers a document relevant): 0.9.

Statistics (information need 2)

 * Cohen's kappa: 0.375.
 * P@10 for judge 1: 0.3.
 * P@10 for judge 2: 0.1.
 * P@10 (when both judges consider a document relevant): 0.1.
 * P@10 (when at least one judge considers a document relevant): 0.3.

Statistics (information need 3)

 * Cohen's kappa: 1.
 * P@10 for judge 1: 0.1.
 * P@10 for judge 2: 0.1.
 * P@10 (when both judges consider a document relevant): 0.1.
 * P@10 (when at least one judge considers a document relevant): 0.1.

Statistics (information need 4)

 * Cohen's kappa: 0.3939.
 * P@10 for judge 1: 0.6.
 * P@10 for judge 2: 0.5.
 * P@10 (when both judges consider a document relevant): 0.4.
 * P@10 (when at least one judge considers a document relevant): 0.7.

Statistics (information need 5)

 * Cohen's kappa: 1.
 * P@10 for judge 1: 0.9.
 * P@10 for judge 2: 0.9.
 * P@10 (when both judges consider a document relevant): 0.9.
 * P@10 (when at least one judge considers a document relevant): 0.9.