Ranked Retrieval: precision and recall

• P@K: the proportion of the top-K retrieved documents that are relevant
• R@K: the proportion of all relevant documents that are retrieved in the top K
• Assumption: the user will only examine the top-K results (e.g., K = 10)

Ranked Retrieval: precision and recall — exercise

• Assume 20 relevant documents exist for the query, and consider the top 10 results (relevant results marked R):

  K    relevant?   P@K              R@K
  1    R           (1/1)  = 1.00    (1/20) = 0.05
  2    –           (1/2)  = 0.50    (1/20) = 0.05
  3    R           (2/3)  = 0.67    (2/20) = 0.10
  4    R           (3/4)  = 0.75    (3/20) = 0.15
  5    R           (4/5)  = 0.80    (4/20) = 0.20
  6    R           (5/6)  = 0.83    (5/20) = 0.25
  7    R           (6/7)  = 0.86    (6/20) = 0.30
  8    –           (6/8)  = 0.75    (6/20) = 0.30
  9    R           (7/9)  = 0.78    (7/20) = 0.35
  10   –           (7/10) = 0.70    (7/20) = 0.35

Ranked Retrieval: precision-recall curves

• A different example ranking: 10 relevant documents exist in total, and we examine the ranking down to rank 20.

  rank (K)   relevant?   R@K    P@K
  1          R           0.10   1.00
  2          –           0.10   0.50
  3          R           0.20   0.67
  4          R           0.30   0.75
  5          R           0.40   0.80
  6          R           0.50   0.83
  7          R           0.60   0.86
  8          –           0.60   0.75
  9          R           0.70   0.78
  10         –           0.70   0.70
  11         R           0.80   0.73
  12         –           0.80   0.67
  13         –           0.80   0.62
  14         R           0.90   0.64
  15         –           0.90   0.60
  16         –           0.90   0.56
  17         –           0.90   0.53
  18         –           0.90   0.50
  19         –           0.90   0.47
  20         R           1.00   0.50

[Figure: precision (y-axis) plotted against recall (x-axis) for the ranking above]
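To make the arithmetic above concrete, here is a minimal sketch (plain Python; the function names and relevance list are illustrative, not from the slides) that reproduces the P@K and R@K columns of the first exercise from a list of binary relevance judgments:

```python
def precision_at_k(is_relevant, k):
    """P@K: fraction of the top-k retrieved results that are relevant."""
    return sum(is_relevant[:k]) / k

def recall_at_k(is_relevant, k, num_relevant):
    """R@K: fraction of all relevant documents found in the top-k results."""
    return sum(is_relevant[:k]) / num_relevant

# The first exercise above: 20 relevant documents exist, and the top-10
# ranking hits relevant documents at ranks 1, 3, 4, 5, 6, 7, and 9.
ranking = [1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
for k in range(1, 11):
    print(k, round(precision_at_k(ranking, k), 2),
             round(recall_at_k(ranking, k, num_relevant=20), 2))
```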
• Simply connecting the raw (R@K, P@K) points is wrong! When plotting a PR curve, for each level of recall R we use the best precision achieved at recall R or greater (interpolated precision).

[Figures: the raw precision-recall points, the incorrect sawtooth curve, and the corrected interpolated precision-recall curve]

• For a single query, a PR curve looks like a step function
• For multiple queries, we can average these curves
  ‣ average the precision values at fixed levels of recall (e.g., from 0.01 to 1.0 in increments of 0.01); see the code sketch below
  ‣ this forms a smoother function
• PR curves can be averaged across multiple queries

[Figure: averaged precision-recall curves for two systems]

• Which system is better?
• For what type of task would the blue system be better?
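A minimal sketch, under the slides' definitions, of the "best precision at recall R or greater" rule and of sampling the curve at fixed recall levels so it can be averaged across queries (function names and the sampling grid are illustrative):

```python
def interpolated_precision(recalls, precisions, r):
    """Best precision achieved at any recall level >= r (0.0 if none)."""
    candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
    return max(candidates, default=0.0)

# R@K / P@K points from the 20-result ranking above (10 relevant documents).
recalls    = [0.10, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.60, 0.70, 0.70,
              0.80, 0.80, 0.80, 0.90, 0.90, 0.90, 0.90, 0.90, 0.90, 1.00]
precisions = [1.00, 0.50, 0.67, 0.75, 0.80, 0.83, 0.86, 0.75, 0.78, 0.70,
              0.73, 0.67, 0.62, 0.64, 0.60, 0.56, 0.53, 0.50, 0.47, 0.50]

# Sample the interpolated curve at fixed recall levels; averaging these
# samples across queries yields the smoother, averaged PR curve.
levels = [i / 100 for i in range(1, 101)]
curve = [interpolated_precision(recalls, precisions, r) for r in levels]
print(curve[9], curve[49], curve[99])   # precision at recall 0.10, 0.50, 1.00
```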
Ranked Retrieval: discounted cumulative gain

• In some retrieval tasks, we really want to focus on precision at the top of the ranking
• A classic example is web search!
  ‣ users rarely care about recall
  ‣ users rarely navigate beyond the first page of results
  ‣ users may not even look at results below the "fold"
• Are any of the metrics we've seen so far appropriate for web search?

• We could potentially evaluate using P@K with several small values of K
• But this has some limitations. What are they? Which retrieval is better?

[Figure: two example rankings compared at K = 1, K = 5, and K = 10]

• Evaluation based on P@K can be too coarse
• P@K (and all the metrics we've seen so far) assumes binary relevance

• DCG: discounted cumulative gain
• Assumptions:
  ‣ there are more than two levels of relevance (e.g., perfect, excellent, good, fair, bad)
  ‣ a relevant document's usefulness to a user decreases rapidly with rank (more rapidly than linearly)

• Let REL_i be the relevance grade of the document at rank i
  ‣ perfect ➔ 4
  ‣ excellent ➔ 3
  ‣ good ➔ 2
  ‣ fair ➔ 1
  ‣ bad ➔ 0

• DCG at cutoff K:

  DCG@K = \sum_{i=1}^{K} \frac{REL_i}{\log_2(\max(i, 2))}

• The discount 1 / \log_2(i) means a relevant document's usefulness decreases rapidly with rank

[Figure: the discount factor 1 / \log_2(\max(i, 2)) plotted against rank i = 1..20]

• Worked example: the relevance grades REL_i are given (the result at rank 1 is perfect, the result at rank 2 is excellent, the result at rank 3 is perfect, ..., the result at rank 10 is bad)
• Each rank is associated with a discount factor 1 / \log_2(\max(i, 2)); rank 1 is a special case, handled by the max
• The gain at rank i is REL_i multiplied by that rank's discount factor, and DCG_i is the running sum of the gains

  rank (i)   REL_i   discount factor   gain   DCG_i
  1          4       1.00              4.00    4.00
  2          3       1.00              3.00    7.00
  3          4       0.63              2.52    9.52
  4          2       0.50              1.00   10.52
  5          0       0.43              0.00   10.52
  6          0       0.39              0.00   10.52
  7          0       0.36              0.00   10.52
  8          1       0.33              0.33   10.86
  9          1       0.32              0.32   11.17
  10         0       0.30              0.00   11.17

• DCG@10 = 11.17
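A minimal sketch (plain Python; names are illustrative) of the DCG@K formula above; run on the worked example it reproduces DCG@10 = 11.17:

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain at cutoff k.

    rels: graded relevance of the ranked results (4 = perfect ... 0 = bad).
    The discount is 1 / log2(max(i, 2)), so ranks 1 and 2 are not discounted.
    """
    return sum(rel / math.log2(max(i, 2))
               for i, rel in enumerate(rels[:k], start=1))

rels = [4, 3, 4, 2, 0, 0, 0, 1, 1, 0]    # the worked example above
print(round(dcg_at_k(rels, 10), 2))       # -> 11.17
```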
• Two variations of the ranking above show how strongly the discount weights the top of the ranking:
  ‣ changing the top result from perfect (REL = 4) to excellent (REL = 3) drops its gain from 4.00 to 3.00, and DCG@10 falls from 11.17 to 10.17
  ‣ changing the 10th result from bad (REL = 0) to excellent (REL = 3) adds a gain of only 3 × 0.30 = 0.90, and DCG@10 rises from 11.17 to just 12.08
• A one-level change at rank 1 moves DCG more than a three-level change at rank 10

Ranked Retrieval: normalized discounted cumulative gain

• DCG is not "bounded": it ranges from zero up to a query-dependent maximum
• That makes it problematic to average across queries
• NDCG: normalized discounted cumulative gain
• "Normalized" is a fancy way of saying we rescale it so that it ranges from 0 to 1

• NDCG_i: normalized discounted cumulative gain at rank i
• For a given query, measure DCG_i
• Then divide this DCG_i by the best possible DCG_i for that query
  ‣ i.e., the DCG_i of the best possible ranking for that value of i

• Given: a query has two 4's, one 3, and the rest are 0's
• Question: what is the best possible ranking for i = 1?
• All of these are equally good:
  ‣ 4, 4, 3, ...
  ‣ 4, 3, 4, ...
  ‣ 4, 0, 0, ...
  ‣ ... anything with a 4 as the top-ranked result

• Question: what is the best possible ranking for i = 2? All of these are equally good:
  ‣ 4, 4, 3, ...
  ‣ 4, 4, 0, ...

• Question: what is the best possible ranking for i = 3?
  ‣ 4, 4, 3, ...
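A minimal sketch of NDCG under the slides' definition: divide DCG@K by the DCG@K of the ideal ranking, i.e., the query's relevance grades sorted from best to worst. The system ranking below is a hypothetical example, not taken from the slides:

```python
import math

def dcg_at_k(rels, k):
    # Same DCG@K as in the earlier sketch: REL_i discounted by 1/log2(max(i, 2)).
    return sum(rel / math.log2(max(i, 2))
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, all_grades, k):
    """Normalized DCG@K: DCG@K divided by the DCG@K of the ideal ranking.

    all_grades holds every judged grade for the query, so the ideal ranking
    is simply those grades sorted from best to worst.
    """
    ideal_dcg = dcg_at_k(sorted(all_grades, reverse=True), k)
    return dcg_at_k(rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# The slides' example: the query has two 4's, one 3, and the rest are 0's.
grades = [4, 4, 3, 0, 0, 0, 0, 0, 0, 0]
system_ranking = [4, 0, 4, 3, 0, 0, 0, 0, 0, 0]   # hypothetical system output
print(round(ndcg_at_k(system_ranking, grades, 3), 2))   # -> 0.66
```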
Metric Review

• set-retrieval evaluation: we want to evaluate the set of documents retrieved by the system, without considering the ranking
• ranked-retrieval evaluation: we want to evaluate the ranking of documents returned by the system

Metric Review: set-retrieval evaluation

• precision: the proportion of retrieved documents that are relevant
• recall: the proportion of relevant documents that are retrieved
• f-measure: the harmonic mean of precision and recall
  ‣ a difficult metric to "cheat" by getting very high precision and abysmal recall, or vice versa (e.g., P = 1.0 and R = 0.1 give F ≈ 0.18, far below their arithmetic mean of 0.55)

Metric Review: ranked-retrieval evaluation

• P@K: precision under the assumption that the top-K results are the "set" retrieved
• R@K: recall under the assumption that the top-K results are the "set" retrieved
• average precision: considers both precision and recall, and focuses mostly on the top results
• DCG: ignores recall, considers multiple levels of relevance, and focuses heavily on the top ranks
• NDCG: a trick to make DCG range between 0 and 1

Which Metric Would You Use?