Axiomatic Analysis and Optimization of Information Retrieval Models
Introduction
Recently, an axiomatic way of analyzing and optimizing retrieval models has been developed and has shown great promise. It supports both understanding existing retrieval models and seeking an ultimately optimal retrieval model, enables analytical comparison of different retrieval models without necessarily requiring empirical evaluation, and has led to the development of multiple more effective retrieval models. The purpose of this page is to provide a comprehensive list of the resources related to this direction.
Book
- Coming soon! Axiomatic Analysis and Optimization of Information Retrieval Models, by Hui Fang and ChengXiang Zhai (in preparation). Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool Publishers.
Talks
Retrieval Constraints
- Constraints for Basic Retrieval Models
- Basic TF-IDF-LN Constraints [Fang et al. 2004] [Fang et al. 2011]
- TFC1: We should give a higher score to a document with more occurrences of a query term.
- TFC2: The increase in the retrieval score due to an increase in TF should be smaller for larger TFs.
- TFC3: If two documents have the same total occurrences of all query terms and all the query terms have the same term discrimination value, a higher score will be given to the document covering more distinct query terms.
- TDC: We should penalize the terms popular in the collection.
- LNC1: The score of a document should decrease if we add an extra occurrence of a non-relevant word.
- LNC2: We should avoid over-penalizing long relevant documents.
- TF-LNC: It regulates the interaction between TF and document length.
- Lower bounding TF constraints [Lv&Zhai, 2012]
- LB1: The presence-absence gap (0-1 gap) should not be closed due to length normalization.
- LB2: A repeated occurrence of an already matched query term is not as important as the first occurrence of an otherwise absent query term.
- Semantic term matching constraints [Fang&Zhai, 2006] [Fang, 2008]
- STMC1: We should give a higher score to a document with a term that is more semantically related to a query term.
- STMC2: We should avoid over-favoring semantically similar terms.
- STMC3: We should favor semantically similar terms.
- TSSC1 and TSSC2: Term semantic similarity constraints.
- Term proximity constraint [Tao&Zhai, 2007]
- Constraint (proximity heuristic): Term proximity should positively contribute to the retrieval score of a document.
- Constraint (convex curve): The contribution from a distance measure would drop quickly when the distance value is small and become nearly constant as the distance becomes larger.
- Query term relation based regularization constraints [Zheng&Fang, 2010] [Wu&Fang, 2012]
- Regularization constraint: We should give a higher score to a document that covers more query aspects.
- AND Relation Constraint: If two terms in a query have an AND relation, documents with both terms should be ranked higher than those with only one query term.
- Constraints for Pseudo Relevance Feedback [Clinchant&Gaussier, 2010][Clinchant&Gaussier, 2011]
- Document score constraint: Documents with higher scores should be given more weight in the feedback weight function.
- Proximity constraint: Feedback terms should be close to query terms in documents.
- Document frequency constraint: Feedback terms should receive higher weights when they occur more in the feedback set.
- Constraints for Translation Models for IR [Karimzadehgan&Zhai, 2012]
- General constraint 1: In order to have a reasonable retrieval behavior, for all translation language models, the self-translation probability should be the same.
- General constraint 2: The self-translation probability of a word should be larger than the probability of translating any other word to this word.
- General constraint 3: A word is more likely to be translated to itself than into any other word.
- Additional constraint 4: If word u occurs more times than word v in the context of word w and both words u and v co-occur with all other words similarly, the probability of translating word u to word v should be higher.
- Additional constraint 5: If both u and v equally co-occur with word w but v co-occurs with many more other words than word u, the probability of translating word u to word w is higher.
- Constraints for Multi-criteria relevance ranking [Gerani et al. 2013]
- Constraints for Evaluation Measures [Amigo et al. 2013] [Busin and Mizzaro 2013]
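As an illustration of how some of the basic constraints listed above can be probed, the following minimal sketch checks simplified versions of TFC1, TFC2, TDC and LNC1 numerically on a small grid, and then prints the presence-absence gap that LB1 refers to for increasingly long documents. Everything in it is an assumption made for illustration: the function names bm25_term and check_constraints, the particular BM25 variant (with an always-positive IDF), the parameter values, and the simplified readings of the constraints. It is not the formal definition or the analysis method of any of the cited papers, and a grid check is only a sanity probe, not an analytical proof.

```python
import math


def bm25_term(tf, doc_len, df, n_docs=1000, avg_len=100.0, k1=1.2, b=0.75):
    """Score contribution of one query term under one common BM25 variant.

    Assumption: the IDF is log(1 + (N - df + 0.5) / (df + 0.5)), which is
    always positive. Other BM25 formulations differ in this component.
    """
    if tf == 0:
        return 0.0
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)


def check_constraints(score, tfs=range(1, 20), doc_len=100, df=10):
    """Probe simplified constraint readings on a finite grid of TF values."""
    results = {}
    # TFC1: more occurrences of a query term should give a higher score.
    results["TFC1"] = all(
        score(tf + 1, doc_len, df) > score(tf, doc_len, df) for tf in tfs
    )
    # TFC2: the gain from one more occurrence should shrink as TF grows.
    results["TFC2"] = all(
        score(tf + 1, doc_len, df) - score(tf, doc_len, df)
        > score(tf + 2, doc_len, df) - score(tf + 1, doc_len, df)
        for tf in tfs
    )
    # TDC (simplified): a term found in more documents should score lower.
    results["TDC"] = all(
        score(tf, doc_len, df) > score(tf, doc_len, df + 1) for tf in tfs
    )
    # LNC1: one extra non-query word (longer document, same TF) lowers the score.
    results["LNC1"] = all(
        score(tf, doc_len + 1, df) < score(tf, doc_len, df) for tf in tfs
    )
    return results


results = check_constraints(bm25_term)
for name, ok in results.items():
    print(f"{name}: {'satisfied' if ok else 'violated'} on the sampled grid")

# LB1 probe: gap between one occurrence and none as the document gets longer.
for doc_len in (100, 10_000, 1_000_000):
    gap = bm25_term(1, doc_len, 10) - bm25_term(0, doc_len, 10)
    print(f"doc_len={doc_len}: presence-absence gap = {gap:.6f}")
```

Any other scoring function with the same (tf, doc_len, df) signature can be passed to check_constraints in place of bm25_term. Under the stated assumptions, the gap printed by the last loop shrinks toward zero as doc_len grows, which is the behavior LB1 is meant to rule out.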
References
- [Bruza&Huibers, 1994] Investigating aboutness axioms using information fields. P. Bruza and T. W. C. Huibers. SIGIR 1994.
- [Fang et al., 2004] A formal study of information retrieval heuristics. H. Fang, T. Tao and C. Zhai. SIGIR 2004.
- [Fang&Zhai, 2005] An exploration of axiomatic approaches to information retrieval. H. Fang and C. Zhai, SIGIR 2005.
- [Fang&Zhai, 2006] Semantic term matching in axiomatic approaches to information retrieval. H. Fang and C. Zhai, SIGIR 2006.
- [Tao&Zhai, 2007] An exploration of proximity measures in information retrieval. T. Tao and C. Zhai, SIGIR 2007.
- [Cummins&O'Riordan, 2007] An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. R. Cummins and C. O'Riordan. Artificial Intelligence Review, 2007.
- [Fang, 2008] A Re-examination of query expansion using lexical resources. H. Fang. ACL 2008.
- [Na et al., 2008] Improving Term Frequency Normalization for multi-topical documents and application to language modeling approaches. S. Na, I. Kang and J. Lee. ECIR 2008.
- [Gollapudi&Sharma, 2009] An axiomatic approach for result diversification. S. Gollapudi and A. Sharma. WWW 2009.
- [Cummins&O'Riordan, 2009] Measuring Constraint Violations in Information Retrieval. R. Cummins and C. O'Riordan. SIGIR 2009.
- [Zheng&Fang, 2010] Query aspect based term weighting regularization in information retrieval. W. Zheng and H. Fang. ECIR 2010.
- [Clinchant&Gaussier, 2010] Information-based models for Ad Hoc IR. S. Clinchant and E. Gaussier, SIGIR 2010.
- [Fang et al., 2011] Diagnostic evaluation of information retrieval models. H. Fang, T. Tao and C. Zhai. TOIS, 2011.
- [Lv&Zhai, 2011a] Lower-bounding term frequency normalization. Y. Lv and C. Zhai. CIKM 2011.
- [Lv&Zhai, 2011b] Adaptive term-frequency normalization for BM25. Y. Lv and C. Zhai. CIKM 2011.
- [Lv&Zhai, 2011c] When documents are very long, BM25 fails! Y. Lv and C. Zhai. SIGIR 2011.
- [Clinchant&Gaussier, 2011a] Is document frequency important for PRF? S. Clinchant and E. Gaussier. ICTIR 2011.
- [Clinchant&Gaussier, 2011b] A document frequency constraint for pseudo-relevance feedback models. S. Clinchant and E. Gaussier. CORIA 2011.
- [Clinchant&Gaussier, 2011c] Retrieval constraints and word frequency distributions: a log-logistic model for IR. S. Clinchant and E. Gaussier. Information Retrieval. 2011.
- [Zhang et al., 2011] How to count thumb-ups and thumb-downs: user-rating based ranking of items from an axiomatic perspective. D. Zhang, R. Mao, H. Li and J. Mao. ICTIR 2011.
- [Cummins&O'Riordan, 2011] Analysing Ranking Functions in Information Retrieval Using Constraints. R. Cummins and C. O'Riordan. Information Extraction from the Internet, 2011.
- [Lv&Zhai, 2012] A log-logistic model-based interpretation of TF normalization of BM25. Y. Lv and C. Zhai. ECIR 2012.
- [Wu&Fang, 2012] Relation-based term weighting regularization. H. Wu and H. Fang. ECIR 2012.
- [Li&Gaussier, 2012] An information-based cross-language information retrieval model. B. Li and E. Gaussier. ECIR 2012.
- [Gerani et al.] Score transformation in linear combination for multi-criteria relevance ranking. S. Gerani, C. Zhai and F. Crestani. ECIR 2012.
- [Karimzadehgan&Zhai, 2012] Axiomatic analysis of translation language model for information retrieval. M. Karimzadehgan and C. Zhai. ECIR 2012.
- [Cummins&O'Riordan, 2012] A Constraint to Automatically Regulate Document-Length Normalisation. R. Cummins and C. O'Riordan. CIKM 2012.
- [Amigo et al. 2013] A general evaluation measure for document organization tasks. E. Amigo, J. Gonzalo and F. Verdejo. SIGIR 2013.
- [Busin and Mizzaro 2013] Axiometrics: An Axiomatic Approach to Information Retrieval Effectiveness Metrics. L. Busin and S. Mizzaro. ICTIR 2013.
- [Clinchant and Gaussier 2013] A Theoretical Analysis of Pseudo-Relevance Feedback Models. S. Clinchant and E. Gaussier. ICTIR 2013.
- [Wang et al. 2014] A Study of Concept-based Weighting Regularization for Medical Records Search. Y. Wang, X. Liu and H. Fang. ACL 2014.
Mailing List
Contributors