Axiomatic Thinking for Information Retrieval
Introduction
Recently, axiomatic thinking has been adopted with great promise for the development of both retrieval models and evaluation metrics. The general idea of axiomatic thinking is to seek a set of desirable properties, expressed mathematically as formal constraints, to guide the search for an optimal solution. Expressing the desirable properties explicitly makes it possible to analytically address issues that would otherwise appear to be purely empirical, provides theoretical guidance on how to optimize a retrieval model or evaluation metric, and allows any identified constraints to be applied directly in many practical applications.
The purpose of this page is to provide a comprehensive list of the resources related to this direction.
Book
 Coming soon! Axiomatic Analysis and Optimization
of Information Retrieval Models, by Hui Fang and ChengXiang Zhai. (In Preparation).
Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool
Publishers
Workshop
 SIGIR'17 Workshop on Axiomatic Thinking for
Information Retrieval and Related Tasks (ATIR). (website).
Talks
Retrieval Constraints for IR models
 Constraints for Basic Retrieval Models
 Basic TF-IDF-LN Constraints [Fang et al. 2004]
[Fang et al. 2011]
 TFC1: We should give a higher score to a document with more occurrences of a query term.
 TFC2: The increase in the retrieval score due to an increase in TF should be smaller for larger TFs.
 TFC3: If two documents have the same total occurrences of all query terms and all the query terms have the same term discrimination value, a higher score will be given to the document covering more distinct query terms.
 TDC: We should penalize the terms popular in the collection.
 LNC1: The score of a document should decrease if we add an extra occurrence of a nonrelevant word.
 LNC2: We should avoid overpenalizing long relevant documents.
 TFLNC: It regulates the interaction between TF and document length.
 Lower bounding TF constraints [Lv&Zhai, 2012]
 LB1: The presence-absence gap (0-1 gap) should not be closed due to length normalization.
 LB2: Repeated occurrence of an already matched query term is not as important as the first occurrence of an otherwise absent query term.
 Semantic term matching constraints [Fang&Zhai, 2006] [Fang, 2008]
 STMC1: We should give a higher score to a document with a term that is more semantically related to a query term.
 STMC2: We should avoid overfavoring semantically similar terms.
 STMC3: We should favor semantically similar terms.
 TSSC1 and TSSC2 : Term semantic similarity constraints
 Term proximity constraint [Tao&Zhai, 2007]
 Constraint (proximity heuristic): Term proximity should positively contribute to the retrieval score of a document.
 Constraint (convex curve): The contribution from a distance measure would drop quickly when the distance value is small and become nearly constant as the distance becomes larger.
 Query term relation based regularization constraints [Zheng&Fang, 2010]
[Wu&Fang, 2012]
 Regularization constraint: We should give a higher score to a document that covers more query aspects.
 AND Relation Constraint: If two terms in a query have an AND relation, documents with both terms should be ranked higher than those with only one query term.
 Constraints for Pseudo Relevance Feedback [Clinchant&Gaussier, 2010][Clinchant&Gaussier, 2011]
 Document score constraint: Documents with higher scores should be given more weight in the feedback weight function.
 Proximity constraint: Feedback terms should be close to query terms in documents.
 Document frequency constraint: Feedback terms should receive higher weights when they occur more in the feedback set.
 Constraints for Translation Models for IR [Karimzadehgan&Zhai, 2012]
 General constraint 1: In order to have a reasonable retrieval behavior, for all translation language models, the self-translation probability should be the same.
 General constraint 2: The self-translation probability should be larger than the probability of translating any other word to this word.
 General constraint 3: A word is more likely to be translated to itself than to any other word.
 Additional constraint 4: If word u occurs more times than word v in the context of word w and both words u and v co-occur with all other words similarly, the probability of translating word u to word w should be higher.
 Additional constraint 5: If both u and v equally co-occur with word w but v co-occurs with many more other words than word u does, the probability of translating word u to word w is higher.
 Constraints for Multi-criteria relevance ranking
[Gerani et al. 2013]
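As an illustration of how the basic retrieval constraints listed above can be checked numerically, the following minimal sketch spot-checks TFC1, TFC2, TDC and LNC1 on a BM25-style score for a single query term. The function names, the parameter defaults (k1 = 1.2, b = 0.75), the smoothed IDF variant and the toy collection statistics are all assumptions made for this example and are not taken from the cited papers. A spot check on a grid of values can expose a violation, but it does not prove that a constraint holds in general.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """BM25-style contribution of one query term (illustrative variant)."""
    if tf == 0:
        return 0.0
    # Smoothed IDF that stays positive for every df <= n_docs (an assumption).
    idf = math.log((n_docs + 1) / (df + 0.5))
    norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


def check_constraints(score, avg_doc_len=100, n_docs=10_000):
    """Spot-check four constraints on toy values; returns {name: bool}."""
    results = {"TFC1": True, "TFC2": True, "TDC": True, "LNC1": True}
    for doc_len in (50, 100, 400):
        for tf in range(1, 20):
            s0 = score(tf, doc_len, avg_doc_len, 100, n_docs)
            s1 = score(tf + 1, doc_len, avg_doc_len, 100, n_docs)
            s2 = score(tf + 2, doc_len, avg_doc_len, 100, n_docs)
            # TFC1: at equal document length, more occurrences score higher.
            results["TFC1"] &= s1 > s0
            # TFC2: the gain from one extra occurrence shrinks as TF grows.
            results["TFC2"] &= (s1 - s0) > (s2 - s1)
            # TDC: a rare term (df=100) outweighs a popular one (df=5000).
            results["TDC"] &= s0 > score(tf, doc_len, avg_doc_len, 5_000, n_docs)
            # LNC1: one extra non-query word (same tf, longer doc) lowers the score.
            results["LNC1"] &= score(tf, doc_len + 1, avg_doc_len, 100, n_docs) < s0
    return results


bm25_report = check_constraints(bm25_term_score)
# A raw term-frequency scorer, used here only as a contrasting example.
raw_tf_report = check_constraints(lambda tf, dl, adl, df, n: float(tf))
```

On these toy values the BM25-style function passes all four checks, while the raw term-frequency scorer passes TFC1 only: its gain per occurrence never shrinks, and it ignores both term popularity and document length.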
Evaluation Metric Axiomatics
 General Constraints
 Strong Definiteness: The measures must be definable under any gold or system output. [Sebastiani 2015]
 Fixed Range: The metric score must be lower and upper bounded. [Sebastiani 2016, Moffat 2013]
 Classification metrics
 General Classification Constraints
 Strict Monotonicity Axiom: Replacing an incorrect decision by a correct decision increases the score. [Sebastiani 2015] [Sokolova 2006]
 Continuous Differentiability: The evaluation measure must be continuous and differentiable over the true positives and true negatives. [Sebastiani 2015]
 Symmetricity: The evaluation measure should be invariant with respect to switching the roles of the class and its complement. [Sebastiani 2015, Sokolova 2006]
 Task dependent properties:
 Absolute Weighting: There exists a parameter c in the evaluation measure that determines if adding one document from each class into a system categorization set improves the system output. [Amigo et al. 2017]
 Non-Informativeness Fixed Score: Any non-informative (random) classification output achieves a fixed score. [Amigo et al. 2017]
 Non-Informativeness Growing Score: The score of a non-informative system output for a certain class is correlated with the output class size. [Amigo et al. 2017]
 Clustering Metrics:
 Cluster Homogeneity: Joining clusters with items from different categories decreases the score. [Rosenberg and Hirschberg 2007, Amigo et al. 2009]
 Cluster Completeness: Items belonging to the same category should appear in the same cluster. [Rosenberg and Hirschberg 2007, Dom 2001, Amigo et al. 2009, Meila 2003]
 Rag Bag: Adding noise (single item categories) to a clean cluster is worse than joining noisy items in the same clusters. [Amigo et al. 2009]
 Cluster Size vs. Quantity: A small error in a big cluster is preferable to a large number of small errors in small clusters. [Amigo et al. 2009, Meila 2003]
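To make the first two clustering constraints concrete, the sketch below computes a BCubed F score, one of the extrinsic metrics compared in [Amigo et al. 2009], on a four-item toy example. The function name `bcubed_f`, the unweighted F1 combination and the toy data are assumptions made for this illustration. It checks only homogeneity and completeness, and only on this one example.

```python
from collections import Counter


def bcubed_f(clusters, categories):
    """BCubed F1 for parallel lists of cluster ids and gold category ids."""
    n = len(clusters)
    pairs = list(zip(clusters, categories))
    pair_counts = Counter(pairs)          # items sharing cluster AND category
    cluster_sizes = Counter(clusters)
    category_sizes = Counter(categories)
    # Per-item precision: share of the item's cluster that has its category.
    precision = sum(pair_counts[p] / cluster_sizes[p[0]] for p in pairs) / n
    # Per-item recall: share of the item's category that is in its cluster.
    recall = sum(pair_counts[p] / category_sizes[p[1]] for p in pairs) / n
    return 2 * precision * recall / (precision + recall)


gold = ["a", "a", "b", "b"]
pure = bcubed_f([0, 0, 1, 1], gold)    # one cluster per category
merged = bcubed_f([0, 0, 0, 0], gold)  # joins two categories (homogeneity)
split = bcubed_f([0, 1, 2, 2], gold)   # splits category "a" (completeness)
```

Here `merged` and `split` both score below `pure`, which is the behavior the Cluster Homogeneity and Cluster Completeness constraints ask for.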
 Ranking Constraints:
 Priority Constraints: Swapping two documents in the ranking according to their relevance increases the score. [Moffat 2013, Ferrante et al. 2015, Amigo et al. 2013]
 Deepness: There exists an area which is never explored by the user. The relevance of documents at the top of the ranking has more effect in the evaluation. [Moffat 2013, Ferrante et al. 2015, Amigo et al. 2013]
 Deepness Thresholds: One relevant document at the top is better than a huge number of relevant documents after a huge number of nonrelevant documents. [Amigo et al. 2013]
 Closeness Threshold Constraint: There is an area which is always explored by the user. There exists a number n small enough such that n relevant documents in the first 2*n positions is better than one relevant document in the first position. [Amigo et al. 2013]
 Confidence Constraint: Adding noise at the end of the ranking decreases the score. [Amigo et al 2013]
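The Priority constraint above can be tested mechanically. The sketch below is a hedged illustration, with hypothetical helper names and a toy graded-relevance ranking: it tries every adjacent swap that moves a more relevant document upward and reports whether the metric strictly increases each time. DCG with a log2 discount and precision at a cutoff are used only as familiar examples.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def precision_at_k(relevances, k):
    """Fraction of the top k documents with non-zero relevance."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def satisfies_priority(metric, relevances):
    """True if every adjacent swap that promotes a more relevant
    document strictly increases the metric on this ranking."""
    base = metric(relevances)
    for i in range(len(relevances) - 1):
        if relevances[i] < relevances[i + 1]:
            swapped = list(relevances)
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            if not metric(swapped) > base:
                return False
    return True


ranking = [0, 1, 0, 2, 1]  # toy graded relevance labels, top to bottom
dcg_ok = satisfies_priority(dcg, ranking)
p2_ok = satisfies_priority(lambda r: precision_at_k(r, 2), ranking)
```

On this ranking DCG passes the check, while precision at cutoff 2 fails it, because swapping the two documents inside the cutoff leaves the score unchanged.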
Similarity Axiomatics
 Metric Space Based Similarity Axiom [Shepard 1987]
 Maximality
 Symmetricity
 Triangular Inequality
 Feature Contrast Model [Tversky 1977]
 Matching: The similarity is a function of the common and distinctive features of the objects.
 Monotonicity: The similarity increases whenever the common features increase or the differences decrease.
 Independence : Object features affect similarity independently.
 Similarity Axioms based on Information Theory
[Amigo et al 2017]
 Identity Axiom : Adding or removing features to an object decreases the similarity to the original.
 Identity Specificity Axiom : Adding new features increases the selfsimilarity.
 Unexpectedness Axiom: When adding a feature, the similarity decreases to a greater extent if the added feature is less expected.
 Dependency Axiom : Adding new features in both objects increases their similarity whenever their respective conditional probabilities grow.
 Asymmetricity: An object is more similar to any of its parts than vice versa.
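For readers who want to experiment with the feature contrast model listed above, here is a minimal sketch that uses set cardinality as the salience function, that is, S(A, B) = theta*|A ∩ B| - alpha*|A - B| - beta*|B - A|. The function name, the default weights and the toy feature sets are assumptions made for this example. It illustrates the Matching and Monotonicity properties and shows that unequal weights give an asymmetric similarity.

```python
def tversky_contrast(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity with set size as the salience function."""
    a, b = set(a), set(b)
    # Matching: the score depends only on common and distinctive features.
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


x = {"wings", "beak", "flies"}
y = {"wings", "beak", "swims"}

base = tversky_contrast(x, y)
# Monotonicity: adding a shared feature raises the similarity ...
more_common = tversky_contrast(x | {"feathers"}, y | {"feathers"})
# ... and adding a distinctive feature lowers it.
more_different = tversky_contrast(x, y | {"dives"})

# With alpha != beta the measure is no longer symmetric.
part, whole = {"wings"}, {"wings", "beak", "flies"}
part_to_whole = tversky_contrast(part, whole, alpha=0.8, beta=0.2)
whole_to_part = tversky_contrast(whole, part, alpha=0.8, beta=0.2)
```

With equal weights the measure is symmetric. The asymmetric case depends entirely on the choice of alpha and beta, which this sketch fixes arbitrarily.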
References related to IR models
 [Bruza&Huibers, 1994] Investigating aboutness axioms using information fields. P. Bruza and T. W. C. Huibers. SIGIR 1994.
 [Fang et al., 2004] A formal study of information retrieval heuristics. H. Fang, T. Tao and C. Zhai. SIGIR 2004.
 [Fang&Zhai, 2005] An exploration of axiomatic approaches to information retrieval. H. Fang and C. Zhai, SIGIR 2005.
 [Fang&Zhai, 2006] Semantic term matching in axiomatic approaches to information retrieval. H. Fang and C. Zhai, SIGIR 2006.
 [Tao&Zhai, 2007] An exploration of proximity measures in information retrieval. T. Tao and C. Zhai, SIGIR 2007.
 [Cummins&O'Riordan, 2007] An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. R. Cummins and C. O'Riordan. Artificial Intelligence Review, 2007.
 [Fang, 2008] A Re-examination of query expansion using lexical resources. H. Fang. ACL 2008.
 [Na et al., 2008] Improving Term Frequency Normalization for multi-topical documents and application to language modeling approaches. S. Na, I. Kang and J. Lee. ECIR 2008.
 [Gollapudi&Sharma, 2009] An axiomatic approach for result diversification. S. Gollapudi and A. Sharma. WWW 2009.
 [Cummins&O'Riordan, 2009]
Measuring Constraint Violations in Information Retrieval. R. Cummins and C. O'Riordan. SIGIR 2009.
 [Zheng&Fang, 2010] Query aspect based term weighting regularization in information retrieval. W. Zheng and H. Fang. ECIR 2010.
 [Clinchant&Gaussier, 2010] Information-based models for Ad Hoc IR. S. Clinchant and E. Gaussier. SIGIR 2010.
 [Fang et al., 2011] Diagnostic evaluation of information retrieval models. H. Fang, T. Tao and C. Zhai. TOIS, 2011.
 [Lv&Zhai, 2011a] Lower-bounding term frequency normalization. Y. Lv and C. Zhai. CIKM 2011.
 [Lv&Zhai, 2011b] Adaptive term-frequency normalization for BM25. Y. Lv and C. Zhai. CIKM 2011.
 [Lv&Zhai, 2011] When documents are very long, BM25 fails! Y. Lv and C. Zhai. SIGIR 2011.
 [Clinchant&Gaussier, 2011a] Is document frequency important for PRF? S. Clinchant and E. Gaussier. ICTIR 2011.
 [Clinchant&Gaussier, 2011b] A document frequency constraint for pseudo-relevance feedback models. S. Clinchant and E. Gaussier. CORIA 2011.
 [Clinchant&Gaussier, 2011c] Retrieval constraints and word frequency distributions: a log-logistic model for IR. S. Clinchant and E. Gaussier. Information Retrieval. 2011.
 [Zhang et al., 2011] How to count thumb-ups and thumb-downs: user-rating based ranking of items from an axiomatic perspective. D. Zhang, R. Mao, H. Li and J. Mao. ICTIR 2011.
 [Cummins&O'Riordan, 2011]
Analysing Ranking Functions in Information Retrieval Using Constraints. R. Cummins and C. O'Riordan. Information Extraction from the Internet, 2011.
 [Lv&Zhai, 2012] A log-logistic model-based interpretation of TF normalization of BM25. Y. Lv and C. Zhai. ECIR 2012.
 [Wu&Fang, 2012] Relation-based term weighting regularization. H. Wu and H. Fang. ECIR 2012.
 [Li&Gaussier, 2012] An information-based cross-language information retrieval model. B. Li and E. Gaussier. ECIR 2012.
 [Gerani et al.] Score transformation in linear combination for multi-criteria relevance ranking. S. Gerani, C. Zhai and F. Crestani. ECIR 2012.
 [Karimzadehgan&Zhai, 2012] Axiomatic analysis of translation language model for information retrieval. M. Karimzadehgan and C. Zhai. ECIR 2012.
 [Cummins&O'Riordan, 2012]
A Constraint to Automatically Regulate Document-Length Normalisation. R. Cummins and C. O'Riordan. CIKM 2012.
 [Amigo et al. 2013] A general evaluation measure for document organization tasks. E. Amigo, J. Gonzalo and F. Verdejo. SIGIR 2013.
 [Busin and Mizzaro 2013] Axiometrics: An Axiomatic Approach to Information Retrieval Effectiveness Metrics. L. Busin and S. Mizzaro. ICTIR 2013.
 [Clinchant and Gaussier 2013]
A Theoretical Analysis of Pseudo-Relevance Feedback Models. S. Clinchant and E. Gaussier. ICTIR 2013.
 [Wang et al. 2014]
A Study of Concept-based Weighting Regularization for Medical Records Search. Y. Wang, X. Liu and H. Fang. ACL 2014.
 [Rahimi et al. 2014]
Axiomatic Analysis of Cross-Language Information Retrieval. R. Rahimi, A. Shakery and I. King. CIKM 2014.
 [Hazimeh et al. 2015]
Axiomatic Analysis of Smoothing Methods in Language Models for Pseudo-Relevance Feedback. H. Hazimeh and C. Zhai. ICTIR 2015.
 [Lv 2015]
A Study of Query Length Heuristics in Information Retrieval. Y. Lv. ICTIR 2015.
 [Sebastiani 2015]
An Axiomatically Derived Measure for the Evaluation of Classification Algorithms. F. Sebastiani. ICTIR 2015.
 [Goswami et al. 2015]
Study of Heuristic IR Constraints Under Function Discovery Framework. P. Goswami, M. Amini and E. Gaussier. ICTIR 2015.
 [Makarenkov et al. 2015]
Theoretical Categorization of Query Performance Predictors. V. Makarenkov, B. Shapira and L. Rokach. ICTIR 2015.
References related to evaluation

Enrique Amigo, Julio Gonzalo, Fernando Giner and Felisa Verdejo "An Axiomatic Account of Similarity" ATIR Workshop 2017.
 A. Tversky. Features of similarity. Psychological Review, 84:327-352, 1977.
 R. Shepard. Toward a universal law of generalization for psychological science. Science, 237:1317-1323, 1987.
 Alistair Moffat. 2013. Seven Numeric Properties of Effectiveness Metrics. In AIRS'13. 1-12.
 Marco Ferrante, Nicola Ferro, and Maria Maistro. 2015. Towards a Formal Framework for Utility-oriented Measurements of Retrieval Effectiveness. In Proceedings of ICTIR 2015. ACM, New York, NY, USA, 21-30.
 Enrique Amigo, Julio Gonzalo, and Felisa Verdejo. 2013. A General Evaluation Measure for Document Organization
Tasks. In Proceedings of the 36th ACM SIGIR. ACM, New York, NY, USA, 64365
 Marina Meila. 2003. Comparing clusterings. In Proceedings of COLT 03
 Enrique Amigo, Julio Gonzalo, Javier Artiles, and Felisa Verdejo. 2009. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 12, 4 (2009), 461-486.
 Andrew Rosenberg and Julia Hirschberg. 2007. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
 B. Dom. 2001. An Information-Theoretic External Cluster-Validity Measure. IBM Research Report. (2001)
 Fabrizio Sebastiani. 2015. An Axiomatically Derived Measure for the Evaluation of Classification Algorithms. In Proceedings of ICTIR 2015. ACM, 11-20.
 Marina Sokolova. 2006. Assessing Invariance Properties of Evaluation Measures. In Proceedings of NIPS'06 Workshop on Testing Deployable Learning and Decision Systems.
Mailing List
Contributors