THE AXIOMS OF SUBJECTIVE PROBABILITY: In 1986, Peter Fishburn, a mathematician working at Bell Labs in Murray Hill, New Jersey, reviewed the various ‘Axioms of Subjective Probability’ in a splendid article in Statistical Science. These axioms have been regarded by many Bayesians as justifying their inferences, even to the extent that a scientific worker can be regarded as incoherent if his statistical methodology is not completely equivalent to a Bayesian procedure based on a proper (countably additive) prior distribution, i.e. if he does not act like a Bayesian (with a proper prior).
This highly normative philosophy has given rise to numerous practical and real-life implications, for instance:
(1) Many empirical and applied Bayesians have been ridiculed and even ‘cast out’ for being at apparent variance with the official paradigm.
(2) Bayesians have at times risked having their papers rejected when their methodology was not perceived as sufficiently coherent, even if their theory made good sense in applied terms.
(3) Careers have been put at risk, e.g. in groups or departments which pride themselves on being completely Bayesian.
(4) Some Bayesians have wished to remain controlled by the concept of coherence even in situations where the choice of sampling model is open to serious debate.
(5) Many Bayesians still use Bayes factors e.g. as measures of evidence or for model comparison, simply because they are ‘coherent’, even in situations where they defy common sense (unless of course a Bayesian p-value of the type recently proposed by Baskurt and Evans is also calculated).
Before 1986, the literature regarding the Axioms of Subjective Probability was quite confusing, and some of it was downright incorrect. When put to the question, some ‘coherent’ Bayesians slid from one axiom system to another, and others expressed confusion or could only remember the simple axioms within any particular system while conveniently forgetting the more complicated ones. However, Peter Fishburn describes the axiom systems in beautifully erudite terms, thereby enabling a frank and honest discussion.
Fishburn refers to a binary relation for any two events A and B which are expressible as subsets of the parameter space Θ. In other words, you are required to say whether your parameter θ is more likely (Fishburn uses the slightly misleading term ‘more probable’) to belong to A than to B, or whether the reverse is true, or whether you judge the two events to be equally likely. Most of the axiom systems assume that you can do this for any pair of events A and B in Θ. This is in itself a very strong assumption. For example, if Θ is finite and contains just 30 elements, then you will have over a billion subsets and over a billion billion pairwise comparisons to consider.
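The counting behind that claim is easily checked. The following sketch simply computes the number of events (subsets of Θ) and the number of ordered pairs of events for a parameter space with 30 elements:

```python
# Counting the comparisons demanded by a complete binary relation on
# the subsets of a finite parameter space Theta with k = 30 elements.

k = 30
num_subsets = 2 ** k                  # every subset of Theta is an event
num_ordered_pairs = num_subsets ** 2  # ordered pairs (A, B) to be compared

print(f"events: {num_subsets:,}")               # 1,073,741,824 (over a billion)
print(f"pairwise comparisons: {num_ordered_pairs:,}")  # about 1.15 x 10^18
```

So even for a modest finite Θ, the requirement to compare all pairs of events is astronomically demanding.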
The question then arises as to whether there exists a subjective probability distribution p defined on events in Θ which is in agreement with all your binary relations. You would certainly require your probabilities to satisfy p(A) > p(B) whenever you think that θ is more likely to belong to A than to B. It will turn out that some further, extremely strong, assumptions are needed to ensure the existence of such a probability distribution.
Note that when the parameter space contains a finite number k of elements, a probability distribution must, by definition, assign non-negative probabilities summing to unity to these k different elements. In 1931, De Finetti attempted to justify representing your subjective information and beliefs by a probability distribution, by assuming that your binary relations satisfy a reasonably simple set of ‘coherency’ axioms.
However, in 1959, Charles Kraft, John Pratt, and the distinguished Berkeley algebraist Abraham Seidenberg proved that De Finetti’s ‘Axioms of Coherence’ were insufficient to guarantee the existence of a probability distribution on Θ in agreement with your binary relations, and they proposed instead that the binary relations should be taken to satisfy a strong additivity property that is horribly complicated.
Indeed the strong additivity property requires you to contrast any m events with any other m events, and to be able to do this for all m = 2, 3, …. In Ch. 4 of his 1970 book Utility Theory for Decision Making, Fishburn proved that if your binary relations are non-trivial then strong additivity is a necessary and sufficient condition for the existence of an appropriate probability distribution.
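For reference, one standard way of stating strong additivity (here following the Kraft–Pratt–Seidenberg formulation, as a reconstruction on my part rather than Fishburn's exact wording) is: for every m ≥ 2 and all events A_1, …, A_m and B_1, …, B_m,

\[
\sum_{i=1}^{m} \mathbf{1}_{A_i}(\theta) \;=\; \sum_{i=1}^{m} \mathbf{1}_{B_i}(\theta)
\ \text{ for every } \theta \in \Theta,
\quad \text{and} \quad
A_i \succsim B_i \ (i = 1, \ldots, m-1)
\;\Longrightarrow\;
B_m \succsim A_m .
\]

In words: whenever two lists of events cover each point of Θ equally often, you are not permitted to judge every event in the first list at least as likely as its partner while judging the last one strictly more likely. One glance at the quantifier over all m and all such lists of events explains the ‘horribly complicated’ verdict.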
The strong additivity property is valuable in the sense that it provides a mathematically rigorous axiomatization of subjective probability when the parameter space is finite, where the axioms do not themselves refer to probability. However, in intuitive terms, it would surely be much simpler to replace it by the more readily comprehensible assumption:
Axiom T: You can represent your prior information and beliefs about θ by assigning non-negative relative weights (to subsequently be regarded as probabilities), summing to unity, to the k elements or ‘outcomes’ in Θ, in the knowledge that you can then calculate your relative weight of any event A by summing the relative weights of the corresponding outcomes in Θ.
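A minimal sketch of Axiom T in action, for a finite parameter space; the three outcomes and their weights are hypothetical, chosen purely for illustration:

```python
# Axiom T on a finite parameter space: assign non-negative relative weights,
# summing to unity, to the outcomes, then compute the weight of any event A
# by summing the weights of its outcomes. The outcomes and numerical weights
# below are illustrative assumptions, not taken from any real application.

weights = {"theta1": 0.2, "theta2": 0.5, "theta3": 0.3}
assert all(w >= 0 for w in weights.values())          # non-negative
assert abs(sum(weights.values()) - 1.0) < 1e-12       # sum to unity

def prob(event, weights):
    """Probability of an event (a set of outcomes) by summing weights."""
    return sum(weights[outcome] for outcome in event)

A = {"theta1", "theta3"}
print(prob(A, weights))  # 0.5
```

The whole content of Axiom T is visible in those few lines: once the weights are laid down, every event probability follows by addition, with no binary relations to elicit.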
I do not see why we need to contort our minds in order to justify using Axiom T. You should just be able to look at it and decide whether you are prepared to use it or not. When viewed from an inductive scientific perspective, the strong additivity property just provides us with highly elaborate window dressing. And if you don’t agree with even a tiny part of it, then it doesn’t put you under any compulsion to concur with Axiom T. The so-called concept of coherence is pie in the sky! You either want to be a Bayesian (or act like one) or you don’t.
When the parameter space Θ is infinite, probabilities can, for measure-theoretic reasons, only be defined on subsets of Θ, known as events, which belong to a σ-algebra, e.g. an infinite Boolean algebra of subsets of Θ. Any probability distribution which assigns probabilities to members of this σ-algebra must, by definition, satisfy Kolmogorov’s ‘countable additivity property’: in other words, your probabilities should add up sensibly when you are evaluating the probability of the union of any finite or infinite sequence of disjoint events. Most axiom systems require you to state the binary relationship between any two of the multitudinous events in the σ-algebra, after considering which of them is more likely.
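Countable additivity is easy to illustrate numerically on the simplest infinite parameter space, the non-negative integers, using the geometric weights p(n) = 2^-(n+1); both the space and the weights are chosen here purely for illustration:

```python
# Countable additivity on Theta = {0, 1, 2, ...}: the singletons {n} are
# disjoint, their union is Theta, so their probabilities must sum to
# P(Theta) = 1. With p(n) = 2**-(n+1) the partial sums do exactly that.

def p(n):
    return 2.0 ** -(n + 1)

partial_sums = []
total = 0.0
for n in range(60):
    total += p(n)
    partial_sums.append(total)

print(partial_sums[0], partial_sums[1], partial_sums[-1])  # 0.5 0.75 ~1.0
```

A merely finitely additive distribution is allowed to break this: it can assign each singleton probability zero while the whole space still has probability one, which is precisely the kind of behavior Kolmogorov’s axiom rules out.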
Bruno De Finetti and Jimmie Savage proposed various quite complicated axiom systems while attempting to guarantee the existence of a probability distribution that would be in agreement with your infinity of infinities of binary relations. However, in his 1964 paper on qualitative probability σ-algebras, in the Annals of Mathematical Statistics, the brilliant Uruguayan mathematician Cesareo Villegas proved that a rather strong property is needed, in addition to De Finetti’s four basic axioms, in order to ensure countable additivity of your probability distribution, and he called this the monotone continuity property.
Trivialities can be more essential than generalities, as my friend Thomas the Tank-Engine once said.
Unfortunately, monotone continuity would appear to be more complicated in mathematical terms than Kolmogorov’s countable additivity axiom, and it would seem somewhat easier in scientific terms to simply decide whether you want to represent your subjective information or beliefs by a countably-additive probability distribution, or not. If monotone continuity is supposed to be a part of coherence, then this is the sort of coherence that is likely to glue your brain cells together. Any blame associated with this phenomenon falls on the ways the mathematics was later misinterpreted, and we are all very indebted to Cesareo Villegas for his wonderful conclusions.
Your backwards inductive thought processes might well suggest that most other sets of Axioms of Subjective Probability which are pulled out from under a stone would by necessity include assumptions similar in strength to strong additivity and monotone continuity, since they would otherwise not be able to imply the existence of a countably additive probability distribution on a continuous parameter space which is in agreement with your subjectively derived binary relations. See, for example, my discussion of the five axioms described by DeGroot on pages 71-76 of Optimal Statistical Decisions. Other axiom systems are brought into play by the discussants of Peter Fishburn’s remarkable paper.
In summary, the Axioms of Subjective Probability are NOT essential ingredients of the Bayesian paradigm. They’re a torrent, rather than a mere sprinkling, of proverbial Holy Water. If we put them safely to bed, then we can broaden our wonderfully successful paradigm in order to give it even more credibility in Science, Medicine, Socio-Economics, and wherever people can benefit from it.
Meanwhile, some of the more diehard Bayesian ‘High Priests’ have been living in ‘airy fairy’ land. They have been using the axioms in their apparent attempts to control our discipline from a narrow-minded powerbase, and we should all now make a determined effort to break free from these Romanesque constraints in our search for scientific truth and reasonably evidence-based medical and social conclusions which will benefit Society and ordinary people.
Please see Author’s Notes (below) for an alternative view on these key issues, which has been kindly contributed by Angelika van der Linde, who has recently retired from the University of Bremen.
Cesareo Villegas (1921-2001) worked for much of his career at the Institute of Mathematics in Montevideo before moving to Simon Fraser University, where he became Emeritus Professor of Statistics after his retirement. He published eight theoretical papers in the IMS’s Annals journals, and three in JASA. He is well-known for his development of priors satisfying certain invariance properties, was actively involved with Jose Bernardo in creating the Bayesian Valencia conferences, and seems to have been an unsung hero.
Bruno de Finetti's contributions to the Axioms of Subjective Probability were far surpassed, both in mathematical rigor and in the credibility of subsequent interpretation, by his development of the key concept of ‘exchangeability’, and it is for the latter that our Italian maestro should be longest remembered.
[Image: Bruno de Finetti]