
Friday, 9 October 2015

ON RE-EVALUATING BAYESIAN AXIOMATIZATIONS OF SUBJECTIVE PROBABILITY AND UTILITY





I think that the history of the various axiomatizations of subjective probability and utility does severe discredit to the Bayesian statisticians who have misused, misrepresented, and abused them to the detriment of our profession. Maybe the Rev. Thomas Bayes is quaking in his grave.


Here is an excerpt from Chapter 5 of my PERSONAL HISTORY OF BAYESIAN STATISTICS (2014).





THE AXIOMS OF SUBJECTIVE PROBABILITY: In 1986, Peter Fishburn, a mathematician working at Bell Labs in Murray Hill, New Jersey, reviewed the various ‘Axioms of Subjective Probability’ in a splendid article in Statistical Science. These have been regarded by many Bayesians as justifying their inferences, even to the extent that a scientific worker can be regarded as incoherent if his statistical methodology is not completely equivalent to a Bayesian procedure based on a proper (countably additive) prior distribution, i.e. if he does not act like a Bayesian (with a proper prior).
      This highly normative philosophy has given rise to numerous practical and real-life implications, for instance:
(1) Many empirical and applied Bayesians have been ridiculed and even ‘cast out’ for being at apparent variance with the official paradigm.
(2) Bayesians have at times risked having their papers rejected if their methodology is not perceived as being sufficiently coherent, even if their theory made sense in applied terms.
(3) Careers have been put at risk, e.g. in groups or departments which pride themselves on being completely Bayesian.
(4) Some Bayesians have wished to remain controlled by the concept of coherence even in situations where the choice of sampling model is open to serious debate.
(5) Many Bayesians still use Bayes factors e.g. as measures of evidence or for model comparison, simply because they are ‘coherent’, even in situations where they defy common sense (unless of course a Bayesian p-value of the type recently proposed by Baskurt and Evans is also calculated).
    Before 1986, the literature regarding the Axioms of Subjective Probability was quite confusing, and some of it was downright incorrect. When put to the question, some ‘coherent’ Bayesians slid from one axiom system to another, and others expressed confusion or could only remember the simple axioms within any particular system while conveniently forgetting the more complicated ones. However, Peter Fishburn describes the axiom systems in beautifully erudite terms, thereby enabling a frank and honest discussion.
    Fishburn refers to a binary relation for any two events A and B which are expressible as subsets A and B of the parameter space Θ. In other words, you are required to say whether your parameter θ is more likely (Fishburn uses the slightly misleading term ‘more probable’) to belong to A than to B, or whether the reverse is true, or whether you judge the two events to be equally likely. Most of the axiom systems assume that you can do this for any pair of events A and B in Θ. This is in itself a very strong assumption. For example, if Θ is finite and contains just 30 elements, then you will have over a billion subsets and on the order of a billion billion pairwise comparisons to consider.
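    Just to spell out that arithmetic (my own back-of-the-envelope sketch, not Fishburn's), here are a few lines of Python; the only assumptions are that every subset of Θ counts as an event and that the binary relation is supposed to be defined for every pair of distinct events:

from math import comb

k = 30                            # number of elements in the parameter space
num_events = 2 ** k               # every subset is an event: 1,073,741,824 of them
num_pairs = comb(num_events, 2)   # unordered pairs of distinct events

print(f"{num_events:,} events")               # 1,073,741,824
print(f"{num_pairs:,} pairwise comparisons")  # 576,460,751,766,552,576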
    The question then arises as to whether there exists a subjective probability distribution p defined on events in Θ which is in agreement with all your binary relations. You would certainly require your probabilities to satisfy p(A) > p(B) whenever you think that θ is more likely to belong to A than to B. It will turn out that some further, extremely strong, assumptions are needed to ensure the existence of such a probability distribution.
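    In symbols (my shorthand, not Fishburn's notation), the minimal requirement is that, for all events A and B,

$$ A \succ B \;\Longrightarrow\; p(A) > p(B), \qquad A \sim B \;\Longrightarrow\; p(A) = p(B), $$

where $\succ$ means ‘judged more likely than’ and $\sim$ means ‘judged equally likely’.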
    Note that when the parameter space contains a finite number k of elements, a probability distribution must, by definition, assign non-negative probabilities summing to unity to these k different elements. In 1931, De Finetti proposed that representing your subjective information and beliefs by a probability distribution could be justified by assuming that your binary relations satisfy a reasonably simple set of ‘coherency’ axioms.
    However, in 1959, Charles Kraft, John Pratt, and the distinguished Berkeley algebraist Abraham Seidenberg proved that De Finetti’s ‘Axioms of Coherence’ were insufficient to guarantee the existence of a probability distribution on Θ in agreement with your binary relations, and they instead required the latter to satisfy a strong additivity property that is horribly complicated.
    Indeed, the strong additivity property requires you to contrast any m events with any other m events, and to be able to do this for every m ≥ 2. In Ch. 4 of his 1970 book Utility Theory for Decision Making, Fishburn proved that if your binary relations are non-trivial then strong additivity is a necessary and sufficient condition for the existence of an appropriate probability distribution.
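    For readers who want to see the shape of the beast, here is my paraphrase of the condition (consult Kraft, Pratt, and Seidenberg, or Fishburn’s Ch. 4, for the exact statement): for all events $A_1,\dots,A_m$ and $B_1,\dots,B_m$ such that every outcome in Θ is covered the same number of times by the A’s as by the B’s, i.e.

$$ \sum_{i=1}^{m} 1_{A_i}(\theta) \;=\; \sum_{i=1}^{m} 1_{B_i}(\theta) \quad \text{for every } \theta \in \Theta, $$

if you judge $A_i \preccurlyeq B_i$ for $i = 1,\dots,m-1$, then you are required to judge $B_m \preccurlyeq A_m$.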
    The strong additivity property is valuable in the sense that it provides a mathematically rigorous axiomatization of subjective probability when the parameter space is finite, in which the axioms do not themselves refer to probability. However, in intuitive terms, it would surely be much simpler to replace it by the more readily comprehensible assumption:
Axiom T: You can represent your prior information and beliefs about θ by assigning non-negative relative weights (subsequently to be regarded as probabilities), summing to unity, to the k elements or ‘outcomes’ in Θ, in the knowledge that you can then calculate the relative weight of any event A by summing the relative weights of the corresponding outcomes in Θ.
    I do not see why we need to contort our minds in order to justify using Axiom T. You should just be able to look at it and decide whether you are prepared to use it or not. When viewed from an inductive scientific perspective, the strong additivity property just provides us with highly elaborate window dressing. If you don’t agree with even a tiny part of it, then it doesn’t put you under any compulsion to concur with Axiom T. The so-called concept of coherence is pie in the sky! You either want to be a Bayesian (or act like one) or you don’t.
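    To make Axiom T concrete, here is a minimal sketch in Python, with made-up weights of my own choosing (nothing here comes from the book):

# Axiom T in miniature: assign non-negative weights summing to unity to the
# k outcomes; the weight of any event is the sum over its member outcomes.
weights = {"theta1": 0.2, "theta2": 0.5, "theta3": 0.3}   # k = 3 outcomes
assert abs(sum(weights.values()) - 1.0) < 1e-12           # weights sum to unity

def prob(event):
    """Relative weight (probability) of an event, i.e. a set of outcomes."""
    return sum(weights[outcome] for outcome in event)

print(prob({"theta1", "theta3"}))   # 0.5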
     When the parameter space Θ is infinite, probabilities can, for measure-theoretic reasons, only be defined on subsets of Θ, known as events, which belong to a σ-algebra, e.g. an infinite Boolean algebra of subsets of Θ. Any probability distribution which assigns probabilities to members of this σ-algebra must, by definition, satisfy Kolmogorov’s ‘countable additivity’ property; in other words, your probabilities should add up sensibly when you are evaluating the probability of the union of any finite or infinite sequence of disjoint events. Most axiom systems require you to state the binary relationship between any two of the multitudinous events in the σ-algebra, after considering which of them is more likely.
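     In symbols, countable additivity requires that, for any sequence $A_1, A_2, \dots$ of pairwise disjoint events in the σ-algebra,

$$ p\Big(\bigcup_{n=1}^{\infty} A_n\Big) \;=\; \sum_{n=1}^{\infty} p(A_n). $$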

Bruno de Finetti and Jimmie Savage
    Bruno De Finetti and Jimmie Savage (pictured above) proposed various quite complicated axiom systems while attempting to guarantee the existence of a probability distribution that would be in agreement with your infinity of infinities of binary relations. However, in his 1964 paper on qualitative probability σ-algebras in the Annals of Mathematical Statistics, the brilliant Uruguayan mathematician Cesareo Villegas proved that a rather strong property is needed, in addition to De Finetti’s four basic axioms, in order to ensure countable additivity of your probability distribution, and he called this the monotone continuity property.
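    As I recall it, Villegas’s condition runs roughly as follows (my paraphrase; see his 1964 paper for the precise statement): if $A_1 \subseteq A_2 \subseteq \cdots$ is an increasing sequence of events and $B$ is an event with $A_n \preccurlyeq B$ for every $n$, then

$$ \bigcup_{n=1}^{\infty} A_n \;\preccurlyeq\; B. $$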
    Trivialities can be more essential than generalities, as my friend Thomas the Tank-Engine once said.
    Unfortunately, monotone continuity would appear to be more complicated in mathematical terms than Kolmogorov’s countable additivity axiom, and it would seem somewhat easier in scientific terms to simply decide whether you want to represent your subjective information or beliefs by a countably-additive probability distribution, or not. If monotone continuity is supposed to be a part of coherence, then this is the sort of coherence that is likely to glue your brain cells together.  Any blame associated with this phenomenon falls on the ways the mathematics was later misinterpreted, and we are all very indebted to Cesareo Villegas for his wonderful conclusions.
    Your backwards inductive thought processes might well suggest that most other sets of Axioms of Subjective Probability which are pulled out from under a stone would by necessity include assumptions which are similar in strength to strong additivity and monotone continuity, since they would otherwise not be able to imply the existence of a countably additive probability distribution on a continuous parameter space which is in agreement with your subjectively derived binary relations. See, for example, my discussion in [8] of the five axioms described by De Groot on pages 71-76 of Optimal Statistical Decisions. Other axiom systems are brought into play by the discussants of Peter Fishburn’s remarkable paper.
    In summary, the Axioms of Subjective Probability are NOT essential ingredients of the Bayesian paradigm. They’re a torrent, rather than a mere sprinkling, of proverbial Holy Water. If we put them safely to bed, then we can broaden our wonderfully successful paradigm in order to give it even more credibility in Science, Medicine, Socio-Economics, and wherever people can benefit from it.
    Meanwhile, some of the more diehard Bayesian ‘High Priests’ have been living in ‘airy fairy’ land. They have been using the axioms in their apparent attempts to control our discipline from a narrow-minded powerbase, and we should all now make a determined effort to break free from these Romanesque constraints in our search for scientific truth and reasonably evidence-based medical and social conclusions which will benefit Society and ordinary people.
    Please see Author’s Notes (below) for an alternative view on these key issues, which has been kindly contributed by Angelika van der Linde, who has recently retired from the University of Bremen.



    Cesareo Villegas (1921-2001) worked for much of his career at the Institute of Mathematics in Montevideo before moving to Simon Fraser University, where he became Emeritus Professor of Statistics after his retirement. He published eight theoretical papers in the IMS’s Annals journals, and three in JASA. He is well-known for his development of priors satisfying certain invariance properties, was actively involved with Jose Bernardo in creating the Bayesian Valencia conferences, and seems to have been an unsung hero.
    Bruno de Finetti's contributions to the Axioms of Subjective Probability were far surpassed, both in terms of mathematical rigor and credibility of subsequent interpretation, by his development of the key concept of ‘exchangeability’, and it is for exchangeability that our Italian maestro should be longest remembered.
Bruno de Finetti
Also in 1986, Michael Goldstein published an important paper on ‘Exchangeable Prior Structures’ in JASA, where he argued that expectation (or prevision) should be the fundamental quantification of individuals’ statements of uncertainty and that inner products (or belief structures) should be the fundamental organizing structure for the collection of such statements. That’s an interesting possibility. I’m sure that Michael enjoys chewing the rag with Phil Dawid.


Here is another excerpt from the same chapter:





THE AXIOMS OF UTILITY: In 1989, Peter Wakker, a leading Dutch Bayesian economist on the faculty of Erasmus University in Rotterdam, published his magnum opus Additive Representations of Preferences: A New Foundation of Decision Analysis. Peter, for example, analysed ‘Choquet expected utility’ and other models using a general trade-off technique for analysing cardinal utility, based on a continuum of outcomes. In his later book Prospect Theory for Risk and Ambiguity he analyses Tversky and Kahneman’s cumulative prospect theory as yet another valid alternative to Savage’s expected utility. In 2013, Peter was awarded the prestigious Frank P. Ramsey Medal by the INFORMS Decision Analysis Society for his high-quality endeavours.
    Peter has recently advised me (personal communication) that modifications to Savage’s expected utility which put positive premiums on the positive components of a random monetary reward that are regarded as certain to occur, and negative premiums on those negative components that are thought to be certain to occur, are currently regarded as the state of the art, e.g. in relation to Portfolio Analysis. For an easy introduction to these ideas see Ch. 4 of my 1999 book [15].
    In an important special case, which my Statistics 775: Bayesian Decision and Control students, including the Catalonian statistician Josep Ginebra Molins and the economist Jean Deichmann, validated by a small empirical study in Madison, Wisconsin during the 1980s, a positive premium ε can be shown to satisfy ε = 2φ − 1 whenever a betting probability φ, which should be elicited from the investor, exceeds 0.5.
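    Purely as an illustration, with a made-up number of my own rather than anything from the study: if the elicited betting probability is $\varphi = 0.7$, then the implied premium is

$$ \varepsilon \;=\; 2\varphi - 1 \;=\; 2(0.7) - 1 \;=\; 0.4, $$

and the premium shrinks to zero as $\varphi$ falls towards 0.5.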
    The preceding easy-to-understand approach, which contrasts with the ramifications of prospect theory, was axiomatized in 2007 by Alain Chateauneuf, Jürgen Eichberger and Simon Grant in their article ‘Choice under uncertainty with the best and worst in mind: Neo-additive capacities’, which appeared in the Journal of Economic Theory.
    While enormously complex, the new axiom system is nevertheless the most theoretically convincing alternative that I know of to the highly normative Savage axioms. Their seven mathematically expressed axioms, which refer to a preference relation on the space of monetary returns, may be referred to under the following headings:
Axiom 0: Non-trivial preferences
Axiom 1: Ordering (i.e. the preference relation is complete, reflexive and transitive)
Axiom 2: Continuity
Axiom 3: Eventwise monotonicity
Axiom 4: Binary comonotonic independence
Axiom 5: Extreme events sensitivity
Axiom 6: Null event consistency
    If you are able to understand all these axioms and wish to comply with them, then that puts you under some sort of obligation to concur with either the Expected Utility Hypothesis, the simple modification suggested in Ch. 4 of [15], or obvious extensions of this idea. Alternatively, you could just replace expected utility by whatever modification or alternative best suits your practical situation. Note that, if your alternative criterion is sensibly formulated, then it might, at least in principle, be possible to devise a similarly complex axiom system that justifies it, if you really wanted to. In many such cases, the cart has been put before the proverbial horse. It’s rather like a highly complex spiritual biblical prophecy being formulated after the event to be prophesied has actually occurred. Maybe the Isaiahs of decision-theoretic axiomatics should follow in the footsteps of Maurice Allais and focus a bit more on empirical validations, and on the scientific and socio-economic implications of the methodology which they are striving to self-justify by these somewhat over-left-brained, potentially Nobel prize-winning theoretical discourses.
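    For completeness, the Expected Utility Hypothesis referred to here is the usual one (my shorthand statement, not the precise representation theorem of the Chateauneuf–Eichberger–Grant paper): preferences between random monetary rewards $X$ and $Y$ can be represented by

$$ X \succcurlyeq Y \;\Longleftrightarrow\; E[u(X)] \;\ge\; E[u(Y)] $$

for some increasing utility function $u$, with the expectations taken with respect to your (subjective) probability distribution.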

Here are some excerpts from Chapter 5 of my academic life story
                            The Life of a Bayesian Boy (2012, amended 2013):






16th August 2013: Please click on TONY O'HAGAN INTERVIEWS DENNIS LINDLEY for a very historical and illuminating YouTube video. It includes an account, at age 90, by Dennis of his time-honoured 'justifications' of the Bayesian paradigm, together with his totally unmerited attitude (since 1973) towards vague priors, including Sir Harold Jeffreys' celebrated invariance priors and Dennis's own student Jose Bernardo's much respected reference priors. I, quite frankly, find most of Dennis's opinions to be at best unfortunate and at worst completely ****** up, particularly in view of the highly paradoxical nature of the Savage axioms and the virtually tautologous properties of the De Finetti axioms as appropriately strengthened by Kraft, Pratt and Seidenberg (Ann. Math. Statist., 1959) and Villegas (Ann. Math. Statist., 1964). [See Fishburn (Statistical Science, 1986) for a discussion of the very complicated strong additivity and monotone continuity assumptions that are needed to imply countable additivity of the subjective probability measure.] His views on model construction demonstrate a lack of awareness of the true nature of applied statistics. He was, however, relatively recently awarded the Guy Medal in Gold by the Royal Statistical Society for his contributions.

Dennis also confirms how he encouraged Florence David to leave UCL for California (he'd previously been a bit more explicit to me about this) and, quite remarkably, says that he tried to arrange the early retirement of two of his colleagues at UCL for not being sufficiently Bayesian!! This was around the time that he was influencing a downturn in my career at the University of Warwick. Dennis's account of his own early retirement does not match what actually happened. According to Adrian Smith, Dennis was encouraged to retire after a fight with the administrators over the skylight in Peter Freeman's office.




24th August 2013: Since studying the Dennis Lindley interview, I have debated the relevance of the Savage and extended De Finetti axioms with Professor Peter Wakker on the ISBA website. As a spin-off of this correspondence, I was contacted by Deborah Mayo, a Professor of Philosophy at Virginia Tech, who has proposed some counterexamples to Allan Birnbaum's 1962 justification of the Likelihood Principle via the Sufficiency Principle and Conditionality Principle. Her work may be accessed by clicking on:
http://errorstatistics.com/2013/07/26/new-version-on-the-birnbaum-argument-for-the-slp-slides-for-jsm-talk/.

I leave it to the readers to decide this controversial issue for themselves. I always thought that Birnbaum's proof was elegantly simple and completely watertight, and it would be quite amusing if I was wrong on this key issue.



26th August 2013:  I have now heard from Peter Wakker that Evans, Fraser, and Monette (Canadian Journal of Statistics, 1986) claim that the Likelihood Principle is a direct consequence of the Conditionality Principle, and that the Sufficiency Principle is not needed at all. Phew! There is clearly lots of room for further discussion. Some serious mathematical issues need to be resolved.

 

26th August 2013:  A RESOLUTION OF AN OLD CONTROVERSY



Michael Evans of the University of Toronto has just advised me that the proof of Birnbaum's 1962 theorem is not mathematically watertight. It should be correctly stated as follows:

Theorem: If we accept SP and accept CP, and we accept all the 'equivalences' generated jointly by these principles, then we must accept LP.

He also proves:

Theorem: If we accept CP and we accept all the equivalences generated by CP then we must accept LP.

Furthermore, it is unclear how one justifies the additional hypotheses that are required to obtain LP.  Michael believes that Deborah Mayo's counterarguments are appropriate. History has been made!
Shucks, Dennis! Where does that put the mathematical foundations of the Bayesian paradigm? Both De Finetti and Birnbaum have misled us with technically unsound proofs. I should have listened to George Barnard in 1981.
 

While Professor Mayo's ongoing campaign against LP would appear to be wild and footloose, she has certainly shaken up the Bayesian Establishment.


During my sabbatical semester in Aberystwyth, I was delighted to receive an invitation from Jose Bernardo (an outstanding practical Bayesian if ever there was one) to present a discussion paper during June 1979 at the first of the long series of Bayesian Statistics conferences to be organized by the University of Valencia.
         While I was preparing my Valencia conference paper ‘The roles of inductive modelling and coherence in Bayesian Statistics,’ my only objective was to discern scientific truth, rather than to attack the high priests of the Bayesian establishment. I however clarified in my mind that the De Finetti and Savage axiom systems, which were supposed to justify Bayesian inference and decision-making, were at best tautologous with their specific theoretical conclusions and at worst downright misleading.
         Moreover, to insist that a statistician should be ‘coherent’ and Bayesian, when choosing his sampling model in relation to the scientific background, was totally out of line, as well as completely impractical. The ‘sure thing principle’, which requires a decision-maker to maximise his average long term expected utility, is absolute bullshit. For example, most mortals need to hedge against catastrophic losses and others wish to maximise the probability of a certain monetary gain.
[See Leonard and Hsu, Bayesian Methods, 1999, Ch. 4]
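Purely as an illustration of how these criteria can pull in opposite directions (my own made-up numbers, nothing from the book): a gamble can maximise expected monetary value and still be rejected by anyone whose real concern is avoiding a catastrophic loss.

# Illustrative only: two options for a 10,000 stake, compared under
# (1) expected monetary value and (2) the probability of avoiding a
# catastrophic loss of the whole stake.
gamble = [(0.9, 15_000), (0.1, -10_000)]   # (probability, monetary outcome)
sure_thing = [(1.0, 2_000)]

def expected_value(lottery):
    return sum(p * x for p, x in lottery)

def prob_avoids_catastrophe(lottery, catastrophe=-10_000):
    return sum(p for p, x in lottery if x > catastrophe)

for name, lottery in [("gamble", gamble), ("sure thing", sure_thing)]:
    print(name, expected_value(lottery), prob_avoids_catastrophe(lottery))

# The gamble wins on expected value (12,500 versus 2,000), yet the sure thing
# wins for anyone who needs to rule out the 10% chance of losing everything.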
The first Bayesian Valencia conference, at the Hotel Las Fuentes on the Mediterranean coast between Valencia and Barcelona, was a wonderfully iconic event; I for example met Jack Good, Art Dempster, Hiro Akaike and George Box for the first time. George and I went swimming in the bay together, and Jeff Harrison and I talked about synchronicity with Jack Good on the end of the pier.


I therefore initially took it as a joke when Adrian advised me that ‘I would be destroyed by the storm that hit me’. I have nevertheless felt disturbed by this warning ever since.
I was impressed by Steve Fienberg’s discussion of Jeff Harrison’s paper on Bayesian Catastrophe Theory and, while Jeff seemed to regard it as a catastrophe, I was glad to renew my friendship with Steve.
[I last met Steve at the 2002 RSS Conference at the University of Plymouth, after my early retirement, but I haven’t been particularly active in Statistics since, apart from helping John Hsu to complete Bayesian Methods in Finance with Rachev et al, and Bishop Brian of Edinburgh with his Diocesan accounts. The conference dinner was held in the great barn at Buckland Abbey on Dartmoor, and that was also the last time I talked to Peter Green. Terry Speed, who was a defence expert witness in the O.J. Simpson case, and I discussed the merits of being honest and emphasising the truth, in the context of DNA profiling. Bruce Weir, the prosecution expert witness during the O.J. Simpson trial, had screwed up on the arithmetic. The British Forensic Science Service could also learn a thing or two from Terry]
The paper that I’d prepared in Aberystwyth was well received at Valencia 1 (e.g. by Jim Dickey, Bill DuMouchel and Jay Kadane), since most of the Bayesians in the audience were also pragmatic statisticians, and I was undeterred when Dennis and a couple of heavily-axiomatized discussants made some unnecessarily angry, and rather puerile, comments regarding my views on the notion of coherence.
         While the paper, which attempted to inject more practicality into applications of the Bayesian paradigm, is seldom cited, I am advised by Mark Steel and Deborah Ashby (personal communication) that it is known to the next generation of Bayesians and has influenced their thinking.
Mark Steel and Deborah Ashby
