
Wednesday 27 April 2016

A. PHILIP DAWID: Leading traditional Bayesian statistician



                                                                     



                                                           PHIL DAWID (Wiki)

       I met Phil Dawid yesterday, for the first time in eighteen years, at a business meeting at the Royal Statistical Society in London. During the academic year 1967-68 we were both students in the Mathematics Department at Imperial College London. He was a postgraduate, and I was, first time round, a second-year undergraduate.
 
      During the academic year 1970-71, Phil taught me Bayesian Statistics on the 'Advanced Masters' degree in Statistics at University College London. I was 22 and he was 23! Other students on the course included Derek Teather, Ben Torsney and Jim McNicol. The subject matter covered included
(1) The conjugate analysis for the linear model, parameterized in terms of prior and posterior sample sizes (a parameterization that is seldom reported in the literature)
(2) Decision trees and sequential decision making
(3) Birnbaum's 1962 justification of the Likelihood Principle
(4) Most of the important results concerning linear Bayes estimation

      I taught much of this material during my course 775, Bayesian Decisions, at the University of Wisconsin-Madison during the 1980s. Many subsequently eminent statisticians, including Wing Wong, Jun Shao, Sharon Lohr, Dennis Lin, Finbarr O'Sullivan, and Joanne Wendelberger, first learnt about Bayesian inference on this course, and many parallels with the theory of smoothing splines were drawn.

      Phil's subsequent career at University College London, the City University, and Cambridge established his reputation as one of the leading traditional Bayesians of the modern era. He is as innovative as Jack Good, as mathematically incisive as Jim Berger, and technically extremely sound. Just as one example, he will long be remembered for first proposing the simple notation that is now widely employed to represent conditional independence; it facilitates straightforward proofs of many useful theorems.
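      For any reader who hasn't met it, Phil's notation (from his 1979 JRSS B paper on conditional independence) writes 'X is independent of Y given Z' as below; the properties listed underneath are the standard ones that flow directly from the definition (my summary, not a quotation from the paper):

\[
X \perp\!\!\!\perp Y \mid Z \quad\Longleftrightarrow\quad p(x, y \mid z) \;=\; p(x \mid z)\, p(y \mid z) \quad \text{for all } x, y, z .
\]
\begin{align*}
\text{Symmetry:}\quad & X \perp\!\!\!\perp Y \mid Z \;\Rightarrow\; Y \perp\!\!\!\perp X \mid Z \\
\text{Decomposition:}\quad & X \perp\!\!\!\perp (Y, W) \mid Z \;\Rightarrow\; X \perp\!\!\!\perp Y \mid Z \\
\text{Contraction:}\quad & X \perp\!\!\!\perp Y \mid Z \ \text{ and }\ X \perp\!\!\!\perp W \mid (Y, Z) \;\Rightarrow\; X \perp\!\!\!\perp (Y, W) \mid Z
\end{align*}

Chaining these rules is what turns so many familiar sufficiency and ancillarity arguments into one-line proofs.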



                                                                     









Phil's co-author Mervyn Stone


Phil's co-author Jim Zidek




Excerpt: Six other students studied for the Masters degree at University College at the same time as me, including my Glaswegian friends Ben Torsney and Jim McNicol, and a very nice man from Malaysia. We all took a core measure-theoretic course on Bayesian Inference from Phil Dawid, a brilliant junior lecturer, also out of Imperial College, who was just a year older than me and more recently became a professor at Cambridge.
         I was particularly impressed by Phil's description of Alan Birnbaum's Likelihood Principle, its easy justification via the Sufficiency and Conditionality Principles, and the way it sorted the sheep from the goats in statistical methodology. Despite objections by George Barnard and others, I still find the proof of Birnbaum's 1962 theorem to be extremely convincing and not at all tautologous.
[The Neyman-Fisher factorization theorem is the key to the whole issue. It is this theorem that introduces the key concept of likelihood into statistical inference, based upon purely frequency considerations, and Birnbaum applied it to an ingeniously constructed mixed experiment to extend its influence to two simple experiments that investigate the same unknown parameter. Birnbaum has been described as one of the most profound thinkers in Statistics ever, and he was a buddy of Adrian Smith. He was, however, highly introspective, and took his own life in London in 1976. I have always empathised with him, particularly because of the way he was mistreated by other leading psychometricians during the 1960s. He was seriously anti-authoritarian, and there are some parallels between our life stories.]
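[For the record, the argument as I remember presenting it in Madison, in modern notation rather than Birnbaum's own: suppose two experiments \(E_1\) and \(E_2\) about the same parameter \(\theta\) yield outcomes \(y_1\) and \(y_2\) with proportional likelihoods,

\[
L(\theta; y_1 \mid E_1) \;=\; c\, L(\theta; y_2 \mid E_2) \quad \text{for all } \theta, \text{ for some constant } c > 0 .
\]

Construct the mixed experiment \(E^*\): toss a fair coin, performing \(E_1\) on heads and \(E_2\) on tails. The Conditionality Principle says that the evidence from \(E^*\) when the coin lands heads and \(y_1\) is observed is the same as the evidence from \((E_1, y_1)\), and similarly for tails. But within \(E^*\) the two outcomes \((\text{heads}, y_1)\) and \((\text{tails}, y_2)\) have proportional likelihoods, so by the factorization theorem the statistic that identifies them is sufficient, and the Sufficiency Principle forces the evidence from those two outcomes of \(E^*\) to be identical. Chaining the equalities gives \(\mathrm{Ev}(E_1, y_1) = \mathrm{Ev}(E_2, y_2)\), which is precisely the Likelihood Principle.]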
Phil fully generalised Ericson's method for linear Bayes estimation, and our homework was therefore light years ahead of the literature. (He later expressed his irritation at the alternative procedure I used to quickly derive the estimates during the final exam, though that was in a special case.)
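The key formula, stated here from memory in the form I later taught it rather than exactly as Phil set it, is the linear Bayes rule: among all estimators of the form \(a + Bx\), the one minimizing expected squared error is

\[
\hat\theta(x) \;=\; E(\theta) \;+\; \mathrm{Cov}(\theta, x)\,\{\mathrm{Var}(x)\}^{-1}\,\bigl(x - E(x)\bigr),
\]

and it requires only the first and second moments of the joint distribution of \(\theta\) and \(x\), rather than a fully specified prior and likelihood.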
         Phil’s parameterization, using degrees of freedom and prior sample sizes, of the conjugate analysis for the linear model with unknown variance was also superbly simple. This parameterization does not appear to have been published until 1986 when J.J. Shiau (one of Grace Wahba’s Ph.D. students) successfully applied it to partial spline models after I’d included it on my Statistics 775 course in Madison.  








In 1972 Mervyn Stone and Phil Dawid reported some ingenious marginalization ‘paradoxes’ in Biometrika which seemed to suggest that improper prior distributions do not always lead to sensible marginal posterior distributions even if the (joint) posterior distribution remains proper. In 1973, they and Jim Zidek presented a fuller version [60] of these ideas to a meeting of the Royal Statistical Society, and their presentation was extremely amusing and enjoyable with each of the three co-authors taking turns to speak.

    The purported paradoxes involve discussions between two Bayesians who use different procedures for calculating a posterior in the presence of an improper prior. I was too scared to participate in the debate that took place after the presentation of the paper, because I felt that the ‘paradoxes’ only occurred in pathological cases and that they were at best implying that ‘You should regard your improper prior as the limit of a sequence of proper priors, while ensuring that your limiting marginal posterior distributions are intuitively sensible when contrasted with the marginal posterior distributions corresponding to the proper priors in the sequence’.
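    Stripped of the particular examples in the paper, the structure of the 'paradox' (my own schematic summary) is this: the data are \((y, z)\) and the parameter is \(\theta\), with \(\zeta = \zeta(\theta)\) of interest, where (i) the sampling distribution \(p(z \mid \theta)\) depends on \(\theta\) only through \(\zeta\), and (ii) under the improper prior \(\pi(\theta)\) the marginal posterior of \(\zeta\), obtained by integrating the full posterior, happens to depend on the data only through \(z\), say

\[
\pi(\zeta \mid y, z) \;=\; f(\zeta, z).
\]

The second Bayesian argues that, since only \(z\) matters and its distribution involves only \(\zeta\), he should be able to write \(\pi(\zeta \mid z) \propto p(z \mid \zeta)\,\pi(\zeta)\) for some prior \(\pi(\zeta)\); the 'paradox' is that no such prior reproduces \(f(\zeta, z)\). With proper priors the two routes necessarily agree.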
    Phil Dawid asked me afterwards why I hadn’t contributed to the discussion. I grunted, and he said, ‘say no more.’
    The Dawid, Stone, Zidek ‘paradoxes’ have subsequently been heavily criticized by E.T. Jaynes [61], and by the Bayesian quantum physicist and probabilist Timothy Wallstrom [62] of the Los Alamos National Laboratory in New Mexico, who believes that, ‘the paradox does not imply inconsistency for improper priors since the argument used to justify the procedure of one of the Bayesians (in the two-way discussion) is inappropriate’, and backs up this opinion with mathematical arguments.

    Wallstrom has also made seminal contributions to Bayesian quantum statistical inference and density estimation, as discussed later in the current chapter, and to the theory of stochastic integral equations.
Timothy Wallstrom
    Other very mathematical objections to the marginalization ‘paradoxes’ are raised by the highly respected mathematical statisticians Joseph Chang and David Pollard of Yale University in their 1997 paper ‘Conditioning as disintegration’ in Statistica Neerlandica. However, according to a more recent unpublished UCL technical report, Stone, Dawid and Zidek still believe that their paradoxes are well-founded.

    Despite these potential caveats, Dennis Lindley took the opportunity during the discussion of the 1973 invited paper to publicly renounce the use of improper priors in Bayesian Statistics. By making this highly influential recantation, Dennis effectively negated (1) much of the material contained in his very popular pink volume, (2) several of the procedures introduced in his own landmark papers, (3) large sections of the time-honoured literature on inverse probability, including many contributions by Augustus De Morgan and Sir Harold Jeffreys, (4) many of Jeffreys' celebrated invariance priors, and (5) a large amount of the work previously published by other highly accomplished Bayesians in important areas of application, including numerous members of the American School.
     Sometime before the meeting, Mervyn Stone said that ‘as editor of JRSSB he would give a hard time to authors who submitted work to him that employed improper priors’. And so he did!
STOP PRESS! On 22nd January 2014, after a helpful correspondence with my friend and former student Michael Hamada, who works at the Los Alamos National Laboratory in New Mexico, I received the following e-mail from Dr. Timothy Wallstrom:
Dear Professor Leonard,
I was flattered to learn from Mike Hamada that I am appearing in your Personal History of Bayesian Statistics, both for the work on QSI with Richard Silver, and for my 2004 arXiv publication on the Marginalization Paradox.

I have since come to realize, however, that my 2004 paper on the Marginalization Paradox, while technically correct, misses the point. Dawid, Stone, and Zidek are absolutely correct. I realized this as I was preparing a poster for the 2006 Valencia conference, which I then hastily rewrote! Dawid and Zidek were at the conference, and I discussed these ideas at length with them there, and later with Stone via email. My new analysis was included in the Proceedings; I've attached a copy. I also presented an analysis at the 2007 MaxEnt conference in Sarasota, NY. That write-up is also in the corresponding proceedings, and is posted to the arXiv, at http://arxiv.org/abs/0708.1350 .

Thank you very much for noting my work, and I apologize for the confusion. I need to revise my 2004 arXiv post; I'm encouraged to do so now that I realize someone is reading it!

Best regards,

Tim Wallstrom

P. S. A picture of me presenting the work at the Maxent conference (which is much better than the picture you have) is at





 In 1977, Phil Dawid and Jim Dickey published a pioneering paper in JASA concerning the key issue of selective reporting. If you report only your best results, this will inflate their apparent practical (e.g. clinical) significance. The authors accommodated this problem by proposing some novel likelihood and Bayesian inferential procedures.
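 A toy illustration of the inflation (my own, and not Dawid and Dickey's procedure): if \(\hat\theta_1, \dots, \hat\theta_k\) are independent unbiased estimates of the same effect \(\theta\), each distributed \(N(\theta, \sigma^2)\), and only the largest is reported, then

\[
E\Bigl(\max_{i} \hat\theta_i\Bigr) \;=\; \theta + \sigma\, e_k, \qquad e_k = E\Bigl(\max_{i} Z_i\Bigr), \quad Z_1,\dots,Z_k \ \text{i.i.d. } N(0,1),
\]

and \(e_k\) is roughly 1.16 for \(k = 5\) and 1.54 for \(k = 10\), so the reported 'best' result overstates the true effect by more than a standard error before any significance test is ever applied.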



In 1981, I helped George Box to organize a special statistical year at the U.S. Army’s Mathematics Research Center (M.R.C.) at the University of Wisconsin-Madison. George’s motivation was to orchestrate a big debate about the Bayesian-frequentist controversy, and about which philosophy to adopt when checking the specified sampling model. If he was looking for some sort of consensus that the frequentist model-checking procedures he’d proposed in his 1980 paper [67] were the way to go, then that is what he eventually achieved, after some relatively courteous disagreement from the diehard Bayesians.
    With these purposes in mind, a number of Bayesian and frequentist statisticians were invited to visit M.R.C. during the year. Their lengths of stay varied between a few weeks and a whole semester. Then, in December 1981, everybody was invited back to participate in an international conference in a beautiful auditorium overlooking Lake Mendota. The conference proceedings, Scientific Inference, Data Analysis, and Robustness, were edited by George, myself, and Jeff C.F. Wu, but I never knew how many copies were sold by Academic Press. Maybe the proceedings fell stillborn, David Hume-style, from the press.
    The Mathematics Research Center was housed at that time on the fifth, eleventh and twelfth floors of the elegantly thin WARF building on the extreme edge of the UW campus, near Picnic Point which protrudes from the south-east shoreline of the still quite temperamental Lake Mendota. The quite gobsmacking covert activities were not altogether confined to the thirteenth floor.
The WARF Building, Madison, Wisconsin
    During the Vietnam War, M.R.C. was instead housed in Sterling Hall in the middle of the UW campus. However, on August 24, 1970, it was bombed by four young people as a protest against the University’s research connections with the U.S. Military. The bombing resulted in the death of a university physics researcher and injuries to three others.

The Bombing of Sterling Hall
    Visitors during the special year included, as I remember, Hirotugu Akaike, Don Rubin, Mike Titterington, Peter Green, Granville Tunnicliffe-Wilson, Irwin Guttman, George Barnard, Colin Mallows, Morris De Groot, Toby Mitchell, Bernie Silverman, Phil Dawid, and Michael Goldstein.

 


Also in 1986, Michael Goldstein published an important paper on ‘Exchangeable Prior Structures’ in JASA, where he argued that expectation (or prevision) should be the fundamental quantification of individuals’ statements of uncertainty and that inner products (or belief structures) should be the fundamental organizing structure for the collection of such statements. That’s an interesting possibility. I’m sure that Michael enjoys chewing the rag with Phil Dawid. 



A leading UCL Bayesian was very much involved in the eventually very tragic 1999 Sally Clark case, in which a 35-year-old solicitor was convicted of murdering her two babies, deaths which had originally been attributed to sudden infant death syndrome.
Sally Clark
    Professor Philip Dawid, an expert called by Sally Clark’s team during the appeals process, pointed out that applying the same flawed method to the statistics for infant murder in England and Wales in 1996 could suggest that the probability of two babies in one family being murdered was 1 in 2,152,224,291 (roughly 1 in 2.15 billion), a figure even more outlandish than the 1 in 73 million suggested by the highly eminent prosecution expert witness Professor Sir Roy Meadow, when the relevant probability amounted to something like a 2 in 3 chance of innocence. Sir Roy was later struck off the medical register, but was reinstated upon appeal. Sally Clark died in 2007, a few years after her conviction was overturned on her second appeal.
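    The arithmetic at issue was simply an invalid squaring (my reconstruction of the numbers quoted above): Sir Roy took an estimated SIDS rate of about 1 in 8,543 for a family like the Clarks and, treating the two deaths as independent, squared it,

\[
\left(\frac{1}{8543}\right)^{2} \;=\; \frac{1}{72{,}982{,}849} \;\approx\; \frac{1}{73 \text{ million}},
\]

and Phil’s 1 in 2,152,224,291 comes from applying exactly the same illegitimate squaring to the far rarer event of infant murder. In any case, the figure the jury actually needed was the posterior probability of guilt given the two deaths, not the probability of two natural deaths occurring in a randomly chosen family.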

As the decade drew to a close, Robert Cowell, Phil Dawid, David Spiegelhalter and Steffen Lauritzen combined their grey and silver matter to co-author the outstanding book Probabilistic Networks and Expert Systems, which earned them the prestigious 2001 De Groot Prize from ISBA. Here are their chapter headings:
1. Introduction
2. Logic, Uncertainty and Probability
3. Building and Using Probabilistic Networks
4. Graph Theory
5. Markov Properties of Graphs
6. Discrete Networks
7. Gaussian and Mixed Discrete Gaussian Networks
8. Discrete Multi-Stage Decision Networks
9. Learning about Probabilities
10. Checking Models against Data
    The authors’ epilogue includes the standard conjugate analyses for discrete data models, a discussion of Gibbs sampling, and an information section about software on the worldwide web.
    This was one of the most exciting Bayesian books since Morris De Groot published Optimal Statistical Decisions in 1970. It is a must for all criminal investigators and forensic scientists, and a good bedtime read for all fledgling Bayesians.
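    The engine under the whole framework is the recursive factorization of a joint distribution over a directed graph, which is really just Phil’s conditional independence notation at work (a one-line summary of mine, not the authors’):

\[
p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr),
\]

where \(\mathrm{pa}(i)\) indexes the parents of node \(i\); equivalently, each variable is conditionally independent of its non-descendants given its parents, and the local tables \(p(x_i \mid x_{\mathrm{pa}(i)})\) are all that need to be specified, propagated, or learned.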



   The co-authors of a 1997 R.S.S. invited paper on topics relating to the genetic evaluation of evidence included Dr. Ian Evett of the British Forensic Science Service and my nemesis Adrian Smith. They, however, chose not to substantively reply, in their formal response to the discussants, to the further searching questions which I included in my written contribution to the discussion of their paper.
    These were inspired by my numerous experiences as a defence expert witness in U.S. courts. In 1992, I’d successfully challenged an alleged probability of paternity of 99.99994% in Phillips, Wisconsin, and district attorneys in the Mid-West used to settle paternity testing cases when they heard I was coming. The 1992 ‘Rosie and the ten construction workers’ case, in which I refuted a prior probability of paternity of 0.5 and a related posterior probability of over 99.99% in Decorah, Iowa, seemed to turn a Forensic Statistics conference in Edinburgh in 1996 head over heels, and Phil Dawid and Julia Mortera made lots of amusing jokes about it over dinner while Ian Evett and Bruce Weir fumed in the background. I, however, always declined to participate in the gruesome rape and murder cases in Chicago.
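    For readers who have not seen such figures constructed: the standard courtroom calculation combines a ‘paternity index’ PI (a likelihood ratio from the genetic evidence) with a prior probability of paternity \(\pi\) via Bayes’ theorem (the numbers below are purely illustrative, not those from the actual cases):

\[
\text{posterior odds} \;=\; \mathrm{PI} \times \text{prior odds},
\qquad
P(\text{paternity} \mid \text{evidence}) \;=\; \frac{\mathrm{PI}\,\pi}{\mathrm{PI}\,\pi + (1 - \pi)} .
\]

With the conventional, but typically unjustified, prior \(\pi = 0.5\) and \(\mathrm{PI} = 10{,}000\), the posterior is 10,000/10,001, or about 99.99%; the headline percentage is therefore driven as much by the assumed prior as by the genetics, which is exactly what a defence expert is entitled to challenge.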


In their respective JASA discussion papers of 2000, Christopher Genovese investigated his Bayesian time-course model for functional magnetic resonance data, and Phil Dawid considered causal inference without counterfactuals; I’m sure that he was glad to do without them.


    Phil Dawid, Julia Mortera, V. Pascali and D. van Boxel reported their probabilistic expert systems for forensic evidence from DNA profiling in 2002 in the Scandinavian Journal of Statistics, another magnificent piece of work.


                                                                      




    Julia Mortera is Professor of Statistics at the University of Rome. She has published numerous high quality papers on Bayesian methods, including a number of joint papers with Phil Dawid and other authors, including Steffen Lauritzen, on Bayesian inference in forensic identification. She is one of our leading applied women Bayesians, and a very pleasant lady too. Maybe I should compose a sonnet about her. The Forensic Empress of Rome, maybe.



