Search This Blog

Tuesday, 6 April 2021

BAYESIAN HISTORY 7: TOWARDS A BAYESIAN-FISHERIAN 21ST CENTURY


A PERSONAL HISTORY OF BAYESIAN STATISTICS

 

Thomas Hoskyns Leonard

 

Retired Professor of Statistics, Universities of Wisconsin-Madison and Edinburgh

 

7. TOWARDS A BAYESIAN-FISHERIAN TWENTY-FIRST CENTURY

 

Bayes Theorem is just part of my toolkit, as the walrus said to the carpenter.

 

by Fabio Cunha

 

The coherently Bayesian twenty-first century, that was so confidently predicted by at least one influential adherent to the faith, has not to date materialised, and thank goodness for that. Nevertheless, the Bayesian paradigm nowadays plays a key role in many important scientific and social investigations. Many investigators are much less fussy nowadays about the underlying philosophy of their methodology, and they are typically much keener to know whether it is likely to help their practical objectives when they are seeking to derive meaningful conclusions from their data sets. Pragmatic compromises are frequently employed, and validating the Bayesian procedures using frequency simulations or theoretical frequency properties is never a bad thing.

    The concepts of confounding variables, spurious correlation, clinical or practical significance versus statistical significance, and the need to consider model-based inferences in the light of uncertainty in the choice of sampling model, are much better understood nowadays. It is possible for active applied statisticians to be Bayesian and somewhat Fisherian/frequentist at the same time, and many of us are exactly that without feeling at all ashamed or foolish about it.

   It is also becoming increasingly recognised that you need to combine your deductive mathematical expertise with your inductive thought processes (e.g. when interpreting your data and conclusions in relation to the socio-scientific background of the data) as you gain experience as a socially relevant applied statistician. As has long been well known, it is not possible to fully encompass the subject of Statistics if you just regard it as a subset of probability calculus or a branch of Mathematics.

    During the 1960s and early 1970s, a published Bayesian paper was regarded as a rare gem indeed. However, many thousands of Bayesian-orientated articles are now published every year. It is therefore virtually impossible to keep track of them, and it is becoming more and more difficult to ascribe primary-source originality to any particular piece of theoretical methodology. Nevertheless, I believe that we should still try. Many authors are still, as always, re-inventing the wheel, particularly those who are in desperate need of a meal ticket.

    Bayesian methodology has now become more and more computer package orientated.

 
 

[Author’s Notes: My main sources of information for this chapter are the back issues of my journals, anything I have been able to access from the Internet, my e-mail conversations with several other Bayesians, and several recent meetings of the Edinburgh section of the Royal Statistical Society. I have lived isolated from academia in my flat in central Edinburgh ever since I retired in 2001. One objective in this chapter is to highlight the huge number and diversity of Bayesians who have contributed to the cause. Almost all of the references are encapsulated in the text. My main conclusion, after reviewing the Bayesian literature between 2001 and 2013, is that while our theory hasn’t developed that much over these twelve years, the diversity and socio-scientific impact of our applied work is now far greater than I would ever have envisioned.]

 

HEALTH WARNING: Socio-scientific workers using computer packages should try very hard to understand what they’re putting in, and how it relates to what they’re getting out. It’s very dangerous to use a computer routine unless you have a good working understanding of the, at times highly sophisticated, underlying theoretical material. You could just end up with junk. Garbage in, garbage out, as they say. Silly prior parameter values or silly data or silly models lead to silly posteriors. You should also be concerned with the numerical accuracy of the routines as well as evaluating the adequacy of any convergence criteria.

 
 
In the aftermath of Dennis Lindley’s 2013 YouTube video interview by Tony O’Hagan, it became abundantly apparent to me, following detailed correspondence with Peter Wakker, Deborah Mayo and Michael Evans, that Bayesian Statistics can no longer be justifiably motivated by the Axioms of Subjective Probability, or the Axioms of Utility and Modified Utility. This puts our paradigm into a totally different, and quite refreshing, ballpark. Maybe axiomatized coherence, incoherence and sure-loser principles will soon be relics, strange artefacts of the past.
 
How many fingers?
 

The wide diversity of novel Bayesian theoretical methods developed and practical situations analysed during the current century is well represented in the eleven published papers which have received the Mitchell Prize (awarded by ISBA) since the turn of the century. They include:

2002 (Genetics): Jonathan Pritchard, Matthew Stephens, and Peter Donnelly

Inference of Population Structure using Multilocus Genotype Data

2003 (Medicine) Jeff Morris, Marina Vannucci, Phil Brown, and Ray Carroll

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis

2006 (Biology) Ben Redelings and Marc Suchard

Joint Bayesian Estimation of Alignment and Phylogeny

2007 (Sociology) Tian Zheng, Matthew Salganik, and Andrew Gelman

How many people do you know in prison? Using overdispersion in count data to estimate social structure in networks.

2010 (Astronomy) Ian Vernon, Michael Goldstein, and Richard Bower

Galaxy Formation: A Bayesian Uncertainty Analysis

 

The Mitchell Prize is named after Toby J. Mitchell, and it was established by ISBA after his tragic death from leukaemia in 1993. He would have been glad to have inspired so much brilliantly useful research.

    Toby was a dedicated Bayesian, and he is well known for his philosophy

 

The greater the amount of information, the less you actually know.

 

    Toby was a Senior Research Staff Member at Oak Ridge National Laboratory in Tennessee. He won the George Snedecor Award in 1978 (with Bruce Turnbull) and made incisive contributions to Statistics, especially in biometry and engineering applications. He was a magnificent collaborator and a very insightful scientist.

    Norman Draper took one of the last pictures of Toby Mitchell, and he has kindly sent it to me.

 

Tom's colleague Toby J. Mitchell (d. 1993). Tom and Toby taught Statistics courses together at US military bases and the space station in Huntsville, Alabama, and visited Washington D.C. from Adelphi, Maryland in 1980.

 

See Author’s Notes (below) for details of several of the Ph.D. theses whose authors have been awarded one of ISBA’s Savage Prizes during the current century. We have clearly been blessed with some talented up-and-coming Bayesian researchers.

    In the remainder of Ch. 7, I will critique some of the Bayesian literature in greater detail. This is to give readers a better understanding of the quality and relevance of 21st-century research and of the degree of competence, applied nous, and originality of our researchers.

 
George Pickett
 

In 1863, Major General George Pickett was one of three generals who led the assault under Longstreet at the Battle of Gettysburg, as the Confederate Army charged towards the Union lines in all its Napoleonesque glory. In the year 2000, Dennis Lindley came out of retirement at age 77 to confront a full meeting of the Royal Statistical Society with his magnificently written invited discussion paper ‘The Philosophy of Statistics’. While Pickett’s charge ended in utter disaster for the Dixies, Dennis would, in 2002, be awarded the Society’s Guy Medal in Gold for his lifetime of endeavours, having retired from UCL a quarter of a century earlier.

    Here is a brief précis of the ideas and opinions expressed in the first seven sections of Dennis’s paper:

Summary: Statistics is the study of uncertainty. Statistical inference should be based on probability alone. Progress is therefore dependent upon the construction of a probability model.

1. Introduction: Uncertainty can only be measured by probability. The likelihood principle follows from the basic role played by probability. I (Dennis) modify my previous opinions by saying that more emphasis should be placed on model construction than on formal inference. Formal inference is a systematic procedure within the calculus of probability. Model construction cannot be so systematic.

2. Statistics: Statisticians should be the experts at handling uncertainty. We do not study the mechanism of rain, only whether it will rain. We are, as practitioners, therefore dependent on others. We will suffer, even as theoreticians, if we remain too divorced from the science. We define ‘the client’ as the person, e.g. a scientist or lawyer, who encounters uncertainty in their field of study. Statistics is ordinarily associated with (numerical) data. It is the link between uncertainty, or variability, in the data and that in the topic itself that has occupied statisticians. The passage from process (sampling model) to data is clear. It is when we attempt to go from data to process that difficulties occur.

3. Uncertainty: It is only by associating numbers with any scientific concept that the concept can be understood. Statisticians consequently need to measure uncertainty by numbers, in a way that allows them to be combined. It is proposed to measure your uncertainty about an event happening by comparison with a standard, e.g. one relating to balls drawn from an urn.

4. Uncertainty and probability: The addition and product rules, together with the convexity rule (probabilities always lie in the unit interval), are the defining rules of probability, at least for a finite number of events. The conclusion is that measurements of uncertainty can be described by the calculus of probability. The uncertainty related to bets placed on gambles, and the use of odds, combined with the impossibility of a Dutch book, lead back to probability as before. A further approach due to De Finetti extracts a penalty score from you after you have stated your uncertainty about an event, the score depending on whether the event is subsequently shown to be true or false. This [frequency!] approach provides an empirical check on the quality of your probability assessments and hence a test of your abilities as a statistician. The gathering and publishing of data remains an essential part of statistics today.

5. Probability: Measurements of uncertainty must obey the rules of the probability calculus. This is intended to be self-evident, in the sense that you would feel foolish if you were caught violating it. Axioms (which refer to a ‘more likely’ binary relation) all lead to probability being the only satisfactory explanation of uncertainty, though some writers have considered the axioms carefully and produced objections.

    [Lindley does not cite the 1986 paper by Fishburn or the key source references quoted by Fishburn].

    It is therefore convenient to state formally the rules of the [conditional] probability calculus. They are Rule 1 (convexity), Rule 2 (addition; Cromwell’s rule), and Rule 3 (multiplication). It is easy to extend the addition rule, for two events, to a finite number of events. I prefer to use De Finetti’s Rule 4 (conglomerity) to justify addition for an infinite sequence of events. Conglomerity is in the spirit of a class of rules known as ‘sure things’. It is ‘easy to verify’ that the [horribly complicated] Rule 4 follows from rules 1-3 when the partition is finite. [Is it?]

6. Significance and Confidence: Statisticians DO use measures of uncertainty that do NOT combine according to the rules of the probability calculus. Let H denote a (null) hypothesis. Then a statistician may advise the client to use a significance level [Dennis means a p-value or significance probability]; that is, assuming that H is true, the probability of the observed, or more extreme, data is calculated. This usage flies in the face of the arguments above, which assert that uncertainty about H needs to be measured as a probability of H. This is an example of the prosecutor’s fallacy.

    If a parameter θ is uncertain, a statistician will typically recommend a confidence interval. The development above based on measured uncertainty will use a probability density for θ, and perhaps an interval of that density. Again we have a contrast similar to the prosecutor’s fallacy.

    Statisticians have paid inadequate attention to the relationships between the statements that they make and the sample sizes on which they are based.

    How do you combine several data sets concerning the same hypothesis, each with its own significance level?

    The conclusion that probability is the only measure of uncertainty is not just a pat on the back but strikes at many of the basic statistical activities.

7. Inference: Let the parameter be (θ, α), where α is a nuisance parameter, and let x be the observation. The uncertainty in x needs to be described probabilistically. Let p(x | θ, α) denote the probability of x given θ and α, and describe the uncertainty in the parameter by a probability p(θ, α). The revised uncertainty in the light of the data can then be evaluated from the probability calculus as the probability p(θ, α | x), the constant of proportionality depending upon x, not the parameters. Since α is not of interest, it can be eliminated by the probability calculus to give p(θ | x), which evaluates the revised uncertainty in θ (a minimal sketch of this calculation appears after this précis). This unfortunate terminology is accompanied by some other which is even worse; p(θ) is often called the prior distribution, and p(θ | x) the posterior.

    The main protest against the Bayesian position is that the inference is considered within the probability calculus.
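
    For completeness, here is the prior-to-posterior calculation that the précis of Section 7 describes, written out as a minimal sketch in standard notation:

\[
p(\theta, \alpha \mid x) \;\propto\; p(x \mid \theta, \alpha)\, p(\theta, \alpha),
\qquad
p(\theta \mid x) \;=\; \int p(\theta, \alpha \mid x)\, d\alpha ,
\]

where the omitted constant of proportionality depends only on x, and integrating over the nuisance parameter α leaves the revised uncertainty about θ alone.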

    The remaining sections of Lindley’s paper, which contain further quite influential content, were entitled

8. Subjectivity 9. Models 10. Data Analysis 11. Models again 12. Optimality 13. The likelihood principle 14. Frequentist concepts 15. Decision Analysis 16. Likelihood principle (again) 17. Risk 18. Science 19. Criminal Law 20. Conclusions

    During his conclusions, Dennis states,

The philosophy here has three fundamental tenets: first, that uncertainties should be described by probabilities; second, that consequences should have their merits described by utilities; third, that the optimum decision combines the probabilities and utilities by calculating expected utility and then maximising that.

    These tenets were of course adopted by Raiffa and Schlaifer as long ago as 1961, and embraced with open arms by Dennis since about 1955. Dennis’s ideas on utility refer to Leonard ‘Jimmie’ Savage’s 1954 balls-up of an entreaty The Foundations of Statistics.

    Dennis’s suggestions on modeling pay lip service to Box’s celebrated 1980 paper [67].  See also my 1979 Valencia 1 contribution [16] where I emphasise the necessity for the statistician to interact with the client in relation to the scientific background of the data. The discussion of criminal law in Dennis’s section 19 quite surprisingly and unequivocally advocates the Bayes factor approach described in 1991 in Colin Aitken’s and D.A. Stoney’s book The Use of Statistics in Forensic Science.

    During the discussion of Dennis’s paper the issue was raised as to whether he was addressing ‘statistical probability’ as opposed to real applied statistics. It would certainly have been good to see some more numbers. While of historical significance, Dennis’s treatise does not, as might have been anticipated, spell out a novel position for the twenty-first century, but rather encourages us to fall back on long-revamped philosophies of the past.

    In the same year as Lindley’s historical presentation to the Royal Statistical Society, Fernando Quintana, one of our happy Bayesians from Chile, and Michael Newton wrote a much more important paper in JASA which concerned computational aspects of semi-parametric Bayesian analysis, together with applications to the modeling of multiple binary sequences. The contrast with Lindley’s paper, in terms of statistical quality and scientific impact, could not have been greater.

 
Fernando Quintana
 

Two interesting applications in archaeology were published during the year 2000.

    Yanan Fan and Stephen Brooks, then working at the University of Bristol, published a fascinating analysis in the Statistician of sets of prehistoric corbelled tomb data which were collected from a variety of sites around Europe. They investigated how earlier analyses of tomb data, e.g. by Caitlin Buck and her co-workers, where structural changes were anticipated in the shape of the tomb at various depths, could be extended and improved by considering a wider range of models. The authors also investigated the extent to which these analyses may be useful in addressing questions concerning the origin of tomb building technologies, particularly in distinguishing between corbelled domes built by different civilisations, as well as the processes involved in their construction.

    Fan and Brooks found no evidence to dispute or support a previous claim that a β slope parameter could be used to distinguish between domes of different origins through a comparison of their shape. By considering a range of sampling models, they showed that previous analyses may have been misleading in this regard.

    The authors analysed radius data taken above the lintel of the tomb, firstly by considering the Cavanagh-Laxton formulation, which takes the log-radii to depend upon the corresponding depths in a three-parameter non-linear regression model with i.i.d. normal error terms. They also investigated change-point modifications to the Cavanagh-Laxton model which permitted the two parameters within its regression component to change at different depths.

    The authors were able to avoid detailed algebraic prior-to-posterior analyses by immediately referring to MCMC and its generalisations. This enabled them to apply a complicated version of acceptance sampling which calculated the posterior probabilities for several choices of their sampling models; their posterior inferences were doubtless quite sensitive to the prior assumptions within any particular model.

    Fan and Brooks reported their conditional posterior inferences for the four top models for their Stylos data set, and their posterior model probabilities and conditional posterior inferences for the Nuranghe, Achladia and Dimini data.

    This was a fascinating Bayesian analysis which, however, deserves further practical investigation, e.g. using maximum likelihood and AIC. It is important not to be blinded by the Metropolis-Hastings algorithm to the point where it could snow over and blur both conceptual inadequacies and your statistical conclusions, or encourage you into employing over-complex sampling models.

    Caitlin Buck and Sujit Sahu of the Universities of Cardiff and Southampton were a bit cagier when they were analysing a couple of two-way contingency tables, which relate to refuse mounds in Awatovi, Arizona, and to Mesolithic stone tools from six different sites in Southern England. One of their objectives in their paper in Applied Statistics was to give insights about the relative dates of deposition of archaeological artifacts even when stratigraphic information is unavailable.

    The first of the contingency tables cross-classifies proportions of five different pottery types against layer. The second table cross-classifies seven different types of microlith against site. Both tables would have benefited from a preliminary log-linear interaction analysis since this would have helped the authors to assess the strength of information, and practical significance, in the data, before disguising it within a specialised choice of model.

    The authors instead referred to an extension of a Robinson-Kendall (RK) model which imposes serial orderings on the cell probabilities, and to a hierarchical Bayesian model for seriation. They completed their posterior computations using a hybrid Monte Carlo method based on Langevin diffusion, and compared different models by reference to posterior predictive densities. They justified their model comparison procedure by reference to an entropy-like divergence measure.

    The authors showed that the Awatovi data is better modelled using the RK model, but that both models give the same ordering d e f g h i j of the archaeological layers.

    For the stone tool data, the extended RK model and a hierarchical canonical correlation model give quite different orderings of the six sites. The co-authors surmise, after some deliberation, that the RK model gives the better fit.

    Buck and Sahu thereby concluded that the point estimates of the relative archaeological orders provided by erstwhile non-Bayesian analyses fail to capture other orderings which, given that the data are inherently noisy, could be considered appropriate seriations. A most impressive applied Bayesian analysis, and set of conclusions! The authors were extremely careful when choosing the particular Bayesian techniques which were best suited to their own situation. They, for example, steered clear of the Bayes factor trap and the related super-sensitive posterior probabilities for candidate sampling models.

    Caitlin Buck is Professor of Statistics at the University of Sheffield. She has published widely in the areas of archaeology, palaeo-environmental sciences, and scientific dating, and she is a leader in her field.

 
Caitlin Buck
 
In his JASA vignette ‘Bayesian Analysis: A Look at Today and Thoughts of Tomorrow’, Jim Berger catalogues successful Bayesian applications in the areas of archaeology, atmospheric sciences, economics and econometrics, education, epidemiology, engineering, genetics, hydrology, law, measurement and assay, medicine, physical sciences, quality management, and social sciences.
 
Jim Berger rolling out the barrel
 

    Jim also provides the reader with an excellent reference list for Bayesian developments in twenty different areas, most controversially ‘causality’, but also including graphical models and Bayesian networks, non-parametrics and function estimation, and testing, model selection, and variable selection.

    Jim agrees with many statisticians that subjective probability is the soul of Bayesian Statistics. In many problems, use of subjective prior information is clearly essential, and in others it is readily available; use of subjective Bayesian analysis for such problems can provide dramatic gains.

    Jim also discusses the non-objectivity of ‘objective’ Bayesian analysis, together with robust Bayesian, frequentist Bayes, and quasi-Bayesian analysis. I guess I can’t see too much difference between his objective Bayes analysis and quasi-Bayes analysis since both refer to vague priors.

    Jim Berger concludes with his reflections on Bayesian computation and software.

    The ASA’s vignettes for the year 2000 (see the December 2000 issue of JASA) also include very informative items by other authors on statistical decision theory (Larry Brown), MCMC (Olivier Cappé and Christian Robert), Gibbs sampling (Alan Gelfand), the variable selection problem (Ed George), hierarchical models (James Hobert), and hypothesis testing and Bayes factors (John Marden), a wonderful resource.

 
 

In 2000, Ehsan Soofi and his three co-authors published a paper on maximum entropy Dirichlet modeling of consumer choice, in the Proceedings of Bayesian Statistical Sciences.

    Ehsan Soofi is Distinguished Professor of Business Statistics at UW Milwaukee. His research focuses on developing statistical information measures and showing their use in economic and business applications.

    Ehsan once encouraged me to purchase a new tie while we were attending a Bayesian seminar together.

    In the same year, Mark Berliner, Christopher Wikle and Noel Cressie co-authored a fascinating paper ‘Long-lead prediction of Pacific SST’s via Bayesian dynamic models’ in the Journal of Climate.

 

The following important Bayesian papers were published in Royal Statistical Society journals during the year 2000:

    A Bayesian Lifetime Model for the ‘Hot 100’ Billboard Songs (Eric Bradlow and Peter Fader)

    Spatiotemporal Hierarchical Bayesian Modeling: Tropical Ocean Surface Winds (Christopher Wikle et al)

    Bayesian Wavelet Regression on Curves with Application to a Spectroscopic Calibration Problem (Philip Brown, Tom Fearn, and Marina Vannucci)

     Empirical Bayes Approach to Improve Wavelet Thresholding for Image Noise Reduction (Maarten Jansen and Adhemar Bultheel)

    Real-Parameter Evolutionary Monte Carlo with Applications to Bayesian Mixture Models (Faming Liang and Wing Hung Wong)

    A Bayesian Time-Course Model for Functional Magnetic Resonance Data (Christopher Genovese, with discussion)

    Bayesian Regression Modeling with Interaction and Smooth Effects (Paul Gustafson)

    Efficient Bayesian Inference for Dynamic Mixture Models (Richard Gerlach, Chris Carter and Robert Kohn)

    Quantifying expert opinion in the UK water industry: an experimental study (Paul Garthwaite and Tony O’Hagan)

    The diversity of the authors, their subject matter, and their countries of origin, is most impressive. The ever-evolving Bayesian spatio-temporal process rolls on!

 

In their discussion papers in JASA in 2000, Christopher Genovese investigated his Bayesian time-course model for functional magnetic resonance data, and Phil Dawid considered causal inference without counterfactuals, and I’m sure that he was glad to do without them.

    Also in JASA in 2000, Balgobin Nandram, Joe Sedransk and Linda Williams Pickle described their Bayesian analysis for chronic obstructive pulmonary disease, and David Dunson and Haibo Zhou proposed a Bayesian model for fecundability and sterility. Way to go, guys!

 

In 2000, Orestis Papasouliotis described a Bayesian spatio-temporal model in the fifth chapter of his University of Edinburgh Ph.D. thesis. Orestis used this to develop a search procedure for discovering efficient pairs of injector and producer oil wells in a hydrocarbon reservoir where the geological rifts may create a propensity for long-term correlation.

    Orestis’ endeavours led to our two international patents with Ian Main FRSE, Professor of Seismology and Rock Physics in our Department of Geology and Geophysics. Ian also enjoys playing the guitar, and singing in sessions in Edinburgh’s Old Town.

 

Ian Main FRSE

 

    Orestis and I co-authored four papers with Ian Main and his colleagues in the Geophysics literature, in 1999, 2001, 2006 and 2007, the first two on topics relating to the activity of earthquakes. So I guess we turned Ian into a Bayesian seismologist and geophysicist. He is currently developing  a fully Bayesian method for earthquake hazard calculation with Richard Chandler of UCL.

    The fourth of these articles was published in Structurally Complex Reservoirs by the Geological Society of London. Kes Heffer, of the Institute of Petroleum Engineering at Heriot-Watt was one of the co-authors of our 2006 and 2007 publications. He was formerly a leading researcher with BP.

    Nowadays, Orestis Papasouliotis is a very accomplished M&S scientist with Merck Serono Pharmaceuticals in Geneva. He visited me again in Edinburgh this year, and he hopes to fly over again soon with his wife and daughter to visit the penguins in the zoo.

 

Orestis Papasouliotis, with his wife and daughter

 

By the inception of Anno Domini 2001, a fresh generation of Bayesians was already moving way ahead of the quasi-religious practices and cults of the past, as they rose to help meet the social challenges of our economically volatile, and medically and genetically challenged era.

    The quality of the new era material was re-emphasised as the century turned. In their paper in the Journal of Computational Biology, Michael Newton, Christina Kendziorski, Craig Richmond, Frederick Blattner and Kam-Wah Tsui reported their improved statistical inferences concerning gene expressions from microarray data. An inspired effort from five free spirits of Bayesian persuasion.

 
Michael Newton
 

    The investigators modelled intensity levels R and approximate target values G by independent Poisson variates, conditionally on different means and the same coefficient of variation c. Then the ratio of the means ρ is the parameter of interest. They take a hierarchical approach where the conditional means are assumed to be independent and identically Gamma distributed. The marginal posterior distribution of ρ, given R and G, is then the distribution of the ratio of two updated Gamma variates. Moreover, the posterior mean of ρ may be described as a differential expression of the form

 

(R+ν) / (G+ν)

 

where ν reflects three expressions of interest, including the prior predictive mean of R.

    Michael Newton and his co-authors estimate c and the two prior parameters pragmatically, using marginal likelihood, and then apply their methodology to heat-shock and E-coli data. They furthermore refer to some simple model-checking techniques.

    A wonderfully succinct analysis which did not require any simulations at all, since exact algebraic expressions were available using probability calculus. This sort of analysis is hopefully not becoming a lost art.
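
    To see where a shrinkage estimate of this form comes from, here is a minimal conjugate sketch under the Poisson-Gamma description given above; it illustrates the mechanism, with the prior offsets playing the role of the authors’ ν, rather than reproducing their exact derivation. Suppose R | λ_R ~ Poisson(λ_R) and G | λ_G ~ Poisson(λ_G), with independent Gamma(a, ν) priors (shape a, rate ν) on the two means. Then

\[
\lambda_R \mid R \sim \mathrm{Gamma}(a+R,\ \nu+1), \qquad
\lambda_G \mid G \sim \mathrm{Gamma}(a+G,\ \nu+1),
\]

and, provided a + G > 1,

\[
E(\rho \mid R, G) \;=\; E(\lambda_R \mid R)\, E(\lambda_G^{-1} \mid G)
\;=\; \frac{a+R}{\nu+1}\cdot\frac{\nu+1}{a+G-1}
\;=\; \frac{R+a}{G+a-1},
\]

so the naive ratio R/G is shrunk towards one by prior offsets in both the numerator and the denominator, which is exactly the behaviour of the (R+ν)/(G+ν) expression quoted above.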


In their 2001 paper in Biometrics, James Albert and Sid Chib applied their sequential ordinal modeling methodology to the analysis of survival data.

 
Jim Albert
 

    In 2002, Sid co-authored a paper in the Journal of Econometrics with Federico Nardari and Neil Shephard which highlighted even more MCMC procedures for stochastic volatility models.

    Siddhartha Chib is the Harry C. Hartford Professor of Econometrics at Washington University in St. Louis, and has a fine history of applying Bayesian methods, e.g. to Economics, very much in the Arnold Zellner tradition. In 2008, he and Edward Greenberg published a review of hierarchical Bayes modeling in the New Palgrave Dictionary of Economics.

   Formerly a don at Cambridge, Neil Shephard is now Professor of Statistics and Economics at Harvard University. His work is occasionally Bayesian, though he does sometimes experience technical difficulties when constraining his unknown covariance matrices to the interior of the parameter space.

    In 2001, Gareth Roberts co-authored a paper with Jesper Møller and Antonietta Mira in JRSSB about perfect slice samplers. I hope he spiced them with chunky marmalade.

    In 2004, Gareth Roberts, Omiros Papaspiliopoulos and Petros Dellaportas wrote another famous paper in JRSSB, this time about Bayesian Inference for non-Gaussian Ornstein-Uhlenbeck stochastic volatility processes. These processes are likely to handle stochastic volatility pretty well because of their elasticity properties. I developed a log-Gaussian doubly stochastic Poisson version of them in 1978 in [64] and applied it to the flashing Green Man pedestrian crossing data.

    Gareth Roberts FRS is distinguished for his work spanning Applied Probability, Bayesian Statistics and Computational Statistics. He has made fundamental contributions to convergence and stability theory, extensions to the Metropolis-Hastings algorithm and adaptive MCMC, infinite-dimensional simulation problems, and inference in stochastic processes. His work has found application in the study of epidemics such as avian influenza and foot-and-mouth disease.

    As Professor of Statistics and Director of CRiSM at the University of Warwick, Gareth is one of the most distinguished Bayesians to have graced that now world-renowned institution. He obtained his Ph.D. there in 1988 on the topic ‘Some boundary hitting problems for diffusion processes’ and has taken the Bayesian paradigm to fresh heights in the quarter of a century since.

    Petros Dellaportas is Professor of Statistics and Economics at the University of Athens. He has developed and computed Bayesian solutions for a variety of complex problems, including hierarchical, GARCH, and generalised linear models.

    Omiros Papaspiliopoulos was awarded his Ph.D. in 2003 at the University of Lancaster. His thesis topic was ‘Non-centred parameterizations for hierarchical models and data augmentation’.

    Formerly an Assistant Professor in Statistics at the University of Warwick, Omiros is currently ICREA Research Professor in the Department of Economics at Universitat Pompeu Fabra.

    Omiros received the Guy Medal in Bronze from the Royal Statistical Society in 2010. He is one of our up and coming young stars, with a name to fit.


Also in 2001, Chris Glasbey and Kanti Mardia presented their seminal, effectively Bayesian paper ‘A penalised likelihood approach to image warping’ to a full meeting of the Royal Statistical Society. The authors achieved new frontiers in Image Analysis by identifying a new Fourier-von Mises model, with phase differences between Fourier-transformed images having von Mises distributions. They used their a posteriori smoothing procedures (a) to register a remote-sensed image with a map, (b) to align microscope images from different optics, and (c) to discriminate between different fish from photographic images. Even Chris Glasbey was impressed.

 
 
Chris Glasbey FRSE and Kanti Mardia
 

    Doubtlessly one of Britain’s and India’s most brilliantly productive mathematical statisticians, Kanti Mardia has made a number of important Bayesian and effectively Bayesian contributions. After a pre-eminent career, he is currently taking time out as Senior Research Fellow in the Department of Mathematics at the University of Leeds. While other leading statisticians are better at self-promotion, Kanti deserves all the accolades that our profession can give him.


Kevin Patrick Murphy was an Associate Professor in Computer Science and Statistics at the University of British Columbia until 2012, but now works as a research scientist for Google. In 2001 he published the three somewhat Bayesian papers,

    Linear Time Inference in Hierarchical HMMs (with Mark Paskin) in Neural Info. Proc. Systems

    The Factored Frontier Algorithm for Approximate Inference in DBNs (with Yair Weiss) Uncertainty in Artificial Intelligence

     Rao-Blackwellised Particle Filtering for Dynamic Bayesian Networks (with Stuart Russell). In Sequential Monte Carlo Methods in Practice (Springer-Verlag)

Kevin Murphy has published extensively in A.I., Machine Intelligence, Bayesian Statistics, and probabilistic graphical models, with applications to information extraction, machine reading, knowledge base construction, computer vision and computational biology.

    While Murphy’s remarkable 2012 book Machine Learning: A Probabilistic Perspective is more Bayesian than some approaches to machine intelligence, he is more concerned about frequency properties than most other Bayesians working in this area. Good for him!

    Radford Neal, another Canadian Bayesian Statistician and Machine Intelligence expert, reported his work on annealed importance sampling in 2001 in Statistics and Computing. This is one of several very useful papers which Radford has published on Bayesian simulation.

 
Radford Neal
 

In their JASA papers in 2001, William Bolstad and Samuel Manda described their Bayesian investigation of child mortality in Malawi which referred to family and community random effects, our doughty Durham friends Peter Craig, Michael Goldstein, Jonathan Rougier and Allan Seheult told us all about their Bayesian forecasting for complex systems, Peter Westfall and Keith Soper used their priors to improve animal carcinogenicity tests, and three musketeers from the Orient, namely Hoon Kim, Dongchu Sun and Robert Tsutakawa, followed in the footsteps of Hickman and Miller by proposing a bivariate Bayesian method for estimating mortality rates with a conditional autoregressive model.


Enrique González and Josep Ginebra Molins of the Polytechnic University of Catalonia published their book Bayesian Heuristics for Multi-Period Control in 2001, after working on lots of similarly impressive Bayesian research in Barcelona.

 
Josep Ginebra Molins
 

    In that same year, Aaron Ellison’s book An Introduction to Bayesian Inference for Ecological Research and Environmental Decision Making was published on-line on JSTOR.

    Moreover, Ludwig Fahrmeir and Stefan Lang reported their Bayesian semi-parametric analysis of multi-categorical time-space data in the Annals of the Institute of Statistical Mathematics. They applied their methodology most effectively to the analysis of monthly unemployment data from the German Federal Employment Office, and they reported some of their results spatially as well as temporally on maps of Germany.

    Fahrmeir and Lang also reported their Bayesian inferences for generalised additive mixed models based on Markov random field priors in Applied Statistics. They use discretized versions of prior processes which could alternatively be represented via the mean value functions and covariance kernels of Gaussian processes, and may, if necessary, incorporate spatial covariates. Their prior-to-posterior analysis is completed by MCMC inference.

    The authors apply their methodology to forest damage and to duration of employment data. While their theory is fairly routine and unembellished, the applications are interesting.

    In the same issue of Applied Statistics, David Dunson and Gregg Dinse of the U.S. National Institute of Environmental Health Sciences in Research Triangle Park report their Bayesian incidence analysis of tumorigenicity data. In most animal carcinogenicity experiments, tumours are not observable in live animals, and censoring of the tumour onset times is informative. Dunson and Dinse focus on the incidence of tumours and censored onset times without restricting tumour lethality, relying on cause-of-death data, or requiring interim sacrifices.

    The authors’ sampling model for the four observable outcomes at each death time combines multistate stochastic, probit, and latent variable assumptions, and also models covariate effects. Their prior distributions are elicited from experts in the subject area and refer also to a meta-analysis which employs a random effects model. These complex assumptions are applied to a triphosphate study, yielding some interesting posterior inferences via Gibbs sampling.

    This is a top-quality, though highly subjective, investigation which does not refer to model comparison criteria. The method adjusts for animal survival and tumour lethality through a multi-state model of tumorigenesis and death. If I had refereed this paper then I would have advised the authors to check out their assumptions and conclusions a bit more in empirical terms, just in case they became the state of the art.

 
    David Dunson is, like Jim Berger, currently an Arts and Sciences Distinguished Professor in the Department of Statistical Science at Duke University. His Bayesian research interests focus on complex medical data sets and machine learning applications, and include image and shape analysis.
 
David Dunson
 
    I once met Gregg Dinse in Wisconsin. An extremely charming man, he is good at interacting with subject matter experts, and is also a very prolific applied researcher.
 
Gregg Dinse
 

    Murray Aitkin’s article ‘Likelihood and Bayesian Analysis of Mixtures’ was published in Statistical Modelling in 2001. Murray is nowadays Professorial Fellow in the Department of Statistics at the University of Melbourne. He has published many highly innovative papers in Bayesian areas.


In their path-breaking 2002 invited paper to the Royal Statistical Society, David Spiegelhalter, Nicky Best, Brad Carlin and Angelika van der Linde proposed a Bayesian measure of model complexity and fit called DIC (the Deviance Information Criterion) that facilitates the comparison of various choices of sampling model for a specified data set.

 
Bradley Carlin
 
    Let L(θ) denote the log-likelihood for your p × 1 vector θ of parameters when a particular sampling model, with p parameters, is assumed to be true, and consider the deviance,
 
 D(θ) = -2L(θ) + C
 

where C is an arbitrary constant which cancels out in the calculations. Let ED denote the posterior expectation of the deviance, subject to your choice of prior assumptions for θ. Then ED measures how well the model under consideration fits the data.

    The effective number of parameters is, by definition

 
 q = ED - D(ξ),
 

where ξ is some convenient-to-calculate Bayes estimate for θ, such as the posterior mean vector, if it exists, or, much more preferably, the vector of posterior medians (which gives invariance of q under non-linear transformations of the parameters). Spiegelhalter et al quite brilliantly suggest referring to

 

                                DIC = ED + q
                                        = D(ξ) + 2q

which penalises ED, or D(ξ), according to the number of effective parameters in the model.

    At the model comparison stage of your analysis, you could consider using a vague, e.g. Jeffreys or reference, prior for θ, and only referring to your informative prior assumptions when considering your model-based inferences, since their elicitation and application might only serve to confuse this pre-inferential procedure. If you are considering several candidate models, then simply choose the one with the smallest DIC, maybe after cleaning up the data using an exploratory data analysis. If q is close to p, and ξ is close to the maximum likelihood vector of θ, then DIC will be well approximated by Akaike’s criterion AIC. However, when considering models with complex likelihoods, DIC will sometimes be easier to calculate, e.g. using importance sampling, MCMC or acceptance sampling.
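
    As a concrete illustration, here is a minimal sketch, in Python, of how DIC can be computed from MCMC output. It is my own toy example, not the authors’ code: the normal-mean model, the stand-in posterior draws theta_draws, and the helper log_lik are all assumptions made for the purpose of the illustration, and with a real analysis you would substitute your own draws and log-likelihood.

```python
# A minimal sketch (not the authors' code) of computing DIC from MCMC output,
# using a toy normal-mean model with known variance. `theta_draws` stands in
# for real posterior draws; `log_lik` is the log-likelihood of the full data.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)                 # toy data set
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=4000)

def log_lik(theta, y=y, sigma=1.0):
    """Log-likelihood of y under N(theta, sigma^2)."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                  - 0.5 * ((y - theta) / sigma) ** 2)

deviances = np.array([-2.0 * log_lik(t) for t in theta_draws])  # D(theta), with C = 0
ED = deviances.mean()                        # posterior expected deviance
xi = np.median(theta_draws)                  # plug-in Bayes estimate (posterior median)
q = ED - (-2.0 * log_lik(xi))                # effective number of parameters
DIC = ED + q                                 # equivalently D(xi) + 2q
print(f"ED = {ED:.2f}, q = {q:.2f}, DIC = {DIC:.2f}")
```

The posterior median is used here as the plug-in estimate ξ, in line with the invariance argument above; substituting the posterior mean recovers the form used in Spiegelhalter et al’s original presentation.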

    Spiegelhalter et al justify DIC via convincing, though approximate, information-theoretic arguments which refer to Kullback-Leibler divergence. Similar arguments would appear to hold when ED is replaced by the posterior median of the deviance; this parallels taking ξ to denote the posterior median vector of θ.

    Since reference priors maximise Lindley’s expected measure of information it would appear natural, in principle at least, to use them when calculating DIC. However, it is often easier to express Jeffreys’ invariance prior in algebraic terms, or to use some other sensible form of vague prior.

    The Deviance Information Criterion has greatly enhanced the Bayesian paradigm during the course of the last decade. It has taken Bayesians into the ‘unconstrained-by-restrictive-sampling-models’ ballpark envisioned by George Box in 1980, and enables us, including the Economists, to determine scientifically meaningful, parameter-parsimonious sampling models, e.g. as special cases of a larger all-inclusive model, which are by no means as big as an elephant.

    An inferential model comparison procedure which refers to the full posterior distributions of the deviances is discussed below in the Author’s Notes. In connection with this, it might be possible to develop alternatives, also relating to cross-validation, to Box’s 1980 overall model-checking criterion, which he believed to be appropriate for situations where you have no specific alternative model in mind.

    The deviance information criterion DIC provides Bayesians with a particularly useful and straightforward way of applying Occam’s razor. See Jeffreys and Berger [1] for a historical discussion of the razor. Jeffreys and Berger quote the mediaeval English Franciscan friar, logician, physicist and theologian William of Ockham, who first envisioned the philosophy of the razor,

 

‘Pluralitas non est ponenda sine necessitate’

(Plurality is not to be posited without necessity)

 

    I wonder whether the Bayesian Goddess has already elevated the four discoverers of DIC to immortality, as the Four Horsemen of Ockham’s Apocalypse perchance.

    But perhaps the Goddess of Statistics should wait and see. DIC is not applicable to all models, as demonstrated by Sylvia Richardson and Christian Robert in the discussion of the 2002 paper, and by Angelika van der Linde in her 2005 paper on DIC in variable selection, in Statistica Neerlandica.

    Angelika puts DIC into context and points to some possible alternatives. One problem is that we do not, as yet, have an established and elaborate theory for the estimation of information-theoretic quantities like Kullback-Leibler divergence. In their 2008 paper in Statistics, Van der Linde and Tutz illustrate this problem for the coefficient of determination in regression models, and this can also be reasonably related to Kullback-Leibler diagnostics.

    Gerhard Tutz is a Professor of Statistics at the University of Munich. He has published many high quality papers in Bayes-related areas.

 
Gerhard Tutz
 

In his 2002 paper ‘On irrelevance of alternatives and opinion pooling’ in the Brazilian Journal of Probability and Statistics, Gustavo Gilardoni of the University of Brasilia considered the implications of two modified versions of the ‘irrelevance of alternatives’ axiom. The consequences included a characterization of the Logarithmic Opinion Pool. This looks very interesting, Gustavo.

    In 1993, Gustavo published an article in the Annals of Statistics with Murray Clayton on the reaching of a consensus using DeGroot’s iterative pooling.

    Phil Dawid, Julia Mortera, V. Pascali and D. van Boxel reported their probabilistic expert systems for forensic evidence from DNA profiling in 2002 in the Scandinavian Journal of Statistics, another magnificent piece of work.

    Julia Mortera is Professor of Statistics at the University of Rome. She has published numerous high quality papers on Bayesian methods, including a number of joint papers with Phil Dawid and other authors, including Steffen Lauritzen, on Bayesian inference in forensic identification. She is one of our leading applied women Bayesians, and a very pleasant lady too. Maybe I should compose a sonnet about her. The Forensic Empress of Rome, maybe.

    Persi Diaconis and Susan Holmes of Stanford University took a Bayesian peek into Feller Volume 1 in 2002 in their fascinating article in Sankhyā, and developed Bayesian versions of three classical problems: the birthday problem, the coupon collector’s problem, and the matching problem. In each case the Bayesian component involves a prior on the underlying probability mechanism, which could appreciably change the answer.

    Persi had previously published his finite forms of De Finetti’s beautiful exchangeability theorem (De Finetti’s theorem was praised in one of William Feller’s treasured footnotes in Feller Volume 2), a fundamental paper with David Freedman on Bayesian consistency, and a dozen De Finetti-style results in search of a theory.

 
Persi Diaconis
 

    In 2011, Persi Diaconis and Ron Graham were to co-author Magical Mathematics: The Mathematical Ideas that Animate Great Magical Tricks. Persi is a wonderful statistical probabilist and a great entertainer.


In 2002, Ming-Hui Chen, David Harrington and Joseph Ibrahim described some useful Bayesian cure rate models for malignant melanoma, in the online Wiley library.

    I’m quite interested in all this, since it’s now fully two years since an artistic-looking mole was removed from my wrist, leaving a dog-bite-shaped scar which I’ve recently flashed to a couple of eminent audiences during my ongoing public health campaign.

    The models included

1. A piecewise exponential model: The authors proposed a semi-parametric development based on a piecewise-constant hazard version of the proportional hazards model. The degree of nonparametricity is controlled by J, the number of intervals in the partition.

2. A parametric cure rate model: This assumes that a certain fraction ρ of the population are ‘cured’ and the remaining 1 − ρ are not cured. The survivor function for the entire population is ρ + (1 − ρ)S(t), where S(t) is the survivor function for the non-cured group in the population (a short sketch of this structure appears after this list). The authors construct an elaborate choice of S(t) which refers to the numbers of metastatic-competent tumour cells for each of n subjects, depending on a parameter of the random times taken for the tumour cells to produce detectable metastatic disease.

3. A semi-parametric cure rate model: This takes the survivor function to be representable by a piecewise hazards model. The degree of nonparametricity is controlled by J the number of unknown parameters in the model.
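
    As a minimal sketch of the cure-rate structure in model 2 (standard notation, not necessarily the authors’ exact construction): if a fraction ρ of the population is cured and never experiences the event, the population survivor function is

\[
S_{\mathrm{pop}}(t) \;=\; \rho + (1-\rho)\, S(t),
\]

which plateaus at ρ as t → ∞ instead of decaying to zero. The modelling effort then goes into the choice of S(t) for the non-cured group: fully parametric in model 2, and piecewise (semi-parametric) in model 3.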

    For each of these models, the authors construct a power prior to represent the prior information concerning the unknown parameters. Their choices are proportional to the product of a subjectively assessed beta distribution and a power of the likelihood when it conditions on a set of hypothetical prior observations.

    The authors assess their models by n separate CPO statistics. The i-th CPO statistic is just the predictive density of the observed response variable for case i, when conditioned on the remaining n − 1 observed response variables. They also refer to the average log pseudo-Bayes factor B, which averages the logs of the CPO statistics.
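
    For readers unfamiliar with CPO statistics, here is a minimal sketch of the standard way of estimating them from MCMC output, via the harmonic-mean identity CPO_i = 1 / E_post[1 / p(y_i | θ)]. It is an illustration under a toy normal model, not the authors’ code; the data y, the draws theta_draws and the helper lik_point are hypothetical stand-ins.

```python
# A minimal sketch (an illustration, not the authors' code) of estimating CPO
# statistics from MCMC output via the harmonic-mean identity
# CPO_i = 1 / E_post[ 1 / p(y_i | theta) ], shown for a toy normal model.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=0.5, scale=1.0, size=30)                 # toy data set
theta_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=5000)

def lik_point(y_i, theta, sigma=1.0):
    """Density of a single observation y_i under N(theta, sigma^2)."""
    return np.exp(-0.5 * ((y_i - theta) / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)

L = lik_point(y[None, :], theta_draws[:, None])   # (n_draws, n_obs) per-observation likelihoods
cpo = 1.0 / np.mean(1.0 / L, axis=0)              # harmonic-mean CPO estimate for each case
lpml = np.sum(np.log(cpo))                        # log pseudo-marginal likelihood
print(f"Average log CPO = {np.mean(np.log(cpo)):.3f}, LPML = {lpml:.3f}")
```

A log pseudo-Bayes factor for comparing two models is then the difference of their log pseudo-marginal likelihoods; the B statistic referred to above averages, rather than sums, the individual log CPO contributions.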

    The authors completed Bayesian analyses of the E1690 time-to-event data for high-risk melanoma using MCMC and variations of each of the three preceding choices of sampling model. The cure rate models performed equally well; they fit the data slightly better than the piecewise exponential model, according to the values of the B-statistics. These results have considerable implications for designing studies in high-risk melanoma.

    Thank you, guys! After a non-malignant mole on my shoulder vanished during a recent scare, my predictive probability of a malignant recurrence is now close to zero.


Kevin Gross, Bruce Craig and William Hutchison published their insightful paper ‘Bayesian estimation of a demographic matrix model from stage-frequency data’ in 2002 in Ecology.

    Bruce Craig is nowadays Professor of Statistics and director of the statistical consulting service at Purdue. His areas of interest include Bayesian hierarchical modeling, protein structure determination, and the design and analysis of micro-array experiments.

    The year 2002 was indeed a good one for Bayesian applications. Carmen Fernandez, Eduardo Ley and Mark Steel modelled the catches of cod, Greenland halibut, redfish, roundnose grenadier and skate in a north-west Atlantic fishery, and reported their conclusions in Applied Statistics.

    Not to be outdone, David Laws and Tony O’Hagan proposed a hierarchical Bayes model for multi-location auditing in the on-line Wiley library. During their somewhat adventurous presentation they introduced the notion of the fractional error or taint of a transaction. They proposed a complicated procedure for the elicitation of their prior parameters, and their prior to posterior analysis involved oodles of ratios and products of Gamma functions.

    On a more theoretic note, Stuart Barber, Guy Nason and Bernie Silverman of the University of Bristol reported on their posterior probability intervals for wavelet thresholding in JRSSB. They approximated the first four cumulants of the posterior distribution of each wavelet coefficient by linear combinations of wavelet scaling functions, and then fit a probability distribution to the approximate cumulants. Their method assumed either independent normal mixture priors for the wavelet coefficients or limiting forms of these mixtures, and this yielded a posterior distribution which was difficult to handle with exact computations. They showed, however, that their approximate posterior credibility intervals possessed good frequency coverage.

    It is not obvious whether the authors’ adaptive Bayesian wavelet and BayesThresh choices of prior model made their posterior analysis overly complicated. Indeed, their prior formulations meant that the posterior distribution was not overly robust to outliers. It might be worth assuming a suitable prior distribution on function space for the entire wavelet regression function, and then approximating this on a linear subspace, in order to develop a more sensible joint prior distribution for the wavelet coefficients.

    In their articles in JASA in 2002, Valen Johnson, Robert Deaner and Carel van Schaik very bravely performed a Bayesian analysis of some rank data for primate intelligence experiments, B.M. Golam Kibria, Li Sun, Jim Zidek and Nhu Le reported their Nostradamus-style Bayesian spatial prediction of random space-time fields for mapping PM2.5 exposure, and Steven Scott came out of hiding and reviewed the recent Bayesian recursive computing methodology for hidden Markov models.


Suitably encouraged, I now survey the RSS journals for 2003, which I’ve just discovered languishing with my long-discarded Irvine Welsh novels in the tottery John Lewis bookcase in my spare room.

    Alexandra Schmidt and Tony O’Hagan co-authored an important paper about Bayesian inferences for non-stationary covariance structures via spatial deformations.

    Ian Dryden, Mark Scarr, and Charles Taylor report their Bayesian texture segmentation of weed and crop images using reversible jump MCMC methods. They model their pixel intensities using second order Gaussian Markov random fields and the second-order stationary Potts model. They take the number of textures in a particular image to have a prior truncated Poisson distribution, and compute some interesting, though not that detailed, statistically smoothed onion, carrot, and sugar-beet images. Their trace plots for the simulated posteriors are not entirely convincing.

    Maura Mezzetti of the University of Roma Tor Vergata and her five worthy co-authors propose a Bayesian compartmental model for the evaluation of 1,3-butadiene (BD) metabolism. This refers to three differential equations which represent the quantity of BD in three compartments as a function of time. The equations depend upon the blood flows through the compartments, the blood-tissue specific partition coefficients, the total blood flow, the alveolar ventilation rates, and body weights.

 
Maura Mezzetti
 

    Mezzetti et al propose a hierarchical model which assigns prior distributions to the population parameters and to the individual parameters. They fit their proposed pharmacokinetic model to the BD data, with some interesting general conclusions.

    Stephen Walker addressed the problem of sample size determination by using a Bayesian semi-parametric approach. He expressed his prior distribution for an unknown sampling distribution as a convolution of Dirichlet processes and selected the optimal size n of his random sample by maximizing the posterior expectation of a utility function which invokes the cost in utiles of taking n observations. Stephen’s numerical example is absolutely gobsmacking, and, given the hefty prior specifications required to choose a single value n, I am left to ponder about the practical importance of his approach.

    Michail Papathomas and Roger Hutching applied their Bayesian updating procedures to data from the UK water industry. Experts are expected to express their subjective probabilities for n binary outcomes, both as prior estimates, and as ‘study estimates’ when more information is available after a typically expensive study is undertaken. Moreover, the better ‘study estimates’ should be sufficient for the prior estimates.

    As the joint p.m.f. of the binary responses typically requires the specification of an immense number of joint probabilities, the authors refer to a ‘threshold copula’ which generates dependence between the binary responses for specified marginal distributions by taking conditional probits to be correlated,  using a multivariate normal distribution. They then employ ‘Jeffreys conditionalization’ as an updating procedure. Their elicitation of the probability assessments and the correlations from experts was supervised by statisticians, though reportedly not in ideal fashion. Finally, they completed a sound MCMC analysis of pipe data provided by South West Water Services, a most fascinating study.

    Fernando Quintana and Pilar Iglesias present a decision-theoretic formulation of product partition models (PPMs) that allows a formal treatment of different decision problems, such as estimation or hypothesis testing, together with clustering methods. The PPMs are thereby constructed in the context of model selection. A Dirichlet process prior is assumed for the unknown sampling distribution, and the posterior inferences are used to detect outliers in a Chilean stock-market data set. An excellent contribution from Bayesian Chile. Why don’t you give the plucky Bolivians their corridor to the sea back, guys? You’re hogging too much of the coast.

    Maria Rita Sebastiani uses Markov random-field models to estimate local labour markets, using a Bayesian texture segmentation approach, and applies these techniques to data from 287 communes in Tuscany.

    Stuart Coles and Luis Pericchi used likelihood and Bayesian techniques to estimate probabilities of future extreme levels of a process (i.e. catastrophes), based upon historical data which consist of annual maximum observations and may be modelled as a random sample from a member of the generalised extreme value (GEV) family of distributions, which possesses location and scale parameters together with a shape parameter. They in particular compute the predictive distribution, given the historical data, of a future annual maximum observation Z, under relatively vague, though proper, prior assumptions for the three unknown parameters. Their maximum likelihood and Bayes procedures were applied to good effect to a set of rainfall data from Venezuela. While the authors’ prior to posterior analysis amounted to an extremely simple application of standard Bayesian techniques, the practical conclusions were both perceptive and fascinating.
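    The predictive computation itself is straightforward: the exceedance probability of a future annual maximum is the GEV survival function averaged over posterior draws of the three parameters. Here is a minimal Python sketch, using made-up stand-ins for the posterior samples (the real analysis would use the MCMC output for the Venezuelan rainfall data); note that scipy's genextreme shape parameter c equals minus the usual GEV shape.

import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)

# Stand-ins for posterior draws of (location mu, scale sigma, shape xi);
# in practice these would come from the MCMC output for the rainfall data.
mu = rng.normal(50.0, 2.0, size=5000)
sigma = np.abs(rng.normal(12.0, 1.5, size=5000))
xi = rng.normal(0.1, 0.05, size=5000)

def predictive_exceedance(z):
    # P(Z > z | data) = average of GEV survival functions over posterior draws.
    # scipy's genextreme uses shape c = -xi.
    return genextreme.sf(z, c=-xi, loc=mu, scale=sigma).mean()

for z in (100.0, 150.0, 200.0):
    print(z, predictive_exceedance(z))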

    While still on the subject of rainfall, the Scottish BIOSS biomathematicians David Allcroft and Chris Glasbey described how to use a latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation. They transform the rainfall observations to supposed normality using an empirically estimated quadratic function of an empirically estimated power transformation. In so doing they censor any, albeit very informative, zero values of rainfall. Not a good start! I’m surprised that they were allowed to publish it. The authors then crank the MCMC handle and analyse a retrospective set of rainfall data from far away in the Red River basin in Arkansas.

    Yosihiko Ogata, Koichi Katsura and Masaharu Tanemura used a Bayesian hierarchical model on tessellated spatial regions to investigate earthquakes, which of course occur heterogeneously in space and time. Their assumptions generalise a space-time epidemic-type aftershock model, and they check them out by reference to an elaborate space-time residual analysis. The authors constructed a MAP estimate of the non-homogeneous region Poisson intensity across a coastal region of Japan, and a MAP estimate of their hierarchical space-time model. This was a magnificent piece of work.

    Yosihiko Ogata and his colleagues have published many papers during the last three decades which concern the aftershocks of earthquakes. He is on the faculty of the Institute of Statistical Mathematics in Tokyo.


Maria Rita Sebastiani is Professor of Economics at the Sapienza University of Rome. She obtained her doctorate there in 1998 with the title

 

Modelli spaziali per la stima dei mercati locali del lavoro (Spatial models for the estimation of local labour markets)

 

    Maria’s areas of research interest include Bayesian inference, hierarchical spatial modelling, business mortality risk, European populations and demography, and transition probabilities to illness, dependency, and death.


Elsewhere in 2003, William Penny and Karl Friston of the Wellcome Trust Centre for Neuro-imaging at UCL used mixtures of generalised linear models for functional neuro-imaging in their article in IEEE Trans Med Imaging, and constructed some interesting posterior probability maps.

    Gary Chamberlain and Guido Imbens reported their supposedly non-parametric applications of Bayesian inference in 2003 in the Journal of Business & Economic Statistics. Gary is currently the Louis Berkman Professor of Economics at Harvard.

    Moreover, Linda Garside and Darren Wilkinson applied their dynamic lattice-Markov spatio-temporal models to environment data in Bayesian Statistics 7, and Harry Martz and Michael Hamada addressed uncertainty in counts and operating time in estimating Poisson occurrence rates in their article in Reliability Engineering & System Safety.

 

Michael Hamada

 

    The respected Japanese-American statistician Michael Hamada obtained his Ph.D. from the University of Wisconsin-Madison during the early 1980s where he attended my Bayesian course and worked as a project assistant with Jeff Wu and myself at the U.S. Army’s ill-fated Math Research Center, which was to finally get squished during the Gulf War of 1991. Mike nowadays solves top level problems at the Los Alamos National Laboratory in New Mexico. In 2008 he published the outstanding book Bayesian Reliability with Alyson Wilson, Shane Reese and Harry Martz.

    Harry Martz is the principal associate director for Global Security at Los Alamos. Now that’s a powerful application of Bayesian reliability. And of addressing uncertainty in counts.


Bradley Efron’s ASA Presidential address, delivered in Toronto during August 2004, was entitled Bayesians, Frequentists, and Scientists.

    Professor Efron said, in summary, that ‘My guess is that a combination of Bayesian and frequentist ideas will be needed to deal with our increasingly intense scientific environment.’

    Brad discussed his ideas in the context of breast cancer risk, data from an imaging scan which quite clearly distinguished between 7 supposedly normal children and 7 dyslexic children, and a bivariate scatterplot which measured kidney function against age.

    When concluding his address, Professor Efron said,

    ‘Now the planets may be aligning for Statistics. New technology, electronic computation, has broken the bottleneck of computation that limited classical statistical theory. At the same time an onrush of new questions has come upon us, in the form of huge data sets and large scale inference problems. I believe that the statisticians of this generation will participate in a new age of statistical innovation that might rival the golden age of Fisher, Neyman, Hotelling, and Wald.’

    Bradley Efron was applauded by Frequentists and Bayesians alike, while the Goddess Fortune hovered in the background.

 
Bradley Efron
 

In his 2004 paper in the Annals of Statistics, Stephen Walker used martingales to investigate Bayesian consistency. He derived sufficient conditions for both Hellinger and Kullback-Leibler consistency, which do not rely on the use of a sieve, together with some alternative conditions for Hellinger consistency; a splendid contribution.

    I’m wondering whether to write a poem about Stephen’s brave exploits. I could call it ‘Ode to a Martingale’. There is a martingale on Berkeley Square which drives everybody spare? Perhaps not.

    Stephen is now Professor of Mathematics at the University of Texas in Austin. His research focuses on Bayesian parametric and semi-parametric methods, with applications in medical statistics. He obtained his Ph.D. from Imperial College London in 1995, where he was supervised by Jon Wakefield.

    Richard Boys and Daniel Henderson published their Bayesian approach to DNA sequence segmentation in 2004 in Biometrics. Many DNA sequences display compositional heterogeneity in the form of segments of similar structure. The authors identified such segments using a (real) Markov chain governed by a hidden Markov model. In a novel twist, they assumed that the order of dependence q and the number of segment types r were unknown, and took these parameters to possess independent truncated Poisson priors. The vectors of transition probabilities were taken, in the prior assessment, to possess independent Dirichlet distributions.

    Boys and Henderson applied their computer-simulated prior-to-posterior MCMC procedure to an analysis of the bacteriophage lambda, a parasite of the intestinal bacterium Escherichia coli. They computed a very illuminating joint posterior p.m.f. for the parameters q and r, and checked this out with a prior sensitivity analysis. A highly original and very thorough piece of work.

    Richard Boys is Professor of Statistics at the University of Newcastle at Newcastle-upon-Tyne. He applies his Bayesian ideas to science, social science, and medicine, and he also has research interests in statistical biomathematics and stochastic systems biology. Daniel Henderson is a hard-working teaching fellow in the same department.

    Richard’s departmental colleague Professor Darren Wilkinson is also well-published in similar sorts of areas. When I first met him in 1997, he was still wet behind the ears, but he is now highly accomplished. The Bayesian Geordies are good to drink with. They sip their beer in silence for lengthy periods of time, while making the occasional wry wisecrack out of the depths of their grey matter.

 
 

Richard Boys Darren Wilkinson
 

In their JASA papers of 2004, Scott Berry and five co-authors employed their Bayesian survival analysis with non-proportional hazards for a meta-analysis of combination pravastatin-aspirin, the redoubtable Jay Kadane and Nicole Lazar discussed various model-checking criteria, Stephen Walker, Paul Damien and Peter Lenk investigated priors with a Kullback-Leibler property, and Nidhan Choudhuri, Subhashis Ghosal and Anindya Roy considered the Bayesian estimation of a spectral density.

  In 2004, the ever insightful Michael Hamada, Valen Johnson, Leslie Moore and Joanne Wendelberger co-authored a useful paper in Technometrics on Bayesian prediction and tolerance intervals.
 

In the same year, George Streftaris and Gavin Gibson published their very useful paper 'Bayesian inference for stochastic epidemics in closed populations' in Statistical Modelling.

    George and Gavin are respectively Senior Lecturer and Professor in the Department of Actuarial Mathematics and Statistics at Heriot-Watt.


    In the Royal Statistical Society journals of 2004,

Patrick Wolfe, Simon Godsill and Wee-Jing Ng described their Bayesian variable selection and regularization methodologies for time-frequency surface estimation. They, in particular, analysed the Gabor regression model, and investigated frame theory, sparsity, and related prior dependence structures.

Dan Cornford, Lehel Csató, David Evans and Manfred Opper published an invited discussion paper about their Bayesian analysis of the scatterometer wind retrieval inverse problem. They showed how Gaussian processes can be used efficiently with a variety of likelihood models, using local forward observation models and direct inverse models for the scatterometer. Their vector Gaussian process priors are very useful.

Randall Eubank and five co-authors discussed smoothing spline estimation in varying coefficient models. They used the Kalman filter to compute their posterior inferences, and developed Bayesian intervals for the coefficient curves. A very competent piece of research.

Jeremy Oakley and Tony O’Hagan described their posterior sensitivity analysis of complex models. This is a very intense paper with lots of novel ideas, which is well worth scrutinizing in detail. I think.

Tae Young Yang described his Bayesian binary segmentation procedure for detecting streakiness in sports. He employed an interesting integer-valued change-point model but falls straight into the Bayes factor trap. His posterior inferences would be very sensitive to small changes, e.g. unit changes in the ‘prior sample sizes’, in the parameters of his beta priors. What a shame they didn’t teach you that at Myongji University, Tae.

    Tae nevertheless successfully investigated the assertions that Barry Bonds was, and Javy Lopez wasn’t, a streaky home run hitter during the 2001 and 1988 seasons, whether the Golden State Warriors were a streaky basketball team during the 2000-2001 season, and whether Tiger Woods was a streaky golfer during September 1996-June 2001. This is all very, very interesting stuff!

Gary Koop followed that up by describing his Bayesian techniques for modelling the evolution of entire distributions over time. He uses his techniques to model the distribution of team performance in Major League baseball between 1901 and 2000.

Konstandinos Politis and Lennart Robertson described a forecasting system which predicts the dispersal of contamination on a large scale grid following a nuclear accident. They employed a hierarchical Bayesian forecasting model with multivariate normal assumptions, and computed some convincing estimated dispersion maps.

Carolyn Rutter and Gregory Simon of the Center for Health Studies in Seattle described their Bayesian method for estimating the accuracy of recalled depression among outpatients suffering from supposed bipolar disorder [mood swings may well be symptomatic of physical, e.g organ, gland or sleep, malfunctions rather than the famously hypothetical ‘biochemical imbalance’] who took part in LIFE interviews. In the study under consideration, each of 376 patients was interviewed twice by phone. One of various problems with this approach is that telephone interviews are likely to inflate the apparent accuracy of recall, i.e. during the second telephone call the patient may remember his first interview rather than his previous mood state.

    It does seem strange that these patients only received follow-up telephone interviews as a mode of treatment. In Britain they would have been more closely monitored during their mood swings by their community psychiatric nurses (CPNs). There are often big differences between the moods observed by the CPNs and the moods perceived by the patients (David Morris, personal communication) neither of which, of course, may match reality.

    Researchers at the Center for Health Studies in Seattle have also been known to advocate the use of an ‘optimal’ regime of atypical anti-psychotic drugs which have statistically established high probabilities (around 70% in the short term as established by the CATIE study) of causing intolerable physical side effects among patients suffering from schizophrenia. This apparent mental disorder could, however, be caused by calcification of the pineal gland, and maybe even from high oil concentration in the skin, or from a variety of possible physical causes.

    I therefore have some mild a priori reservations regarding the sorts of advice the center in Seattle might give which could influence future mental health treatment. Rutter and Simon would appear to have omitted a number of important symptom variables from their analysis e.g. relating to lack of sleep and thyroid dysfunction. Their conclusions could therefore be a bit spurious.

Fulvio De Santis, Marco Perone Pacifico and Valeria Sambucini describe some Bayesian procedures for determining the ‘optimal’ predictive sample size n for case control studies.

    Let ψ denote a parameter of interest, for example the log-measure of association in a 2×2 contingency table, and consider the 100(1-α)% highest posterior density (HPD) Bayesian interval for ψ. Then, according to the length probability criterion (LPC), we should choose the value of n which minimizes the expected length of this interval, where the expectation should be taken with respect to the joint prior predictive distribution of the endpoints.

    The authors generalise LPC, in order to take variability into account, by choosing the smallest n such that the prior predictive probability of obtaining an interval estimate whose length is greater than a given threshold is limited by a chosen level.
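    A rough Monte Carlo reading of these length criteria, for a single binomial proportion with a Beta prior, is sketched below in Python. The prior parameters, the target length, and the use of equal-tailed intervals as a stand-in for HPD intervals are all my own illustrative choices, not the authors'.

import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(42)
a, b = 2.0, 2.0          # illustrative Beta prior for the proportion
target_len = 0.25        # illustrative target expected interval length
alpha = 0.05

def expected_interval_length(n, sims=4000):
    # Prior predictive simulation: draw theta from the prior, then the data,
    # and record the length of the 95% equal-tailed posterior interval
    # (used here as a simple stand-in for the HPD interval).
    theta = rng.beta(a, b, size=sims)
    y = rng.binomial(n, theta)
    lo = beta.ppf(alpha / 2, a + y, b + n - y)
    hi = beta.ppf(1 - alpha / 2, a + y, b + n - y)
    return np.mean(hi - lo)

for n in range(10, 201, 10):
    if expected_interval_length(n) <= target_len:
        print("smallest n meeting the length criterion:", n)
        break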

    In the context of hypothesis testing they recommend choosing the smallest n such that the probability that neither the null nor the alternative hypothesis is ‘strongly supported’ is less than a chosen threshold, where a further threshold needs to be specified in order to determine ‘strong support’.

    The authors apply these ideas to a practical example concerning the possible association between non-Hodgkin’s lymphoma and exposure to herbicide. Observations for 1145 patients are used as a training sample that helps to determine the very strong choice of prior distribution, which is doubtless needed to justify the normal approximations.

    I’m sorry if I’m sounding too frequentist, but wouldn’t it be simpler, in terms of prior specification, to choose the value of n which minimises the strength, i.e. the average power with respect to some prior measure, for a test of specified size?

Francesco de Pasquale, Piero Barone, Giovanni Sebastiani and Julian Stander describe an integrated Bayesian methodology for analysing dynamic magnetic resonance images of human breasts. The methods comprise image restoration and classification steps. The authors use their methodology to analyse a DMRI sequence of 20 two-dimensional images of 256x256 pixels of the same slice of breast. An absolutely splendid and highly influential contribution.

 
 

Nicky Best and Sylvia Richardson co-authored nine splendid joint papers with other co-workers between 2005 and 2009. They include articles on modelling complexity in health and social science: Bayesian graphical models as a tool for combining multiple sources of information, improving ecological inference using individual-level data, studying place effects on health by synthesising individual and area-level incomes, and adjusting for self-selection bias in case-control studies.

    Nicky Best is Professor of Statistics and Epidemiology at Imperial College London. She and Deborah Ashby recently developed a Bayesian approach to complex clinical diagnoses, with a case study in child abuse. They reported this, with Frank Dunstan, David Foreman and Neil McIntosh in an invited paper to the Royal Statistical Society in 2013.


In George Barnard and David Cox’s days, the statisticians at ICL taught in a charming house on Exhibition Road and in the Victorianesque Huxley Building, which tagged onto the similarly spacious Victoria and Albert Museum and where, according to George Box, a statistician was once crushed to death by the much dreaded lift.

 
The infamous Huxley Building lift shaft
 

    Philip Prescott writes from the University of Southampton ‘the lift had large metal gates that would have to be carefully closed in order that the lift would open properly. It was usually quicker to walk up the stairs, or run if we were late for our lectures.’

    Doubtlessly, the Bayesians at ICL are much safer nowadays.

    I last visited ICL in 1978 to teach a short course on Bayesian Categorical Data Analysis. This was during the final stages of the World Cup in Argentina (I remember David Cox telling me that Argentina had just beaten Peru six nil). Nowadays, the statisticians are housed in much more modern premises on Queen’s Gate, and the life expectancy has improved somewhat.      

    Nicky Best received the Royal Statistical Society’s Guy Medal in Bronze in 2004.

    While Sylvia Richardson is slightly more mathematically inclined, Nicky also shows considerable practical acumen. Both Nicky and Sylvia have been heavily involved in Imperial College’s BIAS research program, which addresses social science data that are notoriously full of missing values, non-responses, selection biases and other idiosyncrasies. Bayesian graphical and hierarchical models offer a natural tool for linking many different sub-models and data sources, though they may not provide the final answer.


In 2005 the second edition of the best-selling book Statistics for Experimenters, by George Box, J. Stuart Hunter, and Bill Hunter was published, a quarter of a century after the first edition, but with the new subtitle Design, Innovation, and Discovery. The second edition incorporated many new ideas at Stu Hunter’s suggestion, such as the optimal design of experiments and Bayesian Analysis.

 
J. Stuart Hunter
 

    J. Stuart Hunter is considered by many people to be a wonderful character, a gifted scientist, and one of the most important and influential statisticians of the last half century, especially with regard to applying statistics to problems in industry. He is currently investigating developments in data mining and machine learning.

    In their JASA papers of 2005, Peter Müller discussed applied Bayesian modeling, Michael Elliott and Rod Little described their Bayesian evaluation of the 2000 census, using ACE survey data and demographic analysis, and Antonio Lijoi, Igor Prünster and Stephen Walker investigated the consistency of semi-parametric normal mixtures for Bayesian density estimation.


John Pratt published his paper ‘How many balance functions does it take to determine a utility function?’ in 2005 in the Journal of Risk and Uncertainty.

    John is the William Ziegler Professor Emeritus of Business Administration at Harvard. He is a traditional Bayesian of the old school, but with all sorts of inspirational ideas.   

    Peter Rossi and Greg Allenby published their book Bayesian Statistics and Marketing with John Wiley in 2005.

    Peter is James Collins Professor of Marketing, Statistics, and Economics at UCLA. He used to be one of Arnold Zellner’s happy crew in the University of Chicago Business School.

    Dario Spanò and Robert C. Griffiths co-authored their paper on transition functions with Dirichlet and Poisson-Dirichlet stationary distributions in Oberwolfach Reports in 2005.

 
Dario Spanò
 

    Dario Spanò obtained his Ph.D. in Mathematical Statistics from the University of Pavia in 2003. In 2013 he was promoted to Associate Professor of Statistics at the University of Warwick, where he is also director of the M.Sc. program and a diehard Bayesian to boot.

    Robert C. Griffiths FRS is Professor of Statistics at the University of Oxford.


Also in 2005, Geoff McLachlan and David Peel co-authored A Bayesian Analysis of Mixture Models for the Wiley on-line library. This is the fourth chapter of their 2000 book Finite Mixture Models, and it is very important in terms of semi-parametric multivariate density estimation. This is a situation where proper priors are always needed, e.g. any improper prior for the mixing probabilities will invariably lead to an improper posterior. Morris DeGroot was the first to tell me that. Maybe a maximum likelihood procedure using the EM algorithm would sometimes work better in practical terms. Geoff has produced a wonderful computer package which does just that. See [15], p3. Orestis Papasouliotis and I applied Geoff’s package to the Lepenski Vir Mesolithic and Neolithic skeletons data to excellent effect.
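    For completeness, here is a bare-bones EM fit of a two-component univariate normal mixture in Python, in the maximum likelihood spirit just mentioned. It is only an illustrative sketch on simulated data, not Geoff's package.

import numpy as np

rng = np.random.default_rng(3)
# Simulated data from a two-component normal mixture.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 200)])

# Initial values for the weights, means and variances.
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(200):
    # E-step: posterior component responsibilities for each observation.
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted updates of the mixture parameters.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", w, "means:", mu, "variances:", var)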

    Professor McLachlan is interested in applications in medicine and genetics. He is a Distinguished Senior Research Fellow at the University of Queensland. In 2011, he was awarded the Pitman Medal, the Statistical Society of Australia’s highest honour.

    Laurent Itti and Pierre Baldi published their paper ‘Bayesian Surprise Attracts Human Attention’ in 2005 in Advances in Neural Information Processing Systems. The following year, the authors were to characterise surprise in humans and monkeys, and to model what attracts human gaze over natural dynamic scenes.

    Researchers in the murky interior of the USC Hedco Neuroscience building in Los Angeles, who are heavily involved in the Bayesian theory of surprise, later developed a ‘bottom-up visual surprise’ model for event detection in natural dynamic scenes. I wouldn’t be surprised by whatever the heady people at Hedco come up with next.

    Jay Kadane and four co-authors reported their conjugate analysis of the Conway-Maxwell-Poisson (CMP) distribution in 2005 in Bayesian Analysis. This distribution adds an extra parameter to the Poisson distribution to model overdispersion and under-dispersion.

    Following my oft-referred-to four-decade-old paradigm (Do transform to a priori normality!), I might, if the mood strikes, instead assume a more flexible bivariate normal prior for the logs of the two parameters. This non-conjugate prior extends quite naturally to a multivariate normal prior for the parameters in generalised linear models which refer to CMP sampling assumptions.
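    A minimal Python sketch of the alternative I have in mind: the CMP log-likelihood (with its normalising constant truncated numerically) combined with a bivariate normal prior on (log λ, log ν). The prior means, variances, correlation, and data below are invented for illustration.

import numpy as np
from scipy.stats import multivariate_normal

def cmp_log_pmf(y, lam, nu, jmax=200):
    # Conway-Maxwell-Poisson log p.m.f., truncating the normalising sum at jmax.
    j = np.arange(jmax + 1)
    log_fact = np.cumsum(np.log(np.maximum(j, 1)))        # log(j!)
    logZ = np.logaddexp.reduce(j * np.log(lam) - nu * log_fact)
    return y * np.log(lam) - nu * np.sum(np.log(np.arange(1, y + 1))) - logZ

# Illustrative bivariate normal prior on theta = (log lambda, log nu).
prior = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 0.5]])

def log_posterior(theta, data):
    lam, nu = np.exp(theta)
    loglik = sum(cmp_log_pmf(y, lam, nu) for y in data)
    return loglik + prior.logpdf(theta)

data = [0, 1, 1, 2, 3, 2, 1, 0, 4, 2]                      # made-up counts
print(log_posterior(np.array([0.5, 0.1]), data))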


During 2005, Luis Pericchi published an article in Elsevier R.V. Handbook of Statistics entitled ‘Model Selection and Hypothesis Testing based on Objective Probabilities and Bayes Factors’.

 
Luis Pericchi
 

    Luis Pericchi is Professor of Mathematics at the University of Puerto Rico. He has authored numerous influential papers on Bayesian methodology and its applications, and is highly regarded.

    Samuel Kou, Sunney Xie and Jun Liu of Harvard University reported their Bayesian analysis of single-molecule experimental data in 2005 in an invited discussion paper read to the Royal Statistical Society. They investigate an interesting two-state model for Y(t), the total number of photons arriving up to time t. This is equivalent to the photon arrivals following a doubly stochastic Poisson process, and the authors use their two-state formulation to investigate Brownian motion, and to solve problems relating to DNA hairpin and fluorescence life-time experiments.

    Professor Alan Hawkes of the University College of Wales at Swansea was slightly puzzled when he proposed the Vote of Thanks, and he raised a few quibbles. For example, the authors’ process for photon observation depends on an underlying gamma process which is not dependent on the sequence of pulses. And yet it is the pulses that generate the photon emissions. The authors addressed these quibbles in their written reply.

    In 2005, Thomas Louis published an article in Clinical Trials on the fundamental concepts of Bayesian methods.

    Louis is Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health in Baltimore. He has many Bayesian research interests, for example in the analysis of medical longitudinal, spatial, and observational data, and he is very well published.

 
 

During 2005 in Applied Statistics:

Samuel Mwalili, of the Jomo Kenyatta University of Agriculture and Technology in Nairobi, Emmanuel Lesaffre and Dominic Declerck used a Bayesian ordinal regression model to correct for inter-observer measurement error in a geographical health study;

Jaime Peters and five colleagues at the University of Leicester used Bayesian procedures to investigate the cross-design synthesis of epidemiological and toxicological evidence, using some ingeniously vague choices for their prior parameters;

Claudio Verzilli, of the London School of Hygiene and Tropical Medicine, John Whittaker, Nigel Stallard and Daniel Chapman proposed a hierarchical Bayesian multivariate adaptive regression spline model for predicting the functional consequences of amino-acid polymorphisms, and used it to investigate the lac repressor molecule in Escherichia coli. In the absence of lactose, this molecule binds to the DNA double helix upstream of the genes that code for enzymes;

Sujit Sahu and Kanti Mardia propose a Bayesian kriged Kalman model for the short-term forecasting of air pollution levels, which assumes that the spatial covariance kernel belongs to the Matérn family. The authors apply their Bayesian analysis most successfully to the New York air pollution data.

 
Sujit Sahu
 

Sujit Sahu is Professor of Statistics at the University of Southampton. He is interested in Bayesian modeling for interpreting large and complex data sets in a wide range of application areas.


In 2006, Stephen Pitt reported his efficient Bayesian inferences, with David Chan and Robert Kohn, for Gaussian copula regression models in Biometrika. He is one of the most insightful of our up-and-coming Bayesian Economists. I remember chewing the rag with him in 1998 at Valencia 6.

    Stephen is Professor of Economics at the University of Warwick, where he has worked as liaison officer for the highly Bayesian, integrated single honours MORSE (Mathematics, Operational Research, Statistics, and Economics) degree which I helped Robin Reed and the Economists to create in 1975. Stephen’s research areas include financial time series, non-Gaussian state models, stochastic volatility models, and MCMC.

    The campus of the University of Warwick is situated in the southern fringes of the once bombed-out City of Coventry, several miles north of the ruins of the historic Kenilworth Castle where the rebellious thirteenth-century leader Simon de Montfort, the ill-fated King Edward the Second, and King Henry the Fifth, the victor at Agincourt, all stayed. The better preserved Warwick Castle, with its vibrant dungeons and strutting peacocks, lies even further to the south. Therein, the evil Kingmaker of the Wars of the Roses once lived, a century or so after the losing King at Bannockburn’s butch lover Piers Gaveston was beheaded by irritated nobles in a ditch nearby.

 
Warwick Castle
 

    The University of Warwick of the late 1960s and early 1970s just consisted of a few loosely white-tiled buildings scattered across a long tract of farm land, and the junior academics were reportedly exploited as guinea pigs by the Napoleonic Vice-Chancellor. However, after that gentleman melted into thin air, the campus slowly became jam-packed with buildings and eventually accommodated one of the most thriving universities in Europe. The MORSE degree, which started in 1975 with 30 students a year, became world famous with it.

    The current undergraduate intake for the MORSE and MMORSE degrees is 150 a year, with many students coming from overseas. A new interdepartmental B.Sc. degree in Data Analysis will start in 2014.

    Getting back to 2006, Petros Dellaportas, Nial Friel and Gareth Roberts reported their Bayesian model selection criterion for partially (finitely) observed diffusion models in Biometrika. For a fixed model formulation, the strong dependence between the missing paths and the volatility of the diffusion can be broken down using one of Gareth’s previous methods. The authors described how this method may be extended via reversible jump MCMC to the case of model selection. As is ever the case with reversible jump MCMC, my mind boggles as to how long the simulations will dither and wander before settling close to the theoretical solution.

 
Nial Friel
 

    Nial Friel is Associate Professor and Head of Statistics at University College Dublin. His research interests include Bayesian inference for statistical network models, social network analysis, and model selection. He obtained his Ph.D. in 1999 from the University of Glasgow.


In his interesting 2006 article in Statistics in Society, Gene Hahn re-examined informative prior elicitation through the lens of MCMC methods. After reviewing the literature, he stated four principles for prior specification relating to the need (1) to elicit prior distributions which are of flexible form, (2) to minimize the cognitive demands on the expert, (3) to minimize the demands on the statistician, and (4) to develop prior elicitation methodologies which can be easily applied to a wide range of models.

    With these ambitious, though somewhat expedient, principles in mind, Hahn recommended eliciting non-conjugate priors by reference to Kullback-Leibler divergence. He applied his ideas to inference about a regression parameter in the context of a set of data on rainfall in York. Overall, an intriguing study, which seeks to simplify the plethora of modern prior elicitation techniques.

 
Gene Hahn
 
    Gene Hahn is Associate Professor of Information and Decision Sciences at Salisbury University in Maryland. His research interests include management decision making, Bayesian inference, and international operations including offshore and global supply chain management.
 
 

In contrast to Gene Hahn’s prior informative approach, Trevor Sweeting, Gauri Datta, and Malay Ghosh proposed deriving vague non-subjective priors by minimizing predictive entropy loss. Their suggestions in their exquisitely mathematical paper ‘Nonsubjective priors via predictive relative entropy regret’ in the Annals of Statistics (2006) may be contrasted with reference priors. Sweeting is one of those wonderful English surnames which makes you proud to feel that you’re British.

    Trevor Sweeting is Emeritus Professor of Statistics at UCL. His interests also include Bayesian computations, Laplacian Approximations, and Bayesian semi-parametric hierarchical modeling, and he has published extensively within the Bayesian paradigm. He is one of the later generation of UCL Bayesians who, together with Tom Fearn, picked up the pieces during the decades following Dennis Lindley’s dramatic departure in 1977, which had all the ingredients of a Shakespearean play.



The UCL statisticians are now housed in modern premises on Tottenham Court Road, a couple of hundred yards to the west of the Malet Street quadrangle, where feisty Administrators and beefeaters once roamed and well away from the padded skeleton of the purveyor of happiness Jeremy Bentham, which is still on show in a glass case. An innocuously quiet, unassuming Bayesian called Rodney Brooks still beavers away in the Stats department over forty years after he published a straightforward version of Bayesian Experimental Design in his UCL Ph.D. thesis, and Mervyn Stone still floats through, over a century after Karl Pearson first moved into Malet Street with an autocratic demeanour which was only to be rivalled by Sir Ronald Fisher. According to the industrial statistician and crafty Welsh mountain walker Owen Davies, who succeeded Dennis Lindley to the Chair in Aberystwyth in 1967, Karl and Sir Ronald both accused the other of behaving too autocratically; indeed their interdepartmental dispute over who should teach which course was never to be completely resolved.

 
 
The UCL Centre for Computational Statistics and Machine Learning
 

In their 2006 paper in Statistics in Society, John Hay, Michelle Haynes, Tony Pettitt and Thu Tran investigated a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia.

    The observed binary responses can be arranged in an N×J×T array, where each of the N individuals in the sample can be in any one of J states at each of T times. Under appropriate multinomial assumptions, the authors took the three-dimensional array of multivariate logits to be constrained by a linear model that may or may not have random components, but which depends upon N×T vectors of explanatory variables and N×T vectors of lagged variables. While the authors could have explained a bit more what they were up to, they referred to WinBUGS for their, presumably first stage multivariate normal, prior specifications and used DIC when comparing different special cases of their general model specification. Then they used their model-specific inferences to draw some tolerably interesting applied conclusions from their social survey.

    John Hay published his book Statistical Modeling for Non-Gaussian Time Series Data with Explanatory Variables out of his 1999 Ph.D. thesis at the Queensland University of Technology (QUT) in Brisbane.

    Tony Pettitt is Professor of Statistics at QUT. His areas of interest, while working in that neck of the woods, include Bayesian Statistics, neurology, inference for transmissions of pathogens and disease, motor unit number estimation, and Spatial Statistics.

  

The Mayflower Steps
 

    Meanwhile, Professor George Kuczera of the Department of Engineering of the University of Newcastle in New South Wales has established himself as a world authority on the theory and applications of Bayesian methods in hydrology and water resources.


Mark Steel, our friendly Dutchman at the University of Warwick, was his usual dynamic self in 2006. He published the following three joint papers in JASA during that same year:

    Order Based Dependent Dirichlet Processes (with Jim Griffin),

    A constructive representation of skewed normal distributions (with José Ferreira),

    Non-Gaussian Bayesian geostatistical modelling (with M.Blanca Palacios).

    Mark also co-authored papers in 2006 with Jim Griffin in the Journal of Econometrics, and with J.T. Ferreira in the Canadian Journal of Statistics.

    By coincidence, our very own Peter Diggle and his mate Soren Lophaven also published a paper on Bayesian geostatistical design in 2006, but in the Scandinavian Journal of Statistics.

    Peter Diggle is a Distinguished Professor of Statistics at the University of Lancaster down by the Lake District. Peter has authored a number of Bayesian papers, and his research interests are in spatial statistics, longitudinal data, and environmental epidemiology, with applications in the biomedical, clinical, and health sciences. He is very highly regarded.

    Peter is President-elect of the Royal Statistical Society, and received the Guy Medal in Silver in 1997.

    Jim Griffin is Professor of Statistics at the University of Kent. His research interests include Bayesian semi-parametrics, slice sampling, high frequency financial data, variable selection, shrinkage priors and stochastic frontier models.

    In their 2006 papers in JASA, Knashawn Morales, Joseph Ibrahim, Chien-Jen Chen and Louise Ryan applied their Bayesian model averaging techniques to benchmark dose estimation for arsenic in drinking water, Nicholas Heard, Christopher Holmes and David Stephens described their quantitative study of the gene regulation involved in the immune response of anopheline mosquitoes, and Niko Kaciroti and his five courageous co-authors proposed a Bayesian procedure for clustering longitudinal ordinal outcomes for the purpose of evaluating an asthma education program.

 

In their 2007 paper in Statistics in Society, David Ohlssen, Linda Sharples and David Spiegelhalter proposed a hierarchical modelling framework for identifying unusual performance in health-care providers. In a special case where the patients are nested within surgeons, a two-way hierarchical linear logistic regression model is assumed for the zero-one counts which incorporates provider effects, and the logits of EuroSCOREs are used as covariates for case-mix adjustments. This is a special case of a more general formulation which contains the same sort of parametrization.

    The provider effects are taken to either be fixed or to constitute a random sample from some distribution, e.g. a normal or heavier-tailed t-distribution, a mixture of distributions, or a non-parametric distribution. I would personally use a large equally weighted mixture of normal distributions with common unknown dispersion, and unknown locations. If this random effects distribution is estimated from the data by either hierarchical or empirical Bayesian procedures, then appropriate posterior inferences will detect unusual performances by the health care providers. The authors seemed to take a long time saying this, but their job was then well done.
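    The nested special case, as I read it, can be simulated in a few lines of Python: patients within surgeons, normally distributed provider effects, and the EuroSCORE logit entering the linear predictor for case-mix adjustment. All numbers below are invented for illustration, and the normal random-effects distribution is just one of the alternatives the authors consider.

import numpy as np

rng = np.random.default_rng(7)
n_providers, patients_per = 30, 80

# Provider (surgeon) random effects; heavier-tailed or mixture alternatives
# to this normal assumption are also entertained in the paper.
provider_effect = rng.normal(0.0, 0.4, size=n_providers)

# EuroSCORE-based predicted risks for each patient (made-up values),
# entering the linear predictor on the logit scale for case-mix adjustment.
euroscore_risk = rng.uniform(0.02, 0.30, size=(n_providers, patients_per))
logit_risk = np.log(euroscore_risk / (1 - euroscore_risk))

eta = provider_effect[:, None] + 1.0 * logit_risk      # slope of 1.0 is illustrative
p_death = 1 / (1 + np.exp(-eta))
deaths = rng.binomial(1, p_death)

# Crude provider summaries: observed versus case-mix-expected mortality.
print(deaths.mean(axis=1)[:5])
print(euroscore_risk.mean(axis=1)[:5])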

    Meanwhile, John Quigley and Tim Bedford of the University of Strathclyde in Glasgow used Empirical Bayes techniques to estimate the rate of occurrence of rare events on railways. They published their results in Reliability Engineering & System Safety.

    Jean-Michel Marin and Christian Robert published their challenging and high level book Bayesian Core: A Practical Approach to Computational Bayesian Statistics in 2007. The authors address complex Bayesian computational problems in regression and variable selection, generalised linear models, capture-recapture models, dynamic models, and image analysis.

    If computer R labs are added to three hours of lectures a week, then culturally-adjusted graduate students with a proper mathematical background can reportedly hope to achieve a complete picture of Marin and Robert’s treatise within a single semester. I hope that they are also given an intuitive understanding of the subjective ideas involved, and are not just taught how to crank the computational handle.

    In their article in Accident Analysis and Prevention, Tom Brijs, Dimitris Karlis, Filip Van den Bossche and Geert Wets used an ingenious two-stage multiplicative Poisson model to rank hazardous road sites according to numbers of accidents, fatalities, slight injuries, and serious injuries. They assumed independent gamma priors for the seven sets of unknown parameters, before referring to an MCMC analysis, and derived the posterior distributions of the ‘expected cost’ parameters for the different sites. These are expressible as complicated functions of the model parameters.

    The authors utilised this brilliant representation to analyse the official traffic accidents on 563 road intersections in the Belgian city of Leuven for the years 1991-98. Their MCMC algorithm converged quite easily because the sampling model was so well-conditioned. Their posterior boxplots described their conclusions in quite beautiful fashion, and they checked out their model using a variety of predictive p-values. Their Bayesian p-values for accidents, fatalities, severely injured and slightly injured were respectively 0.208, 0.753, 0.452 and 0.241. The authors therefore deemed their fit to be satisfactory. This is what our paradigm is all about, folks!
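    A deliberately simplified, single-outcome version of the idea is sketched in Python below: independent gamma priors, Poisson accident counts per site, conjugate posterior updates, and sites ranked by the posterior probability that their accident rate exceeds a benchmark. This is not the authors' two-stage multiplicative model, just the gamma-Poisson core of it, with every number invented.

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(11)
n_sites, years = 20, 8
a0, b0 = 1.0, 1.0                  # illustrative Gamma(shape, rate) prior per site

true_rates = rng.gamma(2.0, 0.5, size=n_sites)            # made-up site rates
counts = rng.poisson(true_rates * years)                  # accidents over 8 years

# Conjugate update: the posterior rate for site i is Gamma(a0 + counts_i, b0 + years).
post_shape = a0 + counts
post_rate = b0 + years

benchmark = 1.0                                            # accidents per year
exceed_prob = gamma.sf(benchmark, post_shape, scale=1.0 / post_rate)

ranking = np.argsort(-exceed_prob)
print("most hazardous sites (by posterior exceedance probability):", ranking[:5])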


In their 2007 JASA papers, Chunfang Jin, Jason Fine and Brian Yandell applied their unified semi-parametric framework for quantitative trait loci to the analysis of spike phenotypes, Anna Grohovac Rappold, Michael Lavine and Susan Lozier used subjective likelihood techniques to assess the trends in the ocean’s mixed layer depth, Daniel Cooley, Doug Nychka and Philippe Naveau used Bayesian spatial models to analyse extreme precipitation return levels, and Bo Cai and David Dunson used Bayesian multivariate isotonic regression splines in their carcinogenicity studies.

    In their 2007 paper in Applied Statistics, Roland De La Cruz-Mesia, Fernando Quintana and Peter Müller used semi-parametric Bayesian classification with longitudinal markers for 173 pregnant women who were measured for the β human chorionic gonadotropin hormone during the first 80 days of gestational age.

    The data consisted of the observed response vector y for each patient for the known time-points at which their hormone level was observed, together with a zero-one observation x indicating whether the pregnancy was regarded as normal or abnormal. A two-stage sampling model was assumed, where the y’s were taken to be independent given a matching set of random-effects parameters. The random effects were then assumed to be independent, with distributions depending on the corresponding x’s, a vector φ of common unknown parameters, and two random-effects distributions G. The posterior classification probabilities (as to whether a further patient with another y vector has a normal or abnormal pregnancy) can then be obtained in terms of the prior probabilities by a simple application of Bayes rule.
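    The final classification step is simple enough to sketch in a few lines of Python: with prior class probabilities and class-specific predictive densities for a marker summary (plain Gaussians below, standing in for the authors' fitted random-effects predictive densities), the posterior probability of an abnormal pregnancy follows directly from Bayes rule. All numbers are illustrative.

import numpy as np
from scipy.stats import norm

# Prior probabilities of (normal, abnormal) pregnancy; invented values.
prior = np.array([0.8, 0.2])

# Class-conditional predictive densities for a scalar marker summary;
# simple Gaussians here stand in for the fitted random-effects predictives.
def class_densities(y):
    return np.array([norm.pdf(y, loc=4.0, scale=1.0),    # normal pregnancy
                     norm.pdf(y, loc=2.5, scale=1.2)])   # abnormal pregnancy

def posterior_class_probs(y):
    # Bayes rule: prior times likelihood, renormalised.
    w = prior * class_densities(y)
    return w / w.sum()

print(posterior_class_probs(3.0))   # P(normal | y), P(abnormal | y)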

    When applying this procedure, it would have seemed important to use empirical estimates for φ and for the distributions of the G random-effects parameters, in order to avoid the interpretative difficulties inherent in Bayes factors. However, the authors represented φ in terms of latent variables in the form of random matrices, and assumed dependent Dirichlet priors for the distributions of the G parameters. Phew, Gor blimey!

    The authors proceeded merrily along with their MCMC computations, and got the posterior inferences which they, as full-dress hierarchical semi-parametric Bayes factorists, may well have deserved. A little more practical nous and common sense would not have gone amiss.


Jose Bernardo presented a high-powered invited discussion paper in Sort in 2007 on objective Bayesian point and region estimation in location-scale models. I don’t know what Jose meant by ‘objective’. Isn’t it a bit of a problem for Bayesians to use this evocative sort of terminology? Observational data are usually more subjective than objective.

    In 2007, the book Gaussian Markov Random Fields by Håvard Rue and Leonhard Held was a runner-up for ISBA’s DeGroot Prize.

    Whoops! We almost missed out on three exciting publications in the 2007 RSS journals:

    Yuan Ji and seven co-authors used Bayesian mixture models for complex high dimensional count data in phage display experiments, and proposed an interesting hierarchical prior distribution for the parameters in a Poisson/log-linear model for the observed counts, which were taken at different phages and from different, e.g. mouse, organs. They then referred to a full-dress parametric Bayesian analysis. The wonders of MCMC will never cease.

    Alexandros Gryparis, Brent Coull, Joel Schwartz and Helen Suh used semi-parametric latent variable regression models for the spatiotemporal modelling of mobile source particles in the greater Boston area. They used a non-linear factor analytic model together with geoadditive semi-parametric regression assumptions, and sorted out the identifiability problems with their informative prior specifications. Their MCMC analysis yielded some beautiful-looking and neatly coloured maps of Boston, Massachusetts.

    Nicole Augustin, Stefan Lang, Monica Musio and Klaus von Wilpert applied Bayesian structured additive regression to a spatial model for the needle losses of pine trees in the forests of Baden-Württemberg. A delightful bed-time read.


Pilar Iglesias-Zuazola, the spiritual leader of Bayesians in Chile, passed away on the third of March 2007. She was one of the leading Bayesian researchers in Latin America, and an outstanding educator who would visit Chilean pupils in their high schools.

 
Pilar Iglesias
 

    After working with Carlos Pereira, Pilar received her doctorate from the University of São Paulo in Brazil. Upon becoming a faculty member of the Catholic University of Chile (PUC), she started making contact with Bayesians at other Chilean Universities. Consequently, the University of La Serena, Chile, hosted the First Bayesian Workshop in January 1996.

    In 2010, ISBA instituted the Pilar Iglesias Travel Award in Pilar’s honour. Recipients to date include Delson Chivabu (South Africa), Jose Ramiro (Chile), Fernando do Nascimento (Brazil), and Francisco Torres-Arles (Chile).


In 2007, Samprit Banerjee, Brian Yandell and Nengjun Yi co-authored a seminal paper ‘Bayesian Quantitative Trait Loci Mapping for Multiple Traits’ in Genetics.

    In 2008 Brian Yandell taught a course on Quantitative Trait Loci (QTL) mapping at the University of Wisconsin-Madison. The goals of his QTL study included (1) discovering underlying biochemistry, (2) finding useful candidates for medical intervention, (3) discovering how the genome is organised, (4) discerning units of natural selection, and (5) predicting phenotype or breeding value. An outstanding enterprise.

    In their 2008 paper in JASA, Dimitris Fouskakis and David Draper compared stochastic optimization methods for variable selection in binary outcome prediction, with application to health policy. They published two further important Bayesian papers in 2009, with Ioannis Ntzoufras, which related to Bayesian variable selection with application to cost-effective measurement of quality of health care. Fouskakis and Ntzoufras? They sound like two up-and-coming names for the future.



 
Dimitris Fouskakis Ioannis Ntzoufras
 

    David Draper advocates ‘Bayesian-Frequentist fusion’, and thinks that Gauss, Galton and Fisher were early Bayesians who may have fused. He is a Professor of Applied Mathematics and Statistics at the University of California at Santa Cruz. A past president of ISBA, he is a prolific applied Bayesian researcher and prize-winning short-course teacher. There’s a great mug shot of him on the Internet.

    In their 2008 JASA papers, Chiara Sabatti and Kenneth Lange used Bayesian Gaussian mixture models to analyse high-density genotype arrays, Shane Jensen and Jun Liu used a Bayesian clustering procedure to analyse transcription factor binding motifs, Abel Rodriguez, David Dunson and Alan Gelfand drew nested Dirichlet processes to the world’s attention, and David Dunson, Ya Xue and Lawrence Carin used a matrix stick-breaking process to develop a flexible Bayesian meta-analysis. I enjoy breaking my matrix sticks at bedtime, Professor Dunson, and you’re one of the world’s leading Bayesian researchers.

    In the same year, the Spanish mermaid Carmen Fernandez and her three not-so-fishy co-authors reported their Bayesian analysis of a two stage biomass model for the Bay of Biscay anchovy, in the ICES Journal of Marine Science. Carmen is a very enthusiastic research scientist at the Spanish Institute of Oceanography, and she has published 33 very useful papers in Statistics and Fisheries journals, several of them jointly with her friend and mentor Mark Steel of the University of Warwick.

 
Carmen Fernandez
 

Also in 2008, Lei Sun and Murray Clayton co-authored ‘Bayesian analysis of cross-classification spatial data’, and Murray Clayton and his six co-authors reported ‘Predicting spatial patterns of fire on a California landscape’ in the International Journal of Wildland Fire.

    Murray Clayton is Professor of Statistics and Plant Pathology at the University of Wisconsin-Madison. As one of the leading Canadian Bayesians he has published a number of high quality theoretical articles, and many application papers e.g, in the agricultural sciences. As a student of Don Berry of the University of Minnesota, Murray is well able to mix Bayesian theory with practical relevance. Don has done all sorts of important things too, as well as running a remunerative Bayesian consultancy business.

 
Murray Clayton
 

    Charles Franklin, Murray’s eminent colleague in the University of Wisconsin’s Department of Political Science, teaches multitudinous advanced level Bayesian methodology courses to social scientists. Up to a few years ago, he was using my book [15] with John Hsu for course material, with lots of emphasis on the first chapter on likelihood formulations and frequency procedures. I’m glad that his students could understand it. The Statistics graduate students at Wisconsin used to experience lots of difficulties constructing the likelihoods, particularly if they hadn’t bothered to take Advanced Calculus, and some of them didn’t even remember to multiply from 1 to n. But after that Bayes was a cinch (as long as they were told what the prior was!). I was once advised that the best thing I taught them was ‘that neat formula for completing the sum of squares’. Anyway, Professor Franklin is getting more package orientated nowadays, so some of the arts of yore may be getting lost.

    Trevor Park and George Casella introduced the Bayesian Lasso in JASA in 2008, as an interpretation of the Lasso of Tibshirani, which estimates linear regression coefficients through constrained least squares. The authors showed that the Lasso estimate can be interpreted as a Bayesian posterior modal estimate when the parameters have independent Laplace (double exponential) priors. Representing these double exponential priors as scale mixtures of normals provides full conditional posterior distributions for MCMC, and the interval estimates provided by the Bayesian Lasso help to guide variable selection.
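    The MAP correspondence is easy to check numerically: minimising the negative log-posterior under independent Laplace priors is the same as minimising a Lasso objective with penalty σ²/b, where b is the Laplace scale. In the Python sketch below the data, σ and b are invented, and a derivative-free optimiser is used because of the non-smooth |β| terms; it is an illustration, not the authors' Gibbs sampler.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, p = 60, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, 0.0, -2.0])
sigma = 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=n)

b_scale = 0.25                      # illustrative Laplace prior scale
lam = sigma ** 2 / b_scale          # equivalent Lasso penalty

def neg_log_posterior(beta):
    # Gaussian likelihood plus independent Laplace(0, b_scale) priors,
    # dropping constants; proportional to 0.5*RSS + lam*sum|beta|.
    rss = np.sum((y - X @ beta) ** 2)
    return 0.5 * rss / sigma ** 2 + np.sum(np.abs(beta)) / b_scale

beta_map = minimize(neg_log_posterior, np.zeros(p), method="Nelder-Mead").x
print("MAP estimate (matches the Lasso solution with penalty", lam, "):", beta_map)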

 
George Casella
 

Alan Izenman published his celebrated book Modern Multivariate Statistical Techniques in the same year. Alan takes a broad perspective on multivariate analysis in the light of the remarkable advances in computation and data storage, and the ready availability of huge data sets, which have been the keys to the growth of the new disciplines of data mining and machine learning. Meanwhile, the enormous success of the Human Genome Project has opened up the field of bioinformatics. The book presents an integrated mixture of theory and applications, and of classical, Bayesian, and modern multivariate analysis techniques.

    Alan Izenman is Senior Research Professor in Statistics at Temple University. In 1976 he and Sandy Zabell investigated the 1965 New York City blackout and showed that this did not substantively affect the city’s birth-rate as previously advertised. See [15], p95. In this case ‘induced births’ provide the confounding variable. A wonderful message to popularist statisticians everywhere.

    Alan also researches on the interaction between Statistics and the Law, and he has used Bayesian methods to draw inferences about the amount of drugs previously smuggled in by a defendant in the pit of his stomach.

 

Alan Izenman

 

THE INLA PACKAGE:

During 2009, two professors from Norway and a Frenchman from Paris published a breath-taking paper in JRSSB which, together with the discussion thereof, graced 73 pages of the Society’s journal.

    Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations is doubtlessly the most exciting and far-reaching Bayesian paper of the 21st century. The now much-celebrated co-authors were Håvard Rue and Sara Martino of the Norwegian University of Science and Technology in Trondheim, and Nicolas Chopin from the Research Centre for Economics and Statistics in Paris.

    Much of Scotland was once part of the Archdiocese of Trondheim, and maybe that should now include Edinburgh, and perhaps even London too. The co-authors completed all of their computations by application of INLA, their computer package which is thoroughly documented by Martino and Rue [71]. This manual refers to Sara Martino’s 2007 Ph.D. thesis and to review papers by Ludwig Fahrmeir and Gerhard Tutz, and others.

    The co-authors wrote, in summary,

Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalised) linear models, (generalised) additive models, smoothing spline models, state space models, semi-parametric regression, spatial and spatio-temporal models, log-Gaussian Cox processes, and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, MCMC methods can be implemented, but they are not without problems, in terms of both convergence and computational time.


    In some practical applications, the extent of these problems is such that MCMC is simply not an appropriate tool for routine analysis. We show that, by using an INLA approximation and its simplified version, we can compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational; where MCMC algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures, so that models can be compared and the model under study can be challenged.


    What more can I say? Well, here goes. The important influences of the numerically highly accurate conditional Laplacian approximations of the 1980s, and the soundly based Importance Sampling computations which began in 1978, took a fair drenching during the MCMC fever of the 1990s. It may take a little time, but the tide is beginning to turn towards techniques which can be algebraically justified and which will more fully test the mathematical expertise of our Ph.D. students and up-and-coming researchers as they strive to develop technical talents of their own.

    Generalisations of conditional Laplacian approximations can be used to address most sampling models under the sun, not just the wonderful range of structured additive models considered by Rue, Martino, and Chopin. Our journals will become filled with wondrous algebra once again, and our computations will be extraordinarily accurate right down the tails of our marginal posteriors.
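    For readers who have never seen the idea in miniature, here is a one-parameter Laplace approximation in Python: a Poisson count with a normal prior on the log-rate, with the posterior of the log-rate approximated by a Gaussian centred at the mode whose variance is the inverse curvature of the negative log-posterior. It is a toy stand-in for the latent Gaussian machinery of INLA, nothing more, and the observed count is invented.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import trapezoid

# Toy model: y ~ Poisson(exp(theta)), theta ~ N(0, 1); observed count below.
y_obs = 7

def neg_log_post(theta):
    return -(y_obs * theta - np.exp(theta)) + 0.5 * theta ** 2

# Laplace approximation: Gaussian at the posterior mode, with variance
# given by the inverse curvature of the negative log-posterior.
mode = minimize_scalar(neg_log_post).x
curvature = np.exp(mode) + 1.0               # second derivative of neg_log_post
laplace_sd = 1.0 / np.sqrt(curvature)

# Compare with a brute-force grid-normalised posterior.
grid = np.linspace(mode - 5, mode + 5, 2001)
unnorm = np.exp(-neg_log_post(grid))
exact = unnorm / trapezoid(unnorm, grid)
approx = np.exp(-0.5 * ((grid - mode) / laplace_sd) ** 2) / (laplace_sd * np.sqrt(2 * np.pi))
print("max absolute difference between the two densities:", np.max(np.abs(exact - approx)))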

    It is as if the Valhalla-esque Gods Woden and Thor and the Goddess Freyja of Fólkvangr have returned to guide us. Maybe the more ardent of our never-ever-convergent simulators will decide to take the night bus to Vulcan, the Ninth World, or wherever.

 
 
Sara Martino Håvard Rue
 

In their 2009 JASA paper, David Henderson, Richard Boys, Kim Krishnan, Conor Lawless and Darren Wilkinson described their Bayesian emulation and calibration of a stochastic computer model of mitochondrial DNA deletions in substantia nigra neurons.

    Moreover, Athanasios Micheas and Christopher Wikle investigated their hierarchical non-overlapping random disc growth model, Pulak Ghosh, Sanjib Basu and Ram Tiwari performed a Bayesian analysis of cancer rates using parametric and semi-parametric joinpoint regression models, Jeff Gill and George Casella described their specification and estimation of non-parametric priors for ordinal social science models, and Susie Bayarri, Jim Berger and seven worthy co-authors predicted vehicle crashworthiness by validating computer models for functional and hierarchical data.

    Not to forget Christopher Paciorek and Jason McLachlan’s mapping of ancient forests. These worthy co-authors developed Bayesian inferential techniques for spatio-temporal trends in forest composition using the fossil pollen proxy record.

    Beat that! JASA excelled itself in 2009. However, never one to take the back seat, our very own Bradley Efron described his empirical Bayes estimates for large-scale prediction problems. Do you remember the proud days of yore when you were shrinking group means towards zero and inventing the bootstrap, Brad?


Still in 2009, Eva Riccomagno and Jim Smith reported their geometry of causal probability trees which are algebraically constrained, in the co-edited volume Optimal Design and Related Areas in Optimization and Statistics.

    Gee whiz, Charlie Brown! I never knew that Statistics could prove causality. I’m sure that the eighteenth century philosopher David Hume (see my Ch.2) wouldn’t have approved of causal probability trees unless they were called something different.

    And Fabio Rigat and Jim Smith published their non-parametric dynamic time series approach, which they applied to the detection of neural dynamics, in the Annals of Applied Statistics, a wonderful contribution.

    Professor Jim Q. Smith’s highly innovative Bayesian research began at the University of Warwick during the 1970s, and it is still continuing unabated. He published his book Bayesian Decision Analysis with Cambridge University Press in 2010, and has recently worked on military training applications of his ‘decision making under conflict’ procedures.

    That’s an interesting application of posterior expected loss, guys. I hope that the soldiers benefit from maximising their expected utility. Of course, if they’re dead then optimal long term expectations won’t help them.

    Jim is currently the holder of an EPSRC grant with Liz Dowler and Rosemary Collier to investigate ways groups of experts can ensure coherence of their judgements when managing food crises. That’s a tall order, Jim. Many of our working population are currently starving and struggle to remain both coherent and in the land of the living.

    To cap Jim’s exploits, M.H. Rahaman Khan and Ewart Shaw published a paper in the International Journal of Interdisciplinary Social Sciences in 2009. In this paper they reported their hierarchical modelling approach for investigating the determinants of contraceptive use in Bangladesh.

    That reminds me of the time I got a rubber stuck in my ear during a chess tournament in Oshkosh, Wisconsin, guys. All the medics just stood around and laughed. But I won the tournament and became champion of North-East Wisconsin for 1992.

    Ewart Shaw is Principal Teaching Fellow in Statistics at the University of Warwick. He is also a well-published researcher, and his research interests include Bayesian Inference, numerical methods, number theory, coding theory, computer algebra in statistics, survival analysis, medical statistics and splines.

    Ewart’s a big contributor and deserves more of the cherry pie.

    Speaking of cherry pie, Lyle Broemeling’s 2009 text Bayesian Methods for Measures of Agreement draws on data taken from various studies at the University of Texas MD Anderson Cancer Center. An admirable enterprise.

    And Mohammad Raqab and Mohamed Madi co-authored an article in Metron in 2009 describing their Bayesian analysis for the exponentiated Rayleigh distribution.

    Madi is Professor of Statistics and Associate Dean of Economics and Business at the United Arab Emirates University, and a prolific Bayesian researcher.

    Raqab is Professor of Statistics at the University of Jordan in Amman.



In 2009, ISBA awarded the prestigious De Groot Prize to Carl Edward Rasmussen and Christopher K.I. Williams for their book Gaussian Processes for Machine Learning.

    To cap that, Sandy Zabell described his philosophy of inductive logic from a Bayesian perspective in The Development of Modern Logic (ed. by Leila Haaparanta for Oxford University Press). Sandy would’ve doubtlessly got on well with Richard Price.

    In the same year, Sebastjan Strasek, Stefan Lang and numerous co-authors published a paper in the Annals of Epidemiology with the impressively long title,

Use of penalized splines in extended Cox-type hazard regression to flexibly estimate the effect of time-varying serum uric acid on risk of cancer incidence: a prospective population study in 78,850 men.


Stefan Lang
 

    Stefan Lang is a University Professor of Statistics in Innsbruck. His research interests include Bayesian semi-parametric regression, and applications in marketing science, development economics and insurance mathematics.
 

Rob Kass wrote an excellent analysis of Sir Harold Jeffreys’ legacy in Statistical Science in 2009, largely by reference to Jeffreys’ Theory of Probability. Sir Harold often approximated the posterior distribution by a normal distribution centred on the maximum likelihood estimate, and he was also a great fan of Bayes factors. So he weren’t perfect.

    Gene Hwang, Jing Qiu and Zhigen Zhao of Cornell University and the University of Missouri reported their empirical Bayes confidence intervals in 2009 in JRSSB. Their estimates and intervals smooth and shrink both the means and the variances in the heterogeneous one-way ANOVA model.

    Quite remarkably, the authors make exactly the same exchangeable prior assumptions for the treatment means and log-variances that I proposed in my 1973 Ph.D. thesis and in my 1975 paper in Technometrics. However, the authors derive some elegant, but approximate double-shrinkage confidence intervals, empirically estimate the hyperparameters in quite appealing fashion, and algebraically derive some outstanding frequency coverage probabilities. Perhaps I should put their solution into another Appendix. I only wish that I’d had the nous to derive these very useful results myself.
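
    For readers unfamiliar with this set-up, here is a small illustrative sketch in Python of 'double shrinkage' of group means and log-variances in the one-way ANOVA model under known hyperparameters. The toy data, the plug-in prior centres and the prior variances are assumptions of my own, and the sketch is not the authors' empirical Bayes interval construction.

import numpy as np
rng = np.random.default_rng(1)

# Illustrative one-way ANOVA data: three groups of eight observations each
groups = [rng.normal(mu, sd, size=8) for mu, sd in [(0.0, 1.0), (0.5, 2.0), (1.0, 0.5)]]
means = np.array([g.mean() for g in groups])
logvars = np.array([np.log(g.var(ddof=1)) for g in groups])
n = np.array([len(g) for g in groups])

tau2, omega2 = 0.5, 0.5      # assumed (known) prior variances for the means and log-variances

# Shrink the group means towards the overall mean (conditional on the sample variances)
s2 = np.exp(logvars)
w = tau2 / (tau2 + s2 / n)                        # shrinkage weights
shrunk_means = w * means + (1 - w) * means.mean()

# Shrink the log sample variances towards their average; Var(log s_i^2) is roughly 2/(n_i - 1)
v = 2 / (n - 1)
u = omega2 / (omega2 + v)
shrunk_logvars = u * logvars + (1 - u) * logvars.mean()

print("raw means:   ", means.round(2), " shrunk:", shrunk_means.round(2))
print("raw log-vars:", logvars.round(2), " shrunk:", shrunk_logvars.round(2))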


BAYESIAN GENOMICS
: In 2009, the much-respected high-flying professors Matthew Stephens and David Balding of the University of Chicago and Imperial College London published their paper, ‘Bayesian statistical methods for genetic association studies’ in Nature Reviews Genetics.

    The authors write,

 

Bayesian statistical methods have recently made great inroads into many areas of science, and this is now extending to the assessment of association between genetic variants and disease or other phenotypes. We review these methods, focussing on single SNP tests in genome-wide association studies. We discuss the advantages of the Bayesian approach over classical (frequency) approaches in this setting and provide tutorials in basic analysis steps, including practical guidelines for appropriate prior specification. We demonstrate the use of Bayesian methods for fine mapping in candidate regions, discuss meta-analysis and provide guidance for refereeing manuscripts that contain Bayesian analysis.


The approach subsequently reported by the authors depends almost entirely on the sorts of Bayes factors which I have critiqued during the course of this concise history, and Matthew and David do make some attempts to address the paradoxes, prior sensitivity problems, and poor frequency properties that are usually associated with these ‘measures of evidence’. However, Matthew and David could circumvent these difficulties by associating a Baskurt-Evans-style Bayesian p-value with each of their Bayes factors.

    Why don’t you try perturbing your conditional prior distributions under your composite alternative hypotheses with a tiny blob of probability way out in the right tail, guys? I think that your Bayes factors, however fine-tuned, would go bananas. Keep the mass of the blob of probability constant and zoom it right off towards infinity. I think that the tails of the ‘outlier prone’ mixture priors you select will still be too thin to adequately accommodate this.


[Professor Balding kindly responded to these comments in early January 2014 by partly agreeing with them. He feels that he and his co-author should have focussed more on an estimation, rather than a Bayes factor approach. However, Professor Stephens has advised us that their Bayes factors would remain stable if we let a blob of probability in the right tail zoom off to infinity. This surprises me since their conditional prior distributions under the alternative hypothesis refer to mixtures of thin-tailed normal distributions. However, the Bayes factors will anyway be highly dependent upon the choices of these conditional prior distributions. David advises me that Bayes factors were first proposed in this context by Peter Donnelly and his co-authors in 2007 in their landmark paper in Nature.]
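
    Curious readers can check the behaviour of such a perturbed Bayes factor numerically in a few lines. The Python sketch below contrasts a point null against a contaminated alternative prior; the data summary, the prior scale tau and the blob mass eps are purely illustrative assumptions of mine, and not the priors used by Stephens and Balding.

import numpy as np
from scipy.stats import norm

# H0: theta = 0 versus H1: theta ~ (1 - eps) * N(0, tau^2) + eps * (point mass at c),
# with the data summarised by xbar ~ N(theta, sigma^2 / n)
n, sigma, xbar = 25, 1.0, 0.5          # illustrative data summary
se = sigma / np.sqrt(n)
tau, eps = 1.0, 0.01                   # assumed prior scale and blob mass

def bayes_factor_01(c):
    m0 = norm.pdf(xbar, 0.0, se)                              # marginal density under H0
    m_normal = norm.pdf(xbar, 0.0, np.sqrt(tau**2 + se**2))   # N(0, tau^2) component of H1
    m_blob = norm.pdf(xbar, c, se)                            # blob of prior mass at theta = c
    return m0 / ((1 - eps) * m_normal + eps * m_blob)

for c in (1.0, 5.0, 50.0, 1e3, 1e6):
    print(f"blob at c = {c:g}: BF01 = {bayes_factor_01(c):.4f}")

    Whether the resulting numbers go bananas or settle down as the blob heads off to infinity, I leave for the reader to judge.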

The derivations proposed by Stephens and Balding do depend heavily on specific genetic assumptions. Perhaps we should look for something entirely different, like a genetic-assumption-free direct data analysis of the statistical observations provided, which could then be parametrized by a mainstream statistical sampling model.


During 2009, Byron Morgan and his several co-authors published their book Bayesian Analysis for Population Ecology with CRC Press.

 
Byron Morgan
 

    Byron Morgan is Professor of Applied Statistics at the University of Kent, and his research interests include Bayesian methods and population dynamics, stochastic models for molecular biology, and statistical ecology.


In their 2010 article in the Annals of Applied Statistics, Xia Wang and Dipak Dey applied their Bayesian generalised extreme value regression methodology for binary response data to an application to electronic payments system adoption. Xia is an assistant professor at the University of Cincinnati, with an interest in applying her Bayesian ideas to genomics and proteomics data, and spatial and spatio-temporal statistics. She is clearly a very bright young researcher. In their article in the JSM 2011 Proceedings, she and Nell Sedransk reported their analysis of some Bayesian models on biomarker discovery using spectral count data in the label-free environment.

 
Xia Wang
 

    The eminent Indian statistician Dipak Dey has co-authored several papers with Xia Wang. He is a Distinguished Professor of Statistics at the University of Connecticut, and his many influential Bayesian and decision theoretic publications include his 1998 book Practical Nonparametric and Semiparametric Bayesian Statistics.

    Xia is in good company. Her buddy Nell Sedransk is the Associate Director of the U.S. National Institute of Statistical Sciences, and Professor of Statistics at North Carolina State University.

    During the late 1970s, Nell and her husband Joe were two of the last faculty members to leave the once celebrated Department of Statistics at SUNY at Buffalo, which did not resurrect itself until several years afterwards. Nell and Joe were always very dynamic, and Nell has made many Bayesian contributions. Her application areas of interest include physiology, medicine, multi-observer scoring in the social sciences, and ethical designs for clinical trials.

 
Nell Sedransk
 

    Joe Sedransk is currently Professor of Statistics at Case Western Reserve University in Cleveland. He has also published a number of important Bayesian papers. I met Nell and Joe when I gave a seminar at SUNY at Albany in 1978. They were very kind and hospitable and took me for a drive in Vermont. I ate a massive T-bone steak for lunch, and felt guilty afterwards because it cost so much.


In their 2010 articles in JASA, J. McLean Sloughter, Tilmann Gneiting and Adrian Raftery used Bayesian model averaging and ensembles for probabilistic wind forecasting, and Lu Ren, David Dunson, Scott Lindroth and Lawrence Carin attained the fabled heights by analysing music using dynamic semi-parametric Bayesian models. Scott Lindroth is Professor of Music and Vice-Provost for the Arts at Duke University.

    Moreover, Kwang Woo Ahn, Kung-Sik Chan and my old friend Michael Kosov addressed a problem in pathogen diversity using their Bayesian inferences for incomplete multinomial data, and Soma Dhavala and six determined co-authors performed a gene expression analysis of their bovine salmonella data by reference to their Bayesian modeling of MPSS data.

    Not to be outdone, Jonathan Stroud and four quite predictable co-authors developed an ensemble Kalman filter and smoother for assimilating satellite data, and Morgan C. Wang, Mike Daniels, Daniel Scharfstein and Susan Land proposed a Bayesian shrinkage model for incomplete longitudinal data and applied it to a breast cancer prevention trial.


Also in 2010, Teddy Seidenfeld, Mark Schervish and Jay Kadane reported their coherent choice functions under uncertainty in Synthese, and Jay Kadane published ‘Amalgamating Bayesian experts: a sceptical view’ in Rethinking Risk Measurement and Reporting, Vol. 1. All deep blue high Bayesian stuff.

    Good on you, Jay! I’m sceptical too.

    In the same year, Tom Fearn et al reported their inverse, classical, and non-parametric calibrations in a Bayesian framework, in the context of infrared spectroscopy, in the Journal of Near Infrared Spectroscopy.

    Tom is Head of Statistical Science at UCL. He has worked in many application areas, including food and agriculture, analytic chemistry, and medicine. His 1975 Biometrika paper ‘A Bayesian Approach to Growth Curves’ came out of his UCL Ph.D. thesis. His Ph.D. supervisor was Dennis Lindley, and his department is still as active as ever, though in less charismatic ways.    

    Manuel Wiesenfarth and Thomas Kneib attempted in 2010 to use Bayesian geoadditive selection models in JRSSC to correct for non-randomly selected data in a two-way hierarchy, a most ambitious task worthy of Hercules himself.

    The authors’ selection equation was formulated as a two-way binary probit model, and the correlations between the response variables were induced by a latent Gaussian model representation. Uniform priors were assumed for the parametric effects, Bayesian P-splines were used to model the non-parametric effects, a Markov random field was used to model the spatial effects, and further prior assumptions were made to facilitate a full hierarchical Bayesian analysis. The MCMC-computed posterior inferences led to a very interesting analysis of a set of relief supply data from communities in Pakistan which were affected by the 2005 earthquake in the Azad Jammu and Kashmir region. A wonderful piece of work.

    In the same year, Qi Long, Rod Little, and Xihong Lin addressed, in their paper in Applied Statistics, the conceptually formidable task of estimating ‘causal’ effects in trials involving multi-treatment arms. They applied their theoretically intense Bayesian procedure to the analysis of the ‘women take pride’ cardiac bothersome data, and reported the posterior mean and standard deviation of a variety of ‘causal’ effects, together with the corresponding 95% Bayesian credible intervals. A difficult article to unravel, but I am sure that it was all good stuff.

    Christophe Andrieu, Arnaud Doucet and Roman Holenstein read a long invited discussion paper to the Royal Statistical Society in 2009, and it was published the following year in Series B of the Society’s journal, at which time the new terminology ‘Particle MCMC’ was thrust like a ray of beaming sunlight into the Bayesian literature. Particle MCMC can be used to deal with high dimensionality and complex patterns of dependence in statistical models. Whether PMCMC works well in practice, only time will tell. Maybe we need a special journal for purely computational articles like this.

    In their 2010 article ‘Perceiving is believing: A Bayesian approach to explaining the positive symptoms of schizophrenia’ in Nature Reviews Neuroscience, Paul Fletcher and Chris Frith of the Universities of Cambridge and Aarhus write:


Advances in cognitive neuroscience offer us new ways to understand the symptoms of mental illness by uniting basic neurochemical and neurophysiological observations with the conscious experiences that characterize these symptoms. Cognitive theories about the positive symptoms of schizophrenia —hallucinations and delusions--- have tended to treat perception and belief formation as distinct processes. However, recent advances in computational neuroscience have led us to consider the unusual perceptive experiences of patients and their sometimes bizarre beliefs as part of the same core abnormality---a disturbance in error-dependent updating of inferences and beliefs about the world. We suggest that it is possible to understand these symptoms in terms of a disturbed hierarchical Bayesian framework, without recourse to separate considerations of experience and belief.

Thank you, gentlemen. Perhaps you should consider calcification of the pineal gland as a confounding variable.

    Sonia Petrone and Piero Veronese co-authored ‘Feller Operators and mixture priors in Bayesian Non-Parametrics’ in 2010 in Statistica Sinica.

    Sonia is a Full Professor of Statistics at Bocconi University. She will be the next president of ISBA in 2014. Good luck, Sonia!

 
 
Sonia Petrone (ISBA President 2014)
 

    In 2010, Murray Aitkin published his path breaking book Statistical Inference: An Integrated Bayesian/Likelihood Approach. Rather than using Bayes factors or DIC, Murray refers to likelihood ratios as the primary measure of evidence for statistical model parameters and for the models themselves. He then uses Bayesian non-informative inferences to interpret the likelihood ratios.

    For further discussion, see Author’s Notes (below). I have always admired Murray Aitkin’s ingeniously pragmatic approach to Bayesian inference.


Alicia Carriquiry and three courageous co-authors reported their Bayesian assessment of the effect of highway passes on crashes and crash rates in 2011 in the Journal of Safety Research.

    Alicia is Professor of Statistics and Associate Provost at Iowa State. She served as president of ISBA in 2001.

    In 2011, Jay Kadane was awarded the much-sought-after De Groot prize by ISBA for his eloquently written book Principles of Uncertainty.

    Daniel Thorburn published his conference paper ‘Bayesian probability and methods’ in the Proceedings of the first Africa-Sweden Conference in Mathematics in the same year.

    Thorburn is Professor of Statistics at the University of Stockholm, and he has, with Håvard Rue, done much to foster the Bayesian paradigm in Scandinavia.

    Rob Kass published an invited discussion paper in Statistical Science in 2011 entitled ‘Statistical Inference: The Big Picture’.

    Kass wrote to the effect that,

Statistics has moved beyond the frequentist-Bayesian controversy of the past. Instead, a philosophy compatible with statistical practice, which I refer to here as ‘statistical pragmatism’, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasises the assumptions that connect statistical models with observed data.

    In his diagram of the ‘big picture’, Rob connects the data of the real world with the interactions between the scientific models and statistical models of the theoretical world, and indicates that the data and the interaction between the scientific and statistical models together imply the statistical and scientific conclusions.

    That’s all too true, Professor Kass. You’re teaching some of us grannies how to suck eggs. But don’t forget to emphasise that the statistician should be interacting with the scientific expert as often as he can during this process.


In the same year, William Kleiber, Adrian Raftery and Tilmann Gneiting co-authored a paper in JASA entitled ‘Geostatistical model averaging for locally calibrated probabilistic quantitative precipitation forecasting’. I am sure that they took Rob Kass’s notion of statistical pragmatism to heart.

    In 2011, Jennifer Hill published her paper Bayesian Non-Parametric Modeling for Causal Inference in the Journal of Computational and Graphical Statistics.

    Jennifer's very insightful second line of work pursues strategies for exploring the impact of violations of typical assumptions that require that all possibly confounding variables have been measured. She is a very dynamic Associate Professor of Applied Statistics at NYU.
 
Jennifer Hill
 

Brian Reich, Montserrat Fuentes and David Dunson proposed some brand new methods for Bayesian spatial quantile regression in their outstanding 2011 lead article in JASA.

    As they were a touch keener on money supply, Alejandro Cruz-Marcelo, Katherine Ensor and Gary Rosner investigated corporate bonds by using a semi-parametric hierarchical model to estimate term structure.

    David Dunson retaliated by investigating non-parametric Bayes stochastically ordered latent class models with Hong-Xia Yang and Sean O’Brien.

    In a rare, high quality, single-authored paper, Yhua Zhao analysed high throughput assays by reference to posterior probabilities and expected rates of discovery in multiple hypothesis testing.

    And Shane Jensen and Stephen Shore rolled out the barrel by using semi-parametric Bayesian modeling to investigate volatility heterogeneity.

    Other highlights of 2011 included the tutorial Bayesian Non-Parametrics with Peter Orbanz of Columbia University and Yee Whye Teh of Oxford University, which was taught in the context of Machine Learning,

    A novel data-augmentation approach in JASA by J. Ghosh and Merlise Clyde employing Rao-Blackwellization for variable selection and model averaging in linear and binary regression,

    A paper in Demography by Adrian Raftery and six co-authors entitled, ‘Probabilistic Properties of The Total Fertility Rate of All Countries’,

    An article in the Proceedings of Maximum Entropy by Jonathan Botts and Ning Xiang on Bayesian inference for acoustic impedance,

And,

    A splendid paper in JASA by Qing Zhou on multi-domain sampling with application to the structural inference of Bayesian networks.

 
 

George Casella (1951-2012), a leading figure in the field of Statistics, passed away in June 2012, after a nine-year battle with multiple myeloma. He was 61.

    According to Ed George and Christian Robert,

    George Casella’s influence on research and education in Statistics was broad and profound. He published over 200 articles, co-authored nine books, and mentored 48 MS and Ph.D. students. His publications included high impact contributions to Bayesian analysis, clustering, confidence estimation, empirical Bayes, frequentist decision theory, hypothesis testing, model selection, Monte Carlo methods, and ridge regression. Of his books, Statistical Inference (with Roger Berger) became the introduction of choice to mathematical statistics for vast numbers of graduate students; this is certainly the book that had the most impact on the community at large.

    In 1996, George joined a legendary figure of Statistics, Erich Lehmann, to write a thorough revision of the already classical Theory of Point Estimation which Lehmann had written himself in 1983. This collaboration resulted in a more modern, broader, and more profound book that continues to be a key reference for courses in mathematical statistics.

    An ISI highly cited researcher, George Casella was elected a Foreign Member of the Spanish Royal Academy of Sciences, selected as a Medallion lecturer for the IMS, and received the Distinguished Alumnus Award from Purdue University. His laughter remains with us. Forever.


In 2012, David Rios Insua, Fabrizio Ruggeri and Michael Wiper published their splendid volume on Bayesian Analysis of Stochastic Process Models. This follows Rios Insua’s and Ruggeri’s publication in 2000 of their co-edited collection of lecture notes, Robust Bayesian Analysis.

    Michael Wiper is on the faculty of the Carlos the Third University of Madrid. His fields of interest include Bayesian Statistics, inference for stochastic processes, and software reliability.

    Fabrizio Ruggeri is a research director at the Italian National Research Council’s Institute of Mathematical Applications and Information Technology in Milan. He was an outstanding president of ISBA in 2012, and his areas of Bayesian application include healthcare, quality and reliability.

    David Rios Insua is a full professor at Rey Juan Carlos University in Madrid. He is the son and disciple of Sixto Rios (1913-2008), the father of Spanish Statistics. Sixto founded the Decision Analysis group of the Spanish Royal Academy of Sciences during the 1960s, and David is the youngest Fellow of the same illustrious academy. He has applied his Bayesian methodology to neuronal networks, adversarial risk analysis, counter-terrorism, and many other areas.

 
David Rios Insua
 

    Jesús Palomo, who was one of Rios Insua’s and Ruggeri’s Ph.D. students, was a cited finalist for ISBA’s Savage Prize in 2004. His thesis title was Bayesian methods in bidding processes.


In his JASA paper of 2012, Tristan Zajonc reported his Bayesian inferences for dynamic treatment regimes, and used them to improve mobility, equity and efficiency in student tracking.

    In the same issue, Alexandro Rodrigues and Peter Diggle used Bayesian estimation and prediction for low-rank doubly stochastic log-Gaussian Poisson process models, with fascinating applications in criminal surveillance.

    If that wasn’t enough to put me into a neurosis, Lane Burgette and Jerome Reiter described some non-parametric Bayesian imputation techniques when some data are missing due to the mid-study switching of measurement methods, and, to cap that, the ubiquitous Valen Johnson and David Rossell investigated Bayesian model selection criteria in high-dimensional settings.

    In their 2012 JASA discussion paper, the remarkable Bayesian quintet consisting of Ioanna Manolopoulou, Melanie Matthew, Michael Calaban, Mike West and Thomas Kepler employed Bayesian spatio-dynamic modeling in cell motility studies, in reference to nonlinear taxic fields guiding the immune response.

    Meanwhile, the hyperactive quartet of Laura Hatfield, Mark Boye, Michelle Hackshaw and Bradley Carlin used multi-level models to predict survival times and longitudinal patient-reported outcomes with many zeros.

    And in the lead article of the 500th issue of the Journal of the American Statistical Association, the Famous Five, namely William Astle, Maria De Iorio, Sylvia Richardson, David Stephens and Timothy Ebbels, investigated their Bayesian model of NMR spectra for the deconvolution and quantification of metabolites in complex biological mixtures.

    Also in the celebrated 500th issue, Michelle Danaher, Anindya Roy, Zhen Chen, Sunni Mumford and Enrique Schisterman analysed the BioCycle study using Minkowski-Weyl priors for models with parameter constraints. A vintage year for the connoisseurs!


During August 2012, Tony O’Hagan led an interdisciplinary ISBA online debate entitled Higgs Boson-Digest and Discussion about the statistics relating to the Higgs Boson and the Large Hadron Collider, which concerned the physicists’ predetermined standards for concluding that a particle resembles the elusive Boson. The physicists require a test statistic to be at least five standard errors from a null hypothesis, but didn’t understand how to interpret this, e.g. in relation to practical significance when the confidence interval is very narrow. Some of the participants in the debate ignored the possibility that the test statistic might not be approximately normally distributed. While the physicists take their observed counts to be Poisson distributed, this assumption could itself be inaccurate since it could be influenced by overdispersion. However, a number of imaginative solutions were proposed by the Bayesian participants.
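
    For the record, the five-sigma rule corresponds to a tiny one-sided tail probability under normality, and even modest overdispersion in the counts would water it down appreciably. Here is a small illustrative calculation in Python; the overdispersion factors are assumptions of mine, not figures from the ISBA debate.

from scipy.stats import norm

z = 5.0
print("one-sided p-value at 5 sigma:", norm.sf(z))     # roughly 2.9e-07 under normality

# If the Poisson counts were overdispersed by an assumed factor phi, the standard
# error would be inflated by sqrt(phi) and the effective number of sigmas deflated
for phi in (1.0, 1.5, 2.0):
    z_eff = z / phi ** 0.5
    print(f"phi = {phi}: effective sigma = {z_eff:.2f}, one-sided p = {norm.sf(z_eff):.2e}")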

 
 
Simulated particle traces from an LHC collision in which a Higgs Boson is produced.
Image Credit: Lucas Taylor
 

Other highlights of the year 2012 included the following papers in the Royal Statistical Society’s journals:

Species non-exchangeability in probabilistic ecotoxicological risk assessment, by Peter Craig, Graeme Hickey, Robert Luttik and Andy Hart,

Space-time modelling of coupled spatiotemporal environmental variables, by Luigi Ippoliti, Pasquale Valentini and Dani Gamerman,

Bayesian L-optimal exact design for experiments with biological kinetic models, by Steven Gilmour and Luzia Trinca

Combining outputs from the North American Regional Climate Change Assessment Program by using a Bayesian hierarchical model, by Emily Kang, Noel Cressie and Stephan Sain,

And,

Variable selection for high dimensional Bayesian density estimation: application to human exposure simulation, by Brian Reich and his four worthy but under-exposed co-authors.


Thomas Bayes was the real winner in the US presidential election on 6th November 2012, according to a message to ISBA members from Charles Hogg. In 2010, Charles had published a magnificent discussion paper in Bayesian Analysis with Jay Kadane, Jong Soo Lee and Sara Majetich concerning their inferential Bayesian error analysis for small angle neutron scattering data sets.

    As reported by Charles Hogg, Nate Silver constructed a Bayesian model in 2008 to forecast the US general election results. Silver won fame for correctly predicting 49 of the 50 States, as well as every Senate race. That brought him a New York Times column and a much higher profile.

    In 2012, Nate’s continuing predictions that Obama would win earned him a curious backlash among pundits. Few of the criticisms had any merit, and most were mathematically illiterate, indignantly mocking the idea that the race was anything other than a toss-up. Nevertheless, Nate confounded his critics by correctly predicting every single state.

    Charles Hogg was quick to advise us that Nate did strike lucky in Florida. Nate ‘called’ this state with a 50.3% Bayesian probability, essentially the proverbial coin-toss, a lucky gold coin perhaps. Way to go, Mr. Silver! Did you use a Jeffreys prior or a conjugate one?

    In his 2013 Kindle book The Signal and the Noise, which is about Bayesian prediction in general, Nate Silver assigned a prior probability of 1/20000 to the event that at least one plane is intentionally crashed into a Manhattan skyscraper on a given day. He then used Bayes theorem to update his prior probability to a probability of 0.385 that one plane crash is part of a terrorist attack, and then to a probability of 99.99% that two plane crashes amount to a terrorist attack.
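
    The arithmetic behind these headline figures is the simplest possible sequential application of Bayes theorem. The prior of 1/20000 is Nate’s; the two likelihood values in the Python sketch below are illustrative assumptions of mine, chosen merely so that the sums roughly reproduce the posteriors quoted above, and they are not necessarily the values used in the book.

def update(prior, p_data_given_h, p_data_given_not_h):
    # posterior P(H | data) for a simple binary hypothesis via Bayes theorem
    numerator = prior * p_data_given_h
    return numerator / (numerator + (1 - prior) * p_data_given_not_h)

prior = 1 / 20000            # prior probability of a terror attack on that day (from the text)
p_crash_if_attack = 1.0      # assumed: a crash is certain if an attack is under way
p_crash_if_accident = 8e-5   # assumed daily probability of an accidental crash

post_one_crash = update(prior, p_crash_if_attack, p_crash_if_accident)
post_two_crashes = update(post_one_crash, p_crash_if_attack, p_crash_if_accident)
print(round(post_one_crash, 3), round(post_two_crashes, 4))   # roughly 0.385 and 0.9999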

 
Nate Silver
 

    Maybe Nate should rename his book The Sound and the Fury, though a guy called Bill Faulkner once used a title like that. Calling it The Power and the Glory would, nowadays, be much too naff.

    In July 2013, Nate left the New York Times for a role at ESPN and ABC news. Perhaps he will provide us with a Bayesian commentary on the next Superbowl game. I have a prior probability of 0.2179 that the Green Bay Packers will win, but a sneaking suspicion that the Cowboys will take it.
 

In the June 2013 issue of Scientific American, Hans Christian von Baeyer debates whether Quantum Bayesianism can fix the paradoxes of Quantum Mechanics. The author writes:

A new version of quantum theory sweeps away the bizarre paradoxes of the microscopic world.

The cost? Quantum information only exists in your imagination.

That’s a great application of the Bayesian paradigm, Professor von Baeyer, or I at least imagine so.


The 2013 volume Bayesian Theory and Applications, edited by Paul Damien, Petros Dellaportas, Nicholas Polson and David Stephens, contains 33 exciting papers and includes outstanding sections on dynamic models, and exchangeability.

    Andrew Lawson’s 2013 book Bayesian Disease Mapping: hierarchical modeling in spatial epidemiology highlighted some of the recent applications of the 2008 Rue-Martino computer package [71] for conditional Laplacian methodology.

    Various applications of INLA to Bayesian non-parametric phylodynamics are described by Julia Palacios and Vladimir Minin in the Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence.

    In the 2013 issue of Biometrics & Biostatistics, Xiao-Feng Wang of the Cleveland Clinic Lerner Research Institute describes some applications of INLA to Bayesian non-parametric regression and density estimation.

    INLA has arrived and the pendulum is beginning to swing! Maybe right back to the Halcyon days of the mathematically Bayesian theoretical research era of the 1970s and 1980s, no less, when Bayesians needed to know all sorts of stuff.


In his 2013 YouTube video What are Bayesian Methods?  Professor Simon French said that after he took his first statistics course in 1970 he thought that ANOVA was a Russian mathematician. However, his first course in Bayesian Statistics changed his life, and he subsequently learnt much more about the paradigm from Adrian Smith.

    Simon French is Director of the Risk Initiative and Statistical Consultancy Unit at the University of Warwick. His 1986 text Decision Theory was followed by his book Statistical Decision Theory with David Rios Insua. However, Simon’s work has now become generally more applied. He is looking at ways of supporting real-life decision makers facing major strategic and risk issues.


In their JASA 2013 articles, Roee Gutman, Christopher Afendulis and Alan Zaslavsky proposed a Bayesian file-linking procedure for analysing end-of-life costs, and Man-Wai Ho, Wanzhu Tu, Pulak Ghosh and Ram Tiwari performed a nested Dirichlet process analysis of cluster randomized trial data for geriatric care assessment;

    Riten Mitra, Peter Müller, Shoudan Liang, Lu Yue and Yuan Ji investigated a Bayesian graphical model for chIP-Seq data on histone modifications, Drew Linzer recounted the dynamic Bayesian forecasting of U.S. presidential elections, and Curtis Storlie and his five reliable co-authors reported their Bayesian reliability analysis of neutron-induced errors in high performance computing software;

    Yueqing Wang, Xin Jiang, Bin Yu and Ming Jiang threw aside their parasols and reported their hierarchical approach for aerosol retrieval using MISR data, Josue Martinez, Kirsten Bohn, Ray Carroll and Jeffrey Morris put an end to their idle chatter and described their study of Mexican free-tailed bat chirp principles, which employed Bayesian functional mixed models for non-stationary acoustic time series, while Juhee Lee, Peter Müller and Yuan Ji chummed up and described their nonparametric Bayesian model for local clustering with application to proteomics;

    Jonathan Rougier, Michael Goldstein and Leanna House exchanged vows and reported their second-order exchangeability analysis for multimodel ensembles, and Francesco Stingo, Michele Guindani, Marina Vannucci and Vince Calhoun presented the best possible image while describing their integrative modeling approach to imaging genetics.

 
 

In March 2013, the entire statistical world mourned the death of George Edward Pelham Box F.R.S. (1919-2013), who passed away in Madison, Wisconsin at the ripe old age of ninety-three, in the arms of his third wife Claire. As a wit, a kind man, and a statistician, he and his ‘Mr. Regression’ buddy Norman Draper have fostered many successful careers in American industry and academia. Pel was a son-in-law of Sir Ronald Fisher and the father, with his second wife Joan Fisher Box, of two of Fisher’s grandchildren. Pel’s life, works and immortality should be celebrated by all Bayesians, because of the multitudinous ways in which his virtues have enhanced the diversity, richness and scientific reputation of our paradigm.

    Born in Gravesend, Box was the co-inventor of the Box-Cox and Box-Muller transformations and the Box-Pierce and Ljung-Box tests, a one-time army sergeant who worked on the Second World War defences against Nazi poison gases, and an erstwhile scientist with Imperial Chemical Industries. His much-publicised first divorce transformed his career, and his older son Simon Box also survives him.

 
Soren Bisgaard making a presentation to George Box
 

    I envision Pel sitting there watching us from the twelfth floor of the crimson cube of Heaven, kicking his dog while the Archangels Stephen One and Stephen Two flutter to his bidding, and with a fine-tuned secromobile connection to the, as ever obliging, Chairman of Statistics in hometown Madison. Even in his retirement and maybe in death, George Box was the quintessential ripple from above.


On 27th August 2013, Alan Gelfand of Duke University received a Distinguished Achievement Medal from the ASA Section on Statistics and the Environment at the Joint Statistical Meetings in Montreal. The award recognizes his seminal work and leadership in Bayesian spatial statistics, in particular hierarchical modeling with applications in the natural and environmental sciences. Well done, Alan! You’re not just an MCMC whiz kid.

    I once nodded off during one of Alan’s seminars and missed out on the beef. When I woke up, I asked an utterly inane question. Ah well. You win some and you lose some.

    The other highlights of the year 2013 included the article in Applied Statistics by David Lunn, Jessica Barrett, Michael Sweeting and Simon Thompson of the MRC Biostatistics Unit in Cambridge on fully Bayesian hierarchical modelling with applications to meta-analysis. The authors applied their multi-stage generalised linear models to data sets concerning pre-eclampsia and abdominal aortic aneurism.

    Also, a paper in Statistics in Society by Susan Paddock and Terrance Savitsky of the RAND Corporation, Santa Monica on the Bayesian hierarchical semi-parametric modelling of longitudinal post-treatment outcomes from open enrolment therapy groups. The authors apply their methodology to a case-study which compares the post-treatment depressive symptom score for patients on a building discovery program with the scores for usual care clients.

    After reading the large number of Bayesian papers published in the March 2013 issue of JASA, in particular in the Applications and Case Studies section,  Steve Scott declared, on Google,

It clearly goes to show that Bayes has gone a long way from the intellectually appealing but too hard to implement approach to the approach that many practitioners now feel to be both natural and easy to use.
 

A RESPONSE TO SOME OF THE ISSUES RAISED IN DENNIS LINDLEY'S YOUTUBE INTERVIEW:

 

During the Fall of 2013, I participated in an on-line ISBA debate concerning some of the issues raised in Dennis Lindley’s 2013 YouTube interview by Tony O’Hagan, who phrased some of his rather probing questions quite craftily. I was somewhat concerned when Dennis, now in his 90th year, confirmed his diehard opposition to improper priors, criticised his own student Jose Bernardo, and confirmed that he’d encouraged Florence David to leave UCL in 1967. He, moreover, advised Tony that he’d wanted two colleagues at UCL to retire early because they weren’t ‘sufficiently Bayesian’. This led me to wonder how many other statistical (e.g. applied Bayesian) careers Dennis had negatively influenced due to his individualistic opinions about his subject. And whatever did happen to Dennis’s fabled velvet glove?

    I was also concerned about Dennis’s apparent long-surviving naivety in relation to the Axioms of Coherence and Expected Utility. He was still maintaining that only very simple sets of axioms are needed to imply their required objective, i.e. that you should be ‘coherent’ and behave like a Bayesian, and I found it difficult to decide whether he was ‘pulling the wool’, or unfortunately misguided, or, for some bizarre reason, perfectly correct. I was also more than a little concerned about Dennis’s rather bland adherence to the Likelihood Principle (LP) despite all the ongoing controversy regarding Birnbaum’s 1962 justification.

    I wonder what the unassuming Uruguayan mathematician Cesareo Villegas, who refuted the De Finetti axiom system as long ago as 1964, would have thought about all of this. He may well have been too tactful to say anything.



 

   

I was, quite surprisingly, able to effectively resolve some of the preceding academic issues during subsequent very helpful e-mail conversations with Peter Wakker of Erasmus University Rotterdam and Michael Evans of the University of Toronto. These effective resolutions are discussed in my Ch. 5 and, concerning Bayes Factors, in my Ch. 2.


BIRNBAUM’S THEOREM
: As a side-track to the highly confidential ISBA Internet debate, I was e-mailed from out of nowhere, and while sitting quietly in my flat watching the nuns in skimpy gowns floating through the former Royal Botanical Gardens opposite, by Professor Deborah Mayo of the Department of Philosophy at Virginia Tech who advised me that she’d been publishing counterexamples to Allan Birnbaum’s 1962 justification of the Likelihood Principle (LP) for years. This was the first of a torrent of e-mails from Deborah; she seems to have been in some sort of dispute with Michael Evans over this fundamental issue. But life as a pragmatic Bayesian is pleasantly quieter since I switched her off.

    See, for example, Deborah Mayo’s 2013 article On the Birnbaum argument for the strong likelihood principle, which is googlable online, and the various references traceable from therein, some of which are co-authored with Sir David Cox and reflect his apparent agreement with her conclusions. See also [80]. Mayo’s website contains even more anti-Bayesian rhetoric, and lots of wild and wonderful ideas and suggestions.

 
Deborah Mayo
 

    However, Michael would appear to have swallowed the pill and resolved the issue in his 2013 article What does the proof of Birnbaum’s Theorem prove? [81]

 

Michael Evans, President, Statistical Society of Canada

 

    Birnbaum claimed that the Sufficiency Principle (SP) and Conditionality Principle (CP) together imply the Likelihood Principle (LP). Various concerns have been raised in the literature about the proof of this result, e.g. by Jim Durbin, Jack Kalbfleisch, and Birnbaum himself. Michael has advised me, by personal communication and by reference to his 2013 paper, that

1. SP and CP do not in themselves imply LP.

2. Most of Deborah Mayo’s counterexamples to Birnbaum’s attempted proof are, quite surprisingly, perfectly correct and on the ball.

3. If we accept SP, CP, and ‘a class of equivalences associated with SP and CP’, then this is equivalent to accepting LP.

4. His 1986 claim, with Don Fraser and G. Monette, in the Canadian Journal of Statistics, that CP in itself implies LP, is incorrect.

5. If we accept CP and ‘a class of equivalences associated with CP’, then this is equivalent to accepting LP.

6. As to how you should try to ensure that the extra assumptions in (3) and (5) are satisfied: that, at the moment, appears to be quite problematic.

    The repercussions are potentially enormous. Assuming that Michael’s latest proofs are sound, we might even be permitted to justify Bayesian procedures by their frequency coverage properties without breaking any rule at all. Stopping rules could become important again, and there might be no reason to slag off Jeffreys invariance priors, reference priors, or uniform minimum variance unbiased estimation. Whoopee!


NOTE ADDED APRIL 2021: I have since learnt from Phil Dawid, and subsequently from Michael Evans, that Birnbaum's proof is perfectly correct and non-tautologous, and that Mayo's approach is both incorrect and misleading.




 



Author’s Notes: As an alternative to just comparing DIC for different choices of sampling model, it is possible to compare the candidate models inferentially by calculating the entire posterior distribution of the deviance for each specified model, and then contrasting any two of these distributions e.g. by graphical density plots or ‘quasi-significance probabilities’ associated with an upper convex boundary on a quantile-quantile plot, or by entropy distances. This relates to the approach described by Murray Aitkin in his seminal 2010 book.

    Let MD denote the minimum deviance and p the number of parameters. Then, for many models the posterior distribution of D(θ)-MD will be approximately chi-squared with p degrees of freedom when the sample size is large, and MD may be approximated by E[D]-p, where E[D] denotes the posterior expectation of the deviance.

    Suppose we are comparing Model A with Model B, where Model A has larger deviance but smaller p. Then Model A can be said to be preferable to Model B if Shannon’s entropy distance between the two posterior distributions of deviance is less than zero. This provides an interesting alternative to DIC, and exact frequency calculations can be performed when contrasting nested models which are special cases of the linear statistical model. John Hsu and I are currently preparing a paper for possible publication.

    We are also considering ways that this approach can be used to derive an alternative to Box’s 1980 model-checking criterion [67], which Box believed to be appropriate whenever you do not have any alternative model in mind.
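
    As a toy illustration of the general idea (and emphatically not of the specific procedure that John and I are preparing), the following Python sketch computes the posterior distribution of the deviance for two simple competing models and contrasts the two distributions; the data, the models and the flat prior are illustrative assumptions only.

import numpy as np
rng = np.random.default_rng(0)

# Illustrative set-up: y_i ~ N(mu, 1); Model A fixes mu = 0 (p = 0),
# while Model B gives mu a flat prior, so its posterior is N(ybar, 1/n) (p = 1)
y = rng.normal(0.3, 1.0, size=30)
n, ybar = len(y), y.mean()

# Posterior distribution of the deviance (-2 log likelihood, constants dropped)
D_A = np.sum(y ** 2)                                    # a point mass under Model A
mu_draws = rng.normal(ybar, 1 / np.sqrt(n), size=10000)
D_B = np.sum((y[None, :] - mu_draws[:, None]) ** 2, axis=1)

# One simple way of contrasting the two deviance distributions
print("P(D_B < D_A) =", np.mean(D_B < D_A))

# Check of the chi-squared approximation mentioned above: for Model B, p = 1
print("mean of D_B - MD =", round(float(np.mean(D_B - D_B.min())), 2), "(close to p = 1)")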

 
 

Here are the relevant details for a representative selection of the Ph.D. students who have received a Savage Prize as ‘best thesis’ award from ISBA since the year 2000:

2000 (Co-Winner) Peter Hoff (University of Wisconsin-Madison)

Constrained Non-Parametric Estimation by Mixtures

2001 Luis Nieto-Barajas (University of Bath)

Bayesian Non-Parametric Survival Analysis via Markov Processes

2002 Marc Suchard (UCLA)

Model Building Processes and Selection in Bayesian Phylogenetic Reconstruction

2004 Shane Jensen (Harvard)

Statistical Techniques for examining gene regulation

2006 Surya Tokdar (Purdue)

Exploring Dirichlet Mixture and Logistic Gaussian Process Priors in Density Estimation, Regression and Sufficient Dimension Reduction

2010 Ricardo Lemos (University of Lisbon)

Hierarchical Bayesian Methods for the Marine Sciences: Analysis of Climate Variability and Fish Abundance.

2011 Kaisey Mandel (Harvard)

Improving cosmological distances to improve dark energy: hierarchical Bayesian models for Type Ia supernovae in the optical and near infra-red.


All 26 recipients between 2000 and 2012 of the ‘Theory and Methods’ and ‘Applied Methodology’ versions of the Savage Prize demonstrated a similar degree of acumen and creativity. Arnold Zellner would have been very proud.  

    In 2009, Ryan Prescott Adams of the University of Cambridge proceeded in the tradition, 51 years later, of the great exponent of Bayesian density estimation Peter Whittle by receiving an honourable Mitchell Prize mention for his Ph.D. thesis Kernel Methods for Nonparametric Bayesian Inference of Probability Densities and Point Processes.

    Ryan’s thesis work was supervised by the physicist and Regius Professor of Engineering David J.C. Mackay F.R.S. who also has research interests in Bayesian Inference.

    Ryan is currently an Assistant Professor of Computer Science at Harvard. His research interests include Artificial Intelligence, Computer Linguistics, and Computer Vision.

    The other brilliant young Bayesians who either received a Savage prize or an honourable mention between 2008 and 2012 were, in reverse order, Anirban Bhattacharya, Bernardo Nipoti, Arissek Chakraborty, Rebecca C. Steorts, Juan Carlos Martinez-Ovando, Guan Ho Jang, Fabian Scheipl, Julien Cornebise, Daniel Williamson, Ricardo Lemos, Robin Ryder, James G. Scott, Emily B. Fox, Matthew A. Taddy, Lorenza Trippa, Alejandro Jara, and Astrid Jullian.

 

May the force be with them!

 
Valencia 6 Cabaret (Archived from Brad Carlin's Collection)
 
 

