Bayesian

From Intamap

General Bayesian ideas

Without wishing to provoke debate, here are a few words on the Bayesian view of the world. In classical frequentist statistics, probability is viewed as the proportion of times an event occurs 'in the long run'. This requires us to believe that the events we are trying to model probabilistically are somehow repeatable. In our application this is not really the case: the event simply is. For example, the radiation field at a given time is what it is; there is no repeatability and, at least above the quantum level, no uncertainty in the world itself. The uncertainty arises from our incomplete knowledge. This is what Bayesians try to model: probability is a degree of belief.


A bit about Bayesian inference

Most Bayesian inference for some parameter $\theta$ proceeds as follows:

  1. define the prior probability distribution over the parameter $\theta$, $p(\theta)$ (this expresses any knowledge we have about $\theta$ before we observe any data).
  2. specify your likelihood model for your observations $x$, $p(x|\theta)$ (this expresses your noise model on the observations and the manner in which the observations are linked to your parameter).
  3. update your prior beliefs about $\theta$ to your posterior distribution using the definition of conditional probability (Bayes' theorem): $p(\theta|x) = p(x|\theta)p(\theta)/p(x)$.
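
As an illustration of the three steps, here is a minimal sketch for a toy problem that is not taken from the Intamap work: inferring a coin's probability of heads from 7 heads and 3 tails. The uniform prior, the Bernoulli likelihood, the grid resolution and the function name `grid_posterior` are all assumptions made for this example. The parameter is discretised on a grid so that the normalisation can be done numerically.

```python
def grid_posterior(heads, tails, n_grid=1001):
    """Posterior density over a coin's heads probability theta on a regular grid."""
    step = 1.0 / (n_grid - 1)
    thetas = [i * step for i in range(n_grid)]
    # Step 1: prior p(theta). A uniform density on [0, 1] is assumed here.
    prior = [1.0 for _ in thetas]
    # Step 2: likelihood p(x | theta) for independent Bernoulli observations.
    likelihood = [t ** heads * (1.0 - t) ** tails for t in thetas]
    # Step 3: Bayes' theorem. p(x) is approximated with the trapezoidal rule.
    unnormalised = [lik * pri for lik, pri in zip(likelihood, prior)]
    evidence = step * (sum(unnormalised) - 0.5 * (unnormalised[0] + unnormalised[-1]))
    posterior = [u / evidence for u in unnormalised]
    return thetas, posterior

thetas, posterior = grid_posterior(heads=7, tails=3)
step = thetas[1] - thetas[0]
# Posterior mean, again by numerical integration over the grid.
posterior_mean = step * sum(t * p for t, p in zip(thetas, posterior))
# Grid point with the highest posterior density.
posterior_mode = thetas[posterior.index(max(posterior))]
```

For this toy model the exact posterior is a Beta(8, 4) distribution, so the grid result can be checked against a mean of 2/3 and a mode of 0.7. A grid only works for one or two parameters; it is used here purely to make the steps concrete.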

Note that the difficulty in Bayesian inference is generally the calculation of $p(x) = \int p(x|\theta)p(\theta) \, d\theta$, a normalising constant often called the "evidence" (statistics / machine learning / information theory) or the "partition function" (statistical physics), and generally denoted by $Z$. If this integral cannot be evaluated, we cannot normalise the posterior density and so do not have a proper probability distribution.
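
To make the role of the evidence concrete, the sketch below evaluates the integral numerically for a model where it is also known in closed form. The model is an assumption chosen for illustration: a Gaussian prior on $\theta$ with mean 0 and variance 1, and a single observation with Gaussian noise of variance 0.25, for which $p(x)$ is a Gaussian with mean 0 and variance 1.25. The function names, the observed value and the integration limits are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def evidence_quadrature(x, prior_mean, prior_var, noise_var,
                        lo=-20.0, hi=20.0, n=4001):
    """Trapezoidal approximation of Z = integral of p(x | theta) p(theta) d theta."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        theta = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * normal_pdf(x, theta, noise_var) * normal_pdf(theta, prior_mean, prior_var)
    return total * step

x_obs = 1.3  # hypothetical observation
z_numeric = evidence_quadrature(x_obs, prior_mean=0.0, prior_var=1.0, noise_var=0.25)
# Closed form for this conjugate model: x ~ N(prior_mean, prior_var + noise_var).
z_exact = normal_pdf(x_obs, 0.0, 1.0 + 0.25)
```

Quadrature of this kind is only practical in very low dimensions, since the cost of a regular grid grows exponentially with the number of parameters.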

Sampling based Bayesian inference

The most common approach when this integral cannot be derived analytically is to use Monte Carlo methods, in particular Markov chain Monte Carlo (MCMC) techniques, which draw samples from the posterior without needing the normalising constant. The problem with these is that they take a long time: to be sure we have converged to the correct stationary distribution we have to run the chain for a long time and ensure that it is mixing well.
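
A minimal sketch of one such technique, random-walk Metropolis, is given here for a toy coin-flip posterior (7 heads, 3 tails, uniform prior). The target, the proposal width, the chain length, the burn-in length and the function names are all assumptions chosen for illustration. Note that only ratios of the unnormalised posterior appear, so $p(x)$ is never computed.

```python
import math
import random

def log_unnormalised_posterior(theta, heads=7, tails=3):
    """log of p(x | theta) p(theta) for coin flips with an assumed uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # no prior mass outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(log_target, start, n_samples, proposal_sd, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = current + rng.gauss(0.0, proposal_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target ratio); Z cancels in the ratio.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)
    return samples, accepted / n_samples

samples, acceptance_rate = metropolis(log_unnormalised_posterior,
                                      start=0.5, n_samples=20000, proposal_sd=0.2)
kept = samples[2000:]  # discard an assumed burn-in period
posterior_mean = sum(kept) / len(kept)
```

The exact posterior mean for this toy model is 2/3, which gives a rough check on the chain. Even here, thousands of iterations are needed for a one-parameter problem, which illustrates why sampling becomes slow for larger models.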

Variational Bayesian inference

At Aston, as in many other places, we have been developing variational alternatives to address the integral problem in Bayesian inference. The approach we adopt here is to replace the posterior, whose normalising integral we cannot evaluate, with an approximating distribution for which the integral can be evaluated, and then to make that approximation as good as possible using optimisation. This removes (at least partially, often fully) the need to sample, replacing it with a (hard) optimisation problem, which will typically run much faster. This means we can apply Bayesian approaches in almost real time, but there are several limitations:

  1. it is not super fast; the optimisation typically requires several iterations to converge.
  2. we do not undertake a full uncertainty analysis; higher-level parameters tend to be fixed in any iteration.
  3. we have only a few years of experience, and many issues have yet to be addressed, such as change of support, robust estimation, treating huge datasets and robust parameter estimation.
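
The variational idea described above can be sketched on a toy model. This is not the algorithm developed at Aston; it is a generic illustration under stated assumptions: a Gaussian prior (mean 0, variance 1), one observation with Gaussian noise (variance 0.25), and a Gaussian approximation $q(\theta)$ with mean `m` and variance `s2`. The approximation is fitted by plain gradient ascent on the evidence lower bound (ELBO), a quantity that never exceeds $\log p(x)$. All names, the step size and the iteration count are hypothetical. The model is chosen so that the exact answer is known and the optimisation can be checked.

```python
import math

X_OBS, PRIOR_MEAN, PRIOR_VAR, NOISE_VAR = 1.3, 0.0, 1.0, 0.25  # assumed toy values

def elbo(m, s2):
    """E_q[log p(x | theta)] + E_q[log p(theta)] + entropy of q, with q = N(m, s2)."""
    expected_log_lik = (-0.5 * math.log(2.0 * math.pi * NOISE_VAR)
                        - ((X_OBS - m) ** 2 + s2) / (2.0 * NOISE_VAR))
    expected_log_prior = (-0.5 * math.log(2.0 * math.pi * PRIOR_VAR)
                          - ((m - PRIOR_MEAN) ** 2 + s2) / (2.0 * PRIOR_VAR))
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s2)
    return expected_log_lik + expected_log_prior + entropy

def fit_gaussian_q(n_iter=500, step=0.05):
    """Gradient ascent on the ELBO over m and log(s), which keeps s2 positive."""
    m, log_s = 0.0, 0.0
    for _ in range(n_iter):
        s2 = math.exp(2.0 * log_s)
        grad_m = (X_OBS - m) / NOISE_VAR - (m - PRIOR_MEAN) / PRIOR_VAR
        grad_log_s = 1.0 - s2 * (1.0 / NOISE_VAR + 1.0 / PRIOR_VAR)
        m += step * grad_m
        log_s += step * grad_log_s
    return m, math.exp(2.0 * log_s)

initial_elbo = elbo(0.0, 1.0)
m_fit, s2_fit = fit_gaussian_q()
final_elbo = elbo(m_fit, s2_fit)
# Exact values for this conjugate model, used only as a check.
exact_var = 1.0 / (1.0 / NOISE_VAR + 1.0 / PRIOR_VAR)
exact_mean = exact_var * (X_OBS / NOISE_VAR + PRIOR_MEAN / PRIOR_VAR)
log_evidence = (-0.5 * math.log(2.0 * math.pi * (PRIOR_VAR + NOISE_VAR))
                - (X_OBS - PRIOR_MEAN) ** 2 / (2.0 * (PRIOR_VAR + NOISE_VAR)))
```

Because the true posterior here is itself Gaussian, the fitted approximation recovers it and the ELBO reaches the log evidence. In realistic models the approximating family does not contain the true posterior, the bound stays strictly below $\log p(x)$, and the optimisation still needs several iterations, in line with the first limitation listed above.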