- Maria Joao Rosa
- SPM Homecoming 2008
- Wellcome Trust Centre for Neuroimaging
Statistical formulations - P(A): probability of event A occurring
- P(A|B): probability of A occurring given B occurred
- P(B|A): probability of B occurring given A occurred
- P(A,B): probability of A and B occurring simultaneously (joint probability of A and B)
- Joint probability of A and B:
- P(A,B) = P(A|B)*P(B) = P(B|A)*P(A)
- Rearranging: P(B|A) = P(A|B)*P(B)/P(A)
- which is Bayes’ Rule
- Bayes’ Rule is very often referred to as Bayes’ Theorem, but it is not really a theorem, and should more properly be referred to as Bayes’ Rule (Hacking, 2001).
Reverend Thomas Bayes (1702 – 1761) - Bayes was a minister interested in probability who stated a form of his famous rule in the context of solving a somewhat complex problem involving billiard balls
- It was first stated by Bayes in his ‘Essay towards solving a problem in the doctrine of chances’, published in the Philosophical Transactions of the Royal Society of London in 1764.
Conditional probability - P(A|B): conditional probability of A given B
- Q: When are we considering conditional probabilities?
- A: Almost always!
- Examples:
- Lottery chances
- Dice tossing
Conditional probability - Examples (cont.):
- P(Brown eyes|Male): (P(A|B) with A := Brown eyes, B := Male)
- What is the probability that a person has brown eyes, ignoring everyone who is not a male?
- Ratio: (being a male with brown eyes)/(being a male)
- Probability ratio: probability that a person is both male and has brown eyes to the probability that a person is male
- P(Male) = P(B) = 0.52
- P(Brown eyes) = P(A) = 0.78
- P(Male with brown eyes) = P(A,B) = 0.38
- P(A|B) = P(B|A)*P(A)/P(B) = P(A,B)/P(B) = 0.38/0.52 ≈ 0.73
- Flipping it around (Bayes’ idea):
- We can now also calculate the probability of being male given brown eyes: P(B|A) = P(A|B)*P(B)/P(A) = P(A,B)/P(A) = 0.38/0.78 ≈ 0.487
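- A minimal Python check of these numbers, using the slide’s hypothetical population values:

```python
# Bayes' Rule with the slide's (hypothetical) population figures.
p_male = 0.52            # P(B)
p_brown = 0.78           # P(A)
p_male_and_brown = 0.38  # P(A,B)

# P(brown eyes | male) = P(A,B) / P(B)
p_brown_given_male = p_male_and_brown / p_male
print(f"P(brown | male) = {p_brown_given_male:.4f}")  # ~0.7308

# Flipping it around: P(male | brown eyes) = P(A,B) / P(A)
p_male_given_brown = p_brown_given_male * p_male / p_brown
print(f"P(male | brown) = {p_male_given_brown:.4f}")  # ~0.4872
```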
Statistic terminology - P(A) is called the marginal or prior probability of A (since it is the probability of A prior to having any information about B)
- Similarly:
- P(B): the marginal or prior probability of B
- P(A|B), viewed as a function of B for fixed A, is called the likelihood function for B.
- P(B|A): the posterior probability of B given A (since it depends on having information about A)
- Bayes’ Rule
- P(B|A) = P(A|B)*P(B)/P(A)
- P(A|B): the “likelihood” function for B (for fixed A)
- P(B|A): the “posterior” probability of B given A
- P(B), P(A): the prior probabilities of B and A (“priors”)
- It relates the conditional density of a parameter (the posterior probability) to its unconditional density (the prior, since it depends only on information present before the experiment).
- The likelihood is the probability of the data given the parameter and represents the data now available.
- Bayes’ Theorem for a given parameter θ:
- p(θ | data) = p(data | θ) p(θ) / p(data)
- 1/p(data) is basically a normalizing constant
- The prior is the probability of the parameter and represents what was thought before seeing the data.
- The posterior represents what is thought given both prior information and the data just seen.
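- To make prior, likelihood and posterior concrete, here is a minimal grid-approximation sketch in Python; the coin-flip data and the bell-shaped prior are illustrative assumptions, not from the talk:

```python
import numpy as np

# Grid of candidate parameter values theta (a probability of "success").
theta = np.linspace(0.01, 0.99, 99)

# Prior: what is thought before seeing the data (gently favouring 0.5).
prior = np.exp(-((theta - 0.5) ** 2) / 0.05)
prior /= prior.sum()

# Likelihood: probability of the data given theta (7 successes in 10 trials).
k, n = 7, 10
likelihood = theta**k * (1 - theta) ** (n - k)

# Posterior = likelihood * prior / p(data); p(data) is just the normalizer.
unnormalized = likelihood * prior
p_data = unnormalized.sum()
posterior = unnormalized / p_data

print("posterior mode:", theta[np.argmax(posterior)])
```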
Data and hypotheses… - We have two hypotheses: H0 (the null) and H1
- We have data (Y)
- We want to check whether the model we have (H1) fits our data (reject H0 / accept H1) or not (retain H0)
- Inferential statistics:
- what is the probability that we can reject H0 and accept H1 at some level of significance (α, p)?
- These are a priori decisions, made even when we don’t know what the data will be and how it will behave.
- Bayes:
- We get some evidence for the model (“likelihood”) and then can even compare “likelihoods” of different models
Where does Bayes Rule come in handy? - In diagnostic cases, where we are trying to calculate P(Disease | Symptom), we often know P(Symptom | Disease), the probability that you have the symptom given the disease, because this data has been collected from previous confirmed cases.
- In scientific cases, where we want to know P(Hypothesis | Result), the probability that a hypothesis is true given some relevant result, we may know P(Result | Hypothesis), the probability that we would obtain that result given that the hypothesis is true; this is often statistically calculable, as when we have a p-value.
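- A short numeric illustration of the diagnostic case (all figures hypothetical): given the prevalence and P(Symptom | Disease), Bayes’ Rule yields P(Disease | Symptom):

```python
# Hypothetical figures for a diagnostic problem.
p_disease = 0.01                 # prior: prevalence of the disease
p_symptom_given_disease = 0.90   # known from previous confirmed cases
p_symptom_given_healthy = 0.10   # false-alarm rate in the healthy

# Marginal P(Symptom) via the law of total probability.
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' Rule: P(Disease | Symptom)
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(f"P(Disease | Symptom) = {p_disease_given_symptom:.3f}")  # ~0.083
```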
Applicability to (f)MRI - Let’s take fMRI as a relevant example
- We have:
- Measured data: Y
- Model: X
- Model estimates: β, ε (error/variance)
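- As a sketch, this setup (the GLM Y = Xβ + ε) can be simulated and estimated in a few lines of numpy; the design and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # number of scans

# Design matrix X: two alternating conditions plus a constant column.
x1 = np.tile([1.0, 0.0], n // 2)
x2 = np.tile([0.0, 1.0], n // 2)
X = np.column_stack([x1, x2, np.ones(n)])

# Simulate data Y = X*beta + error for a chosen "true" beta.
beta_true = np.array([1.5, 1.0, 10.0])
Y = X @ beta_true + rng.normal(0.0, 1.0, n)

# Ordinary least-squares estimates of beta and the error variance.
beta_hat, res_ss, *_ = np.linalg.lstsq(X, Y, rcond=None)
sigma2_hat = res_ss[0] / (n - X.shape[1])
print("beta estimates:", beta_hat, "error variance:", sigma2_hat)
```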
What do we get with inferential statistics? - T-statistics on the betas (β = (β1, β2, …)), taking error into account: for a specific voxel we would ONLY get the chance (e.g. < 5%) of obtaining such data if there were NO effect (e.g. of β1 > β2)
- But what about the likelihood of the model???
- What are the chances/likelihood that β1 > β2 at some voxel or region?
- Could we get some quantitative measure of that?
What do we get with Bayes statistics? - Here, the idea (Bayes) is to use our post-hoc knowledge (our data) to estimate the model, also allowing us to compare hypotheses (models) and see which fits our data best
- P(X|Y) = P(Y|X)*P(X)/P(Y), i.e. for the model parameters θ: P(θ|Y) = P(Y|θ)*P(θ)/P(Y)
- P(θ|Y): the “posterior” distribution of the parameters given the data Y
- P(Y|θ): the “likelihood” of Y given θ
- P(θ), P(Y): the prior probabilities (“priors”)
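- As a sketch of what this buys us for a single parameter, here is the conjugate Gaussian update in Python; the prior width, noise variance and simulated data are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data: y = x*beta + noise, with known noise variance for simplicity.
n, beta_true, sigma2 = 50, 0.8, 1.0
x = rng.normal(size=n)
y = x * beta_true + rng.normal(0.0, np.sqrt(sigma2), n)

# Gaussian prior on beta: mean 0, variance tau2.
tau2 = 1.0

# Conjugate update: posterior precision = prior precision + data precision.
post_prec = 1.0 / tau2 + (x @ x) / sigma2
post_var = 1.0 / post_prec
post_mean = post_var * (x @ y) / sigma2  # prior mean is 0, so no prior term

print(f"posterior for beta: mean {post_mean:.3f}, sd {np.sqrt(post_var):.3f}")
```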
- Now to Steve about the practical sides in SPM…
Bayes for Beginners: Applications SPM uses priors for estimation in… - spatial normalization
- segmentation
- EEG source localisation
- and Bayesian inference in…
- Posterior Probability Maps (PPM)
- Dynamic Causal Modelling (DCM)
Null hypothesis significance testing - Standard approach in science is the null hypothesis significance test (NHST)
- Low p value suggests “there is not nothing”
- The assumption is that H0 = noise, randomness
- e.g. H0 = molecules are randomly arranged in space
- Looking unlikely…
- Krueger (2001) American Psychologist
Something vs nothing - … if there is any effect…
- Our interpretations ultimately depend on p(H0)
- “Risky” vs “safe” research…
- Better to be explicit – incorporate subjectivity when specifying hypotheses.
- Belief change = p(H0) – p(H0 | D)
- If the underlying effect δ ≠ 0, no matter how small, the test statistic grows in size with the sample – is this physiological?
The case for the defence - The law of large numbers means that the test statistic will identify a consistent trend (δ ≠ 0) with a sufficient sample size
- In SPM, we look at images of statistics, not effect sizes
- A highly significant statistic may reflect a small non-physiological difference, with large N
- BUT… as long as we are aware of this, classical inference works well for common sample sizes
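- A quick simulation of this caveat (the effect size δ and noise level are arbitrary choices): even a tiny non-zero effect yields an arbitrarily large t statistic once N is big enough:

```python
import numpy as np

rng = np.random.default_rng(2)
delta = 0.02  # tiny, arguably non-physiological, true effect
for n in (100, 10_000, 1_000_000):
    sample = rng.normal(delta, 1.0, n)           # n noisy measurements
    t = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    print(f"N = {n:>9}: t = {t:6.2f}")           # t grows roughly as sqrt(n)
```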
- Posterior Probability Distribution: combine prior and data by precision weighting (λ = precision, M = mean; subscripts: d = data, p = prior)
- λ_post = λ_d + λ_p
- λ_post * M_post = λ_d * M_d + λ_p * M_p
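- A numeric check of this precision-weighted combination (the means and precisions below are made-up values):

```python
# Hypothetical prior and data summaries (M = mean, lam = precision).
M_p, lam_p = 0.0, 1.0  # prior: mean 0, precision 1
M_d, lam_d = 2.0, 4.0  # data:  mean 2, precision 4

# Posterior precision is the sum of the two precisions...
lam_post = lam_d + lam_p
# ...and the posterior mean is the precision-weighted average of the means.
M_post = (lam_d * M_d + lam_p * M_p) / lam_post

print(f"posterior: mean {M_post:.2f}, precision {lam_post:.2f}")  # 1.60, 5.00
```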
(1) Bayesian model comparison - BUT!!! What is p(H0) for randomness?!
- Reframe the question – compare alternative hypotheses/models:
- If only one model, then p(y) is a normalising constant…
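- One way to make the comparison concrete is to compute the model evidence p(y|m) for two simple models of the same data and take their ratio (a Bayes factor); the coin-flip data and the two models below are illustrative assumptions:

```python
from scipy.integrate import quad
from scipy.stats import binom

# Data: k successes in n trials (hypothetical).
k, n = 7, 10

# Model 1: "pure chance", theta fixed at 0.5.
evidence_m1 = binom.pmf(k, n, 0.5)

# Model 2: theta unknown, flat prior on [0, 1]; the evidence integrates it out.
evidence_m2, _ = quad(lambda th: binom.pmf(k, n, th), 0.0, 1.0)

bayes_factor = evidence_m2 / evidence_m1
print(f"p(y|m1) = {evidence_m1:.4f}, p(y|m2) = {evidence_m2:.4f}, "
      f"BF(m2 vs m1) = {bayes_factor:.2f}")
```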
Practical example (1) Dynamic causal modelling (DCM) - In “classical” SPM, no priors (equivalently, flat priors)
- In “full” Bayes, priors might be from theoretical arguments or from independent data
- In “empirical” Bayes, priors derive from the same data, assuming a hierarchical model for generation of the data
- Parameters of one level can be made priors on distribution of parameters at lower level
- Parameters and hyperparameters at each level can be estimated using EM algorithm
Shrinkage prior - In the absence of evidence to the contrary, parameters will shrink to zero
Practical example (2) Posterior Probability Maps - Posterior probability distribution p(θ|Y)
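- A sketch of the PPM logic at a single voxel, assuming a Gaussian posterior for the effect (all values illustrative): the voxel is reported when p(θ > γ | Y) exceeds a chosen probability, e.g. 0.95:

```python
from scipy.stats import norm

# Hypothetical Gaussian posterior for the effect size at one voxel.
post_mean, post_sd = 1.2, 0.4

gamma = 0.5  # effect-size threshold of interest
p_effect = norm.sf(gamma, loc=post_mean, scale=post_sd)  # p(theta > gamma | Y)

print(f"p(effect > {gamma} | Y) = {p_effect:.3f}")
# e.g. colour this voxel in the PPM if the probability exceeds 0.95
```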
(3) Use informative priors (cutting edge!) - Spatial constraints on fMRI activity (e.g. grey matter)
- Spatial constraints on EEG sources, e.g. using fMRI blobs
(4) Tasters – The Bayesian Brain
(4a) Taster: Modelling behaviour… - Ernst & Banks (2002) Nature
(4b) Taster: Modelling the brain… - Friston (2005) Phil Trans R Soc B
Acknowledgements and further reading - Previous MFD talks
- Jean & Guillaume’s SPM course slides
- Krueger (2001) Null hypothesis significance testing. Am Psychol 56: 16-26
- Penny et al. (2004) Comparing dynamic causal models. Neuroimage 22: 1157-1172
- Friston & Penny (2003) Posterior probability maps and SPMs. Neuroimage 19: 1240-1249
- Friston (2005) A theory of cortical responses. Phil Trans R Soc B 360: 815-836
- www.ualberta.ca/~chrisw/BayesForBeginners.pdf
- www.fil.ion.ucl.ac.uk/spm/doc/books/hbf2/pdfs/Ch17.pdf
Bayes’ ending - Bunhill Fields Burial Ground
- off City Road, EC1