This notebook presents a Python module used to fit models to data with Bayesian inference, extracting the full posterior probability distribution of the model parameters. The original code can be found on the GitHub page.
The following notebook is a walkthrough of the module code with explanations.
Imports¶
We start by importing the modules we need for our Bayesian fitting analysis.
The important module here is emcee
. It will do the heavy lifting of running the Markov chain Monte Carlo (MCMC) sampling.
import numpy as np
import emcee
import pandas as pd
# plotting
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import ipywidgets as ipw
import corner
# plot parameters
plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["font.size"] = 14
Main function for fitting¶
We define in the following the function bayesian_fit
, which wraps around the emcee
package
with the specific goal of fitting functions to data.
We start from a list of data points $\vec x$ and $\vec y$ that we want to fit to a model as $y = f(x, \theta) + e$, where $f$ is a function that depends on a set of parameters $\theta$ and $e$ is random noise. The goal is to sample the posterior probability distribution of the model parameters, given (up to normalization) by $$ P(\theta | \vec x, \vec y, I ) \propto P(\vec x, \vec y | \theta, I) \, P(\theta | I), $$ where $$ P(\vec x, \vec y | \theta, I) = (2\pi\sigma^2)^{-N/2} \exp \Big( -\frac{ \sum_i \big(y_i - f(x_i, \theta)\big)^2 } {2 \sigma^2} \Big) $$ is the likelihood of the data points given the model $f$ and the random noise (modelled as Gaussian noise of standard deviation $\sigma$), and $P(\theta | I)$ is the prior distribution for the parameters $\theta$.
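Taking the logarithm (which is the quantity the code below actually evaluates), the unnormalized log-posterior splits into the sum of the log-prior and the log-likelihood:
$$ \ln P(\theta | \vec x, \vec y, I) = \ln P(\theta | I) - \sum_i \Big( \frac{ \big(y_i - f(x_i, \theta)\big)^2 }{2\sigma^2} + \ln \sigma \Big) + \mathrm{const}, $$
which, up to the additive constant, is what lnprob
below returns.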
We define the function lnprob
that evaluates this unnormalized probability function
(on a logarithmic scale)
$\ln P(\theta | \vec x, \vec y, I )$, to be given to the emcee.EnsembleSampler
.
It is itself made of two parts.
First, lnprior
computes the prior (log-scale) probability
$\ln P(\theta | I)$
of the combination of the parameters (called theta
).
Second, lnlike
is the (log-scale) likelihood function for our model,
$\ln P(\vec x, \vec y | \theta, I)$; the implementation below also supports complex-valued data by summing the squared residuals of the real and imaginary parts.
The resulting data collects the parameter values of all the different chains (or walkers) over the course of the run. The data is packaged in a pandas.DataFrame
object for convenience.
def bayesian_fit(func, data_x, data_y, lnprior_parameters, init_params, n_walkers=10, n_iterations=1000, varnames = None):
def lnprior(theta):
# the noise sigma is the last parameter and is treated separately
sigma = theta[-1]
lnprior_sum = lnprior_parameters[-1](sigma)
# Loop over the prior functions given as argument lnprior_parameters
for p, lnprior_p in zip(theta[:-1], lnprior_parameters[:-1]):
lnprior_sum += lnprior_p(p)
return lnprior_sum
def lnlike(theta, x, y):
sigma = theta[-1]
ymod = func(x, theta[:-1])
return -0.5 * np.sum(
(((y.real - ymod.real)**2 + (y.imag-ymod.imag)**2)/sigma**2)
+ 2*np.log(sigma)
)
def lnprob(theta, x, y):
# The log probabilities are summed
# the case of zero probability (or -inf log prob) is treated separately
lp = lnprior(theta)
if not np.isfinite(lp):
return -np.inf
else:
            return lp + lnlike(theta, x, y)
# The parameters are initialized with the functions init_params given as argument
# they are given as functions so they can be randomly generated
params_0 = [
np.array([ip() for ip in init_params])
for i in range(n_walkers)]
ndim = len(init_params)
    # An EnsembleSampler from emcee is constructed and run
sampler = emcee.EnsembleSampler(n_walkers, ndim, lnprob,
args=(data_x, data_y))
sampler.run_mcmc(params_0, n_iterations)
# The output is formatted as a Pandas DataFrame, with named variables
# if varnames are not given as arguments, variable names are generated
if varnames is None:
varnames = [ 'p{}'.format(i) for i in range(ndim-1) ]
varnames.append('sigma')
varnames = list(varnames)
    # a MultiIndex for the data frame is generated over the walkers and their iterations
iterations = range(n_iterations)
walkers = range(n_walkers)
index = pd.MultiIndex.from_product([walkers, iterations],
names=('Walker', 'Iteration'))
samples_df = pd.DataFrame(
sampler.chain.reshape((n_walkers*n_iterations, len(varnames))),
index=index, columns=varnames)
return samples_df
Creating the artificial data¶
# We choose a Lorentzian as the model
def lorentzian(f, params):
    f0, a, w, c = params
    return c + a * (w/2.)**2 / ((f - f0)**2 + (w/2.)**2)
# Artificial experimental data is generated by combining the true model with Gaussian noise
f0_0=1.0; c_0=1.0; a_0=0.2; w_0=0.02 # True model parameters
N=251
er=0.1 # noise sigma
freq = np.linspace(0.9, 1.1, N)
ideal = lorentzian(freq, (f0_0, a_0, w_0, c_0)) # ideal model
ampl = ideal + np.random.normal(0, er, N) # data with noise
# A new data frame including data and true model is generated for quick plotting
data_with_model = pd.concat([
pd.DataFrame(dict(freq=freq, ampl=ampl, type='data')),
pd.DataFrame(dict(freq=freq, ampl=ideal, type='model'))
])
sns.lineplot(data=data_with_model, x='freq', y='ampl', hue='type')
The graph above shows the artificial data that we wish to fit. In orange is the "true" model and in blue is the artificial data with random noise added to the ideal case.
Setting the priors¶
We define in the following two functions to use as priors. If the scale of a parameter is not clear a priori, it is useful to use a prior such as an exponential distribution. We will use exponential distributions here for all parameters except the "resonance" frequency of the Lorentzian, which will have a uniform prior bounded by the minimal and maximal x coordinates.
For data that gives a lot of information about the model (as is the case here), the priors have little influence on the outcome and are not too important.
def prior_exponential(pexp, p0=0, p1=np.inf):
def prior(p):
if p < p0 or p > p1:
return -np.inf
else:
return -p/pexp
return prior
def prior_uniform(p0=-np.inf, p1=np.inf):
def prior(p):
if p < p0 or p > p1:
return -np.inf
else:
return 0
return prior
lnprior_parameters = (
prior_uniform(p0=min(freq), p1=max(freq)),
prior_exponential(a_0, p0=0),
prior_exponential(w_0, p0=0),
prior_exponential(c_0, p0=0),
prior_exponential(er, p0=0),
)
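As a quick sanity check (an illustrative addition, not part of the original module), we can evaluate the priors directly: parameter values outside the allowed range get a log-probability of $-\infty$, while allowed values are penalized according to their scale.
# Illustrative check of the prior functions defined above
f0_prior = prior_uniform(p0=min(freq), p1=max(freq))
print(f0_prior(1.0))    # inside the frequency window -> 0
print(f0_prior(2.0))    # outside the window -> -inf
a_prior = prior_exponential(a_0, p0=0)
print(a_prior(a_0))     # -a_0/a_0 = -1.0
print(a_prior(-0.1))    # negative amplitude excluded -> -inf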
Setting the initial values¶
Each independent Markov chain can start with its own set of initial values for all the parameters. To explore as much of parameter space as possible and avoid getting stuck in local maxima of the likelihood, it is useful to use random values for the initial parameters. Note that if the values are too far from the optimum, the chains can take a large number of iterations to reach their steady state.
import random as r
init_params = [
lambda: r.gauss(f0_0, np.sqrt(1e-5)),
lambda: r.expovariate(1/a_0),
lambda: r.expovariate(1/w_0),
lambda: r.expovariate(1/c_0),
lambda: r.expovariate(1/er)
]
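Each call to these functions draws a fresh random starting point. For example (illustrative, mirroring what bayesian_fit does internally for every walker):
# one random starting point, in the same parameter order as lnprior_parameters
theta_0 = np.array([ip() for ip in init_params])
print(theta_0)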
Running the MCMC algorithm¶
result = bayesian_fit(
lorentzian, freq, ampl,
lnprior_parameters, init_params,
varnames = ('f0', 'a', 'w', 'c', 'sigma'),
n_walkers=20)
result_noind = result.reset_index()
result.describe()
result.head()
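Thanks to the (Walker, Iteration) MultiIndex, the chain of a single walker is easy to extract. For example (illustrative):
# all iterations of walker 0, as a DataFrame indexed by Iteration
walker0 = result.loc[0]
walker0['sigma'].head()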
Checking the walkers behavior¶
Walker autocorrelation¶
The walkers of the MCMC algorithm sample the desired probability distribution only once they reach their steady state. A necessary step after running the algorithm is to make sure this has happened. One way is to check the autocorrelation of the parameters in each chain. Far from equilibrium, a walker is still being attracted towards high-probability regions and there are significant correlations between subsequent values of a parameter. Once the chain is in its steady state, it samples the probability distribution, consecutive values of the parameter become decorrelated, and the autocorrelation drops close to zero.
In the following, we define the autocorr
function to compute the autocorrelation.
def autocorr(x):
    # subtract the mean so that the autocorrelation decays to zero
    x_var = x - x.mean()
    # full (unnormalized) autocorrelation; keep only the non-negative lags
    result = np.correlate(x_var, x_var, mode='full')
    autocor = result[result.size // 2:]
    # normalize by the zero-lag value so that autocorr(x)[0] == 1
    norm = np.sum(x_var**2)
    autocor = autocor / norm
    return autocor
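# Illustrative helper (not part of the original module): a rough estimate of the
# number of iterations after which a chain decorrelates, taken as the first lag
# where the autocorrelation defined above drops below 1/e.
def decorrelation_length(x, threshold=np.exp(-1)):
    ac = autocorr(np.asarray(x, dtype=float))
    below = np.where(ac < threshold)[0]
    return int(below[0]) if below.size else len(ac)
# e.g. decorrelation_length(result.loc[0]['f0'].values)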
def plot_walker_statistics(param, walkers):
    fig, (ax1, ax2) = plt.subplots(2)
    # top panel: traces of the selected parameter for the chosen walkers
    sns.lineplot(data=result_noind[result_noind.Walker.isin(walkers)],
                 x='Iteration', y=param, hue='Walker', ax=ax1)
    # bottom panel: autocorrelation of the same traces
    temp = []
    for w in walkers:
        ac = autocorr(result.loc[w][param].values)
        iters = range(len(ac))
        df = pd.DataFrame(dict(Iteration=iters, Autocorrelation=ac, Walker=w))
        temp.append(df)
    data_temp_autocor = pd.concat(temp)
    sns.lineplot(data=data_temp_autocor, x='Iteration', y='Autocorrelation',
                 hue='Walker', ax=ax2)
    plt.show()
walker_selector = ipw.SelectMultiple(
options=result.index.levels[0],
value=[0],
disabled=False
)
ipw.interactive(plot_walker_statistics, param=result.columns, walkers=walker_selector)
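If the notebook is viewed without a live kernel, the same diagnostics can be produced with a direct, non-interactive call (illustrative):
# static fallback: traces and autocorrelation for the first three walkers
plot_walker_statistics('f0', walkers=[0, 1, 2])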
Pick a convergence iteration step¶
Statistics about the actual distributions can only be taken after the chains have converged to their steady state. From observing the previous graphs, we find that in this case we have convergence after a few hundred steps.
iter_stable = 200
Corner plot¶
This is a convenient way to look at the correlations between the parameters in the final distribution.
We skip the first iter_stable
steps in each chain so that the remaining samples come from walkers that have converged and are sampling the correct probability distribution.
varnames = result.columns
inds_after_transient = result.index.get_level_values('Iteration') > iter_stable #indices when it has converged
means = result.loc[inds_after_transient].mean() # best values for each parameter
fig = corner.corner(result[inds_after_transient],
                    labels=varnames,
                    truths=means,
                    bins=25)
The graph above represents the different probability distributions of the parameters
f0
, a
, w
, c
, sigma
.
It shows the pairwise joint probability distributions as density plots
and the marginal distribution of each parameter as histograms.
The blue lines show the expectation value of each parameter, i.e. the best guess for the parameter value.
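A compact numerical summary complements the corner plot. For instance (an illustrative addition), the posterior mean and a 68% credible interval for each parameter can be read off directly from the post-transient samples:
# posterior mean and 16%/50%/84% quantiles for each parameter
posterior = result.loc[inds_after_transient]
posterior.describe(percentiles=[0.16, 0.5, 0.84]).T[['mean', '16%', '50%', '84%']]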
Uncertainty in data space¶
Above, we plotted the probability distribution for the different parameters of the fit.
Another important tool is to be able to see the uncertainty in fitting in data space.
In the following, we draw sample_size
possible fits by sampling the parameter distribution.
Alongside the original data and the best fit (given by the expectation values of the parameters),
we can plot an interval of one standard deviation around the average y-value obtained from
the distribution of fits.
import plotly.graph_objects as go
sample_size = 2000 # number of fits to evaluate
inds_after_transient = result.index.get_level_values('Iteration') > iter_stable
means = result.loc[inds_after_transient].mean() # best guess for parameter values
x_array_fit = np.linspace(freq.min(), freq.max(), 401) # x-array to use for fits
# we randomly sample the parameters and store the fitted values for each sample
y_models = np.zeros((x_array_fit.size, sample_size))
for i, (ind, fparams) in enumerate(result[inds_after_transient].sample(n=sample_size).iterrows()):
    y_models[:, i] = lorentzian(x_array_fit, fparams[:-1])
fig, ax = plt.subplots()
ax.scatter(freq, ampl, color='xkcd:cerulean')
ax.fill_between(x_array_fit,
np.mean(y_models, axis=1) - np.std(y_models, axis=1),
np.mean(y_models, axis=1) + np.std(y_models, axis=1),
color='gray', alpha=0.7, zorder=3)
ax.plot(x_array_fit, lorentzian(x_array_fit, means[:-1]),
'-', color='xkcd:crimson', zorder=4)
ax.set_xlabel('Frequency')
ax.set_ylabel('Amplitude')
ax.set_xlim(freq.min(), freq.max())
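As a final consistency check (illustrative), the inferred noise level can be compared with the value used to generate the artificial data:
# compare the inferred noise standard deviation with the true value er
print('true sigma:    ', er)
print('inferred sigma:', means['sigma'])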