Model fitting with Bayes

This page presents a Python module for fitting models to data using Bayesian inference, extracting the full probability distribution of the fitted model parameters. The original code can be found on the GitHub page.

The following notebook is a walkthrough of the module code with explanations.

Bayesian_fitting_walkthrough_notebook

Imports

We start by importing the modules we need for our Bayesian fitting analysis. The important module here is emcee; it does the heavy lifting of running the Markov chain Monte Carlo (MCMC) sampling.

In [1]:
import numpy as np
import emcee
import pandas as pd

# plotting
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import ipywidgets as ipw
import corner
In [2]:
# plot parameters

plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["font.size"] = 14

Main function for fitting

In the following we define the function bayesian_fit, which wraps the emcee package with the specific goal of fitting functions to data.

We start from a list of data points $\vec x$ and $\vec y$ that we want to fit to a model as $y = f(x, \theta) + e$, where $f$ is a function that depends on a list of parameters $\theta$ and $e$ is random noise. The goal is to sample the posterior probability distribution of the parameters, given by $$ P(\theta | \vec x, \vec y, I ) \propto P(\vec x, \vec y | \theta, I) P(\theta | I), $$ where $$ P(\vec x, \vec y | \theta, I) = (2\pi\sigma^2)^{-N/2} \exp \Big( -\frac{ \sum_i (y_i - f(x_i, \theta) )^2 } {2 \sigma^2} \Big) $$ is the likelihood of the data points given the model $f$ and the random noise (modelled as Gaussian noise of standard deviation $\sigma$), and $P(\theta | I)$ is the prior distribution for the parameters $\theta$.
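
On a logarithmic scale, which is the form actually passed to the sampler below (with the noise level $\sigma$ simply appended to $\theta$ as one extra parameter), the unnormalized posterior is just the sum of the log-likelihood and the log-prior, $$ \ln P(\theta | \vec x, \vec y, I ) = -\frac{1}{2} \sum_i \Big( \frac{ (y_i - f(x_i, \theta))^2 }{\sigma^2} + 2 \ln \sigma \Big) + \ln P(\theta | I) + \mathrm{const}, $$ where the constant does not depend on the parameters and can be dropped, since MCMC only ever needs ratios of probabilities.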

We define the function lnprob that evaluates this unnormalized probability function (on a logarithmic scale), $\ln P(\theta | \vec x, \vec y, I )$, to be given to the emcee.EnsembleSampler. It is itself made of two parts. First, lnprior computes the (log-scale) prior probability $\ln P(\theta | I)$ of the combination of parameters (called theta). Second, lnlike is the log-likelihood function for our model, $\ln P(\vec x, \vec y | \theta, I)$.

The resulting data consists of all the parameter values visited by the different chains (or walkers) as they were run. The data is packaged in a pandas.DataFrame object for convenience.

In [3]:
def bayesian_fit(func, data_x, data_y, lnprior_parameters, init_params, n_walkers=10, n_iterations=1000, varnames = None):

    def lnprior(theta):
        # the noise sigma is the last parameter and is treated separately
        sigma = theta[-1] 
        lnprior_sum = lnprior_parameters[-1](sigma)
        # Loop over the prior functions given as argument lnprior_parameters
        for p, lnprior_p in zip(theta[:-1], lnprior_parameters[:-1]):
            lnprior_sum += lnprior_p(p)
        return lnprior_sum

    def lnlike(theta, x, y):
        # Gaussian log-likelihood; written so that it works for real as well as
        # complex data (for real numpy arrays the imaginary part is simply zero)
        sigma = theta[-1]
        ymod = func(x, theta[:-1])
        return -0.5 * np.sum( 
                (((y.real - ymod.real)**2 + (y.imag - ymod.imag)**2)/sigma**2) 
                + 2*np.log(sigma) 
                )

    def lnprob(theta, x, y):
        # The log prior and log likelihood are summed;
        # the case of zero prior probability (i.e. -inf log prob) is treated separately
        lp = lnprior(theta)
        if not np.isfinite(lp):
            return -np.inf
        return lp + lnlike(theta, x, y)

    # The parameters are initialized with the functions init_params given as argument
    # they are given as functions so they can be randomly generated
    params_0 = [
            np.array([ip() for ip in init_params])
            for i in range(n_walkers)]

    ndim = len(init_params)
    # An EnsembleSampler from emcee is constructed and ran
    sampler = emcee.EnsembleSampler(n_walkers, ndim, lnprob, 
            args=(data_x, data_y))
    sampler.run_mcmc(params_0, n_iterations)

    
    # The output is formatted as a Pandas DataFrame, with named variables
    
    # if varnames are not given as arguments, variable names are generated
    if varnames is None:
        varnames = [ 'p{}'.format(i) for i in range(ndim-1) ]
        varnames.append('sigma')
    varnames = list(varnames)

    # a MultiIndex for the data frame is generated over the walkers and their iterations
    iterations = range(n_iterations)
    walkers = range(n_walkers)
    index = pd.MultiIndex.from_product([walkers, iterations], 
            names=('Walker', 'Iteration'))
    samples_df = pd.DataFrame(
            sampler.chain.reshape((n_walkers*n_iterations, len(varnames))),
                index=index, columns=varnames)

    return samples_df

Creating the artificial data
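
The model used in this example is a Lorentzian line shape with resonance frequency $f_0$, amplitude $a$, width $w$ and constant offset $c$, matching the lorentzian function defined in the cell below: $$ f(\nu; f_0, a, w, c) = c + a \, \frac{ (w/2)^2 }{ (\nu - f_0)^2 + (w/2)^2 }. $$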

In [4]:
# We choose a lorentzian as a model
def lorentzian(f, params):
    f0, a, w, c = params
    return c+a* (w/2.)**2/ ((f-f0)**2 + (w/2.)**2)


# Artificial experimental data is generated by combining the true model with Gaussian noise
f0_0=1.0; c_0=1.0; a_0=0.2; w_0=0.02 # True model parameters
N=251
er=0.1 # noise sigma
freq = np.linspace(0.9, 1.1, N)
ideal = lorentzian(freq, (f0_0, a_0, w_0, c_0)) # ideal model
ampl = ideal + np.random.normal(0, er, N) # data with noise

# A new data frame including data and true model is generated for quick plotting
data_with_model = pd.concat([
    pd.DataFrame(dict(freq=freq, ampl=ampl, type='data')), 
    pd.DataFrame(dict(freq=freq, ampl=ideal, type='model'))
    ])

sns.lineplot(data=data_with_model, x='freq', y='ampl', hue='type')
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x121c8b7d0>

The graph above shows the artificial data that we wish to fit. In orange is the "true" model and in blue is the artificial data with random noise added to the ideal case.

Setting the priors

We define in the following two functions used to build the priors. If the scale of a parameter is not clear a priori, it is useful to use a prior such as an exponential distribution. We will use exponential distributions here for all parameters except the resonance frequency of the Lorentzian, which will have a uniform prior bounded by the minimal and maximal x coordinates.

For data that carries a lot of information about the model (as is the case here), the priors have very little influence on the outcome and their exact form is not too important.
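
Concretely, the two prior builders defined below correspond, up to normalization constants that the sampler does not need, to $$ \ln P(p | I) = -\frac{p}{p_\mathrm{exp}} \qquad \text{and} \qquad \ln P(p | I) = 0 $$ for $p_0 \le p \le p_1$ (exponential and uniform, respectively), and to $\ln P(p | I) = -\infty$ outside these bounds.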

In [5]:
def prior_exponential(pexp, p0=0, p1=np.inf):
    def prior(p):
        if p < p0 or p > p1:
            return -np.inf
        else:
            return -p/pexp
    return prior

def prior_uniform(p0=-np.inf, p1=np.inf):
    def prior(p):
        if p < p0 or p > p1:
            return -np.inf
        else:
            return 0
    return prior

lnprior_parameters = (
        prior_uniform(p0=min(freq), p1=max(freq)),
        prior_exponential(a_0, p0=0),
        prior_exponential(w_0, p0=0),
        prior_exponential(c_0, p0=0),
        prior_exponential(er, p0=0),
        )

Setting the initial values

Each independent Markov chain can start with its own set of initial values for all the parameters. To explore as much of parameter space as possible and avoid getting stuck in local maxima of the likelihood, it is useful to use random values for the initial parameters. Note that if the values are too far from the optimum, the chains can take a large number of iterations to reach their steady state.

In [6]:
import random as r

init_params = [ 
    lambda: r.gauss(f0_0, np.sqrt(1e-5)), 
    lambda: r.expovariate(1/a_0), 
    lambda: r.expovariate(1/w_0), 
    lambda: r.expovariate(1/c_0),
    lambda: r.expovariate(1/er)
    ]

Running the MCMC algorithm

In [7]:
result = bayesian_fit(
    lorentzian, freq, ampl,
    lnprior_parameters, init_params, 
    varnames = ('f0', 'a', 'w', 'c', 'sigma'), 
    n_walkers=20)
result_noind = result.reset_index()
In [8]:
result.describe()
Out[8]:
f0 a w c sigma
count 20000.000000 20000.000000 20000.000000 20000.000000 20000.000000
mean 0.999767 0.172092 0.017253 1.000919 0.098480
std 0.001912 0.039168 0.008004 0.049344 0.007436
min 0.986814 0.014477 0.001673 0.087887 0.009479
25% 0.998627 0.145172 0.012444 0.998352 0.095148
50% 0.999915 0.169716 0.015785 1.004483 0.098227
75% 1.001103 0.195491 0.019900 1.010475 0.101493
max 1.005690 0.408553 0.126936 2.401208 0.282673
In [9]:
result.head()
Out[9]:
f0 a w c sigma
Walker Iteration
0 0 0.999097 0.192663 0.022864 0.593881 0.098276
1 0.999651 0.177869 0.023431 0.850410 0.116356
2 0.999651 0.177869 0.023431 0.850410 0.116356
3 0.999651 0.177869 0.023431 0.850410 0.116356
4 0.999651 0.177869 0.023431 0.850410 0.116356

Checking the walkers behavior

Walker autocorrelation

The walkers of the MCMC algorithm sample the desired probability distribution only once they reach their steady state. A necessary step after running the algorithm is to make sure this has happened. One way is to check the autocorrelation of the parameters in each chain. Far from equilibrium, a walker is attracted by high-probability regions and there are significant correlations between subsequent values of the parameters. Once a chain is in its steady state, it samples the probability distribution, consecutive values of the parameters are decorrelated, and the autocorrelation is close to zero.

In the following, we define the autocorr function to compute the autocorrelation.
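
The normalized autocorrelation computed by autocorr is $$ \hat{\rho}(k) = \frac{ \sum_t (x_t - \bar{x})(x_{t+k} - \bar{x}) }{ \sum_t (x_t - \bar{x})^2 }, $$ where $x_t$ is the value of a given parameter at iteration $t$ of a single walker and $k$ is the lag. By construction $\hat{\rho}(0) = 1$, and it should decay towards zero once the chain has equilibrated.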

In [10]:
def autocorr(x):
    x_var = x - x.mean()
    result = np.correlate(x_var, x_var, mode='full')
    autocor = result[result.size // 2:]
    norm = np.sum(x_var**2)
    autocor = autocor/ norm
    return autocor

def plot_walker_statistics(param, walkers):
    fig, (ax1, ax2) = plt.subplots(2)
    sns.lineplot(data=result_noind[result_noind.Walker.isin(walkers)], 
                        x='Iteration', y=param, hue='Walker', ax=ax1)
    
    temp = []
    for w in walkers:
        ac = autocorr(result.loc[w][param].values)
        iters = range(len(ac))
        df = pd.DataFrame(dict(Iteration=iters, Autocorrelation=ac, Walker=w))
        temp.append(df)
    data_temp_autocor = pd.concat(temp)
    sns.lineplot(data=data_temp_autocor, x='Iteration', y='Autocorrelation', hue='Walker', ax=ax2)
    plt.show()

walker_selector = ipw.SelectMultiple(
    options=result.index.levels[0],
    value=[0],
    disabled=False
)
    
ipw.interactive(plot_walker_statistics, param=result.columns, walkers=walker_selector)

Pick a convergence iteration step

Statistics about the actual distributions can only be taken after the chains have converged to their steady state. From observing the previous graphs, we find that in this case convergence is reached after a few hundred steps.

In [11]:
iter_stable = 200
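
As an optional, more quantitative cross-check (a sketch that is not part of the original module, and which assumes emcee version 3), we can estimate the integrated autocorrelation time of one parameter chain with emcee's built-in estimator emcee.autocorr.integrated_time. The samples are reshaped from the DataFrame into the (n_steps, n_walkers) layout the estimator expects; a chain that is too short, or that still contains the initial transient, may raise an AutocorrError or give an inflated estimate.

n_w = len(result.index.levels[0])    # number of walkers
n_it = len(result.index.levels[1])   # number of iterations
chain_f0 = result['f0'].values.reshape(n_w, n_it).T  # shape (n_steps, n_walkers)

try:
    tau_f0 = emcee.autocorr.integrated_time(chain_f0)
    print('Integrated autocorrelation time of f0:', tau_f0)
except emcee.autocorr.AutocorrError as err:
    # raised when the chain is too short for a reliable estimate
    print(err)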

Corner plot

This is a convenient way to look at the correlations between the parameters in the final distribution. We skip the first iter_stable steps of each chain so that only iterations where the walkers have converged, and are sampling the correct probability distribution, are included.

In [12]:
varnames = result.columns
inds_after_transient = result.index.get_level_values('Iteration') > iter_stable #indices when it has converged
means = result.loc[inds_after_transient].mean() # best values for each parameter

fig = corner.corner(result[inds_after_transient], 
        labels=varnames,
        truths= means,
        bins=25,)

The graph above shows the probability distributions of the parameters f0, a, w, c, sigma. It gives the pair-wise joint probability distributions as density plots and the marginal distribution of each parameter as histograms. The blue lines show the expected value of each parameter, i.e. the best guess for the parameter value.

Uncertainty in data space

Above, we plotted the probability distributions of the different fit parameters. Another important tool is to visualize the fitting uncertainty in data space. In the following, we evaluate sample_size possible fits by sampling from the parameter distribution. Alongside the original data and the best fit (given by the expectation values of the parameters), we plot a band of one standard deviation around the average y-value of the sampled fits.

In [13]:
import plotly.graph_objects as go

sample_size = 2000 # number of fits to evaluate

inds_after_transient = result.index.get_level_values('Iteration') > iter_stable

means = result.loc[inds_after_transient].mean() # best guess for parameter values

x_array_fit = np.linspace(freq.min(), freq.max(), 401) # x-array to use for fits

# we randomly sample the parameters and store the fitted values for each sample
y_models = np.zeros((x_array_fit.size, sample_size))
param_samples = result[inds_after_transient].sample(n=sample_size)
for i, (ind, fparams) in enumerate(param_samples.iterrows()):
    y_models[:, i] = lorentzian(x_array_fit, fparams[:-1])  # last entry of fparams is sigma

fig, ax = plt.subplots()
ax.scatter(freq, ampl, color='xkcd:cerulean')


ax.fill_between(x_array_fit, 
        np.mean(y_models, axis=1) - np.std(y_models, axis=1),
        np.mean(y_models, axis=1) + np.std(y_models, axis=1),
        color='gray', alpha=0.7, zorder=3)
ax.plot(x_array_fit, lorentzian(x_array_fit, means[:-1]), 
        '-', color='xkcd:crimson', zorder=4)
ax.set_xlabel('Frequency')
ax.set_ylabel('Amplitude')
ax.set_xlim(freq.min(), freq.max())
Out[13]:
(0.9, 1.1)