
MXB341 Statistical Inference

“In theory, there is no difference between practice and theory. In practice there is.”

— Jan van de Snepscheut

Week 12: MCMC Diagnostics

MCMC Diagnostics

The Markov chain Monte Carlo (MCMC) techniques that we have learned so far are based on the idea of drawing random samples from a posterior distribution so that Monte Carlo integration can be used to make inferences about our parameters. Because these techniques are based on random variables, and the schemes we have discussed generate a sequence of dependent samples, there are several opportunities for errors or inefficiencies to arise.

Notably, we need to know:

  1. Are the samples drawn from the stationary target density?
  2. Are the samples a good approximation to a representative sample of independent draws from the target density?
  3. Finally, how does my selected model compare to other possible models?

We can use tools like cumulative means plots and the Brooks-Gelman-Rubin potential scale reduction factor to assess convergence, and tools like autocorrelation plots and the effective sample size to measure the quality of our samples. Finally, we will introduce the deviance information criterion (DIC) as a means of making comparisons between models.

Convergence and Stationarity

Cumulative Means

For a target variable \(\theta\) and an MCMC scheme generating a sequence of samples \(\theta^{(t)},\theta^{(t+1)},\ldots\) from the target posterior \(\pi(\theta|\mathbf{y})\), the most basic definition of stationarity is \[ E\left(\theta^{(t)}|\mathbf{y}\right)=E\left(\theta^{(t+1)}|\mathbf{y}\right). \] From this definition it is reasonable to monitor the cumulative average of the samples generated by the MCMC scheme as a tool for assessing convergence, by plotting \(t\) versus the cumulative mean, i.e. for \(t=1,\ldots,T\) plot \(t\) versus \[ \bar{\theta}^{(t)}=\frac{1}{t}\sum_{s=1}^t\theta^{(s)}. \] The resulting plot should settle at a stable value of \(\bar{\theta}^{(t)}\), indicating the point where the MCMC scheme achieves stationarity.
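As a minimal sketch, assuming theta is a numeric vector of draws from an MCMC sampler (a placeholder name, not one of the objects defined in this worksheet), the cumulative mean plot can be produced in base R:

theta_bar<-cumsum(theta)/seq_along(theta)   ##  running mean at each iteration t

plot(seq_along(theta_bar),theta_bar,type = "l",xlab = "iteration",ylab = "cumulative mean")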

The Brooks-Gelman-Rubin (BGR) potential scale reduction factor:

For \(m\) independent chains started from diverse starting points, let \(\theta_{ij}\) denote the \(i\)-th of \(n\) samples from chain \(j\), and compute the variance estimate for each chain \(j\): \[ s^2_j=\frac{1}{n-1}\sum_{i=1}^n(\theta_{ij}-\bar{\theta}_j)^2 \] and let \[ W=\frac{1}{m}\sum_{j=1}^ms^2_j \] be the average of the variances across the chains. Define the overall mean as \[ \bar{\bar{\theta}}=\frac{1}{m}\sum_{j=1}^m\bar{\theta}_j \] and the between-chain variance as \[ B=\frac{n}{m-1}\sum_{j=1}^m(\bar{\theta}_j-\bar{\bar{\theta}})^2 \] so that \[ \widehat{\operatorname{Var}(\theta)}=\left(1-\frac{1}{n}\right)W+\frac{1}{n}B. \] Then the B-G-R potential scale reduction factor is \[ \hat{R}=\sqrt{\frac{\widehat{\operatorname{Var}(\theta)}}{W}}. \] Plotting the chains together is a useful tool for graphically assessing whether or not the chains have converged, but calculating \(\hat{R}\) is a more precise means of assessing convergence. Ideally, \(\hat{R}\) should be close to \(1\); as a rule of thumb, a value of \(\hat{R}<1.1\) is considered sufficiently close to \(1\).
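A minimal sketch of this calculation, assuming chains is an \(n\times m\) matrix with one column of samples per chain (a hypothetical object, not created elsewhere in this worksheet):

bgr<-function(chains)
{
  n<-nrow(chains)
  W<-mean(apply(chains,2,var))    ##  average within-chain variance
  B<-n*var(colMeans(chains))      ##  between-chain variance
  V<-(1-1/n)*W+B/n                ##  pooled variance estimate
  sqrt(V/W)                       ##  Rhat
}

Values of \(\hat{R}\) close to \(1\) indicate that the between-chain variability is no larger than the within-chain variability.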

Assessing Samples

Autocorrelation:

The autocorrelation at lag \(k\) is the Pearson correlation coefficient between two samples in an MCMC chain that are \(k\) iterations apart: \[ \rho_k=\operatorname{Cor}\left(\theta^{(t)},\theta^{(t+k)}\right). \] The autocorrelation can be easily calculated and plotted as a function of the lag to gain some understanding of the degree of correlation present in the sequence.

This can be done automatically in R using the function acf().

acf(theta)

Effective Sample Size (ESS):

The autocorrelation function \(\rho_k\) can be used to compute an effective sample size, which is less than the actual sample size \(n\) whenever the samples are positively correlated: \[ ESS=\frac{n}{1+2\sum_{k=1}^\infty\rho_k}. \] The effective sample size is a more quantitative measure of the efficiency of an MCMC scheme and can be a useful measure of the reliability of estimates based on the samples.

This can be computed automatically in R using the function ess() in the mcmcse package.

library(mcmcse)

ess(theta)
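To connect the formula above to the output of acf(), here is a minimal sketch that estimates the ESS directly from the sample autocorrelations. The rule of truncating the sum at the first non-positive lag is an assumption made here for illustration; mcmcse::ess() uses a different, more sophisticated estimator.

ess_manual<-function(theta)
{
  n<-length(theta)
  rho<-acf(theta,plot = FALSE,lag.max = n-1)$acf[-1]   ##  sample autocorrelations, dropping lag 0
  K<-which(rho<=0)[1]                                  ##  truncate at first non-positive lag
  if(is.na(K)) K<-length(rho)+1
  n/(1+2*sum(rho[seq_len(K-1)]))
}

Truncation is needed because summing the noisy sample autocorrelations out to very large lags would add mostly noise to the denominator.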

The Deviance Information Criterion

Deviance Information Criterion

The deviance information criterion (DIC) is an information criterion much like the AIC or BIC: it accounts for both model complexity and goodness of fit so that it can be used as a model selection criterion. The deviance, \[ D(\theta) = -2\log(f(\mathbf{y}|\theta)), \] or \(-2\) times the log-likelihood, is the measure of goodness of fit. Model complexity can also be measured in terms of deviance by one of two quantities: \[ p_D= \overline{D(\theta)}-D(\bar{\theta}), \] where \(\overline{D(\theta)}\) is the expected value of the deviance and \(D(\bar{\theta})\) is the deviance evaluated at the expected value of \(\theta\); alternatively, \[ p_V = \frac{1}{2}\operatorname{Var}(D(\theta)) \] can be used as a complexity measure. The resulting DIC is \[ DIC = p_D+\overline{D(\theta)},\mbox{ or equivalently }D(\bar{\theta})+2p_D, \] with the second form highlighting the relationship with the AIC. Note that \(p_V\) can be substituted for \(p_D\) if desired, but this is not standard in most implementations.

The DIC can be implemented in any Monte Carlo scheme: \(\overline{D(\theta)}\) and \(p_V\) are easily obtained by computing the log-likelihood at each iteration and taking the mean and variance of the resulting deviances, while \(p_D\) additionally requires evaluating the log-likelihood at the posterior mean of \(\theta\) and subtracting the corresponding deviance from \(\overline{D(\theta)}\).
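As a minimal sketch, assuming dev is a vector containing \(-2\) times the log-likelihood at each MCMC iteration and loglik_at_mean is the log-likelihood evaluated at the posterior mean of \(\theta\) (both hypothetical names standing in for quantities produced by your sampler):

Dbar<-mean(dev)                   ##  expected deviance
pD<-Dbar+2*loglik_at_mean         ##  pD = Dbar - D(theta_bar)
pV<-var(dev)/2                    ##  alternative complexity measure
DIC<-Dbar+pD                      ##  equivalently D(theta_bar) + 2*pD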

Theory Questions

Question 1

Given data \(x, y\)
set.seed(14061972)

x<-runif(20,-1,1)

lambda<-exp(3.1*x-1.7)

y<-rpois(20,lambda=lambda)
  1. Fit a model assuming that \(y_i\sim Pois(\lambda_i)\), \(\log(\lambda_i)=\beta_0+\beta_1x_i\).
  2. Plot the cumulative mean of the parameters and calculate the BGR for three chains.
library(mvtnorm)
library(magrittr)
library(tidyverse)

x<-c(0.59, -0.67, 0.7, -0.86, -0.99, 0.92, 0.57, -0.04, -0.57, -0.74, 0.98, -0.81, -0.79, 0.51, 0.09, 0.13, 0.28, 0.14, 0.37, 0.6)
y<-c(4, 0, 2, 0, 0, 4, 2, 0, 0, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0, 1 )

X<-cbind(1,x)

##  log-likelihood function

log_like<-function(beta)
{
  nu<-X%*%beta%>%exp()
  dpois(y,nu,log=TRUE)%>%sum()
}

bayes_pois<-function(x,y,init,n)
{
  
  ##  Initialise beta

  beta<-matrix(NA,ncol = 2, nrow = n)
  beta[1,]<-init

  ##  Set up MCMC

  beta_cov<-crossprod(X)%>%solve()
  beta_sd<-1

  n.accept<-0

  set.seed(14061972)

  for(i in 2:n)
  {
    bc<-rmvnorm(1,beta[i-1,],beta_sd*beta_cov)%>%t()
    alpha<-(log_like(bc)-log_like(beta[i-1,]))%>%exp()
    u<-runif(1)    
    if(u<alpha)
    {
      beta[i,]<-bc
      n.accept%<>%+(1)
    }
    else
    {
      beta[i,]<-beta[i-1,]
    }
    accept.rate<-n.accept/i
  
    ##  Automatically tune the candidate distribution
  
    if(accept.rate<0.2|accept.rate>0.4)
    {
      beta_sd%<>%multiply_by(0.7+accept.rate)
    }
  }
  return(list(beta = beta, accept.rate = accept.rate))
}

##  Now run multiple chains

N<-10000

beta0<-numeric()
beta1<-numeric()

init<-rbind(c(-1,-1),c(0,0),c(1,1))

for(j in 1:3)
{
  out<-bayes_pois(x,y,init[j,],N)
  beta0%<>%c(out$beta[,1])
  beta1%<>%c(out$beta[,2])
}

##  Plot the results

iteration<-rep(1:N,3)
chain<-c(rep("1",N),rep("2",N),rep("3",N))
name<-c(rep("beta[0]",3*N),rep("beta[1]",3*N))

df<-tibble(beta=c(beta0,beta1),iteration = rep(iteration,2),chain = rep(chain,2),name)

##  Cumulative Mean Plots

ggplot(filter(df,chain=="1")%>%group_by(name)%>%mutate(beta=cummean(beta)), aes(x = iteration, y = beta, group = name))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  BGR Plot

ggplot(df,aes(x=iteration, y = beta, group = chain, color = chain ))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)



##  Compute BGR for beta_0

S<-aggregate(beta~chain,FUN = var, data = filter(df, name =="beta[0]"))   ##  within-chain variances
M<-aggregate(beta~chain,FUN = mean, data = filter(df, name =="beta[0]"))  ##  chain means
W<-mean(S$beta)
B<-N*var(M$beta)
VAR<-(1-1/N)*W+B/N
Rhat0<-(VAR/W)%>%sqrt()

##  Compute BGR for beta_1

S<-aggregate(beta~chain,FUN = var, data = filter(df, name =="beta[1]"))   ##  within-chain variances
M<-aggregate(beta~chain,FUN = mean, data = filter(df, name =="beta[1]"))  ##  chain means
W<-mean(S$beta)
B<-N*var(M$beta)
VAR<-(1-1/N)*W+B/N
Rhat1<-(VAR/W)%>%sqrt()

print(paste0("The Brooks-Gelman-Rubin potential scale reduction factor for \\(\\beta_0\\) is \\(\\hat{R}=\\)", round(Rhat0,6)))
print(paste0("The Brooks-Gelman-Rubin potential scale reduction factor for \\(\\beta_1\\) is \\(\\hat{R}=\\)", round(Rhat1,6)))

Question 2

Given data \(x, y\)
set.seed(14061972)

x<-runif(20,-1,1)

lambda<-exp(3.1*x-1.7)

y<-rpois(20,lambda=lambda)
  1. Fit a model assuming that \(y_i\sim Pois(\lambda_i)\), \(\log(\lambda_i)=\beta_0+\beta_1x_i\).
  2. Plot the ACF and compute the effective sample size for each parameter.
library(mvtnorm)
library(magrittr)
library(tidyverse)

x<-c(0.59, -0.67, 0.7, -0.86, -0.99, 0.92, 0.57, -0.04, -0.57, -0.74, 0.98, -0.81, -0.79, 0.51, 0.09, 0.13, 0.28, 0.14, 0.37, 0.6)
y<-c(4, 0, 2, 0, 0, 4, 2, 0, 0, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0, 1 )

X<-cbind(1,x)

##  log-likelihood function

log_like<-function(beta)
{
  nu<-X%*%beta%>%exp()
  dpois(y,nu,log=TRUE)%>%sum()
}

bayes_pois<-function(x,y,init,n)
{
  
  ##  Initialise beta

  beta<-matrix(NA,ncol = 2, nrow = n)
  beta[1,]<-init

  ##  Set up MCMC

  beta_cov<-crossprod(X)%>%solve()
  beta_sd<-1

  n.accept<-0

  set.seed(14061972)

  for(i in 2:n)
  {
    bc<-rmvnorm(1,beta[i-1,],beta_sd*beta_cov)%>%t()
    alpha<-(log_like(bc)-log_like(beta[i-1,]))%>%exp()
    u<-runif(1)    
    if(u<alpha)
    {
      beta[i,]<-bc
      n.accept%<>%+(1)
    }
    else
    {
      beta[i,]<-beta[i-1,]
    }
    accept.rate<-n.accept/i
  
    ##  Automatically tune the candidate distribution
  
    if(accept.rate<0.2|accept.rate>0.4)
    {
      beta_sd%<>%multiply_by(0.7+accept.rate)
    }
  }
  return(list(beta = beta, accept.rate = accept.rate))
}

##  Then we can generate samples for computing ACF and ESS

N<-10000

out<-bayes_pois(x,y,c(0,0),N)

library(mcmcse)

acf_beta_0<-acf(out$beta[,1],plot = FALSE)$acf
acf_beta_1<-acf(out$beta[,2],plot = FALSE)$acf
lag<-rep(0:(length(acf_beta_0)-1),2)   ##  acf() includes lag 0
name<-c(rep("beta[0]",length(acf_beta_0)),rep("beta[1]",length(acf_beta_1)))

df<-tibble(acf = c(acf_beta_0,acf_beta_1),lag,name)

ggplot(df,aes(x = lag, y = acf))+
  geom_bar(stat = "identity",alpha = 0.5)+
  geom_hline(yintercept = 0.05,linetype = "dashed")+
  facet_wrap(~name,labeller = label_parsed)

ESS<-mcmcse::ess(out$beta)

print(paste0("The effective sample sizes for \\(\\beta_0\\) and \\(\\beta_1\\) are", round(ESS,2)))

Question 3

Given data \(x, y\)
set.seed(14061972)

x<-runif(20,-1,1)

lambda<-exp(3.1*x-1.7)

y<-rpois(20,lambda=lambda)
  1. Fit a model assuming that \(y_i\sim Pois(\lambda_i)\), \(\log(\lambda_i)=\beta_0+\beta_1x_i\).
  2. Compute the \(p_D\), \(p_V\), \(\bar{D}\) and \(DIC\) and compare the results.
library(mvtnorm)
library(magrittr)

x<-c(0.59, -0.67, 0.7, -0.86, -0.99, 0.92, 0.57, -0.04, -0.57, -0.74, 0.98, -0.81, -0.79, 0.51, 0.09, 0.13, 0.28, 0.14, 0.37, 0.6)
y<-c(4, 0, 2, 0, 0, 4, 2, 0, 0, 0, 4, 0, 0, 3, 0, 0, 0, 0, 0, 1 )

X<-cbind(1,x)

##  log-likelihood function

log_like<-function(beta)
{
  nu<-X%*%beta%>%exp()
  dpois(y,nu,log=TRUE)%>%sum()
}

bayes_pois<-function(x,y,init,n)
{
  
  ##  Initialise beta

  beta<-matrix(NA,ncol = 2, nrow = n)
  beta[1,]<-init
  dev<-numeric(length = n)
  dev[1]<--2*log_like(init)

  ##  Set up MCMC

  beta_cov<-crossprod(X)%>%solve()
  beta_sd<-1

  n.accept<-0

  set.seed(14061972)

  for(i in 2:n)
  {
    bc<-rmvnorm(1,beta[i-1,],beta_sd*beta_cov)%>%t()
    alpha<-(log_like(bc)-log_like(beta[i-1,]))%>%exp()
    u<-runif(1)    
    if(u<alpha)
    {
      beta[i,]<-bc
      n.accept%<>%+(1)
    }
    else
    {
      beta[i,]<-beta[i-1,]
    }
    accept.rate<-n.accept/i
  
    ##  Automatically tune the candidate distribution
  
    if(accept.rate<0.2|accept.rate>0.4)
    {
      beta_sd%<>%multiply_by(0.7+accept.rate)
    }
    dev[i]<--2*log_like(beta[i,])
  }
  pD<-mean(dev)+2*log_like(colMeans(beta))
  pV<-var(dev)/2
  Dbar<-mean(dev)
  DICd<-Dbar+pD
  DICv<-Dbar+pV
  
  return(list(beta = beta, accept.rate = accept.rate, DICd = DICd, DICv = DICv, pD = pD, pV = pV, Dbar = Dbar))
}

out<-bayes_pois(x,y,c(0,0),10000)
DICd<-out$DICd%>%round(2)
DICv<-out$DICv%>%round(2)
pD<-out$pD%>%round(2)
pV<-out$pV%>%round(2)
Dbar<-out$Dbar%>%round(2)

Practical Questions

Question 4

Given data \(x, y\)
set.seed(14061972)

x<-runif(20,-1,1)

lambda<-exp(3.1*x-1.7)

y<-rnbinom(20,mu=lambda,size = 0.7)
  1. Fit a model assuming that \(y_i\sim Pois(\lambda_i)\), \(\log(\lambda_i)=\beta_0+\beta_1x_i\).
  2. Fit a model assuming that \(y_i\sim NegBinom(\mu_i,\eta)\), \(\log(\mu_i)=\beta_0+\beta_1x_i\).

For each model:

  3. Plot the trace of the parameters;
  4. plot the cumulative mean of the parameters and calculate the BGR for three chains;
  5. plot the ACF and compute the effective sample size for each parameter; and
  6. compute the \(p_D\), \(p_V\), \(\bar{D}\) and \(DIC\), and compare the results.
library(mvtnorm)
library(magrittr)
library(tidyverse)

x<-c(0.59, -0.67, 0.7, -0.86, -0.99, 0.92, 0.57, -0.04, -0.57, -0.74, 0.98, -0.81, -0.79, 0.51, 0.09, 0.13, 0.28, 0.14, 0.37, 0.6)
y<-c(6, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0)

##  log-likelihood function

bayes_pois<-function(x,y,init,n)
{

set.seed(14061972)
  
  X<-cbind(1,x)
  
  log_like<-function(beta)
    {
      nu<-X%*%beta%>%exp()
      dpois(y,nu,log=TRUE)%>%sum()
    }
  
  ##  Initialise beta

  beta<-matrix(NA,ncol = 2, nrow = n)
  beta[1,]<-init
  dev<-numeric(length = n)
  dev[1]<--2*log_like(init)

  ##  Set up MCMC

  beta_cov<-crossprod(X)%>%solve()
  beta_sd<-1

  n.accept<-0

  set.seed(14061972)

  for(i in 2:n)
  {
    bc<-rmvnorm(1,beta[i-1,],beta_sd*beta_cov)%>%t()
    alpha<-(log_like(bc)-log_like(beta[i-1,]))%>%exp()
    u<-runif(1)    
    if(u<alpha)
    {
      beta[i,]<-bc
      n.accept%<>%+(1)
    }
    else
    {
      beta[i,]<-beta[i-1,]
    }
    accept.rate<-n.accept/i
  
    ##  Automatically tune the candidate distribution
  
    if(accept.rate<0.2|accept.rate>0.4)
    {
      beta_sd%<>%multiply_by(0.7+accept.rate)
    }
    dev[i]<--2*log_like(beta[i,])
  }
  pD<-mean(dev)+2*log_like(colMeans(beta))
  pV<-var(dev)/2
  Dbar<-mean(dev)
  DICd<-Dbar+pD
  DICv<-Dbar+pV
  return(list(beta = beta, accept.rate = accept.rate, Dbar = Dbar, DICd = DICd, DICv = DICv, pD = pD, pV= pV))
}

#####################

bayes_nbinom<-function(x,y,init,n)
{
  
set.seed(14061972)
  
  log_like<-function(BETA,ETA)
  {
    mu<-X%*%BETA%>%exp()
    dnbinom(y,size = ETA,mu = mu, log = TRUE)%>%sum()
  }
  
  X<-cbind(1,x)
  
  beta<-matrix(NA,ncol = 2, nrow = n)
  eta<-numeric(length = n)
  dev<-numeric(length = n)
  dev[1]<-dnbinom(y,size = init[3],mu = exp(X%*%init[1:2]),log = TRUE)%>%sum()%>%multiply_by(-2)
  beta[1,]<-init[1:2]
  eta[1]<-init[3]
  beta_cov<-crossprod(X)%>%solve()
  beta_sd<-1.5
  eta_var<-1
  beta.accept<-0
  eta.accept<-0
  
    for(i in 2:n)
  {
    bc<-rmvnorm(1,mean = beta[i-1,],sigma = beta_sd*beta_cov)%>%t()
    alpha<-(log_like(bc,eta[i-1])-log_like(beta[i-1,],eta[i-1]))%>%exp()
    u<-runif(1)
    if(alpha>u)
    {
      beta[i,]<-bc
      beta.accept%<>%+(1)
    }
    else
    {
      beta[i,]<-beta[i-1,]
    }
    beta.accept.rate<-beta.accept/i
        
    ##  Automatically tune the candidate distribution
  
    if(beta.accept.rate<0.2|beta.accept.rate>0.4)
    {
      beta_sd%<>%multiply_by(0.7+beta.accept.rate)
    }
    

  ##  update eta

    ac<-eta[i-1]^2/eta_var
    bc<-eta[i-1]/eta_var

    ec<-rgamma(1,ac,bc)

    a<-ec^2/eta_var
    b<-ec/eta_var

    alpha<-(log_like(beta[i,],ec)-log_like(beta[i,],eta[i-1])+dgamma(eta[i-1],a,b,log=TRUE)-dgamma(ec,ac,bc,log = TRUE)-ec+eta[i-1])%>%exp()

  u<-runif(1)
  if(alpha>u)
  {
    eta[i]<-ec
    eta.accept%<>%+(1)
  }
  else
  {
    eta[i]<-eta[i-1]
  }

    ##  Automatically tune the candidate distribution
  
    eta.accept.rate<-eta.accept/i
  
    if(eta.accept.rate<0.2|eta.accept.rate>0.4)
    {
      eta_var%<>%multiply_by(0.7+eta.accept.rate)
    }
  
    dev[i]<--2*log_like(beta[i,],eta[i])
    }
  pD<-mean(dev)+2*log_like(colMeans(beta),mean(eta))
  pV<-var(dev)/2
  Dbar<-mean(dev)
  DICd<-Dbar+pD
  DICv<-Dbar+pV
  return(list(beta = beta, eta = eta, beta.accept = beta.accept, beta.accept.rate = beta.accept.rate,eta.accept = eta.accept, eta.accept.rate = eta.accept.rate, Dbar = Dbar, DICd = DICd, DICv = DICv, pD = pD, pV= pV))
}

##  Now run multiple chains

N<-10000

pois_out<-list()
nb_out<-list()

init<-rbind(c(-1.7,3.1,1),c(0,1,2),c(0,0,3))


for(j in 1:3)
{
  pois_out[[j]]<-bayes_pois(x,y,init[j,1:2],N)
  nb_out[[j]]<-bayes_nbinom(x,y,init[j,],N)
}

p.Dbar<-pois_out[[1]]$Dbar
p.pD<-pois_out[[1]]$pD
p.pV<-pois_out[[1]]$pV
p.DICd<-pois_out[[1]]$DICd
p.DICv<-pois_out[[1]]$DICv

nb.Dbar<-nb_out[[1]]$Dbar
nb.pD<-nb_out[[1]]$pD
nb.pV<-nb_out[[1]]$pV
nb.DICd<-nb_out[[1]]$DICd
nb.DICv<-nb_out[[1]]$DICv

##  Plot the results for Poisson

iteration<-rep(1:N,3)
chain<-c(rep("1",N),rep("2",N),rep("3",N))
name<-c(rep("beta[0]",3*N),rep("beta[1]",3*N))

beta.pois<-rbind(pois_out[[1]]$beta,pois_out[[2]]$beta,pois_out[[3]]$beta)

df.pois<-tibble(beta=c(beta.pois),iteration = rep(iteration,2),chain = rep(chain,2),name)

##  Trace Plots

ggplot(filter(df.pois,chain =="1"), aes(x= iteration, y = beta))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  Cumulative Mean Plots

ggplot(filter(df.pois,chain=="1")%>%group_by(name)%>%mutate(beta=cummean(beta)), aes(x = iteration, y = beta, group = name))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  BGR Plot

ggplot(df.pois,aes(x=iteration, y = beta, group = chain, color = chain ))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  Plot the results for Negative Binomial

iteration<-rep(1:N,3)
chain<-c(rep("1",N),rep("2",N),rep("3",N))
name<-c(rep("beta[0]",3*N),rep("beta[1]",3*N),rep("eta",3*N))

beta.nb<-rbind(nb_out[[1]]$beta,nb_out[[2]]$beta,nb_out[[3]]$beta)
eta<-c(nb_out[[1]]$eta,nb_out[[2]]$eta, nb_out[[3]]$eta)

df.nb<-tibble(beta=c(beta.nb,eta),iteration = rep(iteration,3),chain = rep(chain,3),name)

##  Trace Plots

ggplot(filter(df.nb,chain =="1"), aes(x= iteration, y = beta))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  Cumulative Mean Plots

ggplot(filter(df.nb,chain=="1")%>%group_by(name)%>%mutate(beta=cummean(beta)), aes(x = iteration, y = beta, group = name))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  BGR Plot

ggplot(df.nb,aes(x=iteration, y = beta, group = chain, color = chain ))+
  geom_line()+
  facet_wrap(~name, labeller = label_parsed)

##  BGR Computations

BGR<-function(par,DF)
{
  S<-aggregate(beta~chain,FUN = var, data = filter(DF, name == par))   ##  within-chain variances
  M<-aggregate(beta~chain,FUN = mean, data = filter(DF, name == par))  ##  chain means
  W<-mean(S$beta)
  B<-N*var(M$beta)
  VAR<-(1-1/N)*W+B/N
  Rhat<-(VAR/W)%>%sqrt()
  return(Rhat)
}

bgr.pois<-sapply(unique(df.pois$name),function(x) BGR(x, df.pois))
bgr.nb<-sapply(unique(df.nb$name),function(x) BGR(x, df.nb))

##  ACF and ESS

##  For Poisson

acf_beta_0<-acf(pois_out[[1]]$beta[,1],plot = FALSE)$acf
acf_beta_1<-acf(pois_out[[1]]$beta[,2],plot = FALSE)$acf
lag<-rep(0:(length(acf_beta_0)-1),2)   ##  acf() includes lag 0
name<-c(rep("beta[0]",length(acf_beta_0)),rep("beta[1]",length(acf_beta_1)))

df<-tibble(acf = c(acf_beta_0,acf_beta_1),lag,name)

ggplot(df,aes(x = lag, y = acf))+
  geom_bar(stat = "identity",alpha = 0.5)+
  geom_hline(yintercept = 0.05,linetype = "dashed")+
  facet_wrap(~name,labeller = label_parsed)

##  For negative-binomial

acf_beta_0<-acf(nb_out[[1]]$beta[,1],plot = FALSE)$acf
acf_beta_1<-acf(nb_out[[1]]$beta[,2],plot = FALSE)$acf
acf_eta<-acf(nb_out[[1]]$eta,plot = FALSE)$acf
lag<-rep(0:(length(acf_beta_0)-1),3)   ##  acf() includes lag 0
name<-c(rep("beta[0]",length(acf_beta_0)),rep("beta[1]",length(acf_beta_1)),rep("eta",length(acf_eta)))

df<-tibble(acf = c(acf_beta_0,acf_beta_1,acf_eta),lag,name)

ggplot(df,aes(x = lag, y = acf))+
  geom_bar(stat = "identity",alpha = 0.5)+
  geom_hline(yintercept = 0.05,linetype = "dashed")+
  facet_wrap(~name,labeller = label_parsed)

pois.ess<-mcmcse::ess(pois_out[[1]]$beta)
nb.ess<-mcmcse::ess(cbind(nb_out[[1]]$beta,nb_out[[1]]$eta))

MXB341 Worksheet 12 - MCMC Diagnostics