MXB341 Statistical Inference
Politicians use statistics in the same way that a drunk uses lampposts — for support rather than illumination.
—Andrew Lang (1910)
Week 2: Some properties of estimators
We would like to know some “good” properties of the estimators we use. These properties help us compare estimators and understand their limitations1. Recall that an estimator \(\hat{\theta}_{n} = t(y_{1},y_{2},\ldots,y_{n})\) is a transformation of the observations, which are assumed to follow a given probability distribution. Hence, we can use probability theory to determine the following characteristics. The properties we explored in the lectures were:
Properties of Estimators
Bias
Bias is the difference between the expected value of our estimator and the true value of the parameter. If the first goal is to be correct, it is desirable that our estimator, on average, equals the parameter it targets. Because our estimator is a function of the data and is itself a random variable, in any given circumstance (i.e. for a given sample of data) the estimator’s value may exceed or fall short of the true value of the parameter. That is, for some estimator \(\hat{\theta}_n = t(y_1,y_2,\ldots,y_n)\) of the parameter \(\theta\) there will be some difference. Still, ideally, if we averaged these differences over all possible samples, they would cancel out: \[ \operatorname{E}\left(\hat{\theta}_n-\theta\right)=\operatorname{E}\left(\hat{\theta}_n\right)-\theta=0. \]
Depending on how we calculate the estimator, it may be biased, tending to be larger or smaller than the parameter’s true value. So we formally define bias in terms of this average difference \[ \text{Bias}\left(\hat{\theta}_{n}\right) = \operatorname{E}\left(\hat{\theta}_{n}\right) - \theta, \] the difference between the expected value of our estimator and the true value of the parameter.
- If \(\text{E}(\hat{\theta}_{n}) = \theta\), i.e. Bias \(=0\), then the estimator \(\hat{\theta}_{n}\) is unbiased.
- If \(\text{E}(\hat{\theta}_{n}) \neq \theta\), i.e. Bias \(\neq 0\) then the estimator is biased.
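To make this concrete, bias can be approximated by simulation: repeatedly draw samples, compute the estimator, and average the differences from the true value. Below is a minimal sketch in R. The example estimator, the MLE \(\hat{\lambda} = 1/\bar{y}\) of an Exponential rate, and the constants used are chosen purely for illustration because this estimator is known to be biased upwards in small samples.
# a minimal sketch: approximate the bias of an estimator by Monte Carlo
# example estimator: the MLE of an Exponential rate, 1 / mean(y)
set.seed(1)
exp_rate <- 2   # assumed "true" rate for this illustration only
n_small <- 10   # small sample size, where the bias is noticeable
rate_hat <- replicate(10000, {
  y <- rexp(n_small, rate = exp_rate)
  1 / mean(y)
})
# Monte Carlo approximation of E(rate_hat) - rate;
# the theoretical bias is rate / (n_small - 1), about 0.22 here
mean(rate_hat) - exp_rate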
How can we tell if politicians, or any other individual or group, are misusing statistics if we don’t know their limitations?↩︎
Mean squared error
Bias tells us if an estimator is “on target” and is one way of assessing estimators, but what if we have more than one unbiased estimator to choose from? In this case, we might also want to consider the variance of the estimator. Recall that variance is the expected squared distance between the estimator and its expected value \[ \operatorname{Var}(\hat{\theta}_n)=\operatorname{E}\left[\left(\hat{\theta}_n- \operatorname{E}\left(\hat{\theta}_n\right)\right)^2\right]. \] Given two unbiased estimators, it makes sense to prefer the one with the smaller variance because that estimator is, on average, closer to its expected value (and hence, being unbiased, closer to the true parameter value).
This preference suggests a general measure for assessing an estimator, the mean squared error. The mean squared error (MSE) is a measure of how well an estimator performs; essentially, it is the average squared distance between the estimator and the true value of the parameter \[ \text{MSE}\left(\hat{\theta}_{n}\right) = \operatorname{E}\left[\left(\hat{\theta}_{n} - \theta\right)^2\right]. \] Note that this definition is slightly different from the definition of variance. For unbiased estimators, it simply reduces to the variance, since \(\operatorname{E}(\hat{\theta}_n)=\theta\).
More generally, the MSE can be decomposed into the sum of two parts, the variance and the square of the bias: \[ \text{MSE}\left(\hat{\theta}_n\right)=\operatorname{Var}\left(\hat{\theta}_n\right)+\text{Bias}\left(\hat{\theta}_n\right)^2. \]
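To see where this decomposition comes from, a standard derivation adds and subtracts \(\operatorname{E}\left(\hat{\theta}_n\right)\) inside the square: \[ \begin{aligned} \operatorname{E}\left[\left(\hat{\theta}_n-\theta\right)^2\right] &= \operatorname{E}\left[\left(\hat{\theta}_n-\operatorname{E}\left(\hat{\theta}_n\right)+\operatorname{E}\left(\hat{\theta}_n\right)-\theta\right)^2\right]\\ &= \operatorname{E}\left[\left(\hat{\theta}_n-\operatorname{E}\left(\hat{\theta}_n\right)\right)^2\right] + 2\left(\operatorname{E}\left(\hat{\theta}_n\right)-\theta\right)\operatorname{E}\left[\hat{\theta}_n-\operatorname{E}\left(\hat{\theta}_n\right)\right] + \left(\operatorname{E}\left(\hat{\theta}_n\right)-\theta\right)^2\\ &= \operatorname{Var}\left(\hat{\theta}_n\right) + \text{Bias}\left(\hat{\theta}_n\right)^2, \end{aligned} \] where the cross term vanishes because \(\operatorname{E}\left[\hat{\theta}_n-\operatorname{E}\left(\hat{\theta}_n\right)\right]=0\).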
Consistency
While bias describes the estimator in terms of its expectation under the sampling distribution, consistency describes the behaviour of the estimator in the limit as the sample size approaches infinity. An estimator is said to be consistent if it converges in probability to the true value, that is, if for every \(\varepsilon>0\) \[ \lim_{n\rightarrow\infty}\Pr\left(\left|\hat{\theta}_n-\theta\right|>\varepsilon\right)=0. \]
Both bias and consistency matter when evaluating estimators, and it is important to understand the nuance of their slightly different meanings:
If an estimator is unbiased, then \[ \text{MSE}\left(\hat{\theta}_n\right)=\operatorname{Var}\left(\hat{\theta}_n\right). \] If the MSE of an estimator satisfies \[ \text{MSE}\left(\hat{\theta}_n\right) \rightarrow 0 \text{ as } n \rightarrow \infty, \] then the estimator is consistent, since convergence in mean square implies convergence in probability. We can think of unbiasedness and consistency in other terms as well. An unbiased estimator will, on average, be equal to the true value of the parameter, regardless of sample size. A consistent estimator will eventually recover the true value of the parameter as the amount of data grows without bound.
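As an informal illustration (a sketch with arbitrary constants, not a proof), we can watch a consistent estimator, the sample mean of Poisson data, settle around the true value as the sample size grows:
# illustrate consistency empirically: one sample mean per sample size
set.seed(2)
lambda_true <- 6
n_values <- c(10, 100, 1000, 10000)
# the estimates should get closer to 6 (up to random noise) as n increases
sapply(n_values, function(n) mean(rpois(n, lambda = lambda_true)))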
Efficiency
For a given probability density function \(f(y;\theta)\), under some regularity conditions, any unbiased estimator \(\hat{\theta}_n\) of \(\theta\) satisfies
\[ \text{Var}(\hat{\theta}_{n}) \geq \mathcal{I}_{n}(\theta)^{-1}. \] In other words, the variance of an unbiased estimator has a lower bound, the Cramér-Rao lower bound, defined by the inverse of the Fisher information computed from the log-likelihood. Note that this form of the bound applies to unbiased estimators (a biased estimator has a modified bound involving the derivative of its bias). Any unbiased estimator that attains the Cramér-Rao lower bound is said to be efficient, i.e. if \[ \operatorname{Var}\left(\hat{\theta}_n\right)=\mathcal{I}_{n}\left(\theta\right)^{-1} \] then \(\hat{\theta}_n\) is efficient.
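As a worked example (using the Poisson model that also appears in the simulation study later in this worksheet): for \(y_1,\ldots,y_n \overset{\text{iid}}{\sim} \text{Pois}(\lambda)\), the log-likelihood is \[ \ell(\lambda)=\sum_{i=1}^{n}\left(y_i\log\lambda-\lambda-\log y_i!\right), \] so \[ \frac{\partial^{2}\ell}{\partial\lambda^{2}}=-\frac{\sum_{i=1}^{n}y_i}{\lambda^{2}} \quad\text{and}\quad \mathcal{I}_{n}(\lambda)=-\operatorname{E}\left(\frac{\partial^{2}\ell}{\partial\lambda^{2}}\right)=\frac{n\lambda}{\lambda^{2}}=\frac{n}{\lambda}. \] The Cramér-Rao lower bound is therefore \(\lambda/n\). The sample mean \(\bar{y}\) is unbiased for \(\lambda\) and has variance exactly \(\lambda/n\), so it attains the bound and is efficient.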
Theory questions
Question 1
Let \(y_{1},y_{2},\ldots,y_{n}\) be a random sample of observations from a population described by the binomial probability model: \[ p(y~\vert~\theta) = \left(\begin{array}{c} k\\y\end{array}\right)\theta^y(1-\theta)^{k-y}, \quad \text{for} \quad y=0,1,2,\ldots,k,\quad \text{and} \quad \theta \in (0,1). \] where \(k\) is known.
- What is the distribution of \(s=\sum_{i=1}^{n} y_{i}\)? Is the distribution Binomial?
- Determine the Cramér-Rao lower bound (CRLB) for unbiased estimates of \(\theta\).
- Determine the maximum likelihood estimate (MLE) of \(\theta\). Is the MLE \(\hat{\theta}\) unbiased?
- How does the variance of \(\hat{\theta}\) compare with the minimum variance bound (the CRLB)? Explain the significance of your answer.
Question 2
Let \(y_{1},y_{2},\ldots,y_{n}\) be a random sample of observations from a population described by the Uniform probability density function \[ p(y~\vert~\theta)=\begin{cases}\frac{1}{\theta}, \qquad 0 \leq y \leq \theta\\ 0, \qquad \text{otherwise}. \end{cases} \] Two potential estimates of \(\theta\) are \[ \hat{\theta}_{A}=\frac{2}{n}\sum_{i=1}^n y_{i} \] \[ \hat{\theta}_{B} = \frac{n+1}{n}\max\{y_{1},y_{2},\ldots,y_{n}\} \]
- Find the mean and variance of statistics \(s_{1} = \sum_{i=1}^{n} y_{i}\) and \(s_{2} = \max(y_{1},y_{2},\ldots,y_{n})\)2.
- Show that both these estimates are unbiased.
- Which has the smaller sampling variance?
- Is either estimator consistent? Explain your answer.
- Is it possible to determine whether either estimator is efficient? Explain your answer.
Hint: For the second statistic, determine the distribution of \(s_{2}\), which is the \(n\)th order statistic \(y_{(n)}\) where \(y_{(1)} \leq y_{(2)}\leq \ldots \leq y_{(n)}\). Start by finding the CDF of the maximum.↩︎
Question 3
Suppose \(x_{1}, x_{2}, \ldots, x_{n} \overset{\text{iid}}{\sim} N(\mu,\sigma^2)\), where the MLE \(\hat{\mu}_{ML} = \bar{x}\).
- Show the MLE of the variance is \(\widehat{\sigma^2}_{ML} = n^{-1}\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}\).
- Show that \(\widehat{\sigma^2}_{ML}\) is biased and suggest an unbiased estimator3.
- Calculate the expected Fisher information for the data.
- What is the observed Fisher information?
- Is \(\hat{\mu}_{ML}\) efficient? Is \(\widehat{\sigma^2}_{ML}\) efficient?
Hint: Use the following identities to help you:
a - \(\sum_{i=1}^{n}(x_{i} - \bar{x})^{2} = \sum_{i=1}^{n}(x_{i} - \mu)^2 - n(\bar{x} - \mu)^2\).
b - \(\frac{\sum_{i=1}^{n}(x_{i} - \mu)^{2}}{\sigma^2} \sim \chi^{2}_{n}\) (with mean \(n\)),
c - \(\bar{x} \sim N(\mu, \sigma^2/n)\).↩︎
Performing simulation studies
This week we will perform a simulation study to see if we can replicate the estimator’s properties we found in the theory section. Here’s an example of running a simulation study in R for the scenario of estimating the number of poodle puppies in a litter.
Eleanor Roosevelt (the poodle) had a litter of 10 puppies!
Suppose a Poisson distribution describes the size, \(y\), of a randomly selected litter of poodle puppies, i.e. \(y \sim \text{Pois}(\lambda)\) where the pdf is \[ p(y~\vert~\lambda) = \frac{e^{-\lambda}\lambda^y}{y!}, \quad \text{ for } \quad y=0,1,2,\ldots \quad \text{and} \quad \lambda \in (0,\infty). \]
The MLE of \(\lambda\) from an iid random sample of \(n\) Poisson variables is
\[ \hat{\lambda} = n^{-1}\sum_{i=1}^{n} y_{i}, \]
i.e. the sample mean, and the variance of this estimator is
\[ \text{Var}(\hat{\lambda}) = \frac{\lambda}{n} \] (a brief sketch of where these results come from is given after the list of steps below). To perform a simulation study, we have to assume a true value of \(\lambda\) as a baseline truth against which to compare. To begin, let’s set the true \(\lambda = 6\). We also have to select a default number of observations, \(n\), that we might expect to see in a real data analysis, say \(n = 30\) for now. The steps we will perform for the simulation study are:
1. Simulate data according to the appropriate distribution, using the constants chosen for this study, e.g. \(\lambda = 6\) and \(n = 30\).
2. Perform the method we are investigating, e.g. MLE estimation.
3. Save results and calculate some properties of the method.
4. Repeat steps 1-3 many times.
5. Calculate aggregates of the results and inspect summaries (e.g. average estimate, mean squared error etc.).
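As noted above, here is a brief sketch of where the MLE and its variance come from. Setting the derivative of the Poisson log-likelihood (written out in the worked example in the Efficiency section) to zero gives \[ \frac{\partial \ell}{\partial \lambda} = \frac{\sum_{i=1}^{n} y_i}{\lambda} - n = 0 \quad\Rightarrow\quad \hat{\lambda} = \bar{y}, \] and since the \(y_i\) are independent with \(\operatorname{Var}(y_i) = \lambda\), \[ \operatorname{Var}\left(\hat{\lambda}\right) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{n\lambda}{n^{2}} = \frac{\lambda}{n}. \]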
# set fixed constants
n_obs <- 30
true_lambda <- 6
# example code for one simulation
# sample data
y <- rpois(n = n_obs, lambda = true_lambda)
# method: MLE
mle_lambda <- mean(y)
# in this case, plug the MLE into Var(lambda_hat) = lambda / n
var_mle_lambda <- mle_lambda / n_obs
# bias
bias_mle_lambda <- mle_lambda - true_lambda
# squared error
sq_err_mle_lambda <- (mle_lambda - true_lambda)^2
# check that these steps work when you
# implement your simulation study!
After checking that the code works for one simulation, convert the code into a function:
library(purrr)
library(dplyr)
library(ggplot2)
# function that performs one simulation...
pois_mle_sim <- function(n_obs, true_lambda){
# sample data
y <- rpois(n = n_obs, lambda = true_lambda)
# method: MLE
mle_lambda <- mean(y)
# in this case, plug the MLE into Var(lambda_hat) = lambda / n
var_mle_lambda <- mle_lambda / n_obs
# bias
bias_mle_lambda <- mle_lambda - true_lambda
# squared error
sq_err_mle_lambda <- (mle_lambda - true_lambda)^2
# save in tibble
out_tb <- tibble(
mle = mle_lambda,
var_mle = var_mle_lambda,
bias_mle = bias_mle_lambda,
sq_err = sq_err_mle_lambda
)
# return as output
return(out_tb)
}
# check that the function works!
# run many times...
sim_results <-
map(1:100,
    ~ pois_mle_sim(n_obs = 30, true_lambda = 6)
) %>%
bind_rows()
# summarise results (just a few example ways to summarise)
# mean mle, mse
sim_results %>% select(mle, sq_err) %>% summarise_all(mean)
# selects columns to plot then stacks columns
sim_results %>% select(bias_mle, sq_err) %>% stack() %>%
# sends to ggplot with "%>%"
ggplot() +
geom_boxplot(aes(x = ind, y = values)) +
xlab("") + ylab("Values from simulation") +
theme_bw()
sim_results %>% select(bias_mle) %>%
ggplot() +
geom_histogram(aes(x = bias_mle), binwidth = 0.1) +
theme_bw()
Practical questions
Question 4
Using the example simulation study code as a template, run a simulation study for the two estimators in Q2 (theory section).
- Compare the theoretical results obtained in Q2 to the results from the simulation study. Use \(n = 25\) and the true value of \(\theta = 10\).
- Write some code to run the simulation study automatically for \(n = 10, 25, 100, 250, 1000\). Plot some summaries for each estimator from the simulation study with \(n\) on the \(x\)-axis (one possible pattern for looping over sample sizes is sketched after this list).
- Comment on what the simulation study shows. Does it agree with the theoretical results? Why/why not?
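As a hint for automating the study over several sample sizes, here is one possible, generic pattern. It reuses the Poisson example function pois_mle_sim() from above purely to show the looping structure; swap in your own simulation function for the two uniform estimators from Q2.
library(purrr)
library(dplyr)
# generic pattern: repeat the simulation study for several sample sizes
n_grid <- c(10, 25, 100, 250, 1000)
n_sims <- 100
sim_by_n <-
  map(n_grid, function(n) {
    map(1:n_sims, ~ pois_mle_sim(n_obs = n, true_lambda = 6)) %>%
      bind_rows() %>%
      mutate(n_obs = n)
  }) %>%
  bind_rows()
# summarise by sample size, e.g. average estimate and MSE
sim_by_n %>%
  group_by(n_obs) %>%
  summarise(mean_mle = mean(mle), mse = mean(sq_err))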