
Please beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion.

—Stephen Stigler

Week 4: Hypothesis testing

Once we have estimated the parameters of a statistical model (weeks 1–3) and are confident that the model appropriately describes the data, we can begin to make statistical inferences. Inferential statistics is concerned with inferring properties of an entire population from a sample of that population. In particular, conducting a hypothesis test on a parameter assesses the evidence (from the observed data) that the parameter takes a certain value (or lies in a set of values).

Hypothesis testing is a cornerstone of frequentist statistics1 and modern inductive science2. We focus on likelihood ratio tests (LRTs), which compare the evidence supporting the null hypothesis against the evidence supporting the alternative hypothesis. We consider two cases: simple and composite hypotheses.

Likelihood Ratio Tests

Simple Hypotheses

A simple-versus-simple hypothesis test occurs when both the null and the alternative hypotheses consist of a single parameter value. More specifically, \[ H_{0}:\theta = \theta_{0}\mbox{ and } H_{1}:\theta = \theta_{1}. \] The likelihood ratio is

\[\Lambda = \frac{L(\theta_{0}~\vert~\boldsymbol{y})}{L(\theta_{1}~\vert~\boldsymbol{y})}.\] For this case, the Neyman–Pearson Lemma states that the likelihood-ratio test is the most powerful among all \(\alpha\)-level tests. Note: \(\Lambda\) is a statistic and depends on the data; as such, we can treat it as a random variable before observing the data. For a fixed constant \(c\), the decision rule for this hypothesis test is:

  • Do not reject \(H_{0}\) if \(\Lambda > c\)
  • Reject3 \(H_{0}\) if \(\Lambda < c\)

The constant \(c\) is chosen with respect to the hypothesis test’s properties (see “Types of Errors” below). As such, the method for determining \(c\) is:

  1. Assume the null hypothesis, \(H_{0}\), is true.
  2. Simplify the rejection region so that it involves a quantity for which the distribution is known.
  3. Choose \(c\) to achieve the desired Type I and/or Type II error rate using the distribution from step 2.

The hypothesis test assumes \(H_{0}\) is true, so that evidence against the null arises when we make observations that are unlikely under the null distribution.
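As a concrete illustration, here is a minimal R sketch of these steps for a simple-versus-simple test on a normal mean with known variance. The hypothesised values, sample size, and significance level are illustrative choices, not part of the worksheet.

    # Sketch: simple-vs-simple LRT for a normal mean with known variance.
    # Illustrative values: H0: mu = 0 versus H1: mu = 1, with sigma = 1.
    set.seed(1)
    mu0 <- 0; mu1 <- 1; sigma <- 1
    y <- rnorm(15, mean = mu0, sd = sigma)   # data generated under H0

    # Lambda = L(mu0 | y) / L(mu1 | y); since mu1 > mu0, the region
    # Lambda < c simplifies to ybar > k for some constant k (step 2).
    Lambda <- prod(dnorm(y, mu0, sigma)) / prod(dnorm(y, mu1, sigma))

    # Step 3: under H0, ybar ~ N(mu0, sigma^2 / n); choose k so that
    # P(ybar > k | H0) = alpha = 0.05.
    n <- length(y)
    k <- qnorm(0.95, mean = mu0, sd = sigma / sqrt(n))
    mean(y) > k   # TRUE would mean reject H0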

Question 1 explores this further.


  1. More on this later in the semester. But for a jump start, see here.

  2. Despite being the dominant paradigm for assessing scientific evidence, like any method, it is not without flaws and can be misused.

  3. The rejection region is defined as \(\Lambda < c\) but can usually be simplified.

Composite Hypotheses

We can handle other hypotheses using the generalised likelihood ratio test (GLRT). This hypothesis test partitions the parameter space, \(\Theta\), into two sets, \(\Theta_{0}\) and \(\Theta_{1}\):

  • The null hypothesis: \(H_{0} = \{\theta:\theta \in \Theta_{0}\}\), and
  • The alternative hypothesis4: \(H_{1} = \{\theta:\theta \in \Theta_{1}\}\)

The ratio for this test is \[ \Lambda = \frac{\sup_{\theta \in \Theta_{0}} L(\theta~\vert~\boldsymbol{y})}{\sup_{\theta \in \Theta} L(\theta ~\vert~\boldsymbol{y})}. \] That is, we compare the likelihood of the most likely parameter under the null hypothesis to the likelihood of the most likely parameter in the entire parameter space.

Note that simple likelihood ratio tests are a specific type of GLRT.
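A minimal sketch of computing the GLRT ratio in R, assuming (purely for illustration) exponential data with \(H_0: \lambda = \lambda_0\) against \(H_1: \lambda \neq \lambda_0\):

    # Sketch: GLRT for an exponential rate. lambda0 and the data are illustrative.
    set.seed(2)
    y <- rexp(30, rate = 2)
    lambda0 <- 2

    loglik <- function(lambda) sum(dexp(y, rate = lambda, log = TRUE))

    # Numerator: sup over Theta_0 = {lambda0} is just L(lambda0 | y).
    # Denominator: sup over Theta is attained at the MLE, lambda_hat = 1 / ybar.
    lambda_hat <- 1 / mean(y)
    Lambda <- exp(loglik(lambda0) - loglik(lambda_hat))
    Lambda   # always in (0, 1]; values near 0 are evidence against H0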


  4. Note that \(\Theta_{1}\) is the complement of \(\Theta_{0}\), i.e. \(\Theta_{1} = \Theta_{0}^{\complement}\). Hence the two sets cover the entire parameter space.

Types of Errors

Just as with estimators, it is important to consider the properties of hypothesis tests. Key properties include:

  • Type I error (Significance): \[ \alpha = P(\text{reject } H_{0}~\vert~H_{0} \text{ is true}) \]
  • Type II error: \[ \beta = P(\text{do not reject } H_{0}~\vert~H_{1} \text{ is true}) \]
  • Test power: \[ \text{power} = 1 - \beta = P(\text{reject } H_{0}~\vert~H_{1} \text{ is true}). \]

It is important to consider these error rates and power when designing tests and experiments. Because a Type I error is considered the more serious error, we usually fix its rate and then adjust other quantities (typically the difference between the true and hypothesised values, and the sample size) to obtain the desired power; the simulation sketch below illustrates this.
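The following is a small Monte Carlo sketch, reusing the illustrative one-sided normal-mean test from the earlier sketch; it fixes \(\alpha\) and approximates the power at a chosen (illustrative) true mean.

    # Sketch: Monte Carlo approximation of power. All values are illustrative.
    set.seed(3)
    alpha <- 0.05; n <- 15; sigma <- 1
    mu0 <- 0; mu_true <- 0.7                  # true mean under H1
    k <- qnorm(1 - alpha, mean = mu0, sd = sigma / sqrt(n))

    # Power = P(reject H0 | H1 true), approximated by repeated sampling under H1.
    reject <- replicate(10000, mean(rnorm(n, mu_true, sigma)) > k)
    mean(reject)   # approximate power; grows with n or with mu_true - mu0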

Wilks’ theorem

For large samples (i.e. as \(n\rightarrow \infty\)), the log-likelihood ratio has an approximate Chi-squared distribution \[ 2 \left( \ell(\hat{\boldsymbol{\theta}}) - \ell(\boldsymbol{\theta}) \right) \sim \chi^{2}_{p} \] where \(p\) is the number of parameters estimated. It is often necessary to apply a second-order Taylor series approximation to \(2 \left( \ell(\hat{\boldsymbol{\theta}}) - \ell(\boldsymbol{\theta}) \right)\) because the ratio can be difficult to work with directly. This leads to the approximate distribution \[ \left( \hat{\boldsymbol{\theta}} - \boldsymbol{\theta} \right)^{\top}\mathcal{J} \left( \hat{\boldsymbol{\theta}} - \boldsymbol{\theta} \right) \sim \chi^{2}_{p} \] where \(\mathcal{J}\) is the observed information matrix.
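As a quick empirical check of the theorem, the R sketch below simulates the log-likelihood-ratio statistic for an exponential rate (one estimated parameter, so \(p = 1\)) under the null; the rate and sample size are illustrative choices.

    # Sketch: empirical check of Wilks' theorem for the exponential rate.
    set.seed(4)
    lambda0 <- 2; n <- 200

    wilks_stat <- replicate(5000, {
      y <- rexp(n, rate = lambda0)              # data generated under H0
      lambda_hat <- 1 / mean(y)                 # MLE of the rate
      2 * (sum(dexp(y, rate = lambda_hat, log = TRUE)) -
           sum(dexp(y, rate = lambda0, log = TRUE)))
    })

    # The simulated upper 5% point should be close to the chi-squared quantile.
    quantile(wilks_stat, 0.95)
    qchisq(0.95, df = 1)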

Theory questions

Question 1

Let \(y_{1},y_{2},\ldots,y_{n}\) be a random sample of observations from a population described by a normal distribution with mean equal to zero and variance \(\sigma^2\).

  1. What is the probability distribution of \(T = \frac{1}{\sigma^2}\sum_{i=1}^n y_i^2\)? State what it is; there is no need to show a proof.

  2. Determine the likelihood ratio, \(\Lambda\), for the most powerful test of \(H_0: \sigma^2=\sigma^2_0\) against \(H_1:\sigma^2=\sigma^2_1\).

  3. Simplify the rejection region5, \(\Lambda < c\), in terms of the statistic \(T\). Note that \(c\) is a constant6 with \(0<c<1\).

  4. Determine when the null hypothesis will be rejected using significance level \(\alpha\).


  5. Hint: Consider cases \(\sigma^2_1 > \sigma^2_0\) and \(\sigma^2_1 < \sigma^2_0\) separately.

  6. Hint: The constant \(c\) may be combined with other constants to simplify the derivation.

Question 2

Let \(y_1,y_2,\ldots,y_n\) be a random sample from a population described by the normal \(N(\theta,a\theta^2)\) distribution; that is, the variance of the distribution is \(a\theta^2\) for some \(a > 0\). Determine the form of the generalised likelihood ratio test of \(H_0:a=1\) against \(H_1:a\ne 1\). Start by finding the MLEs of the distribution7.


  7. Hint: the maximum likelihood estimates for the normal distribution \(N(\mu,\sigma^{2})\) are \(\hat{\mu} = \bar{y}\) and \(\hat{\sigma}^{2} = n^{-1}\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}\). The MLEs have an invariance property, so the MLEs of \(\theta\) and \(a\) can be determined from those of \(\mu\) and \(\sigma^{2}\) by transformation.

Practical questions

Note on random variables in R

For this section, you will need to use random variables in R. In base R, these follow a particular naming convention (a short example follows the list below):

  • d<rv>(): probability density (or mass) function. The univariate normal distribution has dnorm(x, mean = 0, sd = 1) for example.
  • p<rv>(): distribution function, \(P(X \leq q)\). The chi-squared has pchisq(q, df, lower.tail = TRUE) for example.
  • q<rv>(): quantile function, \(q\) such that \(P(X \leq q) = p\). The binomial distribution has qbinom(p, size, prob, lower.tail = TRUE) for example.
  • r<rv>(): random variable generator. The exponential distribution has rexp(n, rate = 1) for example.
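For example, for the normal and chi-squared distributions:

    dnorm(0)                 # density of N(0, 1) at 0: about 0.399
    pnorm(1.96)              # P(X <= 1.96) for X ~ N(0, 1): about 0.975
    qnorm(0.975)             # the q with P(X <= q) = 0.975: about 1.96
    rnorm(3)                 # three random draws from N(0, 1)
    pchisq(3.84, df = 1)     # P(X <= 3.84) for X ~ chi-squared(1): about 0.95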

Question 3

Suppose you observe the data, \(\boldsymbol{y}\), given below, and you wish to perform the hypothesis test in Question 1 (theory) with \[ H_{0}: \sigma^{2} = 100 \mbox{ versus } H_{1}: \sigma^{2} = 20. \]

  1. Do you reject the null hypothesis at \(\alpha = 0.05\)? What is the \(p\)-value8 of the test?

  2. Plot the distributions relating to the null and alternative hypotheses, along with where the observations occurred. Discuss how this graph relates to the hypothesis test.

The data are: \(\mathbf{y}=(9.91, 4.37, -1.06, 7.63, 0.78, 2.73, -1.29, 2.78, -4.98, 2.45)\)
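If you are unsure where to start, here is one possible sketch in R. It assumes the statistic \(T\) and the left-tail rejection region derived in Question 1 (theory); treat it as a starting point rather than a complete answer.

    # Possible starting point, assuming T from Question 1 (theory).
    y <- c(9.91, 4.37, -1.06, 7.63, 0.78, 2.73, -1.29, 2.78, -4.98, 2.45)
    sigma0_sq <- 100
    t_obs <- sum(y^2) / sigma0_sq     # under H0, T ~ chi-squared with n df

    # Left-tail p-value, since sigma_1^2 < sigma_0^2 (see the footnote below)
    pchisq(t_obs, df = length(y))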


  8. The \(p\)-value of a test is the probability of observing a test statistic equal to or more extreme than what was actually observed. In this case, the critical region is in the left tail (\(\sigma^{2}_{1} < \sigma^{2}_{0}\)), so the \(p\)-value is \(P(T < t~\vert~H_{0} \text{ is true})\), where \(t\) is the observed statistic.

Question 4

Suppose you wish to conduct an experiment to gauge whether male and female Leonberger dogs have different rates of hip dysplasia9. Given that you have access to a random sample of Leonbergers in which 4 out of 20 males and 2 out of 25 females have hip dysplasia, answer the following questions:

  1. Conduct a generalised likelihood ratio test10 to consider the evidence that the probability of disease occurrence is the same for male and female Leonbergers. Use Wilks’ theorem to approximate the distribution of the ratio. (A code skeleton you might adapt follows this list.)

  2. What is the \(p\)-value of the test?

  3. Assuming the MLEs are the true parameter values, conduct a simulation study to approximate the power of this (asymptotic) hypothesis test.

  4. Imagine you wish to collect another sample of Leonberger dogs to test for the occurrence of dysplasia. By adapting the simulation in part (c), can you determine the sample size \(n\) (number of dogs) required for the power of the test to be 0.9? Assume that you will recruit an equal number of male and female dogs.
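For part (a), a skeleton you might adapt is sketched below. It assumes the Bernoulli model from the footnote, with the pooled proportion as the MLE under \(H_{0}\) and separate proportions under \(H_{1}\); check the derivation yourself before relying on it.

    # Skeleton for part (a), assuming a Bernoulli model for disease occurrence.
    x_m <- 4;  n_m <- 20    # males with dysplasia out of males sampled
    x_f <- 2;  n_f <- 25    # females with dysplasia out of females sampled

    # Bernoulli log-likelihood for x occurrences in n trials with probability p
    loglik_binom <- function(x, n, p) x * log(p) + (n - x) * log(1 - p)

    p_pool <- (x_m + x_f) / (n_m + n_f)   # MLE under H0 (common probability)
    p_m <- x_m / n_m; p_f <- x_f / n_f    # MLEs under H1 (separate probabilities)

    # -2 log Lambda; H1 has 2 parameters, H0 has 1, so df = 1 by Wilks' theorem
    stat <- 2 * (loglik_binom(x_m, n_m, p_m) + loglik_binom(x_f, n_f, p_f) -
                 loglik_binom(x_m, n_m, p_pool) - loglik_binom(x_f, n_f, p_pool))
    pchisq(stat, df = 1, lower.tail = FALSE)   # approximate p-value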

Figure: Wee jock and Loulou with deflated ball.


  9. Hip dysplasia is a serious condition in many giant dog breeds, see here, but its occurrence has been reduced in several breeds by careful breeding.

  10. Hint: Assume a Bernoulli probability distribution for the occurrence of the disease.
