I offer a variety of research topics, some for applied students (typically involving simulation), and some for students with strong skills in complex statistical modelling.
Adventures in nonlinear and/or mixed analysis of count data:
This research is motivated by the analysis of fish counts from experiments designed to estimate the size-selective (or species-selective) performance of fishing gears. A wide range of topics is available, suitable for anything from a BSc Hons project to a PhD thesis. Potential topics include:
- How to weight individual hauls to account for catch size - this might largely be a simulation study of possible approaches.
- Bayesian implementation - currently, implementation is almost exclusively frequentist. There could be advantages to a Bayesian approach, especially when mixed effects are included.
- Exploration of the use of monotone splines.
- Use of the Tweedie distribution to analyse catch weights.
Spatio-temporal BACI models:
BACI (before-after-control-impact) and MBACI (multiple BACI) models are used to establish the causal effect of a potential impact. The impact could be natural (e.g., a storm event) or man-made (e.g., alteration of habitat or an underwater seismic survey). The general objective would be to incorporate MBACI designs within a formal spatio-temporal model, implemented using recently developed Gaussian Markov Random Field (GMRF) tools available in R-INLA and/or TMB. [Suitable for PhD thesis.]
When can Taylor's variance power law be applied to real data?:
Taylor's power law (https://en.wikipedia.org/wiki/Taylor%27s_law) says that the variance of a count (or other measure of abundance, such as weight) is of the form var=a*mu^b, where mu is the expected value and a and b are parameters to be estimated.
General-purpose model-fitting software (such as Stan or TMB) enables Taylor's power law to be implemented for a wide range of distributions (e.g., negative binomial and lognormal) and classes of models. The question of interest is whether it provides a more parsimonious fit, or instead makes the model overly complex or computationally unstable.
Using appropriate model performance criteria, this work will re-analyse existing datasets and perform simulation studies to determine when Taylor's law can safely be applied. [Suitable for MSc project to PhD thesis.]
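As a rough illustration of the relationship var=a*mu^b, taking logs gives log(var) = log(a) + b*log(mu), so a and b can be estimated crudely by regressing log sample variance on log sample mean across groups. The sketch below is only a moment-based check on simulated data, not the model-based fitting in Stan or TMB that the project would use. The gamma shape, the group means and the function name fit_taylor are all assumptions made for the example; with a constant gamma shape the true values are b=2 and a=1/shape.

```python
import math
import random
import statistics


def fit_taylor(means, variances):
    """Least-squares fit of log(var) = log(a) + b*log(mu); returns (a, b)."""
    x = [math.log(m) for m in means]
    y = [math.log(v) for v in variances]
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = math.exp(ybar - b * xbar)
    return a, b


rng = random.Random(1)
shape = 4.0                          # assumed gamma shape: var = mu^2 / shape
true_means = [1, 2, 5, 10, 20, 50]   # hypothetical group means

sample_means, sample_vars = [], []
for mu in true_means:
    # gammavariate(shape, scale) has mean shape * scale, so scale = mu / shape
    draws = [rng.gammavariate(shape, mu / shape) for _ in range(2000)]
    sample_means.append(statistics.fmean(draws))
    sample_vars.append(statistics.variance(draws))

a_hat, b_hat = fit_taylor(sample_means, sample_vars)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

With real data the groups would be hauls, sites or years rather than simulated gamma samples, and the log-log regression ignores the sampling error in the group means and variances, which is one reason a likelihood-based fit is preferable.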
A better threshold for Cook's distance:
STATS20x uses a Cook's D threshold of 0.4 to label an observation as influential, but this is too low a threshold for small datasets and too high a threshold for large datasets. This project will use simulation to suggest a better threshold that depends on the number of observations (and possibly the number of parameters). The issue of false discovery rate (FDR) may also be a consideration. [Suitable for BSc Hons project to MSc dissertation.]
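A minimal sketch of the kind of simulation involved: generate data from a correctly specified simple linear regression with no unusual observations, compute Cook's D for every point, and record how often at least one point exceeds 0.4. The sample sizes, the normal predictor and errors, and the function names are assumptions chosen for illustration only.

```python
import random
import statistics


def cooks_distances(x, y):
    """Cook's D for each observation in a simple linear regression of y on x."""
    n, p = len(x), 2                      # p = number of coefficients
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)          # residual variance
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverages h_i
    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


def flag_rate(n, threshold=0.4, n_sims=2000, seed=1):
    """Share of clean simulated datasets with at least one D_i above threshold."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
        if max(cooks_distances(x, y)) > threshold:
            flagged += 1
    return flagged / n_sims


rates = {n: flag_rate(n) for n in (10, 30, 100)}
for n, rate in rates.items():
    print(f"n = {n:3d}: share of clean datasets flagged = {rate:.3f}")
```

A threshold that depends on n could then be chosen so that this flag rate stays near a target value, which is one possible way to frame the project rather than a settled method.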
Misuse of individual confidence intervals:
In the literature, some studies give a scatter plot that shows individual confidence intervals for each value of the x variable. However, these are often interpreted as a single simultaneous confidence region. The objective of this research is to use simulation to determine if this is a serious problem, and to compare different methods that have been proposed to produce simultaneous confidence regions. [Suitable for BSc Hons project to MSc dissertation.]
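The size of the problem can be sketched with a small simulation. It assumes k independent x values, a normally distributed mean response at each with known standard deviation 1, and a z-based interval; all of these are simplifications, and a real study would likely involve correlated estimates from a fitted curve. The Bonferroni adjustment is used here only as one simple candidate for a simultaneous method, not as the method the project would recommend.

```python
import random
from statistics import NormalDist


def joint_coverage(k, n_per_x, level, n_sims=4000, seed=1):
    """Share of simulated datasets in which all k intervals cover the truth."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = 1 / n_per_x ** 0.5          # known sigma = 1 (simplifying assumption)
    half_width = z * se
    hits = 0
    for _ in range(n_sims):
        # The true mean is 0 at every x, so each sample mean is N(0, se^2).
        if all(abs(rng.gauss(0, se)) <= half_width for _ in range(k)):
            hits += 1
    return hits / n_sims


k = 10
individual = joint_coverage(k, n_per_x=5, level=0.95)
bonferroni = joint_coverage(k, n_per_x=5, level=1 - 0.05 / k)
print(f"joint coverage, individual 95% intervals: {individual:.3f}")
print(f"joint coverage, Bonferroni intervals:     {bonferroni:.3f}")
```

Under independence the joint coverage of k individual 95% intervals is 0.95^k, which is about 0.60 for k = 10, so reading ten individual intervals as one 95% region overstates the confidence considerably in this idealised setting.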
Analyses of popular games of chance in NZ:
The objective is to check for any lack of randomness in games of chance such as Lotto, Keno, Bullseye, Play3, etc. An important issue will be multiple comparisons, since there may be multiple hypotheses of interest and multiple games examined (e.g., an omnibus test of Lotto ball probabilities, a test of autocorrelation, single digit vs multiple digit, the four colours, unusual sequences, minimum range, and changes in the equipment used). This work will be largely simulation based. [Suitable for BSc Hons project to MSc dissertation.]
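One way the omnibus test of ball probabilities might be set up is sketched below. Because balls are drawn without replacement within a draw, the usual chi-square reference distribution is only approximate, so the sketch obtains the null distribution of the statistic by simulation instead. The game format (6 balls from 40), the number of draws and all function names are assumptions, and the "observed" draws are a simulated stand-in rather than real results.

```python
import random

N_BALLS, N_DRAWN = 40, 6   # assumed game format, for illustration only


def simulate_draws(n_draws, rng):
    """Fair draws: N_DRAWN distinct balls out of N_BALLS, n_draws times."""
    return [rng.sample(range(1, N_BALLS + 1), N_DRAWN) for _ in range(n_draws)]


def chi_square_stat(draws):
    """Pearson statistic comparing ball frequencies with equal probabilities."""
    counts = [0] * (N_BALLS + 1)
    for draw in draws:
        for ball in draw:
            counts[ball] += 1
    expected = len(draws) * N_DRAWN / N_BALLS
    return sum((c - expected) ** 2 / expected for c in counts[1:])


def monte_carlo_p_value(draws, n_sims=200, seed=2):
    """P-value from the simulated null distribution of the statistic."""
    rng = random.Random(seed)
    observed = chi_square_stat(draws)
    exceed = sum(
        chi_square_stat(simulate_draws(len(draws), rng)) >= observed
        for _ in range(n_sims)
    )
    return (exceed + 1) / (n_sims + 1)


# Simulated placeholder data, NOT real Lotto results.
placeholder_draws = simulate_draws(300, random.Random(1))
p_value = monte_carlo_p_value(placeholder_draws)
print(f"Monte Carlo p-value for equal ball probabilities: {p_value:.3f}")
```

The same simulate-under-fairness approach extends to the other hypotheses listed above, and the resulting set of p-values would then need a multiplicity adjustment across tests and games.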