Bootstrapping linear models, and its applications.

Access & Terms of Use: open access
Copyright: Tsang, Lester Hing Fung
The bootstrap is a computationally intensive data analysis technique. It is particularly useful for analysing small datasets, and for estimating the sampling distribution of a statistic when it is intractable. We focus on bootstrap hypothesis testing of linear models. In this context, various versions of the bootstrap are currently available, and it is not entirely clear from the literature which method is optimal in each situation. We reviewed the existing literature on bootstrapping linear models, identified three "rules" in it, and confirmed these via simulation. We also identified two outstanding issues. Firstly, which variance estimator should be used when constructing a bootstrap test statistic? Secondly, if resampling residuals, should this be done using the model fitted under the null hypothesis (the "null model") or under the alternative hypothesis (the "full model")? To our knowledge, these two questions have not previously been addressed. We provided theoretical results to answer them, and subsequently confirmed these via simulation. Our simulations were designed to evaluate both the size and the (size-adjusted) power of the proposed bootstrap schemes.

We proposed using a sandwich variance estimator for case and score resampling, rather than the naive statistic commonly used in practice. Via simulation, we showed that bootstrap test statistics using the sandwich estimator tend to have superior Type I error for case and score resampling, but the question remains of which estimator (naive or sandwich) to use for the observed test statistic (t). Best results were achieved using t-naive for score resampling and t-sandwich for case resampling. One possible explanation is that score resampling conditions on X, whereas case resampling does not, instead treating X as random. We also studied full versus null model residual resampling.
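To make the case-resampling scheme concrete, the following is a minimal sketch (not the thesis's code) of a bootstrap test of a single slope coefficient, using an HC0 sandwich variance estimator for the bootstrap statistics and, as recommended above for case resampling, the sandwich-based statistic for the observed value as well. All names, the toy data, and the choice of B = 999 resamples are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_fit(X, y):
    """Least-squares fit; returns coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def var_naive(X, resid):
    """Naive OLS variance: sigma^2 * (X'X)^{-1}."""
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    return s2 * np.linalg.inv(X.T @ X)

def var_sandwich(X, resid):
    """HC0 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (resid[:, None] ** 2 * X)
    return bread @ meat @ bread

# Toy data generated under H0: beta_1 = 0 in y = b0 + b1*x + error
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + rng.normal(size=n)

beta, resid = ols_fit(X, y)
t_naive = beta[1] / np.sqrt(var_naive(X, resid)[1, 1])
t_sand = beta[1] / np.sqrt(var_sandwich(X, resid)[1, 1])

# Case resampling: resample (x_i, y_i) pairs, treating X as random
B = 999
t_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    Xb, yb = X[idx], y[idx]
    bb, rb = ols_fit(Xb, yb)
    # bootstrap statistic is centred at the observed estimate and
    # studentised with the sandwich estimator
    t_star[b] = (bb[1] - beta[1]) / np.sqrt(var_sandwich(Xb, rb)[1, 1])

# Two-sided bootstrap p-value, comparing against the observed t-sandwich
p = (1 + np.sum(np.abs(t_star) >= np.abs(t_sand))) / (B + 1)
print(round(p, 3))
```

For score resampling, the same skeleton applies, but X is held fixed and the scores (residual-weighted covariates) are perturbed; per the results above, the observed statistic would then use the naive estimator instead.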
We showed that null model resampling has better Type I error in theory, having an asymptotic correlation of one with a "true bootstrap" procedure, analogous to a result derived for permutation testing by Anderson and Robinson (2001). In practice, however, this superiority holds only for non-pivotal statistics: for pivotal statistics, both null and full model resampling had accurate Type I error, a discrepancy that we were able to explain theoretically.
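The null-versus-full distinction can be sketched as follows. In both variants the bootstrap responses are built from the null model's fitted values; the only difference is whether the resampled residuals come from the null or the full fit. This is an illustrative sketch under assumed toy data, not the thesis's implementation, and it uses a naive (non-pivotal-friendly) t statistic for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(X, y):
    """Least-squares fit; returns coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def t_stat(X, y):
    """Naive t statistic for the last coefficient."""
    n, p = X.shape
    beta, resid = fit(X, y)
    s2 = resid @ resid / (n - p)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se

# Toy data generated under H0: coefficient on x is zero
n = 40
x = rng.normal(size=n)
X0 = np.ones((n, 1))               # null model: intercept only
X1 = np.column_stack([X0, x])      # full model: intercept + x
y = 2.0 + rng.normal(size=n)

t_obs = t_stat(X1, y)

def resid_bootstrap(X_resid, B=999):
    """Residual bootstrap of the t statistic.
    X_resid selects which fitted model supplies the residuals."""
    beta0, _ = fit(X0, y)          # fitted values under the null
    _, resid = fit(X_resid, y)     # residuals from null or full model
    fitted0 = X0 @ beta0
    t_star = np.empty(B)
    for b in range(B):
        y_star = fitted0 + rng.choice(resid, size=n, replace=True)
        t_star[b] = t_stat(X1, y_star)
    return t_star

ps = {}
for label, Xr in [("null-model residuals", X0), ("full-model residuals", X1)]:
    t_star = resid_bootstrap(Xr)
    ps[label] = (1 + np.sum(np.abs(t_star) >= np.abs(t_obs))) / (len(t_star) + 1)
    print(label, round(ps[label], 3))
```

With a studentised (pivotal) statistic, as the results above note, the two variants give similarly accurate Type I error; the null-model version's theoretical advantage shows up for non-pivotal statistics such as the raw coefficient estimate.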
Tsang, Lester Hing Fung
Warton, David
Galbraith, Sally
Degree Type: Masters Thesis