Publication Search Results
(2022) Nguyen, Robert. Thesis.
Data-driven decision making is everywhere in the modern sporting world. The most well-known example of this is the Moneyball movement in Major League Baseball (MLB), which built on research by Sherri Nichols in the 1980s, but sport analytics has also driven major changes in strategy in basketball, the National Football League, and soccer. In Australia, sports analytics has not had quite the same influence in its major domestic codes. In this thesis, we develop tools to assist the analytics community in two major Australian commercial sports. For Australian Rules Football, the largest commercial sport in Australia, data was not readily accessible for the national competition, the Australian Football League (AFL). Data access is fundamental to data analysis, so this has been a major constraint on the capacity of the AFL analytics community to grow. In this thesis, this issue is solved by making AFL data readily accessible through the R package fitzRoy. This package has already proven to be quite successful and has seen uptake from the media, fans, and club analysts. Expected points models are widely used across sports to inform tactical decision making, but as currently implemented, they confound the effects of decisions on points scored and the situations that the decisions tend to be made in. In Chapter 3, a new expected points approach is proposed, which conditions on match situation when estimating the effect of decisions on expected points. Hence we call this a conditional Expected Points (cEP) model. Our cEP model is used to provide new insight into fourth-down decision-making in the National Football League (NFL), and decision-making when awarded a penalty in Rugby League.
The National Rugby League (NRL) is the leading competition of Australia’s second largest commercial sport, which is played on a field 100m long and 68m wide. The NRL have provided us with detailed event data from the previous five seasons, used in academic research for the first time in this thesis. We found that NRL teams should kick for goal from penalties much more often than is currently the case. In Chapter 4 we develop a live probability model for predicting the winner of a Rugby League game using data that is collected live. This model could be used by the National Rugby League during broadcasts to enhance their coverage by reporting live win probabilities. While most live probability models are constructed using scores only, the availability of live event data meant we could investigate whether models constructed using event data have better predictive performance. We were able to show that adding covariates such as missed tackles, in addition to score differential, can improve the prediction. Clubs can use their own domain knowledge to test their own live win probability theories with the R scripts that are provided to the NRL.
(2022) Yang, Yu. Thesis.
Research in computational statistics develops numerically efficient methods to estimate statistical models, with Monte Carlo algorithms a subset of such methods. This thesis develops novel Monte Carlo methods to solve three important problems in Bayesian statistics. For many complex models, it is prohibitively expensive to run simulation methods such as Markov chain Monte Carlo (MCMC) on the model directly when the likelihood function includes an intractable term or is computationally challenging in some other way. The first two topics investigate models having such likelihoods. The third topic proposes a novel model to solve a popular question in causal inference, which requires solving a computationally challenging problem. The first application is to symbolic data analysis, where classical data are summarised and represented as symbolic objects. The likelihood function of such aggregated-level data is often intractable as it usually includes a high-dimensional integral with large exponents. Bayesian inference on symbolic data is carried out in the thesis by using a pseudo-marginal method, which replaces the likelihood function with its unbiased estimate. The second application is to doubly intractable models, where the likelihood includes an intractable normalising constant. The pseudo-marginal method is combined with the introduction of an auxiliary variable to obtain simulation-consistent inference. The proposed algorithm offers a generic solution to a wider range of problems, where the existing methods are often impractical as the assumptions required for their application do not hold. The last application is to causal inference using Bayesian additive regression trees (BART), a non-parametric Bayesian regression technique. The likelihood function is complex as it is based on a sum of trees whose structures change dynamically with the MCMC iterates.
An extension to BART is developed to estimate the heterogeneous treatment effect, aiming to overcome the regularisation-induced confounding issue which is often observed in the direct application of BART in causal inference.
(2022) Balnozan, Igor. Thesis.
This thesis explores the development and novel application of linear panel data methods that use latent grouping variables in the modelling of time-varying unobservable heterogeneity. The methods are tailored for use in microeconomic applications with observational panel data by: a) controlling for individual-specific intercepts; b) focusing on the economic interpretability of the time-varying heterogeneity component of the models; c) addressing the problem of estimating unknown group memberships across an unknown number of latent groups. The most general model studied also allows for latent group structures in the partial effects of observed covariates, where groups in the covariate effects can be independent from groups in the unobservable heterogeneity. Classical and Bayesian statistical methodologies are considered, with the main methodological contributions being in the development of Bayesian approaches. For the kinds of applications studied, the Bayesian methods are shown to have more favourable properties, both in principle and in practice. Empirical applications to retirement decumulation and smoking policy in Australia demonstrate how the methods developed in this thesis may be used to learn about economically meaningful latent behavioural patterns across a range of applications.
Pairwise versus mutual independence: visualisation, actuarial applications and central limit theorems
(2023) Boglioni Beaulieu, Guillaume. Thesis.
Accurately capturing the dependence between risks, if it exists, is an increasingly relevant topic of actuarial research. In recent years, several authors have started to relax the traditional 'independence assumption' in a variety of actuarial settings. While it is known that 'mutual independence' between random variables is not equivalent to their 'pairwise independence', this thesis aims to provide a better understanding of the materiality of this difference. The distinction between mutual and pairwise independence matters because, in practice, dependence is often assessed via pairs only, e.g., through correlation matrices, rank-based measures of association, scatterplot matrices, heat-maps, etc. Using such pairwise methods, it is possible to miss some forms of dependence. In this thesis, we explore from several angles how material the difference between pairwise and mutual independence is. We provide relevant background and motivation for this thesis in Chapter 1, then conduct a literature review in Chapter 2. In Chapter 3, we focus on visualising the difference between pairwise and mutual independence. To do so, we propose a series of theoretical examples (some of them new) where random variables are pairwise independent but (mutually) dependent, in short, PIBD. We then develop new visualisation tools and use them to illustrate what PIBD variables can look like. We showcase that the dependence involved is possibly very strong. We also use our visualisation tools to identify subtle forms of dependence, which would otherwise be hard to detect. In Chapter 4, we review common dependence models (such as elliptical distributions and Archimedean copulas) used in actuarial science and show that they do not allow for the possibility of PIBD data.
We also investigate concrete consequences of the 'nonequivalence' between pairwise and mutual independence. We establish that many results which hold for mutually independent variables do not hold under pairwise independence alone. Those include results about finite sums of random variables, extreme value theory and bootstrap methods. This part thus illustrates what can potentially 'go wrong' if one assumes mutual independence where only pairwise independence holds. Lastly, in Chapters 5 and 6, we investigate the question of what happens for PIBD variables 'in the limit', i.e., when the sample size goes to infinity. We want to see if the 'problems' caused by dependence vanish for sufficiently large samples. This is a broad question, and we concentrate on the important classical Central Limit Theorem (CLT), for which we find that the answer is largely negative. In particular, we construct new sequences of PIBD variables (with arbitrary margins) for which a CLT does not hold. We derive explicitly the asymptotic distribution of the standardised mean of our sequences, which allows us to illustrate the extent of the 'failure' of a CLT for PIBD variables. We also propose a general methodology to construct dependent K-tuplewise independent (K an arbitrary integer) sequences of random variables with arbitrary margins. In the case K = 3, we use this methodology to derive explicit examples of triplewise independent sequences for which no CLT holds. Those results illustrate that mutual independence is a crucial assumption within CLTs, and that having larger samples is not always a viable solution to the problem of non-independent data.
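As an illustration of the gap between pairwise and mutual independence, the sketch below checks the textbook XOR construction exactly: X and Y are independent fair bits and Z = X XOR Y. This is a standard example chosen here for illustration, not necessarily one of the examples developed in the thesis, and the helper names (`prob`, `pairwise_independent`, `mutually_independent`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of (X, Y, Z): X and Y are independent fair bits,
# and Z = X XOR Y. Each of the four (x, y) pairs has probability 1/4.
outcomes = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}


def prob(constraints):
    """Exact probability that coordinate i equals v for every (i, v) given."""
    return sum(
        p for o, p in outcomes.items()
        if all(o[i] == v for i, v in constraints.items())
    )


def pairwise_independent():
    """Check P(A=a, B=b) = P(A=a) P(B=b) for every pair of coordinates."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            if prob({i: a, j: b}) != prob({i: a}) * prob({j: b}):
                return False
    return True


def mutually_independent():
    """Check the three-way factorisation of the joint distribution."""
    for a, b, c in product((0, 1), repeat=3):
        joint = prob({0: a, 1: b, 2: c})
        if joint != prob({0: a}) * prob({1: b}) * prob({2: c}):
            return False
    return True


print("pairwise independent:", pairwise_independent())
print("mutually independent:", mutually_independent())
```

Every pair of variables passes the factorisation check, so a correlation matrix or scatterplot matrix would show nothing, yet the triple fails it because any two of the variables determine the third.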