Science

Publication Search Results

  • (2022) Nguyen, Robert
    Thesis
    Data-driven decision making is everywhere in the modern sporting world. The most well-known example of this is the Moneyball movement in Major League Baseball (MLB), which built on research by Sherri Nichols in the 1980s, but sports analytics has also driven major changes in strategy in basketball, the National Football League (NFL), and soccer. In Australia, sports analytics has not had quite the same influence in its major domestic codes. In this thesis, we develop tools to assist the analytics community in two major Australian commercial sports. For Australian Rules Football, the largest commercial sport in Australia, data was not readily accessible for the national competition, the Australian Football League (AFL). Data access is fundamental to data analysis, so this has been a major constraint on the capacity of the AFL analytics community to grow. In this thesis, this issue is solved by making AFL data readily accessible through the R package fitzRoy. This package has already proven to be quite successful and has seen uptake from the media, fans, and club analysts. Expected points models are widely used across sports to inform tactical decision making, but as currently implemented, they confound the effects of decisions on points scored with the situations in which those decisions tend to be made. In Chapter 3, a new expected points approach is proposed, which conditions on match situation when estimating the effect of decisions on expected points. Hence we call this a conditional Expected Points (cEP) model. Our cEP model is used to provide new insight into fourth-down decision-making in the NFL, and decision-making when awarded a penalty in Rugby League.
    The National Rugby League (NRL) is the leading competition of Australia’s second-largest commercial sport, which is played on a pitch that is 100 m long and 70 m wide. The NRL has provided us with detailed event data from the previous five seasons, used in academic research for the first time in this thesis. We found that NRL teams should kick for goal from penalties much more often than is currently the case. In Chapter 4, we develop a live probability model for predicting the winner of a Rugby League game using data that is collected live. This model could be used by the National Rugby League during broadcasts to enhance their coverage by reporting live win probabilities. While most live probability models are constructed using scores only, the availability of live event data meant we could investigate whether models constructed using event data have better predictive performance. We were able to show that adding covariates such as missed tackles to score differential can improve the prediction. Clubs can use their own domain knowledge to test their own live win probability theories with the R scripts that are provided to the NRL.
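The confounding issue raised in the abstract above (the effect of a decision versus the situations in which that decision tends to be made) can be illustrated with a toy calculation. The sketch below is not the thesis's cEP model. The records, the situation labels "A" and "B", and the function names are all invented for illustration. It only shows how averaging points by decision alone can point the opposite way from averaging within each match situation.

```python
from collections import defaultdict
from statistics import mean

# Synthetic (situation, decision, points) records. Every value is made up
# for illustration and is NOT taken from NRL or NFL data.
# "kick" is mostly chosen in the low-scoring situation "A",
# "run" is mostly chosen in the high-scoring situation "B".
records = (
    [("A", "kick", 0.8)] * 8 + [("A", "run", 0.5)] * 2
    + [("B", "kick", 2.0)] * 2 + [("B", "run", 1.8)] * 8
)


def pooled_means(rows):
    """Average points per decision, ignoring the match situation."""
    by_decision = defaultdict(list)
    for _situation, decision, points in rows:
        by_decision[decision].append(points)
    return {decision: mean(pts) for decision, pts in by_decision.items()}


def conditional_means(rows):
    """Average points per (situation, decision) cell."""
    by_cell = defaultdict(list)
    for situation, decision, points in rows:
        by_cell[(situation, decision)].append(points)
    return {cell: mean(pts) for cell, pts in by_cell.items()}
```

With these invented numbers, `pooled_means(records)` ranks "run" above "kick", while `conditional_means(records)` ranks "kick" above "run" in both situations. Conditioning on the situation removes the distortion caused by where each decision is usually taken.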

  • (2022) Yang, Yu
    Thesis
    Research in computational statistics develops numerically efficient methods to estimate statistical models, with Monte Carlo algorithms a subset of such methods. This thesis develops novel Monte Carlo methods to solve three important problems in Bayesian statistics. For many complex models, it is prohibitively expensive to run simulation methods such as Markov chain Monte Carlo (MCMC) on the model directly when the likelihood function includes an intractable term or is computationally challenging in some other way. The first two topics investigate models having such likelihoods. The third topic proposes a novel model to solve a popular question in causal inference, which requires solving a computationally challenging problem. The first application is to symbolic data analysis, where classical data are summarised and represented as symbolic objects. The likelihood function of such aggregated-level data is often intractable as it usually includes a high-dimensional integral with large exponents. Bayesian inference on symbolic data is carried out in the thesis by using a pseudo-marginal method, which replaces the likelihood function with its unbiased estimate. The second application is to doubly intractable models, where the likelihood includes an intractable normalising constant. The pseudo-marginal method is combined with the introduction of an auxiliary variable to obtain simulation-consistent inference. The proposed algorithm offers a generic solution to a wider range of problems, where the existing methods are often impractical as the assumptions required for their application do not hold. The last application is to causal inference using Bayesian additive regression trees (BART), a non-parametric Bayesian regression technique. The likelihood function is complex as it is based on a sum of trees whose structures change dynamically with the MCMC iterates.
    An extension to BART is developed to estimate the heterogeneous treatment effect, aiming to overcome the regularisation-induced confounding issue which is often observed in the direct application of BART in causal inference.
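The pseudo-marginal idea mentioned in the abstract above, replacing the likelihood with an unbiased estimate inside MCMC, can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the algorithm developed in the thesis. The latent-variable model (y given z is N(z, 1), z given theta is N(theta, 1)), the N(0, 10^2) prior, and all function names and default settings are hypothetical choices made only so that the example is small and checkable.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_lik_estimate(theta, data, n_draws, rng):
    """Log of an unbiased Monte Carlo estimate of p(data | theta)."""
    total = 0.0
    for y in data:
        # Average p(y | z) over latent draws z ~ N(theta, 1).
        est = sum(
            normal_pdf(y, rng.gauss(theta, 1.0), 1.0) for _ in range(n_draws)
        ) / n_draws
        if est == 0.0:  # numerical underflow: treat as zero likelihood
            return float("-inf")
        total += math.log(est)
    return total


def log_prior(theta):
    """Assumed N(0, 10^2) prior on theta, up to an additive constant."""
    return -0.5 * (theta / 10.0) ** 2


def pseudo_marginal_mh(data, n_iter=3000, n_draws=20, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings driven by estimated likelihoods."""
    rng = random.Random(seed)
    theta = 0.0
    log_post = log_lik_estimate(theta, data, n_draws, rng) + log_prior(theta)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        log_post_prop = (
            log_lik_estimate(proposal, data, n_draws, rng) + log_prior(proposal)
        )
        # The estimate stored for the current state is reused, not refreshed;
        # this is what keeps the exact posterior as the target distribution.
        if rng.random() < math.exp(min(0.0, log_post_prop - log_post)):
            theta, log_post = proposal, log_post_prop
        samples.append(theta)
    return samples
```

In this toy model the marginal likelihood is available in closed form (y given theta is N(theta, 2)), so the output of `pseudo_marginal_mh(data)` can be compared against the exact posterior mean as a sanity check. In the settings the abstract describes, no such closed form is available, which is the reason for using an estimate in the first place.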

  • (2022) Balnozan, Igor
    Thesis
    This thesis explores the development and novel application of linear panel data methods that use latent grouping variables in the modelling of time-varying unobservable heterogeneity. The methods are tailored for use in microeconomic applications with observational panel data by: a) controlling for individual-specific intercepts; b) focusing on the economic interpretability of the time-varying heterogeneity component of the models; c) addressing the problem of estimating unknown group memberships across an unknown number of latent groups. The most general model studied also allows for latent group structures in the partial effects of observed covariates, where groups in the covariate effects can be independent from groups in the unobservable heterogeneity. Classical and Bayesian statistical methodologies are considered, with the main methodological contributions being in the development of Bayesian approaches. For the kinds of applications studied, the Bayesian methods are shown to have more favourable properties, both in principle and in practice. Empirical applications to retirement decumulation and smoking policy in Australia demonstrate how the methods developed in this thesis may be used to learn about economically meaningful latent behavioural patterns across a range of applications.