Department of Statistics
Seminars
Speaker: Mine Dogucu
Affiliation: Department of Statistics, University of California Irvine
When: Wednesday, 11 December 2024, 4:00 pm to 5:00 pm
Where: 303-310
Abstract: Bayesian statistics is becoming more popular in data science. Data scientists are often not trained in Bayesian statistics and, if they are, it is usually part of their graduate training. During this talk, we will introduce an introductory course in Bayesian statistics for learners at the undergraduate level and comparably trained practitioners. We will share tools for teaching (and learning) the first course in Bayesian statistics, specifically the {bayesrules} package that accompanies the open-access book Bayes Rules! An Introduction to Applied Bayesian Modeling. We will provide an outline of the curriculum and examples for novice learners and their instructors.
Speaker: Mine Dogucu, Associate Professor of Teaching and Vice Chair for Undergraduate Studies, Department of Statistics, University of California Irvine
Bio: Mine Dogucu is Associate Professor of Teaching and Vice Chair of Undergraduate Studies in the Department of Statistics at University of California Irvine. Her goal is to create educational resources for statistics and data science that are accessible physically and cognitively. Her work focuses on modern pedagogical approaches in the statistics curriculum, making data science education accessible, and undergraduate Bayesian education. She is the co-author of the book Bayes Rules! An Introduction to Applied Bayesian Modeling. She works on a few projects funded by the United States National Science Foundation and the National Institutes of Health. She writes blog posts about data, pedagogy, and data pedagogy at DataPedagogy.com.
Nonparametric Density Estimation for Compositional Data
Speaker: Jiajin (George) Xie
Affiliation: Department of Statistics, University of Auckland
When: Thursday, 28 November 2024, 12:00 pm to 1:00 pm
Where: 303-310
This study addresses the challenges of density estimation for compositional data, a type of data constrained to reflect relative proportions within a whole. Such data are prevalent across diverse fields, including microbiome analysis, geology, and machine learning. The research develops and evaluates nonparametric methods for high-dimensional compositional data density estimation, focusing on the mixture-based density estimation (MDE) approach. Two types of mixture components are explored: Gaussian distributions applied to log-ratio-transformed compositional data, which offer excellent flexibility, and Dirichlet distributions applied directly to compositions, effectively handling cases with zero values. The performance of these methods is assessed through simulation studies and compared with finite mixture and kernel density estimation techniques. Results demonstrate the superior accuracy and adaptability of the proposed methods in capturing intricate data structures across various scenarios.
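As a rough illustration of the log-ratio step mentioned in the abstract above, the sketch below applies a centered log-ratio (clr) transform to a composition. The abstract does not say which log-ratio transform the talk uses, so the choice of clr and the function names `clr` and `clr_inverse` are assumptions made for illustration only. The sketch also shows why zero parts are awkward for log-ratio approaches: the logarithm is undefined there.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a strictly positive composition.

    Illustrative helper (not from the talk): each part is divided by the
    total, logged, and centered by the mean of the logs.
    """
    if any(x <= 0 for x in composition):
        # Log-ratios are undefined when any part is zero or negative.
        raise ValueError("clr is undefined for zero or negative parts")
    total = sum(composition)
    logs = [math.log(x / total) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def clr_inverse(z):
    """Map clr coordinates back to proportions that sum to 1."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The transformed coordinates sum to zero and live in unconstrained real space, which is where a Gaussian mixture could be fitted; `clr_inverse` maps points back to proportions.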
(This is a PYR talk.)
Visualization and Analysis of Suicide Methods in Tokyo Using Interactive Graphs
Speaker: Takafumi KUBOTA
Affiliation: Tama University, Japan
When: Wednesday, 30 October 2024, 11:00 am to 12:00 pm
Where: 303-310
This study aims to visualize the trends in suicide methods in Tokyo, using Japan's regional suicide statistics to provide insights that can inform effective prevention strategies. Suicide is a significant social issue, and analyzing regional data can offer valuable perspectives for targeted interventions. The research focuses on visualizing the trends for different suicide methods by creating bar graphs, line charts, and choropleth maps. These visualizations are generated after data cleaning to clearly depict the occurrence and trends associated with each method.
The application is developed using the R packages shiny and plotly, enabling users to interactively explore the data. With shiny, users can select the items of interest, such as region, time period, or suicide method, from a menu, while plotly allows for the implementation of interactive graphs that dynamically update based on the selected parameters. This approach facilitates the identification of specific regional trends, such as railway suicides or jumps from high-rise buildings that are more prevalent in Tokyo.
Through the development and analysis of this application, the study aims to enhance the understanding of regional and method-specific suicide trends, providing recommendations for suicide prevention measures. The visualized data is expected to serve as a valuable tool for policymakers and researchers, contributing to the strengthening of suicide prevention efforts.
Test of clustering for Neyman-Scott processes
Speaker: Bethany Macdonald
Affiliation: Otago University
When: Wednesday, 23 October 2024, 11:00 am to 12:00 pm
Where: 303-310
Spatial point patterns can arise from a vast array of application areas including epidemiology, ecology and geoscience. A fundamental research question is whether the points within these patterns are independent or clustered. Somewhat surprisingly, there exists no formal statistical test for such a hypothesis. This is largely due to the long recognised fact that the likelihood of the Neyman-Scott process is intractable. Recent developments by Baddeley et al. (2022) have remedied this issue by reparametrising the Neyman-Scott model by cluster strength and cluster scale, where the Poisson process occurs when the cluster strength is zero. Using these developments, we establish a formal test of clustering for the Neyman-Scott process.
Bayesian and deep learning strategies for calibration and denoising in gravitational wave data analysis
Speaker: Ruiting Mao
Affiliation: Department of Statistics, University of Auckland
When: Thursday, 5 September 2024, 10:00 am to 11:00 am
Where: 303-B05
Bayesian statistical methods have played a pivotal role in signal detection and the physical parameter estimation of gravitational waveform models. The future space-based gravitational wave (GW) detector, the Laser Interferometer Space Antenna (LISA), which is sensitive to the millihertz frequency band, makes it possible to detect some promising sources of GWs. However, Bayesian inference for features of interest and noise characterization is often computationally expensive and subject to model misspecification with complex waveforms and nonstationary noise artifacts in the LISA data stream. Through this work, I will present the application of deep learning models to address these challenges inherent in LISA data analysis. Specifically, I will discuss two key issues: 1) Exploring calibration techniques to quantify and correct the approximation errors introduced by using computationally faster but less accurate waveform models in Bayesian parameter estimation, and 2) Investigating deep learning methods to fill in data gaps from the LISA data stream effectively.
(This is a PhD PYR talk.)
Engaging in, and teaching, ethical practice of statistics and data science
Speaker: Rochelle Tractenberg
Affiliation: Georgetown University, Washington DC
When: Tuesday, 23 July 2024, 2:00 pm to 3:00 pm
Where: 303-310
The American Statistical Association's Ethical Guidelines for Statistical Practice define "Statistical Practice" to include designing the collection of, summarizing, processing, analyzing, interpreting, or presenting, data; as well as model or algorithm development and deployment. The Guidelines are intended to support every individual who uses "statistical practice", irrespective of their level, training, degree or job title, to do so in an ethical way. When it comes to encouraging (and teaching) "ethical statistical practice", there are two dimensions that must be recognized:
(i) To practice ethically, i.e., execute each task in accordance with ethical practice standards (like the Guidelines); and
(ii) To identify, and respond to, unethical actions/requests.
In this talk we will explore how a Stakeholder Analysis can be used with the ASA Ethical Guidelines (or any guidance) to practice ethically, and teach ethical statistical practice. We will also consider an Ethical Reasoning paradigm that facilitates identifying and making an informed decision about responding to ethical dilemmas. This paradigm is also useful for both engaging in, and teaching, ethical statistical practice. Both of these tools will be examined in the context of a 7-task "statistics and data science pipeline", which itself can help instructors to reinforce student learning about the scientific method, the Problem, Plan, Data, Analysis, Conclusion cycle, and even the eight-step UN-based Generic Statistical Business Process model, which was developed to support "official statistics", a special case of statistical practice.
Bio: Rochelle Tractenberg is a tenured professor in the Department of Neurology, with appointments in Biostatistics, Bioinformatics & Biomathematics and Rehabilitation Medicine, at Georgetown University in Washington, DC. She is a multi-disciplinary research methodologist and ASA-accredited Professional Statistician (PStat®), as well as a cognitive scientist focused on higher education curriculum design and evaluation. Her clinical and translational work integrates theories and principles of statistics, psychometrics, and domain-specific measurement to problems of assessment and the determination of changes in cognition, brain aging, and other difficult-to-measure constructs, using qualitative and quantitative methods. She is also an internationally recognized expert on ethical statistics and data science practice, having published two books, Ethical Practice of Statistics and Data Science and Ethical Reasoning for a Data-Centered World, in 2022. In addition to ethical statistics and data science practice, she has also contributed to guidelines for ethical mathematical practice (US based) and particularly, on how to integrate ethical content into quantitative courses. She is developing a new edition of Ethical Practice of Statistics and Data Science, specifically for government settings (expected 2025) and is collaborating on a forthcoming UN Handbook on Ethical Practice in Official Statistics. Professor Tractenberg is an elected Fellow of the American Statistical Association, the International Statistical Institute, and the American Association for the Advancement of Science, and was nominated for the 2022 Einstein Foundation Award for Promoting Quality in Research. Each of these nominations highlighted her commitment to, and support for, ethical statistical practice and scientific stewardship.
Designing to Support Doing Data Science and Statistics in Schools
Speaker: Hollylynne Lee
Affiliation: NC State University
When: Tuesday, 16 July 2024, 4:00 pm to 5:00 pm
Where: 303-310
Abstract: The U.S. often looks to New Zealand for resources and research related to teaching and learning statistics. In this talk, Hollylynne will discuss two recent projects situated in the U.S. that are advancing the teaching and learning of statistics and data science for secondary schools. These projects have designed curricula and online professional learning experiences for teachers at all stages of their career, from undergraduate education through life-long learning as a practicing teacher. We collaborate with a team at CODAP to integrate advanced data experiences into classrooms. The presentation will have something for everyone related to research, design of educational materials, and ideas for secondary classrooms.
Bio: Dr. Hollylynne Lee is a Distinguished University Professor of Mathematics and Statistics Education in the STEM Education department at NC State University, Raleigh NC, USA. She is also a Senior Faculty Fellow at the Friday Institute for Educational Innovation where she directs the Hub for Innovation and Research in Statistics and Data Science Education (https://fi.ncsu.edu/teams/hirise/). With experience teaching in elementary, middle, and high school classrooms, she brings a depth of practical perspectives to her research, and ensures her research and designs of educational resources are directly applicable to teachers and students. Her current work includes a focus on teachers’ professional learning for teaching with data using tools like CODAP and transforming undergraduate teacher preparation related to teaching statistics and data science. She loves reading, kayaking, watching volleyball, spending time with family, and her dog and cat. https://ced.ncsu.edu/people/hstohl/
Investigating Statistical Literacy of Health Professionals in Papua New Guinea
Speaker: Deborah Kakis
Affiliation: UoA
When: Tuesday, 11 June 2024, 10:00 am to 11:00 am
Where: 303-310
Abstract:
In today’s healthcare landscape, where evidence-based practice is considered the gold standard, data and statistical literacy are important skills for healthcare professionals. These competencies enable practitioners to collect, store and manage medical data, analyse data, interpret research findings, and make informed decisions. In Papua New Guinea (PNG), a developing nation with unique healthcare challenges, fostering these literacies becomes critical.
Healthcare professionals in PNG deal with data daily, whether patient records, public health data, or administrative information. Ensuring the reliability and utility of this data for evidence-based practice requires strong data literacy skills to guarantee accurate collection and storage, while statistical literacy enables practitioners to extract meaningful insights to inform their practice.
However, many challenges hinder healthcare professionals in PNG from developing strong foundations in these areas, leading to a lack of confidence in their data and statistical literacy skills, which are necessary for evidence-based practice.
To address this gap, this proposed study aims to assess the current data and statistical literacy levels among healthcare professionals in PNG. By evaluating their proficiency, we can identify areas for improvement and tailor targeted training programs accordingly. Enhancing statistical and data literacy equips healthcare professionals to evaluate treatment efficacy confidently, identify emerging trends, and actively contribute to evidence-based care.
This is the PYR seminar
Modern Variable Selection for Vector Generalized Linear Models
Speaker: Wenqi Zhao
Affiliation: UoA
When: Monday, 27 May 2024, 1:00 pm to 2:00 pm
Where: 303-257
Abstract:
The generalized linear model (GLM) is a framework in statistics for modeling the relationship between a response variable and one or more predictor variables; it is typically used to fit a regression model that predicts observations. While GLMs offer relatively straightforward interpretation of coefficients, they may not capture complex interactions or nonlinear relationships in the data. Vector generalized linear models (VGLMs) and vector generalized additive models (VGAMs) can greatly extend GLMs: VGAM currently implements over 150 family functions and offers a large, flexible framework for varying model elements. Variable selection is a crucial step in statistical modeling that identifies the most relevant variables for predicting the response variable. In the VGLM/VGAM framework, this is usually done by minimizing some information criterion (IC), of which the Akaike IC (AIC) and Bayesian IC (BIC) are the most common.
VGAMs can also penalize regression splines using P-spline smoothers, which we term ‘P-spline VGAMs’. However, fitting VGAMs with penalized regression splines can be computationally intensive, particularly when dealing with large datasets or high-dimensional predictor spaces. When the variables are greater than the observations,
In this project, we propose to combine the elastic net and the VGLM/VGAM framework to create a new model selection method. Elastic net regularization techniques can help prevent overfitting and multicollinearity. The elastic net can result in sparser models with fewer predictors. This regularization path helps in identifying and handling multicollinearity by favoring models with fewer predictors in the VGLM/VGAM framework.
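For readers unfamiliar with the quantities named in the abstract above, the following sketch writes out the standard textbook definitions of AIC and BIC together with one common form of the elastic net penalty. The penalty uses the parametrization popularized by the glmnet software, with a mixing weight `alpha` and an overall strength `lam`; the abstract does not state which formulation the project uses, so this form and all function names are assumptions for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty in a glmnet-style form (an assumed choice).

    alpha = 1 gives the lasso (L1) penalty, alpha = 0 gives ridge (L2).
    """
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1 - alpha) * l2)
```

In a penalized fit, a term like this would be added to the negative log-likelihood being minimized; the L1 part is what can shrink some coefficients exactly to zero and so produce sparser models.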
This is the PYR seminar.
Childhood Risk and Resilience Factors for Pasifika Youth Respiratory Health: Accounting for Attrition and Missingness
Speaker: Dawson Zhai
Affiliation: UoA
When: Friday, 24 May 2024, 1:00 pm to 2:00 pm
Where: 303-310
Abstract:
In New Zealand, 7% of deaths are related to respiratory diseases, with Pacific people at higher risk. Based on knowledge of lung development, lung function can be damaged in two ways: 1) Lung function reduction: early insults may lower the maximum lung function and/or accelerate its decline after the peak; 2) Predisposition to later respiratory disease: early disease raises the risk of later disease occurring. Conversely, some resilience factors can create beneficial effects on respiratory function and/or provide protection to stop subsequent respiratory diseases; among these factors are childhood levels of physical activity, smoke exposure, immunisation, housing conditions, and breastfeeding.
Using Pacific Island Family Study (PIFS) cohort data, this work will investigate the causal effects of identified early-life factors on early-adulthood lung function, quality of life and comorbidities. The PIFS cohort is a longitudinal cohort, the participants of which were enrolled at birth in Middlemore Hospital (n=1398) between March and December 2000. A respiratory assessment (n=466) was conducted within the cohort when participants were 18 years old. In this PIFS birth cohort respiratory study, the primary respiratory outcome was the z-score of the Forced Expiratory Volume in 1 second (FEV1). Secondary outcomes consisted of FEV1 adjusted for height and sex; the healthy lung function (HLF) indicator, defined as the z-score exceeding -1.64; health-related and respiratory-health-related quality of life scores; and respiratory condition indicators. The attrition and missingness present in the group undergoing respiratory assessment will inform much of the analysis plan, as will the longitudinal character of the risk and protective factors and their confounders.
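The HLF indicator described above can be sketched in a few lines. The threshold of -1.64 comes from the abstract; the function name `healthy_lung_function` is hypothetical, and the reading of the threshold as roughly the 5th percentile assumes a standard normal reference distribution for the z-scores, which the abstract does not state.

```python
from statistics import NormalDist

# Threshold taken from the abstract: HLF means the FEV1 z-score exceeds -1.64.
HLF_THRESHOLD = -1.64


def healthy_lung_function(z_score):
    """Return True when the z-score strictly exceeds the HLF threshold."""
    return z_score > HLF_THRESHOLD


# Under an assumed standard normal reference, the share of z-scores at or
# below the threshold is close to 5%.
share_below = NormalDist().cdf(HLF_THRESHOLD)
```

Under that assumption, about one in twenty people in the reference population would fall at or below the threshold.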
This is the PYR seminar.