Department of Statistics


Seminars

The health data paradox - simultaneously simple and complex

Speaker: Pernille Christensen

Affiliation: Noted

When: Wednesday, 26 March 2025, 11:00 am to 12:00 pm

Where: 303-310

Ask any one person involved with health data what they need it for, and they’ll likely be able to give you a straightforward, simple answer. It could be to know the proportion of the population struggling with mental health, to see if an improvement project had the intended effect, to meet funding requirements, to know the caseload and allocate resources, to communicate with a team about an individual patient, and so on.

Individually simple and straightforward. Each of these individual purposes, however, is linked together in an intricate, interconnected, and highly complex web. They are linked together by issues such as limitations in time and resources for data collection, differing priorities, siloed, rigid data systems, questions on ownership, privacy, and ethics rules. The complexity is further compounded by deeper questions such as: is the data truthful? Does it tell the whole story? Is it durable when the political scene changes or new discoveries are made?

In this session, we will explore this complex web from different points of view of the types of people involved with health data and question whether the complexities naturally occur or are introduced.

Biography:

As a data architect at Noted, I design and build data warehousing and business insight tools for Noted’s customers - giving them valuable insights into their business, supporting them in achieving the best health outcomes for their clients.

I have worked in the health sector for more than two decades, in both Denmark and New Zealand, and I hold a medical degree and a PhD in Health and Medical Sciences from the University of Copenhagen, Denmark, and a Master's degree in Applied Statistics from Pennsylvania State University, USA. My work includes clinical work as a medical doctor and researcher, with international research collaborations and scientific task force positions, and non-clinical work related to biostatistics, health intelligence and machine learning, working with diverse data sets from national health surveys, complex research data, health systems data, and data related to wellbeing in general.

Practice to Research to Practice: My journey as a statistics teacher and statistics teacher educator

Speaker: Stephanie Casey

Affiliation: Eastern Michigan University

When: Wednesday, 2 April 2025, 11:00 am to 12:00 pm

Where: 303-310

ABSTRACT: In this talk, I will be sharing my journey from a high school teacher to a statistics teacher educator at the university level. My focus will be on how I’ve turned my teaching experiences as a practicing teacher into research efforts, and then the research results into products used by teacher educators to improve the preparation of teachers to teach statistics.

BIO: Dr. Stephanie Casey is a Professor of Mathematics Education at Eastern Michigan University, USA. She is a 2025 Fulbright Scholar, researching students' interpretations of modern, big data visualizations in collaboration with the University of Canberra's STEM Education Research Centre (SERC). Her research focuses on the teaching and learning of data science and statistics, motivated by her experience teaching secondary mathematics for fourteen years. She has co-authored two sets of statistics teacher education curriculum materials that are widely used with preservice secondary STEM teachers throughout the United States.

Link: https://sites.google.com/site/stephaniecaseymath/

Modeling Population-Scale Commuting Patterns in New Zealand

Speaker: Michael J. Kane

Affiliation: MD Anderson Cancer Center, The University of Texas

When: Wednesday, 5 March 2025, 11:00 am to 12:00 pm

Where: 303-310

Abstract: Human mobility patterns reveal how individuals and populations navigate spatial environments, offering critical insights for urban planning and transportation policy, among other applications. In this talk, we explore two modeling approaches applied to New Zealand’s 2018 census data to capture the dynamics of commuting behavior across Statistical Area 2 (SA2) regions. The first model identifies “loci” within the commuting network: locations that exhibit disproportionately high rates of both destination and transit movement. By analyzing the interconnectivity between SA2 areas, this approach reveals the hubs and corridors that shape everyday commuting patterns. The second model leverages an attention-based architecture, inspired by techniques used in large language models, to encode individual commuting trajectories. This model not only assesses the likelihood of a given sequence of locations but also enables the synthesis of plausible new trajectories. By capturing the dependencies in movement sequences, the attention model provides a powerful tool for predicting and simulating commuting behaviors.

Mathematical Reasoning over Multimodal Large Language Models

Speaker: Shuangyan Deng

Affiliation: University of Auckland

When: Wednesday, 19 February 2025, 2:00 pm to 3:00 pm

Where: 303-310

Abstract: Mathematical reasoning is a fundamental challenge in artificial intelligence, particularly for Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), which must integrate textual and visual data to solve complex problems. Despite recent advancements, MLLMs still face significant challenges, including the lack of comprehensive benchmarks, the inability to learn effectively from reasoning errors, and difficulties in leveraging external tools for mathematical problem-solving. This research addresses three key problems: (1) evaluating MLLMs' mathematical reasoning capabilities through FinMR, a novel multimodal benchmark designed for financial reasoning tasks; (2) improving reasoning accuracy using in-context learning (ICL) with AI-generated error feedback, enabling models to refine their problem-solving strategies based on prior mistakes; and (3) developing a tool-integrated planner that allows MLLMs to dynamically utilize external computational tools for solving complex mathematical problems. By introducing these innovations, this study enhances the reasoning ability of MLLMs and paves the way for more reliable and interpretable AI-driven mathematical reasoning. Future work will focus on expanding dataset diversity, refining multimodal learning strategies, and improving the alignment between textual and visual modalities for greater accuracy and generalization.

This is a PYR seminar.

Image-Derived Phenotypes in Whole-Body MRI

Speaker: Brandon Whitcher

Affiliation: University of Westminster, London

When: Wednesday, 19 February 2025, 11:00 am to 12:00 pm

Where: 303-310

Population imaging studies, like the UK Biobank, provide us with an unprecedented amount of medical imaging data. At the Research Centre for Optimal Health we have focused on the abdominal protocol. Multiple data analysis pipelines have been developed to extract a wide variety of quantitative features from these data, which we call image-derived phenotypes (IDPs). The IDPs are then used to describe the UK adult population, investigate diseases, and provide input into genome-wide association studies with our collaborators.

Bio

Brandon Whitcher has spent the last 20+ years focused on quantitative imaging biomarkers for clinical, pharmaceutical and wellness applications. He has been employed in the pharmaceutical industry and a variety of startup companies. For the last seven years Dr Whitcher has been a member of the Research Centre for Optimal Health at The University of Westminster, London, UK. Since 2023 he has divided his time between the university and the Radiology Department at the Royal Marsden Hospital, South London, where he is an NHS employee.

Exploring Deep Learning Techniques for Subtype Classification Modeling in Alzheimer's Disease

Speaker: Xiaoyan Sun

Affiliation: Department of Statistics, University of California Irvine

When: Wednesday, 22 January 2025, 10:00 am to 11:00 am

Where: 303-310

The growing prevalence of Alzheimer's disease poses a significant healthcare challenge, particularly with the ageing global population. Understanding disease subtypes through multi-omics data integration is pivotal for advancing personalised medicine and improving patient outcomes. However, existing methods often struggle with high-dimensional data, heterogeneous information integration, and effective disease subtype classification. In this study, we propose a novel deep learning framework designed to address three core challenges in multi-omics analysis: dimensionality reduction, data integration, and subtype classification. The framework employs a multi-head attention mechanism for feature selection, capturing complex relationships in high-dimensional data. It integrates Graph Convolutional Networks (GCN) for robust data fusion, while leveraging contrastive learning to enhance subtype classification accuracy. The framework effectively handles complex, high-dimensional data, addressing challenges of missing data and heterogeneity while capturing both global and local patterns in multi-omics data, all while maintaining interpretability.

(This is a talk for Confirmation Review.)

Teaching and Learning Bayesian Statistics with {bayesrules}

Speaker: Mine Dogucu

Affiliation: Department of Statistics, University of California Irvine

When: Wednesday, 11 December 2024, 4:00 pm to 5:00 pm

Where: 303-310

Abstract: Bayesian statistics is becoming more popular in data science. Data scientists are often not trained in Bayesian statistics and, if they are, it is usually as part of their graduate training. During this talk, we will present an introductory course in Bayesian statistics for learners at the undergraduate level and comparably trained practitioners. We will share tools for teaching (and learning) the first course in Bayesian statistics, specifically the {bayesrules} package that accompanies the open-access book Bayes Rules! An Introduction to Applied Bayesian Modeling. We will provide an outline of the curriculum and examples for novice learners and their instructors.

Bio: Mine Dogucu is Associate Professor of Teaching and Vice Chair of Undergraduate Studies in the Department of Statistics at University of California Irvine. Her goal is to create educational resources for statistics and data science that are accessible physically and cognitively. Her work focuses on modern pedagogical approaches in the statistics curriculum, making data science education accessible, and undergraduate Bayesian education. She is the co-author of the book Bayes Rules! An Introduction to Applied Bayesian Modeling. She works on a few projects funded by the United States National Science Foundation and the National Institutes of Health. She writes blog posts about data, pedagogy, and data pedagogy at DataPedagogy.com.

Nonparametric Density Estimation for Compositional Data

Speaker: Jiajin (George) Xie

Affiliation: Department of Statistics, University of Auckland

When: Thursday, 28 November 2024, 12:00 pm to 1:00 pm

Where: 303-310

This study addresses the challenges of density estimation for compositional data, a type of data constrained to reflect relative proportions within a whole. Such data are prevalent across diverse fields, including microbiome analysis, geology, and machine learning. The research develops and evaluates nonparametric methods for high-dimensional compositional data density estimation, focusing on the mixture-based density estimation (MDE) approach. Two types of mixture components are explored: Gaussian distributions applied to log-ratio-transformed compositional data, which offer excellent flexibility, and Dirichlet distributions applied directly to compositions, effectively handling cases with zero values. The performance of these methods is assessed through simulation studies and compared with finite mixture and kernel density estimation techniques. Results demonstrate the superior accuracy and adaptability of the proposed methods in capturing intricate data structures across various scenarios.

(This is a PYR talk.)

Visualization and Analysis of Suicide Methods in Tokyo Using Interactive Graphs

Speaker: Takafumi KUBOTA

Affiliation: Tama University, Japan

When: Wednesday, 30 October 2024, 11:00 am to 12:00 pm

Where: 303-310

This study aims to visualize the trends in suicide methods in Tokyo, using Japan's regional suicide statistics to provide insights that can inform effective prevention strategies. Suicide is a significant social issue, and analyzing regional data can offer valuable perspectives for targeted interventions. The research focuses on visualizing the trends for different suicide methods by creating bar graphs, line charts, and choropleth maps. These visualizations are generated after data cleaning to clearly depict the occurrence and trends associated with each method.

The application is developed using the R packages shiny and plotly, enabling users to interactively explore the data. With shiny, users can select the items of interest, such as region, time period, or suicide method, from a menu, while plotly allows for the implementation of interactive graphs that dynamically update based on the selected parameters. This approach facilitates the identification of specific regional trends, such as railway suicides or jumps from high-rise buildings, which are more prevalent in Tokyo.

Through the development and analysis of this application, the study aims to enhance the understanding of regional and method-specific suicide trends, providing recommendations for suicide prevention measures. The visualized data is expected to serve as a valuable tool for policymakers and researchers, contributing to the strengthening of suicide prevention efforts.

Test of clustering for Neyman-Scott processes

Speaker: Bethany Macdonald

Affiliation: Otago University

When: Wednesday, 23 October 2024, 11:00 am to 12:00 pm

Where: 303-310

Spatial point patterns can arise from a vast array of application areas including epidemiology, ecology and geoscience. A fundamental research question is whether the points within these patterns are independent or clustered. Somewhat surprisingly, there exists no formal statistical test for such a hypothesis. This is largely due to the long-recognised fact that the likelihood of the Neyman-Scott process is intractable. Recent developments by Baddeley et al. (2022) have remedied this issue by reparametrising the Neyman-Scott model in terms of cluster strength and cluster scale, where the Poisson process occurs when the cluster strength is zero. Using these developments, we establish a formal test of clustering for the Neyman-Scott process.

Bayesian and deep learning strategies for calibration and denoising in gravitational wave data analysis

Speaker: Ruiting Mao

Affiliation: Department of Statistics, University of Auckland

When: Thursday, 5 September 2024, 10:00 am to 11:00 am

Where: 303-B05

Bayesian statistical methods have played a pivotal role in signal detection and the physical parameter estimation of gravitational waveform models. The future space-based gravitational wave (GW) detector, the Laser Interferometer Space Antenna (LISA), which is sensitive to the millihertz frequency band, makes it possible to detect some promising sources of GWs. However, Bayesian inference for features of interest and noise characterization is often computationally expensive and subject to model misspecification with complex waveforms and nonstationary noise artifacts in the LISA data stream. Through this work, I will present the application of deep learning models to address these challenges inherent in LISA data analysis. Specifically, I will discuss two key issues: 1) Exploring calibration techniques to quantify and correct the approximation errors introduced by using computationally faster but less accurate waveform models in Bayesian parameter estimation, and 2) Investigating deep learning methods to fill in data gaps from the LISA data stream effectively.

(This is a PhD PYR talk.)

Engaging in, and teaching, ethical practice of statistics and data science

Speaker: Rochelle Tractenberg

Affiliation: Georgetown University, Washington DC

When: Tuesday, 23 July 2024, 2:00 pm to 3:00 pm

Where: 303-310

The American Statistical Association's Ethical Guidelines for Statistical Practice define "Statistical Practice" to include designing the collection of, summarizing, processing, analyzing, interpreting, or presenting, data; as well as model or algorithm development and deployment. The Guidelines are intended to support every individual who uses "statistical practice", irrespective of their level, training, degree or job title, to do so in an ethical way. When it comes to encouraging (and teaching) "ethical statistical practice", there are two dimensions that must be recognized:

(i) To practice ethically, i.e., execute each task in accordance with ethical practice standards (like the Guidelines); and

(ii) To identify, and respond to, unethical actions/requests.

In this talk we will explore how a Stakeholder Analysis can be used with the ASA Ethical Guidelines (or any guidance) to practice ethically, and teach ethical statistical practice. We will also consider an Ethical Reasoning paradigm that facilitates identifying and making an informed decision about responding to ethical dilemmas. This paradigm is also useful for both engaging in, and teaching, ethical statistical practice. Both of these tools will be examined in the context of a 7-task "statistics and data science pipeline", which itself can help instructors to reinforce student learning about the scientific method, the Problem, Plan, Data, Analysis, Conclusion cycle, and even the eight-step UN-based Generic Statistical Business Process Model, which was developed to support "official statistics", a special case of statistical practice.

Bio: Rochelle Tractenberg is a tenured professor in the Department of Neurology, with appointments in Biostatistics, Bioinformatics & Biomathematics and Rehabilitation Medicine, at Georgetown University in Washington, DC. She is a multi-disciplinary research methodologist and ASA-accredited Professional Statistician (PStat®), as well as a cognitive scientist focused on higher education curriculum design and evaluation. Her clinical and translational work integrates theories and principles of statistics, psychometrics, and domain-specific measurement to problems of assessment and the determination of changes in cognition, brain aging, and other difficult-to-measure constructs, using qualitative and quantitative methods. She is also an internationally recognized expert on ethical statistics and data science practice, having published two books, Ethical Practice of Statistics and Data Science and Ethical Reasoning for a Data-Centered World, in 2022. In addition to ethical statistics and data science practice, she has also contributed to guidelines for ethical mathematical practice (US based) and, in particular, to guidance on how to integrate ethical content into quantitative courses. She is developing a new edition of Ethical Practice of Statistics and Data Science, specifically for government settings (expected 2025), and is collaborating on a forthcoming UN Handbook on Ethical Practice in Official Statistics. Professor Tractenberg is an elected Fellow of the American Statistical Association, the International Statistical Institute, and the American Association for the Advancement of Science, and was nominated for the 2022 Einstein Foundation Award for Promoting Quality in Research. Each of these nominations highlighted her commitment to, and support for, ethical statistical practice and scientific stewardship.

