Department of Statistics
2021 Seminars
Speaker: Eileen Li
Affiliation: The University of Auckland
When: Tuesday, 14 December 2021, 2:00 pm to 3:00 pm
Where: https://auckland.zoom.us/j/94353887562?pwd=YkxoN2JMdVFDY0RxVUtlNUR0VGtmZz09
Linked administrative data can provide rich information on a wide range of outcomes, and its use is on the rise both in New Zealand and internationally. The Integrated Data Infrastructure (IDI) is a database maintained by Statistics New Zealand (Stats NZ) that contains linked administrative data at the individual level. In the absence of a unique personal identifier, probabilistic record linkage is performed, which unavoidably introduces linkage errors. However, the majority of IDI analyses are completed without understanding, measuring or correcting for potential linkage bias. We aim to quantify linkage errors in the IDI and provide feasible approaches to adjust for linkage biases in IDI analyses. In this talk, I will briefly explain how linkage errors (false links and missed links) may occur in the IDI, followed by approaches to identifying false links and missed links. Some key limitations will also be addressed.
Estimating the approximation error for the saddlepoint maximum likelihood estimate
Speaker: Godrick Maradona Oketch
Affiliation: The University of Auckland
When: Wednesday, 8 December 2021, 2:00 pm to 3:00 pm
Where: https://auckland.zoom.us/j/99638945519?pwd=eUx2RHBzVjhWY0JCTnc2T0JsMk1CUT09
Saddlepoint approximation to a density function is increasingly being used, primarily because of its remarkable accuracy. A common application of this approximation is to interpret it as a likelihood function, especially when the true likelihood function does not exist or is intractable, with the aim of obtaining parameter estimates from the saddlepoint-based likelihood. This study examines the likelihood function based on first- and second-order saddlepoint approximations in order to estimate the difference between the true but unknown maximum likelihood estimates (MLEs) and the saddlepoint-based MLEs. We propose an expression that estimates this difference (error) by computing the gradient of the term neglected in the first-order saddlepoint approximation. Using common distributions whose true likelihood functions are known, we then perform confirmatory tests of the proposed error expression and show that the results are consistent with the difference between the true MLEs and the saddlepoint MLEs. These tests indicate that the proposed formula could complement the simulation studies that have been widely used to justify the accuracy of saddlepoint MLEs.
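As a rough, self-contained illustration of the kind of confirmatory test described above (a sketch only, not the speaker's method), the R snippet below compares the true MLE of a Gamma shape parameter with the MLE from the first-order saddlepoint likelihood, which for the Gamma amounts to replacing lgamma() with its Stirling approximation; the sample size and parameter values are arbitrary.

## Minimal sketch: true MLE vs saddlepoint MLE for a Gamma(shape, rate = 1).
## The first-order saddlepoint density equals the true density with
## lgamma(alpha) replaced by its Stirling approximation.
set.seed(1)
x <- rgamma(200, shape = 3, rate = 1)

loglik_true <- function(alpha)
  sum((alpha - 1) * log(x) - x - lgamma(alpha))

## Stirling: lgamma(alpha) ~ 0.5*log(2*pi) + (alpha - 0.5)*log(alpha) - alpha
loglik_spa <- function(alpha)
  sum((alpha - 1) * log(x) - x -
        (0.5 * log(2 * pi) + (alpha - 0.5) * log(alpha) - alpha))

mle_true <- optimize(loglik_true, c(0.1, 20), maximum = TRUE)$maximum
mle_spa  <- optimize(loglik_spa,  c(0.1, 20), maximum = TRUE)$maximum
c(true = mle_true, saddlepoint = mle_spa, error = mle_spa - mle_true)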
Disease risk prediction using deep neural networks
Speaker: Xiaowen Li
Affiliation: The University of Auckland
When: Thursday, 2 December 2021, 2:00 pm to 3:00 pm
Where: https://auckland.zoom.us/j/94436685664?pwd=TWJMN2ZaQUJ2MFJ4NEwwQ2FkYndKZz09
Accurate disease risk prediction is an essential step towards precision medicine, an emerging model of health care that tailors treatment strategies to individuals' profiles. The recent abundance of genome-wide data provides unprecedented opportunities to systematically investigate complex human diseases. However, the ultra-high dimensionality and the complex relationships between biomarkers and outcomes bring tremendous analytical challenges, so dimension reduction is crucial for analysing high-dimensional genomic data. Deep learning models are promising approaches for modelling features of high complexity, and thus have the potential to offer a unified approach to efficiently modelling diseases with different underlying genetic architectures. The overall objective of this project is to develop a hybrid deep neural network incorporating the multi-kernel Hilbert-Schmidt Independence Criterion Lasso (MK-HSIC-lasso) to efficiently select important predictors from ultra-high-dimensional genomic data and model their complex relationships, for risk prediction analysis of high-dimensional genomic data.
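As context for the HSIC-based screening mentioned above, the following R sketch computes a biased empirical HSIC statistic with Gaussian kernels for a single feature; it is an illustration of the dependence measure only, not the project's MK-HSIC-lasso implementation, and the bandwidth choice is an arbitrary assumption.

## Biased empirical HSIC with Gaussian kernels: the dependence measure
## underlying HSIC-lasso-style feature screening (illustration only).
hsic <- function(x, y, sigma_x = sd(x), sigma_y = sd(y)) {
  n <- length(x)
  K <- exp(-outer(x, x, "-")^2 / (2 * sigma_x^2))   # kernel on the feature
  L <- exp(-outer(y, y, "-")^2 / (2 * sigma_y^2))   # kernel on the outcome
  H <- diag(n) - matrix(1 / n, n, n)                # centring matrix
  sum(diag(K %*% H %*% L %*% H)) / n^2
}

set.seed(1)
x1 <- rnorm(100); y <- sin(2 * x1) + rnorm(100, sd = 0.2)  # nonlinear signal
x2 <- rnorm(100)                                           # irrelevant noise
c(signal = hsic(x1, y), noise = hsic(x2, y))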
Accessing 'grid' from 'ggplot2'
Speaker: Paul Murrell
Affiliation: The University of Auckland
When: Thursday, 18 November 2021, 3:00 pm to 4:00 pm
Where: https://auckland.zoom.us/j/97611444131?pwd=T3kvdnhvMWRlNjhyZkErTjlmZGIxdz09
The 'ggplot2' package for R is a very popular package for producing statistical plots (in R). 'ggplot2' provides a high-level interface that makes it easy to produce complex images from small amounts of R code. The 'grid' package for R is an unpopular package for producing arbitrary images (in R). 'grid' provides a low-level interface that requires a lot of work to produce complex images. However, 'grid' provides complete control over the fine details of an image. 'ggplot2' uses the low-level package 'grid' to do its drawing so, in theory, users should be able to get the best of both worlds. This talk will discuss the surprising fact that 'ggplot2' users cannot easily get the best of both worlds and it will introduce the 'gggrid' package, which is here to save the day (and both worlds).
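For readers unfamiliar with the two worlds, the short R sketch below shows the traditional route from 'ggplot2' down to 'grid' that the talk contrasts with 'gggrid': the plot is converted to a grob tree and the low-level grid objects are then listed for inspection. (This is generic 'ggplot2'/'grid' usage, not the 'gggrid' interface itself.)

## The hard way down: from a ggplot to the grid objects underneath it.
library(ggplot2)
library(grid)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

g <- ggplotGrob(p)     # the plot as a 'grid' grob tree
grid.newpage()
grid.draw(g)
grid.force()           # make the gTree's children available by name
grid.ls()              # inspect the low-level grid objects ggplot2 drew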
Applications of scoring rules
Speaker: Matthew Parry
Affiliation: University of Otago
When: Thursday, 21 October 2021, 3:00 pm to 4:00 pm
Where: https://auckland.zoom.us/j/91629786490?pwd=WkpLZGxFTDZQR1JNKzRUNHhzamN6UT09
Suppose you publicly express your uncertainty about an unobserved quantity by quoting a distribution for it. A scoring rule is a special kind of loss function intended to measure the quality of your quoted distribution when an outcome is actually observed. In statistical decision theory, you seek to minimise your expected loss. A scoring rule is said to be proper if the expected loss under your quoted distribution is minimised by quoting that distribution. In other words, you cannot game the system!
In addition to having a rich theoretical structure – for example, associated with every scoring rule is an entropy and a divergence function – scoring rules can be tailored to the problem at hand and consequently have a wide range of application. They are used in statistical inference, for evaluating and ranking forecasters, for assessing the quality of predictive distributions, and in exams.
I will talk about a range of scoring rules and discuss their application in areas such as classification and time series. In addition to so-called local scoring rules that do not depend on the normalisation of the quoted distribution, I will also discuss recently discovered connections between scoring rules and the Whittle likelihood.
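As a small illustration of propriety (not taken from the talk), the R snippet below plots the expected Brier score for a binary outcome as a function of the quoted probability and shows that it is minimised at the honest quote.

## The Brier score S(q, y) = (y - q)^2 is proper: if your belief is
## P(Y = 1) = p, the expected score p*(1-q)^2 + (1-p)*q^2 is minimised
## at q = p, so you cannot gain by mis-reporting.
expected_brier <- function(q, p) p * (1 - q)^2 + (1 - p) * q^2

p <- 0.3                               # your actual belief
q <- seq(0, 1, by = 0.01)              # candidate quotes
plot(q, expected_brier(q, p), type = "l",
     xlab = "quoted probability q", ylab = "expected Brier score")
abline(v = p, lty = 2)                 # minimum sits at the honest quote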
Current state and prospects of R-package for the design of experiments
Speaker: Emi Tanaka
Affiliation: Monash University
When: Thursday, 14 October 2021, 3:00 pm to 4:00 pm
Where: https://auckland.zoom.us/j/98830888312?pwd=QzR2aUUra3dMTnpWWHdBamhVQ0YvZz09
The critical role of data collection is well captured in the expression "garbage in, garbage out" -- in other words, if the collected data are rubbish then no analysis, however complex, can make something out of them. The gold standard for data collection is a well-designed experiment. Re-running an experiment is generally expensive, unlike statistical analysis, where re-doing it is generally low-cost, so the stakes of getting an experimental design wrong are higher. But how do we design experiments in R? In this talk, I will review the current state of R packages for the design of experiments and present my prototype R package {edibble}, which implements a framework that I call the "grammar of experimental design".
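As a reference point for "how do we design experiments in R", here is a minimal base-R sketch of randomising a randomised complete block design; it deliberately avoids any package interface (including {edibble}), and the treatment and block labels are illustrative.

## Randomised complete block design: 4 treatments, each appearing once per
## block, with the order randomised independently within each block.
set.seed(2021)
treatments <- c("A", "B", "C", "D")
blocks <- paste0("block", 1:5)

design <- do.call(rbind, lapply(blocks, function(b)
  data.frame(block = b,
             plot  = seq_along(treatments),
             trt   = sample(treatments))))   # fresh randomisation per block
head(design, 8)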
--
Dr. Emi Tanaka is a lecturer in statistics at Monash University whose primary interest is to develop impactful statistical methods and tools that can readily be used by practitioners. Her research areas include data visualisation, mixed models and experimental design, motivated primarily by problems in bioinformatics and the agricultural sciences. She is currently the President of the Statistical Society of Australia Victorian Branch and the recipient of the Distinguished Presenter's Award from the Statistical Society of Australia for her delivery of a wide range of R workshops.
Highly comparative time-series analysis
Speaker: Ben Fulcher
Affiliation: School of Physics, The University of Sydney
When: Thursday, 7 October 2021, 3:00 pm to 4:00 pm
Where: https://auckland.zoom.us/j/97319182132?pwd=b05Ld1U5bThGQVFiaEIrV25sZ01NQT09
Over decades, an interdisciplinary scientific literature has contributed myriad methods for quantifying patterns in time series. These methods can be encoded as features that summarize different types of time-series structure as interpretable real numbers (e.g., the shape of peaks in the Fourier power spectrum, or the estimated dimension of a time-delay reconstructed attractor). In this talk, I will show how large libraries of time-series features (>7k, implemented in the hctsa package) and time series (>30k, in the CompEngine database) enable new ways of analyzing time-series datasets, and of assessing the novelty and usefulness of time-series analysis methods. I will highlight new open tools that we’ve developed to enable these analyses, and discuss specific applications to neural time series.
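hctsa itself is a MATLAB toolbox, but the underlying idea is easy to sketch in a few lines of R (an illustration only, with a handful of hand-picked features rather than the >7k in hctsa): summarise each series as a small vector of interpretable feature values.

## Encode a time series as a few interpretable features.
ts_features <- function(x) {
  spec <- spec.pgram(x, plot = FALSE)             # raw periodogram
  c(lag1_acf   = acf(x, plot = FALSE)$acf[2],     # linear memory at lag 1
    sd         = sd(x),                           # overall variability
    peak_freq  = spec$freq[which.max(spec$spec)], # dominant frequency
    prop_above = mean(x > mean(x)))               # crude distributional shape
}

set.seed(1)
noise <- rnorm(500)
cycle <- sin(2 * pi * (1:500) / 25) + rnorm(500, sd = 0.3)
rbind(noise = ts_features(noise), cycle = ts_features(cycle))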
Merging Modal Clusters via Significance Assessment
Speaker: Yong Wang
Affiliation: The University of Auckland
When: Thursday, 19 August 2021, 3:00 pm to 4:00 pm
Where: https://auckland.zoom.us/j/91913917140?pwd=T1d2Q1g5Nm9scVlWaUE5UDdYeDdIZz09
In this talk, I will describe a new procedure that merges modal clusters step by step and produces a hierarchical clustering tree. This is useful for dealing with superfluous clusters and for reducing the number of clusters, as is often desired in practice. Based on some new properties we establish for Morse functions, the procedure merges clusters in a sequential manner without causing unnecessary density distortion. Each cluster is evaluated for its significance relative to the other clusters, using the Kullback-Leibler divergence or its log-likelihood approximation, by truncating the density for the cluster at an appropriate level. The least significant cluster is then merged into one of its adjacent clusters, using the novel concept of cluster adjacency that we define. The resulting hierarchical clustering tree is useful for determining the number of clusters, either as preferred by a specific user or in a general, meaningful manner. Numerical studies show that the new procedure handles difficult clustering problems well and often produces intuitively appealing and numerically more accurate clustering results compared with several other popular clustering methods in the literature.
Generally-altered, -inflated and -truncated regression, with application to heaped and seeped counts
Speaker: Thomas Yee
Affiliation: University of Auckland
When: Thursday, 22 July 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
A very common aberration in retrospective self-reported survey data is digit preference (heaping), whereby multiples of 10 or 5 are recorded in excess upon rounding, creating spikes in spikeplots. Handling this problem requires great flexibility. To this end, and for seeped data also, we propose GAIT regression to unify truncation, alteration and inflation simultaneously, e.g., over general sets rather than {0}. Models such as the zero-inflated and zero-altered Poisson are special cases. Parametric and nonparametric alteration and inflation mean our combo model has five types of 'special' values. Consequently it spawns a novel method for overcoming underdispersion through general truncation by expanding the support. Full estimation details involving Fisher scoring/iteratively reweighted least squares are presented, as well as working implementations for three 1-parameter distributions: Poisson, logarithmic and zeta. Previous methods for heaped data have been found wanting; GAIT regression, however, holds great promise by allowing the joint flexible modelling of counts having absences, deficiencies and excesses at arbitrary multiple special values. It is now possible to analyze the joint effects of alteration, inflation and truncation on under- and over-dispersion. The methodology is implemented in the VGAM R package available on CRAN.
Does It Add Up? Hierarchical Bayesian Analysis of Compositional Data
Speaker: Em Rushworth
Affiliation: University of Auckland
When: Thursday, 22 July 2021, 2:00 pm to 3:00 pm
Where: 303-B05
Compositional data are everywhere: in mineral analysis, demographics, and species abundance, for example. However, despite the well-known difficulties of analysing such data, research into expanding the existing methods or exploring alternative approaches has been limited. The past five years have seen a resurgence of interest in the field popularised by Aitchison (1982), which uses a family of log-ratio transformations with traditional statistical methodology to analyse compositional data. Most recent publications using this methodology focus solely on application, despite the innate limitations of log-ratios preventing wider adoption. This research aims to fill in many of the blanks, including studying approaches outside the log-ratio transformation family, by proposing a consistent definition of compositional data regardless of approach, methodological developments in the treatment of zeroes and sparse data, and demonstrations of applications across multiple domains.
Bayesian hierarchical models are prominent in many of the domains considered in this research, such as ecology and movement studies, and provide a useful framework for considering compositional data. Despite consistent mentions, in both Bayesian and compositional data modelling papers, of the crossover between these two fields, there is very little literature and it remains largely unexplored. Leininger et al. (2013) successfully used a Bayesian hierarchical framework to model the presence of zeroes as a separate hierarchical level, but there has not been any research outside of the log-ratio transformation. This research will seek to present a Bayesian approach to compositional data analysis using hierarchical models and, hopefully, help make the field more accessible for future researchers.
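For readers new to the Aitchison approach mentioned above, here is a minimal R sketch of the centred log-ratio (clr) transform and of the zero problem that motivates much of this research; the example composition is invented.

## Centred log-ratio: divide each component by the geometric mean, then log.
clr <- function(x) log(x / exp(mean(log(x))))

comp <- c(sand = 0.60, silt = 0.25, clay = 0.15)   # a composition summing to 1
clr(comp)                 # clr coordinates sum to zero
sum(clr(comp))

clr(c(0.7, 0.3, 0))       # breaks down: zeros need special handling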
Stationary distribution approximations for two-island and seed bank models
Speaker: Han Liang Gan
Affiliation: The University of Waikato
When: Tuesday, 29 June 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
In this talk we will discuss two-island Wright-Fisher models which are used to model genetic frequencies and variability for subdivided populations. One of the key components of the model is the level of migration between the two islands. We show that as the population size increases, the appropriate approximation and limit for the stationary distribution of a two-island Wright-Fisher Markov chain depends on the level of migration. In a related seed bank model, individuals in one of the islands stay dormant rather than reproduce. We give analogous results for the seed bank model, compare and contrast the differences and examine the effect the seed bank has on genetic variability. Our results are derived from a new development of Stein's method for the two-island diffusion model and existing results for Stein's method for the Dirichlet distribution.
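A minimal simulation sketch of the kind of chain being approximated (not the Stein's method analysis itself) is given below: a two-island Wright-Fisher model with symmetric migration and, as an added assumption to keep the chain from being absorbed, symmetric two-way mutation.

## Two-island Wright-Fisher chain: migration m, mutation u, drift via
## binomial resampling of size N on each island.
simulate_two_island <- function(N = 100, m = 0.01, u = 0.005,
                                gens = 2e4, p = c(0.5, 0.5)) {
  freqs <- matrix(NA_real_, gens, 2)
  for (g in seq_len(gens)) {
    post_mig <- c((1 - m) * p[1] + m * p[2],     # migrants mix the islands
                  (1 - m) * p[2] + m * p[1])
    post_mut <- post_mig * (1 - u) + (1 - post_mig) * u  # two-way mutation
    p <- rbinom(2, size = N, prob = post_mut) / N        # resampling drift
    freqs[g, ] <- p
  }
  freqs
}

set.seed(1)
out <- simulate_two_island()
hist(out[-(1:1000), 1], breaks = 30, main = "Island 1 allele frequency")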
iNZight, Surveys, and the IDI
Speaker: Tom Elliott
Affiliation: Victoria University of Wellington
When: Tuesday, 1 June 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
iNZight was originally designed to teach students core data analysis skills without the need for coding. However, it is also a powerful research development tool, allowing researchers low on time, money, or both to quickly obtain simple (or advanced) statistics without having to learn to code or pay an expensive programmer/statistician to do the work for them. iNZight now handles survey designs natively (without even needing to specify the design!?), incorporating them into all graphs, summaries, data wrangling, and modelling. iNZight also now features an add-on system, providing a simple way of extending the existing UI to unique problems, for example Bayesian small area demography. In this talk, I'll be discussing recent modifications and additions to iNZight, plus some other work I've been doing as a member of Te Rourou Tātaritanga (https://terourou.org), an MBIE-funded data science research group aiming to improve New Zealand's data infrastructure.
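Under the hood this is design-based analysis of the kind the R survey package provides; the sketch below (using that package's built-in api example data, not IDI data) shows the code that iNZight effectively writes for the user.

## Declare the complex design once; summaries then respect weights,
## clustering and finite-population corrections.
library(survey)

data(api)   # example data shipped with the survey package
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

svymean(~api00, dclus2)                 # design-correct mean and SE
svyby(~api00, ~stype, dclus2, svymean)  # grouped summaries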
War Stories
Speaker: Peter Mullins
Affiliation:
When: Tuesday, 25 May 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
A trip through 50 years of consulting, with brief “views” into a variety of consulting tasks I’ve been involved in.
Estimating Power Spectral Density Parameters of Stochastic Gravitational Wave Background for LISA
Speaker: Petra Tang
Affiliation:
When: Monday, 24 May 2021, 1:00 pm to 2:00 pm
Where: 303-155
Complementary to electromagnetic waves, the detection of gravitational waves (GWs) allows astrophysicists to dive deeper into the understanding of our Universe. Only in the last decade have detections of GWs become possible, and as we expand our search we take on more challenges. One of these challenges is how to resolve the stochastic gravitational wave background (SGWB). My research uses Bayesian parametric algorithms to unfold the properties of GW signals. More specifically, I estimate the power spectral density of mock SGWB signals for the Laser Interferometer Space Antenna (LISA) in the millihertz frequency band. In this talk I will discuss my computational models and some current results from Bayesian parametric models implemented with the Python package PyMC3, and end with further research ideas.
Statistical Modelling and Machine Learning, and their conflicting philosophical bases
Speaker: Murray Aitkin
Affiliation: University of Melbourne
When: Friday, 21 May 2021, 2:00 pm to 3:00 pm
Where: MLT1/303-G23
The Data Science/Big Data/Machine Learning era is upon us. Some Statistics departments are morphing into Data Science departments. The new era is focussed on flexibility and innovation, not on models and likelihood. In this process the history of these developments has become obscure. This talk traces these developments back to the arguments between Fisher and Neyman over the roles of models and likelihood in statistical inference. For many flexible model-free analyses there is a model-based analysis in the background. We illustrate with examples of the bootstrap and smoothing.
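As a concrete anchor for the closing remark, here is a tiny R example of the nonparametric bootstrap (a generic illustration, not one of the talk's examples): resampling approximates the sampling distribution of the median with no explicit parametric model in sight.

## Nonparametric bootstrap of the sample median.
set.seed(1)
x <- rexp(50, rate = 1)
boot_medians <- replicate(2000, median(sample(x, replace = TRUE)))
c(estimate = median(x), se = sd(boot_medians))
quantile(boot_medians, c(0.025, 0.975))   # percentile interval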
Strengthening the evidence base for assuring trustworthiness of government
Speaker: Len Cook
Affiliation:
When: Tuesday, 18 May 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
A long-standing trust in public services is being challenged for a mix of reasons. This has brought to light a variety of long-standing tensions in the scope, development, evaluation and publication of evidence about public services in New Zealand. These include:
- A mantra of evidence-based policy which overshadows the critical importance of evidence-based process and practice.
- The risks, created by the paring back of evidence in the public domain, of relying on anecdote and rare events to influence policy change and practice.
- A focus on institutional measures that ignores the changing societal context and dynamics of population groups essential to assessing generational change in well-being.
- Ignorance of the judicial and societal dimensions of proportionality.
- The role of independent, well-resourced third parties (Ombudsman, Judiciary, Auditor-General) in providing public confidence in the trustworthiness of public services.
- The importance of evidence from social sciences, official statistics, operational research and continuous improvement in enabling sector change involving diverse autonomous agencies.
- The necessity of government-wide principles that provide common assurance of the integrity of research selection, methods, quality and release.
The presentation will extend the evidence framework developed by Superu by drawing on an analysis of the five reviews of Oranga Tamariki, as well as experiences in official statistics. It will include examples from the presenter’s study of the justice system.
A Platform for Large-scale Statistical Modelling using R
Speaker: Jason Cairns
Affiliation:
When: Tuesday, 18 May 2021, 11:00 am to 12:00 pm
Where: 303S-G75
The growing size of data sets makes it increasingly challenging to fit models and perform analytics on a single computer. Distributed computing infrastructure and projects like Hadoop or Spark make it possible to leverage a large number of compute nodes, but they offer only a limited set of tools and algorithms, or their extensibility is limited by performance or programming requirements. R, on the other hand, provides a vast variety of efficient tools for statistical computing, but it is typically limited to a single machine. The aim of this project is to leverage the versatility and power of R in a distributed environment like Hadoop by allowing R users to define and run complex distributed algorithms in R. This will allow statisticians to vastly expand the space of models available for large-scale data modelling. As part of the project we will illustrate the use of this methodology by implementing distributed iterative models in R and applying them to real-world tasks on large-scale problems.
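The project's platform is not shown here, but the base-R sketch below gives the flavour of distributing a computation from R: workers compute per-chunk cross-products and the master combines them into a least-squares fit; chunk counts and data sizes are arbitrary.

## Distributed least squares via per-chunk cross-products with 'parallel'.
library(parallel)

set.seed(1)
X <- cbind(1, matrix(rnorm(1e5 * 3), ncol = 3))
y <- X %*% c(2, -1, 0.5, 3) + rnorm(1e5)
chunks <- split(seq_len(nrow(X)), rep(1:4, length.out = nrow(X)))

cl <- makeCluster(4)
clusterExport(cl, c("X", "y"))          # ship the data to the workers
parts <- parLapply(cl, chunks, function(idx)
  list(XtX = crossprod(X[idx, ]), Xty = crossprod(X[idx, ], y[idx])))
stopCluster(cl)

## Combine the partial sufficient statistics on the master.
beta_hat <- solve(Reduce(`+`, lapply(parts, `[[`, "XtX")),
                  Reduce(`+`, lapply(parts, `[[`, "Xty")))
drop(beta_hat)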
Marine spatial planning to conserve deep sea corals in the South Pacific High Seas
Speaker: Carolyn Lundquist
Affiliation: University of Auckland
When: Tuesday, 11 May 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
Decision support tools have been developed to facilitate spatial management planning, utilising various computational methods to select representative sets of priority areas to conserve biodiversity over extensive geographic areas while minimising the cost to existing users. Here, I will discuss the use of the decision support tool Zonation to support a stakeholder process, initiated by the South Pacific Regional Fisheries Management Organisation (SPRFMO), to design revised spatial management of the South Pacific high seas. The objective of the process was to reduce the impact of fisheries (primarily orange roughy) on deep sea corals and other vulnerable deep sea invertebrates. Through a series of workshops, stakeholders directly contributed to the decisions required to parameterise the tool, including the choice of datasets to represent the range of priorities of the stakeholders. Industry, environmental stakeholders and government representatives determined which taxa to include to represent Vulnerable Marine Ecosystems using predictive habitat suitability layers, the weighting of uncertainty in these layers, and other biodiversity layers reflecting 'rarity and uniqueness'. Industry provided layers to represent an index of value to the fishery that included a 'buffer zone' to allow for the logistics of deploying gear. A 'naturalness' layer was developed to incorporate prior disturbance history and likelihood of recovery. Scenarios using Zonation allowed stakeholders to visualise the implications of decisions, and to calculate the relative cost to industry and protection of corals for each spatial management option. Following a series of iterations, a final spatial management proposal was agreed by the stakeholder working group, and boundaries were adjusted slightly for practicality. The proposed spatial management plan was adopted by the SPRFMO Commission in January 2019, and resulted in closures of >2 million km2 of high seas to bottom trawling. Ongoing iterations with stakeholders as part of the annual SPRFMO work plan have assessed the fishery closures with respect to additional data and improvements in species models.
Bio:
Carolyn Lundquist, NIWA/UoA Joint Graduate School in Coastal and Marine Science
Associate Professor, School of Environment, University of Auckland
Principal Scientist, National Institute of Water & Atmospheric Research, Hamilton
Carolyn Lundquist moved to New Zealand in 2000, after obtaining a PhD in Ecology at the University of California, Davis, and a BSc in Marine Biology from UCLA. She holds a joint position as Principal Scientist in Marine Ecology at the National Institute of Water and Atmospheric Research (NIWA) in Hamilton and as Associate Professor at the School of Environment at the University of Auckland. She is an applied marine ecologist, providing scientific and social-scientific input to inform decision-making for coastal and ocean management at local, national, regional and international scales. She leads two projects for the Sustainable Seas National Science Challenge that are developing marine spatial planning tools to improve management of cumulative impacts in New Zealand’s ocean ecosystems. Other recent projects include management of mangroves and other coastal wetland habitats, reviewing impacts of climate change on the seafood sector, and the development of global biodiversity scenarios for IPBES (the biodiversity equivalent of IPCC).
Sparse factor analysis via decoupled shrinkage and selection
Speaker: Beatrix Jones
Affiliation:
When: Tuesday, 4 May 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
Generating a posterior over a model designed to shrink parameters is often easier than generating a posterior that must traverse the space of sparse models, where many parameters are selected to be zero. Decoupled shrinkage and selection post-processes any posterior sample to produce a sparser model, while maintaining an expected fit that lies within the upper C% of the posterior over predicted fits. We explore the use of decoupled shrinkage and selection in the context of sparse factor analysis, using bfa (Murray, 2016) to generate the initial posterior. In the Gaussian setting, simulation studies show this approach is competitive with factor analysis using non-convex penalties (Hirose and Yamamoto, 2015), but more easily extended to models where a latent multivariate normal underlies non-Gaussian marginals, e.g., multivariate probit models. We illustrate our findings with a moderate-dimensional example, extracting “dietary patterns” (underlying factors) from a set of responses to a food frequency questionnaire. Conventionally, foods associated with a particular pattern are identified via an arbitrary cut-off on the factor loadings. Our analysis both accommodates the non-normality of these data and provides automatic selection of foods associated with a particular pattern by producing a sparse loading matrix.
A spatial capture-recapture model to estimate call rate and population density from passive acoustic surveys
Speaker: Ben Stevenson
Affiliation: The University of Auckland
When: Tuesday, 27 April 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
Spatial capture-recapture (SCR) models are commonly used to estimate animal population density from detections and subsequent redetections of individuals across space. In particular, acoustic SCR models deal with detections of animal vocalisations across an array of acoustic detectors. Previously published acoustic SCR methods estimate call density (calls per unit space per unit time), requiring an independently estimated call rate from separate data to convert to animal density.
In this talk I will present a new acoustic SCR model that simultaneously estimates both call rate and animal density from the acoustic survey data alone, alleviating the additional fieldwork burden of physically locating and monitoring individual animals to establish an independent call-rate estimate. I will describe an application of our method to data collected on an acoustic survey of the Cape peninsula moss frog Arthroleptella lightfooti, comparing our approach to existing alternatives.
This is joint work with Paul van Dam-Bates, Callum Young, and John Measey. David Smith has kindly provided gummy frogs, which will be available as a mid-seminar snack.
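To fix ideas, the toy R simulation below generates the kind of data an acoustic SCR model works with (an illustration only, not the speaker's model or the ascr package): calls scattered over a region are detected by an array of detectors with half-normal probability in distance, yielding a call-by-detector capture history.

## Simulate acoustic SCR detections with a half-normal detection function.
set.seed(1)
traps <- expand.grid(x = seq(0, 100, by = 25), y = seq(0, 100, by = 25))
g0 <- 0.8; sigma <- 15                       # detection-function parameters
n_calls <- 200
calls <- data.frame(x = runif(n_calls, -20, 120), y = runif(n_calls, -20, 120))

dist_mat <- as.matrix(dist(rbind(calls, traps)))[1:n_calls, -(1:n_calls)]
p_det <- g0 * exp(-dist_mat^2 / (2 * sigma^2))        # half-normal detection
capt  <- matrix(rbinom(length(p_det), 1, p_det), nrow = n_calls)

capt <- capt[rowSums(capt) > 0, ]            # only detected calls are observed
dim(capt)                                    # detected calls x detectors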
Fitting mixed models to complex samples is surprisingly hard
Speaker: Thomas Lumley
Affiliation:
When: Tuesday, 20 April 2021, 3:00 pm to 4:00 pm
Where: MLT3/303-101
There is increasing use of data from already-existing multistage surveys in social and health sciences, and increasing use of mixed models (and Bayesian hierarchical models), and both these trends are a Good Thing. However, there is still no really satisfactory way to fit these models to these datasets. I will talk about why fitting even linear mixed models to someone else's two-stage samples is harder than one might expect, and discuss some existing approaches and some approaches under development. In particular, I will talk about estimating the census loglikelihood when the full random-effect precision matrix is available, and about using pairwise composite likelihood in more general settings. There will be a brief mention of the kākāpō, the only species to have had full-population genome sequencing. This talk includes joint work with Xudong Huang, Zoe Luo, Ben Stevenson, Stone Chen, Eric Morenz, Jon Wakefield, and Peter Gao.
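A small simulation of the core difficulty (a sketch, not the proposed methodology): when stage-one sampling is informative, so that clusters with larger random effects are more likely to be selected, an unweighted lme4 fit of even a simple linear mixed model is biased; all parameter values below are invented.

## Informative cluster sampling biases a naive mixed-model fit.
library(lme4)

set.seed(1)
n_clus <- 2000; m <- 10
b <- rnorm(n_clus, sd = 1)                         # cluster random effects
p_sel <- plogis(-2 + 1.5 * b)                      # informative inclusion
sampled <- which(runif(n_clus) < p_sel)[1:100]     # keep 100 sampled clusters

dat <- do.call(rbind, lapply(sampled, function(j)
  data.frame(cluster = j, y = 5 + b[j] + rnorm(m))))

fixef(lmer(y ~ 1 + (1 | cluster), data = dat))     # biased above the true 5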
Branching with detection
Speaker: Zehua Zang
Affiliation:
When: Wednesday, 31 March 2021, 3:00 pm to 4:00 pm
Where: 303-B05
For decades, one of the most popular problems in epidemiology has been predicting the population size of an outbreak and evaluating the efficiency of disease control. A common strategy for studying this issue is mathematical modelling with a stochastic process, and a challenging problem that arises in this domain is the detection problem. In the real world, cluster size is related to the quarantine level (e.g. the detection rate). Compared to a low quarantine level (a low detection rate), a high quarantine level (a high detection rate) is more likely to catch a disease early in its spread and so help control the cluster size of an outbreak. For this purpose, we shall discuss the behaviour of infection spread with detection in a modern probabilistic way.
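The toy R sketch below (an illustration, not the speaker's model) simulates a Galton-Watson branching process in which each active case is detected, and hence isolated before transmitting, with probability d; raising d from 0.2 to 0.6 visibly shrinks cluster sizes.

## Branching with detection: Poisson(R) offspring, detected cases removed.
cluster_size <- function(R = 1.5, d = 0.5, max_size = 1e4) {
  active <- 1; total <- 1
  while (active > 0 && total < max_size) {
    detected  <- rbinom(1, active, d)              # detected cases isolated
    offspring <- sum(rpois(active - detected, R))  # only undetected transmit
    total  <- total + offspring
    active <- offspring
  }
  total
}

set.seed(1)
sizes_low  <- replicate(2000, cluster_size(d = 0.2))
sizes_high <- replicate(2000, cluster_size(d = 0.6))
c(mean_low = mean(sizes_low), mean_high = mean(sizes_high))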
Natural language processing of clinical data
Speaker: Charco Hui
Affiliation:
When: Thursday, 25 February 2021, 3:00 pm to 4:00 pm
Where: 303-G16
Clinical trials (CTs) provide the most robust evidence affecting the policies and practices of a health care system. With a large amount of publicly available clinical trial reports, modern techniques can be applied to draw inference from existing trials. In addition, by utilizing the knowledge from existing trials, algorithms can be developed to automate statistical analysis for CTs, which can substantially reduce manual work and human error. While the massive number of documents from previous trials provides unprecedented resources for knowledge learning and automation, consensus is lacking on 1) how to process unstructured CT documents to learn knowledge, and 2) how to design efficient algorithms for automation. We therefore propose to develop a Question Answering (QA) system to learn knowledge from existing CT documents with consideration of their underlying structures, and to further build a system that automates standard CT analysis.
Credible set estimation
Speaker: Kate Lee
Affiliation:
When: Tuesday, 23 February 2021, 3:00 pm to 4:00 pm
Where: 303-G16
Estimating a joint highest posterior density (HPD) credible set for a multivariate posterior density becomes challenging as the dimension grows. We demonstrate how to estimate joint HPD credible sets for the density estimation trees given by Li et al. (2016) and use a consistent estimator to measure the symmetric difference between our credible set estimate and the true HPD set. This quality measure can be computed without needing to know the true set. We illustrate our methods with simulation studies and find that our estimator is competitive with existing methods.
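As background, the R sketch below shows one common way (not necessarily the estimator in the talk) to approximate a joint 95% HPD set from posterior draws: estimate the density, then keep the draws whose density exceeds the threshold that cuts off the lowest 5%; the two-dimensional normal sample stands in for a real posterior.

## Approximate a joint HPD set by thresholding a kernel density estimate.
library(MASS)   # for kde2d

set.seed(1)
draws <- cbind(rnorm(5000), rnorm(5000, sd = 2))      # stand-in posterior sample

kd <- kde2d(draws[, 1], draws[, 2], n = 100)
## density value at each draw, read off the kernel-density grid
ix <- cbind(findInterval(draws[, 1], kd$x), findInterval(draws[, 2], kd$y))
dens_at_draws <- kd$z[ix]

threshold <- quantile(dens_at_draws, 0.05)            # cuts off the lowest 5%
in_hpd <- dens_at_draws >= threshold
mean(in_hpd)                                          # ~0.95 of draws retained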