Department of Statistics Seminars
On-line inference from data streams
Speaker: Dr Peter Clifford
Affiliation: Oxford University
When: Friday, 16 December 2005, 11:00 am to 12:00 pm
Where: Rm 222, Science Centre

Massive data streams are now common in many statistical applications: monitoring and examining Internet traffic; recording and interpreting customer transactions (loyalty cards); analysing high-frequency financial data in market trading; voice and video capture; and data logging in areas of scientific enquiry. Markov chain Monte Carlo (MCMC) methods revolutionised statistics in the 1990s by providing practical, computationally feasible access to the flexible and coherent framework of Bayesian inference. However, massive datasets pose difficulties for these methods since, with a few simple exceptions, MCMC implementations require a complete scan of what might be several gigabytes of data at each iteration of the algorithm. The development of practical Bayesian methodology for the analysis of massive datasets is one of the outstanding challenges of modern statistics. In this talk I will discuss promising new approaches to the problem: specifically, the application of particle filter methods and the concept of data sketching.

Exact likelihood calculation for a class of Cox processes

Speaker: Dr Peter Clifford
Affiliation: Oxford University
When: Thursday, 15 December 2005, 4:00 pm to 5:00 pm
Where: Rm 222, Science Centre

In his invited address to the European Meeting of Statisticians in Uppsala in 1992, Sir David Cox identified a number of challenging research topics in urgent need of investigation. One of these was to devise a computationally efficient method of evaluating the likelihood of point data for a doubly stochastic point process driven by a diffusion, i.e. an inhomogeneous Poisson process whose intensity is a diffusion depending on a number of unknown parameters.
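As background, such a doubly stochastic process can be simulated by Lewis-Shedler thinning: generate a path of the intensity diffusion, then accept points of a dominating homogeneous Poisson process with probability lambda(t)/lambda_max. A minimal sketch; the mean-reverting (CIR-type) intensity and all parameter values are illustrative assumptions, not the specific class considered in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cox(T=10.0, dt=0.001, lam0=5.0, kappa=2.0, mu=5.0, sigma=1.0):
    """Simulate a doubly stochastic Poisson process on [0, T] whose
    intensity follows a mean-reverting diffusion, via thinning."""
    n = int(T / dt)
    t = np.linspace(0.0, T, n)
    lam = np.empty(n)
    lam[0] = lam0
    # Euler-Maruyama path of the intensity, truncated at zero
    for i in range(1, n):
        dW = rng.normal(0.0, np.sqrt(dt))
        lam[i] = max(lam[i-1] + kappa * (mu - lam[i-1]) * dt
                     + sigma * np.sqrt(max(lam[i-1], 0.0)) * dW, 0.0)
    # Lewis-Shedler thinning against a dominating homogeneous process
    lam_max = lam.max()
    m = rng.poisson(lam_max * T)
    cand = np.sort(rng.uniform(0.0, T, m))
    keep = rng.uniform(0.0, 1.0, m) < np.interp(cand, t, lam) / lam_max
    return cand[keep], t, lam

events, t, lam = simulate_cox()
print(f"{len(events)} events on [0, 10]")
```

The likelihood evaluation discussed in the talk is the harder inverse problem; simulation of this kind is mainly useful for testing such methods.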
In this talk I will show that, for an important class of diffusions, the likelihood can be obtained rapidly by exploiting an equivalence between the point process and the sequence of death times in a simple immigration-birth-death process. The general equivalence result is surprising, since it provides a theoretical link between two of the most basic stochastic processes. A proof of the equivalence is given, along with an outline of the computational procedure.

(CANCELLED) Penalized Spline Smoothing and Generalized Linear Mixed Models - Some Theory and Applications

Speaker: Professor Goeran Kauermann
Affiliation: University of Bielefeld / University of New South Wales
When: Thursday, 24 November 2005, 4:00 pm to 5:00 pm
Where: Rm 222, Science Centre

This seminar has been cancelled as the speaker has had to return home.

Penalized spline fitting as a smoothing technique has become increasingly popular in recent years. Eilers & Marx (1996, Statistical Science) coined the term P-spline smoothing, and the book by Ruppert, Wand & Carroll (2003, Semiparametric Regression, Cambridge University Press) has shown the real practical benefits of the approach. In particular, it is the link between smoothing and linear mixed models that makes the procedure so attractive, both in theory and in practice. In fact, P-spline estimates are equivalent to posterior Bayes estimates in a linear mixed model in which the spline basis coefficients are treated as normally distributed. This connection can be exploited for smoothing parameter selection, and the presentation shows some theoretical results on this matter. A particular focus is on smoothing parameter selection in the presence of correlated residuals. It is well known that this is a delicate issue (see Opsomer, Wang & Yang, 2001, Statistical Science), and standard smoothing parameter selection routines tend to overfit the data.
We show, however, that the link to mixed models circumvents this problem, and that data-driven smoothing parameter selection works well. Further results are provided for non-normal responses in a generalized regression setting, which in turn builds a connection to duration time models.

Use of scaled rectangle diagrams to visualise categorical data, with particular reference to clinical and epidemiological data

Speaker: Dr Roger Marshall
Affiliation: School of Population Health, Epidemiology and Biostatistics, University of Auckland
When: Thursday, 10 November 2005, 3:00 pm to 4:00 pm
Where: Rm 231, Bldg 734, Tamaki

In medical contexts, Venn diagrams are often used to display symbolically the relationships between symptoms, signs, risk factors and diagnostic states. A scaled rectangle diagram is a Venn-like construction that uses rectangles rather than circles, with the areas of the rectangles and of all intersections approximately proportional to frequency. How to construct these diagrams is discussed. Their use is illustrated with medical and epidemiological examples, and comparisons are made with alternative methods, in particular mosaic plots.

Colour Me ... Carefully

Speaker: Dr Ross Ihaka
Affiliation: Dept. of Statistics, University of Auckland
When: Thursday, 27 October 2005, 4:00 pm to 5:00 pm
Where: Rm 222, Science Centre

Technology has advanced to the point where it is very easy to use colour in data graphics. Unfortunately, this also makes it easy to use colour badly. To use colour well, it is important to understand both the benefits and the pitfalls of colour use. Such understanding needs to be based on knowledge of how the human visual system works, and to be informed by the results of experimentation and data collection. In this talk we'll examine some of the basics of colour theory and see how theory and experimentation can be used to determine how to use colour well (and badly) for data display.
An understanding of these principles can make the (good) use of colour substantially easier.

Statistical modelling of the natural selection processes acting on protein sequences

Speaker: Dr Stéphane Guindon
Affiliation: Bioinformatics Institute, University of Auckland
When: Thursday, 13 October 2005, 4:00 pm to 5:00 pm
Where: Rm 222, Science Centre

The unambiguous footprint of Darwinian selection at the molecular level is revealed by comparing protein sequences collected from different species. Such comparisons take place in a probabilistic framework, as the evolution of these sequences is usually approximated by a Markov process. Estimating the parameters of these Markov models then helps us to better understand the way natural selection acts on proteins. I will first present some of these models, their assumptions, and their relevance from a biological perspective. I will then focus on recent models of DNA evolution that generalise the previous ones. Examples demonstrating the relevance of this new approach will be examined.

New Gaussian Mixture Techniques for Filtering and Smoothing of Discrete-Time Gauss-Markov Jump Markov Systems

Speaker: Dr W. Paul Malcolm
Affiliation: National I.C.T. Australia, ANU, Canberra, Australia
When: Thursday, 13 October 2005, 3:00 pm to 4:00 pm
Where: Rm 222, Science Centre

In this seminar we extend the state and mode estimation algorithms developed by Professors Robert J. Elliott and Francois Dufour. Their algorithm is distinct from extant methods, such as the so-called Interacting Multiple Model (IMM) algorithm and sequential Monte Carlo methods, in that it is precise: it is based upon well-defined approximations of the "exact" hybrid filter. To compute our smoothing algorithm, we exploit a duality between forwards and backwards (dual) dynamics.
The natural framework in which to exploit this duality is the method of reference probability, whereby one works under a new, or 'reference', probability measure. Under this measure both the state process and the observation process are independently and identically distributed with Gaussian statistics, while the Markov chain, whose state value fully determines the system dynamics, remains unchanged. A closed-form expression is given for the smoothed estimate of state. An interesting feature of our smoother is that it provides a new degree of freedom: the product decomposition of the smoother density is approximated by mutually independent Gaussian mixtures, so the chosen accuracy of 'the past' (one factor influencing the smoother density) is independent of the chosen accuracy of 'the future' (the other factor). To fix the memory requirements of our smoother we extend ideas based upon the so-called K-best paths problem and the Viterbi algorithm. Since our smoothing algorithm depends upon its corresponding filter, we start by reviewing the jump Markov system filter developed by Elliott and Dufour. This filter has been shown to significantly outperform the IMM in conventional object-tracking scenarios and in the more challenging bearings-only manoeuvring target tracking problem. This seminar is joint work with Prof. Robert J. Elliott (University of Calgary, Canada) and Prof. Francois Dufour (Universite Bordeaux, France). Contact: paul.malcolm@anu.edu.au

Local sensitivity of Bayesian inference to priors and data

Speaker: Associate Professor Russell Millar
Affiliation: Dept. of Statistics, University of Auckland
When: Thursday, 29 September 2005, 4:00 pm to 5:00 pm
Where: Room 301.1060 (Geol 1060)

Part 1: Prior influence. Priors are seldom unequivocal, and an important component of Bayesian modelling is assessment of the sensitivity of the posterior to the specified prior distribution.
This is especially true in fisheries science, where the Bayesian approach has been promoted as a rigorous method for including existing information from previous surveys and from related stocks or species; such informative priors may be highly contested by various interest groups. Here, formulae are given for the first and second derivatives of Bayes estimators with respect to hyper-parameters of the joint prior density. The formula for the second derivative provides a correction to a previously published result. The formulae reduce to very convenient and easily implemented forms when the hyper-parameters are those of exponential family marginal priors. For model parameters with such priors, it is shown that the ratio of posterior variance to prior variance can be interpreted as the sensitivity of the posterior mean to the prior mean. This methodology is applied to a non-linear state-space model for the biomass of South Atlantic albacore tuna, and the sensitivity of the maximum sustainable yield to the prior specification is examined.

Part 2: Data influence. A geometric perturbation of likelihood terms is used to define a class of posteriors parameterized by observation (or group) weights. Kullback-Leibler divergence is used to quantify the difference between the baseline posterior and the perturbed posterior with altered weight given to observation i. The curvature of this divergence, evaluated at the baseline posterior, is shown to be the posterior variance of the log-likelihood of observation i, and is therefore a readily available measure of local case influence. A second local measure of posterior change, the curvature of the Kullback-Leibler divergence between predictive densities, is seen to be the variance (over future observations) of the expected log-likelihood, and can easily be estimated using importance sampling. Analytical expressions are obtained for the linear regression model.
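The Part 1 result, that the ratio of posterior to prior variance equals the sensitivity of the posterior mean to the prior mean, can be verified numerically in the conjugate normal model. A sketch; the data and hyper-parameter values are illustrative, not the tuna application:

```python
import numpy as np

def posterior_mean_var(x, sigma2, mu0, tau2):
    """Posterior of theta for x_i ~ N(theta, sigma2) with prior N(mu0, tau2)."""
    n = len(x)
    prec = 1.0 / tau2 + n / sigma2              # posterior precision
    m = (mu0 / tau2 + x.sum() / sigma2) / prec  # posterior mean
    return m, 1.0 / prec

x = np.array([4.1, 5.3, 4.8, 5.9, 5.0])
sigma2, mu0, tau2 = 1.0, 3.0, 2.0

m, v = posterior_mean_var(x, sigma2, mu0, tau2)

# Sensitivity of the posterior mean to the prior mean, by finite difference
eps = 1e-6
m_eps, _ = posterior_mean_var(x, sigma2, mu0 + eps, tau2)
sensitivity = (m_eps - m) / eps

print(round(sensitivity, 6), round(v / tau2, 6))  # both 0.090909
```

A ratio near zero says the data dominate the prior; a ratio near one says the posterior mean simply tracks the prior mean.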
The methodology is applied to a nonlinear state-space model of fish biomass, and used to determine the local influence of annual catch-rate observations on the predictive density of the current biomass. The final example examines the local influence of groups of repeated binary data in a behavioural study.

The state of the art in the generation of efficient statistical designs

Speaker: Associate Professor David Whitaker
Affiliation: Department of Statistics, University of Waikato
When: Thursday, 15 September 2005, 4:00 pm to 5:00 pm
Where: Rm 222, Science Centre

The problem of generating efficient statistical designs can be formulated as a non-linear zero-one mathematical programming problem. Since it is essentially a combinatorial optimisation problem with a non-linear objective, it is, as expected, NP-hard. The talk focuses on evaluating the different methods adopted for solving this problem, ranging from classical optimisation to steepest descent and meta-heuristics.
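To illustrate the steepest-descent end of this spectrum, a greedy point-exchange search for a D-optimal design (maximising det X'X over a candidate set) can be sketched as follows; the quadratic model and candidate grid are illustrative assumptions, not taken from the talk:

```python
import numpy as np

def d_criterion(X):
    """log det of the information matrix X'X (-inf if singular)."""
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def greedy_exchange(candidates, n_runs, n_iter=50):
    """Greedy point exchange: repeatedly replace a design row by the
    candidate row that most improves the D-criterion."""
    # deterministic, spread-out starting design
    idx = np.linspace(0, len(candidates) - 1, n_runs).astype(int)
    for _ in range(n_iter):
        improved = False
        for i in range(n_runs):
            best, best_j = d_criterion(candidates[idx]), idx[i]
            for j in range(len(candidates)):
                trial = idx.copy()
                trial[i] = j
                val = d_criterion(candidates[trial])
                if val > best:
                    best, best_j, improved = val, j, True
            idx[i] = best_j
        if not improved:
            break
    return candidates[idx]

# Candidate set: quadratic regression in one factor on a 21-point grid
levels = np.linspace(-1.0, 1.0, 21)
cand = np.column_stack([np.ones_like(levels), levels, levels**2])
design = greedy_exchange(cand, n_runs=6)
print(sorted(design[:, 1].round(2)))
```

For this model the classical D-optimal support is {-1, 0, 1}, which the exchange steps should approach; meta-heuristics such as simulated annealing replace the greedy acceptance rule to escape local optima.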
Speaker: Dr Beatrix Jones
Affiliation: Institute of Information & Mathematical Sciences, Massey University, Albany Campus
When: Thursday, 11 August 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Science Centre 303, University of Auckland

This talk will give an overview of the issues involved in fitting, and then interpreting, Gaussian graphical models for high-dimensional data. We will discuss both formulations based on undirected graphical models and methods based on first fitting an acyclic directed graph. We introduce priors over the graphical structure that encourage sparsity, discuss model search strategies, and finally consider the process of extracting substantive insights from the fitted graph and precision matrix. We formulate this final step in terms of assigning path weights that represent the importance of different intermediaries between two correlated variables. (Joint work with Carlos Carvalho, Adrian Dobra and Mike West.)

Small area analysis of cancer rates

Speaker: Dr Mark Clements
Affiliation: National Centre for Epidemiology & Population Health
When: Wednesday, 20 July 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

There are several issues with presenting maps of cancer rates. The rates are generally imprecise, and the level of precision is heterogeneous between areas; consequently, raw cancer rates can be misleading. Moreover, there are often marked differences in the distribution of risk factors, which leads to over-dispersion and to local clustering. One approach to this problem is to employ shrinkage estimators based on global or local random effects, where the local random effects can be characterised by an adjacency matrix or some other measure of distance. Work by Roger Marshall has been used to develop empirical Bayes estimators; however, more recent work using fully Bayesian methods has several advantages. First, both global and local random effects can be incorporated into the model.
Second, posterior probabilities allow for a more careful interpretation of any estimates. Third, the model generalises easily to incorporate both spatio-temporal effects and covariates. I will present an implementation of the fully Bayesian methods using WinBUGS.

Statistical Inference and Tubular Neighborhoods

Speaker: Catherine Loader
Affiliation: Department of Statistics, Case Western Reserve University
When: Tuesday, 21 June 2005, 10:00 am to 11:00 am
Where: Rm 279, Bldg 303S, Science Centre, Univ Auckland

Speaker: Michael Stuart
Affiliation: Trinity College, Dublin
When: Thursday, 9 June 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

Speaker: S.N. Lahiri
Affiliation: Iowa State University
When: Thursday, 2 June 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

In this talk we consider the problem of bootstrapping a class of spatial regression models when the sampling sites are generated by a (possibly nonuniform) stochastic design and are irregularly spaced. It is shown that the natural extension of existing block bootstrap methods for gridded spatial data does not work under nonuniform stochastic designs. A variant of the blocking mechanism is proposed, and it is shown that the proposed block bootstrap method provides a valid approximation to the distribution of a class of $M$-estimators of the spatial regression parameters. Finite-sample properties of the method are investigated through a moderately large simulation study. The talk is based on joint work with Professor J. Zhu.

Timing and Social Change: An Introduction to Event History Analysis

Speaker: Assoc. Professor Bradford S. Jones
Affiliation: Department of Political Science, University of Arizona
When: Tuesday, 31 May 2005, 1:00 pm to 2:00 pm
Where: Federation of University Women, Old Government House

"Developments in Social Research Methods" Occasional Seminar Series. ALL WELCOME. Co-hosted by the NZ Social Statistics Network and the Department of Political Studies, University of Auckland. This seminar will be followed by a two-hour workshop providing more detailed information on the application of event history modelling methods. Attendance by prior enrolment only; to enrol, please email Andrew Sporle.

Investigating Response Times using the Generalised Lambda Distribution

Speaker: Robert King
Affiliation: University of Newcastle
When: Tuesday, 10 May 2005, 1:00 pm to 2:00 pm
Where: Rm B08, Math/Physics Bldg, Univ Auckland

The generalised lambda distribution (gld; Freimer et al., 1988) is a distributional family that allows a wide range of shapes within the one distributional form. It can model the right and left tails separately, which makes it particularly attractive for response-time modelling. Response times (the time for a person to respond to a stimulus in a simple experiment) are a continuing research interest in psychology. Of particular interest is the assessment of whether changes in conditions (particularly in the difficulty of a response task) lead to a change in location and scale only, or whether there is a shape change as well (Heathcote et al., 1991). I will present an alternative five-parameter version of the distribution (adding a further skewing parameter), which shows promise for fitting tail-shape parameters to a large data set and then keeping these fixed while allowing skewness to change over conditions.

References:
gld package for R
Marshall Freimer, Govind S. Mudholkar, Georgia Kollia, and C. Thomas Lin. A study of the generalized Tukey lambda family.
Communications in Statistics - Theory and Methods, 17:3547-3567, 1988.

Familial Longitudinal Data Analysis with Biomedical Applications

Speaker: Brajendra Sutradhar
Affiliation: Memorial University of Newfoundland, Canada
When: Thursday, 5 May 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

In many biomedical studies, data are collected repeatedly from a large number of families/clusters over a short period of time. The repeated responses, such as binary outcomes or counts, collected from members of the same family are then both structurally and longitudinally correlated. This talk will demonstrate the application of a recently developed generalized quasi-likelihood (GQL) approach to the analysis of such familial longitudinal data. As an illustration, some results from the analysis of Canadian health care utilization data will be presented.

Penalized versus Generalized Quasi-likelihood Inference in GLMM

Speaker: Brajendra Sutradhar
Affiliation: Memorial University of Newfoundland, Canada
When: Monday, 2 May 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

For the estimation of the main parameters in the generalized linear mixed model (GLMM) set-up, the penalized quasi-likelihood (PQL) approach, analogous to best linear unbiased prediction (BLUP), treats the random effects as fixed effects and estimates them as such. The regression and variance components of the GLMM are then estimated based on the estimates of these so-called random effects. Consequently, the PQL approach may or may not yield a consistent estimate of the variance component of the random effects, depending on the cluster size and the associated design matrix. In this talk, we introduce an exact quasi-likelihood approach that always yields consistent estimators for the parameters of the GLMM. This approach also yields more efficient estimators than those obtained by a recently introduced simulated moment approach.
Binary and Poisson mixed models are considered, for example, to compare the asymptotic efficiencies.

Learning From Huge Data Sets by SVMs

Speaker: Assoc. Prof. Vojislav Kecman
Affiliation: School of Engineering
When: Thursday, 28 April 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

The seminar introduces the basics of the 'novel' learning approach known as support vector machines (SVMs) and discusses a novel iterative approach to solving the (QP-based) SVM learning problem when faced with huge data sets (say several hundred thousand or more training data pairs, i.e., measurements or samples). It will be shown that SVM learning is related both to the classic Kernel AdaTron method and to the Gauss-Seidel iterative procedure for solving a system of linear equations with constraints. Comparisons with SMO-based algorithms will be given. The expected audience is anyone interested in machine learning, as well as users and developers of SVM learning tools. Everyone welcome!

Efficient calculation of p values in permutation significance tests

Speaker: Peter M.W. Gill
Affiliation: Australian National University
When: Thursday, 31 March 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

Permutation and bootstrap resampling techniques for hypothesis testing are widely used and enjoy many desirable statistical properties. Unfortunately, exhaustive examination of the resampling space is usually prohibitively expensive, and it is often necessary to resort to random sampling within the space. However, I will show that it is possible to write the exact p value as an infinite series whose terms can be computed rapidly, even for large group sizes. Because of connections with the N-step random walk in the plane, the rate of convergence of the series improves as the size of the resampling space increases.

Statistics: Reflections on the past and visions for the future

Speaker: C.R. Rao
Affiliation: Pennsylvania State University, USA
When: Wednesday, 23 March 2005, 12:00 pm to 1:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

Statistics is not a basic discipline like mathematics, physics, chemistry or biology, each of which has a subject matter of its own on which new knowledge is built. Statistics is rather a method of solving problems and creating new knowledge in other areas. It is used in fields as diverse as scientific research, legal practice, medical diagnosis, economic development, and optimal decision making at individual and institutional levels. What is the future of statistics in the 21st century, which is dominated by information technology encompassing the whole of communications, interaction with intelligent systems, massive databases, and complex information-processing networks? The current statistical methodology, based on probabilistic models applied to small data sets, appears inadequate for the new problems arising in emerging areas of science, technology and policy making. Ad hoc methods are being put forward under the title of data mining by computer scientists and engineers to meet the demands of customers. The talk will focus on a critical review of current methods of statistics and on future developments based on large data sets, enormous computing power, and efficient optimization techniques.

C.R. Rao, one of the great figures of 20th century statistics, is NZSA Visiting Lecturer for 2005: http://nzsa.rsnz.org/visiting_lecturer_2005.shtml
A biography of C.R. Rao is available at http://www.stats.waikato.ac.nz/rao_biography.pdf
"C.R. Rao is among the world leaders in statistical science over the last six decades. His research, scholarship and professional services have had a profound influence in theory and applications of statistics.
Technical terms such as the Cramer-Rao inequality, Rao-Blackwellization, Rao's Score Test, the Fisher-Rao Theorem, Rao distance, and orthogonal arrays (described by Forbes Magazine as a "new manthra" for industries) appear in all standard books on statistics. Two of his papers appear in Breakthroughs in Statistics in the last century. C.R. Rao is the author of 14 books and about 350 research papers."

General measures of variability and dependence for multivariate continuous distributions

Speaker: Angelika van der Linde
Affiliation: University of Bremen, Germany
When: Thursday, 10 March 2005, 4:00 pm to 5:00 pm
Where: Rm 222, Math/Physics Bldg, Univ Auckland

In this talk, general descriptive measures of multivariate variability and dependence are suggested which can be used to compare random vectors of different dimensions. They generalize the measures of scatter and linear dependence proposed by Pena and Rodriguez (2003). The measure of variability introduced is the (transformed) r-th root of the entropy, and the measure of dependence is based on the mutual information between the components of an r-dimensional random vector, capturing general stochastic dependence rather than merely linear dependence. Decompositions of the measure of variability into a measure of scale and a measure of stochastic dependence are investigated. The decomposition resulting from independent components provides a representation of variability in terms of the scale of independent components, and thus generalizes the explanation of covariance by principal components in classical multivariate analysis. The ideas are illustrated for examples of r-dimensional log-normal random vectors.
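For multivariate normal vectors both ingredients of these measures have closed forms: the entropy is (1/2) log det(2*pi*e*Sigma), and the mutual information between the components is the sum of the marginal entropies minus the joint entropy (for two components, -(1/2) log(1 - rho^2)). A small sketch; the covariance values are illustrative:

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy of N(0, Sigma): 0.5 * log det(2*pi*e*Sigma)."""
    r = Sigma.shape[0]
    return 0.5 * (r * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

def gaussian_mutual_information(Sigma):
    """Mutual information between the components of N(0, Sigma):
    sum of marginal entropies minus the joint entropy."""
    marginals = sum(0.5 * np.log(2 * np.pi * np.e * s) for s in np.diag(Sigma))
    return marginals - gaussian_entropy(Sigma)

rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])
mi = gaussian_mutual_information(Sigma)
print(round(mi, 4), round(-0.5 * np.log(1 - rho**2), 4))  # both 0.5108
```

Unlike correlation, the mutual information is zero only under full independence, which is what lets such measures capture general stochastic dependence.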
Contact Details
Phone: +64 9 373 7599 ext 86893 or ext 87510