Professor Alan Lee

City Room: 201 Science Centre
Tamaki Room: 721.332 Tamaki
Extn: 88749 / 85274
Email: lee@stat.auckland.ac.nz

Alan Lee attended Auckland University for his undergraduate and Master's degrees, and the University of North Carolina for his PhD. He joined the staff of the then Mathematics Department at Auckland University in 1974. He has held visiting academic appointments at Indiana University, the University of North Carolina, McGill University and Southampton University. His research interests include the analysis of directional data, non-parametric statistics, statistical computing, the analysis of correlated categorical data, the application of capture-recapture methods to the estimation of population size, regression for response-selective sampling designs and the creation of synthetic data sets.

More Information:

Recent manuscripts

1. A. J. Lee, A. J. Scott and C. J. Wild (2009) Efficient estimation in multi-phase case-control studies.

In this paper we discuss the analysis of multi-phase, or multi-stage, case-control studies and present an efficient semiparametric maximum-likelihood approach that unifies and extends earlier work, including the seminal case-control paper by Prentice & Pyke (1979) as well as work by Breslow & Cain (1988), Scott & Wild (1997), Breslow & Holubkov (1997), and others. The theoretical derivations apply to arbitrary binary regression models but we present results for logistic regression and show that the approach can be implemented by including additional intercept terms in the logistic model and then making some simple corrections to the score and information equations from the prospective loglikelihood.

2. A. J. Lee (2009) Circular Data.

We give a brief survey of the field of circular statistics, including summary statistics, circular distributions, basic inference and models for regression and time series. An extensive bibliography is provided.

3. A. J. Lee (2009) Generating Synthetic Microdata From Published Marginal Tables and Confidentialised Files.

We describe several methods for generating synthetic data sets. The methods we describe are based on creating data sets using a combination of publicly available marginal tables and microdata samples. We describe a set of R functions which implement the methods under study, and use these functions to apply the methods to data from the 2001 Census of Population and Dwellings.

4. Alan Lee and Yuchi Hirose (2007) Semi-parametric efficiency bounds for regression models under generalised case-control sampling: the profile likelihood approach.

Abstract: We obtain an information bound for estimates of parameters in general regression models where data is collected under a variety of response-selective sampling schemes. The asymptotic variances of the semi-parametric estimates of Scott and Wild (1986, 1997, 2001) are compared to the bound and the estimates are found to be fully efficient.

5. A. J. Lee (2007) On the semi-parametric efficiency of the Scott-Wild estimator under choice-based and two-phase sampling.

Using a projection approach, we obtain an asymptotic information bound for estimates of parameters in general regression models under choice-based and two-phase, outcome-dependent sampling. The asymptotic variances of the semi-parametric estimates of Scott and Wild (1997, 2001) are compared to these bounds and the estimates are found to be fully efficient.

6. A. J. Lee (2007) Semi-parametric efficiency bounds for regression models under choice-based sampling.

We extend the Bickel--Klaassen--Ritov--Wellner theory of semi-parametric efficiency bounds to the case of sampling from several populations, and discuss the form of the efficient score and efficient influence function in this situation. The theory is applied to obtain an information bound for estimates of parameters in general regression models under case-control sampling.


7. A. J. Lee, A. J. Scott and C. J. Wild (2007) On the Breslow-Holubkov estimator.

Abstract: Breslow and Holubkov (1997) developed semiparametric maximum likelihood estimation for two-phase studies with a case-control first phase under a logistic regression model and noted that, apart from the overall intercept term, it was the same as the semiparametric estimator for two-phase studies with a prospective first phase developed in Scott and Wild (1997). In this paper we extend the Breslow-Holubkov result to general binary regression models and show that it has a very simple relationship with its prospective first-phase counterpart. We also explore why the design of the first phase only affects the intercept of a logistic model, simplify the calculation of standard errors, establish the semiparametric efficiency of the Breslow-Holubkov estimator and derive its asymptotic distribution in the general case.

8. Alan Lee (2006) Generating synthetic unit-record data from published marginal tables.

Abstract: We survey methods for generating synthetic data sets without making use of unit-record data. The methods we describe are based on creating data sets which match publicly available marginal tables. We describe a set of R functions which implement the methods under study, and apply the methods to data from the 2001 Census of Population and Dwellings.

Selected publications:

  • A. J. Lee (1990) U-Statistics, Theory and Practice. 302pp. Marcel Dekker, New York.
  • A. J. Lee (1995) Data Analysis. 410pp. To be published by Oxford University Press.
  • N. I. Fisher and A. J. Lee (1981). Non parametric measures of angular-linear association. Biometrika, 68, pp 629-636.
  • A. J. Lee (1981). A note on Campbell's sampling theorem. SIAM J. Appl. Math., 41, pp 553-557.
  • N. I. Fisher and A. J. Lee (1982). Non parametric measures of angular-angular association. Biometrika, 69, pp 315-321.
  • A. J. Lee (1982). On incomplete U-statistics having minimum variance. Aust. J. Statist., 24, pp 275-282.
  • N. I. Fisher and A. J. Lee (1983) A correlation coefficient for circular data. Biometrika, 79, 159-166.
  • N. I. Fisher and A. J. Lee (1986) Correlation coefficients for random variables on the sphere and hypersphere. Biometrika, 79, 159-166.
  • N. I. Fisher and A. J. Lee (1992) Regression models for an angular response. Biometrics, 48, 665-677.
  • A. J. Lee, A. J. Scott and S. C. Soo. (1993) Comparing Liang-Zeger estimates with maximum likelihood in bivariate logistic regression. Journal of Statistical Computation and Simulation, 44, 133-148.
  • A. J. Lee (1993) Generating random binary deviates having fixed marginal distributions and specified degrees of association. The American Statistician, 47, 209-215.
  • N. I. Fisher and A. J. Lee (1994). Time series analysis of circular data. Journal of the Royal Statistical Society, Series B, 327-339.
  • Andrew Balemi and AJ Lee (1995) On the mean squared error of the sandwich estimator in Liang-Zeger estimation. Proceedings of the AC Aitken Memorial Conference.
  • AJ Lee, Lovina McMurchy and AJ Scott (1997) Reusing data from case-control studies. Statistics in Medicine, 16, 1377-1389.
  • AJ Lee (1997) Modeling scores in the Premier League: Is Manchester United really the best? Chance, 10, 15-19.
  • AJ Lee (1997). Some simple methods for generating correlated categorical variates. Computational Statistics and Data Analysis, 26, 133-148.
  • Balemi, A and Lee, AJ (1999) Some properties of the Liang-Zeger method applied to clustered binary regression. Austral. & New Zealand J. Statist. 41, 43-58.
  • Lee, A (1999) Modelling Rugby league data via bivariate negative binomial regression. Austral. & New Zealand J. Statist. 41, 141-152.
  • A. J. Lee, G.A.F Seber, Jennifer K. Holden and John T. Huakau (2001) Capture-recapture, Epidemiology and List Mismatches: Several Lists. Biometrics, 57, 707-713 (S-Plus functions to implement the methods discussed in this paper are in file PL2functions.s. Code for the example is in the file example.)
  • A. J. Lee (2002) Effect of list errors on the estimation of population size. Biometrics, 58, 185-191. (S-Plus functions to implement the methods discussed in this paper are in file functions.s. Code for the examples is in Example1 and Example2.)
  • AJ Lee, O Nyangoma and GAF Seber (2002) Confidence regions for multinomial parameters. Journal of Computational Statistics and Data Analysis, 39, 329-342.
  • George A. F. Seber and Alan J. Lee (2003) Linear Regression Analysis, 2nd Ed. Wiley, New York.
  • Robin K. S. Hankin and Alan Lee (2006). A new family of non-negative distributions. Australian and New Zealand Journal of Statistics, 48, 67-78.
  • Michael Hautus and Alan Lee (2006). Estimating sensitivity and bias in a yes/no task. British Journal of Mathematical and Statistical Psychology, 59, p257-273.
  • Lee, A.J., Scott, A.J., Wild, C.J. (2006). Fitting binary regression models with case-augmented samples, Biometrika, 385-397.