Statistics Seminar at Georgia State University

Fall 2011-Spring 2012, Fridays 3:00-4:00pm, Paul Erdos Conference room (796) COE

Organizer: Yichuan Zhao

If you would like to give a talk in Statistics Seminar, please send an email to Yichuan Zhao at

March 20, 3:00-4:00pm, 796 COE, Ms. Shuang Ji, Emory University


March 16, 3:00-4:00pm, 796 COE, Associate Professor Limin Peng, Department of Biostatistics and Bioinformatics, Emory University
Self-Consistent Estimation of Censored Quantile Regression

Abstract: The principle of self-consistency has been employed to estimate regression quantiles with a randomly censored response. The asymptotic properties of this type of approach were established only recently, partly due to the complex forms of the current self-consistent estimators of censored regression quantiles. Of further interest, how the self-consistent estimation of censored regression quantiles is connected to the alternative martingale-based approach remains unexplored. In this paper, we propose a new formulation of self-consistent censored regression quantiles based on stochastic integral equations. The proposed representation of censored regression quantiles entails a clearly defined estimation procedure. More importantly, it greatly simplifies the theoretical investigations. We establish the large sample equivalence between the proposed self-consistent estimators and the existing estimator derived from martingale-based estimating equations. The connection between the new self-consistent estimation approach and the available self-consistent algorithms is also elaborated.

March 15, 3:00-4:00pm, 796 COE, Assistant Professor of Biostatistics, Mingan Yang, Saint Louis University


March 14, 3:00-4:00pm, 796 COE, Mr. Yang Ning, Johns Hopkins University


March 12, 3:00-4:00pm, 796 COE, Assistant Professor of Biostatistics, Lili Yu, Georgia Southern University


March 6, 3:00-4:00pm, 796 COE, Dr. Charles Heilig, CDC


February 17, 2:00-3:00pm, 796 COE, Assistant Professor Xin Qi, Department of Mathematics and Statistics, Georgia State University
Functional Principal Component Analysis for Discretely Observed Functional Data with Applications

Abstract: We propose a new method to perform functional principal component analysis (FPCA) for discretely observed functional data by solving successive optimization problems. Our method does not require estimates of the individual sample functions or the covariance functions. Hence, it can be used to analyze functional data with multidimensional arguments (e.g. random surfaces) to which existing methods cannot be applied if the observation points are sparse and irregular. Furthermore, it can be applied to many processes and models with complicated or nonsmooth covariance functions. In our method, smoothness of eigenfunctions is controlled by directly imposing roughness penalties on eigenfunctions, which makes it more efficient and flexible to tune the smoothness. Efficient algorithms for solving the successive optimization problems are proposed. We provide the existence and characterization of the solutions to the successive optimization problems. The consistency of our method is also proved. We apply our method to classification problems of retinal pigment epithelial cells in eyes of mice and to longitudinal CD4 counts data.

February 10, 3:00-4:00pm, 796 COE, Assistant Professor Ying Guo, Department of Biostatistics and Bioinformatics, Emory University
A general probabilistic model for group independent component analysis and its estimation methods

Abstract: Independent component analysis (ICA) has become an important tool for analyzing data from functional magnetic resonance imaging (fMRI) studies. ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a pre-specified group design matrix and the uncertainty in between-subjects variability in fMRI data. We present a general probabilistic ICA (PICA) model that can accommodate varying group structures of multi-subject spatio-temporal processes. An advantage of the proposed model is that it can flexibly model various types of group structures in different underlying neural source signals and under different experimental conditions in fMRI studies. A maximum likelihood method is used for estimating this general group ICA model. We propose two EM algorithms to obtain the ML estimates. The first method is an exact EM algorithm which provides an exact E-step and an explicit noniterative M-step. The second method is a variational approximation EM algorithm which is computationally more efficient than the exact EM. We conduct simulation studies to evaluate the performance of the proposed methods. An fMRI data example is used to illustrate application of the proposed methods.

January 20, 2:00-3:00pm, 796 COE, Associate Professor Lisa Lambert, Department of Managerial Sciences, Robinson College of Business, Georgia State University
Approaches for Assessing the Construct Validity of Latent Variables

Abstract: The conclusions we can legitimately draw regarding the relationships among variables in a study rest on the definitions of our variables and the construct validity of measures of those variables. This presentation will offer best practices for ensuring construct validity. The relationships among substantive theory, measurement theory, construct definition, and measures, the implications of formative and reflective measures, and multidimensional constructs will be covered. Current practices in establishing construct validity in the management literature will be described, including the role of indicators of reliability, interrater agreement, and exploratory and confirmatory factor analyses. A range of models, scales and analyses will be presented.

November 18, 3:00-4:00pm, 796 COE, Dr. Harry Chen, VP, Consumer Banking Risk Management, SunTrust Bank
Bayes forecasting of AR(p) model with Gamma-Normal prior under different loss functions

Abstract: With the Gamma-Normal conjugate priors, the probability density function of k-step Bayes prediction for AR(p) is derived in concise matrix format. Based on Gamma-Normal conjugate priors, the Bayes estimates under different loss functions for the AR(p) model are given.

November 11, 3:00-4:00pm, 796 COE, Associate Professor Jeff Qin, Department of Mathematics and Statistics, Georgia State University
Empirical likelihood based inferences for the ROC and AUC regression models

Abstract: In ROC analysis, the area under the ROC curve (AUC) is a popular one-number summary index of the discriminatory accuracy of a diagnostic test. Accounting for covariates can improve diagnostic accuracy of the test. Regression models for the ROC curve and the AUC are two means to evaluate the effects of the covariates on the diagnostic accuracy. In this paper, empirical likelihood (EL) methods are proposed for the AUC regression model and the ROC regression model respectively. For the regression parameter vectors in the AUC regression model and the ROC regression model, it is shown that the limiting distributions of their EL ratio statistics are weighted sums of independent chi-square distributions. Confidence regions can be constructed for the parameter vectors in the regression models based on the newly developed empirical likelihood theories. Simulation studies are conducted to compare the relative performance of the proposed EL-based methods with the existing method in AUC regression. Finally, we illustrate the proposed methods with a real data set.
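
For readers unfamiliar with the AUC, the following minimal sketch shows the standard empirical (Mann-Whitney) estimate of the index without covariates. It is not the empirical likelihood method of the talk; the function name `empirical_auc` and the convention that higher scores indicate disease are assumptions made for illustration.

```python
def empirical_auc(diseased, healthy):
    """Empirical AUC: the fraction of (diseased, healthy) pairs in which the
    diseased score is higher, counting tied pairs as one half.

    Assumes higher test scores indicate disease (an illustrative convention).
    """
    if not diseased or not healthy:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                total += 1.0   # correctly ordered pair
            elif x == y:
                total += 0.5   # tie contributes one half
    return total / (len(diseased) * len(healthy))
```

A value of 1 means the test separates the two groups perfectly, while 0.5 corresponds to a test that is no better than chance.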

November 11, 2:00-3:00pm, 796 COE, Professor Sanjib Basu, Department of Mathematics, Northern Illinois University
A unified competing risks cure rate model with application to cancer survival data

Abstract: A competing risks framework refers to multiple risks acting simultaneously on a subject or on a system. A cure rate, or a limited-failure model, postulates a fraction of the subjects/systems to be cured or failure-free, and can be formulated as a mixture model, or alternatively by a bounded cumulative hazard model. We develop models that unify the competing risks and limited-failure approaches. The identifiability of these unified models is studied in detail. We describe Bayesian analysis of these models, and discuss conceptual, methodological and computational issues related to model fitting and model selection. We describe detailed applications to survival data from breast cancer patients in the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) of the United States.

November 4, 2:00-3:00pm, 796 COE, Professor Yi Li, Department of Biostatistics, University of Michigan
A New Class of Estimating Equation-based Variable Selectors for Risk Assessment

Abstract: We propose a new class of estimating equation-based Dantzig selectors that can achieve simultaneous estimation and variable selection in the absence of a likelihood function, even when the number of covariates exceeds the number of samples. Our research was motivated by practical problems encountered in two studies: a clinical trial of therapies for head and neck cancer, and a genomics study of multiple myeloma patients. These problems proved difficult to analyze under the likelihood setting and must instead be approached with estimating equations. We prove nonasymptotic probability bounds on the accuracy of our estimator, report extensive simulation results, and use our method to analyze the aforementioned problems and construct more accurate prediction rules. (Joint work with Dave Zhao from Harvard University)

October 21, 2:00-3:00pm, 796 COE, Assistant Professor Sarah Henes, Division of Nutrition, Byrdine F. Lewis School of Nursing and Health Professions, Georgia State University
Future Research and Collaboration

Abstract: Previous data indicate that measuring resting energy expenditure with a portable indirect calorimeter in the clinical setting is a particularly important assessment tool in older obese youth (Henes, unpublished data, 2010). While childhood obesity may be leveling off on the national level, recent data indicate that 17% of youth aged 2 to 19 years are overweight and obese in the state of Georgia. To date, childhood obesity is one of the most dire public health dilemmas of our time. Registered dietitians and health care professionals can better determine caloric targets for weight loss in this population with the use of a clinically useful tool such as indirect calorimetry. The primary objective of this grant proposal is to access funds to purchase a portable indirect calorimeter. A pilot study will then be implemented in which measured resting energy expenditure (MREE) will be obtained in obese teens (aged 17-18 years) using the ReeVue portable indirect calorimeter used in the clinical setting and compared with MREE measured in the research setting with a metabolic cart. This validation study will then be used to help implement the use of portable indirect calorimetry in the clinical and community setting. One of the goals is to provide clinicians with a more accurate measurement of energy needs, which can then be used to determine more accurate caloric targets for weight loss in the treatment of childhood obesity. For the pilot study: recruit 17-18 year old college freshmen who are at or above the 85th percentile for age and gender. Sample size will be determined by a power analysis. For further study: collaborations with community and medical center initiatives for the prevention and treatment of childhood obesity are currently being investigated.

October 14, 2:00-3:00pm, 796 COE, Associate Professor Ming Yuan, Industrial and Systems Engineering, Georgia Institute of Technology
High dimensional inverse covariance matrix estimation

Abstract: More and more often in practice, one needs to estimate a high dimensional covariance matrix. In this talk, we discuss how this task is often related to the sparsity of the inverse covariance matrix. In particular, we consider estimating an (inverse) covariance matrix that can be well approximated by "sparse" matrices, which oftentimes connects with graphical models. Taking advantage of this connection, we introduce an estimating procedure that can effectively exploit such "sparsity". The proposed methods can be efficiently computed and have the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the methods are adaptive to different types of sparsity of the problem.

October 7, 2:00-3:00pm, 796 COE, Assistant Professor Yuanhui Xiao, Department of Mathematics and Statistics, Georgia State University
Multiple Imputation

Abstract: Missing data is a common complication in data analysis. Missing data can cause difficulties in estimation, precision and inference. Although there are many methods to deal with incomplete data, multiple imputation (MI) has become one of the leading methods. In this talk, I will give an overview of MI and the theory behind it. I will also go over some software for MI and discuss the challenges for MI.
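
As background on how MI combines results, the sketch below pools estimates from several imputed data sets using Rubin's rules, the standard combining step in MI. It is an illustration only, not material from the talk; the function name `pool_rubin` and its argument names are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m point estimates and their variances using Rubin's rules.

    estimates: one point estimate per imputed data set
    within_variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total
```

The between-imputation term is what distinguishes MI from single imputation: it carries the extra uncertainty due to the missing values into the final standard error.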