Talk title: Classification in Imbalanced Complex Longitudinal Data
Abstract: Imbalanced classification has drawn considerable attention in the statistics and machine learning literature. Traditional classification methods, such as logistic regression and support vector machines (SVMs), often perform poorly when the class distribution is severely skewed, and even more so when the data are high-dimensional and longitudinal. Given the ubiquity of big data in areas such as modern health research, face recognition, and object identification, imbalanced classification can be expected to face an additional level of difficulty imposed by such complex data structures.
In this talk, we propose a nonparametric classification approach for imbalanced binary data in longitudinal and high-dimensional settings. Technically, functional principal component analysis (FPCA) is applied for feature extraction under the sparse and irregular longitudinal structure. The univariate exponential loss function, coupled with a group LASSO penalty, is then incorporated into the classification procedure to handle the high-dimensional setting. Along with improved AUC and sensitivity for imbalanced classification, our approach provides meaningful feature selection for interpretation while remaining computationally efficient. The proposed method is illustrated with real data on Alzheimer’s disease and Pima Indians diabetes, and its finite-sample performance is extensively evaluated through simulations.
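To make the combination of an exponential loss with a group LASSO penalty concrete, the following is a minimal illustrative sketch, not the speaker's implementation. It assumes labels coded as -1/+1, a linear score with no intercept, the plain exponential loss exp(-y f(x)), a group penalty weighted by the square root of the group size, and a simple fixed-step proximal gradient solver. The function names and the toy data are hypothetical. In the setting of the talk, each group of columns would presumably hold the FPCA scores of one longitudinal predictor; here random features stand in for those scores.

```python
import numpy as np

def group_soft_threshold(beta, groups, thresh):
    """Proximal operator of the group LASSO penalty (block soft-thresholding)."""
    out = beta.copy()
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        # Assumed convention: each group's threshold is scaled by sqrt(group size).
        cut = thresh * np.sqrt(len(idx))
        out[idx] = 0.0 if norm <= cut else (1.0 - cut / norm) * beta[idx]
    return out

def objective(X, y, beta, groups, lam):
    """Mean exponential loss plus the weighted group LASSO penalty."""
    loss = np.mean(np.exp(-y * (X @ beta)))
    penalty = lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)
    return loss + penalty

def fit_exp_group_lasso(X, y, groups, lam=0.05, step=0.05, n_iter=1000):
    """Fixed-step proximal gradient descent; y must be coded as -1/+1."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(-y * (X @ beta))        # per-sample exponential-loss weights
        grad = -(X.T @ (w * y)) / len(y)   # gradient of the mean exponential loss
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta

# Hypothetical toy data: three groups of two features; only the first group
# drives the label, and the positive class is the minority.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = np.where(X[:, 0] + X[:, 1] > 1.5, 1.0, -1.0)
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

beta = fit_exp_group_lasso(X, y, groups)
group_norms = [float(np.linalg.norm(beta[g])) for g in groups]
print(group_norms)
```

Because the penalty acts on whole groups, a predictor is either kept or dropped together with all of its scores, which is what makes the selected features interpretable at the predictor level. A practical version would likely add an unpenalized intercept and choose `lam` by cross-validation on a criterion such as AUC.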