Sparse Linear Discriminant Analysis Based on lq Regularization

Linear discriminant analysis (LDA) plays an important role in feature extraction, dimensionality reduction, and classification. As science and technology advance, the data to be processed grow ever larger, and in high-dimensional settings LDA faces two problems: the projected data lack interpretability, since each discriminant direction is a linear combination of all p features, and the within-class covariance matrix becomes singular. LDA admits three distinct formulations: the multivariate Gaussian model, Fisher's discriminant problem, and the optimal scoring problem. To address the two problems above, this paper establishes a model for computing the kth discriminant component. The model first transforms the original Fisher discriminant formulation of LDA: it replaces the within-class covariance matrix with a diagonal estimate of the within-class variances, which overcomes the singularity problem, and projects the problem onto an orthogonal projection space so that the orthogonality constraints can be dropped. An lq-norm regularization term is then added to enhance interpretability, achieving both dimensionality reduction and classification. Finally, an iterative algorithm for solving the model is given together with a convergence analysis: the sequence generated by the algorithm is shown to be descending, and it converges to a local minimum of the problem from any initial point.
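The abstract does not state the model explicitly, so the following is only a plausible formalization of the kth-component problem it describes, in notation of my own choosing ($\hat{\Sigma}_b$, $\hat{D}_w$, $P_k$, and $\lambda$ are assumptions, not symbols taken from the paper):

```latex
% Sketch of the k-th sparse discriminant problem (notation assumed, not from the paper):
%   \hat{\Sigma}_b : between-class scatter estimate
%   \hat{D}_w      : diagonal estimate of the within-class variances
%                    (replaces the possibly singular within-class covariance)
%   P_k            : projection onto the orthogonal complement of the
%                    first k-1 discriminant directions (removes the
%                    explicit orthogonality constraints)
\begin{equation*}
  \hat{\beta}_k \in \arg\max_{\beta \in \mathbb{R}^p}
  \; \beta^{\top} P_k \hat{\Sigma}_b P_k \beta
  \;-\; \lambda \lVert \beta \rVert_q^q
  \quad \text{subject to} \quad
  \beta^{\top} \hat{D}_w \beta \le 1 ,
  \qquad 0 < q \le 1 .
\end{equation*}
```

The lq penalty drives entries of $\beta$ to zero, which is what gives the projected data their interpretability.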
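To illustrate the diagonal within-class estimate mentioned above, here is a minimal NumPy sketch (the function name and interface are mine, not the paper's) that computes a between-class scatter estimate together with the diagonal within-class variance matrix; the latter is invertible whenever every feature has positive within-class variance, even when p exceeds the sample size:

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class scatter and the diagonal within-class estimate.

    The diagonal matrix Dw = diag(within-class variances) stands in for
    the full within-class covariance, sidestepping its singularity in
    high dimensions. Illustrative sketch, not the paper's algorithm.
    """
    n, p = X.shape
    mu = X.mean(axis=0)                      # overall mean
    Sb = np.zeros((p, p))                    # between-class scatter
    Sw = np.zeros((p, p))                    # full within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                 # class mean
        Sb += len(Xc) * np.outer(mc - mu, mc - mu) / n
        Sw += (Xc - mc).T @ (Xc - mc) / n
    Dw = np.diag(np.diag(Sw))                # keep only the variances
    return Sb, Dw
```

With `Dw` diagonal, the constraint in the Fisher-type problem reduces to a weighted norm bound, which is what makes a coordinate-wise treatment of the lq penalty tractable.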