There was a recent discussion on PCA for classification that I walked into, but everyone had left the building by the time I joined. Since I run into this conceptual misunderstanding of PCA's relevance to classification often, let me repeat what I said there.
The problem with using PCA as a preprocessing step for linear classification is that the dimensionality reduction is done without paying any heed to the end goal -- better linear separation of the classes. One can get lucky and obtain a low-dimensional projection that happens to separate the classes well, but that is exactly that: luck.
Let me see if I can draw an example.
The '+' and '-' denote the data points of the two different classes. In this example the PCA direction will be along the X axis, which is the worst axis to project onto for separating the classes. The best in this case would have been the Y axis.
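To make this concrete, here is a small sketch of the same setup with synthetic data (the spreads and class offsets are made up for illustration): both classes are stretched along X, but they are separated only along Y. PCA, which only cares about variance, picks the X axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes ('+' and '-'): wide spread along X, separated only along Y.
X_plus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, 1.0]
X_minus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, -1.0]
X = np.vstack([X_plus, X_minus])

# PCA direction: top eigenvector of the pooled (label-blind) covariance.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, np.argmax(eigvals)]

print(pc1)  # essentially (±1, 0): the X axis, the worst choice here
```

Projecting onto `pc1` collapses both classes on top of each other; the class-separating Y direction is the one PCA throws away first.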
A far better approach would be to use a dimensionality reduction technique that is aware of the end goal. One such example is Fisher discriminant analysis and its kernelized variant.
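Continuing the sketch above (same made-up synthetic data), the Fisher discriminant direction can be computed in closed form as w ∝ S_w⁻¹(μ₊ − μ₋), which trades off between-class separation against within-class scatter. On this data it recovers the Y axis that PCA discards:

```python
import numpy as np

rng = np.random.default_rng(0)
X_plus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, 1.0]
X_minus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, -1.0]

# Fisher discriminant: w proportional to S_w^{-1} (mu_plus - mu_minus),
# i.e. maximize between-class scatter relative to within-class scatter.
mu_p, mu_m = X_plus.mean(axis=0), X_minus.mean(axis=0)
S_w = np.cov(X_plus, rowvar=False) + np.cov(X_minus, rowvar=False)
w = np.linalg.solve(S_w, mu_p - mu_m)
w /= np.linalg.norm(w)

print(w)  # essentially (0, ±1): the Y axis, where the classes actually separate
```

Because it uses the labels, the Fisher direction is the right one even though it is the *minimum*-variance direction, exactly the opposite of what PCA would pick.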