There was a recent discussion on PCA for classification that I walked into, but everyone had left the building by the time I joined. Since I run into this conceptual misunderstanding of PCA's relevance to classification often, let me repeat what I said there.
The problem with using PCA as a preprocessing step for linear classification is that the dimensionality reduction is done without paying any heed to the end goal -- better linear separation of the classes. One can get lucky and obtain a low-dimensional projection that happens to separate the classes well, but that is exactly that: luck.
Let me see if I can draw an example.
The '+' and '-' denote the data points of the two different classes. In this example the PCA direction will be along the X axis, which is the worst axis to project onto for separating the classes. The best in this case would have been the Y axis.
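To make this concrete, here is a small sketch of the same setup with synthetic data (the spreads and class offsets are made up for illustration): both classes are stretched along X, but they are separated only along Y. PCA, which only cares about variance, picks the X axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes ('+' and '-'): wide spread along X, separated only along Y.
X_plus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, 1.0]
X_minus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, -1.0]
X = np.vstack([X_plus, X_minus])

# PCA direction: top eigenvector of the pooled (label-blind) covariance.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, np.argmax(eigvals)]

print(pc1)  # essentially (±1, 0): the X axis, the worst choice here
```

Projecting onto `pc1` collapses both classes on top of each other; the class-separating Y direction is the one PCA throws away first.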
A far better approach would be to use a dimensionality reduction technique that is aware of the end goal. One such example is Fisher discriminant analysis and its kernelized variant.
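Continuing the sketch above (same made-up synthetic data), the Fisher discriminant direction can be computed in closed form as w ∝ S_w⁻¹(μ₊ − μ₋), which trades off between-class separation against within-class scatter. On this data it recovers the Y axis that PCA discards:

```python
import numpy as np

rng = np.random.default_rng(0)
X_plus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, 1.0]
X_minus = rng.normal(0, 1, size=(200, 2)) * [5.0, 0.3] + [0.0, -1.0]

# Fisher discriminant: w proportional to S_w^{-1} (mu_plus - mu_minus),
# i.e. maximize between-class scatter relative to within-class scatter.
mu_p, mu_m = X_plus.mean(axis=0), X_minus.mean(axis=0)
S_w = np.cov(X_plus, rowvar=False) + np.cov(X_minus, rowvar=False)
w = np.linalg.solve(S_w, mu_p - mu_m)
w /= np.linalg.norm(w)

print(w)  # essentially (0, ±1): the Y axis, where the classes actually separate
```

Because it uses the labels, the Fisher direction is the right one even though it is the *minimum*-variance direction, exactly the opposite of what PCA would pick.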