
The article is silly because PCA cannot select features. It is all about dimensionality reduction. You should think of PCA as the equivalent of VAEs in the neural network world. The idea is something like this: you have big images (let's say 4k) and they are too expensive to train with or store forever. So you collect a training set, fit a PCA on those images, and then you can convert your 4k images to 720p or even to 10 numbers, which you then use to predict/train whatever you want. Of course, we have algorithms that scale images down, but maybe all your images are of cats and there is a specific linear transformation that preserves more information from the 4k image than simple scaling would. The implicit point is that you are still collecting 4k images; you just immediately compress them with your trained PCA transformation.
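
Rough sketch of what I mean, using scikit-learn's PCA (the image size, component count, and random data are made up purely for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # pretend each row is a flattened "4k" image (here just 10_000 pixels)
    images = rng.random((500, 10_000))

    pca = PCA(n_components=10)              # keep only 10 numbers per image
    pca.fit(images)                         # learn the linear transformation once
    codes = pca.transform(images)           # (500, 10) compressed representation
    approx = pca.inverse_transform(codes)   # lossy reconstruction back to pixels
    print(codes.shape, approx.shape)

Note that transform() still takes the full-resolution image as input; the compression happens after collection, not instead of it.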

So, although you have fewer numbers than before, you still need to collect the original data. A real feature selection process would be able to do something like: "the proximity of the closest Applebees is not important to predict house prices, you should probably stop wasting your time calculating this number". As others have mentioned, L1-regularized regression (lasso) or some statistical procedure to identify useless features is typically how this is done. I would also add that domain knowledge is probably your #1 feature-selection tool, because we have to decide which variables to feed in at all, and choosing which data to prioritize is itself a form of feature selection.
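
For example, a minimal sketch of L1-based selection with scikit-learn's Lasso (the data is synthetic and the regularization strength is arbitrary):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                                  # 5 candidate features
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)   # only two actually matter

    model = Lasso(alpha=0.05).fit(X, y)
    # coefficients driven to ~0 flag features you could stop collecting
    print(model.coef_)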



So would this be a more-or-less correct tl;dr:

Dimensionality reduction is compressing data in a way that retains the most important information for the task

Feature selection is removing unimportant information (keeping/collecting, or selecting, only the important parts)

Both cut down on the amount of data you end up with, but one does it by finding a representation that is smaller, the other does it by discarding unnecessary data (or, rather, telling you which data is necessary, so you can stop collecting the unnecessary data).
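
Something like this toy contrast, maybe (the column indices are hypothetical): PCA's transform still needs every original column for each new sample, whereas feature selection just keeps a subset of columns and lets you drop the rest at the source:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.default_rng(0).random((100, 6))

    # dimensionality reduction: 6 columns in, 2 numbers out -- but every new
    # sample still needs all 6 columns measured before you can transform it
    Z = PCA(n_components=2).fit_transform(X)

    # feature selection: keep columns 0 and 3 and stop collecting the others
    X_selected = X[:, [0, 3]]
    print(Z.shape, X_selected.shape)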


> so you can stop collecting the unnecessary data

I think that's the key, thanks.

But still, if some inputs are redundant, shouldn't this somehow be apparent in the eigenvectors/eigenvalues of the covariance matrix (making PCA an indirect feature-selection algorithm)?


So they are functionally doing the same thing, reducing the amount of data used. I find the debate about what we should call it useless and pointless.

Understand what it is, what it is doing, what its limitations are, and use it appropriately based on your needs. Done.


indeed, predictable and disappointing how the discussion devolved into pedantry. it should have been obvious what the author meant (plus it's clarified at the very beginning of the article). I'm not sure if this is an ML practitioners vs. statisticians thing or what


if you read the actual article, its point is that dimensionality reduction by PCA does not necessarily retain the most important information for the task: it keeps the directions of highest variance, which need not be the ones that matter for prediction


Yes, I would say this is entirely correct.


One method is to drop variables with small coefficients in the top n principal components. This is feature selection, by your definition.
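
A rough sketch of that method (the cutoff is arbitrary, just to show the mechanics):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.random((300, 8))                  # 8 original variables
    pca = PCA(n_components=3).fit(X)

    # components_ has shape (3, 8): rows are PCs, columns are original variables
    importance = np.abs(pca.components_).max(axis=0)
    keep = importance >= np.median(importance)   # arbitrary cutoff for illustration
    print(importance.round(2))
    print(keep)                                  # True = variable survives selection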



