
PCA as feature selection is quite popular. Many textbooks on classification teach this.


So, do you have an example? I've never seen this and would like to see what people are doing here. I teach a lot of newbies data mining, so I'm very interested in how people get it wrong.



But this is not so much "feature selection" as it is "compressing the data". It says right in the conclusion that the entire goal was "dimensionality reduction". In a very real way, PCA is selecting all features. That is, your data collection process remains unchanged. In a real feature selection, you would be able to say "ok, we don't need to collect X data anymore".


The point of the article, though, is that dimensionality reduction which minimizes information loss (PCA) isn't necessarily dimensionality reduction which minimizes signal loss.

A good example from the article is random features: random implies high information content but no signal value.
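To make that concrete, here's a minimal sketch of the failure mode (assuming scikit-learn and NumPy; the two-column synthetic data is invented for illustration). The label depends entirely on a low-variance column, a high-variance column is pure noise, and PCA keeps the noise:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 2000
    signal = rng.normal(scale=0.1, size=n)    # low variance, determines the label
    noise = rng.normal(scale=10.0, size=n)    # high variance, carries no signal
    X = np.column_stack([signal, noise])
    y = (signal + rng.normal(scale=0.01, size=n) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # The first principal component aligns with the noisy column because that
    # column dominates the total variance, so the retained feature is useless for y.
    pca = PCA(n_components=1).fit(X_tr)
    clf_pca = LogisticRegression().fit(pca.transform(X_tr), y_tr)
    print(clf_pca.score(pca.transform(X_te), y_te))   # ~0.5, chance level

    clf_raw = LogisticRegression().fit(X_tr, y_tr)
    print(clf_raw.score(X_te, y_te))                  # ~0.97 on the raw features

Standardizing the columns first would change this particular toy case, but the broader point stands: PCA never looks at the target, so variance kept is not signal kept.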


Exactly right, barrkel. Did you read the article, hervature?


It says nothing about feature selection.


What is the difference between feature selection and dimensionality reduction?


Feature selection is a process by which you drop features for various reasons. The main reason features tend to be dropped is that they are closely related to another feature, so you only need one of them. This makes algorithms train faster, reduces noise, and makes it easier to diagnose what the algorithm did.
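For instance, a rough sketch of that correlation-based dropping (assuming pandas and NumPy; the column names and the 0.95 cutoff are arbitrary choices):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    temp_c = rng.normal(20, 5, size=500)
    df = pd.DataFrame({
        "temp_c": temp_c,
        "temp_f": temp_c * 9 / 5 + 32 + rng.normal(0, 0.1, size=500),  # near-duplicate
        "humidity": rng.uniform(0, 100, size=500),
    })

    # Look only at the upper triangle of the absolute correlation matrix and
    # drop one feature from every highly correlated pair.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
    print(to_drop)                    # ['temp_f']
    reduced = df.drop(columns=to_drop)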

PCA takes some N features and compresses them into N-n new features. This process ALSO eliminates collinearity completely, as the resulting, compressed features will be completely uncorrelated. However, calling PCA a feature selection algorithm is a bit untrue, because you have essentially selected none of your features; you have completely transformed them into something else.
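A minimal sketch of that (assuming scikit-learn and NumPy; the 5-feature synthetic data is just for illustration): the compressed features come out uncorrelated, but each one mixes all of the original columns, so nothing has really been "selected":

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(1000, 2))
    mixing = rng.normal(size=(2, 5))
    X = latent @ mixing + rng.normal(scale=0.1, size=(1000, 5))  # 5 correlated features

    pca = PCA(n_components=3).fit(X)
    Z = pca.transform(X)

    # The compressed features are pairwise uncorrelated...
    print(np.round(np.corrcoef(Z, rowvar=False), 2))   # ~identity matrix

    # ...but every component blends all 5 original columns, so none of the
    # original measurements can actually be dropped from data collection.
    print(np.round(pca.components_, 2))                 # 3 x 5, generally all nonzero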


It doesn't seem like such a stretch to conceptualize that if PCA assigns a tiny weight to a variable (assume all variables have been standardized to mean 0, std dev 1), then it is saying that the feature doesn't contribute much to the retained components, and it is therefore "deselecting" it by merging it with several other variables it's correlated with and downweighting it relative to them.
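Something like this, as a rough sketch of reading the loadings that way (assuming scikit-learn; the iris data and the 3-component cutoff are only illustrative), aggregating each variable's loadings across the retained components:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_iris(return_X_y=True)
    X = StandardScaler().fit_transform(X)     # mean 0, std 1, as assumed above

    pca = PCA(n_components=3).fit(X)

    # Weight each feature's absolute loadings by the variance explained by each
    # component; features with uniformly tiny weights contribute little to the
    # retained subspace and are candidates for "deselection".
    contribution = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
    for name, c in zip(load_iris().feature_names, contribution):
        print(f"{name:25s} {c:.3f}")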

Most successful techniques I see in deep nets take the incoming features and mux them into intermediate features which are the actual ones being learned. Feature selection and PCA are in a sense just built into the network.


Short answer: feature selection is one particular method of dimensionality reduction.

And most people, when they say feature selection, mean deliberate, domain-driven selection of features.

That is to say, you can also create an entirely new synthesized feature from, say, 5 raw features and use it to replace those 5 features (this is ... PCA-esque.)

Or you could use random forest techniques, for example, which randomly subsample the candidate features at each split of each individual decision tree in the forest.
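A brief sketch of that angle (assuming scikit-learn; the dataset and the max_features value are arbitrary), with the aggregated importances then driving an explicit selection step afterwards:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = make_classification(n_samples=1000, n_features=20,
                               n_informative=5, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=200,
        max_features=4,      # each split considers only a random 4 of the 20 features
        random_state=0,
    ).fit(X, y)

    # Use the learned importances to do actual feature selection afterwards.
    selector = SelectFromModel(forest, prefit=True, threshold="median")
    X_reduced = selector.transform(X)
    print(X.shape, "->", X_reduced.shape)    # (1000, 20) -> (1000, 10)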

I agree many people here are making mountains out of molehills of terminology.



