
It's better to think of linear regression and logistic regression as special cases of the Generalized Linear Model (GLM).

In that framework, they are literally the same model with different "settings": a Gaussian distribution with an identity link vs. a Bernoulli distribution with a logit link.
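A minimal sketch of this "same model, different settings" view, assuming a hand-rolled IRLS fitter and synthetic data (none of this is from the thread): the only thing that changes between the two regressions is the family/link branch.

```python
import numpy as np

def fit_glm(X, y, family, n_iter=50):
    """Fit a GLM by iteratively reweighted least squares (IRLS).

    family: "gaussian" (identity link) or "bernoulli" (logit link).
    Linear and logistic regression differ only in this setting.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)     # linear predictor (clipped for safety)
        if family == "gaussian":
            w = np.ones_like(eta)            # identity link, constant variance
            z = y                            # working response is y itself
        elif family == "bernoulli":
            mu = 1.0 / (1.0 + np.exp(-eta))  # logistic mean function
            w = mu * (1.0 - mu)              # Bernoulli variance function
            z = eta + (y - mu) / w           # linearized working response
        # one weighted least-squares step
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y_lin = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
p = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, 1.5]))))
y_bin = (rng.uniform(size=200) < p).astype(float)

beta_lin = fit_glm(X, y_lin, "gaussian")    # recovers the OLS solution
beta_log = fit_glm(X, y_bin, "bernoulli")   # logistic regression MLE
beta_ols = np.linalg.lstsq(X, y_lin, rcond=None)[0]
```

In the Gaussian case IRLS collapses to ordinary least squares in a single step; in the Bernoulli case the same loop performs Newton's method for the logistic likelihood.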



I have to disagree with you. While assuming Gaussian disturbance terms yields linear regression as a special case of the GLM, the linear regression framework itself is more general: it makes no assumption about the distribution of the disturbance terms, merely restricting their variance to be constant across all values of the predictors.
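A toy simulation of that distribution-free point (my own example, not from the thread): OLS still recovers the coefficients when the disturbances are markedly non-Gaussian, as long as they are zero-mean with constant variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

# Decidedly non-Gaussian disturbances: a centered exponential is
# heavily skewed, but zero-mean with constant variance.
eps = rng.exponential(1.0, size=n) - 1.0
y = X @ beta_true + eps

# Ordinary least squares makes no use of the error distribution.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

Gaussianity only matters for exact finite-sample inference; the point estimates need no such assumption.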


Both things can be true.

Linear regression is extra-special because it's a special case of several different frameworks and model classes.

I should have written that it's better (in my opinion) to think of logistic regression in the context of GLMs, at least while you're learning.

Edit: yes, logistic regression is also a special case of regression with a different loss function, but it's not nearly "as special" as linear regression.


As above, I would strongly agree with you. Both linear and logistic regression are special cases of frameworks that are more general and far less parametric than GLM. But they also have very intuitive, hands-on explanations, especially logistic regression, which GLM doesn't have.


Well, then I am gonna say it's even better to think of linear and logistic regression as special cases of M-estimators or GMM.

Joke aside, the truth is that logistic regression can be understood based on several assumptions.

Above we have the latent variable explanation. Then there is a Bayesian version. There's even a "random utility" formulation, where one explicitly models the choices of an agent with a probabilistic error. That one is good for explaining hierarchical logit models and many of the "issues" with logit, such as IIA (independence of irrelevant alternatives).
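The latent variable / random utility formulation mentioned above can be checked numerically (a hypothetical sketch of mine): if the observed outcome is an indicator for a latent index plus logistic noise crossing zero, the resulting choice probability is exactly the logistic CDF of the index.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1 = 0.5, 1.5   # assumed coefficients for illustration
x = 0.8
n = 200_000

# Latent index with logistically distributed noise; we only observe
# whether it crosses zero (e.g., an agent's utility of choosing "yes").
eps = rng.logistic(size=n)
y = (beta0 + beta1 * x + eps > 0).astype(float)

p_sim = y.mean()                                       # simulated P(y=1 | x)
p_logit = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))   # logistic regression formula
```

Swapping the logistic noise for Gaussian noise gives probit instead, which is why the latent-variable story unifies the two.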

GLM, on the other hand, doesn't feel like it adds much beyond parameterizing the procedure, which ain't even a good thing. Nowadays the semiparametric nature of regression is widely appreciated, which is why GLM has declined in use.


The main benefit of the GLM formulation is the observation that your model implies a particular probability distribution for the target, whether you like it or not. And that your point predictions are in fact conditional means. In my opinion, this is an important aspect of modeling that is glossed over or omitted by a lot of introductory material.
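One way to see the "point predictions are conditional means" claim (a toy simulation, not from the thread): an OLS fitted value at some x0 matches the empirical mean of y among observations with x near x0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.uniform(-1, 1, size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Fit by ordinary least squares.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

x0 = 0.5
pred = beta[0] + beta[1] * x0                  # model's point prediction at x0
cond_mean = y[np.abs(x - x0) < 0.02].mean()    # empirical E[y | x close to x0]
```

The same reading carries over to logistic regression, where the predicted probability is the conditional mean of a binary target.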


I agree on the point about conditional means; however, I'd say other statistical frameworks emphasize that point even more.

The point about the probability distribution is reasonable, but I am not sure it is taken seriously by everyone applying GLM either. And again, if it is not necessary to assume such a distribution, then I would prefer a semiparametric approach, as in linear regression.




