1. Apparently modeling is a solved problem. No need for any knowledge of math/stats, folks; just use the latest Python packages.
2. Data collection is more important than actually knowing how a model works and, even more importantly, how it doesn't work. What happens when he is hired and can't put together a reliable model?
I think his approach is solid (using projects to learn and being ambitious), but it feels like trying to run before you can walk.
Like Jeremy Howard (fast.ai) says, nobody learns baseball by first spending several years studying how to build baseball bats, optimal playing strategies, or how to manage baseball teams. Nope, you're given a bat, told where to stand, swing the bat and try to hit the ball. Suddenly, you're playing baseball.
This feels inaccurate, or at least misleading, because you typically learn baseball when very young. Going out and swinging wildly without knowing how is hardly "playing baseball."
When I tried to learn how to golf, a lot of time was spent on proper form and club choice, not just "swing!" I swung as much without a ball in front of me as with one.
Did your instructor also sit you down and explain how the materials in your club make it possible for you to swing?
The club is just the component you need to launch the ball towards the hole using your skill. You don't need or want to know what the club is made of until you think of buying a more expensive club (which might be never).
My problem was not with Alex or his approach (which I agree is a great way to get started as stated in my original post).
It was the narrative the article was pushing: that knowledge of the field doesn't matter, only a strong work ethic and selling yourself visually to potential employers.
I get where you're coming from (and I was convinced that I would dislike the article when I realised what it was about).
However, there's some good advice:
- get real data
- clean it, play with it, build models
- focus on the cleaning, as that's what the gig normally is (and damn right too, your models will be way better if you've taken the time to understand your data).
I did find the "and then he got a job" part annoying, but the post was much better than I expected (relative to other towardsdatascience posts).
I am not afraid to outsource domain-agnostic projects to smart junior practitioners, though. Your conclusion reflects a cultural barrier; in practice, metrics are the real king. Domain expertise is absolutely needed in only a few specific cases; the rest is gatekeeping.
I was maybe a bit too snarky; I apologize if so. But the crux of it is that his model never achieved more than 50% accuracy. It never needed to, because he already got the job. But that is the thing: that is the hard part.
Closing those gaps, in not just accuracy but also generalization, is the data science portion of the task (and it requires much more knowledge than this blog post demonstrates, although they could have left out a lot of detail). They make it seem like, if he just had a little more time, this would have been straightforward. But I am not sure about that.
I am all for giving junior practitioners a chance. But in my opinion, this is like hiring an English major for aerospace engineering because they built a model airplane. But maybe I vastly underestimate the amount of extremely low-hanging fruit out there for ML projects.
You’re welcome, and you weren’t snarky at all. I’m not sure: do you work on commercially driven ML projects? It really comes down to the metrics. I can’t trust an ML black box for cancer screening (both false negatives and false positives have a big cost) or for complex industrial failures (stopping a plant triggers a lot of expensive compensating manoeuvres). But nobody in their right mind has a real problem with 88, 90 or 92% accuracy for online retail recommenders (the whole field is a kind of magic beyond a certain baseline, and money never earned is not money lost from a bookkeeping perspective) or for language translation (which is still proofread by humans wherever it gains any kind of legal value). Hope I made my point clearer. Cheers.
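To make the cancer-screening point concrete, here is a minimal sketch with made-up numbers (the counts and the 100:1 cost ratio are hypothetical, not from any real study or from the article): two classifiers with the same 92% accuracy can carry very different costs once a false negative is weighted more heavily than a false positive.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts for one classifier."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self) -> float:
        # Share of all cases classified correctly.
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total

    def cost(self, fn_cost: float, fp_cost: float) -> float:
        # Total cost of errors under an assumed per-error cost for each type.
        return self.fn * fn_cost + self.fp * fp_cost


# Hypothetical screening results on 1000 cases, 100 of which are truly positive.
model_a = Confusion(tp=50, fp=30, tn=870, fn=50)  # misses half the positives
model_b = Confusion(tp=95, fp=75, tn=825, fn=5)   # more false alarms, few misses

# Hypothetical costs: a missed positive is 100x worse than a false alarm.
FN_COST, FP_COST = 100, 1

for name, model in [("model A", model_a), ("model B", model_b)]:
    print(f"{name}: accuracy={model.accuracy():.2f}, "
          f"cost={model.cost(FN_COST, FP_COST):.0f}")
```

Both models print an accuracy of 0.92, yet model A's error cost is 5030 against model B's 575. For a retail recommender, where errors are roughly equally cheap, the two would look interchangeable.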