
This. Relatedly, losing easy access to Google News Archive killed some of the research I'd like to do. Several papers/articles I wrote c. 2010 would not be possible to do today.


Common Crawl started a new news archive a few months ago (http://commoncrawl.org/2016/10/news-dataset-available/), and the Internet Archive has had one going for quite a while.


Thanks for this! I'm talking about old scanned newspapers. :-) The Internet Archive has a good start, but it's pretty heavy on Kentucky, and few of the papers have in-text search available, which is a real obstacle if you're researching an event with few/no specific dates. (That's not to knock them; IA is pretty amazing, and OCRing newspapers is notoriously difficult.)


Do you mean the old Deja thing? We (Google) got a copy to the archive years ago.


This reply got a bit convoluted. My apologies.

First, I'm referring to this: https://www.theatlantic.com/technology/archive/2011/05/googl...

It's still /technically/ possible to search what's there via https://news.google.com/newspapers. Still, it's not exactly intuitive, and not being able to sort/search by date can make historical research very difficult (especially when the OCR isn't perfect; that's common, but trying several different phrases to make sure you've found everything is way easier when searching a range of years).

Some related thoughts can be found in an old Hacker News post: https://news.ycombinator.com/item?id=7408034

Online newspaper archives are a ridiculously awesome boon for the humanities. Chronicling America from the Library of Congress, for instance, is great. It's the de facto successor to Google News Archive in the US. I just wish that Google News Archive could get a couple of the old search features back to aid researchers. :-)

Second, on a quick tangent I just discovered: when you select "archives" at news.google.com, it says "looking for scanned newspapers?" with a link to: https://support.google.com/news/answer/3334. But there's nothing there anymore about scanned news. :-)



