
    GET /news.html HTTP/1.1
    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

    HTTP/1.1 200 OK
vs.

    GET /news.html HTTP/1.1
    User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

    HTTP/1.1 403 Forbidden


I'm probably just missing the context of the discussion, but what does this mean & what's its significance?


It's a technical way to prevent Google from reaching your site. You can easily do that right now (even better, use robots.txt instead of obscure hacks), but the publishers don't want that. They don't want Google to stop showing their content; they want money for it, and Google is unwilling to pay.
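
For anyone wondering what the 200 vs. 403 "hack" might look like on the server side, here is a rough sketch, not anything a publisher is known to run. The names `BLOCKED_TOKEN`, `status_for` and `Handler` are made up for illustration, and matching on a User-Agent substring is an assumption (the header is trivially spoofable).

```python
from http.server import BaseHTTPRequestHandler

# Assumption: the crawler is identified by this substring of its User-Agent.
BLOCKED_TOKEN = "Googlebot"


def status_for(user_agent: str) -> int:
    """Return 403 if the User-Agent contains the blocked token, else 200."""
    if BLOCKED_TOKEN.lower() in (user_agent or "").lower():
        return 403
    return 200


class Handler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers GET requests based on User-Agent."""

    def do_GET(self):
        status = status_for(self.headers.get("User-Agent", ""))
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        # Body is a placeholder; a real site would serve the page here.
        self.wfile.write(b"Forbidden\n" if status == 403 else b"news\n")
```

To try it locally, you could pass `Handler` to `http.server.HTTPServer(("127.0.0.1", 8000), Handler)` and call `serve_forever()`, then request it with and without the Googlebot User-Agent.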


If they don't want to be crawled, they could just change the robots.txt. But nobody does.
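
For reference, a minimal sketch of what "changing the robots.txt" could look like, checked with Python's standard `urllib.robotparser`. The rules and the example.com URL are illustrative assumptions, not any publisher's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host used only for this check.
url = "http://example.com/news.html"
googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("Googlebot allowed:", googlebot_allowed)
print("Other crawler allowed:", other_allowed)
```

Unlike a 403, this only works for crawlers that choose to honor robots.txt.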


Almost nobody: http://www.guardian.co.uk/media/greenslade/2012/oct/22/googl...

I'm looking forward to seeing, in a year's time, what the result of this little experiment is.



