
It depends on what you're trying to do.

For most things, I use Node.js with the Cheerio library, which is basically a stripped-down version of jQuery without the need for a browser environment. I find using the jQuery API far more desirable than the clunky, hideous Beautiful Soup or Nokogiri APIs.

For something that requires an actual DOM or code execution, PhantomJS with Horseman works well, though everyone is talking about headless Chrome these days, so IDK. I haven't had nearly as many bad experiences with PhantomJS as others report.



I have been playing around with Cheerio for a short while and it is quite cool, although extracting comments wasn't as straightforward as I thought it would be.

Do you have any experience with processing and scraping large files using Cheerio? It doesn't support streaming, does it? I am currently faced with processing a ~75 MB XML file and I am not sure whether Cheerio is suited for that.



