I have a script for concurrent web scraping: https://github.com/mateuszbuda/webscraping-benchmark
It takes a file of URLs and scrapes the content of each page. For more demanding websites, it can use a web scraping API that handles rotating proxies.
I add logic to process the output as needed.
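
For illustration, here is a minimal sketch of the general approach (read URLs from a file, fetch them concurrently, then process each result). It is not the code from the linked repository; it uses only the Python standard library, and the names `read_urls`, `fetch`, `scrape_all`, `fetch_fn`, and `process_fn` are hypothetical. The proxy-rotating API is not shown; under this sketch it would be plugged in by passing a different `fetch_fn`.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def read_urls(path):
    """Return the non-empty lines of a text file, one URL per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def fetch(url, timeout=10.0):
    """Download one page and return its body as text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_all(urls, fetch_fn=fetch, process_fn=len, max_workers=8):
    """Fetch URLs concurrently; return {url: processed result or exception}.

    process_fn is the hook for custom output handling (hypothetical name);
    the default simply records the length of each page.
    """
    def worker(url):
        try:
            return url, process_fn(fetch_fn(url))
        except Exception as exc:  # keep one failing URL from aborting the run
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, urls))
```

A run would then look like `results = scrape_all(read_urls("urls.txt"), process_fn=my_parser)`, where `urls.txt` and `my_parser` are placeholders. Threads are a reasonable fit here because the work is dominated by waiting on the network.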