You can’t get much crawling done from published cloud IPs. Residential proxies a...

mrweasel · 2026-01-31T18:52:33 1769885553

> You can’t get much crawling done from published cloud IPs.

Think about why that might be. I'm sorry, if you legitimately need to crawl the net, and do so from a cloud provide, your industry screwed you over with bad behaviour. Go get hosting with a company that cares about who their customers are, you're hanging out with a bad crowd.

tonymet · 2026-01-31T18:58:57 1769885937

what industry is that? Every industry is on the cloud.

mrweasel · 2026-01-31T19:25:44 1769887544

No, no they really aren't, but I was thinking the "scraping industry" in the sense that that's a thing. Getting hosting in smaller datacenters is simple enough, but you may need to manage your own hardware, or VMs. Many will help you get your own IP ranges and ASN, that's going to go a long way, if you don't want to get bundled in with the bad bots.

This differs obviously, but having an ASN in our case means that we can deal you, contact you and assume that you're better than random bot number 817.

Kodiack · 2026-02-02T06:04:32 1770012272

Thank you for speaking some sense. As a site operator that's been inundated with junk traffic over the past ~month where well in excess of 99% of it has to be blocked, the scrapers have brought this upon themselves.

I actually do let quite a few known, "good" scrapers scrape my stuff. They identify themselves, they make it clear what they do, and they respect conventions like robots.txt.

These residential proxies have been abused by scrapers that use random legit-looking user agents and absolutely hammer websites. What is it with these scrapers just not understanding consent? It's gross.

tonymet · 2026-01-31T21:21:14 1769894474

Scraping isn’t an industry. There are legitimate and illegitimate scraping pursuits.

There are lots of healthy / productive businesses in the cloud and lots of scumbags, just like any enterprise.

I still have no idea about your point, by the way.

Kodiack · 2026-02-02T06:08:09 1770012489

If you have a "legitimate scraping pursuit", identify yourself appropriately that way. I'm happy to let most well-behaved scrapers access my content.

Hiding behind a residential proxy and using random user agents? Gross. Learn what consent is.

tonymet · 2026-02-04T23:38:27 1770248307

try scraping any of the major players e.g. Amazon without residential proxy it won't work. I appreciate that you are offering to abide by crawling etiquette (e.g. robots.txt) but no major app supports that any more.

You're thinking about the case of big AI companies crawling your blog. I'm talking about a small startup trying to do traditional indexing and needing to run from residential proxy to make it work.