I also monitor daily, and if I see anything evade, I block it.
But anything successfully scraped before these blocks were in place will stick around for a while until it's considered stale.
I return HTTP 403 to the majority of things, but I did redirect some stuff to large random files, and a couple of weeks ago I accidentally served 2.3 PB in 6 hours to the Facebook scraper.
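For anyone wanting to try something similar, here is a rough Python sketch of the "return 403" part. It assumes matching on the User-Agent header, which is only one way to identify a scraper and not necessarily how the blocks described here work. The token list and the `status_for` function are hypothetical names made up for illustration.

```python
# Hypothetical user-agent substrings to block; illustrative only,
# not the actual block list described in this post.
BLOCKED_AGENT_TOKENS = ("facebookexternalhit", "examplebot")


def status_for(user_agent: str) -> int:
    """Return 403 if the user agent contains a blocked token, else 200."""
    agent = (user_agent or "").lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return 403
    return 200
```

In practice this check would usually live in the web server or CDN config rather than in application code, and user agents are easy to spoof, so it is best paired with the kind of daily monitoring mentioned above.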
The blocks are effective.
Attached you can see a scraper that hit us at 20:00 UTC yesterday, and it was effectively blocked.