Comment by @Festerban:
    • User agent blocks
    • ASN blocks
    • IP blocks
    • Rate limits
    • HTTP header + TLS cipher blocks
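
    A minimal sketch of how layers like these could be chained, assuming the checks run in the application rather than at the edge. Every name and blocklist entry here (Filter, status_for, the ASN, the network, the limits) is a hypothetical placeholder, not my actual rules. The header and TLS cipher checks are left out because they usually live in the proxy or load balancer, and the 429 for rate limiting is an assumption.

```python
import ipaddress
import time
from collections import defaultdict, deque
from dataclasses import dataclass

# All entries below are hypothetical placeholders for illustration only.
BLOCKED_UA_SUBSTRINGS = ("examplebot", "badscraper")
BLOCKED_ASNS = {64496}  # ASN reserved for documentation use
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # TEST-NET-3
RATE_LIMIT = 5       # max allowed requests per window, per IP
RATE_WINDOW = 60.0   # window length in seconds


@dataclass
class Request:
    ip: str
    user_agent: str
    asn: int  # assumed to be resolved upstream (e.g. from a GeoIP/ASN database)


class Filter:
    def __init__(self):
        # Per-IP timestamps of recently allowed requests.
        self._hits = defaultdict(deque)

    def status_for(self, req, now=None):
        """Return the HTTP status this request would receive."""
        now = time.monotonic() if now is None else now

        # 1. User agent block (case-insensitive substring match).
        ua = req.user_agent.lower()
        if any(s in ua for s in BLOCKED_UA_SUBSTRINGS):
            return 403

        # 2. ASN block.
        if req.asn in BLOCKED_ASNS:
            return 403

        # 3. IP / network block.
        addr = ipaddress.ip_address(req.ip)
        if any(addr in net for net in BLOCKED_NETWORKS):
            return 403

        # 4. Sliding-window rate limit; drop timestamps older than the window.
        hits = self._hits[req.ip]
        while hits and now - hits[0] >= RATE_WINDOW:
            hits.popleft()
        if len(hits) >= RATE_LIMIT:
            return 429
        hits.append(now)
        return 200
```

    The cheap, static checks run first so that blocked clients never touch the rate-limit state.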

    I also monitor daily, and if I see anything evade these blocks, I block it too.

    But anything scraped successfully before these blocks were in place will persist for a while, until it's considered stale.

    I return HTTP 403 to the majority of requests, but I did redirect some to large random files, and a couple of weeks ago I accidentally served 2.3PB in 6 hours to the Facebook scraper.

    The blocks are effective.

    Attached you can see a scraper that hit us at 20:00 UTC yesterday, and it was effectively blocked.
