• How big does the database of spammers have to get for this to be impractical due to lag of iterating through all the known spammers? Or is this not a problem until the DB is huge?

    It's an interesting question, and the answer is that it couldn't realistically get big enough to become impractical.

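    There are only a few search fields (username, IP address, email), and each could be hashed into its own single-column PK table, partitioned by hash across multiple nodes. A check then becomes a primary-key lookup on one node rather than an iteration over every known spammer.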
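To make the partition-by-hash idea concrete, here is a minimal in-memory sketch. It is not the actual database design: the class name `ShardedSpamIndex`, the helper `node_for`, the node count, and the lower-casing of values are all assumptions made for illustration, and Python sets stand in for the single-column PK tables.

```python
import hashlib

FIELDS = ("username", "ip", "email")  # the three search fields mentioned above


def node_for(value: str, num_nodes: int) -> int:
    """Map a value to a node index using a stable hash (hypothetical helper)."""
    # hashlib is used instead of the built-in hash(), which is randomised
    # per process for strings and so would not be stable across machines.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedSpamIndex:
    """Toy stand-in for per-field, single-column tables partitioned by hash."""

    def __init__(self, num_nodes: int = 4) -> None:
        self.num_nodes = num_nodes
        # One "table" per field, each split into num_nodes partitions.
        self.tables = {f: [set() for _ in range(num_nodes)] for f in FIELDS}

    @staticmethod
    def _normalise(value: str) -> str:
        # Assumption: lookups are case-insensitive and ignore stray whitespace.
        return value.strip().lower()

    def add(self, field: str, value: str) -> None:
        key = self._normalise(value)
        self.tables[field][node_for(key, self.num_nodes)].add(key)

    def contains(self, field: str, value: str) -> bool:
        # Only the one partition that could hold the key is consulted,
        # so the cost does not grow with the total number of entries.
        key = self._normalise(value)
        return key in self.tables[field][node_for(key, self.num_nodes)]


index = ShardedSpamIndex(num_nodes=4)
index.add("email", "Spammer@Example.com")
print(index.contains("email", "spammer@example.com"))  # True
print(index.contains("email", "someone@example.org"))  # False
```

The point of the sketch is that a lookup touches one partition of one table, however many entries exist in total, which is why adding nodes keeps the check fast.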

    So even without a map-reduce solution you could scale a SQL database up to billions of records, provided you can afford enough nodes. Thankfully, on that note, an ISP is donating the machines for this, and to be honest 1.7 million entries is a small database. Some of the LFGSS tables are way larger: LFGSS is roughly 6 million rows of data in 180 database tables. (Did I ever tell the story of how the LFGSS backup passed 40GB last month and required me to upgrade servers on Christmas Eve?)
