-
Large = db files physically cannot fit on one box or (write or read) operations exceed capability of one box.
We're a long way short of that.
I operate a single node with failover. I presently put all reads and writes through the master, and am just using a DNS record for the address and the failover mechanism will go to the slave, and that becomes the new master.
I do not have read slaves, and don't have a traditional cluster... just a single Linode with a 12GB RAM allocation offering 6 CPU cores. It's a balance between computation and enough RAM to hold indexes. This is Linodes most powerful (computation) and high RAM instance for below $100 per month... and we have 2 of them, so it's a huge part of our hosting cost.
Fulltext we do process relevant fields into a ts_vector with GIN index, but in a different table I named
search_index
. That table has some extra fields, the fields that are relevant when fulltext search, i.e. if people search text and author, then I put the author id on there too as well as as the text index columns. I only want full text search to hit that table, not to do lots of JOINs.I've found the PostgreSQL fulltext to be good enough for 99% of scenarios, which makes it good enough full stop.
PostgreSQL is easy to sysadmin, easy to run, simple to configure for failover, easy to partition, easy to shard, easy enough to cluster, easy to set up read slaves.
It's easy.
It only really gets messy if you go overboard on either triggers or stored procs. DBA rules still apply, I only treat the database as the owner of data integrity. Meaning I permit some triggers and funcs, but only where they help enforce data integrity. I leave computation to the application calling the database.
And do you run the cluster yourself etc, or managed solution like RDS? What kind of set up are we talking?
Also interested in what kind of approach you're using towards fulltext search?
We are running RDS postgres flavour ... I have no idea what's considered a large vs a small DB.
I really like postgres, especially the jsonb stuff, and have learnt a lot about it in the last two years. If there's a complicated reporting query they want, then it's always me writing them.
For fulltext right now I am running a trigger that processes the relevant fields into a ts_vector. Not sure if that's a good or shit approach.