The query optimiser within PostgreSQL executes the subqueries independently (potentially in parallel) before merging the results in the UNION.
At the time each subquery runs, the result from the others is not known.
The UNION that discards duplicates can only do so within each type of update ("someone replied", "new thread in forum", "you've been mentioned"...), so multiple mentions of you in a single comment are collapsed to one row, but the same item can still appear once per type.
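To make that concrete, here is a minimal sketch of the shape of the query (the table and column names are invented for illustration; they are not the real schema):

    SELECT 'replied'    AS update_type, item_id, created_at
      FROM replies      WHERE profile_id = 123
    UNION   -- discards duplicate rows, but only rows that are identical...
    SELECT 'new_thread' AS update_type, item_id, created_at
      FROM new_threads  WHERE followed_by = 123
    UNION   -- ...so two rows differing only in update_type both survive
    SELECT 'mentioned'  AS update_type, item_id, created_at
      FROM mentions     WHERE profile_id = 123;

UNION compares whole rows, so the update_type column alone is enough to keep a reply and a mention of the same item apart.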
If I ran each query in a specific order, perhaps by structuring the queries as preceding WITH queries, then I could use the output of those to feed into the next queries to remove the duplicates, as sketched below... however...
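A sketch of that chaining (same invented names as above), where each later CTE uses NOT EXISTS to skip items an earlier part has already returned:

    WITH replied AS (
        SELECT 'replied' AS update_type, item_id, created_at
          FROM replies WHERE profile_id = 123
    ), fresh_threads AS (
        SELECT 'new_thread' AS update_type, item_id, created_at
          FROM new_threads n
         WHERE n.followed_by = 123
           AND NOT EXISTS (SELECT 1 FROM replied r
                            WHERE r.item_id = n.item_id)
    ), mentioned AS (
        SELECT 'mentioned' AS update_type, item_id, created_at
          FROM mentions m
         WHERE m.profile_id = 123
           AND NOT EXISTS (SELECT 1 FROM replied r
                            WHERE r.item_id = m.item_id)
           AND NOT EXISTS (SELECT 1 FROM fresh_threads f
                            WHERE f.item_id = m.item_id)
    )
    SELECT * FROM replied
    UNION ALL SELECT * FROM fresh_threads
    UNION ALL SELECT * FROM mentioned;

(UNION ALL is safe at the end because the NOT EXISTS conditions already guarantee the branches are disjoint.) The deduplication itself is tractable... the "however" is what this does to pagination.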
The current query produces a single result set of all types of updates blended together as a single conceptual table, and this makes pagination of updates work well... I can change the LIMIT and OFFSET in one place and get to any page of updates.
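For example, assuming the blended UNION is wrapped as a relation called updates (a stand-in name, not the real one), any page is one query away:

    SELECT *
      FROM updates             -- stands in for the UNION of all update types
     ORDER BY created_at DESC
     LIMIT 25 OFFSET 50;       -- page 3, at 25 updates per page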
But if I split the subqueries into separate queries then each of them needs to be paginated separately, and this means search results suddenly need complex state to be tracked: the offset per query per search... alternatively I could materialise every visit to the updates page as a full table so that it could be paginated over.
Both of those approaches are insane... so the only real option is what we have already.
So if the query is fine... what can be done?
The only thing I see is to remove duplicates in the API / front-end after the query is run. But this will mean some pages won't actually have 25 updates on them.
I also can't program (unless you count smashing bits of VBA together) but it looks like there are three sequential parts to the query - could you add a condition to the second two to ignore any records found in the preceding part(s)?