You are reading a single comment by @Emyr and its replies. Click here to read the full conversation.
  • The query optimiser within PostgeSQL will execute them in parallel before merging the results within the UNION.

    At the time each subquery runs, the result from the others is not known.

    The UNION that discards the duplicates can only do so within each type of update "someone replied", "new thread in forum", "you've been mentioned"... so multiple mentions of you in a comment are removed, etc.

    If I try and run each query in a specific order, perhaps by structuring the queries as preceding WITH queries, then I could use the output of those to feed into the next queries to remove the duplicates.... however...

    The current query produces a single result set of all types of updates blended together as a single conceptual table, and this makes pagination of updates work well... I can change the LIMIT and OFFSET in one place and get to any page of updates.

    But if I split the subqueries into separate queries then each of these need to be paginated separately, and this means search results suddenly need complex state to be tracked, to know the offset per query per search... alternatively I could materialise every visit to the updates page as a full table so that they could be paginated over.

    Both of those approaches are insane... so the only real option is what we have already.

    So if the query is fine... what can be done?

    The only thing I see is to remove duplicates in the API / front-end after the query is run. But this will mean some pages won't actually have 25 updates on them.

  • Or make it easier by presenting the three sets of items separately, rather than as a single Following page.

About

Avatar for Emyr @Emyr started