Subtle changes, bugs and feedback

You are reading a single comment by @danb and its replies. Click here to read the full conversation.

•

Velocio in reply to @bq

It is the same.

Presently determining "unread" is a row-based function.

Performance with SQL is never achieved when row-based functions are called across collections.

•

danb in reply to @Velocio

But surely you already know if it is read or not in order to display the "new" icon?

Or is that cookie based and "read" is fed hack to the dB later?

•

Velocio in reply to @danb
We have a table that looks roughly like this:
```
CREATE TABLE read AS (
profile_id bigint NOT NULL,
item_type_id bigint NOT NULL,
item_id bigint NOT NULL,
when timestamp NOT NULL
);
```
Where item_type_id contains a number that basically answers the question "Is this a conversation? Is it a PM? Is it a forum?" and so on for every "type" we support.

When you read a conversation (like this one) we will put a value in there: INSERT INTO read (50180, 6, 131364, timestamp_of_last_post_on_page).

If you read a forum, then that trumps a conversation... so we can delete from read everything that refers to an item within the forum, and then insert the row for the forum.

Then when we want to check whether something is "read", we work out what you're talking about... a conversation... and determine whether a read row exists for that, and whether a read row exists for any of it's parents (forum or site), and then take the MAX of any of those values, and compare it to the last_modified for the conversation.

All of that is presently wrapped in a decoration row-based function (used in the SELECT and not in the WHERE that roughly looks like: SELECT is_unread(50180, 6, 131364) AS unread.

This all works blindingly well. If you read something on your mobile, the desktop reflects it, when we build apps and you read something in an app, the desktop reflects it. Read it on your desktop, the app would reflect it. And so on.

It only doesn't work now that there is a clamour for a filter based on is_unread.

As is_unread is a row-based function, were you to apply it to the search (for example), then it would be applied as part of the WHERE. Applying a row-based function to the SELECT means that on a page of 25 items it's only going to be called 25 times. Applying a row-based function to the WHERE or to a subquery that will be filtered means applying it to potentially hundreds of thousands (or even millions) of items, the vast majority of which are going to be discarded.

An example scenario in which row-based functions bite people in the arse is in Postgres's full-text search: http://www.postgresql.org/docs/9.2/static/textsearch-controls.html#TEXTSEARCH-HEADLINE

They have a row-based function which is also cheap on a per row basis, but expensive across lots of rows. It fetches from the searched documents the highlighted fragment that matched the search:
```
SELECT ts_headline('english',
 'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
 to_tsquery('query & similarity'));
 ts_headline 
------------------------------------------------------------
 containing given query terms
 and return them in order of their similarity to the
 query.
```
Their warning on that page: A typical mistake is to call ts_headline for every matching document when only ten documents are to be shown. SQL subqueries can help; here is an example:

And their example:
```
SELECT id, ts_headline(body, q), rank
FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank
 FROM apod, to_tsquery('stars') q
 WHERE ti @@ q
 ORDER BY rank DESC
 LIMIT 10) AS foo;
```
Row-based queries are extremely ill-suited to filtering, and should never be used when the majority of the work will actually just be thrown away afterwards.

Obviously we have the data in the read table, but our current way of using that data is to check the "read" status of individual items through a row-based function for the purpose of painting a "New" label. The current row-based function is ill-suited to being used as a filter (we know because we've done it).

Subtle changes, bugs and feedback

About

LFGSS