-
Fractional benefits. This site was been optimised to hell about 8 years ago and hasn't needed to be revisited, and so the benefits from upgrades isn't significant.
That said, I am going to book a week off work to spend time working on the site and I want to add a lot more observability so that I can get a far better idea of slow queries and where the latency is. At the moment my hunch is that nearly all of the time is spent in the Django / Python rather than in the database or Go (API) layer. That hunch is driven by my local dev environment, where I have partially implemented the web layer in Go and the speedup is phenomenal and it's really revealing how bad the Python layer is in comparison.
The SQL layer... I have a slow query log and virtually nothing goes into it except for some very generic searches on the largest sites.
-
We are in the latest postgres 14.
And yeah the search is phenomenal.
The advantage of implementing search in the database are twofold:
- Permissions can be applied to the search at the time of search.
- No external service is needed for indexing.
We're using the search in a pretty advanced way, every single feature is being used from weighting, ranking, result snippets. It's pretty cool.
- Permissions can be applied to the search at the time of search.
-
-
-
-
-
Easy... no.
Difficult... not really.
So long as you know the profile ID that you want to search, then inside a given thread on the right hand side is a search box that allows searching within a thread.
In that box enter
authorId:59312
and hit enter.Tada... you have now filtered this thread to just your comments.
This can be done for any
authorId
in any thread whilst also combining it with any search terms.Additionally you can use that in the top level query to search for search terms by an author across the site (but remember to uncheck the "Search titles" bit).
-
-
-
You can't.
Hmm.
This is the problem with language search.
Most people hate it, because exact match never gets plurals, etc correct. But if one does language based search (which I did) then the exact match doesn't work either.
I have to upgrade the database soon, I'll see if the search offers any new features in this area.
-
-
-
-
Have you considered Elastic search?
No-one has ever shown me a good way to use Elastic Search and yet have it trim the results by permissions.
We have 3 sets of permissions that need applying to everything:
- Is this in a forum you're allowed to view?
- Is this in a private message you're allowed to view?
- Does it exclude the forums, items and profiles that you are ignoring?
I could potentially do #1 trivially, but #2 is harder to imagine how it's done (given that participants can be added and removed so easily and changes have to be guaranteed to be instant) and #3 screws me right over.
- Is this in a forum you're allowed to view?
-
That basically does not work, as the search is designed for language and not links.
But in theory I could make it work, as I already store every link in a different table. Could probably add it by adding a
link:8mQxum-DEH4
which would find all places this video were embedded: https://www.lfgss.com/comments/13061166/ -
I know.
I'm all kinds of pleased with it.
But I'm also looking at a search query that makes my head spin... not least because what you see above is one permutation of it, and it can actually build itself in hundreds of different ways and I needed to make sure I hadn't slowed down some of those other ways.
Searching things that have attachments is faster now too.
Am pleased with myself. Will go to bed smug as no-one has yet reported something has broken. If I can make it through my birthday without reports of me having broken stuff, it will be a good day.
-
-
Notes to myself beyond the above...
UPDATE search_index si SET microcosm_id = f.microcosm_id FROM flags AS f WHERE si.item_type_id = 4 AND si.microcosm_id IS NULL AND si.parent_item_type_id != 5 AND si.parent_item_type_id != 3 AND f.item_type_id = si.parent_item_type_id AND f.item_id = si.parent_item_id; CREATE INDEX ON search_index (microcosm_id ASC NULLS LAST); -- In the search query, does this even make sense? COALESCE(f.microcosm_id, f.item_id) IN (SELECT microcosm_id FROM m)
Also update
update_revisions_search_index()
to set themicrocosm_id
when a comment revision is posted. -
OK, I have another approach... I'll try this out this weekend...
WITH m AS ( SELECT m.microcosm_id FROM microcosms m LEFT JOIN permissions_cache p ON p.site_id = m.site_id AND p.item_type_id = 2 AND p.item_id = m.microcosm_id AND p.profile_id = 47686 LEFT JOIN ignores_expanded i ON i.profile_id = 47686 AND i.item_type_id = 2 AND i.item_id = m.microcosm_id WHERE m.site_id = 234 AND m.is_deleted IS NOT TRUE AND m.is_moderated IS NOT TRUE AND i.profile_id IS NULL AND m.microcosm_id = 549 AND ( (p.can_read IS NOT NULL AND p.can_read IS TRUE) OR (get_effective_permissions(234, m.microcosm_id, 2, m.microcosm_id, 47686)).can_read IS TRUE ) ), sr AS ( SELECT si.item_type_id ,si.item_id FROM search_index si ,plainto_tsquery('mks gr9') AS query WHERE si.site_id = 234 AND si.document_vector @@ query AND si.microcosm_id IN (SELECT microcosm_id FROM m) ) SELECT total ,item_type_id ,item_id ,parent_item_type_id ,parent_item_id ,last_modified ,rank ,ts_headline(document_text, query) AS highlight ,has_unread(item_type_id, item_id, 47686) FROM ( SELECT COUNT(*) OVER() AS total ,f.item_type_id ,f.item_id ,f.parent_item_type_id ,f.parent_item_id ,f.last_modified ,ts_rank_cd(si.document_vector, query, 8) AS rank ,si.document_text ,query.query FROM sr JOIN search_index si ON sr.item_type_id = si.item_type_id AND sr.item_id = si.item_id JOIN flags f ON f.item_type_id = sr.item_type_id AND f.item_id = sr.item_id LEFT JOIN ignores i ON i.profile_id = 47686 AND i.item_type_id = f.item_type_id AND i.item_id = f.item_id ,plainto_tsquery('mks gr9') AS query WHERE f.site_id = 234 AND i.profile_id IS NULL AND f.microcosm_is_deleted IS NOT TRUE AND f.microcosm_is_moderated IS NOT TRUE AND f.parent_is_deleted IS NOT TRUE AND f.parent_is_moderated IS NOT TRUE AND f.item_is_deleted IS NOT TRUE AND f.item_is_moderated IS NOT TRUE AND ( -- Things that are public by default COALESCE(f.parent_item_type_id, f.item_type_id) = 3 OR -- Things in microcosms COALESCE(f.microcosm_id, f.item_id) IN (SELECT microcosm_id FROM m) ) ORDER BY f.last_modified DESC LIMIT 25 OFFSET 0 ) r
That correctly returns 52 results, in the right forum, bearing in mind the scope of the search and any ignores that you may have in place.
Oh, and it does it in 111 milliseconds.
-
Made the major change... I'll see how it goes.
Your original query: https://www.lfgss.com/search/?q=mks%20gr9+forumId:549+sort:date
-
Ooh, that's a good idea.
It didn't mean that.
It meant that "page 1 is fast", "pages 2 and onwards are a little slower".
But... hey, I could prefetch the next search page and then when you click the link the browser either has it, or the server has already cached the query... either way that would be fast.
Cool.
So my changes this evening (when I get home) will be:
- Split the search query into 2 queries, one that does the search and one that counts the results.
- Add a prefetch meta header to the search results page.
- Entirely unrelated, add a meta header to prevent sending referrer headers off-site, which improves privacy.
- Split the search query into 2 queries, one that does the search and one that counts the results.
-
I've just confirmed that running it as one query (do the search and figure out how many results there are whilst only returning the first 25) takes 1 minute and 38 seconds.
Running it as two queries (do the search, return 25 rows and then figure out how many results there are) takes 2.5 seconds in total.
Definitely will change this.
Though... there is a risk that the search will be slower on subsequent pages. i.e. it may take towards 15-20 seconds (and risk timing out on some queries that return very large result sets) to fetch page 2 or more.
In essence I had chosen to implement a way of searching in which the cost for all pages was equal, but it's probably best to tune for making page 1 very cheap, and subsequent pages expensive.
-
https://www.lfgss.com/search/?q=Montenegro+sort:date
But can't see much there.
https://www.lfgss.com/conversations/149770/?offset=50#comment16552577
That looks plausible... but doesn't contain the terms you wanted.
Are you sure you remember how they phrased it?