Out of interest, in layman's terms, why did that happen?
There were two potential issues, one confirmed and the other a possibility.
The confirmed one was a Cartesian join on the links table for a given comment when multiple revisions of that comment existed (we keep old edits for liability reasons and to support wiki-style behaviour in future).
That meant that if a comment had been edited three times, the link would be returned three times and the item embedded thrice.
This never showed up in testing because we weren't editing the same comment over and over... we just created it and viewed the output.
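To make the first issue concrete, here is a minimal sketch using an in-memory SQLite database. The schema (`comment_revisions`, `links`) and its columns are hypothetical and much simpler than a real one: joining links to revisions on `comment_id` alone returns the link once per revision, while restricting the join to the latest revision returns it once.

```python
import sqlite3

# Hypothetical, simplified schema: each edit adds a row to comment_revisions,
# while links are stored per comment rather than per revision.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment_revisions (comment_id INTEGER, revision INTEGER, markdown TEXT);
CREATE TABLE links (comment_id INTEGER, url TEXT);
INSERT INTO comment_revisions VALUES (1, 1, 'first'), (1, 2, 'second'), (1, 3, 'third');
INSERT INTO links VALUES (1, 'http://gpsmadeeasy.com/');
""")

# Buggy: joining on comment_id alone pairs the single link with every revision.
buggy = conn.execute("""
    SELECT l.url FROM links l
    JOIN comment_revisions r ON r.comment_id = l.comment_id
    WHERE l.comment_id = 1
""").fetchall()

# Fixed: only join against the latest revision of the comment.
fixed = conn.execute("""
    SELECT l.url FROM links l
    JOIN comment_revisions r ON r.comment_id = l.comment_id
    WHERE l.comment_id = 1
      AND r.revision = (SELECT MAX(revision) FROM comment_revisions
                        WHERE comment_id = l.comment_id)
""").fetchall()

print(len(buggy), len(fixed))  # 3 1
```

With three revisions the buggy query hands back the same URL three times, so anything that embeds "each returned link" embeds it three times.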
The second factor was a race condition. We generate the HTML when a comment is first inserted into our database, but... there are some scenarios in which the HTML may be deleted, and we then don't regenerate it until the comment is next requested. For example, when the embed code for gpsmadeeasy.com changes, we wipe the HTML for all comments that link to gpsmadeeasy.com. Then, when pages containing those comments are requested, we regenerate the HTML from the original markdown and perform the embeds.
The race condition occurs when the page containing the comment is requested by multiple people at the same time, which triggers multiple processes that each find the empty HTML and decide they need to generate it and perform the embeds. Because of the way our system works, not all of that process happens within a single database transaction, so it's not simply a case of "last update wins". The timing can be such that both processes win, and a double embed occurs.
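One unlucky interleaving can be replayed deterministically. This sketch assumes, hypothetically, that regeneration is two separately committed steps: render the base HTML from markdown, then read the stored HTML back and add an embed after each link. A dict stands in for the database row, and the function names and markup are illustrative only.

```python
import re

# Hypothetical stand-in for the comment's row; None means "HTML wiped".
db = {"html": None}

BASE_HTML = '<p><a href="http://gpsmadeeasy.com/">GPS</a></p>'  # rendered markdown (illustrative)
EMBED = '<div class="embed"></div>'                              # placeholder embed markup
LINK = re.compile(r'<a\b[^>]*>.*?</a>', re.S)

def render_step():
    # Step 1 (own transaction): regenerate base HTML from the original markdown.
    db["html"] = BASE_HTML

def embed_step():
    # Step 2 (own transaction): read the *stored* HTML and add an embed after each link.
    db["html"] = LINK.sub(lambda m: m.group(0) + EMBED, db["html"])

# Two requests arrive at once; both see the wiped HTML before either writes.
a_needs = db["html"] is None
b_needs = db["html"] is None

# Replay one interleaving of the two processes' steps.
if a_needs: render_step()   # A writes the base HTML
if b_needs: render_step()   # B overwrites it with the same base HTML
if a_needs: embed_step()    # A adds one embed
if b_needs: embed_step()    # B reads A's result and adds a second embed

print(db["html"].count('class="embed"'))  # 2
```

Because step 2 works from whatever is currently stored rather than from a fresh render, neither write "loses", and the embeds stack up.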
So... the solutions:
- Remove the Cartesian join from the query.
- Add an attribute, embed="true", to the <a href=""> tag to indicate that we have already embedded that URL, and don't embed it again if the attribute is present.
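A minimal sketch of the embed="true" marker is below. It matches a simplified `<a ...>...</a>` pattern with a regular expression purely for illustration (a real implementation would more likely use a proper HTML parser), and `embed_links` and the placeholder embed markup are hypothetical names, not part of any existing codebase.

```python
import re

EMBED = '<div class="embed"></div>'              # placeholder embed markup (hypothetical)
LINK = re.compile(r'<a\b([^>]*)>(.*?)</a>', re.S)  # simplified <a ...>text</a> matcher

def embed_links(html: str) -> str:
    """Add an embed after each link, marking the link so it is never embedded twice."""
    def replace(m):
        attrs, text = m.group(1), m.group(2)
        if 'embed="true"' in attrs:
            return m.group(0)  # already embedded: leave the link untouched
        return f'<a{attrs} embed="true">{text}</a>{EMBED}'
    return LINK.sub(replace, html)

base = '<p><a href="http://gpsmadeeasy.com/">GPS</a></p>'
once = embed_links(base)
twice = embed_links(once)
print(twice.count('class="embed"'))  # 1
```

Because a second pass over already-embedded HTML changes nothing, two processes that both run the embed step over the same stored HTML can no longer produce a double embed.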