• The underlying domain name for this website is lfgss.microco.sm, and I've just discovered that the microco.sm domain name has been suspended (why?!) by nic.sm, which manages it.

    The domain name has been suspended!!!!

    https://www.nic.sm

    Domain Name: microco.sm
    Registration date: 21/12/2011
    Status: Suspended
    

    This was discovered by people reporting broken avatars and image attachments.

    I've opened a support case with Gandi (the domain registrar), but will start moving things to a spare domain.

    I'll be moving from microco.sm to microcosm.app, which I already own and which is already active on Cloudflare, so I can adjust the DNS records there and expect each change to take < 10s.

    There will 100% be breakage during this process: microco.sm is a SaaS platform, and changing its domain breaks the many websites that use it (of which LFGSS is the largest)... One doesn't just change the domain name of a SaaS service, but I shall.

    Started: 2023-05-15T21:26
    Finished: 2023-05-16T11:25
    Duration of incident: 13h 59m
    Impact: 1h 3m of total outage from 21:26 to 22:24 on 2023-05-15
    Costs: £550 (TLS certs, email campaigns updating people on some sites, domain renewal / extension, DNS provider, new domain name)
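    The incident duration is plain timestamp arithmetic. A minimal Python check of the Started and Finished values above, assuming both are in the same (local) timezone:

```python
from datetime import datetime

# Timestamps copied from the incident summary (same local timezone assumed).
started = datetime.fromisoformat("2023-05-15T21:26")
finished = datetime.fromisoformat("2023-05-16T11:25")


def format_duration(start: datetime, end: datetime) -> str:
    """Render the gap between two timestamps as 'Xh Ym'."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


print(format_duration(started, finished))  # 13h 59m
```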

    1. Auth0 has been updated to send from auth0@microcosm.app
    2. Sendgrid has been configured to send from microcosm.app
    3. Confirmed that Sendgrid can send with the new config
    4. Re-wrote every reference in microcosm app to point microco.sm to microcosm.app
    5. Purchased new TLS wildcard cert for *.microcosm.app (£250!!!)
    6. Updated all references to microco.sm within the Microcosm Go API
    7. Updated all references to microco.sm within the Python web site
    8. Updated all references to microco.sm within the https://microcosm.app site
    9. Updated all /etc/hosts references
    10. Updated the load balancer
    11. Installed the TLS cert
    12. Restarted everything
    13. Contacted the sites that have had new content this year and informed them of how to update their URLs or DNS and offered email lists should they need them for communication.
    14. Purge HTML cache to rebuild all of the content (fixes outbound links).
    15. Fix all forum logos on all sites, as they typically are assets on microco.sm
    16. Replace all microco.sm with microcosm.app in all comments
    17. Purge nginx file cache for the API
    18. Fix Sendgrid DKIM and SPF, fix the HSTS for the pervasive link tracking that I can't remove
    19. Reconfigure Cloudflare page rules for the API paths to cache all the files, but bypass cache for the API
    20. Open a support ticket with Cloudflare: microco.sm had cross-user CNAME grandfathered in, but microcosm.app does not, which means that other forums with their domains on Cloudflare cannot CNAME their custom domains to it.
    21. Update 37 applications in Auth0 to ensure that all their CORS settings correctly point to microcosm.app
    22. Extend the microcosm.app domain to 10 years (cost of £110)
    23. Add microcosm.app to DNS Made Easy, and pay 1 year (cost of £180)
    24. Change DNS Nameservers for microcosm.app from Cloudflare to DNS Made Easy, as I need to temporarily get off Cloudflare whilst the CNAME does not work
    25. Hard disable SendGrid link tracking to prevent HTTPS errors
    26. Removed all Cloudflare proxying from microcosm.app, tested nothing broke, awaiting nameserver updates to flush through.
    27. Disable automatic renewal of microco.sm in Gandi (why pay for a suspended domain!?)
    28. Now that I've moved microcosm.app off Cloudflare, network traffic is more than double; verified that we have a fast enough set of SSDs and several 4Gbps links, and should be able to keep up. But I shall prep a load-balanced second cache machine to bear the load if needed.
    29. Verify SPF and DMARC
    30. The admin panel on https://microcosm.app login was broken, re-wrote all references in the admin app.
    31. Changed nameservers to Linode, as DNS Made Easy simply does not work in Firefox 🤷. Will get the pro-rated refund and see how Linode fares. Linode Support confirmed that they're not white-labelling Cloudflare, so the cross-user CNAME issues should not arise.
    32. Err, no spare domains... purchased microcosm.ch, as one should always have a spare domain name lying around in case you need it.
    33. Permanently HTTP redirect (HTTP 301) microco.sm to microcosm.app, the old domain will exist until mid-December, which is enough time for everything to be pointed at the new URLs.
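    Step 33's permanent redirect boils down to rewriting the host while keeping the path and query string. A minimal sketch as a Python WSGI app, assuming subdomains carry over one-to-one (lfgss.microco.sm to lfgss.microcosm.app). In production this would more likely be a load balancer or nginx rule, and the function and app names here are hypothetical:

```python
from typing import Optional
from urllib.parse import quote

OLD_ZONE = "microco.sm"
NEW_ZONE = "microcosm.app"


def redirect_target(host: str, path_qs: str) -> Optional[str]:
    """Map a request on the old zone to the same path on the new zone.

    Assumes subdomains carry over one-to-one; returns None for any host
    outside the old zone.
    """
    host = host.split(":", 1)[0].lower().rstrip(".")  # drop port / trailing dot
    if host == OLD_ZONE:
        new_host = NEW_ZONE
    elif host.endswith("." + OLD_ZONE):
        new_host = host[: -len(OLD_ZONE)] + NEW_ZONE
    else:
        return None
    return f"https://{new_host}{path_qs or '/'}"


def app(environ, start_response):
    """Minimal WSGI app that 301-redirects old-zone requests to the new zone."""
    # PEP 3333 passes PATH_INFO as latin-1-decoded text; re-encode it to the
    # original bytes and percent-encode them for the Location header.
    path_qs = quote((environ.get("PATH_INFO") or "/").encode("latin-1"))
    if environ.get("QUERY_STRING"):
        path_qs += "?" + environ["QUERY_STRING"]
    target = redirect_target(environ.get("HTTP_HOST", ""), path_qs)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown host\n"]
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

    To try it locally, the standard library's wsgiref.simple_server.make_server("", 8080, app).serve_forever() will serve it. A 301 (rather than a 302) tells browsers and crawlers the move is permanent, so they can cache it and update stored links.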

    Note: microcosm.app now has DNS hosted by Akamai Linode rather than Cloudflare. As such we'll have higher outbound traffic fees in the future. The reason for this is I worked at Cloudflare and microco.sm was on a free (staff) Enterprise plan, but microcosm.app has no such favour given to it, and so it does not allow the same configuration to be achieved without a high price tag.

    No-one has "the domain name of your SaaS provider is suspended" in their disaster recovery playbook, which goes to show that the art of the game is being able to handle the unexpected. Top tip for those doing disaster recovery plans... just model the following scenarios: compute failure, DNS failure (including the domain name), network failure, storage failure. If you have those modelled, you can compose the response to cover any scenario.
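    One way to make the "compose the response" idea concrete: treat each primitive failure as a key into a runbook and build a scenario's plan from the union of its parts. A minimal Python sketch; the runbook entries are hypothetical placeholders, except the DNS ones, which mirror what was done in this incident:

```python
from enum import Enum


class Failure(Enum):
    """The four primitive failure classes worth modelling."""
    COMPUTE = "compute"
    DNS = "dns"  # includes losing the domain name itself
    NETWORK = "network"
    STORAGE = "storage"


# Hypothetical runbook; only the DNS entries reflect this incident.
RUNBOOK = {
    Failure.COMPUTE: ["fail over to standby hosts", "redeploy from images"],
    Failure.DNS: [
        "switch to the spare domain",
        "move DNS to a secondary provider",
        "reissue TLS certs for the new names",
    ],
    Failure.NETWORK: ["route via secondary links", "shed non-essential traffic"],
    Failure.STORAGE: ["promote a replica", "restore from the latest backup"],
}


def compose_response(scenario: set) -> list:
    """Build a plan for a scenario from the runbooks of its primitive failures.

    Failures are visited in declaration order so the plan is stable, and a
    step shared by two failures is only listed once.
    """
    steps = []
    for failure in Failure:
        if failure in scenario:
            for step in RUNBOOK[failure]:
                if step not in steps:
                    steps.append(step)
    return steps


# "The SaaS provider's domain name is suspended" is just a DNS failure.
print(compose_response({Failure.DNS}))
```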

    Attachment shows impact to web traffic... it's about 1h3m of total outage, but mostly things have recovered. There is still a bumpy ride expected as DNS records flush through.


    1 Attachment

    • Screenshot 2023-05-16 111859.png
  • Cheers for the heads up and your tireless efforts as ever, Boss

  • I don’t know what those words mean but thank you for making it all work again. Was wondering if the missing pics were just me or everyone was getting them.

  • <hold onto your butts gif>

    Good luck dude and much appreciated!

  • I don’t know what those words mean but thank you for making it all work again

    Seconded!

  • I am at step 4 of this:

    1. Auth0 has been updated to send from auth0@microcosm.app
    2. Sendgrid has been configured to send from microcosm.app
    3. Confirmed that Sendgrid can send with the new config
    4. Re-wrote every reference in microcosm app to point microco.sm to microcosm.app

    And now I need to deploy the updated app... at this point, things will 100% guaranteed break.
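    Step 4 (and later step 16, for comments) is essentially a hostname search-and-replace. A sketch in Python, assuming the references are plain text; rewrite_domain is a hypothetical helper, and the regex only matches microco.sm as a complete hostname suffix. Because microcosm.app does not contain microco.sm, re-running it over already-updated text is harmless:

```python
import re

# Match the old zone only as a whole hostname suffix: not preceded by a
# letter, digit or hyphen (so "notmicroco.sm" is left alone) and not
# followed by one (so "microco.smart" is left alone).
OLD_HOST = re.compile(r"(?<![A-Za-z0-9-])microco\.sm(?![A-Za-z0-9-])")


def rewrite_domain(text: str, new_zone: str = "microcosm.app") -> str:
    """Replace every microco.sm hostname reference with the new zone."""
    # A function replacement avoids re.sub treating backslashes specially.
    return OLD_HOST.sub(lambda _match: new_zone, text)


sample = "Avatar: https://lfgss.microco.sm/avatar.png, see microco.sm."
print(rewrite_domain(sample))
# Avatar: https://lfgss.microcosm.app/avatar.png, see microcosm.app.
```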

  • Live surgery about to begin.

  • I think Brixtons forum is down as well, I can put something on our WhatsApp group. We don’t actually use the forum much

  • Just completed these steps:

    1. Purchased new TLS wildcard cert for *.microcosm.app (£250!!!)
    2. Updated all references to microco.sm within the Microcosm Go API
    3. Updated all references to microco.sm within the Python web site
    4. Updated all references to microco.sm within the https://microcosm.app site
    5. Updated all /etc/hosts references
    6. Updated the load balancer
    7. Installed the TLS cert
    8. Restarted everything
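    One detail worth checking with a wildcard cert: *.microcosm.app covers exactly one extra label, so lfgss.microcosm.app is covered, but the bare microcosm.app (the https://microcosm.app site) has to be listed on the certificate as a separate name. A small Python sketch of the matching rule most TLS clients apply (wildcard_matches is a hypothetical helper, not a library API):

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name like '*.microcosm.app' covers hostname.

    '*' is only honoured as the entire left-most label and stands for exactly
    one label, so it never covers the bare apex or deeper subdomains.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname  # non-wildcard names must match exactly
    suffix = pattern[1:]  # e.g. ".microcosm.app"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return label != "" and "." not in label


for name in ["lfgss.microcosm.app", "microcosm.app", "a.b.microcosm.app"]:
    print(name, wildcard_matches("*.microcosm.app", name))
# lfgss.microcosm.app True
# microcosm.app False
# a.b.microcosm.app False
```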
  • Good job

  • Hah, thanks... there was a non-zero chance that this site did not come back.

  • Fast work. Now I'm curious why microco.sm was suspended.

  • Hopefully the support ticket with Gandi will reveal this.

    But it's done... this was a very expensive domain name and I intended to move off of it, so whatever the reason I'll not be renewing in 6 months' time.

    They could've just changed the domain rules for .sm or something silly... could be anything really.

  • It's a shame, I always thought microco.sm was very elegant, but obviously there are more practical considerations.

  • A San Marino domain name, and a renewal process that required sending a fax in Italian was always a pain in the ass to be honest.

    This is why I committed to move immediately... the chances of reaching the right person in the office within San Marino who can make sense of what happened and undo it... are low.

  • I'm done and am going to bed...

    3.5 hours to move the domain name of a SaaS platform and perform live surgery on a production system is pretty freaking stressful.

    I think everything works.

  • Now I'm curious why microco.sm was suspended.

    A San Marino domain name, and a renewal process that required sending a fax in Italian

    prolly brexit innit

  • You're a star

  • Do you need to run a DB migration or something to unfurl all shortened URLs?

  • FYI outbound links in posts still refer to microco.sm

  • Great success, thanks!

    Re-wrote every reference in microcosm app to point microco.sm to microcosm.app

    Hope you changed them all to an env var ;)
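    For what it's worth, the env-var approach could look like this in Python. MICROCOSM_DOMAIN and both helper functions are hypothetical names, and the actual Microcosm code may well do this differently:

```python
import os


def base_domain() -> str:
    """Read the SaaS zone from the environment instead of hard-coding it.

    MICROCOSM_DOMAIN is a hypothetical variable; microcosm.app is the default.
    """
    return os.environ.get("MICROCOSM_DOMAIN", "microcosm.app").strip().rstrip(".")


def site_url(subdomain: str, path: str = "/") -> str:
    """Build the public URL for a forum, e.g. site_url('lfgss')."""
    return f"https://{subdomain}.{base_domain()}{path}"


# Hypothetical failover to the spare domain: one setting, no code changes.
os.environ["MICROCOSM_DOMAIN"] = "microcosm.ch"
print(site_url("lfgss", "/conversations/"))  # https://lfgss.microcosm.ch/conversations/
```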

  • Thanks boss.
    Appreciate all your hard work.

  • I thought of this after I went to bed. I need to purge the caches, but I'll do it in a little bit.

  • That was impressive work, fixing the whole site in so little time.

  • Thanks for the emergency surgery!

About

Emergency maintenance: microco.sm domain suspended and moving to microcosm.app

Posted by Velocio (@Velocio)
