I woke up this morning to a text from my ISP: “There is an outage in your area, we are working to resolve the issue.”

I laugh. This is what I live for! Almost all of my services are self-hosted, so I’m barely going to notice the difference!

Wrong.

When the internet went out, the power also went out for a few seconds. Four small computers host all of my services. Of those, one shut down and three rebooted. On the three that rebooted uncleanly, some services came back online and some didn’t.

30 minutes later, the ISP sends out a text that service is back online.

Two hours later, I’m still finding services that are down on my network.

Moral of the story: A UPS has moved to the top of the shopping list! Any suggestions??

  • Padook (OP)
    16 · 10 months ago

    I didn’t mean to imply that services actually broke, only that they didn’t come back after a reboot. A clean reboot may have caused some of the same issues, because I’m learning as I go. Some services are restarted by systemctl, some by cron, some… manually. This is certainly a wake-up call that I need to standardize and simplify the way the services are started.
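One way to start standardizing is to check which of the services you care about are actually enabled to start at boot under systemd. The following is only a rough sketch: the unit names in `EXPECTED` are hypothetical placeholders, and it assumes a host where `systemctl list-unit-files` is available. Units started by cron or by hand will simply show up as missing, which is the point of the audit.

```python
import subprocess
from typing import Dict, List

# Hypothetical unit names, replace with the services you actually run.
EXPECTED = ["jellyfin.service", "nextcloud.service", "backup.service"]


def parse_unit_files(output: str) -> Dict[str, str]:
    """Parse `systemctl list-unit-files --no-legend` output into {unit: state}."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        # Each row starts with the unit file name followed by its state.
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def not_enabled(expected: List[str], states: Dict[str, str]) -> List[str]:
    """Return expected units that are missing or in any state other than 'enabled'."""
    return [unit for unit in expected if states.get(unit) != "enabled"]


def report(expected: List[str]) -> List[str]:
    """Query systemd on this host and return the units that will not start at boot."""
    result = subprocess.run(
        ["systemctl", "list-unit-files", "--type=service", "--no-legend", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return not_enabled(expected, parse_unit_files(result.stdout))
```

Calling `report(EXPECTED)` on each box gives a list of services to move into proper unit files. Note that a unit in the `static` state is also reported here even though another unit may pull it in, so treat the result as a list of things to look at rather than a verdict.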

    • @[email protected]
      17 · 10 months ago

      We’ve all committed that sin before. It’s better to rely on it surviving the reboot than to try to prevent the reboot.

      Also worth looking into some form of uptime monitoring software. When something goes down, you want to know about it asap.

      And documenting your setup never hurts :D

      • @nimmoA
        5 · 10 months ago

        On the uptime monitoring: I’ve been quite happy with Uptime Kuma, but… if you put it on the same host that’s down… well, that’s not going to work :p (I nearly made that mistake)

        • @[email protected]
          4 · 10 months ago

          It’s not the most detailed thing, but I just use a free account on cron-job.org to send a HEAD request every two minutes to a few services that are reachable from the internet (either just their homepage or some ping endpoint in the API), and then use the status page functionality to have a simple second status page on a third-party server.

          You can do a bit more on their paid tier, but so far I haven’t needed that.

          On the other hand, you could check whether a free-tier or cheap small VPS on one of the many cloud providers is sufficient for an Uptime Kuma installation. Just don’t use the same cloud provider that all of your other services run on.
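For anyone who would rather run the HEAD check themselves from a separate machine, here is a minimal sketch of the same idea using only the Python standard library. The function names and the URLs in the usage note are made up for illustration, and treating any 2xx or 3xx response as "up" is an assumption, not something cron-job.org prescribes.

```python
import urllib.error
import urllib.request
from typing import Dict, List


def is_healthy_status(code: int) -> bool:
    """Treat any 2xx or 3xx status as 'up' (an assumption, adjust to taste)."""
    return 200 <= code < 400


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Send a single HEAD request and report whether the service answered."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_healthy_status(response.status)
    except (urllib.error.URLError, OSError):
        # Covers HTTP 4xx/5xx responses, refused connections, DNS failures and timeouts.
        return False


def check_all(urls: List[str], timeout: float = 5.0) -> Dict[str, bool]:
    """Check every URL once and return {url: is_up}."""
    return {url: is_up(url, timeout) for url in urls}
```

Something like `check_all(["https://cloud.example.org/", "https://media.example.org/ping"])` could then be run every two minutes from cron on a box outside your own network, which keeps the monitor from going down together with the thing it monitors.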

          • @nimmoA
            2 · 10 months ago

            Oh, I’m fine with my setup. I have a couple of external servers that can monitor all my web-accessible stuff with Kuma, and then I’ve got another local one to monitor my non-web-accessible stuff.

            Thanks for those tips though, definitely useful to consider other options

        • @[email protected]
          1 · 10 months ago

          Same, Uptime Kuma is fantastic. I put it on my most critical server: if Kuma is down, everything is down :D

    • @[email protected]
      2 · 10 months ago

      I reboot every box monthly to flush out such issues. It’s not perfect, since it won’t catch things like circular dependencies or clusters failing to start if every member is down, but it gets lots of stuff.
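Since a routine reboot will not reveal circular dependencies, one option is to write down which service needs which and check that graph directly. The sketch below is a plain depth-first search over a hand-written dictionary. The service names are hypothetical, and this is not how systemd itself detects or reports ordering cycles.

```python
from typing import Dict, List, Optional


def find_cycle(dependencies: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in dependencies.get(node, []):
            if state.get(dep, UNSEEN) == IN_PROGRESS:
                # dep is already on the current path, so we have looped back to it.
                return path[path.index(dep):] + [dep]
            if state.get(dep, UNSEEN) == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in dependencies:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical example: each service lists the services it needs to start first.
services = {"app": ["db"], "db": ["auth"], "auth": ["app"]}
```

With the example above, `find_cycle(services)` returns `['app', 'db', 'auth', 'app']`, while a graph such as `{"app": ["db"], "db": []}` returns `None`.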