Work on Home Server
So today I was working on my home server, and the monitoring showed me, more or less by accident, that some services weren’t running. My home Kubernetes server is mostly for playing around, but it also hosts a few public things, so I have Nagios send me a message once every 24 hours. That way it doesn’t disturb me too much, but I still find out when something needs fixing.
When I looked at the server, I found that some pods had crashed. I poked around a bit, but I had no idea why, and restarting them didn’t help. So I chatted with ChatGPT for a while, and we concluded that the pods were being killed by the OOM killer. Simply put, there wasn’t enough memory on the server, even though it has 20 GB of RAM. Then I realized that the memory problems had started after I began working with Istio, so I turned it off along with all the injectors, which freed up a lot of memory. The pods still wouldn’t start after a restart, and when I looked more closely, I saw that the OOM killer had also killed the NFS server where the pods’ files are stored. After restarting it, I managed to restore everything.
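For next time, here is a minimal sketch of how the kernel log could be searched for OOM kills instead of guessing. It assumes the kernel writes lines of the form "Out of memory: Killed process PID (name)" (or the "Memory cgroup out of memory" variant for container limits), which is what recent Linux kernels print; older kernels word it differently, so the pattern may need adjusting. The function name `find_oom_kills` and the sample lines are made up for illustration.

```python
import re

# Assumed message format from recent Linux kernels; adjust for older ones.
OOM_PATTERN = re.compile(
    r"(Memory cgroup out of memory|Out of memory): "
    r"Killed process (\d+) \(([^)]+)\)"
)


def find_oom_kills(lines):
    """Return (pid, process_name) for every OOM kill found in the log lines."""
    kills = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(2)), match.group(3)))
    return kills


# Hypothetical sample lines, not taken from the real server.
sample_log = [
    "kernel: Out of memory: Killed process 4321 (nfsd) total-vm:1024kB",
    "kernel: eth0: link becomes ready",
    "kernel: Memory cgroup out of memory: Killed process 987 (java) total-vm:2048kB",
]

for pid, name in find_oom_kills(sample_log):
    print(f"OOM killer hit {name} (pid {pid})")
```

On a systemd machine the real input could come from the output of `journalctl -k`, fed to the function line by line.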
When I think about it, I’ll need to monitor the server’s memory in Nagios as well, and ideally send the system logs to Loki so I can search them for “Out of Memory” and possibly send alerts to email or Slack.
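The Nagios part could look roughly like the sketch below. It is not something I have deployed, just an outline: it reads `MemTotal` and `MemAvailable` from `/proc/meminfo` and returns the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The thresholds of 20% and 10% available memory are placeholders I picked for illustration, and the function names are my own.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def check_memory(meminfo_text, warn=20.0, crit=10.0):
    """Return (exit_code, message) in the style of a Nagios plugin.

    warn and crit are percentages of *available* memory; these defaults
    are assumed placeholders, not tuned values.
    """
    values = parse_meminfo(meminfo_text)
    try:
        pct = 100.0 * values["MemAvailable"] / values["MemTotal"]
    except (KeyError, ZeroDivisionError):
        return 3, "MEMORY UNKNOWN - could not read MemTotal/MemAvailable"

    if pct < crit:
        code, label = 2, "CRITICAL"
    elif pct < warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    # Text after the pipe is Nagios performance data.
    return code, f"MEMORY {label} - {pct:.1f}% available | mem_available_pct={pct:.1f}%"


def main():
    """Read the real /proc/meminfo, print the status line, return the exit code."""
    with open("/proc/meminfo") as handle:
        code, message = check_memory(handle.read())
    print(message)
    return code
```

Saved as a script, it would end with `import sys` and `sys.exit(main())` so that Nagios sees the exit code. The Loki side (matching “Out of Memory” in the system logs and alerting to email or Slack) is a separate piece of configuration that I still have to figure out.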
Of course, I don’t have much time for this now (it’s a hobby project), plus yesterday I spent several hours dealing with the nest of sockets and cables under the computer, which still isn’t completely sorted out. I mounted one socket to the desk with screws. Then I thought a bit about which extension cords are plugged into which, but that leads to a longer-term problem, which is rewiring the electricity in the entire apartment.
Sometimes I wish time could be stretched. On top of that, I have a project running at work, a kind of fix in Azure. It has to be finished within a month, otherwise we’ll have a really big problem.
Hopefully next time I’ll write more positive news.