A few tangential things I needed in the course of building the cluster.
## Resizing filesystems and volumes
Originally, kubernasty nodes were created in progfiguration with the volume group taking up 100% of the available space on the block device. With the need for minio or ceph to have a raw block device of its own, we needed to change this.
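For reference, a minimal sketch of what the 50% layout might look like on a fresh node. The device path /dev/sda is a placeholder, ext4 is assumed from the resize2fs/e2fsck usage below, and the actual progfiguration role may differ in the details:

```sh
# Hypothetical illustration of the 50% layout; /dev/sda is a placeholder
pvcreate /dev/sda
vgcreate psyopsos_datadiskvg /dev/sda
# Allocate only half the VG, leaving the rest free for a raw block device
lvcreate -l 50%VG -n datadisklv psyopsos_datadiskvg
mkfs.ext4 /dev/psyopsos_datadiskvg/datadisklv
mount /dev/psyopsos_datadiskvg/datadisklv /psyopsos-data
```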
- My disks are about 238GB each.
- The new datadisk role in progfiguration will use just 50% of the disk on new k3s nodes, but it won’t go back and fix existing nodes; I need to do that myself.
- Going to just eyeball 50%; the exact amount doesn’t matter.
- Stop k3s and unmount /psyopsos-data; see Cluster destruction.
- You may need to use `lsof +D /psyopsos-data` to find all relevant processes.
Then:

```sh
umount /psyopsos-data

# resize2fs requires an e2fsck before each run
e2fsck -f /dev/mapper/psyopsos_datadiskvg-datadisklv

# Shrink the filesystem to BELOW what we think the final volume size will be
resize2fs /dev/mapper/psyopsos_datadiskvg-datadisklv 100G

# Shrink the underlying logical volume
lvresize --size 120g /dev/psyopsos_datadiskvg/datadisklv

# resize2fs requires an e2fsck before each run
e2fsck -f /dev/mapper/psyopsos_datadiskvg-datadisklv

# Without a size argument, resize2fs grows the filesystem to fill the (newly shrunk) volume
resize2fs /dev/mapper/psyopsos_datadiskvg-datadisklv
```
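Afterwards, it's worth sanity-checking the result before putting the node back into service. A quick sketch, using the same names as above; the remount assumes an fstab entry, and the k3s restart assumes an OpenRC init:

```sh
# Confirm the LV is 120G and the VG now has free extents
lvs psyopsos_datadiskvg
vgs psyopsos_datadiskvg

# Remount (assumes an fstab entry; otherwise mount the LV path explicitly)
mount /psyopsos-data
df -h /psyopsos-data

# Restart k3s (OpenRC syntax; adjust for your init system)
rc-service k3s start
```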
## Replacing a Kubernetes node
For proactive replacements, where the node is still online:
- Drain it, like `kubectl drain <node name> --delete-local-data --force --ignore-daemonsets` (see the sketch after this list)
- Delete it, like `kubectl delete node <node name>`
- You may wish to also follow the instructions in Cluster destruction to kill various processes that hang around, but I don’t believe this is necessary unless you want to re-add the node without rebooting.
- Shut down the node
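A minimal sketch of the whole proactive flow, using a hypothetical node name kubernasty3. Note that newer kubectl versions (1.20+) renamed --delete-local-data to --delete-emptydir-data:

```sh
# Evict workloads; --force covers pods without a controller, and
# --ignore-daemonsets skips DaemonSet pods that would be recreated anyway.
# On kubectl >= 1.20, use --delete-emptydir-data instead of --delete-local-data.
kubectl drain kubernasty3 --delete-emptydir-data --force --ignore-daemonsets

# Remove the node object from the cluster
kubectl delete node kubernasty3

# Verify it's gone before shutting down the hardware
kubectl get nodes
```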
For a reactive replacement, where the node hardware fails, you just have to delete it with `kubectl delete node <node name>`.
Then add the new node:
- Boot the node
- Follow the instructions for adding a secondary node in Cluster creation.
- Longhorn should automatically replicate data to the new node, DaemonSets should deploy automatically, etc.
- TODO: Can Longhorn automatically remove old nodes? The old node object sticks around as failed, even if you create a new node with the same name as the old one. Actually, it looks like it cleans them up eventually, but it seems to take hours/days.
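For that TODO, you can at least inspect and clean up stale node objects by hand. A sketch, assuming Longhorn's default longhorn-system namespace:

```sh
# Longhorn tracks nodes with its own CRD; list them and their readiness
kubectl -n longhorn-system get nodes.longhorn.io

# If an old node lingers as failed, deleting its Longhorn node object
# tells Longhorn to forget it (make sure its replicas have already been
# rebuilt elsewhere first)
kubectl -n longhorn-system delete nodes.longhorn.io <old node name>
```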