Ceph BlueFS spillover and block.db resizing
I recently stumbled upon the following warning on my production Ceph cluster for an OSD on one of our machines:
root@ceph-8:~# ceph health detail
HEALTH_WARN 1 OSD(s) experiencing BlueFS spillover; 1 stray daemon(s) not managed by cephadm
[WRN] BLUEFS_SPILLOVER: 1 OSD(s) experiencing BlueFS spillover
osd.211 spilled over 282 MiB metadata from 'db' device (7.0 GiB used of 8.7 GiB) to slow device
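The numbers in that warning come from the BlueFS performance counters, which can be dumped for the OSD to keep an eye on things (recent releases expose perf dump through ceph tell; otherwise run ceph daemon osd.211 perf dump from inside cephadm shell --name osd.211). jq is only there for readability:
ceph tell osd.211 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'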
I didn’t find much info about it: most of what I did find was from people playing with limits around the block.db size, whereas I follow the sizing recommendations and provision it at a bit more than 4% of the data device. That should be more than enough since we’re only using RBD for now, so I was a bit surprised to get this warning.
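As a rough sanity check on what that means (purely illustrative numbers, not my actual layout), 4% of a 10 TiB data device comes out at about 409 GiB of block.db:
echo $(( 10 * 1024 * 4 / 100 ))   # GiB of block.db for a 10 TiB data device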
Upon a closer look, I found that the block.db volumes were severely undersized (0.4% instead of 4%), thanks to a typo in the variable used by the Ansible playbook that created the logical volumes.
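Spotting the undersized volumes is straightforward with lvs; the VG name below is from my setup (the same one as in the lvextend further down) and will obviously differ elsewhere:
lvs -o lv_name,lv_size,vg_name vg-ceph-db-0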
Since it was just a typo and the SSD drives were large enough to hold the 4%, I could simply resize the block.db logical volumes. I wasn’t sure I could do that without reprovisioning the host entirely, but I tried it anyway, seeing as the other option was recreating the host from scratch.
So I shut down the OSD, used ceph-volume lvm list to find the corresponding block.db logical volume, then extended it:
lvextend vg-ceph-db-0/ceph-db-2 -L +75528M
systemctl restart ceph-$cluster_id@osd.211.service
The restart reported a failure, but the daemon actually booted normally. To get the warning to go away, I just needed to “compact” the OSD: ceph tell osd.211 compact.
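To confirm a compaction actually did the trick, the same BlueFS counters as above are handy: slow_used_bytes should drop back to zero, and the health warning clears shortly after:
ceph tell osd.211 perf dump | jq .bluefs.slow_used_bytes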
As I got to more OSDs, I ran into a lot of occurrences where compacting the OSD wasn’t sufficient to fix the spillover. I finally found this message on the ceph-users mailing list where the solution, using ceph-bluestore-tool, is explained. I had tried that tool before, but it wouldn’t work outside of the cephadm shell, I suppose because of a version mismatch. Since the lvextend was already done for every OSD, I wrote the following script to fix every OSD still spilling over:
#!/usr/bin/env bash
cid="$cluster_id"   # cluster fsid, also part of the systemd unit names
osds="209 212 214 216 218 219 221 222 223 224 226 227"   # OSDs on this host still spilling
for osd in $osds
do
    # Stop the OSD so ceph-bluestore-tool can safely work on its store
    systemctl stop ceph-$cid@osd.$osd.service
    sleep 2
    # Migrate the BlueFS data that spilled onto the slow device back to block.db
    cephadm shell --fsid $cid --name osd.${osd} -- ceph-bluestore-tool \
        bluefs-bdev-migrate \
        --path /var/lib/ceph/osd/ceph-${osd} \
        --devs-source /var/lib/ceph/osd/ceph-${osd}/block \
        --dev-target /var/lib/ceph/osd/ceph-${osd}/block.db
    sleep 2
    systemctl start ceph-$cid@osd.$osd.service
done
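One thing worth adding when running this on a busier cluster is a guard at the top of the loop: ceph osd ok-to-stop should exit non-zero when taking the OSD down would leave PGs unavailable, so something like this (not part of the script I actually ran) lets the loop skip the risky ones:
ceph osd ok-to-stop $osd || { echo "osd.$osd not safe to stop right now, skipping"; continue; }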
After executing that on the two impacted hosts (with the correct OSD IDs for each), all the warnings were gone!