Store kernel dumps using kdump on remote machines

If the Linux kernel crashes, a mechanism called kdump can create a memory image (also called a vmcore). This is especially useful if you plan to engage support to find the cause of the issue. By default, the vmcore is stored under /var/crash. However, if the kernel can no longer access the storage (e.g. because of faulty storage or HBA drivers), the memory image cannot be written locally. In this case it helps to store the dump on another host in the network. Fortunately, kdump is able to copy this information to other systems using SSH and SCP.

Recently I had this rare use case and created a virtual machine for storing kernel dumps.

In my case the kernel dumps should be stored on dedicated LVM storage. To address this, I created a dedicated LVM volume group and logical volume. To mount this storage persistently, an entry is added to the file /etc/fstab. Afterwards, missing SELinux contexts are restored:

# pvcreate /dev/sdb
# vgcreate vg_data /dev/sdb
# lvcreate --extents 100%FREE vg_data --name lv_crash
# mkfs.ext4 /dev/mapper/vg_data-lv_crash
# vi /etc/fstab
...
/dev/mapper/vg_data-lv_crash    /var/crash      ext4    defaults        1 2

ESC ZZ

# mount -a
# restorecon -v /var/crash

It is recommended to have a dedicated user for the SSH communication between the client systems and the kernel dump host:

# useradd --comment "Kernel Dump-User" kdump
# passwd kdump

If password aging is enabled, it is common to disable it for the user created previously, so that an expired password does not block dump transfers:

# chage -M 99999 kdump

To grant the new user write access to the directory /var/crash, it is a good idea to enable Access Control Lists (ACLs) and create an appropriate rule:

# tune2fs -o acl /dev/mapper/vg_data-lv_crash
# mount -o remount,acl /var/crash
# setfacl -R -m u:kdump:rwx /var/crash

On the client systems, the kdump configuration is altered. The path line is commented out and a net line pointing to the kdump user on the kernel dump host is inserted:

# chkconfig kdump on
# cp /etc/kdump.conf /etc/kdump.conf.initial
# vi /etc/kdump.conf
...
#path /var/crash
net kdump@mymachine.localdomain.loc

ESC ZZ
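When several systems need the same change, the manual vi edit can be scripted. The following is only a minimal sketch, assuming the configuration contains at most one active path line; the function name set_kdump_net_target is hypothetical and not part of kdump:

```shell
# Comment out an active "path" line and append a "net" target.
# Arguments: configuration file, SSH target (user@host).
# set_kdump_net_target is a hypothetical helper, not a kdump command.
set_kdump_net_target() {
    conf=$1
    target=$2
    tmp=$(mktemp) || return 1
    # Prefix an uncommented "path ..." line with '#'; leave all other lines as-is.
    sed 's/^path[[:space:]]/#&/' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the net line only if no net line exists yet.
    grep -q '^net[[:space:]]' "$tmp" || printf 'net %s\n' "$target" >> "$tmp"
    # Write back through the original file to keep its owner, mode and SELinux context.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (run as root on each client, after creating the backup copy):
# set_kdump_net_target /etc/kdump.conf kdump@mymachine.localdomain.loc
```

Running the function a second time leaves the file unchanged, so it can be used safely in a loop over many hosts.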

Afterwards, the SSH key of each system is propagated to the kernel dump host; the password of the kdump user has to be entered once per system. In case of a kernel dump, this user is used to establish a connection to the kernel dump server and copy the files:

# service kdump propagate
Using existing keys...
The authenticity of host 'mymachine.localdomain.loc (xxx.xxx.xxx.xxx)' can't be established.
RSA key fingerprint is xxx.
Are you sure you want to continue connecting (yes/no)? yes
kdump@mymachine.localdomain.loc's password:
/root/.ssh/kdump_id_rsa has been added to ~kdump/.ssh/authorized_keys on mymachine.localdomain.loc

Finally, it is recommended to restart the kdump service and check whether the configuration is valid:

# service kdump restart
# service kdump status
Kdump is operational

Any other output might indicate an error. Note that this test does not check whether the configured user can actually establish an SSH connection to the kernel dump server. This functionality therefore needs to be checked and monitored using another mechanism, e.g. a monitoring system's checks.
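Such a check could look roughly like the sketch below. It assumes the key file /root/.ssh/kdump_id_rsa created by the propagate step and uses the exit codes 0 (OK) and 2 (CRITICAL) that Nagios-style monitoring systems expect; the function name check_kdump_ssh and the SSH variable (which only allows substituting the ssh binary) are hypothetical:

```shell
# Verify that the kdump key can log in and that the dump directory is writable.
# Arguments: private key file, SSH target (user@host), dump directory.
# check_kdump_ssh is a hypothetical helper, not part of kdump.
check_kdump_ssh() {
    key=$1
    target=$2
    dir=$3
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "${SSH:-ssh}" -i "$key" -o BatchMode=yes -o ConnectTimeout=10 \
        "$target" "test -w '$dir'"; then
        echo "OK: $target accepts the key and $dir is writable"
        return 0
    fi
    echo "CRITICAL: cannot log in to $target or $dir is not writable"
    return 2
}

# Example (run as root on a client system):
# check_kdump_ssh /root/.ssh/kdump_id_rsa kdump@mymachine.localdomain.loc /var/crash
```

Because the check only runs test -w on the remote side, it does not write anything to the kernel dump host.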

It is also possible that the following output is displayed:

Warning: There might not be enough space to save a vmcore.
         The size of kdump@mymachine.localdomain.loc:/var/crash/tmp.XSGZ0jMgsd should be greater than 132159368 kilo bytes.

In this case the file system on the kernel dump host is too small to store a full memory image. In the worst case a memory image is as big as the memory (in this case 128 GB).
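To put the number from the warning into perspective, it can be converted and compared with the free space on the kernel dump host. This is a small sketch, assuming the warning counts kilobytes of 1024 bytes; the helper names kib_to_gib and has_free_kib are hypothetical:

```shell
# Convert a size in kilobytes (as printed in the kdump warning) to whole GiB.
kib_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Succeed if the file system holding directory $1 has at least $2 kilobytes available.
has_free_kib() {
    # df -P prints one line per file system; column 4 is the available space.
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

kib_to_gib 132159368    # prints 126

# Example (run on the kernel dump host):
# has_free_kib /var/crash 132159368 || echo "/var/crash is too small for a full vmcore"
```

The requested 126 GiB is roughly the usable memory of the 128 GB system, which matches the worst case described above.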
