mdmon: fix, close spare activation race

The following test fails when the md_check_recovery() event triggered by
the ro->rw transition causes remove_and_add_spares() to run while mdmon
is attempting spare activation.

The result is that the kernel races to set the slot immediately after
sysfs_add_disk() writes new_dev.  mdmon then thinks the spare activation
failed and declines to send the monitor a new active_array.  The array
still shows as degraded after the wait because the monitor cannot notify
the metadata that all disks are in_sync.

#!/bin/bash
i=0
false
while [ $? == 1 ]
do
	i=$((i+1))
	mdadm -Ss
	mdadm -CR /dev/md0 /dev/loop[0-2] -n 3 -e imsm
	mdadm -CR /dev/md1 /dev/loop[01] missing -n 3 -l 5
	mdadm --wait /dev/md1
	mdadm -E /dev/loop2 | grep -i degraded
done
echo "failed: $i"

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Dan Williams 2011-08-25 19:14:29 -07:00 committed by NeilBrown
parent b276dd33c7
commit 1d446d52a7
1 changed file with 4 additions and 1 deletion

@@ -498,7 +498,10 @@ static void manage_member(struct mdstat_ent *mdstat,
		newa = duplicate_aa(a);
		if (!newa)
			goto out;
		/* Cool, we can add a device or several. */
		/* prevent the kernel from activating the disk(s) before we
		 * finish adding them
		 */
		sysfs_set_str(&a->info, NULL, "sync_action", "frozen");
		/* Add device to array and set offset/size/slot.
		 * and open files for each newdev */
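
The ordering this hunk enforces (freeze recovery first, then add the disks) can be sketched outside of mdmon roughly as follows. This is a minimal illustration under stated assumptions, not mdmon's code: write_attr() and freeze_sync_action() are hypothetical helpers standing in for sysfs_set_str(), and the caller supplies the path to the array's md sysfs directory.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not mdadm's sysfs_set_str()): write the string
 * val to the existing attribute file dir/name.
 * Returns 0 on success, -1 on any error.
 */
static int write_attr(const char *dir, const char *name, const char *val)
{
	char path[512];
	size_t len = strlen(val);
	ssize_t written;
	int n, fd;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* sysfs attributes already exist, so no O_CREAT here */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, val, len);
	close(fd);
	return (written == (ssize_t)len) ? 0 : -1;
}

/* Hypothetical helper: ask the kernel not to start recovery on the
 * array whose md sysfs directory is md_dir.  Intended to be called
 * before any spare is added, so the kernel cannot assign the slot
 * while the disks are still being set up.
 */
static int freeze_sync_action(const char *md_dir)
{
	return write_attr(md_dir, "sync_action", "frozen");
}
```

A caller would run freeze_sync_action() on the array's md directory (for example /sys/block/md1/md, assuming the usual sysfs layout) before writing new_dev. Whether to abort the add when the freeze fails is a design choice this sketch leaves to the caller; the patch itself does not check the return value.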