mdadm

Commit Graph

Author	SHA1	Message	Date
Artur Paszkiewicz	ae7d61e35e	mdmon: fix wrong array state when disk fails during mdmon startup If a member drive disappears and is set faulty by the kernel during mdmon startup, after ss->load_container() but before manage_new(), mdmon will try to readd the faulty drive to the array and start rebuilding. Metadata on the active drive is updated, but the faulty drive is not removed from the array and is left in a "blocked" state and any write request to the array will block. If the faulty drive reappears in the system e.g. after a reboot, the array will not assemble because metadata on the drives will be incompatible (at least on imsm). Fix this by adding a new option for sysfs_read(): "GET_DEVS_ALL". This is an extension for the "GET_DEVS" option and causes all member devices to be returned, even if the associated block device has been removed. Use this option in manage_new() to include the faulty device on the active_array's devices list. Mdmon will then properly remove the faulty device from the array and update the metadata to reflect the degraded state. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2019-05-20 14:14:36 -04:00
Artur Paszkiewicz	69d084784d	mdmon: don't attempt to manage new arrays when terminating When mdmon gets a SIGTERM, it stops managing arrays that are clean. If there is more than one array in the container and one of them is dirty and the clean one is still present in mdstat, mdmon will treat it as a new array and start managing it again. This leads to a cycle of remove_old() / manage_new() calls for the clean array, until the other one also becomes clean. Prevent this by not calling manage_new() if sigterm is set. Also, remove a check for sigterm in manage_new() because the condition will never be true. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2019-02-28 15:46:01 -05:00
Tomasz Majchrzak	a44c262abc	managemon: Don't add disk to the array after it has started If disk has disappeared from the system and appears again, it is added to the corresponding container as long as the metadata matches and disk number is set. This code had no effect on imsm until commit `20dc76d15b` ("imsm: Set disk slot number"). Now the disk is added to container but not to the array - it is correct as the disk is out-of-sync. Rebuild should start for the disk but it doesn't. There is the same behaviour for both imsm and ddf metadata. There is no point to handle out-of-sync disk as "good member of array" so remove that part of code. There are no scenarios when monitor is already running and disk can be safely added to the array. Just write initial metadata to the disk so it's taken for rebuild. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-12-07 09:20:16 -05:00
Tomasz Majchrzak	c76242c56e	mdmon: get safe mode delay file descriptor early After switch root new mdmon is started. It sends initrd mdmon a signal to terminate. initrd mdmon receives it and switches the safe mode delay to 1 ms in order to get array to clean state and flush last version of metadata. The problem is sysfs filesystem is not available to initrd mdmon after switch root so the original safe mode delay is unchanged. The delay is set to few seconds - if there is a lot of traffic on the filesystem, initrd mdmon doesn't terminate for a long time (no clean state). There are 2 instances of mdmon. initrd mdmon flushes metadata when array goes to clean state but this metadata might be already outdated. Use file descriptor obtained on mdmon start to change safe mode delay. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-10-04 11:41:57 -04:00
Artur Paszkiewicz	2c8890e926	Don't abort starting the array if kernel does not support ppl Change the behavior of assemble and create for consistency-policy=ppl for external metadata arrays. If the kernel does not support ppl, don't abort but print a warning and start the array without ppl (consistency-policy=resync). No change for native md arrays because the kernel will not allow starting the array if it finds an unsupported feature bit in the superblock. In sysfs_add_disk() check consistency_policy in the mdinfo structure that represents the array, not the disk and read the current consistency policy from sysfs in mdmon's manage_member(). This is necessary to make sysfs_add_disk() honor the actual consistency policy and not what is in the metadata. Also remove all the places where consistency_policy is set for a disk's mdinfo - it is a property of the array, not the disk. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-10-02 16:07:04 -04:00
Tomasz Majchrzak	b13b52c80f	Get failed disk count from array state Recent commit has changed the way failed disks are counted. It breaks recovery for external metadata arrays as failed disks are not part of the array and have no corresponding entries in sysfs (they are only reported for containers) so degraded arrays show no failed disks. Recent commit overwrites GET_DEGRADED result prior to GET_STATE and it is not set again if GET_STATE has not been requested. As GET_STATE provides the same information as GET_DEGRADED, the latter is not needed anymore. Remove GET_DEGRADED option and replace it with GET_STATE option. Don't count number of failed disks looking at sysfs entries but calculate it at the end. Do it only for arrays as containers report no disks, just spares. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-06-05 11:11:36 -04:00
Jes Sorensen	b831b299e8	mdadm: Fix '==' broken formatting Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-16 14:04:22 -04:00
Jes Sorensen	fc54fe7a7e	mdadm: Fixup a large number of bad formatting of logical operators Logical operators never belong at the beginning of a line. Signed-off-by: Jes Sorensen <jsorensen@fb.com>	2017-05-16 13:52:15 -04:00
Tomasz Majchrzak	6dc1785fdb	mdmon: bad block support for external metadata - sysfs file open Open 'badblocks' and 'unacknowledged_bad_blocks' sysfs files for each disk in the array. Add them to the list of files observed by monitor. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>	2016-11-28 17:45:56 -05:00
NeilBrown	27aefbdb3d	Don't ignore return value from read and write New gcc sometimes complains about this. Signed-off-by: NeilBrown <neilb@suse.com>	2015-07-24 16:11:23 +10:00
NeilBrown	1ade5cc15a	Consistently print program Name and __func__ in debug messages. make dprintf() print program name and __func__, so that this messaging is consistent. Also remove all __func__ messages from pr_err(). We shouldn't leak that internal data in error message. If we really want function name there, a new pr_XXX might be wanted. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-12 13:21:17 +11:00
NeilBrown	da5a36fa1f	mdmon: already read sysfs files once after opening. seq_file in the kernel will allocate a read buffer on first read. We want this to happen under the managemon thread, not the 'monitor' thread, as the latter is not allowed to allocate memory (might deadlock). So do a first read after opening. Signed-off-by: NeilBrown <neilb@suse.de>	2014-09-17 15:02:18 +10:00
NeilBrown	5fe6f031d9	mdmon: allow prepare_update to report failure. If 'prepare_update' fails for some reason there is little point continuing on to 'process_update'. For now only malloc failures are caught, but other failures will be considered in future. Signed-off-by: NeilBrown <neilb@suse.de>	2014-07-10 15:54:02 +10:00
NeilBrown	cc81325634	managemon: fix a dprintk. There is no guarantee that 'inst' is a number, and even if there were there is no point converting it str->int and then int->str again. Signed-off-by: NeilBrown <neilb@suse.de>	2013-09-10 09:31:18 +10:00
NeilBrown	4e5e54cf82	mdmon: make sure we set safe_mode on SIGTERM. Without this, array may not go clean and mdmon will then not exit. A safe_mode of '0' (which is the only one that is handled differently by this patch) means "never switch to 'active_idle'". We don't want that when mdmon is stopping. Signed-off-by: NeilBrown <neilb@suse.de>	2013-09-02 12:08:44 +10:00
NeilBrown	e49a8a8026	mdmon: don't use 'ghost' values from an inactive array. It is possible for mdmon to see (in /proc/mdstat) an array in 'inactive' state, "mdadm -S" has written "inactive" to "array_state". In this state values such as "raid_disk" are not meaningful and so should be ignored by manage_member(). Reported-by: "Dorau, Lukasz" <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 15:40:16 +10:00
NeilBrown	4389b648bb	managemon: fix typo affecting incremental assembly. This clearly should be 'st2'. As it is the 'raid_disk' value being tested is completely meaningless in the context of the new device. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 14:25:15 +10:00
mwilck@arcor.de	0c5d6054e4	mdmon: always get layout from sysfs commit `71d68ff62` uses the array layout. It needs to be initialized. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-05 11:31:56 +10:00
NeilBrown	2ef219630b	mdmon: clear safe_mode_delay on shutdown When we receive a signal, set the safemode delay to v.small so that we can get clean arrays and exit quickly Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-01 15:54:24 +10:00
Martin Wilck	6ca1e6eccb	mdmon: manage_member: fix race condition during slow meta data writes In order to track kernel state changes, the monitor needs to notice changes in sysfs. If the changes are transient, and the monitor is busy writing meta data, it can happen that the changes are missed. This will cause the meta data to be inconsistent with the real state of the array. I can reproduce this in a test scenario with a DDF container and two subarrays, where I set a disk to "failed" and then add a global hot-spare. On a typical MD test setup with loop devices, I can reliably reproduce a failure where the metadata show degraded members although the kernel finished the recovery successfully. This patch fixes this problem by applying two changes. First, when a metadata update is queued, wait until it is certain that the monitor actually applied these meta data (the for loop is actually needed to avoid failures completely in my test case). Second, after triggering the recovery, set prev_state of the changed array to "recover", in case the monitor misses the transient "recover" state. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:00:46 +10:00
Martin Wilck	30b83120ed	mdmon: manage_member: debug messages for array state Add debug messages to watch the manager's steps. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-31 13:00:32 +10:00
NeilBrown	1011e8344a	Remove lots of unnecessary white space. Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-) Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-19 12:31:45 +10:00
NeilBrown	a88e119f6f	pr_err for mdmon. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-21 12:58:02 +10:00
Pawel Baldysiak	4edb8530e8	Add updating component_size to manager thread of mdmon Mdmon does not update component_size now. It is wrong because in case of size's expansion component_size is changed by mdadm but mdmon does not reread its new value and uses a wrong, old one. As a result the metadata is incorrect during size's expansion. It contains no information that resync is in progress (there is no checkpoint too). The metadata is as if resync has already been finished but it has not. Component_size will be set to match information in sysfs. This value will be updated by manager thread in manage_member() function. Now mdmon uses the correct, current value of component_size and the correct metadata (containing information about resync and checkpoint) is written. Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-04-08 11:32:53 +10:00
NeilBrown	4dd2df0966	Discard devnum in favour of devnm We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices. But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string). eg. md0 md_d2 md_home Signed-off-by: NeilBrown <neilb@suse.de>	2013-02-21 17:05:23 +11:00
NeilBrown	72ca9bcff3	Allow data-offset to be specified per-device for create mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ... The size is in K unless a suffix: K M G is given. The suffix 's' means sectors. Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-04 16:34:21 +10:00
NeilBrown	503975b9d5	Remove scattered checks for malloc success. malloc should never fail, and if it does it is unlikely that anything else useful can be done. Best approach is to abort and let some super-daemon restart. So define xmalloc, xcalloc, xrealloc, xstrdup which don't fail but just print a message and exit. Then use those removing all the tests for failure. Also replace all "malloc;memset" sequences with 'xcalloc'. Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-09 17:14:16 +10:00
Adam Kwolek	50927b1323	Fix: Sometimes mdmon throws core dump during reshape Problem was found during reshaping 2 volumes /raid0 and raid5/ in container. Sometimes mdmon throws core dump due to NULL pointer exception. Problem occurs in scenario: - managemon: is about spare activation (degraded raid4 volume == raid0 under takeover) - managemon: detect level change and signals monitor (manage_member() calls replace_array()) - monitor: detects transition raid4/5->raid0 and sets a->container to NULL to indicate array deactivation - managemon : continues his work and tries to activate spare (a->check_degraded is set). NULL pointer is passed to metadata handler activate_spare() Core dump is generated. To resolve this situation managemon (after monitor kick) checks again a->container pointer to learn if current array is not to be deactivated. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-02-09 12:20:52 +11:00
Jes Sorensen	c20478757d	close_aa(): Verify file descriptors are valid before trying to close them Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-11-03 08:08:23 +11:00
Jes Sorensen	3e1d79b2d6	disk_init_and_add(): Fail if opening sysfs file descriptors fail Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-11-03 08:08:09 +11:00
Lukasz Dorau	ba71445069	FIX: Mdmon crashes after changing RAID level from 1 to 0 Description of the bug: Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover). Cause of the bug: The managemon marks an active_array for removal from monitoring by assigning a->container to NULL value (in the "manage_member" function). Sometimes (during stress test) it happens right when the monitor is in the "read_and_act" function and a->container pointer is in use. This causes the monitor to crash. Solution: The active array has to be marked for removal in another way than setting NULL pointer when it can be in use. A new field "to_remove" was added to the "active_array" structure. It is used in the managemon to mark a container to remove (instead of the old assignment: a->container = NULL) and monitor checks it to determine if the array should be removed. The field "to_remove" should be checked in some other places to avoid managing of the array which is going to be removed. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-09-06 15:19:58 +10:00
Dan Williams	1d446d52a7	mdmon: fix, close spare activation race The following test fails when the md_check_recovery() event triggered by the ro->rw transition causes remove_and_add_spares() to run while mdmon is attempting spare activation. Result is that the kernel races to set the slot immediately after sysfs_add_disk() writes new_dev. mdmon thinks the spare activation failed and declines to send the monitor a new active_array. We show degraded after the wait because the monitor cannot notify the metadata that all disks are in_sync. #!/bin/bash i=0 false while [ $? == 1 ] do i=$((i+1)) mdadm -Ss mdadm -CR /dev/md0 /dev/loop[0-2] -n 3 -e imsm mdadm -CR /dev/md1 /dev/loop[01] missing -n 3 -l 5 mdadm --wait /dev/md1 mdadm -E /dev/loop2 | grep -i degraded done echo "failed: $i" Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-08-30 10:49:42 +10:00
Adam Kwolek	84f3857fec	FIX: After discarding array give chance monitor to remove it When raid0 expansion occurs, takeover operation is used. After backward takeover monitor remains in memory. This happens due to remaining just removed active array in mdmon structures. If there is no other monitored arrays, mdmon has to finish his work. Problem was introduced in patch (2011.03.22): mdmon: Stop keeping track of RAID0 (and LINEAR) arrays. Prior to this patch mdmon kicking occurs via replace_array() where wakeup_monitor() was called. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-04-05 09:24:17 +10:00
NeilBrown	7023e0b8ae	mdmon: Stop keeping track of RAID0 (and LINEAR) arrays. Tracking RAID0 arrays doesn't really work. There is no need, and there are some sysfs files which won't exist when the array appears and then won't be opened when the level is changed. So simply ignore RAID0 and LINEAR arrays - don't add them when they appear and if an array we are monitoring turns into one of these, discard it promptly. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 17:23:17 +11:00
NeilBrown	4e2c1a9a32	mdmon: allow manage_member to cope with ->container becoming NULL. As monitor() can set ->container to NULL, we need to be careful about dereferencing it. So take a copy in manage_member, return if it is NULL, and only use the copy. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 14:52:37 +11:00
NeilBrown	88b496c269	Merge branch 'master' into devel-3.2 Conflicts: Manage.c managemon.c super-ddf.c super-intel.c	2011-03-15 15:35:04 +11:00
NeilBrown	b0edee6efb	ddf: implement remove_from_super This is needed to remove devices from mdmon's knowledge when the device is removed from the md container. Now that ddf have a remove_from_super we don't need the code that allows some personalities not to implement this. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:10:32 +11:00
Labun, Marcin	258c3e921b	IMSM: Fix problem in mdmon monitor of using removed disk in imsm container. Manager thread shall pass the information to monitor thread (mdmon) that some devices are removed from container. Otherwise, monitor (mdmon) might use such devices (spares) to rebuild the array that has gone degraded. This problem happens for imsm containers, since a list of the container disks is maintained in intel_super structure. When array goes degraded, the list is searched to find a spare disks to start rebuild. Without this fix the rebuild could be started on the spare device that was a member of the container, but has been removed from it. New super type function handler has been introduced to prepare metadata format specific information about removed devices. int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo) The message prepared in remove_from_super is later processed by process_update handler in monitor thread. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 15:09:31 +11:00
NeilBrown	0c4f6e378b	managemon: Don't do spare assignment while any updates are pending. Spare assignment requires full knowledge of array state. A pending update might modify that state (such as a pending spare assignment) so don't try while there are updates pending. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-15 14:51:12 +11:00
NeilBrown	4dd968cc54	ddf: implement remove_from_super This is needed to remove devices from mdmon's knowledge when the device is removed from the md container. Now that ddf have a remove_from_super we don't need the code that allows some personalities not to implement this. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:56:16 +11:00
Adam Kwolek	0d51bfa20e	FIX: Last_checkpoint has to be initialized in per disk units last_checkpoint is variable that tracks sync_complete sysfs entry. sync_complete is per disk counter, so initializing during starting from checkpoint has to have this in mind and convert reshape position properly. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:17:52 +11:00
Adam Kwolek	138477db4b	FIX: Last checkpoint is not initialized on reshape restart When reshape is restarted and active array in mdmon is being initialized, mdmon has to know last checkpoint, otherwise reshape will be restarted from '0' position. mdadm when reshaped array is assembled stores reshape_position in sysfs and runs mdmon. Initialize last_checkpoint in active array structure to value present in sysfs for reshaped array start. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:12:57 +11:00
NeilBrown	515afde355	mdmon: don't copy an invalid chunk_size As chunk_size in mdstat_ent is never set, we shouldn't copy it into a->info.array. In fact, it is safest to get rid of the field altogether. Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:21:03 +11:00
Adam Kwolek	02eedb57aa	imsm: FIX: array size is wrong Calculation of size is almost ok, except concept of blocks. Size for setting in md has to be divided by 2 to be correct. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-03 17:40:18 +11:00
NeilBrown	f54a6742b2	managemon: don't try to add spares when resync/recovery is happening. kernel should reject this anyway, and we really should not be trying as it can only lead to confusion. Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-01 14:44:02 +11:00
Adam Kwolek	57f8c76946	Detect level change For level migration support it is necessary to allow mdmon to react for level changes. It has to have ability to change configuration of active array, and for array level change to raid0 finish array monitoring. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-01-06 19:17:29 +11:00
NeilBrown	aad6f216a1	Handle checkpointing during reshape We need to allow metadata to handle progress of reshape, completion, and abort-before-start. Include all those in ->set_array_state() Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 15:48:05 +11:00
NeilBrown	cb23f1f4c3	Allow a metadata update to have a linked list of allocated spaces. Sometimes one metadata update will require allocating several larger data structures. As 'monitor' cannot allocate, 'manager' must, so it must be able to attach a list of allocates to the update, and importantly it must be able to easily free them. So add a 'space_list' element to metadata updates where each element on the list starts with a pointer to the next. Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 12:10:01 +11:00
NeilBrown	0f99b4bd73	mdmon: when a reshape is detected, add any newly added devices to the array. When mdadm starts a reshape, it might add some devices to the array first. mdmon needs to notice the reshape starting and check for any new devices. If there are any they need to be provided to be monitored. Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 09:07:52 +11:00
Labun, Marcin	1a64be565b	IMSM: Fix problem in mdmon monitor of using removed disk in imsm container. Manager thread shall pass the information to monitor thread (mdmon) that some devices are removed from container. Otherwise, monitor (mdmon) might use such devices (spares) to rebuild the array that has gone degraded. This problem happens for imsm containers, since a list of the container disks is maintained in intel_super structure. When array goes degraded, the list is searched to find a spare disks to start rebuild. Without this fix the rebuild could be started on the spare device that was a member of the container, but has been removed from it. New super type function handler has been introduced to prepare metadata format specific information about removed devices. int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo) The message prepared in remove_from_super is later processed by process_update handler in monitor thread. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-15 15:51:51 +11:00

117 Commits