mdadm

Commit Graph

Author	SHA1	Message	Date
Adam Kwolek	02eedb57aa	imsm: FIX: array size is wrong Calculation of size is almost ok, except concept of blocks. Size for setting in md has to be divided by 2 to be correct. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-03 17:40:18 +11:00
NeilBrown	f54a6742b2	managemon: don't try to add spares when resync/recovery is happening. kernel should reject this anyway, and we really should not be trying as it can only lead to confusion. Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-01 14:44:02 +11:00
Adam Kwolek	57f8c76946	Detect level change For level migration support it is necessary to allow mdmon to react for level changes. It has to have ability to change configuration of active array, and for array level change to raid0 finish array monitoring. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-01-06 19:17:29 +11:00
NeilBrown	aad6f216a1	Handle checkpointing during reshape We need to allow metadata to handle progress of reshape, completion, and abort-before-start. Include all those in ->set_array_state() Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 15:48:05 +11:00
NeilBrown	cb23f1f4c3	Allow a metadata update to have a linked list of allocated spaces. Sometimes one metadata update will require allocating several larger data structures. As 'monitor' cannot allocate, 'manager' must, so it must be able to attach a list of allocates to the update, and importantly it must be able to easily free them. So add a 'space_list' element to metadata updates where each element on the list starts with a pointer to the next. Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 12:10:01 +11:00
NeilBrown	0f99b4bd73	mdmon: when a reshape is detected, add any newly added devices to the array. When mdadm starts a reshape, it might add some devices to the array first. mdmon needs to notice the reshape starting and check for any new devices. If there are any they need to be provided to be monitored. Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 09:07:52 +11:00
Labun, Marcin	1a64be565b	IMSM: Fix problem in mdmon monitor of using removed disk in imsm container. Manager thread shall pass the information to monitor thread (mdmon) that some devices are removed from container. Otherwise, monitor (mdmon) might use such devices (spares) to rebuild the array that has gone degraded. This problem happens for imsm containers, since a list of the container disks is maintained in intel_super structure. When array goes degraded, the list is searched to find a spare disks to start rebuild. Without this fix the rebuild could be stared on the spare device that was a member of the container, but has been removed from it. New super type function handler has been introduced to prepare metadata format specific information about removed devices. int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo) The message prepared in remove_from_super is later processed by process_update handler in monitor thread. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-15 15:51:51 +11:00
Adam Kwolek	a9d868c3a2	FIX: sync_completed_fd handler has to be closed sync_completed_fd handler has to be closed when array is closing. This is in pair to open handler code. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-03 15:07:04 +11:00
NeilBrown	ab2bb0b621	mdmon: don't copy an invalid chunk_size As chunk_size in mdstat_ent is never set, we shouldn't copy it into a->info.array. In fact, it is safest to get rid of the field altogether. Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-11-30 18:35:36 +11:00
Dan Williams	bc77ed535d	block monitor: freeze spare assignment for external arrays In order to support reshape and atomic removal of spares from containers we need to prevent mdmon from activating spares. In the reshape case we additionally need to freeze sync_action while the reshape transaction is initiated with the kernel and recorded in the metadata. When reshaping a raid0 array we need to freeze the array before it is transitioned to a redundant raid level. Since sync_action does not exist at this point we extend the '-' prefix of a subarray string to flag mdmon not to activate spares. Mdadm needs to be reasonably certain that the version of mdmon in the system honors this 'freeze' indication. If mdmon is not already active then we assume the version that gets started is the same as the mdadm version. Otherwise, we check the version of mdmon as returned by the extended ping_monitor() operation. This is to catch cases where mdadm is upgraded in the filesystem, but mdmon started in the initramfs is from a previous release. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-11-23 15:00:54 +11:00
Dan Williams	e5408a3202	Provide a mdstat_ent to subarray helper ...before introducing another open coded instace of this conversion. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-11-23 14:44:23 +11:00
NeilBrown	a5d85af748	get_info_super: report which other devices are thought to be working/failed. To accurately detect when an array has been split and is now being recombined, we need to track which other devices each thinks is working. We should never include a device in an array if it thinks that the primary device has failed. This patch just allows get_info_super to return a list of devices and whether they are thought to be working or not. Signed-off-by: NeilBrown <neilb@suse.de>	2010-11-22 19:35:25 +11:00
Dan Williams	d19e3cfb66	Merge branch 'fixes' into for-neil	2010-07-01 17:36:11 -07:00
Dan Williams	b526e52dc7	Always assume SKIP_GONE_DEVS behaviour and kill the flag ...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS) A null pointer dereference in Incremental.c can be triggered by replugging a disk while the old name is in use. When mdadm -I is called on the new disk we fail the call to sysfs_read(). I audited all the locations that use GET_DEVS and it appears they can tolerate missing a drive. So just make SKIP_GONE_DEVS the default behaviour. Also fix up remaining unchecked usages of the sysfs_read() return value. Reported-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2010-06-16 17:26:04 -07:00
Dan Williams	484240d8a3	mdmon: periodically checkpoint recovery The kernel updates and notifies md/sync_completed when it is time to take a checkpoint. When this occurs (at 1/16 array size intervals) write 'idle' to md/sync_action to have the current recovery position updated in recovery_start and resync_start. Requires the metadata handler to reset ->last_checkpoint when it has determined that recovery has ended. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2010-05-14 17:42:49 -07:00
Dan Williams	63b4aae33e	mdmon: fix missing open of md/<dev>/recovery_start When activating a spare we neglect to open recovery_start and as such do not see checkpoint events. Move disk initialization to common routine to mitigate recurrence. Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2010-04-29 10:50:29 -07:00
NeilBrown	fa716c83c5	mdmon: insist on creating .pid file at startup. Now that we don't "mdadm --takeover" until /var/run is writable there is no need to continually try to create files in there. So only create these files at startup and fail if they cannot be made. This means that to start an array with externally managed metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be writable. To 'takeover' from a previous mdmon instance, /var/run must be writable. This means we don't need to worry about SIGHUP (which was once used to tell us it was time to create .pid) and SIGALRM. Signed-off-by: NeilBrown <neilb@suse.de>	2010-02-08 17:26:18 +11:00
NeilBrown	58a4ba2a6b	mdmon: don't monitor /proc/mounts to decide when to create .pid file. Monitoring /proc/mounts and creating a .pid file as soon as /var/run is writable is racy. Most distros clean all non-directories from /var/run early in boot and if mdmon races with this it could lose the files as soon as they are created. Instead require that "mdmon --takeover" be run after /var is writable. Signed-off-by: NeilBrown <neilb@suse.de>	2010-02-08 17:26:18 +11:00
NeilBrown	5d4d1b26d3	mdmon: allow pid to be stored in different directory. /var/run probably doesn't persist from early boot. So if necessary, store in in /lib/init/rw or somewhere else that does persist. Signed-off-by: NeilBrown <neilb@suse.de>	2010-02-04 16:47:28 +11:00
NeilBrown	688a1e5b07	mdmon: don't mkdir /var/run Creating /var/run in mdmon is really not justifiable. If /var/run doesn't exist, then it is either deliberate and it should be left that way to make sure the mapfile gets created in /dev, or it is a configuration error and not our problem to fix. Signed-off-by: NeilBrown <neilb@suse.de>	2010-02-04 16:37:20 +11:00
Dan Williams	2904b26f05	Support external metadata recovery-resume Minimal changes needed to permit reassembling partially recovered external metadata arrays. The biggest logical change is that ->container_content() can now surface partially rebuilt members rather than omitting them from the disk list. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-21 12:51:57 -07:00
Dan Williams	d23534e464	Teach sysfs_add_disk() callers to use ->recovery_start versus 'insync' parameter Also fixup 'in_sync' versus 'insync' typo. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-21 11:26:21 -07:00
Dan Williams	e1516be1db	Add scaffolding for handling md/dev-XXX/recovery_start Prepare the code to handle saving a recovery checkpoint. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-21 10:06:14 -07:00
Dan Williams	b7941fd68d	mdmon: cleanup resync_start We don't need to sprinkle reads of this attribute all over the place, just once at the entry of read_and_act(). Also, the mdinfo structure for the array already has a 'resync_start' member, so just reuse that. Finally, rename get_resync_start() to read_resync_start to make it consistent with the other sysfs accessors in monitor.c. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-14 12:57:55 -07:00
Dan Williams	071cfc4258	mdmon: cleanup manage_member() leak free() the results of activate_spare(). Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-12 14:10:01 -07:00
Dan Williams	96a8270d46	mdmon: avoid writes in the startup path for mdmon on root arrays When killing a previous monitor be careful not to cause writes to the filesystem until the reads necessary to get the monitor operational have completed. The code is already prepared for errors creating the pid and socket files, so simply defer creation of these files until after the first call to manage(). Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-10-13 17:41:57 -07:00
NeilBrown	e736b62389	Update copyright dates and remove references to @cse.unsw.edu.au Also removed 'paper' addresses. Signed-off-by: NeilBrown <neilb@suse.de>	2009-06-02 14:35:45 +10:00
NeilBrown	462906cdee	incremental_container: preserve 'in_sync' flag when adding to existing array. When building container members with -IR, we need to ensure that devices added to an active array preserve the 'in_sync' status so they don't needlessly get rebuilt. So allow sysfs_add_disk to do this (only works in kernels since 2.6.30) and pass the relevant flag down. Signed-off-by: NeilBrown <neilb@suse.de>	2009-04-14 10:19:02 +10:00
NeilBrown	661dce3617	mdmon: allow incremental assembly of containers. If mdmon sees a device added to a container, it should assume it is a new spare. It could be a part of the array that just hadn't been assembled yet. So check first. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-10 16:28:22 +11:00
Dan Williams	04a8ac089c	mdmon: record added disks Prevent duplicate disks from being sent to the monitor thread. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-02-24 18:45:57 -07:00
Dan Williams	7da80e6faa	mdmon: fix removed disk handling Use SKIP_GONE_DEVS when reading the container, and correct some confused logic in manage_new(). Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-02-24 18:45:57 -07:00
Dan Williams	a54d52625a	update copyright headers Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-28 10:55:29 -07:00
Dan Williams	883a6142e6	mdmon: wait after trying to kill Now that mdmon handles sigterm if another monitor wants to take over it should wait until all managed arrays are clean. So make WaitClean() available to mdmon and teach try_kill_monitor() to wait on each subarray in the container. ...since we may be communicating with a dieing process, we need to block SIGPIPE earlier. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-15 14:43:57 -07:00
Dan Williams	6144ed4414	mdmon: terminate clean We generally don't want mdmon to be terminated, but if a SIGTERM gets through try to leave the monitored arrays in a clean state, block attempts to mark the array dirty, and stop servicing the socket. When we are killed by sigterm don't remove the pidfile let that be cleaned up by the next monitor. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-15 14:43:57 -07:00
Dan Williams	695154b2e7	mdmon: periodically retry to create the socket If initial socket creation fails, EROFS, set a periodic alarm to wake up the manager and retry. Include a kernel patch that will wake us up if the mount flags are changed. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-15 14:15:52 -07:00
NeilBrown	7801ac2092	Factor out add-disk code The variety of approaches to 'add_disk' are factored out into a separate function, and Incremental mode benefits by being closer to supporting the assembly of containers. Also remove the adding-to-array-data-structure out of sysfs_add_disk and into add_disk. And add some tests for --incremental mode to make sure we don't break it. Signed-off-by: NeilBrown <neilb@suse.de>	2008-09-18 15:13:32 +10:00
Dan Williams	295646b3d5	mdmon: recreate socket/pid file on SIGHUP Allow mdmon to start while /var/run/mdadm is readonly. Later a SIGHUP can trigger mdmon to drop its pid and socket once /var/run/mdadm is writable. Of course one needs the pid to send a HUP, that can be stored in a distribution specific rw-init directory... For now, rely on a killall -HUP mdmon to get the files dumped. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:43 -07:00
Dan Williams	313a4a82f1	ping_manager() to prevent 'add' before 'remove' completes It is currently possible to remove a device and re-add it without the manager noticing, i.e. without detecting a mdstat->devcnt container->devcnt mismatch. Introduce ping_manager() to arrange for mdmon to run manage_container() prior to mdadm dropping the exclusive open() on the container. Despite these precautions sysfs_read() may still fail. If this happens invalidate container->devcnt to ensure manage_container() runs at the next event. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:43 -07:00
Dan Williams	93f7cacab3	mdmon: resume rebuild If we started a degraded array that was previously rebuilding we may have enough information to resume the rebuild without a trip through the monitor. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:43 -07:00
NeilBrown	e9dd159873	Allow an externally managed array to be marked readonly If the metadata_version is -mdXXX/whatever rather than /mdXXX/whatever then the array is readonly and should be left alone by mdmon. Signed-off-by: NeilBrown <neilb@suse.de>	2008-08-19 17:55:15 +10:00
NeilBrown	3c558363a1	Factor out test for subarray version string. We are about to change the syntax of the version string for 'subarray's. So factor out the test into a single function. Signed-off-by: NeilBrown <neilb@suse.de>	2008-08-19 17:55:15 +10:00
Dan Williams	43dad3d6fb	mdadm: add device to a container Adding a device updates the container and then mdmon takes action upon noticing a change in devices. This reuses the container version of add_to_super to create a new record for the device. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2008-08-19 17:19:51 +10:00
Dan Williams	7bc1962f8c	mdmon: remove devices from container Once the monitor thread has kicked a drive from all managed arrays mdadm -r is permitted. We are guaranteed that the drive is marked failed at this point, so allow the drive to be re-added as a spare. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-08-19 14:55:12 +10:00
NeilBrown	3c00ffbe98	Make metadata updates from manage to monitor 'synchronous' A metadata update may modify the data structure of the metadata including freeing things, so it is not safe of the manager to touch the metadata while an update is pending in the monitor. So When an update has been submitted, don't do anything else in the manager until it is complete. Signed-off-by: NeilBrown <neilb@suse.de>	2008-08-19 14:55:03 +10:00
Dan Williams	f1d267661d	mdmon: allow degraded arrays to be monitored manage_new is too strict in the face of failed devices. Teach it to monitor degraded arrays. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-08-15 10:58:43 -07:00
Dan Williams	63d7cc784b	mdmon: use 'recover' instead of 'repair' when activating a spare Repair sets MD_RECOVERY_REQUESTED in md which may not result in the spare device being recovered. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-08-07 11:54:09 -07:00
Dan Williams	836759d561	mdmon: ignore inactive arrays and other manage_new() cleanups While mdadm is constructing an array mdmon may see an intermediate state (some disks not yet added / redundancy attributes like sync_action not available). Waiting for mdstat->active == true ensures that the array is ready to be handled. This fixes a bug in create array via mdmon update whereby failures are not detected in the new array. Introduce aa_ready() to catch cases where the active_array is not correctly initialized. Barring a kernel bug this should never trigger, nonetheless it precludes a class of bugs like the one mentioned above from triggering. Cleanup the exit paths and only call replace_array when the new array is ready to be inserted into container->arrays. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-08-04 16:48:27 -07:00
NeilBrown	1eb252b848	mdmon: ping will wait for manage_mon to catch up. When a 'ping' (empty message) is sent to mdmon, we wait for 'monitor' to do a full loop to make sure it has caught up with anything that needs doing. This allows synchronisation between mdadm and mdmon. Maybe monitor should signal managemon rather than managemon polling... Signed-off-by: Neil Brown <neilb@suse.de>	2008-07-18 16:37:06 +10:00
Neil Brown	103f2410ec	Make sure resync_start is initialised properly and maintained properly Signed-off-by: Neil Brown <neilb@suse.de>	2008-07-18 16:37:04 +10:00
Dan Williams	272bcc48d1	mdmon: initialize component_size in manage_new When we go to activate a spare for an array we expect ->info.component_size is valid. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-14 14:59:39 -07:00

74 Commits