mdadm

Commit Graph

Author	SHA1	Message	Date
Krzysztof Wojcik	3a5d04735b	FIX: imsm: Rebuild does not start on second failed disk Problem: If we have an array with two failed disks and the array is in degraded state (now it is possible only for raid10 with 2 degraded mirrors) and we have two spare devices in the container, recovery process should be triggered on both failed disks. It does not. Recovery is triggered only for first failed disk. Second failed disk remains unchanged although the spare drive exists in the container and is ready for recovery. Root cause: mdmon does not check if the array is degraded after recovery of first drive is completed. Resolution: Check if current number of disks in the array equals target number of disks. If not, trigger degradation check and then recovery process. Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-24 10:10:56 +11:00
NeilBrown	7023e0b8ae	mdmon: Stop keeping track of RAID0 (and LINEAR) arrays. Tracking RAID0 arrays doesn't really work. There is no need, and there are some sysfs files which won't exist when the array appears and then won't be opened when the level is changed. So simply ignore RAID0 and LINEAR arrays - don't add them when they appear and if an array we are monitoring turns into one of these, discard it promptly. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 17:23:17 +11:00
NeilBrown	d998b738f5	mdmon: don't wait for O_EXCL when shutting down. If mdmon is shutting down because there are no devices left to look at, then don't wait 5 seconds for an O_EXCL open, and that can block progress of --grow. Only wait for O_EXCL if we received a signal. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-22 16:10:22 +11:00
NeilBrown	de6a92199e	Merge branch 'master' into devel-3.2	2011-03-14 18:49:57 +11:00
NeilBrown	e40512fddb	monitor: close recovery_fd when closing state_Fd These should be open or closed together. Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-14 18:24:01 +11:00
Krzysztof Wojcik	3275e05ec1	FIX: Reset disk state if disk is missing If we can't read actual disk state, it should be initiated to 0. Otherwise it may be out of date value resulting false action later in code (e.g. set disk to improper state). Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-03-10 17:07:04 +11:00
Adam Kwolek	6d4225a131	FIX: Last checkpoint is not set When reshape is finished monitor has to set last checkpoint to the array end to allow metadata for reshape finalization. Metadata has to know if reshape is finished or it is broken. On reshape finish metadata finalization is required. When reshape is broken, metadata must remain as is to allow for reshape restart from checkpoint. This can be resolved based on reshape_position sysfs entry. When it is equal to 'none', it means that md finishes work. In such situation move checkpoint to the end of array. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-03 17:34:27 +11:00
Krzysztof Wojcik	10ce18083d	FIX: Reset disk state if disk is missing If we can't read actual disk state, it should be initiated to 0. Otherwise it may be out of date value resulting false action later in code (e.g. set disk to improper state). Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-01-26 10:47:52 +10:00
Adam Kwolek	2a9f840972	FIX: sync_completed == 0 causes reshape cancellation in metadata md signals reshape completion (whole area or parts) by setting sync_completed to 0. This causes set_array_state() to roll back metadata changes (super-intel.c:4977). To avoid this do not allow for set last_checkpoint to 0 if reshape is finished. This was also root cause of my previous fix for finalization reshape that I agreed earlier is not necessary. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-01-17 12:44:52 +11:00
Adam Kwolek	4867e068e3	Raid0: detect reshape on array start When raid0 array is takeovered to raid4 for reshape it should be possible to detect that array for reshape is monitored now for metadata update. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-01-06 19:27:25 +11:00
Adam Kwolek	57f8c76946	Detect level change For level migration support it is necessary to allow mdmon to react for level changes. It has to have ability to change configuration of active array, and for array level change to raid0 finish array monitoring. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2011-01-06 19:17:29 +11:00
NeilBrown	aad6f216a1	Handle checkpointing during reshape We need to allow metadata to handle progress of reshape, completion, and abort-before-start. Include all those in ->set_array_state() Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 15:48:05 +11:00
NeilBrown	0f99b4bd73	mdmon: when a reshape is detected, add any newly added devices to the array. When mdadm starts a reshape, it might add some devices to the array first. mdmon needs to notice the reshape starting and check for any new devices. If there are any they need to be provided to be monitored. Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-16 09:07:52 +11:00
Hawrylewicz Czarnowski, Przemyslaw	6f4cdfd927	fix: mdadm -Ss for external metadata don't stop container Sometimes (~50%) mdadm -Ss cannot stop container as mdmon opens its device and do not close it before exit(). The period between open and release of handle is too long and md is not able stop device. Releasing handle before exit does not block md. Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2010-12-07 21:12:38 +11:00
Dan Williams	4f0a7acc9a	mdmon: record sync_completed directly to the metadata When sync_action is idle mdmon takes the latest value of md/resync_start or md/<dev>/recovery_start to record the resync/rebuild checkpoint in the metadata. However, now that mdmon is reading sync_completed there is no longer a need to wait for, or force an idle event to take a checkpoint. Simply update the forward progress of ->last_checkpoint at every wakeup event and force it to be recorded at least every 1/16th array-size interval. It may be recorded more frequently if a ->set_array_state() event occurs. This also cleans up some confusion in handling the dual-rebuild case. If more than one spare has been activated the kernel starts the rebuild at the lowest recovery offset, so we do not need to worry about min_recovery_start(). Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2010-06-15 18:41:57 -07:00
Dan Williams	484240d8a3	mdmon: periodically checkpoint recovery The kernel updates and notifies md/sync_completed when it is time to take a checkpoint. When this occurs (at 1/16 array size intervals) write 'idle' to md/sync_action to have the current recovery position updated in recovery_start and resync_start. Requires the metadata handler to reset ->last_checkpoint when it has determined that recovery has ended. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2010-05-14 17:42:49 -07:00
NeilBrown	fa716c83c5	mdmon: insist on creating .pid file at startup. Now that we don't "mdadm --takeover" until /var/run is writable there is no need to continually try to create files in there. So only create these files at startup and fail if they cannot be made. This means that to start an array with externally managed metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be writable. To 'takeover' from a previous mdmon instance, /var/run must be writable. This means we don't need to worry about SIGHUP (which was once used to tell us it was time to create .pid) and SIGALRM. Signed-off-by: NeilBrown <neilb@suse.de>	2010-02-08 17:26:18 +11:00
Dan Williams	b7528a20cc	Introduce MaxSector Replace occurrences of ~0ULL to make it clear we are talking about maximal resync/recovery position. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-21 10:23:26 -07:00
Dan Williams	e1516be1db	Add scaffolding for handling md/dev-XXX/recovery_start Prepare the code to handle saving a recovery checkpoint. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-21 10:06:14 -07:00
Dan Williams	b7941fd68d	mdmon: cleanup resync_start We don't need to sprinkle reads of this attribute all over the place, just once at the entry of read_and_act(). Also, the mdinfo structure for the array already has a 'resync_start' member, so just reuse that. Finally, rename get_resync_start() to read_resync_start to make it consistent with the other sysfs accessors in monitor.c. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-12-14 12:57:55 -07:00
NeilBrown	e736b62389	Update copyright dates and remove references to @cse.unsw.edu.au Also removed 'paper' addresses. Signed-off-by: NeilBrown <neilb@suse.de>	2009-06-02 14:35:45 +10:00
NeilBrown	2800528713	Wait for POLLPRI on /proc or /sys files. From 2.6.30, /proc/mounts and various /sys files will probably always return 'readable' to select, so we will need to wait on POLLPRI to get the 'new data is available' signal. When using select, this corresponds to an 'exception', so adjust calls to select accordingly. In one case we sometimes wait on a socket and sometimes on /proc/mounts, so we need to test which. Signed-off-by: NeilBrown <neilb@suse.de>	2009-04-14 14:59:24 +10:00
Dan Williams	7e7fffc402	mdmon: fix resync completion detection Starting with 2.6.30 the md/resync_start attribute will no longer return a non-sensical number when resync is complete, instead it now returns 'none'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-04-12 00:58:28 -07:00
Dan Williams	140d3685fb	mdmon: fix missed 'clean' event mdmon may miss events because it re-reads state after read_and_act. The additional read is used to determine dirty status before allowing a sigterm to proceed. Since read_and_act is in the best position to determine 'dirty' status and its return value is not used, modify it to return true if the array is dirty. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-02-24 18:45:57 -07:00
NeilBrown	e8a70c8958	mdmon: pass symbolic name to mdmon instead of device name. Now that names in /dev are usually created (eventually) by udev, it isn't really safe to rely on finding a name in /dev to pass to mdmon to identify which array to monitor. And it isn't really necessary to have a name in /dev. So just pass the symbolic name, e.g. md127 or md123. Change util.c to pass that name, and change mdmon to process the name sensibly. Signed-off-by: NeilBrown <neilb@suse.de>	2008-11-20 14:51:42 +11:00
Dan Williams	a54d52625a	update copyright headers Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-28 10:55:29 -07:00
Dan Williams	6144ed4414	mdmon: terminate clean We generally don't want mdmon to be terminated, but if a SIGTERM gets through try to leave the monitored arrays in a clean state, block attempts to mark the array dirty, and stop servicing the socket. When we are killed by sigterm don't remove the pidfile let that be cleaned up by the next monitor. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-15 14:43:57 -07:00
Dan Williams	3d2c4fc7b6	trivial warn_unused_result squashing Made the mistake of recompiling the F9 mdadm rpm which has a patch to remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots of errors: config.c:568: warning: ignoring return value of asprintf Assemble.c:411: warning: ignoring return value of asprintf Assemble.c:413: warning: ignoring return value of asprintf super0.c:549: warning: ignoring return value of posix_memalign super0.c:742: warning: ignoring return value of posix_memalign super0.c:812: warning: ignoring return value of posix_memalign super1.c:692: warning: ignoring return value of posix_memalign super1.c:1039: warning: ignoring return value of posix_memalign super1.c:1155: warning: ignoring return value of posix_memalign super-ddf.c:508: warning: ignoring return value of posix_memalign super-ddf.c:645: warning: ignoring return value of posix_memalign super-ddf.c:696: warning: ignoring return value of posix_memalign super-ddf.c:715: warning: ignoring return value of posix_memalign super-ddf.c:1476: warning: ignoring return value of posix_memalign super-ddf.c:1603: warning: ignoring return value of posix_memalign super-ddf.c:1614: warning: ignoring return value of posix_memalign super-ddf.c:1842: warning: ignoring return value of posix_memalign super-ddf.c:2013: warning: ignoring return value of posix_memalign super-ddf.c:2140: warning: ignoring return value of write super-ddf.c:2143: warning: ignoring return value of write super-ddf.c:2147: warning: ignoring return value of write super-ddf.c:2150: warning: ignoring return value of write super-ddf.c:2162: warning: ignoring return value of write super-ddf.c:2169: warning: ignoring return value of write super-ddf.c:2172: warning: ignoring return value of write super-ddf.c:2176: warning: ignoring return value of write super-ddf.c:2181: warning: ignoring return value of write super-ddf.c:2686: warning: ignoring return value of posix_memalign super-ddf.c:2690: warning: ignoring return value of write super-ddf.c:3070: warning: ignoring return value of posix_memalign super-ddf.c:3254: warning: ignoring return value of posix_memalign bitmap.c:128: warning: ignoring return value of posix_memalign mdmon.c:94: warning: ignoring return value of write mdmon.c:221: warning: ignoring return value of pipe mdmon.c:327: warning: ignoring return value of write mdmon.c:330: warning: ignoring return value of chdir mdmon.c:335: warning: ignoring return value of dup monitor.c:415: warning: rv may be used uninitialized in this function ...some of these like the write() ones are not so trivial so save those fixes for the next patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-10-15 14:15:52 -07:00
Dan Williams	4065aa816a	monitor: clean up some debug messages Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:43 -07:00
Dan Williams	1770662bca	'mdadm --wait-clean' wait for array to be marked clean For use in distro shutdown scripts with a RAID root file system. Returns immediately if the array is 'readonly', or not an externally managed array. It is up to the distro's scripts to make sure no new writes hit the device after this returns 'true'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:42 -07:00
Dan Williams	0c0c44db5a	monitor: don't mark dirty on resync complete ...instead look at array state to determine if the array is consistent Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:42 -07:00
Dan Williams	d797a0621f	monitor: mark clean on active-idle This also handles the case where 'clean' is set directly. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-09-15 20:58:42 -07:00
NeilBrown	e9dd159873	Allow an externally managed array to be marked readonly If the metadata_version is -mdXXX/whatever rather than /mdXXX/whatever then the array is readonly and should be left alone by mdmon. Signed-off-by: NeilBrown <neilb@suse.de>	2008-08-19 17:55:15 +10:00
NeilBrown	01f157d74a	Extra option for set_array_state: you choose dirty or clean. When we first start an array, it might be good to start recovery straight away. That requires setting the array to 'dirty', but only the metadata handler can know if that is required or not. So have a third possible 'consistent' option to set_array_state. Either 'no' or 'yes' or 'you choose'. Return value indicates what was chosen. '1' (no) should be chosen unless there is a good reason. Signed-off-by: NeilBrown <neilb@suse.de>	2008-08-19 14:54:55 +10:00
Dan Williams	9296754385	mdmon: handle failures versus readauto arrays Transition readauto arrays to active before failing drives. Hmm... why do we keep reblocking / renotifying in the readonly case? Need to bottom out on this, but not right now. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-08-15 10:58:43 -07:00
Dan Williams	272906ef49	mdmon: use activate spare for re-add Disks that are not in-sync or failed are not assembled into member arrays by mdadm. Teach mdmon to resolve this situation by checking for spares at start. imsm_activate_spare() is updated to prefer devices that can be re-added versus new spares. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-08-12 02:25:46 -07:00
Dan Williams	5802a8118e	imsm: handle degraded->normal transitions in set_disk Removes the need for the call to ->set_array_state when sync_action transitions from 'recover' to 'idle'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-24 17:26:23 -07:00
NeilBrown	33af8567de	monitor: call get_resync_start on array shutdown. If the array is shutdown as soon as resync finishes, we might not notice the resync finish. So on array shutdown, check for current resync pos. Signed-off-by: Neil Brown <neilb@suse.de>	2008-07-18 16:37:26 +10:00
NeilBrown	1eb252b848	mdmon: ping will wait for manage_mon to catch up. When a 'ping' (empty message) is sent to mdmon, we wait for 'monitor' to do a full loop to make sure it has caught up with anything that needs doing. This allows synchronisation between mdadm and mdmon. Maybe monitor should signal managemon rather than managemon polling... Signed-off-by: Neil Brown <neilb@suse.de>	2008-07-18 16:37:06 +10:00
Neil Brown	103f2410ec	Make sure resync_start is initialised properly and maintained properly Signed-off-by: Neil Brown <neilb@suse.de>	2008-07-18 16:37:04 +10:00
Dan Williams	00e021427e	mdmon: close possibility of re-marking the metadata dirty on shutdown Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-14 14:59:39 -07:00
Dan Williams	0a6bdbee8d	mdmon: notify metadata of recovery completion Array may no longer be degraded. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-14 14:59:39 -07:00
Neil Brown	207aac36d5	Make sure we remove pid file in monitor before manager exits.	2008-07-12 20:27:42 +10:00
Neil Brown	a35c070bcd	Remove some noisy printfs.	2008-07-12 20:27:41 +10:00
Neil Brown	bfa44e2e7a	Revise message passing code. More here	2008-07-12 20:27:40 +10:00
Neil Brown	4d43913ce0	Remove mgr_pipe for communicating from manage to monitor. Data is being passed in shared memory, so the pipe is only being used as a wakeup. This can more easily be done with a thread-signal.	2008-07-12 20:27:40 +10:00
Neil Brown	2f64e61a50	Remove mon_pipe for communicating from monitor to manager The returned value was never used, and we don't really want this return path anyway as writing to a pipe could conceivably block, and the monitor must not block.	2008-07-12 20:27:40 +10:00
Neil Brown	f94d52f43e	Handle device removal from container This really should be done in mdadm, not mdmon. We ensure the device won't be suddenly committed as a hot-spare using O_EXCL, then check the 'holders' sysfs directory to make sure it is only in use once.	2008-07-12 20:27:40 +10:00
Dan Williams	4e6e574a3e	mdmon: add debug print statements for profiling mdmon for development only as console output can block leading to monitor deadlocks in low mem situations Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-06-16 15:50:07 -07:00
Neil Brown	7e1432fb14	Add DDF code for activate_spare Plus various bug fixes etc.	2008-06-12 10:13:32 +10:00
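
The checkpoint policy described in commit 4f0a7acc9a (track forward progress of last_checkpoint at every wakeup, and force it to be recorded at least every 1/16th of the array size) can be illustrated with a small sketch. This is not mdmon's code: the class name, method names, and the `recorded` field are hypothetical, and ignoring a non-advancing sync_completed value is an assumption made for the illustration.

```python
class CheckpointTracker:
    """Hypothetical model of the 1/16th array-size checkpoint rule."""

    def __init__(self, array_sectors):
        # Interval after which a checkpoint must be recorded (assumed >= 1).
        self.interval = max(1, array_sectors // 16)
        self.last_checkpoint = 0  # furthest progress seen so far
        self.recorded = 0         # last position written to the metadata

    def on_wakeup(self, sync_completed):
        """Return True if the metadata should be updated on this wakeup."""
        # Only forward progress moves the checkpoint (an assumption here).
        if sync_completed > self.last_checkpoint:
            self.last_checkpoint = sync_completed
        # Force a record once a full interval has passed since the last one.
        if self.last_checkpoint - self.recorded >= self.interval:
            self.recorded = self.last_checkpoint
            return True
        return False
```

With an array of 1600 sectors the interval is 100 sectors, so a wakeup at position 50 records nothing while a wakeup at position 100 triggers a metadata update.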

65 Commits