65 Commits

Author SHA1 Message Date
Krzysztof Wojcik 3a5d04735b FIX: imsm: Rebuild does not start on second failed disk
Problem:
If an array has two failed disks and is in a degraded state (currently
possible only for raid10 with 2 degraded mirrors) and the container has
two spare devices, the recovery process should be triggered on both
failed disks. It is not.
Recovery is triggered only for the first failed disk.
The second failed disk remains unchanged although a spare drive exists
in the container and is ready for recovery.

Root cause:
mdmon does not check whether the array is still degraded after recovery
of the first drive completes.

Resolution:
Check whether the current number of disks in the array equals the target
number of disks. If not, trigger the degradation check and then the
recovery process.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 10:10:56 +11:00
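
The resolution in the commit above comes down to a disk-count comparison made after each recovery completes. A minimal sketch of that comparison, assuming a hypothetical helper name and plain integer counts rather than mdmon's real data structures:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not mdmon's actual code: returns true when the
 * array still has fewer working disks than it should, so the
 * degradation check (and then recovery) should run again.
 */
static bool recovery_check_needed(int current_disks, int target_disks)
{
    assert(current_disks >= 0 && target_disks >= 0);
    return current_disks != target_disks;
}
```

For a four-disk raid10 with two failed members, the check stays true after the first rebuild (3 of 4 disks) and only turns false once the second spare has been rebuilt as well.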
NeilBrown 7023e0b8ae mdmon: Stop keeping track of RAID0 (and LINEAR) arrays.
Tracking RAID0 arrays doesn't really work.  There is no need,
and there are some sysfs files which won't exist when the array
appears and then won't be opened when the level is changed.

So simply ignore RAID0 and LINEAR arrays - don't add them when they
appear and if an array we are monitoring turns into one of these,
discard it promptly.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 17:23:17 +11:00
NeilBrown d998b738f5 mdmon: don't wait for O_EXCL when shutting down.
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
as that can block progress of --grow.

Only wait for O_EXCL if we received a signal.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 16:10:22 +11:00
NeilBrown de6a92199e Merge branch 'master' into devel-3.2 2011-03-14 18:49:57 +11:00
NeilBrown e40512fddb monitor: close recovery_fd when closing state_fd
These should be open or closed together.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:24:01 +11:00
Krzysztof Wojcik 3275e05ec1 FIX: Reset disk state if disk is missing
If we can't read the actual disk state, it should be initialised
to 0.
Otherwise it may hold an out-of-date value, resulting in a false action
later in the code (e.g. setting the disk to an improper state).

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:07:04 +11:00
Adam Kwolek 6d4225a131 FIX: Last checkpoint is not set
When reshape is finished, the monitor has to set the last checkpoint
to the array end to allow the metadata to finalize the reshape.
The metadata has to know whether the reshape has finished or is broken.
On reshape finish, metadata finalization is required.
When reshape is broken, the metadata must remain as is to allow
reshape restart from the checkpoint.

This can be resolved based on the reshape_position sysfs entry.
When it is equal to 'none', md has finished its work.
In that situation, move the checkpoint to the end of the array.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-03 17:34:27 +11:00
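
The rule described in the commit above can be sketched as a small decision function. This is an illustration only: the function name and parameters are hypothetical, and the string is assumed to be the text read from the reshape_position sysfs entry.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not mdmon's actual code.
 * reshape_position: text read from the sysfs entry, e.g. "none\n" or "4096\n".
 * Returns the checkpoint the monitor should report to the metadata handler.
 */
static unsigned long long final_checkpoint(const char *reshape_position,
                                           unsigned long long last_checkpoint,
                                           unsigned long long array_end)
{
    assert(reshape_position != NULL);
    /* 'none' means md has finished: move the checkpoint to the array end. */
    if (strncmp(reshape_position, "none", 4) == 0)
        return array_end;
    /* Otherwise the reshape was interrupted: keep the checkpoint as is. */
    return last_checkpoint;
}
```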
Krzysztof Wojcik 10ce18083d FIX: Reset disk state if disk is missing
If we can't read the actual disk state, it should be initialised
to 0.
Otherwise it may hold an out-of-date value, resulting in a false action
later in the code (e.g. setting the disk to an improper state).

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-26 10:47:52 +10:00
Adam Kwolek 2a9f840972 FIX: sync_completed == 0 causes reshape cancellation in metadata
md signals reshape completion (whole area or parts) by setting
sync_completed to 0.  This causes set_array_state() to roll back
metadata changes (super-intel.c:4977).  To avoid this, do not allow
last_checkpoint to be set to 0 if reshape is finished.

This was also the root cause of my previous fix for reshape
finalization, which I agreed earlier is not necessary.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-17 12:44:52 +11:00
Adam Kwolek 4867e068e3 Raid0: detect reshape on array start
When a raid0 array is taken over to raid4 for reshape, it should be possible to detect
that the array being reshaped is now monitored for metadata updates.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-06 19:27:25 +11:00
Adam Kwolek 57f8c76946 Detect level change
For level migration support, mdmon must be able to react to level changes.
It has to be able to change the configuration of an active array
and, when the array level changes to raid0, finish monitoring the array.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-06 19:17:29 +11:00
NeilBrown aad6f216a1 Handle checkpointing during reshape
We need to allow metadata to handle progress of reshape,
completion, and abort-before-start.

Include all those in ->set_array_state()

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 15:48:05 +11:00
NeilBrown 0f99b4bd73 mdmon: when a reshape is detected, add any newly added devices to the array.
When mdadm starts a reshape, it might add some devices to the array
first.  mdmon needs to notice the reshape starting and check for any
new devices.  If there are any, they need to be handed over for
monitoring.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 09:07:52 +11:00
Hawrylewicz Czarnowski, Przemyslaw 6f4cdfd927 fix: mdadm -Ss for external metadata doesn't stop container
Sometimes (~50%) mdadm -Ss cannot stop the container because mdmon opens its
device and does not close it before exit(). The period between open and release
of the handle is too long and md is not able to stop the device. Releasing the
handle before exit does not block md.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-07 21:12:38 +11:00
Dan Williams 4f0a7acc9a mdmon: record sync_completed directly to the metadata
When sync_action is idle mdmon takes the latest value of md/resync_start
or md/<dev>/recovery_start to record the resync/rebuild checkpoint in
the metadata.  However, now that mdmon is reading sync_completed there
is no longer a need to wait for, or force an idle event to take a
checkpoint.

Simply update the forward progress of ->last_checkpoint at every wakeup
event and force it to be recorded at least every 1/16th array-size
interval.  It may be recorded more frequently if a ->set_array_state()
event occurs.

This also cleans up some confusion in handling the dual-rebuild case.
If more than one spare has been activated the kernel starts the rebuild
at the lowest recovery offset, so we do not need to worry about
min_recovery_start().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-15 18:41:57 -07:00
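
The "at least every 1/16th array-size interval" rule from the commit above can be illustrated with a small predicate. This is a sketch under assumptions: the function and parameter names are hypothetical, and all values are taken to be in the same unit (sectors).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not mdmon's actual code: decide whether the
 * progress reported in sync_completed has moved far enough past the
 * last checkpoint written to the metadata to force a new record.
 */
static bool checkpoint_due(unsigned long long sync_completed,
                           unsigned long long last_recorded,
                           unsigned long long array_size)
{
    unsigned long long interval = array_size / 16;

    assert(array_size > 0);
    /* Only forward progress counts. */
    if (sync_completed <= last_recorded)
        return false;
    return sync_completed - last_recorded >= interval;
}
```

With a 1600-sector array the interval is 100 sectors, so progress from 0 to 99 does not force a record while progress to 100 does. A ->set_array_state() event may still record the checkpoint earlier, as the commit message notes.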
Dan Williams 484240d8a3 mdmon: periodically checkpoint recovery
The kernel updates and notifies md/sync_completed when it is time to
take a checkpoint.  When this occurs (at 1/16 array size intervals)
write 'idle' to md/sync_action to have the current recovery position
updated in recovery_start and resync_start.

Requires the metadata handler to reset ->last_checkpoint when it has
determined that recovery has ended.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-05-14 17:42:49 -07:00
NeilBrown fa716c83c5 mdmon: insist on creating .pid file at startup.
Now that we don't "mdadm --takeover" until /var/run is writable
there is no need to continually try to create files in there.

So only create these files at startup and fail if they cannot be
made.  This means that to start an array with externally managed
metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be
writable.  To 'takeover' from a previous mdmon instance, /var/run
must be writable.

This means we don't need to worry about SIGHUP (which was once used to
tell us it was time to create .pid) and SIGALRM.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-08 17:26:18 +11:00
Dan Williams b7528a20cc Introduce MaxSector
Replace occurrences of ~0ULL to make it clear we are talking about maximal
resync/recovery position.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 10:23:26 -07:00
Dan Williams e1516be1db Add scaffolding for handling md/dev-XXX/recovery_start
Prepare the code to handle saving a recovery checkpoint.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 10:06:14 -07:00
Dan Williams b7941fd68d mdmon: cleanup resync_start
We don't need to sprinkle reads of this attribute all over the place,
just once at the entry of read_and_act().  Also, the mdinfo structure
for the array already has a 'resync_start' member, so just reuse that.
Finally, rename get_resync_start() to read_resync_start to make it
consistent with the other sysfs accessors in monitor.c.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-14 12:57:55 -07:00
NeilBrown e736b62389 Update copyright dates and remove references to @cse.unsw.edu.au
Also removed 'paper' addresses.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-02 14:35:45 +10:00
NeilBrown 2800528713 Wait for POLLPRI on /proc or /sys files.
From 2.6.30, /proc/mounts and various /sys files will
probably always return 'readable' to select, so we will need
to wait on POLLPRI to get the 'new data is available' signal.

When using select, this corresponds to an 'exception', so
adjust calls to select accordingly.
In one case we sometimes wait on a socket and sometimes on
/proc/mounts, so we need to test which.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-14 14:59:24 +10:00
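
Waiting for the 'exception' condition with select(), as the commit above describes, might look like the following. This is a minimal sketch, not the code from the commit: the function name is hypothetical, and the seek-to-start before re-reading reflects the usual way of picking up a changed sysfs value.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until the kernel flags new data on a
 * /proc or /sys file descriptor, then re-read it from the start.
 * Returns the number of bytes read, 0 on timeout (or empty file),
 * or -1 on error.
 */
static ssize_t wait_and_reread(int fd, char *buf, size_t len, int timeout_sec)
{
    fd_set exceptfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int rv;

    assert(buf != NULL);
    FD_ZERO(&exceptfds);
    FD_SET(fd, &exceptfds);

    /* POLLPRI is reported through the exception set, not the read set. */
    rv = select(fd + 1, NULL, NULL, &exceptfds, &tv);
    if (rv <= 0)
        return rv;

    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, len);
}
```

A socket, by contrast, would still go in the read set, which is why the commit has to test which kind of descriptor it is waiting on.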
Dan Williams 7e7fffc402 mdmon: fix resync completion detection
Starting with 2.6.30 the md/resync_start attribute will no longer return
a nonsensical number when resync is complete; instead it now returns
'none'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-04-12 00:58:28 -07:00
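
Handling the new 'none' value when parsing md/resync_start could be sketched as below. The function name and the RESYNC_COMPLETE constant are hypothetical; the constant simply stands for "the largest possible position", which is one way to represent a finished resync.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker for "resync complete": the maximal position. */
#define RESYNC_COMPLETE (~0ULL)

/*
 * Hypothetical parser, not mdmon's actual code.
 * text: contents of md/resync_start, e.g. "none\n" or "2048\n".
 */
static unsigned long long parse_resync_start(const char *text)
{
    assert(text != NULL);
    if (strncmp(text, "none", 4) == 0)
        return RESYNC_COMPLETE;
    return strtoull(text, NULL, 10);
}
```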
Dan Williams 140d3685fb mdmon: fix missed 'clean' event
mdmon may miss events because it re-reads state after read_and_act.  The
additional read is used to determine dirty status before allowing a
sigterm to proceed.  Since read_and_act is in the best position to
determine 'dirty' status and its return value is not used, modify it to
return true if the array is dirty.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-02-24 18:45:57 -07:00
NeilBrown e8a70c8958 mdmon: pass symbolic name to mdmon instead of device name.
Now that names in /dev are usually created (eventually) by udev,
it isn't really safe to rely on finding a name in /dev to pass to
mdmon to identify which array to monitor.
And it isn't really necessary to have a name in /dev.
So just pass the symbolic name, e.g. md127 or md123.

Change util.c to pass that name, and change mdmon to process the
name sensibly.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-20 14:51:42 +11:00
Dan Williams a54d52625a update copyright headers
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-28 10:55:29 -07:00
Dan Williams 6144ed4414 mdmon: terminate clean
We generally don't want mdmon to be terminated, but if a SIGTERM gets
through try to leave the monitored arrays in a clean state, block
attempts to mark the array dirty, and stop servicing the socket.

When we are killed by SIGTERM, don't remove the pidfile; let that be
cleaned up by the next monitor.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:43:57 -07:00
Dan Williams 3d2c4fc7b6 trivial warn_unused_result squashing
Made the mistake of recompiling the F9 mdadm rpm which has a patch to
remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots
of errors:

config.c:568: warning: ignoring return value of asprintf
Assemble.c:411: warning: ignoring return value of asprintf
Assemble.c:413: warning: ignoring return value of asprintf
super0.c:549: warning: ignoring return value of posix_memalign
super0.c:742: warning: ignoring return value of posix_memalign
super0.c:812: warning: ignoring return value of posix_memalign
super1.c:692: warning: ignoring return value of posix_memalign
super1.c:1039: warning: ignoring return value of posix_memalign
super1.c:1155: warning: ignoring return value of posix_memalign
super-ddf.c:508: warning: ignoring return value of posix_memalign
super-ddf.c:645: warning: ignoring return value of posix_memalign
super-ddf.c:696: warning: ignoring return value of posix_memalign
super-ddf.c:715: warning: ignoring return value of posix_memalign
super-ddf.c:1476: warning: ignoring return value of posix_memalign
super-ddf.c:1603: warning: ignoring return value of posix_memalign
super-ddf.c:1614: warning: ignoring return value of posix_memalign
super-ddf.c:1842: warning: ignoring return value of posix_memalign
super-ddf.c:2013: warning: ignoring return value of posix_memalign
super-ddf.c:2140: warning: ignoring return value of write
super-ddf.c:2143: warning: ignoring return value of write
super-ddf.c:2147: warning: ignoring return value of write
super-ddf.c:2150: warning: ignoring return value of write
super-ddf.c:2162: warning: ignoring return value of write
super-ddf.c:2169: warning: ignoring return value of write
super-ddf.c:2172: warning: ignoring return value of write
super-ddf.c:2176: warning: ignoring return value of write
super-ddf.c:2181: warning: ignoring return value of write
super-ddf.c:2686: warning: ignoring return value of posix_memalign
super-ddf.c:2690: warning: ignoring return value of write
super-ddf.c:3070: warning: ignoring return value of posix_memalign
super-ddf.c:3254: warning: ignoring return value of posix_memalign
bitmap.c:128: warning: ignoring return value of posix_memalign
mdmon.c:94: warning: ignoring return value of write
mdmon.c:221: warning: ignoring return value of pipe
mdmon.c:327: warning: ignoring return value of write
mdmon.c:330: warning: ignoring return value of chdir
mdmon.c:335: warning: ignoring return value of dup
monitor.c:415: warning: rv may be used uninitialized in this function

...some of these like the write() ones are not so trivial so save those
fixes for the next patch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:15:52 -07:00
Dan Williams 4065aa816a monitor: clean up some debug messages
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:43 -07:00
Dan Williams 1770662bca 'mdadm --wait-clean' wait for array to be marked clean
For use in distro shutdown scripts with a RAID root file system.
Returns immediately if the array is 'readonly', or not an externally
managed array.  It is up to the distro's scripts to make sure no new
writes hit the device after this returns 'true'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:42 -07:00
Dan Williams 0c0c44db5a monitor: don't mark dirty on resync complete
...instead look at array state to determine if the array is consistent

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:42 -07:00
Dan Williams d797a0621f monitor: mark clean on active-idle
This also handles the case where 'clean' is set directly.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:42 -07:00
NeilBrown e9dd159873 Allow an externally managed array to be marked readonly
If the metadata_version is
    -mdXXX/whatever
rather than
    /mdXXX/whatever

then the array is readonly and should be left alone by mdmon.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 17:55:15 +10:00
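
The readonly marker described in the commit above is a single leading character, so the test is short. A sketch with a hypothetical function name; the optional "external:" prefix is an assumption about how the full sysfs metadata_version string is laid out and is not stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not mdmon's actual code.
 * Accepts either "-mdXXX/whatever" / "/mdXXX/whatever" or the same
 * text with an "external:" prefix (assumed sysfs form).
 * Returns true if the array is marked readonly and should be left alone.
 */
static bool external_array_is_readonly(const char *metadata_version)
{
    static const char prefix[] = "external:";

    assert(metadata_version != NULL);
    if (strncmp(metadata_version, prefix, strlen(prefix)) == 0)
        metadata_version += strlen(prefix);
    return metadata_version[0] == '-';
}
```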
NeilBrown 01f157d74a Extra option for set_array_state: you choose dirty or clean.
When we first start an array, it might be good to start recovery
straight away.  That requires setting the array to 'dirty', but
only the metadata handler can know if that is required or not.
So have a third possible 'consistent' option to set_array_state.
Either 'no' or 'yes' or 'you choose'.

Return value indicates what was chosen.

'1' (no) should be chosen unless there is a good reason.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 14:54:55 +10:00
Dan Williams 9296754385 mdmon: handle failures versus readauto arrays
Transition readauto arrays to active before failing drives.

Hmm... why do we keep reblocking / renotifying in the readonly case?
Need to bottom out on this, but not right now.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-08-15 10:58:43 -07:00
Dan Williams 272906ef49 mdmon: use activate spare for re-add
Disks that are not in-sync or failed are not assembled into member
arrays by mdadm.  Teach mdmon to resolve this situation by checking for
spares at start.  imsm_activate_spare() is updated to prefer devices
that can be re-added versus new spares.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-08-12 02:25:46 -07:00
Dan Williams 5802a8118e imsm: handle degraded->normal transitions in set_disk
Removes the need for the call to ->set_array_state when sync_action
transitions from 'recover' to 'idle'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-24 17:26:23 -07:00
NeilBrown 33af8567de monitor: call get_resync_start on array shutdown.
If the array is shut down as soon as resync finishes, we might not
notice the resync finish.  So on array shutdown, check for the current
resync position.

Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:26 +10:00
NeilBrown 1eb252b848 mdmon: ping will wait for manage_mon to catch up.
When a 'ping' (empty message) is sent to mdmon, we wait for
'monitor' to do a full loop to make sure it has caught up
with anything that needs doing.
This allows synchronisation between mdadm and mdmon.

Maybe monitor should signal managemon rather than managemon polling...

Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:06 +10:00
Neil Brown 103f2410ec Make sure resync_start is initialised properly and maintained properly
Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:04 +10:00
Dan Williams 00e021427e mdmon: close possibility of re-marking the metadata dirty on shutdown
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-14 14:59:39 -07:00
Dan Williams 0a6bdbee8d mdmon: notify metadata of recovery completion
Array may no longer be degraded.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-14 14:59:39 -07:00
Neil Brown 207aac36d5 Make sure we remove pid file in monitor before manager exits. 2008-07-12 20:27:42 +10:00
Neil Brown a35c070bcd Remove some noisy printfs. 2008-07-12 20:27:41 +10:00
Neil Brown bfa44e2e7a Revise message passing code.
More here
2008-07-12 20:27:40 +10:00
Neil Brown 4d43913ce0 Remove mgr_pipe for communicating from manage to monitor.
Data is being passed in shared memory, so the pipe is only being
used as a wakeup.  This can more easily be done with a thread-signal.
2008-07-12 20:27:40 +10:00
Neil Brown 2f64e61a50 Remove mon_pipe for communicating from monitor to manager
The returned value was never used, and we don't really want
this return path anyway as writing to a pipe could conceivably
block, and the monitor must not block.
2008-07-12 20:27:40 +10:00
Neil Brown f94d52f43e Handle device removal from container
This really should be done in mdadm, not mdmon.
We ensure the device won't be suddenly committed as a hot-spare
using O_EXCL, then check the 'holders' sysfs directory
to make sure it is only in use once.
2008-07-12 20:27:40 +10:00
Dan Williams 4e6e574a3e mdmon: add debug print statements for profiling mdmon
for development only as console output can block leading to monitor deadlocks
in low mem situations

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-06-16 15:50:07 -07:00
Neil Brown 7e1432fb14 Add DDF code for activate_spare
Plus various bug fixes etc.
2008-06-12 10:13:32 +10:00