Commit Graph

62 Commits

Author SHA1 Message Date
Dan Williams d19e3cfb66 Merge branch 'fixes' into for-neil 2010-07-01 17:36:11 -07:00
Dan Williams b526e52dc7 Always assume SKIP_GONE_DEVS behaviour and kill the flag
...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS)

A null pointer dereference in Incremental.c can be triggered by
replugging a disk while the old name is in use.  When mdadm -I is called
on the new disk we fail the call to sysfs_read().  I audited all the
locations that use GET_DEVS and it appears they can tolerate missing a
drive.  So just make SKIP_GONE_DEVS the default behaviour.

Also fix up remaining unchecked usages of the sysfs_read() return value.

Reported-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-16 17:26:04 -07:00
Dan Williams 484240d8a3 mdmon: periodically checkpoint recovery
The kernel updates and notifies md/sync_completed when it is time to
take a checkpoint.  When this occurs (at 1/16 array size intervals)
write 'idle' to md/sync_action to have the current recovery position
updated in recovery_start and resync_start.

Requires the metadata handler to reset ->last_checkpoint when it has
determined that recovery has ended.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-05-14 17:42:49 -07:00
Dan Williams 63b4aae33e mdmon: fix missing open of md/<dev>/recovery_start
When activating a spare we neglect to open recovery_start and as such do
not see checkpoint events.  Move disk initialization to common routine
to mitigate recurrence.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-04-29 10:50:29 -07:00
NeilBrown fa716c83c5 mdmon: insist on creating .pid file at startup.
Now that we don't "mdadm --takeover" until /var/run is writable
there is no need to continually try to create files in there.

So only create these files at startup and fail if they cannot be
made.  This means that to start an array with externally managed
metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be
writable.  To 'takeover' from a previous mdmon instance, /var/run
must be writable.

This means we don't need to worry about SIGHUP (which was once used to
tell us it was time to create .pid) and SIGALRM.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-08 17:26:18 +11:00
NeilBrown 58a4ba2a6b mdmon: don't monitor /proc/mounts to decide when to create .pid file.
Monitoring /proc/mounts and creating a .pid file as soon as /var/run
is writable is racy.  Most distros clean all non-directories from
/var/run early in boot and if mdmon races with this it could
lose the files as soon as they are created.

Instead require that "mdmon --takeover" be run after /var is writable.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-08 17:26:18 +11:00
NeilBrown 5d4d1b26d3 mdmon: allow pid to be stored in different directory.
/var/run probably doesn't persist from early boot.
So if necessary, store in in /lib/init/rw or somewhere else
that does persist.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-04 16:47:28 +11:00
NeilBrown 688a1e5b07 mdmon: don't mkdir /var/run
Creating /var/run in mdmon is really not justifiable.

If /var/run doesn't exist, then it is either deliberate and it should
be left that way to make sure the mapfile gets created in /dev, or
it is a configuration error and not our problem to fix.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-04 16:37:20 +11:00
Dan Williams 2904b26f05 Support external metadata recovery-resume
Minimal changes needed to permit reassembling partially recovered
external metadata arrays.  The biggest logical change is that
->container_content() can now surface partially rebuilt members rather
than omitting them from the disk list.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 12:51:57 -07:00
Dan Williams d23534e464 Teach sysfs_add_disk() callers to use ->recovery_start versus 'insync' parameter
Also fixup 'in_sync' versus 'insync' typo.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 11:26:21 -07:00
Dan Williams e1516be1db Add scaffolding for handling md/dev-XXX/recovery_start
Prepare the code to handle saving a recovery checkpoint.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 10:06:14 -07:00
Dan Williams b7941fd68d mdmon: cleanup resync_start
We don't need to sprinkle reads of this attribute all over the place,
just once at the entry of read_and_act().  Also, the mdinfo structure
for the array already has a 'resync_start' member, so just reuse that.
Finally, rename get_resync_start() to read_resync_start to make it
consistent with the other sysfs accessors in monitor.c.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-14 12:57:55 -07:00
Dan Williams 071cfc4258 mdmon: cleanup manage_member() leak
free() the results of activate_spare().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-12 14:10:01 -07:00
Dan Williams 96a8270d46 mdmon: avoid writes in the startup path for mdmon on root arrays
When killing a previous monitor be careful not to cause writes to the
filesystem until the reads necessary to get the monitor operational have
completed.

The code is already prepared for errors creating the pid and socket
files, so simply defer creation of these files until after the first
call to manage().

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-10-13 17:41:57 -07:00
NeilBrown e736b62389 Update copyright dates and remove references to @cse.unsw.edu.au
Also removed 'paper' addresses.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-02 14:35:45 +10:00
NeilBrown 462906cdee incremental_container: preserve 'in_sync' flag when adding to existing array.
When building container members with -IR, we need to ensure that
devices added to an active array preserve the 'in_sync' status so they
don't needlessly get rebuilt.

So allow sysfs_add_disk to do this (only works in kernels since
2.6.30) and pass the relevant flag down.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-14 10:19:02 +10:00
NeilBrown 661dce3617 mdmon: allow incremental assembly of containers.
If mdmon sees a device added to a container, it should assume it is
a new spare.  It could be a part of the array that just hadn't been
assembled yet.  So check first.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-03-10 16:28:22 +11:00
Dan Williams 04a8ac089c mdmon: record added disks
Prevent duplicate disks from being sent to the monitor thread.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-02-24 18:45:57 -07:00
Dan Williams 7da80e6faa mdmon: fix removed disk handling
Use SKIP_GONE_DEVS when reading the container, and correct some confused
logic in manage_new().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-02-24 18:45:57 -07:00
Dan Williams a54d52625a update copyright headers
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-28 10:55:29 -07:00
Dan Williams 883a6142e6 mdmon: wait after trying to kill
Now that mdmon handles sigterm if another monitor wants to take over it
should wait until all managed arrays are clean.  So make WaitClean()
available to mdmon and teach try_kill_monitor() to wait on each subarray
in the container.

...since we may be communicating with a dieing process, we need to
block SIGPIPE earlier.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:43:57 -07:00
Dan Williams 6144ed4414 mdmon: terminate clean
We generally don't want mdmon to be terminated, but if a SIGTERM gets
through try to leave the monitored arrays in a clean state, block
attempts to mark the array dirty, and stop servicing the socket.

When we are killed by sigterm don't remove the pidfile let that be
cleaned up by the next monitor.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:43:57 -07:00
Dan Williams 695154b2e7 mdmon: periodically retry to create the socket
If initial socket creation fails, EROFS, set a periodic alarm to wake up
the manager and retry.  Include a kernel patch that will wake us up if
the mount flags are changed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:15:52 -07:00
NeilBrown 7801ac2092 Factor out add-disk code
The variety of approaches to 'add_disk' are factored out into
a separate function, and Incremental mode benefits by being
closer to supporting the assembly of containers.

Also remove the adding-to-array-data-structure out of sysfs_add_disk
and into add_disk.

And add some tests for --incremental mode to make sure we don't break it.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-09-18 15:13:32 +10:00
Dan Williams 295646b3d5 mdmon: recreate socket/pid file on SIGHUP
Allow mdmon to start while /var/run/mdadm is readonly.  Later a SIGHUP
can trigger mdmon to drop its pid and socket once /var/run/mdadm is
writable.  Of course one needs the pid to send a HUP, that can be stored
in a distribution specific rw-init directory... For now, rely on a
killall -HUP mdmon to get the files dumped.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:43 -07:00
Dan Williams 313a4a82f1 ping_manager() to prevent 'add' before 'remove' completes
It is currently possible to remove a device and re-add it without the
manager noticing, i.e. without detecting a mdstat->devcnt
container->devcnt mismatch.  Introduce ping_manager() to arrange for
mdmon to run manage_container() prior to mdadm dropping the exclusive
open() on the container.  Despite these precautions sysfs_read() may
still fail.  If this happens invalidate container->devcnt to ensure
manage_container() runs at the next event.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:43 -07:00
Dan Williams 93f7cacab3 mdmon: resume rebuild
If we started a degraded array that was previously rebuilding we may
have enough information to resume the rebuild without a trip through the
monitor.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:43 -07:00
NeilBrown e9dd159873 Allow an externally managed array to be marked readonly
If the metadata_version is
    -mdXXX/whatever
rather than
    /mdXXX/whatever

then the array is readonly and should be left alone by mdmon.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 17:55:15 +10:00
NeilBrown 3c558363a1 Factor out test for subarray version string.
We are about to change the syntax of the version string
for 'subarray's.  So factor out the test into a single function.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 17:55:15 +10:00
Dan Williams 43dad3d6fb mdadm: add device to a container
Adding a device updates the container and then mdmon takes action upon
noticing a change in devices.  This reuses the container version of
add_to_super to create a new record for the device.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 17:19:51 +10:00
Dan Williams 7bc1962f8c mdmon: remove devices from container
Once the monitor thread has kicked a drive from all managed arrays mdadm
-r is permitted.  We are guaranteed that the drive is marked failed at
this point, so allow the drive to be re-added as a spare.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-08-19 14:55:12 +10:00
NeilBrown 3c00ffbe98 Make metadata updates from manage to monitor 'synchronous'
A metadata update may modify the data structure of the metadata
including freeing things, so it is not safe of the manager to touch
the metadata while an update is pending in the monitor.
So When an update has been submitted, don't do anything else in the
manager until it is complete.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 14:55:03 +10:00
Dan Williams f1d267661d mdmon: allow degraded arrays to be monitored
manage_new is too strict in the face of failed devices.  Teach it to
monitor degraded arrays.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-08-15 10:58:43 -07:00
Dan Williams 63d7cc784b mdmon: use 'recover' instead of 'repair' when activating a spare
Repair sets MD_RECOVERY_REQUESTED in md which may not result in the
spare device being recovered.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-08-07 11:54:09 -07:00
Dan Williams 836759d561 mdmon: ignore inactive arrays and other manage_new() cleanups
While mdadm is constructing an array mdmon may see an intermediate state
(some disks not yet added / redundancy attributes like sync_action not
available).  Waiting for mdstat->active == true ensures that the array
is ready to be handled.  This fixes a bug in create array via mdmon
update whereby failures are not detected in the new array.

Introduce aa_ready() to catch cases where the active_array is not
correctly initialized.  Barring a kernel bug this should never trigger,
nonetheless it precludes a class of bugs like the one mentioned above
from triggering.

Cleanup the exit paths and only call replace_array when the new array is
ready to be inserted into container->arrays.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-08-04 16:48:27 -07:00
NeilBrown 1eb252b848 mdmon: ping will wait for manage_mon to catch up.
When a 'ping' (empty message) is sent to mdmon, we wait for
'monitor' to do a full loop to make sure it has caught up
with anything that needs doing.
This allows synchronisation between mdadm and mdmon.

Maybe monitor should signal managemon rather than managemon polling...

Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:06 +10:00
Neil Brown 103f2410ec Make sure resync_start is initialised properly and maintained properly
Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:04 +10:00
Dan Williams 272bcc48d1 mdmon: initialize component_size in manage_new
When we go to activate a spare for an array we expect ->info.component_size
is valid.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-14 14:59:39 -07:00
Dan Williams 00451d9874 managemon: don't treat sysfs_add_disk as boolean function
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-14 14:10:02 -07:00
Neil Brown 48561b0142 Improve shutdown for container-based arrays.
1/ close a race where multiple arrays disappear at once
   and monitor isn't woken up to find out that the last one
   has gone.
2/ "mdadm -Ss" needs to pause briefly for mdmon to exit.
2008-07-12 20:27:42 +10:00
Neil Brown edd8d13c02 Create arrays via metadata-update
Support creating arrays inside an active ddf container by
sending a metadata update over a pipe to mdmon.
2008-07-12 20:27:40 +10:00
Neil Brown bfa44e2e7a Revise message passing code.
More here
2008-07-12 20:27:40 +10:00
Neil Brown 4d43913ce0 Remove mgr_pipe for communicating from manage to monitor.
Data is being passed in shared memory, so the pipe is only being
use as a wakeup.  This can more easily be done with a thread-signal.
2008-07-12 20:27:40 +10:00
Neil Brown 2f64e61a50 Remove mon_pipe for communicating from monitor to manager
The returned value was never used, and we don't really want
this return path anyway as writing to a pipe could conceivably
block, and the monitor must not block.
2008-07-12 20:27:40 +10:00
Neil Brown f94d52f43e Handle device removal from container
This really should be done in mdadm, not mdmon.
We ensure the device won't be suddenly commited as a hot-spare
using O_EXCL, then check the 'holders' sysfs directory
to make sure it is only in use once.
2008-07-12 20:27:40 +10:00
Neil Brown 904c1ef77b Fix freeing of updates that have been handled by monitor.
Yes, we do want to free the buf, and the space too if it is still
there.
2008-07-12 20:27:33 +10:00
Dan Williams 4e6e574a3e mdmon: add debug print statements for profiling mdmon
for development only as console output can block leading to monitor deadlocks
in low mem situations

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-06-16 15:50:07 -07:00
Neil Brown 7e1432fb14 Add DDF code for activate_spare
Plus various bug fixes etc.
2008-06-12 10:13:32 +10:00
Neil Brown 6c3fb95c44 Support adding a spare to a degraded array.
When signalled by the monitor, the manager will find spares and
add them to the array and initiate a recovery.
2008-06-12 10:13:29 +10:00
Neil Brown 2e735d1982 Allow passing metadata update to the monitor.
Code in manager can now just call queue_metadata_update with a
(freeable) buf holding the update, and it will get passed to the
monitor and written out.
2008-06-12 10:13:23 +10:00