The action we are waiting for may not be complete until the monitor has
had a chance to take action on the result.
The following script can now remove the device on the first attempt,
versus a few attempts with the original Wait():
#!/bin/bash
#export MDADM_NO_MDMON=1
export IMSM_DEVNAME_AS_SERIAL=1
./mdadm -Ss
./mdadm --zero-superblock /dev/loop[0-3]
echo 2 > /proc/sys/dev/raid/speed_limit_max
./mdadm --create /dev/imsm /dev/loop[0-3] -n 4 -e imsm -a md
./mdadm --create /dev/md/r1 /dev/loop[0-3] -n 4 -l 5 --force -a mdp
./mdadm --fail /dev/md/r1 /dev/loop3
./mdadm --wait /dev/md/r1
x=0
while ! ./mdadm --remove /dev/imsm /dev/loop3 > /dev/null 2>&1
do
x=$((x+1))
done
echo "removed after $x attempts"
./mdadm --add /dev/imsm /dev/loop3
Include 2 small cleanups:
* remove the almost open coded fd2devnum() in Wait() by introducing a
new utility routine stat2devnum()
* teach connect_monitor() to parse the container device from a subarray
string
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If the metadata_version is
-mdXXX/whatever
rather than
/mdXXX/whatever
then the array is readonly and should be left alone by mdmon.
Signed-off-by: NeilBrown <neilb@suse.de>
We are about to change the syntax of the version string
for 'subarray's. So factor out the test into a single function.
Signed-off-by: NeilBrown <neilb@suse.de>
When we first start an array, it might be good to start recovery
straight away. That requires setting the array to 'dirty', but
only the metadata handler can know if that is required or not.
So have a third possible 'consistent' option to set_array_state.
Either 'no' or 'yes' or 'you choose'.
Return value indicates what was chosen.
'1' (no) should be chosen unless there is a good reason.
Signed-off-by: NeilBrown <neilb@suse.de>
Transition readauto arrays to active before failing drives.
Hmm... why do we keep reblocking / renotifying in the readonly case?
Need to bottom out on this, but not right now.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cmd_filter patch merged for 2.6.27 broke retrieving the serial
number via an ioctl to /dev/sgN. In debugging this I found that other
utilities like sdparm simply run the ioctl on /dev/sdX. So just convert
to that for protection in numbers, but scream on the mailing list for
the inconvenience grr...
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Using buffered IO risks non-atomic updates to parts of the
device that we don't actually want to write to. This isn't in
general safe.
So switch to O_DIRECT for all that IO and make sure we have
properly aligned buffers.
The returned value was never used, and we don't really want
this return path anyway as writing to a pipe could conceivably
block, and the monitor must not block.
This really should be done in mdadm, not mdmon.
We ensure the device won't be suddenly commited as a hot-spare
using O_EXCL, then check the 'holders' sysfs directory
to make sure it is only in use once.
When loading the metadata for a subarray (super_by_fd), we set
->subarray to be the name read from md/metadata_version so that
getinfo_super can return info about the correct array.
With this we can differentiate between a container and
an array within the container by looking at ->subarray[0].
Only one superswitch should be externally visible for each
general type. Others which handle different flavours
(e.g. container/data-array) should be internal only.
Code in manager can now just call queue_metadata_update with a
(freeable) buf holding the update, and it will get passed to the
monitor and written out.
'container_member' isn't really a well defined concept.
Each metadata might enumerate members differently, so just
let each format /mdX/YYYY as appropriate.
I want the metadata handler to have more control over the 'version',
particularly for arrays which are members of containers.
So discard st->text_version and instead use info->text_version
which getinfo_super can initialise.
mark_dirty is just a special case of mark_clean - with sync_pos == 0.
mark_sync is not required. We don't modify the metadata when sync
finishes. Only when the array becomes non-writeable at which point we
use mark_clean to record how far the resync progressed.
From: Dan Williams <dan.j.williams@intel.com>
Each md_message encapsulates a single command. A command includes an 'action'
member which describes what if any data comes after the action. Communication
with the monitor involves updating the active_cmd pointer and then writing to
mgr_pipe. Pass/fail status is returned via mon_pipe.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
Added curr_state as a parameter to set_disk. Handlers look at this to
record components failures, and set global 'degraded' or 'failed'
status.
When reading the state as faulty:
1/ mark the disk failed in the metadata
2/ write '-blocked' to the rdev state to allow the kernel's failure
mechanism to advance
3/ the kernel will take away the drive's role in remove_and_add_spares()
4/ once the disk no longer has a role writing 'remove' to the rdev state
will get the disk out of array.
There is a window after writing '-blocked' where the kernel will return
-EBUSY to remove requests. We rely on the fact that the disk will
continue to show faulty so we lazily wait until the kernel is ready to
remove the disk. If the manager thread needs to get the disk out of the
way it can ping the monitor and wait, just like the replace_array()
case.
[buglet fix: swap the parameters of attr_match in read_dev_state]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
1/ Block attempts to add/remove devices from container members
2/ Forward add/remove requests to containers
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
Metadata handlers set mdinfo.resync_start depending on the state of the
array. By default mdadm assumes the array is dirty and needs a full
resync.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
From: Dan Williams <dan.j.williams@intel.com>
The following now work:
--examine
--examine --brief
Signed-off-by: Dan Williams <dan.j.williams@intel.com>