Allow per-device "failfast" flag to be set when creating an
array or adding devices to an array.
When re-adding a device which had the failfast flag, it can be removed
using --nofailfast.
failfast status is printed in --detail and --examine output.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
A 'faulty' drive is being removed from a container after it has been
released by an array, however there is a race there. The drive is
released asynchronously by a monitor but sometimes it doesn't happen
before container checks it. It results in a container refusing to remove
a drive as it still seems to be a part of some array.
It seems 'ping_monitor' could be a solution here to assure monitor has
had a chance to process the events, however it doesn't resolve the
problem - sometimes an array has to request a release of the drive few
times (as the array is busy) and single 'ping_monitor' call is not
sufficient. As there is no way to query monitor progress, it forces us
to retry a check several times before an error is returned.
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
dev_st is only ever assigned if array->not_persistent == 0, so move
the second use of it into the same scope where the assignment is
made.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
sysfs_read() allocates and populates a struct mdinfo, however the code
forgot to free it again, before dropping the reference to the pointer.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Commit d180d2aa2a ("Manage: fix test for 'is array failed'.")
introduced a regression which would not allow to re-add new
drivers to a failed array.
Fixes: d180d2aa2a ("Manage: fix test for 'is array failed'.")
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Coly Li <colyli@suse.de>
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
sysfs_read() may return NULL, so we should check the validity of the
pointer before dereferencing it.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2.6.28+ kernels handle this themselves and issuing the event here can
cause a race.
Reported-by: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
This patch tries recreates missing/faulty journal in mdadm.
Example:
./mdadm --fail /dev/md1 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1
./mdadm --stop /dev/md1
mdadm: stopped /dev/md1
./mdadm -A --scan --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md/1 has been started with 15 drives.
./mdadm --add-journal /dev/md1 /dev/sdb2
mdadm: added /dev/sdb2
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
If sysfs node existed, we should try to write "re-add" to it.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
If it is a cluster raid, the disc.state need to be
changed accordingly when do re-add.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
When parse_cluster_confirm_arg return 0, it means the
arg are parsed successfully, so change !rv to rv.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
If the array is reshaping to more devices, then stopping
during that initial critical section is a bad idea.
So check for it and wait a bit.
Should probably handle final critical section of a reduction
too.
same-size reshape should be handled correctly already.
Signed-off-by: NeilBrown <neilb@suse.de>
A race can allow 'completed' to read as 2^63-1, which takes
a long time to count up to.
So guard against that possibility.
Signed-off-by: NeilBrown <neilb@suse.com>
A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable. So fix it to do that.
Signed-off-by: NeilBrown <neilb@suse.de>
These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.
Signed-off-by: NeilBrown <neilb@suse.de>
A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:
--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)
The node initiating the --add, has the disk state tagged with
MD_DISK_CLUSTER_ADD and the one confirming tag the disk with
MD_DISK_CANDIDATE.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
avail_size1 requires ->sb, so we must only call it if ->sb
was loaded.
If ->sb wasn't loaded, then we are only proceding on the basis that
the kernel might be able to work something out - we don't need to
do any tests on size.
Reported-by: Christoffer Hammarström <christoffer.hammarstrom@linuxgods.com>
Signed-off-by: NeilBrown <neilb@suse.de>
URL: https://bugs.debian.org/784874
We 'active_disks' does not count spares, so if array is rebuilding,
this will not necessarily find all devices, so may report an array
as failed when it isn't.
Counting up to nr_disks is better.
Signed-off-by: NeilBrown <neilb@suse.de>
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly one line,
them maybe it should be broken into multiple lines
for display too.
Only strings which contain a newline can be broken
into multiple lines:
"It is OK to\n"
"break this string\n"
Signed-off-by: NeilBrown <neilb@suse.de>
"--remove detached" and others stopped working a while
back when I refactored some code.
For 'remove' and 'fail', the device may not exist so
if it is "MM:mm", (e.g. added by "detached"), just parse
out the numbers.
Reported-by: Killian De Volder <killian.de.volder@megasoft.be>
Signed-off-by: NeilBrown <neilb@suse.de>
The only use 'struct stat stb' to get the 'rdev', and sometimes
we don't even use 'stat'.
So make 'rdev' a stand-alone variable, and only declare stb'
when we actually need it.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ enough_fd doesn't use avail_disks any more, so discard it.
2/ Manage_Add increments 'found' at the wrong place, so it can
waste time before calling enough().
Signed-off-by: NeilBrown <neilb@suse.de>
--add-spare is like --add, but a --re-add is never attempted.
So it is equivalent to two separate commands:
--zero-metadata
--add
Signed-off-by: NeilBrown <neilb@suse.de>
We really need to make sure assemble_container_content()
gets called to finished the assembly of these.
Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Current "mdadm --run /dev/mdX" will not handle external metadata
properly. mdmon won't be started etc.
So use the code from "mdadm -IRs" instead - that already does all
the right things.
Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
If we stop too soon after reshape starts (probably only during
testing), we can get confused by the status of the reshape.
If that might be happening - sleep a bit longer.
Also allow for reshape going unusually slowly (again, probably only
during testing).
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible for 'sync_completed' to be further ahead than
we deduced from 'reshape_position'. However we cannot read it while
the array is frozen, so it is hard to know.
Once that array is unfrozen, check and if sync_completed is ahead of
'sync_max', push 'sync_max' well ahead if 'sync_completed' so it
will all synchronise up properly.
Signed-off-by: NeilBrown <neilb@suse.de>
To be able to revert-reshape of raid4/5/6 which is changing
the number of devices, the reshape must has been stopped on a multiple
of the old and new stripe sizes.
The kernel only enforces the new stripe size multiple.
So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".
Signed-off-by: NeilBrown <neilb@suse.de>
Manage_runstop has an open-coded version of use_udev() which is no
longer correct. So make it use use_udev() explicitly.
Signed-off-by: NeilBrown <neilb@suse.de>
A RAID10 array can have 'sets' of devices which are reported by
--detail.
They can now be collectively failed or removed.
Signed-off-by: NeilBrown <neilb@suse.de>
When stopping an mdmon array, at reshape might be being aborted
which inhibets O_EXCL. So if that is possible, call flush_mdmon
to make sure mdmon isn't still busy.
Reported-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
There are a number of fields which should not
be left uninitialised. e.g. attempt_re_add can get
confused if ->writemostly is not set correctly.
Signed-off-by: NeilBrown <neilb@suse.de>
When asked to incrementally-remove a device, try marking the array
read-auto first. That will delay recording the failure in the
metadata until it is really relevant.
This way, if the device are just unplugged when the array is not
really in use, the metadata will remain clean.
If marking the default as faulty fails because it is EBUSY, that
implies that the array would be failed without the device. As the
device has (presumably gone) - that means the array is dead. So try
to stop it. If that fails because it is in use, send a uevent to
report that it is gone. Hopefully whoever mounted it will now let go.
This means that if you plug in some devices and they are
auto-assembled, then unplugging them will auto-deassemble relatively
cleanly.
To be complete, we really need the kernel to disassemble the array
after the last close somehow. Maybe if a REMOVE has failed and a STOP
has failed and nothing else much has happened, it could safely stop
the array on last close.
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm /dev/mdXX --re-add faulty
will identify any faulty devices in the array, remove them, and
--re-add them.
Signed-off-by: NeilBrown <neilb@suse.de>
A recent change to improve error messages for subdev management broken
all use cases were device names like %d:%d were used.
Re-arrange the code again so we use dev_open first - which understands
those names - and then only try 'stat' if that failed.
The important thing is to base the 'Cannot find' message on the result
of 'stat', not on the result of 'open'.
Signed-off-by: NeilBrown <neilb@suse.de>