3027 Commits

Author SHA1 Message Date
NeilBrown bcbb92d4ee Remove some trailing white space
It looks ugly in my editor.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:26:30 +10:00
NeilBrown 52b6ccad34 Manage: fix no-op test in Manage_stop.
A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable.  So fix it to do that.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:16:59 +10:00
NeilBrown 9581efb1ae mdstat: discard 'dev' field, just use 'devnm'
These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:15:10 +10:00
NeilBrown caf9ac0ca4 Grow: fix typo in comment
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-18 15:51:45 +10:00
NeilBrown 56fcbcbb6f Assemble: ensure stripe_cache is big enough to handle new chunk size
If you reshape to a larger chunk size and then need to restart the
reshape, the stripe cache may be too small for the new chunk size,
which can cause problems.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-18 15:49:52 +10:00
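A rough sketch of the sizing check in 56fcbcbb6f. It assumes, as a rule of thumb and not as mdadm's exact formula, that the stripe cache must hold at least four stripes' worth of the larger of the old and new chunk sizes, and that each stripe_cache_size entry covers one 4 KiB page per device. The helper names are hypothetical.

```c
/* Assumed page size of one stripe-cache entry (hypothetical constant). */
#define STRIPE_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: minimum stripe_cache_size (in entries) needed
 * while reshaping between two chunk sizes, assuming four stripes of the
 * larger chunk must fit in the cache.
 */
static unsigned long min_stripe_cache(unsigned long old_chunk_bytes,
                                      unsigned long new_chunk_bytes)
{
    unsigned long chunk = old_chunk_bytes > new_chunk_bytes
                              ? old_chunk_bytes : new_chunk_bytes;
    return (chunk / STRIPE_PAGE_SIZE) * 4;
}

/* Value to write to stripe_cache_size, or 0 if the cache is already big enough. */
static unsigned long stripe_cache_to_set(unsigned long cur_size,
                                         unsigned long old_chunk_bytes,
                                         unsigned long new_chunk_bytes)
{
    unsigned long need = min_stripe_cache(old_chunk_bytes, new_chunk_bytes);
    return cur_size < need ? need : 0;
}
```

Growing from a 64 KiB to a 512 KiB chunk under these assumptions needs 512 entries, so a cache of 256 would be raised while a cache of 1024 would be left alone.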
NeilBrown 2a6493cfe1 Grow: fix a couple of typos.
Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-28 17:21:06 +10:00
NeilBrown 4a859abdc9 test: make 'check wait' more reliable.
'recover' etc. doesn't appear in /proc/mdstat immediately.
The "sync" thread must be started first.
But 'sync_action' shows it as soon as MD_RECOVERY_NEEDED is set
in the kernel.  So look there too.

Now maybe I can get rid of some of those silly 'sleep' calls.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-28 17:00:36 +10:00
NeilBrown 7d1dda2c55 tests/imsm-grow-template change 'wait' to 'check wait'
'wait' is a shell builtin that isn't doing anything useful.
It should be calling 'check wait' I think.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-28 16:51:23 +10:00
NeilBrown 8e7ddc5f50 Grow: fix problem with --grow --continue
If an array is being reshaped using backup space on a 'spare' device,
then
  mdadm --grow --continue
won't find it, because by the time it runs nothing looks like a spare
any more.  The spare has been added to the array, but has no data yet.

So allow reshape_prepare_fdlist to find a newly-incorporated spare and
report this so it can be used.

Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-28 16:43:15 +10:00
NeilBrown 20c993e2e5 tests: wait a bit longer for reshape to complete.
As the kernel now does less locking, 'check wait' doesn't
always wait long enough.  Add some pauses.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-25 16:59:19 +10:00
NeilBrown e0cc1c8d8b Grow: another attempt to fix stop-during-reshape race.
When the array is stopped during a critical section, we sometimes
erase the backup, which is bad.
This happens when 'completed' is zero.
This can happen easily when 'stop' freezes reshape.

So try to be more careful and check 'reshape_position'.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-25 16:33:45 +10:00
Andrew Burgess 4a984120ea Fix minor typo in mdadm manpage.
Apologies if this is the wrong mailing list for this patch.

This is a very small patch for the manual page for the mdadm utility.

Thanks,
Andrew

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-23 09:29:45 +10:00
Sergey Vidishev 1e08717f0b mdadm: monitor: fix nullptr dereference when get_md_name() returns NULL
Function add_new_arrays() expects get_md_name() to return a pointer
to the devname, but get_md_name() may also return NULL. So check the
pointer before using it in add_new_arrays().

Signed-off-by: Sergey Vidishev <sergeyv@yandex-team.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-20 13:16:09 +10:00
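The pattern fixed in 1e08717f0b, checking a lookup that may return NULL before dereferencing it, looks roughly like this. fake_get_md_name() is a hypothetical stand-in for get_md_name(), not its real implementation.

```c
#include <stdio.h>

/* Hypothetical stand-in for get_md_name(): returns NULL when no
 * device name can be found for the given minor number. */
static const char *fake_get_md_name(int minor)
{
    return minor == 0 ? "/dev/md0" : NULL;
}

/*
 * Copy the array's device name into 'buf', skipping arrays whose name
 * lookup failed instead of dereferencing NULL.
 * Returns 1 if the array was added, 0 if it was skipped.
 */
static int add_array_name(int minor, char *buf, size_t len)
{
    const char *name = fake_get_md_name(minor);

    if (name == NULL)
        return 0;               /* nothing usable: skip this array */
    snprintf(buf, len, "%s", name);
    return 1;
}
```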
NeilBrown dd0468af57 test: forcefully clean up old loop devices.
Sometimes these can get left around, and udev can be looking
at them at awkward times, so they don't disappear.
So be forceful.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-20 13:16:00 +10:00
NeilBrown 3ee556f8b6 Grow: be even more careful about handling a '0' completed value.
Some old kernels set 'completed' to '0' too soon.
But modern kernels don't.
And when 'mdadm --stop' freezes and resumes the grow,
'completed' goes back to zero briefly, which can confuse this
logic.
So only treat '0' as possibly wrong (from an old kernel) once
the reshape has gone idle.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 15:11:48 +10:00
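The rule in 3ee556f8b6 can be expressed as a small predicate. This is a hypothetical helper, and it assumes that "gone idle" is detected by sync_action reading "idle"; while 'mdadm --stop' has the reshape frozen, a transient 0 must not be "corrected".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical decision helper: treat completed == 0 as possibly bogus
 * (an old kernel reporting too soon) only once sync_action says "idle".
 * A 0 seen while sync_action is e.g. "frozen" is expected and trusted.
 */
static bool completed_zero_is_suspect(unsigned long long completed,
                                      const char *sync_action)
{
    return completed == 0 && strcmp(sync_action, "idle") == 0;
}
```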
NeilBrown 2c3e39ebf9 tests/07reshape5intr : retry if writing 'check' fails.
It can fail sometimes.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 15:09:08 +10:00
NeilBrown df995e3af3 tests/19raid6repair: don't flushbufs on non-existent array.
Doing so triggers an error.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 12:34:27 +10:00
NeilBrown e2a8e9dcf6 tests: wait for complete rebuild in integrity checks
'check wait' seems a bit racy now.
Wait for the array to be fully optimal before proceeding.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:27 +10:00
NeilBrown ada38ebbcb Grow: retry when writing 'reshape' to 'sync_action' is EBUSY.
EBUSY can be returned if something has recently happened
to cause md to want to check if recovery is needed, but hasn't
had a chance yet.

This can easily happen in testing.

So retry a few times in that case.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:25 +10:00
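A minimal sketch of the retry described in ada38ebbcb, assuming a hypothetical helper that writes a value such as "reshape" to a sysfs attribute like /sys/block/mdX/md/sync_action. The retry count and the 10 ms delay are illustrative choices, not mdadm's.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'val' to 'path', retrying up to 'tries'
 * times while the kernel answers EBUSY. Returns 0 on success or a
 * negative errno value.
 */
static int sysfs_write_retry(const char *path, const char *val, int tries)
{
    const struct timespec delay = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int err = EBUSY;

    for (int i = 0; i < tries; i++) {
        int fd = open(path, O_WRONLY);
        if (fd < 0)
            return -errno;
        ssize_t n = write(fd, val, strlen(val));
        err = (n < 0) ? errno : 0;   /* save errno before close() */
        close(fd);
        if (err != EBUSY)
            return -err;             /* 0 on success, else a real error */
        nanosleep(&delay, NULL);     /* give md a chance to run its recovery check */
    }
    return -err;
}
```

Only EBUSY is retried; any other error is returned immediately, since waiting would not help.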
NeilBrown 670fe20aa0 tests/05r6tor0: minor adjustments
1/ use correct data-offset for cmp - that has changed.
2/ flushbufs on the block device before reading to avoid cache issues

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:25 +10:00
NeilBrown f7f6c9f161 tests: 05r6tor0 - add some more waiting.
I don't really know why this is needed, but there is a delay
between the reshape finishing and the level/etc changing.
So add some sleeps.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:25 +10:00
NeilBrown 18d5cb9ee5 tests/imsm-grow-template: sleep a bit more.
The current sleep/wait doesn't seem long enough,
particularly when two arrays are being reshaped in the one
container.

So wait a bit more...

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:25 +10:00
NeilBrown e0184a0cd0 Grow: be more careful if array is stopped during critical section.
In that case, updating 'completed' to 'max_progress' is wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:25 +10:00
NeilBrown a5a6a7d9fa Grow: add missing space in message.
Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-15 11:07:25 +10:00
NeilBrown dd243f561f Grow: only warn about incompatible metadata when no fallback available.
We might be trying to set_new_data_offset() for RAID10, when it is
a necessary requirement, or for RAID5 where it is optional.
In the latter case, a message about metadata versions is not helpful.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-14 11:17:39 +10:00
NeilBrown 2609f33902 Manage: when re-adding, don't check avail size if ->sb cannot be found.
avail_size1 requires ->sb, so we must only call it if ->sb
was loaded.

If ->sb wasn't loaded, then we are only proceeding on the basis that
the kernel might be able to work something out - we don't need to
do any tests on size.

Reported-by: Christoffer Hammarström <christoffer.hammarstrom@linuxgods.com>
Signed-off-by: NeilBrown <neilb@suse.de>
URL: https://bugs.debian.org/784874
2015-05-13 14:08:41 +10:00
NeilBrown b638e7d440 tests: don't "dd" indefinitely.
This will trigger an error.  And now that errors are fatal....

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-13 13:24:33 +10:00
NeilBrown 7d63efc8d8 tests: ignore failure status from mdadm -IRs
This can report non-zero if there was nothing to do,
and that isn't really an error.
If the array doesn't get started, something else
will complain.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-13 13:11:02 +10:00
NeilBrown ec6db5ba71 Assemble: don't check for pre-existing array when updating uuid.
This is very much a corner case, but the self-tests tripped on it,
and it makes sense not to trust the uuid when it is being changed.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-13 12:41:48 +10:00
Martin Wilck b87fdf4e89 DDF: _write_super_to_disk: fix anchor header type
Since commit 30bee0201, the anchor is updated from the active
DDF header. This requires fixing the header type before the
anchor is written.

The LSI Software RAID code will reject DDF meta data with wrong
anchor type and will erase all meta data when it encounters
such a broken anchor. Thus starting Linux md once on a system
with LSI RAID BIOS may cause the meta data to get destroyed.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-13 10:33:35 +10:00
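The fix in b87fdf4e89 amounts to "copy the active header, then set the anchor type before writing". The struct below is a hypothetical, heavily simplified header; the type codes assume the SNIA DDF values (0x00 anchor, 0x01 primary, 0x02 secondary). A real header also carries a CRC that would have to be recomputed after the change.

```c
#include <stdint.h>
#include <string.h>

/* Header type codes (assumed values from the DDF specification). */
enum { DDF_HDR_ANCHOR = 0x00, DDF_HDR_PRIMARY = 0x01, DDF_HDR_SECONDARY = 0x02 };

/* Hypothetical, heavily simplified DDF header. */
struct ddf_header_sketch {
    uint32_t magic;
    uint8_t  type;
    /* ... remaining fields and CRC omitted ... */
};

/*
 * Build the anchor from the active header, then fix its type so the
 * anchor is never written out tagged as a primary or secondary header.
 */
static void make_anchor(struct ddf_header_sketch *anchor,
                        const struct ddf_header_sketch *active)
{
    memcpy(anchor, active, sizeof(*anchor));
    anchor->type = DDF_HDR_ANCHOR;
}
```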
NeilBrown 3c899cab4d tests: never fail if --wait fails.
"--wait" will return non-zero status if it didn't need to wait.
This is not a reason to fail a test.

So ignore the return status from those commands.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-07 17:00:57 +10:00
NeilBrown 42129b3f80 Add "Name" defines to some ancillary programs
All programs now need to declare their "Name".

Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: d56dd607ba ("Change way of printing name of a process")
2015-05-07 14:46:05 +10:00
NeilBrown d180d2aa2a Manage: fix test for 'is array failed'.
'active_disks' does not count spares, so if the array is rebuilding,
scanning only that many devices will not necessarily find all of them,
and may report an array as failed when it isn't.

Counting up to nr_disks is better.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-06 15:03:50 +10:00
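Why the scan limit in d180d2aa2a matters can be shown with a toy slot table. The slot states and helpers are hypothetical; 'max_missing' is the number of devices the level can lose (1 for RAID5).

```c
#include <stdbool.h>

/* Hypothetical per-slot state for this sketch. */
enum slot_state { SLOT_EMPTY, SLOT_ACTIVE, SLOT_SPARE, SLOT_FAULTY };

/* Count working (active) devices among the first 'limit' entries. */
static int count_working(const enum slot_state *slots, int limit)
{
    int working = 0;
    for (int i = 0; i < limit; i++)
        if (slots[i] == SLOT_ACTIVE)
            working++;
    return working;
}

/*
 * The array is failed when more devices are missing than the level
 * tolerates. 'scan_limit' should be nr_disks: stopping after
 * active_disks entries can miss working devices that come after
 * spare or faulty entries.
 */
static bool array_is_failed(const enum slot_state *slots, int scan_limit,
                            int raid_disks, int max_missing)
{
    return raid_disks - count_working(slots, scan_limit) > max_missing;
}
```

With entries {spare, active, faulty, active} for a 3-device RAID5, scanning only active_disks (2) entries finds one working device and wrongly reports failure, while scanning nr_disks (4) finds both.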
Pawel Baldysiak 72a4577704 IMSM: Count arrays per orom
Active arrays with IMSM metadata have so far been counted per HBA.
This is wrong now that a single OROM can be shared between multiple
controllers: counting per HBA allows more arrays to be created than
the OROM supports.
This patch changes the way arrays are counted, so the result is the
sum of arrays under every HBA supported by a specific OROM.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-09 09:06:23 +10:00
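The per-OROM counting in 72a4577704 can be sketched with a hypothetical table mapping each HBA to the OROM that governs it; the limit is then checked against the sum over all HBAs sharing that OROM.

```c
/* Hypothetical model: each HBA records which OROM governs it. */
struct hba_sketch {
    int orom_id;   /* OROM shared by this controller */
    int arrays;    /* IMSM arrays currently active on this HBA */
};

/* Sum the active arrays of every HBA that shares the given OROM. */
static int arrays_under_orom(const struct hba_sketch *hbas, int n, int orom_id)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        if (hbas[i].orom_id == orom_id)
            total += hbas[i].arrays;
    return total;
}

/* The OROM limit applies to the total, not to each HBA separately. */
static int can_create_array(const struct hba_sketch *hbas, int n,
                            int orom_id, int orom_max_arrays)
{
    return arrays_under_orom(hbas, n, orom_id) < orom_max_arrays;
}
```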
NeilBrown 87af7267bd Assemble/force: make it possible to "force" a new device in a reshape.
Normally we do not "force"-assemble devices which are in the
middle of recovery, as they are unlikely to have useful data.

However, when a reshape increases the number of devices,
the newly added devices appear to be recovering because they
do not have complete data on them yet, but then they aren't expected
to until the reshape completes.
So in this case, it can be appropriate to force-assemble them.

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 12:04:00 +10:00
NeilBrown c34fef774a Assemble: remove stray ':' from error message.
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 11:27:34 +10:00
NeilBrown 330d6900bb Assemble: allow a RAID4 to assemble easily when parity device is missing.
If the parity device of a RAID4 is missing, then there is no immediate
risk to data.  So it doesn't matter if the array is dirty or not.

This can be important when reshaping a RAID0, and is a much better
solution than that in the recently reverted commit
   b720636a58

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:39:02 +10:00
NeilBrown d316dba7c9 Revert "Assemble: support assembling of a RAID0 being reshaped."
This reverts commit b720636a58.

As it said, this was a hack.  It causes problems when trying to
--force assemble a RAID4.  There is a better way.

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:31:32 +10:00
NeilBrown ee466574f2 Assemble: fix "no uptodate device" message.
Since we introduced replacement devices, the 'i' used in
start_array() is twice the slot number.

So we need to adjust when printing.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:20:26 +10:00
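The index arithmetic behind ee466574f2, as a sketch. It assumes the per-device table in start_array() interleaves an original and a replacement entry for each slot, so index i maps to slot i / 2 and odd indices are replacements; the helper names are hypothetical.

```c
/* Slot number for an index into an interleaved original/replacement table. */
static int slot_of_index(int i)
{
    return i / 2;
}

/* Non-zero if the index refers to a replacement device. */
static int is_replacement_index(int i)
{
    return i & 1;
}
```

A message about the device at index 6 or 7 should therefore name slot 3, not 6 or 7.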
NeilBrown 04e27c2084 Monitor: use the "space protocol" for "Wrong-Level".
"Wrong-Level" is a reason, not a component device, so it should
start with a space to indicate this to alert().

Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:18:55 +10:00
NeilBrown b033913a3c Monitor: Obey "space protocol" when writing to syslog.
"alert" treats the "disc" arg differently if it starts with a space.

At least it does for sending email.  It doesn't for writing to syslog.

Make this consistent and obey the 'space protocol' when writing to
syslog.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:17:17 +10:00
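The "space protocol" from 04e27c2084 and b033913a3c can be illustrated with a hypothetical formatter used identically for email and syslog: a 'disc' argument that starts with a space is a reason, anything else non-empty is a component device. The message wording here is illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>

/* A 'disc' starting with a space carries a reason, not a device name. */
static bool disc_is_reason(const char *disc)
{
    return disc != NULL && disc[0] == ' ';
}

/* Hypothetical formatter applying the same rule for every output. */
static void format_event(char *buf, size_t len, const char *event,
                         const char *dev, const char *disc)
{
    if (disc == NULL || disc[0] == '\0')
        snprintf(buf, len, "%s event detected on md device %s", event, dev);
    else if (disc_is_reason(disc))
        snprintf(buf, len, "%s event detected on md device %s:%s",
                 event, dev, disc);
    else
        snprintf(buf, len,
                 "%s event detected on md device %s, component device %s",
                 event, dev, disc);
}
```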
NeilBrown 783bbc2b13 reshape: support raid5 grow on certain older kernels.
Kernels between
  c6563a8c38fde3c1c7fc925a v3.5-rc1~110^2~53
and
  b5254dd5fdd9abcacadb5101 v3.5-rc1~110^2~51

allow new_offset to be set, but don't then allow a RAID5
to be reshaped to change that offset.
Due to selective backports, this includes the SLES11-SP3 kernel.

It is quite easy to handle this case in mdadm, so we do.
Specifically: if the reshape with data-offset fails with EINVAL,
abort the data-offset change and try the "old" way.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-26 10:06:26 +11:00
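The fallback in 783bbc2b13 follows a simple control-flow pattern. This sketch passes the three steps in as hypothetical function pointers (each returning 0 or a negative errno) rather than modelling mdadm's real reshape code; the demo hooks exist only to exercise it.

```c
#include <errno.h>

typedef int (*reshape_fn)(void *ctx);

/*
 * Try the reshape that also changes data-offset; if the kernel rejects
 * it with EINVAL, abort the data-offset change and retry the "old" way.
 * Any other result (success or a different error) is returned as-is.
 */
static int reshape_with_fallback(void *ctx, reshape_fn with_offset,
                                 reshape_fn abort_offset, reshape_fn old_way)
{
    int rv = with_offset(ctx);

    if (rv != -EINVAL)
        return rv;
    rv = abort_offset(ctx);
    if (rv != 0)
        return rv;
    return old_way(ctx);
}

/* Demo hooks: one always fails with EINVAL, the other counts its calls. */
static int demo_einval(void *ctx) { (void)ctx; return -EINVAL; }
static int demo_ok(void *ctx)     { ++*(int *)ctx; return 0; }
```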
Pawel Baldysiak 4d149ab517 IncRemove: Set "auto-read" only after successful excl open.
"mdadm -If" - triggered from udev rules when a disk is removed from the OS -
tries to set the array to auto-read-only mode. This can interrupt a rebuild
that was started automatically, e.g. if the array is mounted and a
spare disk is available (the I/O error is detected before mdadm has
removed the failed disk).
This patch prevents "mdadm -If" from setting the array to "auto-read-only"
unless an exclusive open succeeds.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:59:53 +11:00
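On Linux, open(2) with O_EXCL but without O_CREAT on a block device fails with EBUSY while the device is in use (for example mounted). A minimal sketch of the gate described in 4d149ab517, with a hypothetical helper name; the actual switch to auto-read-only is left out.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Only allow the switch to auto-read-only if the array can be opened
 * exclusively, i.e. nothing (such as a mounted filesystem) holds it.
 */
static bool can_set_auto_readonly(const char *devpath)
{
    int fd = open(devpath, O_RDONLY | O_EXCL);

    if (fd < 0)
        return false;   /* busy (e.g. EBUSY when mounted) or not openable */
    close(fd);
    return true;
}
```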
Pawel Baldysiak f666bcc652 IMSM-orom: make sure that device list is supported
The device list in the PCI Data Structure is only supported in
revision 3 and above. Make sure that this is checked.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:59:53 +11:00
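The revision check from f666bcc652, sketched over a hypothetical, simplified view of the two fields involved (the real PCI Data Structure is a packed little-endian layout read from the option ROM).

```c
#include <stdint.h>

/* Hypothetical, simplified view of the relevant PCI Data Structure fields. */
struct pcir_sketch {
    uint16_t device_list_offset;  /* meaningful only from revision 3 */
    uint8_t  revision;
};

/* Return the device-list offset, or 0 if this revision has no device list. */
static uint16_t device_list_offset(const struct pcir_sketch *p)
{
    if (p->revision < 3)
        return 0;   /* not defined as a device list before revision 3 */
    return p->device_list_offset;
}
```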
Artur Paszkiewicz 5e1d612824 imsm: simplified multiple OROMs support
Replaced the oroms array with a list; add_orom() now only appends to this
list, and add_orom_device_id() only appends a devid_list node to an orom_entry.

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:56:56 +11:00
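The list structure described in 5e1d612824 can be sketched as two singly linked lists. The struct and function names below are hypothetical and only loosely follow the commit's description of orom_entry and devid_list.

```c
#include <stdlib.h>

/* One PCI device id served by an OROM. */
struct devid_node {
    unsigned short devid;
    struct devid_node *next;
};

/* One OROM entry with the list of controllers it serves. */
struct orom_node {
    int orom_id;                 /* stands in for the OROM contents */
    struct devid_node *devids;
    struct orom_node *next;
};

/* Append a new OROM entry to the end of the list. */
static struct orom_node *add_orom_sketch(struct orom_node **head, int orom_id)
{
    struct orom_node *n = calloc(1, sizeof(*n));

    if (!n)
        return NULL;
    n->orom_id = orom_id;
    while (*head)
        head = &(*head)->next;
    *head = n;
    return n;
}

/* Append one device id to an OROM entry's devid list. */
static int add_orom_device_id_sketch(struct orom_node *o, unsigned short devid)
{
    struct devid_node *d = calloc(1, sizeof(*d));
    struct devid_node **tail = &o->devids;

    if (!d)
        return -1;
    d->devid = devid;
    while (*tail)
        tail = &(*tail)->next;
    *tail = d;
    return 0;
}
```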
NeilBrown e9e6894d4b Assemble: don't ignore the return value from stat.
Static checkers complain about that.
So change the code to use 'fstat', as we really don't want
to see an error here.

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:56:52 +11:00
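The general shape of e9e6894d4b, stat-ing an already-open descriptor and acting on the return value instead of ignoring it, is sketched below with a hypothetical helper.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if 'fd' refers to a block device, 0 if it refers to
 * something else, and -1 (after reporting) if fstat() itself failed.
 */
static int fd_is_block_device(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return -1;      /* don't use an uninitialised 'st' */
    }
    return S_ISBLK(st.st_mode) ? 1 : 0;
}
```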
Jes Sorensen 68641cdb64 write_super_imsm_spares(): C statements are terminated by ;
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:56:47 +11:00
Jes Sorensen 5d94384e93 IncrementalScan(): Make sure 'st' is valid before dereferencing it
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:56:46 +11:00
Jes Sorensen 9eb5ce5ae2 Grow.c: Fix classic readlink() buffer overflow
The buffer passed to readlink() needs to leave space for the
terminating \0, which readlink() does not add itself.
See 'man 3 readlink' for details.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-25 08:06:45 +11:00
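The safe readlink() idiom behind 9eb5ce5ae2: ask for at most size - 1 bytes, then terminate the string yourself. The helper name is hypothetical, and treating a completely filled buffer as possible truncation is a conservative choice of this sketch.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Read a symlink target into 'buf' as a NUL-terminated string.
 * Returns 0 on success, -1 on error or possible truncation
 * (the buffer is NUL-terminated in the truncation case as well).
 */
static int read_link_str(const char *path, char *buf, size_t size)
{
    if (size == 0)
        return -1;
    ssize_t n = readlink(path, buf, size - 1);   /* leave room for '\0' */
    if (n < 0)
        return -1;
    buf[n] = '\0';                               /* readlink() doesn't add it */
    return (size_t)n == size - 1 ? -1 : 0;
}
```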
NeilBrown 7a862a020f Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly on one line,
then maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"


Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:46:53 +11:00
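The rule in 7a862a020f works because C concatenates adjacent string literals at compile time, so a message made of newline-terminated lines can be split after each "\n" in the source and is still one string at run time. The names below are illustrative.

```c
/* Split only after "\n"; the compiler joins the pieces into one string. */
static const char msg[] =
    "It is OK to\n"
    "break this string\n";

/* Count the lines in a message, i.e. the number of '\n' characters. */
static int count_lines(const char *s)
{
    int n = 0;

    for (; *s; s++)
        if (*s == '\n')
            n++;
    return n;
}
```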