Commit Graph

2777 Commits

Author SHA1 Message Date
NeilBrown 399e0b9709 Subject: Make wait_for and open_dev_excl faster
When we create or assemble an array, we wait for udev to create the
device file in /dev so that as soon as mdadm completes, the device can
be used.

This waiting is performed in multiples of 200ms, which can sometimes
be too long to wait.

So change to an exponential backoff.  Wait 1, then 2, then 4 msec etc.
Once we get to 256msec, stop backing off and continue waiting 256ms at
a time until we reach the limit which is now 4.608sec rather than 5sec
which it was before.

Ditto for open_dev_excl.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-25 15:56:22 +10:00
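
The backoff described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not mdadm's C code: the names `backoff_delays_ms` and `wait_for_path` are hypothetical, and the count of 25 attempts is an assumption chosen so that the total wait (4607 ms) lands next to the 4.608 sec limit quoted in the message.

```python
import os
import time


def backoff_delays_ms(first_ms=1, cap_ms=256, attempts=25):
    """Yield waits of 1, 2, 4, ... ms, doubling until cap_ms is reached.

    attempts=25 is an assumption, not a value taken from the commit.
    """
    delay = first_ms
    for _ in range(attempts):
        yield delay
        if delay < cap_ms:
            delay *= 2


def wait_for_path(path, sleep=time.sleep):
    """Poll until `path` exists, sleeping with exponential backoff.

    Returns True once the path appears, False if the schedule runs out.
    """
    for delay_ms in backoff_delays_ms():
        if os.path.exists(path):
            return True
        sleep(delay_ms / 1000.0)
    return os.path.exists(path)
```

Compared with a fixed 200ms poll, a device node that appears within a few milliseconds is noticed almost immediately, while the worst-case wait stays close to the old 5sec limit.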
NeilBrown 8010806bab tests: add device size tests when changing raid level to/from 0
There was a kernel bug that got this wrong, so better check for it.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-25 15:54:44 +10:00
NeilBrown dea3786ae2 Grow: fix bug in raid0 -> raid5 conversion.
The moment we change a RAID0 to a RAID5 it will try to recover.  This
will abort quite quickly as there are no spare devices, but it could
confuse the attempt to freeze the array.

So allow 'freeze' to work even on a recovering array.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-25 15:52:58 +10:00
NeilBrown 688eb823bc Make: CXFLAGS should be conditionally assigned.
As the Makefile encourages users to set CXFLAGS for extra flags,
we should only conditionally set it.
That way it can be over-ridden in the environment as well as on
the command line.

Suggested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 16:59:37 +10:00
mwilck@arcor.de 41a663b267 Detail: deterministic ordering in --brief --verbose
Have mdadm --detail --brief --verbose print the list of devices in
alphabetical order.

This is useful for debugging purposes. E.g. the test script
10ddf-create compares the output of two mdadm -Dbv calls which
may be different if the order is not deterministic.

(I confess: I use a modified "test" script that always runs
"mdadm --verbose" rather than "mdadm --quiet", otherwise this
wouldn't happen in 10ddf-create).

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 16:56:47 +10:00
NeilBrown 0ddc35beed super1: fix space_{before,after} for RAID0
For RAID0 we need to use 'data_size', not 'size', as the latter is 0.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 16:24:08 +10:00
NeilBrown 1c8b90df15 Grow: allow "--add" with "--grow --level=??"
This is useful for reshaping a RAID0 to a higher level.
The recovery will happen at the same time as the reshape.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 16:13:00 +10:00
NeilBrown e5ba75ce03 Grow: choose default layout when converting from RAID0.
If we don't do this explicitly, we end up keeping the "current"
layout, which is meaningless for RAID0.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 16:06:21 +10:00
NeilBrown 35698c6e91 tests: add test for converting levels to raid0 and back.
Now that I have this mostly working, I should make sure
it doesn't break...

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 15:57:58 +10:00
NeilBrown 9ccfd3be30 test/00names: use appropriate mdadm.conf
Using non-numeric names needs an mdadm.conf setting,
so make sure we have one.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 15:48:40 +10:00
NeilBrown 97e3a6a0e0 Grow: centralise level-change code.
There are now 3 places which change level.
And they all do it slightly differently with different
messages etc.

Make a single function for this and use it.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 15:27:07 +10:00
NeilBrown 6fb8746e4a Grow: remove excess drives when converting to RAID0.
When converting to RAID0, all spares and non-data drives
need to be removed first.
It is possible that the first HOT_REMOVE_DISK will fail because the
personality hasn't let go of it yet, so retry a few times.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 14:08:41 +10:00
NeilBrown 9030d55ff2 Grow: clear new_layout when we change the level.
After changing the level, the meaning of layout numbers changes,
so keeping a new_layout value around can cause later confusion.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 13:08:13 +10:00
NeilBrown ddbf2ebb0e Grow: analyse_change needs to set new_size even if nothing much is happening.
This means it will be set for a "--data-offset" only reshape so that
case doesn't complain that the array is getting smaller.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 13:06:32 +10:00
NeilBrown b397d7f3e0 Grow: fix two problems with new_data_offset
1/ ignore failed devices - obviously
2/ We need to tell the kernel which direction the reshape should
   progress even if we didn't choose the particular data_offset
   to use.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 13:04:38 +10:00
NeilBrown a6a78630ac Grow: Try hard to set new_offset.
Setting new_offset can fail if the v1.x "data_size" is too small.
So if that happens, try increasing it first by writing "0".
That can fail on spare devices due to a kernel bug, so if it doesn't
work, try writing the correct number of sectors.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 13:02:35 +10:00
NeilBrown 534f543296 Grow: Make sure new data-offset is well-aligned
If we choose a new data-offset, make sure it is rounded to the largest
power of two possible, up to 1Meg.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-24 12:55:41 +10:00
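
One possible reading of the alignment rule in the commit above, as a Python sketch rather than mdadm's actual code: given a range of acceptable offsets, pick the one aligned to the largest power of two that still fits, capped at 1Meg. The function name `pick_aligned_offset`, the `[lo, hi]` range interface, and the use of 512-byte sectors (so 1 MiB is 2048 sectors) are all assumptions made for illustration.

```python
def pick_aligned_offset(lo, hi, max_align=2048):
    """Return (offset, alignment) with lo <= offset <= hi, in sectors.

    Tries alignments 2048, 1024, ..., 1 sectors (2048 sectors = 1 MiB,
    assuming 512-byte sectors) and returns the first one for which an
    aligned offset exists inside the range.
    """
    if lo < 0 or lo > hi:
        raise ValueError("need 0 <= lo <= hi")
    align = max_align
    while align > 1:
        candidate = -(-lo // align) * align  # round lo up to a multiple
        if candidate <= hi:
            return candidate, align
        align //= 2
    return lo, 1  # every offset is a multiple of one sector
```

For instance, a range of 2049..2600 sectors cannot hold a 1 MiB or 512 KiB boundary, so the sketch falls back to a 512-sector (256 KiB) boundary at sector 2560.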
NeilBrown e09233d048 Grow: a data_offset should not be tested against 0.
It should always be tested against INVALID_SECTORS!!!

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 16:55:35 +10:00
NeilBrown 97882bc806 tests: add test for non-numeric device names
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 16:44:18 +10:00
NeilBrown 71417de6fe Add test for interaction of --assemble with --incr
and fix the bug that it found.  The refactor of start_array()
missed a test.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 16:34:47 +10:00
NeilBrown ccec2685ab Add test for --update=metadata and fix bug it found.
We were not setting device size correctly for raid0.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 16:28:05 +10:00
NeilBrown 033e098c07 tests: rearrange some test groupings.
All 'update' tests in 04
More imsm tests in 09

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 13:46:53 +10:00
NeilBrown 1011e8344a Remove lots of unnecessary white space.
Now that I am using white-space mode in Emacs I can see all of this,
and I don't like it :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 12:31:45 +10:00
NeilBrown e6dd89da86 Manage: allow "--stop" on kernel names.
e.g.
   mdadm --stop md4

This works even if udev has become confused or been killed.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 11:39:14 +10:00
NeilBrown fe7e0e64b0 Manage: split Manage_runstop into Manage_run and Manage_stop
The two branches have virtually nothing in common, so it is simpler if
they are separate.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 11:23:44 +10:00
NeilBrown 8cde842b18 Assemble: when forcing a single-degraded RAID6 array, trigger a 'repair'.
When an active/degraded RAID6 array is force-started we clear the
'active' flag, but it is still possible that some parity is
not in sync.  This is because there are two parity blocks.
It would be nice to be able to tell the kernel "P is OK, Q maybe not".
But that is not possible.

So when we force-assemble such an array, trigger a 'repair' to fix up
any errant Q blocks.

This is not ideal as a repair interrupted by a restart will not be
continued afterwards, but it is the best we can do without kernel help.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 11:09:33 +10:00
NeilBrown 54def20f8b Detail: add device information to --detail --export
We may well want more per-device information here, but this
is a start.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:39:36 +10:00
NeilBrown 64e103fe19 sysfs_read: return devices in same order as in filesystem.
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear.  That makes more sense when exposed to a
human (as the next patch will).

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:33:47 +10:00
Bernd Schubert 2161adce8f raid6check: Check return value of lseek64()
If lseek64() failed it was still writing to the disks, which would introduce
data corruption.

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:05:38 +10:00
Bernd Schubert 2c7b668df7 raid6check: Fix compiler warnings.
Fix some compiler warnings appearing with optimization levels.

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:04:43 +10:00
Bernd Schubert 635b5861c3 raid6check: Use enums for repair type
Using hard coded numbers is error prone and hard to read by humans.

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:04:18 +10:00
Bernd Schubert 3a89d75488 raid6check: Fix memory leaks detected by valgrind
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947==    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947==    by 0x408067: xmalloc (xmalloc.c:36)
==2389947==    by 0x401B19: check_stripes (raid6check.c:151)
==2389947==    by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947==    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947==    by 0x408067: xmalloc (xmalloc.c:36)
==2389947==    by 0x401B67: check_stripes (raid6check.c:155)
==2389947==    by 0x4030C6: main (raid6check.c:521)
==2389947==

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:03:44 +10:00
Bernd Schubert f8fcf7a1c5 raid6check: Fix build of raid6check
After recent git pull 'make raid6check' did not work anymore, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.

Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)

Move check_env() from util.c to lib.c

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:03:12 +10:00
NeilBrown 7506f86012 Makefile: add "-O3" to WARN_UNUSED options.
This finds more errors

Also remove some trailing spaces.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:02:54 +10:00
NeilBrown c0f0d8128a Grow: fix up recent changes to set_new_data_offset.
The second 'info2' wasn't being initialised.  So don't use it.

Reported by -O3

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 09:58:02 +10:00
NeilBrown f69bb60857 super0: set uninitialized variable.
Reported by -O3

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 09:51:01 +10:00
NeilBrown f80057aec5 Assemble/Incr: Don't include spares with too-high event count.
Some failure scenarios can leave a spare with a higher event count
than an in-sync device.  Assembling an array like this will confuse
the kernel.
So detect spares with event counts higher than the best non-spare
event count and exclude them from the array.

Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-17 16:55:31 +10:00
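
The selection rule in the commit above can be sketched like this. It is a Python illustration of the stated rule only; the `Dev` record and the function name `exclude_high_event_spares` are hypothetical and do not correspond to mdadm's data structures.

```python
from collections import namedtuple

# Hypothetical record: device name, whether it is a spare, and its
# superblock event count.
Dev = namedtuple("Dev", "name is_spare events")


def exclude_high_event_spares(devs):
    """Drop spares whose event count exceeds the best non-spare count."""
    non_spare_events = [d.events for d in devs if not d.is_spare]
    if not non_spare_events:
        return list(devs)  # nothing to compare against
    best = max(non_spare_events)
    return [d for d in devs if not (d.is_spare and d.events > best)]
```

A spare whose event count equals or trails the best in-sync device is kept; only spares that are ahead of every non-spare are excluded.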
NeilBrown e2f408a4c0 mdadm.h: add little bits of doco for 'struct superswitch'.
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-17 16:04:59 +10:00
NeilBrown a7dec3fd92 Make sure NOFILE resource limit is big enough.
Some people want to create truly enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.

So raise the limit if it ever looks like it might be a problem.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-30 14:31:09 +10:00
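
Raising the open-file limit on demand, as the commit above describes, might look like the following Python sketch using the standard `resource` module. The function name `ensure_nofile` and the `needed` parameter are assumptions; the commit does not say how mdadm computes the number of descriptors it needs or what margin it leaves.

```python
import resource


def ensure_nofile(needed):
    """Raise the soft RLIMIT_NOFILE to at least `needed` if possible.

    The soft limit is never raised above the hard limit. Returns the
    soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return soft  # already big enough
    if hard == resource.RLIM_INFINITY:
        target = needed
    else:
        target = min(needed, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

An unprivileged process can only raise its soft limit as far as the hard limit, so the caller should still check the returned value against what it needs.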
NeilBrown 041b815f17 Incremental: allow --quiet to silence errors from "-If"
-q is currently ineffective on "mdadm -If".  Messages that are not
usage errors should be suppressed.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-29 09:13:25 +10:00
NeilBrown 8ecf12b9f8 Grow_continue: handle RESHAPE_NO_BACKUP correctly.
If the reshape does not require a backup, Grow_continue can
abort early.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-28 16:58:18 +10:00
NeilBrown 26bf55874d super1: set RESHAPE_NO_BACKUP based on new_offset.
We need to check for a backup iff the data_offset has changed.
Testing against level==10 was an effective but short-sighted approach.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-28 16:58:18 +10:00
NeilBrown f9b08fecd8 Grow: allow for different sized devices when updating data_offset.
It is possible that the devices in an array have different sizes, and
different data_offsets.  So the 'before_space' and 'after_space' may
be different from drive to drive.
Any decisions about how much to change the data_offset must work on
all devices, so must be based on the minimum available space on
any devices.

So find this minimum first, then do the calculation.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-28 16:58:18 +10:00
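
The "find the minimum first" step from the commit above, sketched in Python. The dictionary keys mirror the 'before_space' and 'after_space' names in the message, but the data layout and the function name `common_free_space` are hypothetical.

```python
def common_free_space(devs):
    """Return (before, after): the space every device can spare.

    Each entry in `devs` is a dict with 'before_space' and 'after_space'
    in sectors. A data_offset change must fit on all devices, so the
    smallest value on any device is the one that counts.
    """
    if not devs:
        raise ValueError("need at least one device")
    before = min(d["before_space"] for d in devs)
    after = min(d["after_space"] for d in devs)
    return before, after
```

With devices offering 2048/4096, 1024/8192 and 4096/512 sectors before/after, the usable room is 1024 sectors before and 512 after, however generous the other devices are.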
NeilBrown 199f1a1fad Assemble: allow --update=revert-reshape
This will cause a reshape to start going backwards.
2013-05-28 16:44:23 +10:00
NeilBrown afa368f49a Assemble: --update=metadata converts v0.90 to v1.0
This allows the smooth conversion of legacy 0.90 arrays
to 1.0 metadata.
Old metadata is likely to remain but will be ignored.
It can be removed with
  mdadm --zero-superblock --metadata=0.90 /dev/whatever

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-28 16:44:22 +10:00
NeilBrown d6e4b44fdb super1: fix some casts of signed superblock fields.
These need to be cast to uint32_t before being cast to 'long', else
sign extension doesn't happen on 64bit hosts.

And bitmap_offset is le32, not le64 !!

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-28 16:43:03 +10:00
NeilBrown 5e1863d49d Examine/super1 - report Unused space, before and after.
Might be confusing, or might be useful when reshaping.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-22 16:37:19 +10:00
NeilBrown f79bbf4f69 super1: don't put the bblog at the end of the free space.
It seems like a nice location, but it means that we cannot
decrease the data_offset during a reshape.

So put it just after the bitmap, leaving 32K.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-22 16:00:21 +10:00
NeilBrown 8876bf0bb6 Grow: allow a reshape which only changes --data-offset
Sometimes, that is all we want to do.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-22 12:27:40 +10:00
NeilBrown d7e1f52bb8 Grow: E2BIG should be reported differently if --data-offset was requested.
In that case the problem is almost certainly that --data-offset is too big.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-22 12:27:35 +10:00