Commit Graph

3084 Commits

Author SHA1 Message Date
NeilBrown 4a3a795a8b tests: raid6 repair is now tested on every different layout.
Signed-off-by: NeilBrown <neilb@suse.de>
2015-08-05 14:57:08 +10:00
NeilBrown d80f7aa9a1 Assemble: correctly capture error from ->write_bitmap
else 'err' might be undefined.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-05 14:55:31 +10:00
NeilBrown 380487fdc9 main: remove use of uninitialized 'rv'.
If c.homecluster was not NULL, might get an
error anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-05 14:53:33 +10:00
NeilBrown 598f8904ac raid6check: don't ignore return value from posix_memalign.
Compilers don't like that.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-08-05 14:51:25 +10:00
NeilBrown 5997585200 Merge branch 'mdadm-3.3.x' 2015-08-03 16:21:37 +10:00
NeilBrown 69818a5c75 Release mdadm-3.3.4
Important bugfix release.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 16:17:13 +10:00
NeilBrown 8360760457 Assemble: really don't assemble IMSM array without OROM.
Previous patch missed on case.

Also print more useful information when rejecting
a device with IMSM metadata.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 16:06:51 +10:00
NeilBrown 187f157bf0 mdassemble: include mapfile support.
This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 15:42:31 +10:00
NeilBrown 7eee461e91 Assemble: don't assemble IMSM array without OROM.
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 15:42:16 +10:00
NeilBrown 53a087b105 mdassemble: include mapfile support.
This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 11:54:16 +10:00
NeilBrown 7d55dca2cc mdassemble: don't try to perform cluster check.
mdassemble is meant to be small an simple, so avoid
trying to check for a cluster.
Currently it doesn't, but it still includes the code,
which doesn't build because the library isn't provided.

So just exclude the get_cluster_name code from mdassemble.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 11:53:01 +10:00
Guoqing Jiang 2cf42394f0 md-cluster: use %-64s to print cluster_name
Left align is better for cluster with name less than 64. Also
make the output of cluster name is aligned with others.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 17:26:12 +10:00
Guoqing Jiang d7a493695a mdadm: fix wrong condition for go to abort
When parse_cluster_confirm_arg return 0, it means the
arg are parsed successfully, so change !rv to rv.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 17:26:12 +10:00
NeilBrown 9f2e55a421 Assemble: don't assemble IMSM array without OROM.
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 14:38:37 +10:00
NeilBrown 653299b699 Merge branch 'cluster'
Now that 3.3.3 is out, it is time to include the cluster-support code.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-27 11:01:08 +10:00
NeilBrown 3cab8baec5 Release mdadm-3.3.3
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:18:17 +10:00
NeilBrown e4fa82a858 mdassemble: add "Name" definition.
That allows it to compile again :-(

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:18:13 +10:00
NeilBrown 27aefbdb3d Don't ignore return value from read and write
New gcc sometimes complains about this.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:11:23 +10:00
NeilBrown d8f82d1d88 bitmap: convert "inline" to "static inline"
Otherwise new gcc ignores them with some compile options.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:10:44 +10:00
NeilBrown 86b77ddf87 Assemble: extend --homehost='<ignore>' to allow --name= to ignore homehost
Also make --homehost='<ignore>' work properly.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 12:50:54 +10:00
NeilBrown 6fba5a339c test: assume recovery has completed if sync_completed says so.
The final completion of a recovery can be delayed, so use
sync_completed to check if it is finished, just not been reaped.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-23 11:17:10 +10:00
NeilBrown 4108d695e3 tests: flushbufs after writing zeros
sometimes the removed device is re-added before the writes
get all the way to the md device - so the array doesn't need
any recovery and the test fails.
So flush first to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-23 11:09:19 +10:00
NeilBrown d51e39c0a4 test: add -F flag to mkfs
newer versions of mkfs.extX ask before creating a filesystem
on a device which appears to already have a filesystem.
We don't want that, so add the -F flag.
Also be explicit about fs type as one shouldn't depend on defaults.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22 09:58:41 +10:00
NeilBrown 49325eac3a mdadm: document --homehost=any functionality.
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22 09:33:17 +10:00
NeilBrown 00f23a8861 Assemble: improve tests for matching --name= request.
If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22 09:24:36 +10:00
NeilBrown 12ee2a8d75 raid6check: use O_DIRECT instead of O_SYNC.
O_DIRECT is more direct and is faster.
This requires aligned memory allocation, but that isn't hard.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-20 17:17:37 +10:00
NeilBrown eae01ef02f restripe: fix data block order in raid6_2_data_recov
... rather than relying on the caller getting them in the
correct order.
This is better engineering and fixes a bug, but because the
failed_slotX numbers are used later with assumption that
they weren't swapped

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-20 17:15:13 +10:00
NeilBrown 50786d4731 raid6check: various cleanup/fixes
- document meaning of various arrays. In particular:
   stripes[]
   blocks[]
   blocks_page[]
   block_index_for_slot[]

  It needs to be clear if these are indexed by raid_disk
  number or syndrome number.

- changed meaning of block_index_for_slot[].  It didn't seem
  to be used consistently.  It also made use of the block numbers
  in array data ordering, which is not directly relevant for syndrome
  calculations.

- reduced number of args to autorepair and manual_repair
  There don't need both stripes[] and blocks[].  And they don't need
  diskP or diskQ.
  blocks[-1] is the P chunk, blocks[-2] is the Q chunk.
  block_index_for_slot[] can be used to find the target device for
  a particular syndrome block.

- remove stripe locking from within manual_repair, and instead
  use the global stripe locking used for check and autorepair.

- this necessitated changes to raid6_datap_recov and raid5_2data_reov
  so the P and Q blocks could be before or after the data blocks.



Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-20 14:11:33 +10:00
NeilBrown 29a312f2f3 Assemble: really ensure stripe_cache is bit enough to handle new chunk size
Earlier patch:
  56fcbcbb6f
calculated the proper chunk size - but didn't use it..

Let's actually use it this time.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-17 13:10:25 +10:00
NeilBrown ad1a3c2f08 raid6check
fix checking of DDF layouts.

Stuff probably still broken.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-16 12:07:54 +10:00
NeilBrown 76cd79d3d1 raid6check: get device ordering correct for syndrome calculation.
The order of devices used for the syndrome calculation is not
the same as the order of data in the array.
The D block immediately after Q is first, then they continue
cyclicly in raid-disk order, skipping over the P disk if it is seen.

This gets the 'check' right for all layouts other than DDF, which is
quite different.

I haven't confirmed that this does't break repair.


Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-16 11:25:40 +10:00
NeilBrown 0832fb09d9 tests: slow down --stop a bit to allow revert-inplace to work.
revert-inplace would sometimes find that the original reshape had
finished.
So slow down the reshaping during --stop (which needs to be a little
bit fast so that stop doesn't timeout waiting) and don't wait quite
so long before stopping.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-16 10:50:40 +10:00
NeilBrown 7cb2815a15 tests: add 19raid6check
This checks that raid6check finds no errors in newly created array
with all different layouts.
(it doesn't...)

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-16 08:02:52 +10:00
NeilBrown 21a1287ac9 test: clear out old metadata from loop devices.
Old metadata can tempt udev to assemble things, which
just gets in the way.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-16 07:49:14 +10:00
NeilBrown 108bd87457 raid6check: report role of suspect device.
i.e. -2 for Q, -1 for P, 0-N for data.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-10 14:46:59 +10:00
NeilBrown 5bc29745a0 tests: save failure logs to logdir
If --save-logs is given we already save all logs to --logdir
If not, we should still save erroneous logs to --logdir.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-10 14:44:58 +10:00
NeilBrown 439c196491 tests: do not try to 'flushbufs' after stopping a array
If the array is stopped, there is nothing to flush, and
blockdev can signal an error.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-10 14:42:20 +10:00
NeilBrown bc6ccf969e test: add dmesg output to logs on error.
This can help isolate the problem.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-06 15:32:46 +10:00
NeilBrown a76b3a345b test: check sync_action as well when checking for an action.
Some actions only appear in /proc/mdstat after a little delay,
so check in sync_action as well.

This applies when checking for recovery etc, and when waiting for idle.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-06 13:58:19 +10:00
NeilBrown 602b916951 test: speed up reshape when stopping arrays.
--stop needs to wait for reshape to get to a suitable
spot, so having really slow resync isn't helpful.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-06 13:52:04 +10:00
NeilBrown 5c351af129 test: stop all arrays before starting test.
As well a cleaning up loop devices, stop all arrays.
After all, we cannot do the one without the other.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-06 13:48:59 +10:00
NeilBrown 62844a4da6 Grow: remove stray tracing message.
Signed-off-by: NeilBrow <neilb@suse.com>
2015-07-06 13:47:45 +10:00
NeilBrown e3e0d0a843 Manage/stop: don't stop during initial critical section.
If the array is reshaping to more devices, then stopping
during that initial critical section is a bad idea.
So check for it and wait a bit.

Should probably handle final critical section of a reduction
too.
same-size reshape should be handled correctly already.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-06 13:45:39 +10:00
NeilBrown 932be6276e Manage/stop: improve some comments.
This code always confuses me - this might help a bit.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-06 13:37:19 +10:00
NeilBrown 30ddba7de5 Manage/stop: guard against 'completed' being too large.
A race can allow 'completed' to read as 2^63-1, which takes
a long time to count up to.
So guard against that possibility.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-06 13:33:20 +10:00
NeilBrown d3f6cf4f9b Monitor: don't Wait forever on a 'frozen' array.
If Wait() finds the array resync is 'frozen', then wait
a little while to avoid races, but don't wait forever.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-06 13:26:41 +10:00
NeilBrown 5418499ae4 sysfs: reject reads that use the whole buffer.
If a read fills the whole buffer, then we possibly
missed something of the end, and we definitely shouldn't
put a '\0' beyond the end, so just return an error.
This should never happen anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-06 13:21:33 +10:00
NeilBrown bcbb92d4ee Remove some trailing white space
It looks ugly in my editor.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:26:30 +10:00
NeilBrown 52b6ccad34 Manage: fix no-op test in Manage_stop.
A 'devnm' never starts with '/', so this test is pointless.
The code should use the passed-in devname unless it is clearly
not usable.  So fix it to do that.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:16:59 +10:00
NeilBrown 9581efb1ae mdstat: discard 'dev' field, just use 'devnm'
These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:15:10 +10:00