Commit Graph

2897 Commits

Author SHA1 Message Date
NeilBrown 1c2cdb9072 tests: handle new raid10/ddf geometries.
Recent changes to support more ddf geometries using raid1e
requires updates to tests.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 476066a3d5 DDF: add support of --data-offset when creating array.
Infrastructure is there, so use it.

This requires making sure that ->data_offset is correctly set, even
for containers.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown fca6552000 DDF: factor out common code for search through extents.
Each place that uses "get_extents" has slightly different search code
to look through the result.

Factor this out into a single find_space() function.

This will make it easier to add --data-offset support.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
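
The kind of search that fca6552000 factors out can be pictured with a minimal sketch. This is not the find_space() from the patch, whose signature is not shown in the log. The struct layout, the name find_space_sketch(), the NO_SPACE value, and the convention that the extent list is sorted and ends with a size-0 entry marking the end of the usable area are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical extent record: one used region on a device, in sectors. */
struct extent {
	unsigned long long start;
	unsigned long long size;
};

#define NO_SPACE (~0ULL)	/* assumed "nothing found" marker */

/*
 * Scan a list of used extents, assumed sorted by start, non-overlapping,
 * and terminated by an entry with size == 0 whose start marks the end
 * of the usable area. Return the start of the first free gap of at
 * least 'want' sectors, or NO_SPACE if there is none.
 */
static unsigned long long find_space_sketch(const struct extent *e,
					    unsigned long long want)
{
	unsigned long long pos = 0;	/* end of the previous used extent */
	size_t i;

	for (i = 0; ; i++) {
		if (e[i].start > pos && e[i].start - pos >= want)
			return pos;
		if (e[i].size == 0)
			break;		/* terminator reached */
		pos = e[i].start + e[i].size;
	}
	return NO_SPACE;
}
```

With one helper of this shape, each caller only has to say how much space it wants instead of repeating its own loop over the extent list.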
NeilBrown 708997ffb7 DDF: allow for unused slots when creating map list for getinfo_super_ddf.
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 4fe903aa8b DDF: DDF_Missing devices should not be reported as 'working' by getinfo_super_ddf
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 25532b88a0 DDF: remove old and wrong comment about setting raid_disk.
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown ff84d05210 DDF: provide simple detail_super() implementation.
Just print the GUID, Seq and number of VDs in the container.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown f70d549f12 DDF: support more RAID10 levels.
The DDF "RAID1E" level is similar to md "raid10".

So use raid10 to support RAID1E, and create RAID1E for raid10
configs not already supported.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 2770f45a43 DDF: explain why spare_refs are ignored.
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 55e0007423 DDF: use array_size from metadata.
If some other controller sets a number smaller than a calculation
would give us, we really should honour it.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 2a351cea60 DDF: set utime for container from timestamp in superblock.
Also be more consistent about setting events from seq in superblock.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 30bee02013 DDF: don't assume the anchor is fully up-to-date.
We currently copy the anchor to both primary and secondary
blocks.
This assumes that the anchor is up to date, but it might not be.
We should trust the 'active' block and copy from there.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 609ce16109 DDF: update timestamp/seqnum for virtual disks when config changes.
- we weren't updating this timestamp at all
- the 'vd_config' seqnum was updated on every write of the metadata,
  which is excessive.  Just update it when there is a change.


Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 07de268426 DDF: update timestamp in DDF header.
Doco says:
  Header update timestamp. MUST be set when the DDF
  header is updated.

So I guess we should.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
NeilBrown 0d255ff84e DDF: avoid ref outside array in getinfo_super_ddf_bvd
As we are range-checking 'cd', there is a chance that it is not
in-range.  In that case we should include all array indexes with 'cd'
inside the range-tested branch.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:47 +10:00
NeilBrown d2ec75fb3e DDF: examine_pds to also list devices that aren't in the metadata.
The phys disks table should list all disks, but if the metadata
is corrupt, it might not even list the disk it was read from.
So check for and report any known disks that aren't listed.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:47 +10:00
NeilBrown 217dead48f DDF: fix usage of ->used_pdes
The "used_pdes" value counts the number of physdisk entries that
are in use.
It may not be the last one in use as there may be unused slots in
the middle.

So when we are iterating over phys disks, we need to use max_pdes
and skip unused entries.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:47 +10:00
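
The iteration rule from 217dead48f can be illustrated with a small sketch. The structures below are simplified stand-ins rather than the DDF metadata layout, and the use of 0xFFFFFFFF as the "empty slot" reference number is an assumption made for the example.

```c
#include <stdint.h>

#define UNUSED_REFNUM 0xFFFFFFFFu	/* assumed marker for an empty slot */

/* Hypothetical, simplified physical-disk table. */
struct phys_entry {
	uint32_t refnum;
};

struct phys_table {
	unsigned int used_pdes;	/* number of slots in use */
	unsigned int max_pdes;	/* total number of slots */
	struct phys_entry *entries;
};

/* Return the slot holding 'refnum', or -1 if it is not present. */
static int find_phys_slot(const struct phys_table *t, uint32_t refnum)
{
	unsigned int i;

	/* Walk every slot: the used entries need not be contiguous. */
	for (i = 0; i < t->max_pdes; i++) {
		if (t->entries[i].refnum == UNUSED_REFNUM)
			continue;	/* skip holes in the table */
		if (t->entries[i].refnum == refnum)
			return (int)i;
	}
	return -1;
}
```

In a four-slot table with entries in slots 0 and 3, used_pdes is 2, so a loop bounded by used_pdes would never reach slot 3. Bounding the loop by max_pdes and skipping holes finds it.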
NeilBrown 41bcbc14c4 DDF: more guards against pdnum being negative.
With consistent metadata, pdnum should never be negative,
but it is better to be safe than sorry.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:47 +10:00
NeilBrown 4e0eb0dbbd Reshape: use systemd to continue containers as well as native arrays.
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-20 17:00:27 +10:00
NeilBrown b0b67933dc Grow: split continue_via_systemd into a separate function.
This allows it to be used for containers too.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-20 16:56:51 +10:00
NeilBrown b0140ae83c Grow: add 'forked' option to reshape_container.
This is a better match for reshape_array() and means that
"mdadm --grow --continue" will run in the foreground, which
makes more sense.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-20 16:51:56 +10:00
NeilBrown 5e76dce1ac Grow: try to let "--grow --continue" from systemd complete a reshape.
If "--assemble" or "--incremental" is started by udev, then
monitoring the reshape in the background won't work.

So try asking systemd to start a grow-continue.

If that fails, just do it the old way.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-15 14:23:21 +10:00
NeilBrown 54ded86fbd Grow: store a link to current backup file in /run/mdadm or similar.
Subsequent patch will allow the background part of "mdadm --grow" to
be run from systemd.  This can require the passing of a backup file
name.
To do this, store that name as a symlink in /run/mdadm (or MAP_DIR)
and look for it when appropriate.

It might be useful to also store the name across reboot, but that
would be a different patch.  We would need to use the uuid to identify
it, and store it in stable storage.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-15 14:23:16 +10:00
Artur Paszkiewicz 39917e56cc Create: don't default to bitmap=internal when it is not supported
For large arrays (component size > 100GB) if write-intent bitmap is not
enabled, then it is set by default to "internal", even if the metadata
format does not support internal bitmaps, which causes Create to fail.

This patch adds checking if add_internal_bitmap is set in the
superswitch before setting bitmap_file to "internal".

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-01 10:14:59 +10:00
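
The check described in 39917e56cc amounts to testing a handler pointer before choosing the default. The sketch below uses a cut-down, hypothetical handler table and a stub handler; only the add_internal_bitmap member name and the "component size > 100GB" condition come from the commit message. Taking 100GB as 100 GiB is an assumption.

```c
#include <stddef.h>

/* Hypothetical cut-down metadata handler table. */
struct superswitch_sketch {
	const char *name;
	/* NULL when the format cannot hold an internal bitmap */
	int (*add_internal_bitmap)(void);
};

/* Assumed threshold: "100GB" interpreted as 100 GiB. */
#define LARGE_COMPONENT_BYTES (100ULL * 1024 * 1024 * 1024)

/* Stand-in handler so the sketch has something to point at. */
static int stub_add_internal_bitmap(void)
{
	return 0;
}

/*
 * Pick the bitmap setting: keep an explicit choice, and fall back to
 * "internal" for large components only when the format supports it.
 */
static const char *default_bitmap(const struct superswitch_sketch *ss,
				  unsigned long long component_bytes,
				  const char *bitmap_file)
{
	if (bitmap_file)
		return bitmap_file;
	if (component_bytes > LARGE_COMPONENT_BYTES && ss->add_internal_bitmap)
		return "internal";
	return NULL;
}
```

For a format whose table leaves add_internal_bitmap as NULL, a large array is simply created without a bitmap instead of failing.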
Artur Paszkiewicz 19ad4b2cb2 Fix race between --create and --incremental
This modifies locking in Create to eliminate a situation where
--incremental can assemble a device between write_init_super() and
add_disk(), which causes Create to fail.

It sporadically occurs e.g. when metadata is written on a device,
causing a udev change event which triggers mdadm --incremental.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-01 10:14:53 +10:00
NeilBrown 8d1d32bb33 systemd: various fixes for boot with container-arrays.
1/ Add systemd shutdown script to ensure DDF and IMSM are
   clean before we actually shutdown

2/ Get udev to tell systemd to run the mdmon@mdXXX.service
   units when a member array appears.

   If we boot off a member array (with dracut at least),
   the mdmon started in the initramfs will lose track of
   /sys etc, so we need to restart it.
   systemd will try to forget about it too (but not actually
   kill it because we said not to do this).
   Having udev tell it to start it will allow a new mdmon to
   run which can see /sys, and systemd will know about it.

3/ Always use --offroot and --takeover when starting mdmon with
   systemd
   --offroot is needed else shutdown will hang.
   --takeover is needed in case an mdmon was started earlier
   (e.g. in initramfs).
   Neither hurt if they aren't actually needed.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-08 17:37:08 +10:00
NeilBrown f43f5b3299 DDF: Don't fail compare_super_ddf due to re-configure changes.
It is possible that one device has seen some reconfig but the other
hasn't.  In that case they are still the "same" DDF, even though
one might be older.  Such age will be detected by 'seq' differences.

If A is new and B is old, then it is important that
  mdadm -I B
  mdadm -I A

doesn't get confused because A has the same uuid as B, but compare_super fails.

So: if the seq numbers are different, then just accept as two
different superblocks.
If they are the same, then look to copy data from new to old.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-02 15:26:35 +11:00
NeilBrown 188d31ed2b DDF: fix possible mdmon crash when updating metadata.
Testing 'c' and then using 'vdc' assumes that the two are in sync,
but sometimes they aren't.
Testing 'vdc' is safer.
This avoids a crash in some cases when failing/removing/adding devices
to a DDF.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-02 15:14:43 +11:00
NeilBrown a44e993e37 DDF: guard against ->pdnum being negative.
It is conceivable that ->pdnum could be -1, though only if
the metadata is corrupt.
We should be careful not to use it if it is.

Also remove an assignment for pdnum to ->container_member.
This is never used and cannot possibly mean anything.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-02 13:34:10 +11:00
NeilBrown e5a03804dc DDF: mark missing-on-assembly device properly.
As well as removing from the array we really should mark
it as 'failed', and mark the array as degraded.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-01 16:15:06 +11:00
NeilBrown 56cb05c463 DDF: Fix assorted typos and do some reformatting.
..because it is more fun when new patches are harder to apply to old version :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-01 16:02:08 +11:00
Piergiorgio Sartor 497b6d6bd2 raid6check.c: move manual repair code to separate function
This patch cleans up the code a bit by moving
the second repair mode, that is the manual
repair, to a separate function.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-31 16:06:34 +11:00
Piergiorgio Sartor c22ac3b107 raid6check.c: move autorepair code to separate function
This patch cleans up the code a bit by moving
the autorepair part into a separate function.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-31 16:06:34 +11:00
Piergiorgio Sartor 3cfd0297a1 raid6check.c: lock the stripe until necessary
The stripe locking mechanism must be atomic between
the check and the potential autorepair.
For this reason, the autorepair code needs to be just
after the check and both parts (check and autorepair)
must be executed under stripe lock.
Of course, the manual repair can operate as before.
This patch reorganizes the code and provides the single,
atomic, stripe lock.
It should be confirmed that this new locking is not
too demanding.
In case it is, some other solutions will be required
(suggestions welcome).

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-31 16:06:33 +11:00
NeilBrown 303a263544 ddf-sudden-degraded test fix.
Change how sudden-degraded devices should appear.
We don't record failure, we record that the device isn't there.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-26 14:30:21 +11:00
NeilBrown 5a46fcd7f5 DDF: when first activating an array, record any missing devices.
We must remember they are missing so that if they re-appear we
don't get confused.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-26 14:26:53 +11:00
NeilBrown eba2859f50 DDF: report seq counter as events.
Also don't treat two devices with different seq numbers as completely
unrelated.

This allows split-brain detection to work properly for ddf.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-26 14:19:43 +11:00
Jes Sorensen 76d0f1886f Work around architectures having statfs.f_type defined as long
Having RAMFS_MAGIC defined as 0x858458f6 causes problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.

This hack is extremely ugly, but it should at least do the right thing
for every situation.

Thanks to Arnd Bergmann for suggesting the fix.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-20 09:24:27 +11:00
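
The sign-extension problem behind 76d0f1886f can be reproduced in isolation. The sketch below is not the code from the patch, which the log does not show. It is one way to make such a comparison robust, by masking both sides to 32 bits. The names fs_magic_matches() and RAMFS_MAGIC_SKETCH are invented for the example.

```c
#define RAMFS_MAGIC_SKETCH 0x858458f6UL	/* value quoted in the commit */

/*
 * Compare a filesystem type that may have been stored in a signed
 * long against a 32-bit magic number. Where long is 64 bits wide, a
 * magic with the top bit set can arrive sign-extended, so a direct
 * comparison after a cast to unsigned long fails. Masking to the low
 * 32 bits discards the extension.
 */
static int fs_magic_matches(long f_type, unsigned long magic)
{
	return ((unsigned long)f_type & 0xFFFFFFFFUL) ==
	       (magic & 0xFFFFFFFFUL);
}
```

Magic numbers below 0x80000000 are unaffected either way, which is why the problem only shows up for values such as 0x858458f6.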
NeilBrown fdcd157a80 tests: add test that DDF marks missing devices as failed on assembly.
If we assemble a newly-degraded array, the missing devices must be marked
as 'failed' so we don't expect them in future.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-11 17:11:08 +11:00
Pawel Baldysiak 2167de78aa mdmon@.service: Change type of process start-up to 'forking'.
Mdadm does not wait long enough when mdmon is started by systemd.
It causes various problems with behaviour of a RAID volume with external metadata.
For example: mdmon does not update a value of checkpoint during migration
and second RAID5 volume is read-only after reboot done during
container reshape (both problems occur with IMSM metadata).
If a type of process start-up is changed to 'forking', systemctl will
wait until mdmon (parent) process exits after calling fork.
This way mdmon will always be fully initialized after start_mdmon
and these problems will not occur.
In this case it is recommended to add a path to PIDFile, so that systemd
does not have to guess a PID of the mdmon process.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-11 14:42:57 +11:00
NeilBrown 56bbc588f7 Assemble: change load_devices to return most_recent 'st' value.
This means that

	st->ss->getinfo_super(st, content, NULL);
	clean = content->array.state & 1;

will get an up-to-date value for 'clean'.  This fix allows
  tests/03r5assem-failed
to work.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-25 15:04:16 +11:00
NeilBrown 9ee314dab9 Assemble: re-arrange freeing of 'tst' in load_devices().
When we return in error, we need to free(tst), and ->free_super(tst);
Sometimes we didn't.

Also the final ->free_super(tst) should be followed by free(tst)
but wasn't.

Move that final free forward in the code a bit as we will want to use
the tst there in the next patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-25 14:59:12 +11:00
NeilBrown df842e69a3 Assemble: allow load_devices to change the 'st' which is passed in.
The given 'st' might not be best.  Making this interface change
will allow load_devices to return a better 'st'.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-25 14:54:34 +11:00
NeilBrown 06f3dae93a New test: 03r5assem-failed
This test currently fails, confirming a bug which was recently
reported.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-25 14:52:14 +11:00
Piergiorgio Sartor 237e40cef2 raid6check.c: reduce verbosity
This patch will remove some legacy code.
It is part of the verbosity "cleanup".
In any case, if information about the P
and Q parity mismatches is required, it
should go inside the code handling page
size blocks, not full stripe size.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-06 09:28:24 +11:00
Piergiorgio Sartor e645b3417c raid6check.c: add O_SYNC to open
It could be better to make sure the
data reaches the disks, so open the
drives with O_SYNC flag.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-04 16:06:46 +11:00
Piergiorgio Sartor 21d648132a raid6check.c: fix Q parity generation
In the transition to 4K page processing,
the Q parity generation had a wrong offset
in the buffer.
This patch fixes this.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-04 16:06:46 +11:00
Piergiorgio Sartor afc755e9a6 raid6check.c: fix position printout
This patch makes the position on
the disk where an error is found
a bit clearer.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-04 16:06:46 +11:00
Piergiorgio Sartor 15c1bfb34c raid6check.c: reduce verbosity
This patch removes some printouts, which
are not really useful here.
These could be re-added later, in case a
verbosity parameter will be provided.

Signed off: piergiorgio.sartor@nexgo.de

Signed-off-by: NeilBrown <neilb@suse.de>
2014-02-04 16:06:46 +11:00
Piergiorgio Sartor 3b9c96032c raid6check.c: add page size check and repair
raid6check currently performs checks and repairs on a whole chunk at a
time.  This is often not ideal as corruption can happen with smaller
granularity.

This patch changes raid6check to use a page-size (4K) granularity.

We still process a chunk at a time, but within each chunk we process a
page at a time.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-23 12:18:35 +11:00
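
The "chunk at a time, page at a time within each chunk" structure from 3b9c96032c can be sketched as a nested comparison. This is an illustration only: the function name is made up, and where raid6check recomputes P and Q for each page, the example simply compares a stored buffer against an already computed one.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096	/* the 4K granularity named in the commit */

/*
 * Compare a stored parity chunk against a recomputed one, one page at
 * a time. Returns the number of mismatching pages. If 'first_bad' is
 * not NULL, it receives the index of the first bad page, or -1 when
 * every page matches. chunk_bytes is assumed to be a multiple of
 * PAGE_BYTES.
 */
static size_t count_bad_pages(const unsigned char *stored,
			      const unsigned char *computed,
			      size_t chunk_bytes, long *first_bad)
{
	size_t bad = 0, off;

	if (first_bad)
		*first_bad = -1;
	for (off = 0; off < chunk_bytes; off += PAGE_BYTES) {
		if (memcmp(stored + off, computed + off, PAGE_BYTES) != 0) {
			if (first_bad && *first_bad < 0)
				*first_bad = (long)(off / PAGE_BYTES);
			bad++;
		}
	}
	return bad;
}
```

Reporting at page level means a single corrupted 4K block is identified on its own, and the rest of the chunk around it is not flagged.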