As we are range-checking 'cd', there is a chance that it is not
in-range. In that case we should include all array indexes with 'cd'
inside the range-tested branch.
Signed-off-by: NeilBrown <neilb@suse.de>
The phys disks table should list all disks, but if the metadata
is corrupt, it might not even list the disk it was read from.
So check for and report any known disks that aren't listed.
Signed-off-by: NeilBrown <neilb@suse.de>
The "used_pdes" value counts the number of physdisk entries that
are in used.
It may not be the last one in use as there may be unused slots in
the middle.
So when were are iterating over phys disks, we need to use max_pdes
and skip unused entries.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that one device has seem some reconfig but the other
hasn't. In that case they are still the "same" DDF, even though
one might be older. Such age will be detected by 'seq' differences.
If A is new and B is old, then it is import that
mdadm -I B
mdadm -I A
doesn't get confused because A has the same uuid as B, but compare_super fails.
So: if the seq numbers are different, then just accept as two
different superblocks.
If they are the same, then look to copy data from new to old.
Signed-off-by: NeilBrown <neilb@suse.de>
Testing 'c' and then using 'vdc' assumes that the two are in sync,
but sometimes they aren't.
Testing 'vdc' is safer.
This avoids a crash in some cases when failing/removing/added devices
to a DDF.
Signed-off-by: NeilBrown <neilb@suse.de>
It is conceivable that ->pdnum could be -1, though only if
the metadata is corrupt.
We should be careful not to use it if it is.
Also remove an assignment for pdnum to ->container_member.
This is never used and cannot possibly mean anything.
Signed-off-by: NeilBrown <neilb@suse.de>
Also don't treat two devices with different seq numbers as completely
unrelated.
This allows split-brain detection to work properly for ddf.
Signed-off-by: NeilBrown <neilb@suse.de>
When we call "getinfo_super", we report the working/failed status
of the particular device, and also (via the 'map') the working/failed
status of every other device that this metadata is aware of.
It is important that the way we calculate "working or failed" is
consistent.
As it is, getinfo_super_ddf() will report a spare as "working", but
every other device will see it as "failed", which leads to failure to
assemble arrays with spares.
For getinfo_super_ddf (i.e. for the container), a device is assumed
"working" unless flagged as DDF_Failed.
For getinfo_super_ddf_bvd (for a member array), a device is assumed
"failed" unless DDF_Online is set, and DDF_Failed is not set.
Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
See commit 357ac10678
which made a similar change for super-intel, and really should have
fixed DDF at the same time.
Signed-off-by: NeilBrown <neilb@suse.de>
Some vendor DDF structures interpret workspace_lba
very differently then us. Make a sanity check on the value
before using it for config_size.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The sequence number check in compare_super_ddf was broken,
anchor sequence number is always -1.
With this patch, mdadm will refuse to add a disk with non-matching
sequence number.
This fixes Francis Moreau's problem reported with subject
"mdadm 3.3 fails to kick out non fresh disk".
FIXME: More work is needed here. Currently mdadm won't even add the
disk to the container, that's wrong. It should be added as a spare.
Signed-off-by: NeilBrown <neilb@suse.de>
Print an array name in brief output, like IMSM does.
SUSE's YaST2 (libstorage) needs this in order to detect MD arrays
during installation.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The same algorithm was used in getinfo_super_ddf_bvd and
container_content_ddf. Put it in a common function.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
If there isn't, we currently write the second copy at some
random location :-)
Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some fake RAID BIOSes (in particular, LSI ones) change the
VD GUID at every boot. These GUIDs are not suitable for
identifying an array. Luckily the header GUID appears to
remain constant.
We construct a pseudo-UUID from the header GUID and those
properties of the subarray that we expect to remain constant.
This is only array name and index; all else might change e.g.
during grow.
Don't do this for all non-MD arrays, only for those known
to use varying volume GUIDs.
This patch obsoletes my previous patch "DDF: new algorithm
for subarray UUID"
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
That is the same value that IMSM uses. The current default of 200ms
seems to have been copied from the native MD meta data. That value
appears to be much too low for DDF, given that writing the DDF meta
data means that easily several MB worth of data need to be written to
disk.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Set safe_mode_delay to something >0, otherwise all container subarrays
assembled will have safe_mode_delay=0. That will break the assumption that
meta data becomes clean after running mdadm --wait-clean.
Use the same value as in getinfo_super_ddf_bvd. It would be cleaner
to call that directly from container_content_ddf, but I need to check
possible side effects first.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Have mdadm -E --export print the number of RAID devices,
like other meta data formats do. Anaconda (RHEL/CentOS installer)
depends on it.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
At this point 'di' and 'rv' both have the same value. gcc doesn't
realise that and a human reader might not either.
'rv' makes more sense too, so use that.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that mdadm creates a new subarray containing failed
devices. This may happen if a device has failed, but the meta data
containing that information hasn't been written out yet.
This code tests for this situation, and handles it in the monitor.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
The recent change to skip over invalid conf entries was bad because
it could leave garbage on the disk.
But we don't to write each entry separately as the writes a O_DIRECT
and so synchronous so it takes way too long.
So allocate a large buffer (probably the one used to read the config records)
and fill that then write it all at once.
Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
We should skip known failed disks when allocating space for
new arrays. This fixes the problem with 10ddf-fail-spare.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Commit c7079c84 arrange for DDF to forget about any device
that is failed and not still marked as part of any array.
However such devices could still be part of the container and this
removal and updating of 'pdnum' can result in multiple devices having
the same pdnum. This in turn easily leads to confusion and
corruption.
So only discard pd entries for devices which are failed, not listed in
any virtual device, and for which we don't have a handle on the
device.
pd entries will not get removed until a new device is added after
the device has been removed from the container, either by
"mdadm --remove" or by assembling without the failed devices.
Reported-by: Albert Pauw <albert.pauw@gmail.com>
Analysed-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Recent commit 273989b93a
skipped writing some large blocks of 0xFF, but didn't seek
over the space, so subsequent data was written wrongly.
When we don't write, we need to seek.
Signed-off-by: NeilBrown <neilb@suse.de>
With the previous patch, mdmon will provide the layout property for us.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Adds more verbose debugging in ddf_set_disk, to understand failures
better.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Try to determine problem if load_ddf_header fails. May be useful
for determining compatibility problems with Fake RAID BIOSes.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
I needed this for tracking a bug with wrong offsets after array
creation.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
In particular, include refnum for better tracking. This makes
it a little easier for humans to track what happened to which disk.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Move the check for good drives in the dl loop - otherwise dl
may be NULL and mdmon may crash.
Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
Metadata updates for secondary RAID (RAID10) need to cover
all BVDs. Compare with code in write_init_super_ddf().
Signed-off-by: NeilBrown <neilb@suse.de>
Fix a bug that caused the wrong conf record to be used to derive
data offset and size on secondary RAID (RAID10).
Signed-off-by: NeilBrown <neilb@suse.de>