If the mdadm thread that monitors a reshape gets SIGTERM it should
exit cleanly and clear the 'suspended' region of the array.
However it mustn't clear 'sync_max' as that would allow the
reshape to continue unmonitored.
If the thread ever does get killed, the array should really be
shutdown soon after if possible.
Signed-off-by: NeilBrown <neilb@suse.de>
This call to validate_geometry is really rather gratuitous.
It is purely about the fact that super0 cannot use more than 4TB.
So just make it an explicit test - less confusing that way.
With this, validate_geometry is only called from Create, which
makes it easier to reason about.
Also validate_geometry is now never passed NULL for the 'chunk'
parameter, so we can remove those annoying tests for NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
This was waiting on "reshape_position" which doesn't
get update events.
Before sysfs_wait was introduced, the code to wait didn't
wait at all, so it spun.
With sysfs_wait, it would wait forever.
Change to wait in sync_completed which does get events.
Signed-off-by: NeilBrown <neilb@suse.de>
We have several places that wait for activity on a sysfs
file. Combine most of these into a single 'sysfs_wait' function.
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the metadata on a device to be saved and later restored.
This can be useful before experimenting on an array that is misbehaving.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Transition from "degraded" to "recovery" made in OROM is slightly different
than the same transision in mdadm. Missing disk is not removed from list of
raid devices, but just from map. Therefore mdadm should not end migration
basing on existence of list of missing disks but should rely on count of
failed disks.
Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Tested-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.
So get rid of devnum (a number) and use devnm (a 32char string).
eg.
md0
md_d2
md_home
Signed-off-by: NeilBrown <neilb@suse.de>
find_intel_devices() has take a little while to run as it scans
some directory tree, and the result isn't likely to change
often.
So cache the value and only discard it after 10 seconds.
Signed-off-by: NeilBrown <neilb@suse.de>
Attaching disks to multiple controllers of the same type has been
allowed so far. Now spanning between multiple controllers is disallowed
at all by IMSM metadata.
Signed-off-by: Marcin Tomczak <marcin.tomczak@intel.com>
Reviewed-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
It is important to check for compatibility with 'platform' or
Option ROM when creating or changing and array. However there is no
real need when simply assembling the array.
On some systems there are situations where the platform information is
not available. e.g. on some UEFI systems, UEFI is not available
during 'kdump' handling. This makes it impossible to assemble
an IMSM array to receive the dump.
So remove the requirements that the platform be visible to assemble
an IMSM array.
Signed-off-by: NeilBrown <neilb@suse.de>
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...
The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.
Signed-off-by: NeilBrown <neilb@suse.de>
This is currently only useful for 1.x metadata and will allow an
explicit --data-offset request on command line.
Signed-off-by: NeilBrown <neilb@suse.de>
Inappriopriate error messages (e.g. mdadm: platform does not support
raid5 with 0 disk) have been displayed when too small size was given.
This patch fixes it.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Size expansion to the 'max' value has been broken since
the following patch:
commit d04f65f48c
Change the values for "max size" from -1 to 1.
This patch re-enables it.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Both are impossible, and '1' allows size to be unsigned,
which is neater.
Also #define MAX_SIZE to be '1' to make it all more explicit.
Signed-off-by: NeilBrown <neilb@suse.de>
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.
Signed-off-by: NeilBrown <neilb@suse.de>
malloc should never fail, and if it does it is unlikely
that anything else useful can be done. Best approach is to
abort and let some super-daemon restart.
So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit. Then use those
removing all the tests for failure.
Also replace all "malloc;memset" sequences with 'xcalloc'.
Signed-off-by: NeilBrown <neilb@suse.de>
We do not check the return value of sysfs_get_ll() now. It is wrong.
If reading of the sysfs "degraded" key does not succeed,
the "new_degraded" variable will not be initiated
and accidentally it can have the value of "degraded" variable.
In that case the change of degradation will not be checked.
It happens if mdadm is compiled with gcc's "-fstack-protector" option
when one tries to stop a volume under reshape (e.g. OLCE).
Reshape seems to be finished then (metadata is in normal/clean state)
but it is not finished, it is broken and data are corrupted.
Now we always check the return value of sysfs_get_ll().
Even if reading of the sysfs "degraded" key does not succeed
(rv == -1) the change of degradation will be checked.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
We do not check if requested size of expansion is larger than maximum
available size now. If it is larger the output message is a bit misleading,
for example:
mdadm: Cannot set size on array members.
mdadm: Cannot set device size for /dev/md/vol: Device or resource busy
Now we check if requested size of expansion is larger than maximum
available size and the appropriate output message was added.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Use closedir() on dirp from opendir() to avoid resource leaking.
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This should be harmless, but lets be consistent and not try to close a
negative file descripter.
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Spare superblock doesn't depend on other spares in container.
When loading container metadata thunderdome
may pick a small disk for the champion. This will result in incorrect
interpretation of sizes of other disks in container when joint superblock
is returned. If any disk in container has the 2TB attribute set, the result
must have it set too.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
If system is rebooted during rebuild, md driver changes sync_action
from 'recover' to 'idle' (during stopping all md devices).
If mdmon is still running then, it detects the change of sync_action state,
finishes rebuild and writes metadata to disks. After computer's restart
the RAID volume is in Normal state in OROM and rebuild seems to be finished.
After system's start-up RAID volume is in auto-read-only state
and metadata is in Dirty state. Rebuild seems to be finished but it is not.
Data is inconsistent (out-of-sync).
When mdmon detects the change of sync_action from 'recover' to 'idle',
it has to check if rebuild is really finished. Appropriate test was added.
Now mdmon examines each volume's member if it is being rebuilt.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When mdadm is compiled using e.g. 'everything' option, mdassemble
compilation is broken.
Change code to enable mdassemble compilation.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Put currently existing code for alignment correction in to function
imsm_component_size_aligment_check() and use it for align component size
to chunk size during volume size expansion operation.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Add support for setting max size for size change operation using
imsm_get_free_size() function for computing maximum available space.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Add function imsm_imsm_get_free_size() using part of code from function
reserve_space().
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Add metadata rollback specific code for imsm.
Let reshape_super() ability to differentiate metadata apply and rollback
actions.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Function reshape_super() guards metadata changes.
It is used to apply changes rollback in error case also.
As change (apply and rollback) can be not bi-directional reshape_super()
has to know if current action is metadata change that should be guarded
using metadata restrictions, or this is metadata rollback change
executed due to error occurrence.
In second case change has to be unconditional.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Add new meatdata update type imsm_update_size_change, and update metadata
for volume size expansion operation.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Patch adds ability to function imsm_analyze_change() for:
1. Detect size change request for volume operation.
2. Check and correct size for change.
3. Set new change kind to CH_ARRAY_SIZE
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Function imsm_num_data_members() returns wrong value for raid 1 and 10.
It returns all data member but it should return number of unique data
members (excluding mirror devices)
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Display maximum volumes per array and per controller
in --detail-platform command.
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This patch ensures metadata attribute is set correctly also for spares.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Creation of a container using disks over 2TB should be allowed only when orom supports large disks
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
When orom does not support volumes over 2TB the creation should be disallowed
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>