459 Commits

Author SHA1 Message Date
Corey Hickey cab114c5ca Fix reshape for decreasing data offset
...when not changing the number of disks.

This patch needs context to explain. These are the relevant parts of
the original code (condensed and annotated):

if (dir > 0) {
    /* Increase data offset (reshape backwards) */
    if (data_offset < sd->data_offset + min) {
        pr_err("--data-offset too small on %s\n",
               dn);
        goto release;
    }
} else {
    /* Decrease data offset (reshape forwards) */
    if (data_offset < sd->data_offset - min) {
        pr_err("--data-offset too small on %s\n",
               dn);
        goto release;
    }
}

When this code is reached, mdadm has already decided on a reshape
direction. When increasing the data offset, the reshape runs backwards
(dir==1); when decreasing the data offset, the reshape runs forwards
(dir==-1).

The conditional within the backwards reshape is correct: the requested
offset must be larger than the old offset plus a minimum delta; thus the
reshape has room to work.

For the forwards reshape, the requested offset needs to be smaller than
the old offset minus a minimum delta; to do this correctly, the
comparison must be reversed.

Also update the error message.

Note: I have tested this change on a RAID 5 on Linux 4.18.0 and verified
that there were no errors from the kernel and that the device data
remained intact. I do not know if there are considerations for different
RAID levels.

Signed-off-by: Corey Hickey <bugfood-c@fatooh.org>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2019-02-13 15:07:32 -05:00
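
The two checks described in cab114c5ca can be sketched as a single predicate. This is a minimal illustration, not the mdadm code: the function name data_offset_ok() and its parameters are hypothetical, and offsets are plain integers in whatever unit the caller uses.

```c
/*
 * Hypothetical helper: returns 1 if the requested data offset leaves the
 * reshape at least `min` units of room, 0 otherwise.
 *
 * dir > 0  : data offset increases, reshape runs backwards
 * dir <= 0 : data offset decreases, reshape runs forwards
 */
int data_offset_ok(int dir,
                   unsigned long long requested,
                   unsigned long long old_offset,
                   unsigned long long min)
{
    if (dir > 0) {
        /* New offset must be at least old + min. */
        return requested >= old_offset + min;
    }
    /*
     * New offset must be at most old - min. Written as an addition so
     * the unsigned subtraction cannot wrap when min > old_offset.
     */
    return requested + min <= old_offset;
}
```

With an old offset of 2000 and a minimum delta of 512, a forwards reshape accepts a requested offset of 1000 and rejects 1900, which is the case the reversed comparison is meant to catch.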
Dimitri John Ledkov ebf3be9931 Fix spelling typos.
Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2019-02-11 14:42:50 -05:00
NeilBrown 76d505dec6 Grow: report correct new chunk size.
When using "--grow --chunk=" to change chunk
size, the old chunksize is reported instead of the new.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-12-06 07:57:19 -05:00
NeilBrown 085df42259 Grow: avoid overflow in compute_backup_blocks()
With a chunk size of 16Meg and data drive count of 8,
this calculation can easily overflow the 'int' type that
is used for the multiplications.
So force it to use "long" instead.

Reported-and-tested-by: Ed Spiridonov <edo.rus@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-12-06 07:56:21 -05:00
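
To illustrate the overflow described in 085df42259: a 16 MiB chunk is 32768 sectors of 512 bytes, so multiplying old and new geometry together (32768 * 32768 * 8 * 8 = 2^36) cannot fit in a 32-bit int. The sketch below is an assumption-laden illustration, not the mdadm function: the name backup_blocks_sketch() is hypothetical, it uses uint64_t rather than the "long" the commit mentions, and it assumes the backup size is the least common multiple of the old and new stripe sizes in sectors.

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Hypothetical sketch: number of 512-byte sectors that is a whole number
 * of both old and new stripes (their least common multiple). Chunk sizes
 * are in bytes. Widening to 64 bits before multiplying avoids the 'int'
 * overflow.
 */
uint64_t backup_blocks_sketch(int new_chunk, int old_chunk,
                              unsigned int new_data, unsigned int old_data)
{
    uint64_t old_stripe = (uint64_t)(old_chunk / 512) * old_data;
    uint64_t new_stripe = (uint64_t)(new_chunk / 512) * new_data;
    uint64_t g = gcd_u64(old_stripe, new_stripe);

    /* Divide before multiplying to keep the intermediate value small. */
    return old_stripe / g * new_stripe;
}
```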
Mariusz Tkaczyk 84d88fd885 Grow: Frozen array can't be idle
When an array is frozen but there is no recovery/reshape in mdstat,
check_idle() will not return an error, yet grow continue can still be working.

Check whether the array is frozen. Do not use the sysfs sync_action parameter
because it doesn't exist for RAID0; simply check metadata_version in mdstat.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-08-01 11:57:43 -04:00
Anthony Youngman d94eb07f82 Coverity: Resource leak: close fd before return
Anthony Youngman <anthony@youngman.org.uk>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-07-11 13:11:17 -04:00
Roman Sobanski 5d518de84e mdadm/grow: correct size and chunk_size casting
With commit 4b74a905a6
("mdadm/grow: Component size must be larger than chunk size") mdadm returns
an incorrect message if the size given to grow is greater than 2 147 483 647 K.
Cast chunk_size to "unsigned long long" instead of casting size to "int".

Signed-off-by: Roman Sobanski <roman.sobanski@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-04-27 09:30:32 -04:00
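
The resize check that 5d518de84e, 56e1e6ace0 and 1b21c449e6 converge on can be sketched as follows. The function name component_size_too_small() and its flat parameters are hypothetical stand-ins for the array.level, array.chunk_size and s->size fields named in the commit messages; the size is assumed to be in K and the chunk size in bytes, and a size of 1 is assumed to be the "max" sentinel as 56e1e6ace0 describes.

```c
/*
 * Hypothetical sketch: returns 1 if the requested component size should be
 * rejected because it is smaller than the current chunk size.
 *
 * level > 1    : only levels where the chunk size is meaningful
 * size_kib > 1 : 0 means "no resize requested", 1 means "max"
 */
int component_size_too_small(int level, int chunk_size_bytes,
                             unsigned long long size_kib)
{
    /*
     * Widen the chunk size instead of narrowing the size: casting a size
     * above 2147483647 K down to int would not preserve its value.
     */
    return level > 1 && size_kib > 1 &&
           (unsigned long long)(chunk_size_bytes / 1024) > size_kib;
}
```

With a 512 K chunk, a size of 3000000000 K passes the check while 256 K is rejected.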
Mariusz Tkaczyk a3b831c9e1 Grow.c: Block any level migration with chunk size change
Mixing level and chunk changes in one grow operation is not supported.
Mdadm performs level migration correctly and ignores new chunk, but
after migration it tries to write this chunk to sysfs properties.
This is dangerous and can cause unexpected behaviours.

Block it before level migration starts.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-01-25 14:31:47 -05:00
Zhilong Liu 56e1e6ace0 mdadm/grow: correct the s->size > 1 to make 'max' work
s->size > 1 : s->size is '1' when '--grow --size max'
parameter is specified, so correct this test here.

Fixes: 1b21c449e6 ("mdadm/grow: adding a test to ensure resize was required")
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-11-28 11:05:47 -05:00
Guoqing Jiang 5339f99606 To support clustered raid10
We are now considering extending clustered raid to
support raid10. Only the near layout is supported,
so check for it when creating the array or switching
the bitmap from internal to clustered.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-11-09 11:56:10 -05:00
Zhilong Liu 1b21c449e6 mdadm/grow: adding a test to ensure resize was required
To fix the commit: 4b74a905a6
(mdadm/grow: Component size must be larger than chunk size)

array.level > 1 : applies only to RAID levels where chunk_size is meaningful.
s->size > 0 : ensures that a change of component size was requested.
array.chunk_size / 1024 > s->size : ensures the component size is
always >= the current chunk_size when a resize is requested; otherwise,
mddev->pers->resize would set mddev->dev_sectors to '0'.

Reported-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-10-11 13:30:55 -04:00
Jes Sorensen 358ef9bfdd Grow: Use all 80 characters
Try to use the full line length and avoid breaking up lines excessively.
Equally break up lines that are too long for no reason.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-10-02 17:21:40 -04:00
Pawel Baldysiak 41b25549f0 Grow: fix switching on PPL during recovery
If a raid member is not in sync, it is skipped when PPL is
enabled. This is not correct, since the drive that
we are currently recovering to does not have ppl_size and ppl_sector
properly set in sysfs.
Remove this skipping, so all drives are updated when PPL is turned on.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-10-02 16:13:25 -04:00
Zhilong Liu 4b74a905a6 mdadm/grow: Component size must be larger than chunk size
Grow: for striped RAID levels, the new component size must be
larger than the current chunk size; otherwise Grow_reshape()
would set s->size to '0'.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-10-02 15:59:24 -04:00
Tomasz Majchrzak e1b942b9af Grow: stop previous reshape process first
If an array is stopped during reshape and assembled again straight away,
the reshape process in the background might still be running. systemd doesn't
start a new service if one already exists. If there is a race, the previous
process might terminate and a new one is not created. Reshape doesn't
continue after assemble.

Tell systemd to restart the service rather than just start it. This
ensures the previous service is stopped first. If it's not running, stopping
has no effect and only the new process is started.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-10-02 15:41:45 -04:00
NeilBrown 8e5b52cdda Error messages should end with a newline character.
Add "\n" to the end of error messages which don't already
have one.  Also spell "opened" correctly.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-08-16 08:25:07 -04:00
Tomasz Majchrzak 922a58292f Grow: don't allow to enable PPL when reshape is in progress
Don't allow enabling the PPL consistency policy when reshape is in progress.
The current PPL implementation doesn't work when reshape is taking place.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-06-09 10:58:36 -04:00
Tomasz Majchrzak b208f817ec Grow: don't allow array geometry change with ppl enabled
Don't allow array geometry changes (size expand, disk adding) when the PPL
consistency policy is enabled. The current PPL implementation doesn't work
when reshape is taking place.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-06-09 10:56:22 -04:00
Tomasz Majchrzak 07c45a1871 Grow: set component size prior to array size
It is a partial revert of commit 758b327cf5 ("Grow: Remove unnecessary
optimization"). For native metadata component size is set in kernel for
entire disk space. As external metadata supports multiple arrays within
one disk, the component size is set to array size. If component size is
not updated prior to array size update, the grow operation fails.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-06-05 11:06:35 -04:00
Jes Sorensen d16a749444 mdadm: Fixup != broken formatting
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-16 14:09:57 -04:00
Jes Sorensen d7be7d8736 mdadm: Fixup more broken logical operator formatting
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-16 13:59:43 -04:00
Jes Sorensen fc54fe7a7e mdadm: Fixup a large number of bad formatting of logical operators
Logical operators never belong at the beginning of a line.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-16 13:52:15 -04:00
Zhilong Liu 0a6bff09d4 mdadm/util: unify fstat checking blkdev into function
Declare the function fstat_is_blkdev() to consolidate the repeated fstat
checks for block devices. It returns true/1 when the fd refers to
a block device, and false/0 when it doesn't.
The fd and devname are required parameters; *rdev is optional.
If the dev_t *rdev pointer is valid, the device number is assigned
to it; if it is NULL, it is ignored.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-05 11:04:02 -04:00
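
A helper with the behaviour described in 0a6bff09d4 could look roughly like the sketch below. It is an illustration only: the name fstat_is_blkdev_sketch() is hypothetical, and errors go to stderr via fprintf() here as an assumption, rather than through mdadm's own error reporting.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: returns 1 if fd refers to a block device, 0 if it
 * does not or if fstat() fails. devname is used only for messages. If rdev
 * is non-NULL and fd is a block device, the device number is stored there.
 */
int fstat_is_blkdev_sketch(int fd, const char *devname, dev_t *rdev)
{
    struct stat stb;

    if (fstat(fd, &stb) != 0) {
        fprintf(stderr, "fstat failed for %s: %s\n",
                devname, strerror(errno));
        return 0;
    }
    if (!S_ISBLK(stb.st_mode)) {
        fprintf(stderr, "%s is not a block device.\n", devname);
        return 0;
    }
    if (rdev != NULL)
        *rdev = stb.st_rdev;
    return 1;
}
```

Callers that only need the yes/no answer can pass NULL for rdev; callers that go on to use the device number pass the address of a dev_t.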
Zhilong Liu 99148c19bd change back 0644 permission for Grow.c
Fixes commit:
26714713cd ("mdadm: Change timestamps to unsigned data type.")

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-03 10:50:59 -04:00
Jes Sorensen 9e4524df1c Grow: Grow_continue_command: Avoid aliasing array variable
While this would cause a warning since the two are different types,
let's avoid aliasing an existing variable.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-02 11:46:49 -04:00
NeilBrown a250ce240f Grow_continue_command: ensure 'content' is properly initialised.
Grow_continue_command() calls verify_reshape_position(), which assumes
that info->sys_name is initialised.
'info' in verify_reshape_position() is 'content' in Grow_continue_command().

In the st->ss->external != 0 branch of that function, sysfs_init() is called
to initialize content->sys_name.
In the st->ss->external == 0 branch, ->sys_name is not initialized so
verify_reshape_position() will not do the right thing.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-04-20 12:56:21 -04:00
Jes Sorensen 6ae8b2b314 Grow: Stop bothering about md driver versions older than 0.90.00
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-04-05 15:29:29 -04:00
Jes Sorensen dae131379f sysfs: Make sysfs_init() return an error code
Rather than have the caller inspect the returned content, return an
error code from sysfs_init(). In addition make all callers actually
check it.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-30 16:52:37 -04:00
Jes Sorensen 49948a3561 Grow: Do not shadow an existing variable
Declaring 'int rv' twice within the same function is asking for
trouble.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-30 10:46:01 -04:00
Jes Sorensen 758b327cf5 Grow: Remove unnecessary optimization
Per Neil's explanation, this optimization writes "size" to the
attribute of each device; however, when reducing the size of devices,
the size change isn't permitted until the array has been shrunk, so
this will fail anyway.

This effectively reverts 65a9798b58

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-30 10:44:36 -04:00
Jes Sorensen 018a488238 util: Introduce md_set_array_info()
Switch from using ioctl(SET_ARRAY_INFO) to using md_set_array_info()

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 15:43:53 -04:00
Jes Sorensen d97572f5a5 util: Introduce md_get_disk_info()
This removes all the inline ioctl calls for GET_DISK_INFO, allowing us
to switch to sysfs in one place, and improves type checking.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 15:23:50 -04:00
Jes Sorensen 9cd39f0155 util: Introduce md_get_array_info()
Remove most direct ioctl calls for GET_ARRAY_INFO, except for one,
which will be addressed in the next patch.

This is the start of the effort to clean up the use of ioctl calls and
introduce a more structured API, which will use sysfs and fall back to
ioctl for backup.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 14:35:41 -04:00
Jes Sorensen 6ebf34e6bd Grow: Fixup a pile of cosmetic issues
No code change, simply cleanup ugliness.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 12:15:20 -04:00
Artur Paszkiewicz 860f11ed4d Grow: support consistency policy change
Extend the --consistency-policy parameter to work also in Grow mode.
Using it changes the currently active consistency policy in the kernel
driver and updates the metadata to make this change permanent. Currently
this supports only changing between "ppl" and "resync" policies, that is
enabling or disabling PPL at runtime.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:35:16 -04:00
Artur Paszkiewicz e97a7cd011 super1: PPL support
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.

When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.

While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:33:52 -04:00
NeilBrown 1ab9ed2afb Add 'force' flag to *hot_remove_disk().
In rare circumstances, the short period that *hot_remove_disk()
waits isn't long enough for IO to complete.  This particularly happens
when a device is failing and many retries are still happening.

We don't want to increase the normal wait time for "mdadm --remove"
as that might be used just to test if a device is active or not, and a
delay would be problematic.
So allow "--force" to mean that mdadm should try extra hard for a
--remove to complete, waiting up to 5 seconds.

Note that this patch fixes a comment which claimed the previous
wait time was half a second, where it was really 50msec.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-28 14:32:35 -04:00
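
The retry behaviour described in 1ab9ed2afb and 2dd271fe70 can be sketched as a bounded loop around an operation that may fail with EBUSY. Everything named here is hypothetical: retry_while_busy() takes a callback instead of issuing the HOT_REMOVE_DISK ioctl, fake_remove() is a stand-in for demonstration, and the 10 ms sleep with 5 or 500 retries is an assumption chosen to match the 50msec and 5 second figures in the commit message.

```c
#include <errno.h>
#include <unistd.h>

/* Operation to retry: returns 0 on success, -1 with errno set on failure. */
typedef int (*remove_op)(void *ctx);

/*
 * Hypothetical sketch: call op() until it succeeds, fails with something
 * other than EBUSY, or the retry budget runs out. With a 10 ms sleep the
 * budget is about 50 ms normally and about 5 s when force is set.
 */
int retry_while_busy(remove_op op, void *ctx, int force)
{
    int retries_left = force ? 500 : 5;
    int ret;

    while ((ret = op(ctx)) == -1 && errno == EBUSY && retries_left-- > 0)
        usleep(10000);

    return ret;
}

/* Stand-in device for demonstration: busy for the first busy_calls calls. */
struct fake_device {
    int busy_calls;
    int calls;
};

int fake_remove(void *ctx)
{
    struct fake_device *dev = ctx;

    dev->calls++;
    if (dev->busy_calls > 0) {
        dev->busy_calls--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}
```

Keeping the default budget short preserves the quick failure that a plain --remove relies on, while the forced path trades latency for a better chance of success.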
NeilBrown 2dd271fe70 Retry HOT_REMOVE_DISK a few times.
HOT_REMOVE_DISK can fail with EBUSY if there are outstanding
IO requests that have not completed yet.  It can sometimes
be helpful to wait a little while for these to complete.

We already do this in impose_level() when reshaping a device,
but not in Manage.c in response to an explicit --remove request.

So create hot_remove_disk() to centralize this code, and call it
wherever it makes sense to wait for a HOT_REMOVE_DISK to succeed.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-28 14:25:23 -04:00
Tomasz Majchrzak cf52eff58a Increase buffer for sysfs disk state
Bad block support has extended the sysfs disk state reported by the kernel
("external_bbl") so it became longer than 20 bytes. This causes reshape to
fail as it reads a truncated entry from sysfs.

Increase buffer so it can accommodate the string including all state
values currently implemented in kernel at the same time.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-17 09:46:42 -05:00
Mariusz Dabrowski ddab63c7de Allow level migration only for single-array container
IMSM doesn't allow changing the RAID level of an array in a container with
two arrays, but the array count check is done too late (after removing disks)
and in some cases (e.g. RAID 0 and RAID 1 migrated to RAID 0) both arrays
become degraded. This patch adds an array count check before disks are
removed.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-10-19 11:26:49 -04:00
Xiao Ni 8800f85381 MDADM:Check mdinfo->reshape_active more times before calling Grow_continue
When reshaping a 3-drive raid5 to a 4-drive raid5, there is a chance that
the reshape can't start. If the disks don't have enough space for
relocating the data_offset, mdadm needs to call start_reshape and then have
systemd run mdadm --grow --continue. But mdadm --grow --continue fails
because it finds that info->reshape_active is 0.

info->reshape_active is read from the superblock of the underlying devices.
The function start_reshape writes reshape to /sys/../sync_action. mdadm --grow
--continue may be called before the latest superblock has been written to the
underlying devices, so there is a chance that info->reshape_active is 0. We
should wait longer for the superblock update before calling Grow_continue.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-06-16 13:53:45 -04:00
Mike Lovell 13db17bd1f Use dev_t for devnm2devid and devid2devnm
Commit 4dd2df0966 added a trip through makedev(), major(), and minor() for
device major and minor numbers. This would cause mdadm to fail when operating
on a device with a minor number bigger than (2^19)-1 due to it changing
from dev_t to a signed int and back.

This was found to be a problem when an array was created with a device
specified as a name like /dev/md/raidname and there were already 128 arrays
on the system. In this case, mdadm would choose 1048575 ((2^20)-1) for the
array and minor number. This would cause the major and minor number to become
negative when generated from devnm2devid() and passed to major() and minor()
in open_dev_excl(). open_dev_excl() would then call dev_open() which would
detect the negative minor number and call open() on the char * containing the
major:minor pair, which isn't a valid file.

Signed-off-by: Mike Lovell <mlovell@bluehost.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-06-03 15:35:26 -04:00
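
The effect described in 13db17bd1f can be checked with a short sketch. It assumes a Linux C library such as glibc, where makedev(), major() and minor() come from <sys/sysmacros.h> and the upper bits of the minor number are packed into the high bits of the low 32-bit word of dev_t. Both function names are hypothetical and are not taken from mdadm.

```c
#include <limits.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical check: a major/minor pair survives the trip through
 * makedev(), major() and minor() as long as the value stays a dev_t.
 */
int devid_roundtrip_ok(unsigned int maj, unsigned int min)
{
    dev_t devid = makedev(maj, min);

    return major(devid) == maj && minor(devid) == min;
}

/*
 * Hypothetical check: returns 1 if the device id could be stored in a
 * signed int without changing its value, 0 if it is too large.
 */
int devid_fits_in_int(dev_t devid)
{
    return devid <= (dev_t)INT_MAX;
}
```

Under that encoding assumption, md major 9 with minor (2^19)-1 still fits in an int, while minor (2^20)-1 does not, which matches the threshold given in the commit message.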
Jes Sorensen 6ac963cef0 Grow: Apply some more consistent formatting to Grow_addbitmap()
This should be purely cosmetic and cause no functional change
... famous last words!

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-12 15:27:24 -04:00
Jes Sorensen 4ed129aca7 Grow: Simplify error paths in Grow_addbitmap()
This gets rid of some repeated exit paths, making the code a little
cleaner.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-12 15:27:18 -04:00
Jes Sorensen 2ec2b7e9d5 mdadm: Make add_internal_bitmap() return 0 on success
add_internal_bitmap() returned 1 on success and 0 on error which is
inconsistent. This changes it to return 0 on success and use more
reasonable error codes on error.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-12 15:19:16 -04:00
Jes Sorensen c152f3610f Grow: Handle failure to load superblock in Grow_addbitmap()
Reported-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-12 14:30:10 -04:00
Jes Sorensen dac1b1115f Grow: Grow_addbitmap() reduce indentation
This makes the code a little more readable.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-12 14:27:11 -04:00
Guoqing Jiang 81306e021e Change the option from NoUpdate to NodeNumUpdate
Actually, we need to use NodeNumUpdate here to
ensure there is enough space for those nodes.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-24 12:33:27 -04:00
Guoqing Jiang 31dbeda730 Grow: goto release if Manage_subdevs failed
If a failure happens when adding a disk to the array
in grow mode, we need to goto release instead
of continuing the reshape.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-22 13:53:10 -04:00
Yi Zhang a58e0da443 Grow: analyse_change add notification about only 2-device can be convert from RAID1 to RAID5
Report "Can only convert a 2-device array to RAID5" instead of
"Impossibly level change request for RAID1" when converting from
RAID1 to RAID5 if the number of disks is not equal to two, as is
already done for RAID4/5->RAID1.

Signed-off-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-11 12:40:47 -05:00