There are now three places which change the level, and they all do
it slightly differently, with different messages etc.
Make a single function for this and use it.
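A sketch of the shape such a helper could take (the name is
illustrative; it leans on mdadm's sysfs_set_str(), map_num() and
pers[] helpers):

  /* One place that writes the new level through sysfs and
   * reports failure consistently. */
  static int set_level(struct mdinfo *sra, int level, char *devname)
  {
          char *c = map_num(pers, level);

          if (c == NULL || sysfs_set_str(sra, NULL, "level", c) < 0) {
                  pr_err("%s: cannot set level to %d\n", devname, level);
                  return -1;
          }
          return 0;
  }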
Signed-off-by: NeilBrown <neilb@suse.de>
When converting to RAID0, all spares and non-data drives
need to be removed first.
It is possible that the first HOT_REMOVE_DISK will fail because the
personality hasn't let go of it yet, so retry a few times.
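Something like the following, where the retry count and delay are
illustrative:

  #include <errno.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/raid/md_u.h>	/* HOT_REMOVE_DISK */

  static int hot_remove_retry(int fd, unsigned long dev)
  {
          int tries = 25;

          while (ioctl(fd, HOT_REMOVE_DISK, dev) < 0) {
                  if (errno != EBUSY || --tries == 0)
                          return -1;
                  /* the personality may not have let go yet */
                  usleep(10000);
          }
          return 0;
  }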
Signed-off-by: NeilBrown <neilb@suse.de>
After changing the level, the meaning of layout numbers changes,
so keeping a new_layout value around can cause confusion later.
Signed-off-by: NeilBrown <neilb@suse.de>
This means it will be set for a "--data-offset"-only reshape, so
that case doesn't complain that the array is getting smaller.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ Ignore failed devices - obviously.
2/ We need to tell the kernel which direction the reshape should
   progress in, even if we didn't choose the particular data_offset
   to use (see the sketch after this list).
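For 2/, a sketch using mdadm's sysfs_set_str(); the kernel's
"reshape_direction" attribute accepts "forwards" or "backwards":

  /* 'backwards' is a local flag, set when the new data_offset
   * is lower than the old one. */
  if (sysfs_set_str(sra, NULL, "reshape_direction",
                    backwards ? "backwards" : "forwards") < 0)
          pr_err("cannot set reshape direction\n");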
Signed-off-by: NeilBrown <neilb@suse.de>
Setting new_offset can fail if the v1.x "data_size" is too small.
So if that happens, try increasing it first by writing "0".
That can fail on spare devices due to a kernel bug, so if it
doesn't work, try writing the correct number of sectors instead.
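Roughly, using mdadm's sysfs_set_num(); the variable names are
illustrative and the values must use whatever units the sysfs
files expect:

  /* 'sd' is the per-device mdinfo entry. */
  if (sysfs_set_num(sra, sd, "new_offset", new_offset) < 0) {
          /* "data_size" too small: writing 0 asks for the maximum,
           * but a kernel bug can make that fail on spares, so fall
           * back to writing the wanted size explicitly. */
          if (sysfs_set_num(sra, sd, "size", 0) < 0)
                  sysfs_set_num(sra, sd, "size", wanted_size);
          sysfs_set_num(sra, sd, "new_offset", new_offset);
  }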
Signed-off-by: NeilBrown <neilb@suse.de>
When an active/degraded RAID6 array is force-started we clear the
'active' flag, but it is still possible that some parity is
not in sync. This is because there are two parity blocks.
It would be nice to be able to tell the kernel "P is OK, Q maybe not".
But that is not possible.
So when we force-assemble such an array, trigger a 'repair' to fix up
any errant Q blocks.
This is not ideal, as a repair interrupted by a restart will not be
resumed after the restart, but it is the best we can do without
kernel help.
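The trigger itself is just a write to the standard "sync_action"
attribute, e.g.:

  /* After a successful forced assembly, ask the kernel to
   * recompute and rewrite any mismatched parity. */
  if (sysfs_set_str(sra, NULL, "sync_action", "repair") < 0)
          pr_err("cannot start 'repair' on the assembled array\n");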
Signed-off-by: NeilBrown <neilb@suse.de>
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear. That makes more sense when they are
exposed to a human (as the next patch will do).
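The change amounts to linking new entries at the tail of the list
instead of the head; a sketch, assuming sysfs_read()'s existing
locals:

  /* Keep readdir() order: append each dev-* entry at the tail. */
  struct mdinfo **devsp = &sra->devs;
  struct dirent *de;

  while ((de = readdir(dir)) != NULL) {
          struct mdinfo *dev;

          if (strncmp(de->d_name, "dev-", 4) != 0)
                  continue;
          dev = xcalloc(1, sizeof(*dev));
          /* ...fill in 'dev' from sysfs as before... */
          *devsp = dev;
          devsp = &dev->next;
  }
  *devsp = NULL;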
Signed-off-by: NeilBrown <neilb@suse.de>
If lseek64() failed, we would still write to the disks, which could
introduce data corruption.
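The fix is to check the return value before writing, along these
lines:

  /* lseek64() returns (off64_t)-1 on failure; never write at an
   * unknown position. */
  if (lseek64(fd, offset, SEEK_SET) < 0) {
          perror("lseek64");
          return -1;
  }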
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
Using hard-coded numbers is error-prone and hard for humans to read.
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B19: check_stripes (raid6check.c:151)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B67: check_stripes (raid6check.c:155)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
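Both records point at buffers xmalloc()ed in check_stripes(), so
free them before returning; schematically (names illustrative):

  /* Release the per-run buffers on every exit path. */
  free(stripes);
  free(blocks);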
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
After a recent git pull, 'make raid6check' did not work any more, as
sysfs_read() was called with a wrong argument and check_env()
was used by use_udev() but not defined.
Replace sysfs_read(..., -1, ...) with sysfs_read(..., NULL, ...).
Move check_env() from util.c to lib.c.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Some failure scenarios can leave a spare with a higher event count
than an in-sync device. Assembling an array like this will confuse
the kernel.
So detect spares with event counts higher than the best non-spare
event count and exclude them from the array.
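A sketch of the check, with illustrative types and names:

  struct dev_state {
          int is_spare;
          int excluded;
          unsigned long long events;
  };

  static void exclude_stale_spares(struct dev_state *devs, int n)
  {
          unsigned long long best = 0;
          int i;

          /* best event count among devices that hold data */
          for (i = 0; i < n; i++)
                  if (!devs[i].is_spare && devs[i].events > best)
                          best = devs[i].events;
          /* a spare claiming to be newer than that is not credible */
          for (i = 0; i < n; i++)
                  if (devs[i].is_spare && devs[i].events > best)
                          devs[i].excluded = 1;
  }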
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some people want to create truly enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.
So raise the limit if it ever looks like it might be a problem.
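A sketch of the idea, with an illustrative amount of headroom:

  #include <sys/resource.h>

  static void maybe_raise_nofile(unsigned int devices)
  {
          struct rlimit lim;

          if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
                  return;
          if (lim.rlim_cur >= devices + 100)
                  return;	/* already enough */
          lim.rlim_cur = devices + 100;
          if (lim.rlim_cur > lim.rlim_max)
                  lim.rlim_cur = lim.rlim_max;
          setrlimit(RLIMIT_NOFILE, &lim);
  }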
Signed-off-by: NeilBrown <neilb@suse.de>
We need to check for a backup iff the data_offset has changed.
Testing against level==10 was an effective but short-sighted approach.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that the devices in an array have different sizes, and
different data_offsets. So the 'before_space' and 'after_space' may
be different from drive to drive.
Any decision about how much to change the data_offset must work on
all devices, so it must be based on the minimum available space on
any device.
So find this minimum first, then do the calculation.
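A sketch, assuming the per-device space_before/space_after fields
of struct mdinfo:

  #include <limits.h>	/* ULLONG_MAX */

  unsigned long long min_before = ULLONG_MAX;
  unsigned long long min_after = ULLONG_MAX;
  struct mdinfo *sd;

  /* scan every device first... */
  for (sd = sra->devs; sd; sd = sd->next) {
          if (sd->space_before < min_before)
                  min_before = sd->space_before;
          if (sd->space_after < min_after)
                  min_after = sd->space_after;
  }
  /* ...then base the data_offset change on the worst case */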
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the smooth conversion of legacy 0.90 arrays
to 1.0 metadata.
Old metadata is likely to remain but will be ignored.
It can be removed with
    mdadm --zero-superblock --metadata=0.90 /dev/whatever
Signed-off-by: NeilBrown <neilb@suse.de>
These need to be cast to int32_t before being cast to 'long', else
sign extension doesn't happen on 64-bit hosts.
And bitmap_offset is le32, not le64 !!
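A self-contained demonstration of the sign-extension point, on an
LP64 host:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint32_t bitmap_offset = 0xfffffff8;	/* le32 field: really -8 */

          printf("%ld\n", (long)bitmap_offset);		/* 4294967288: wrong */
          printf("%ld\n", (long)(int32_t)bitmap_offset);	/* -8: correct */
          return 0;
  }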
Signed-off-by: NeilBrown <neilb@suse.de>
It seems like a nice location, but it means that we cannot
decrease the data_offset during a reshape.
So put it just after the bitmap, leaving 32K.
Signed-off-by: NeilBrown <neilb@suse.de>
If space_after and space_before are zero (the default) then assume that
metadata doesn't support changing data_offset.
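In effect (the function name is illustrative; the fields are those
of struct mdinfo):

  static int can_change_data_offset(struct mdinfo *sd)
  {
          /* zero in both fields (the default) means the metadata
           * handler left no room to move the data_offset */
          return sd->space_before != 0 || sd->space_after != 0;
  }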
Signed-off-by: NeilBrown <neilb@suse.de>
If we can modify the data_offset, we can avoid doing any backups at
all.
If we can't, fall back on the old approach - but not if --data-offset
was explicitly requested.
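The resulting decision looks roughly like this; every name here is
illustrative, not mdadm's real API:

  static int start_reshape(int can_move_offset, int offset_requested)
  {
          if (can_move_offset)
                  return reshape_by_moving_offset(); /* no backup at all */
          if (offset_requested) {
                  /* --data-offset was explicit: fail rather than
                   * silently fall back to a backup-based reshape */
                  pr_err("cannot reshape by changing data_offset\n");
                  return 1;
          }
          return reshape_with_backup(); /* the old approach */
  }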
Signed-off-by: NeilBrown <neilb@suse.de>
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, and we will shortly need both values for
RAID5/6, so make a separate field.
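Schematically, with the other fields of the structure elided:

  struct reshape {
          /* ...geometry fields elided... */
          unsigned long long backup_blocks;	/* space a backup needs */
          unsigned long long min_offset_change;	/* RAID10: minimum
                                                 * data_offset movement */
  };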
Signed-off-by: NeilBrown <neilb@suse.de>