When an active/degraded RAID6 array is force-started we clear the
'active' flag, but it is still possible that some parity is
not in sync. This is because there are two parity blocks.
It would be nice to be able to tell the kernel "P is OK, Q maybe not".
But that is not possible.
So when we force-assemble such an array, trigger a 'repair' to fix up
any errant Q blocks.
This is not ideal, as a repair interrupted by a restart will not be
continued afterwards, but it is the best we can do without kernel help.
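For illustration, a repair can be requested by writing "repair" to the
array's md/sync_action file in sysfs. The sketch below is not the code
in this patch; the helper name request_sync_action() is hypothetical and
the path is supplied by the caller (typically
/sys/block/mdX/md/sync_action).

```c
#include <stdio.h>

/* Hypothetical helper, not mdadm's own: write an action name such as
 * "repair" to an md sync_action file.
 * Returns 0 on success, -1 on failure. */
static int request_sync_action(const char *path, const char *action)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(action, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would pass the sync_action path of the freshly assembled array
and treat a failure as non-fatal, since the array is usable either way.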
Signed-off-by: NeilBrown <neilb@suse.de>
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear. That makes more sense when exposed to a
human (as the next patch will).
Signed-off-by: NeilBrown <neilb@suse.de>
If lseek64() failed it was still writing to the disks, which would introduce
data corruption.
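The pattern being enforced can be sketched as follows: never write
unless the seek succeeded. This is a minimal illustration, not the
patched code; write_at() is a hypothetical name and plain POSIX lseek()
stands in for lseek64().

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: position 'fd' at 'offset' and write 'len' bytes.
 * Returns 0 on success, -1 if the seek or the write fails. */
static int write_at(int fd, off_t offset, const void *buf, size_t len)
{
	/* If the seek fails the file position is unknown, so writing
	 * now could land the data anywhere on the device. */
	if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len)
		return -1;
	return 0;
}
```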
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
Using hard coded numbers is error prone and hard to read by humans.
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B19: check_stripes (raid6check.c:151)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947== by 0x408067: xmalloc (xmalloc.c:36)
==2389947== by 0x401B67: check_stripes (raid6check.c:155)
==2389947== by 0x4030C6: main (raid6check.c:521)
==2389947==
Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
After a recent git pull, 'make raid6check' no longer worked, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.
Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)
Move check_env() from util.c to lib.c
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Some failure scenarios can leave a spare with a higher event count
than an in-sync device. Assembling an array like this will confuse
the kernel.
So detect spares with event counts higher than the best non-spare
event count and exclude them from the array.
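The check can be sketched roughly as below. The struct and function
names are hypothetical and much simpler than mdadm's real per-device
bookkeeping; only the comparison rule is taken from the description
above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device record, for illustration only. */
struct dev_info {
	uint64_t events;	/* event count from the superblock */
	int is_spare;		/* non-zero if the device is a spare */
	int excluded;		/* set when the device must be left out */
};

/* Mark every spare whose event count exceeds the best non-spare
 * event count. Returns the number of spares excluded. */
static size_t exclude_newer_spares(struct dev_info *devs, size_t n)
{
	uint64_t best = 0;
	int have_member = 0;
	size_t i, dropped = 0;

	for (i = 0; i < n; i++)
		if (!devs[i].is_spare &&
		    (!have_member || devs[i].events > best)) {
			best = devs[i].events;
			have_member = 1;
		}
	if (!have_member)
		return 0;	/* nothing to compare against */

	for (i = 0; i < n; i++)
		if (devs[i].is_spare && devs[i].events > best) {
			devs[i].excluded = 1;
			dropped++;
		}
	return dropped;
}
```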
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Some people want to create truly enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.
So raise the limit if it ever looks like it might be a problem.
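One way to raise the limit is with getrlimit()/setrlimit() on
RLIMIT_NOFILE, as sketched below. The function name ensure_nofile() and
the policy of raising to exactly the requested count are assumptions
for illustration, not necessarily what this patch does.

```c
#include <sys/resource.h>

/* Hypothetical helper: make sure at least 'needed' file descriptors
 * may be open at once. Returns 0 on success, -1 on failure. */
static int ensure_nofile(rlim_t needed)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
		return -1;
	if (lim.rlim_cur >= needed)
		return 0;	/* already high enough */

	lim.rlim_cur = needed;
	/* Raising the hard limit only succeeds with sufficient
	 * privilege; setrlimit() reports the failure otherwise. */
	if (lim.rlim_max < needed)
		lim.rlim_max = needed;
	return setrlimit(RLIMIT_NOFILE, &lim) == 0 ? 0 : -1;
}
```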
Signed-off-by: NeilBrown <neilb@suse.de>
We need to check for a backup iff the data_offset has changed.
Testing against level==10 was an effective but short-sighted approach.
Signed-off-by: NeilBrown <neilb@suse.de>
It is possible that the devices in an array have different sizes, and
different data_offsets. So the 'before_space' and 'after_space' may
be different from drive to drive.
Any decisions about how much to change the data_offset must work on
all devices, so must be based on the minimum available space on
any device.
So find this minimum first, then do the calculation.
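A minimal sketch of the "find the minimum first" step, assuming a
hypothetical per-device struct that carries only the two space values
(the real code reads these from the metadata of each device):

```c
#include <stddef.h>

/* Hypothetical per-device record, for illustration only. */
struct dev_space {
	unsigned long long before_space;  /* free sectors before the data */
	unsigned long long after_space;   /* free sectors after the data */
};

/* Store the smallest before_space and after_space over all devices.
 * Both results are 0 when there are no devices. */
static void min_space(const struct dev_space *devs, size_t n,
		      unsigned long long *min_before,
		      unsigned long long *min_after)
{
	size_t i;

	*min_before = n ? devs[0].before_space : 0;
	*min_after = n ? devs[0].after_space : 0;
	for (i = 1; i < n; i++) {
		if (devs[i].before_space < *min_before)
			*min_before = devs[i].before_space;
		if (devs[i].after_space < *min_after)
			*min_after = devs[i].after_space;
	}
}
```

Any change to data_offset is then computed once from these two minima,
so it is guaranteed to fit on every device.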
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the smooth conversion of legacy 0.90 arrays
to 1.0 metadata.
Old metadata is likely to remain but will be ignored.
It can be removed with
mdadm --zero-superblock --metadata=0.90 /dev/whatever
Signed-off-by: NeilBrown <neilb@suse.de>
These need to be cast to int32_t before being cast to 'long', else
sign extension doesn't happen on 64bit hosts.
And bitmap_offset is le32, not le64 !!
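To illustrate why the intermediate 32-bit cast matters, the sketch
below converts a raw 32-bit value to 'long' through a signed and
through an unsigned intermediate type. The function names are
hypothetical and byte-order conversion is left out. Only the signed
route sign-extends, so a negative offset survives on a host where
'long' is 64 bits wide.

```c
#include <stdint.h>

/* Via a signed 32-bit type: the value is sign-extended when widened.
 * (Converting an out-of-range value to int32_t wraps on the usual
 * two's-complement compilers such as gcc and clang.) */
static long offset_via_signed(uint32_t raw)
{
	return (long)(int32_t)raw;
}

/* Via an unsigned 32-bit type: the value is zero-extended, so a
 * negative offset becomes a large positive number when long is
 * 64 bits wide. */
static long offset_via_unsigned(uint32_t raw)
{
	return (long)raw;
}
```

With raw == 0xFFFFFFF8 the first function yields -8, while on a 64bit
host the second yields 4294967288.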
Signed-off-by: NeilBrown <neilb@suse.de>
It seems like a nice location, but it means that we cannot
decrease the data_offset during a reshape.
So put it just after the bitmap, leaving 32K.
Signed-off-by: NeilBrown <neilb@suse.de>
If space_after and space_before are zero (the default) then assume that
metadata doesn't support changing data_offset.
Signed-off-by: NeilBrown <neilb@suse.de>
If we can modify the data_offset, we can avoid doing any backups at all.
If we can't, fall back on the old approach - but not if --data-offset
was requested.
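The decision described above can be sketched as a three-way choice.
The enum and function names are hypothetical, chosen only to make the
rule explicit:

```c
/* Hypothetical names, for illustration only. */
enum reshape_plan {
	PLAN_MOVE_DATA_OFFSET,	/* no backup needed */
	PLAN_OLD_BACKUP,	/* fall back on the old approach */
	PLAN_REFUSE		/* --data-offset asked for but impossible */
};

static enum reshape_plan choose_plan(int can_change_offset,
				     int offset_requested)
{
	if (can_change_offset)
		return PLAN_MOVE_DATA_OFFSET;
	/* The user explicitly asked for a data_offset change, so
	 * silently falling back would not do what was requested. */
	if (offset_requested)
		return PLAN_REFUSE;
	return PLAN_OLD_BACKUP;
}
```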
Signed-off-by: NeilBrown <neilb@suse.de>
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, and we will shortly need to have both for RAID5/6,
so make a separate field.
Signed-off-by: NeilBrown <neilb@suse.de>
1/ these must allow for bad-block-list
2/ they must match the kernel, which has a 32k buffer after the
superblock.
Signed-off-by: NeilBrown <neilb@suse.de>
For RAID5, not being able to set new_data_offset because of an
old kernel is not a problem. So make this fatal only for RAID10.
Also remove an unused assignment to 'rv'.
Signed-off-by: NeilBrown <neilb@suse.de>
This allows the metadata on a device to be saved and later restored.
This can be useful before experimenting on an array that is misbehaving.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>