Commit Graph

18 Commits

Author SHA1 Message Date
NeilBrown 1011e8344a Remove lots of unnecessary white space.
Now that I am using white-space mode in Emacs I can see all of this,
and I don't like it :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 12:31:45 +10:00
Bernd Schubert 2161adce8f raid6check: Check return value of lseek64()
If lseek64() failed it was still writing to the disks, which would introduce
data corruption.

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:05:38 +10:00
Bernd Schubert 2c7b668df7 raid6check: Fix compiler warnings.
Fix some compiler warnings appearing with optimization levels.

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:04:43 +10:00
Bernd Schubert 635b5861c3 raid6check: Use enums for repair type
Using hard coded numbers is error prone and hard to read by humans.

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:04:18 +10:00
Bernd Schubert 3a89d75488 raid6check: Fix memory leaks detected by valgrind
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 1 of 10
==2389947==    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947==    by 0x408067: xmalloc (xmalloc.c:36)
==2389947==    by 0x401B19: check_stripes (raid6check.c:151)
==2389947==    by 0x4030C6: main (raid6check.c:521)
==2389947==
==2389947== 24 bytes in 1 blocks are definitely lost in loss record 2 of 10
==2389947==    at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2389947==    by 0x408067: xmalloc (xmalloc.c:36)
==2389947==    by 0x401B67: check_stripes (raid6check.c:155)
==2389947==    by 0x4030C6: main (raid6check.c:521)
==2389947==

Signed-off-by: Bernd Schubert <bernd.schubert@fastmail.fm>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:03:44 +10:00
Bernd Schubert f8fcf7a1c5 raid6check: Fix build of raid6check
After recent git pull 'make raid6check' did not work anymore, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.

Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)

Move check_env() from util.c to lib.c

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:03:12 +10:00
Robert Buchholz 8a63c73123 raid6check: Auto-repair mode
When calling raid6check in regular scanning mode, specifiying
"autorepair" as the last positional parameter will cause it
to automatically repair any single slot failes it identifies.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-10 17:28:21 +10:00
Robert Buchholz 351d768026 raid6check: Extract (un)locking into functions
Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-10 17:28:03 +10:00
Robert Buchholz 696e95a1df raid6check: Repair mode used geo_map incorrectly
In repair mode, the data block indices to be repaired were calculated
using geo_map() which returns the disk slot for a data block index
and not the reverse. Now we simply store the reverse of that calculation
when we do it anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-10 17:25:27 +10:00
Robert Buchholz b67e45b858 raid6check: Fix off-by-one in argument check
In repair mode, specifying a failed slot that is equal to the number of
devices in the raid could cause a segfault.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-10 17:23:59 +10:00
Robert Buchholz f2e29ad691 Repair mode for raid6
In repair mode, raid6check will rewrite one single stripe
by regenerating the data (or parity) of two raid devices that
are specified via the command line.
If you need to rewrite just one slot, pick any other slot
at random.

Note that the repair option will change data on the disks
directly, so both the md layer above as well as any layers
above md (such as filesystems) may be accessing the stripe
data from cached buffers. Either instruct the kernels
to drop the caches or reassemble the raid after repair.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:22:45 +10:00
NeilBrown 503975b9d5 Remove scattered checks for malloc success.
malloc should never fail, and if it does it is unlikely
that anything else useful can be done.  Best approach is to
abort and let some super-daemon restart.

So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit.  Then use those
removing all the tests for failure.

Also replace all "malloc;memset" sequences with 'xcalloc'.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown e7b84f9d50 Introduce pr_err for printing error messages.
'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
cont_err() is also available.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
Piergiorgio Sartor 8d8ab389a0 RAID-6 check standalone suspend array
Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-16 17:29:08 +10:00
Piergiorgio Sartor 2cf3112111 RAID-6 check standalone fix component list parsing
Fix the parsing of the component list, i.e. skipping the "spare" one.

I also added a check in case the array is degraded.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-14 17:28:31 +10:00
Piergiorgio Sartor af3c375034 RAID-6 check standalone code cleanup
Major change is code cleanup and simplification.
Furthermore, a better error handling and a couple
of bug fixes.
Last but not least, the command line parameters are
changed from "bytes" to "stripes", which is more
convenient, I guess.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 09:16:55 +10:00
Piergiorgio Sartor a9c2c6c697 RAID-6 check standalone md device
Allow RAID-6 check to be passed only the
MD device, start and length.
The three parameters are mandatory.

All necessary information is collected using
the "sysfs_read()" call.
Furthermore, if "length" is "0", then the check
is performed until the end of the array.

Some checks are done, for example if the md device
is really a RAID-6. Nevertheless I guess it is not
bullet proof...

Next patch will include the "suspend" action.
My idea is to do it "per stripe", please let me
know if you've some better options.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:56:41 +10:00
Piergiorgio Sartor 979afcb82b RAID-6 check standalone
Hi Neil,

please find attached a patch, to mdadm-3.2 base, including
a standalone versione of the raid-6 check.

This is basically a re-working (and hopefully improvement)
of the already implemented check in "restripe.c".

I splitted the check function into "collect" and "stats",
so that the second one could be easily replaced.
The API is also simplified.

The command line option are reduced, since we only level
is raid-6, but the ":offset" option is included.

The output reports the block/stripe rotation, P/Q errors
and the possible HDD (or unknown).

BTW, the patch applies also to the already patched "restripe.c",
including the last ":offset" patch (which is not yet in git).

Other item is that due to "sysfs.c" linking (see below) the
"Makefile" needed some changes, I hope this is not a problem.

Next steps (TODO list you like) would be:

1) Add the "sysfs.c" code in order to retrieve the HDDs info
from the MD device. It is already linked, together with the
whole (mdadm) universe, since it seems it cannot leave alone.
I'll need some advice or hint on how to do use it. I checked
"sysfs.c", but before I dig deep into it maybe better to
have some advice (maybe just one function call will do it).

2) Add the suspend lo/hi control. Fellow John Robinson was
suggesting to look into "Grow.c", which I did, but I guess
the same story as 1) is valid: better to have some hint on
where to look before wasting time.

3) Add a repair option (future). This should have different
levels, like "all", "disk", "stripe". That is, fix everything
(more or less like "repair"), fix only if a disk is clearly
having problems, fix each stripe which has clearly a problem
(but maybe different stripes may belong to different HDDs).

So, for the point 1) and 2) would be nice to have some more
detail on where to look what. Point 3) we will discuss later.

Thanks, please consider for inclusion,

bye,

pg

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-21 13:52:44 +11:00