3105 Commits

Author SHA1 Message Date
Song Liu 198d54787c add crc32c and use it for r5l checksum
In kernel space, r5l checksum will use crc32c:
http://marc.info/?l=linux-raid&m=144598970529191
mdadm needs to change too.

This patch ports a simplified crc32c algorithm from kernel code
and uses it in super1.c:write_empty_r5l_meta_block().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-30 17:38:28 +11:00
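As a rough illustration of the checksum introduced in 198d54787c, a bitwise CRC-32C (Castagnoli) can be sketched as follows. This is not the code from the patch: the name `crc32c_le`, the seed-passing signature, and the bit-at-a-time loop are assumptions modelled on the kernel helper of the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Castagnoli polynomial 0x1EDC6F41. */
#define CRC32C_POLY_LE 0x82F63B78u

/*
 * Fold len bytes of buf into the running checksum crc, one bit at a time.
 * The caller supplies the seed and applies any final inversion itself.
 */
uint32_t crc32c_le(uint32_t crc, const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	while (len--) {
		crc ^= *buf++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
	}
	return crc;
}
```

Seeding with 0xFFFFFFFF and inverting the result yields the conventional CRC-32C, whose check value for the ASCII string "123456789" is 0xE3069283. Which seed the r5l metadata block actually uses is not stated in the commit message.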
Song Liu 356e69de79 mdadm: add test script for raid456 journal
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:20:08 +11:00
Song Liu 28f83f6d3b mdadm: Add description of write journal to md.4
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:19:11 +11:00
Song Liu 051f326550 mdadm: refactor write journal code in Assemble and Incremental
As discussed, standalone require_journal() in struct superswitch
is not a very good idea. Instead, journal related information
fits well in struct mdinfo.

This patch simplifies journal support code in Assemble and
Incremental as:

- Add journal_device_required and journal_clean to struct mdinfo;
- Remove function require_journal from struct superswitch;
- Update Assemble and Incremental to use journal_device_required
and journal_clean from struct mdinfo (instead of separate var).

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:19:09 +11:00
Guoqing Jiang e80357f825 Make cmap_* also has same policy as dlm_*
Let the libcmap library and its related functions also need only a
one-time setup while mdadm is running.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-21 11:19:35 +11:00
Guoqing Jiang d15a1f72bd Safeguard against writing to an active device of another node
Modifying an existing device's superblock or creating a new superblock
on an existing device needs to be checked because the device could be
in use by another node in another array. So, we check this by taking
all superblock locks in userspace so that we don't step onto an active
device used by another node and safeguard against accidental edits.
After the edit is complete, we release all locks and the lockspace so
that it can be used by the kernel space.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-21 11:19:05 +11:00
Song Liu 28d744468e Add help message and man entry for --write-journal
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:21 +11:00
Song Liu 5c6ad21150 Check write journal in incremental
If the journal device is missing, do not start the array, and show:

./mdadm -I /dev/sdf
mdadm: journal device is missing, not safe to start yet.

The array will be started when the journal device is attached with -I:

./mdadm -I /dev/sdb1
mdadm: /dev/sdb1 attached to /dev/md/0_0, which has been started.

To force start without journal device:

./mdadm -I /dev/sdf --run
mdadm: Trying to run with missing journal device
mdadm: /dev/sdf attached to /dev/md/0_0, which has been started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:18 +11:00
Song Liu 69a481166b Assemble array with write journal
Example output:

./mdadm --assemble /dev/md0 /dev/sd[c-f] /dev/sdb1
mdadm: /dev/md0 has been started with 4 drives and 1 journal.

mdadm checks the superblock for journal devices. If the journal device
is missing or faulty, mdadm shows a warning:

./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1
mdadm: Not safe to assemble with missing or stale journal device, consider --force.

The user can insist on starting the array (read-only) with --force:

./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md0 has been started with 15 drives.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:15 +11:00
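The assembly policy described in 69a481166b can be sketched as a small decision function. This is only an illustration: `struct journal_state` is a hypothetical stand-in for the two `struct mdinfo` fields named in 051f326550, the meaning given to `journal_clean` is an assumption, and `enum start_mode` and `assemble_start_mode` are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of struct mdinfo. */
struct journal_state {
	bool journal_device_required; /* the array expects a journal device */
	bool journal_clean;           /* assumed: journal is present and not stale */
};

enum start_mode { START_READ_WRITE, START_READ_ONLY, START_REFUSED };

/* Decide how to start the array, given the journal state and --force. */
enum start_mode assemble_start_mode(const struct journal_state *js, bool force)
{
	assert(js != NULL);
	if (!js->journal_device_required || js->journal_clean)
		return START_READ_WRITE;
	/* Journal missing or stale: refuse unless forced, then go read-only. */
	return force ? START_READ_ONLY : START_REFUSED;
}
```

Keeping the decision in one function that reads only the two fields mirrors the intent of moving the journal information into `struct mdinfo`, though the real checks in Assemble.c may be structured differently.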
Song Liu cc1799c3dd Enable create array with write journal (--write-journal DEVICE).
Specify the write journal device with --write-journal DEVICE

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Only one journal device is allowed. If multiple --write-journal
options are given, mdadm uses the first and ignores the others:

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1 --write-journal /dev/sdx
mdadm: Please specify only one journal device for the array.
mdadm: Ignoring --write-journal /dev/sdx...
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:12 +11:00
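The "first wins" handling of repeated --write-journal options in cc1799c3dd could look roughly like the sketch below. The function `pick_journal_device` and its array-based interface are hypothetical; only the two warning messages are taken from the example output in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Return the journal device to use from the list of devices given with
 * --write-journal, or NULL if none was given. Extra devices are reported
 * on stderr and ignored.
 */
const char *pick_journal_device(const char *const *requested, size_t count)
{
	assert(requested != NULL || count == 0);
	if (count == 0)
		return NULL;
	if (count > 1)
		fprintf(stderr, "mdadm: Please specify only one journal device for the array.\n");
	for (size_t i = 1; i < count; i++)
		fprintf(stderr, "mdadm: Ignoring --write-journal %s...\n", requested[i]);
	return requested[0];
}
```

mdadm itself processes options one at a time while parsing the command line, so the real code most likely makes this decision incrementally rather than over a collected array.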
Song Liu ed94976d84 Show device as journal in --detail --examine
Example output:

./mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Wed May 13 17:01:12 2015
     Raid Level : raid5
     Array Size : 11720662464 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB)
   Raid Devices : 4
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed May 13 17:01:12 2015
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 32K

           Name : 0
           UUID : 8fb9ee05:3831d52f:e5c23825:28cd6881
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf

       4       8       17        -      journal   /dev/sdb1

./mdadm -E /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x201
     Array UUID : 562b2334:35b9bcc1:add50892:1f30c4bd
           Name : 0
  Creation Time : Thu Aug 27 12:55:26 2015
     Raid Level : raid5
   Raid Devices : 15

 Avail Dev Size : 249796608 (119.11 GiB 127.90 GB)
     Array Size : 54696423936 (52162.57 GiB 56009.14 GB)
  Used Dev Size : 7813774848 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : active
    Device UUID : 5015e522:d39ba566:5909cf3c:9c51f2ff

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Aug 27 13:16:55 2015
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4e6fd76d - correct
         Events : 262

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Journal
   Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:07 +11:00
Song Liu fa7574f6d4 add macros for MD_DISK_ROLE_(SPARE/FAULTY)
Replace special disk roles (0xffff, 0xfffe) with macros:

define MD_DISK_ROLE_SPARE      0xffff
define MD_DISK_ROLE_FAULTY     0xfffe

A macro for the journal device will be added in the next patch:
define MD_DISK_ROLE_JOURNAL    0xfffd

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:05:59 +11:00
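A minimal sketch of how the role macros from fa7574f6d4 might be consumed is shown below. The three values come from the commit message; the helper `disk_role_name` and the choice to treat every other value as an active slot number are assumptions made for illustration.

```c
#include <stdint.h>

#define MD_DISK_ROLE_SPARE   0xffff
#define MD_DISK_ROLE_FAULTY  0xfffe
#define MD_DISK_ROLE_JOURNAL 0xfffd

/* Map a role value to a short label (hypothetical helper). */
const char *disk_role_name(uint16_t role)
{
	switch (role) {
	case MD_DISK_ROLE_SPARE:
		return "spare";
	case MD_DISK_ROLE_FAULTY:
		return "faulty";
	case MD_DISK_ROLE_JOURNAL:
		return "journal";
	default:
		/* Assumed: any other value is a raid slot number. */
		return "active";
	}
}
```

Named constants make a comparison such as `role == MD_DISK_ROLE_JOURNAL` self-explanatory where a bare `0xfffd` would not be.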
Artur Paszkiewicz 2139b03c20 imsm: don't call abort_reshape() in imsm_manage_reshape()
Calling abort_reshape() in imsm_manage_reshape() is unnecessary in case
of an error because it is handled by reshape_array(). Calling it when
reshape completes successfully is also unnecessary and leads to a race
condition:
- reshape ends
- mdadm calls abort_reshape() -> sets sync_action to idle
- MD_RECOVERY_INTR is set and md_reap_sync_thread() does not finish the
  reshape

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Konrad Dabrowski <konrad.dabrowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-08 15:22:16 +11:00
Guoqing Jiang 9465f17058 re-add: make re-add try to write sysfs node first
If the sysfs node exists, we should try to write "re-add" to it first.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-08 11:08:40 +11:00
NeilBrown 8266a36ad6 Merge branch 'fix' of git://github.com/ldzhong/mdadm 2015-10-01 08:30:58 +10:00
Guoqing Jiang bff96f7366 mdadm: make cluster raid also could support re-add
If it is a cluster raid, the disc.state needs to be
changed accordingly when doing a re-add.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-28 14:55:02 +10:00
Goldwyn Rodrigues 9d9202e301 Fix --incremental handling on cluster array.
Commit 06bd679317 ("Skip clustered devices in incremental")
disabled incremental completely on clustered arrays.
What we really want is that mdadm should not start or create
a clustered array but still be able to add or re-add to an existing
device. This would enable udev scripts to automatically add
or re-add a device after transient errors.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-28 14:42:55 +10:00
NeilBrown 86a406c226 super1: Do not create bad block log for clustered devices.
We currently have no synchronization techniques for the bad
block log, so disable it for the cluster.

Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-28 12:27:37 +10:00
Goldwyn Rodrigues 6d9c7c2551 Increment version for clustered bitmaps
Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels
from assembling a clustered device.

In order to maximize compatibility, the major version is set to
BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.

Also add MD_FEATURE_CLUSTERED so that older kernels, which would
otherwise assemble the MD array if the bitmap is corrupted, return
an error instead.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-28 11:47:04 +10:00
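The versioning rule in 6d9c7c2551 amounts to a one-line selection, sketched below. `BITMAP_MAJOR_CLUSTERED` and its value 5 come from the commit message; the non-clustered value `BITMAP_MAJOR_HI = 4` and the helper name `bitmap_major_version` are assumptions.

```c
#include <stdbool.h>

#define BITMAP_MAJOR_HI        4 /* assumed major version for ordinary bitmaps */
#define BITMAP_MAJOR_CLUSTERED 5 /* from the commit message */

/* Use the clustered major version only when the bitmap is clustered. */
int bitmap_major_version(bool clustered)
{
	return clustered ? BITMAP_MAJOR_CLUSTERED : BITMAP_MAJOR_HI;
}
```

Because ordinary bitmaps keep the lower version number, arrays without clustering stay readable by kernels that predate the change.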
Lidong Zhong d7c6d75dcf mdadm: remove duplicate logic when c.delay is 0 2015-08-26 14:03:56 +08:00
NeilBrown ccc93b33ca Makefile: test -s flag and suppress echo when set.
Some rules do their own tracing and so aren't affected
by -s.
So add a test for -s in MAKE_FLAGS and avoid echo when present.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-05 15:10:43 +10:00
NeilBrown 4a3a795a8b tests: raid6 repair is now tested on every different layout.
Signed-off-by: NeilBrown <neilb@suse.de>
2015-08-05 14:57:08 +10:00
NeilBrown d80f7aa9a1 Assemble: correctly capture error from ->write_bitmap
else 'err' might be undefined.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-05 14:55:31 +10:00
NeilBrown 380487fdc9 main: remove use of uninitialized 'rv'.
If c.homecluster was not NULL, might get an
error anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-05 14:53:33 +10:00
NeilBrown 598f8904ac raid6check: don't ignore return value from posix_memalign.
Compilers don't like that.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-08-05 14:51:25 +10:00
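Checking the result of posix_memalign, as 598f8904ac asks for, can be sketched with a small wrapper. The wrapper name `alloc_aligned` is hypothetical, and the actual change in raid6check.c is not shown in the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate size bytes aligned to alignment, which must be a power of two
 * and a multiple of sizeof(void *). Returns NULL on failure instead of
 * leaving the pointer undefined.
 */
void *alloc_aligned(size_t alignment, size_t size)
{
	void *buf = NULL;

	assert(alignment >= sizeof(void *));
	/* posix_memalign returns 0 on success and an error number otherwise. */
	if (posix_memalign(&buf, alignment, size) != 0)
		return NULL;
	return buf;
}
```

A caller would test the returned pointer before use and release it with `free()`.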
NeilBrown 5997585200 Merge branch 'mdadm-3.3.x' 2015-08-03 16:21:37 +10:00
NeilBrown 69818a5c75 Release mdadm-3.3.4
Important bugfix release.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 16:17:13 +10:00
NeilBrown 8360760457 Assemble: really don't assemble IMSM array without OROM.
Previous patch missed one case.

Also print more useful information when rejecting
a device with IMSM metadata.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 16:06:51 +10:00
NeilBrown 187f157bf0 mdassemble: include mapfile support.
This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 15:42:31 +10:00
NeilBrown 7eee461e91 Assemble: don't assemble IMSM array without OROM.
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 15:42:16 +10:00
NeilBrown 53a087b105 mdassemble: include mapfile support.
This does make mdassemble a bit bigger, but it also means
it actually works properly with named arrays.

Ref: https://bbs.archlinux.org/viewtopic.php?id=198196
Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 11:54:16 +10:00
NeilBrown 7d55dca2cc mdassemble: don't try to perform cluster check.
mdassemble is meant to be small and simple, so avoid
trying to check for a cluster.
Currently it doesn't, but it still includes the code,
which doesn't build because the library isn't provided.

So just exclude the get_cluster_name code from mdassemble.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 11:53:01 +10:00
Guoqing Jiang 2cf42394f0 md-cluster: use %-64s to print cluster_name
Left alignment is better for cluster names shorter than 64 characters.
It also aligns the cluster name output with the other fields.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 17:26:12 +10:00
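The effect of the `%-64s` conversion used in 2cf42394f0 can be demonstrated in isolation. The helper `format_cluster_name` below is hypothetical and only shows the padding behaviour, not the surrounding output code in mdadm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write name into out, left-aligned and padded with spaces to a minimum
 * width of 64 characters. Returns the value reported by snprintf.
 */
int format_cluster_name(char *out, size_t outlen, const char *name)
{
	assert(out != NULL && name != NULL);
	return snprintf(out, outlen, "%-64s", name);
}
```

The `-` flag puts the padding after the name, so a short name starts in the same column as the other values instead of being pushed to the right by up to 63 spaces.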
Guoqing Jiang d7a493695a mdadm: fix wrong condition for go to abort
When parse_cluster_confirm_arg returns 0, it means the
argument was parsed successfully, so change !rv to rv.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 17:26:12 +10:00
NeilBrown 9f2e55a421 Assemble: don't assemble IMSM array without OROM.
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 14:38:37 +10:00
NeilBrown 653299b699 Merge branch 'cluster'
Now that 3.3.3 is out, it is time to include the cluster-support code.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-27 11:01:08 +10:00
NeilBrown 3cab8baec5 Release mdadm-3.3.3
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:18:17 +10:00
NeilBrown e4fa82a858 mdassemble: add "Name" definition.
That allows it to compile again :-(

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:18:13 +10:00
NeilBrown 27aefbdb3d Don't ignore return value from read and write
New gcc sometimes complains about this.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:11:23 +10:00
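One common way to honour the return value of write, in the spirit of 27aefbdb3d, is a loop that retries short writes. The helper `write_all` is a hypothetical example; the commit message does not show how mdadm itself handles each call site.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Write exactly len bytes from buf to fd.
 * Returns 0 on success and -1 if the data could not all be written.
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;       /* interrupted before writing: retry */
		if (n <= 0)
			return -1;      /* real error, or no progress */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Returning a status that the caller must inspect also avoids the compiler warning about an ignored result.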
NeilBrown d8f82d1d88 bitmap: convert "inline" to "static inline"
Otherwise new gcc ignores them with some compile options.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 16:10:44 +10:00
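The change in d8f82d1d88 presumably relates to C99 inline semantics, where a plain `inline` function provides no external definition and can fail to link if the compiler chooses not to inline it. A `static inline` helper avoids this, as in the sketch below; `bits_to_bytes` is a made-up example and not a function from bitmap.c.

```c
/*
 * static gives the function internal linkage, so the compiler emits a
 * local copy in this translation unit whenever a call is not inlined.
 */
static inline unsigned long bits_to_bytes(unsigned long bits)
{
	return (bits + 7) / 8; /* round up to whole bytes */
}
```

With this form the helper behaves the same under the older GNU inline rules and the C99 rules.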
NeilBrown 86b77ddf87 Assemble: extend --homehost='<ignore>' to allow --name= to ignore homehost
Also make --homehost='<ignore>' work properly.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 12:50:54 +10:00
NeilBrown 6fba5a339c test: assume recovery has completed if sync_completed says so.
The final completion of a recovery can be delayed, so use
sync_completed to check if it is finished, just not been reaped.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-23 11:17:10 +10:00
NeilBrown 4108d695e3 tests: flushbufs after writing zeros
sometimes the removed device is re-added before the writes
get all the way to the md device - so the array doesn't need
any recovery and the test fails.
So flush first to be safe.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-23 11:09:19 +10:00
NeilBrown d51e39c0a4 test: add -F flag to mkfs
newer versions of mkfs.extX ask before creating a filesystem
on a device which appears to already have a filesystem.
We don't want that, so add the -F flag.
Also be explicit about fs type as one shouldn't depend on defaults.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22 09:58:41 +10:00
NeilBrown 49325eac3a mdadm: document --homehost=any functionality.
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22 09:33:17 +10:00
NeilBrown 00f23a8861 Assemble: improve tests for matching --name= request.
If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-22 09:24:36 +10:00
NeilBrown 12ee2a8d75 raid6check: use O_DIRECT instead of O_SYNC.
O_DIRECT is more direct and is faster.
This requires aligned memory allocation, but that isn't hard.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-20 17:17:37 +10:00
NeilBrown eae01ef02f restripe: fix data block order in raid6_2_data_recov
... rather than relying on the caller getting them in the
correct order.
This is better engineering and fixes a bug, but because the
failed_slotX numbers are used later with assumption that
they weren't swapped

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-20 17:15:13 +10:00
NeilBrown 50786d4731 raid6check: various cleanup/fixes
- document meaning of various arrays. In particular:
   stripes[]
   blocks[]
   blocks_page[]
   block_index_for_slot[]

  It needs to be clear if these are indexed by raid_disk
  number or syndrome number.

- changed meaning of block_index_for_slot[].  It didn't seem
  to be used consistently.  It also made use of the block numbers
  in array data ordering, which is not directly relevant for syndrome
  calculations.

- reduced number of args to autorepair and manual_repair
  They don't need both stripes[] and blocks[].  And they don't need
  diskP or diskQ.
  blocks[-1] is the P chunk, blocks[-2] is the Q chunk.
  block_index_for_slot[] can be used to find the target device for
  a particular syndrome block.

- remove stripe locking from within manual_repair, and instead
  use the global stripe locking used for check and autorepair.

- this necessitated changes to raid6_datap_recov and raid6_2data_recov
  so the P and Q blocks could be before or after the data blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-20 14:11:33 +10:00
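The convention from 50786d4731 that blocks[-1] is the P chunk and blocks[-2] is the Q chunk can be realised by offsetting a pointer into a slightly larger array. The sketch below is one way to do that; the function names and the allocation strategy are assumptions, and raid6check may set up its arrays differently.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate data_disks + 2 block pointers and return a pointer to the
 * third slot, so that indices -2 (Q) and -1 (P) are valid alongside
 * 0 .. data_disks - 1 for the data blocks. All slots start as NULL.
 */
char **alloc_syndrome_blocks(int data_disks)
{
	char **base;

	assert(data_disks > 0);
	base = calloc((size_t)data_disks + 2, sizeof(*base));
	return base ? base + 2 : NULL;
}

/* Undo the offset before handing the memory back to free(). */
void free_syndrome_blocks(char **blocks)
{
	if (blocks)
		free(blocks - 2);
}
```

With this layout the data blocks keep their natural syndrome order starting at index 0, and the recovery routines can address P and Q without separate diskP or diskQ arguments.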
NeilBrown 29a312f2f3 Assemble: really ensure stripe_cache is big enough to handle new chunk size
Earlier patch:
  56fcbcbb6f
calculated the proper chunk size but didn't use it.

Let's actually use it this time.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-17 13:10:25 +10:00