3146 Commits

Author SHA1 Message Date
NeilBrown c61b1c0bb5 Release mdadm-3.4
My last release!

Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 17:14:56 +11:00
NeilBrown 7071320a18 Assorted fixes for a "make everything" build
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 13:28:58 +11:00
NeilBrown d5ff855d47 super1: allow reshape that hasn't really started to be reverted.
A simple revert doesn't work here because the reshape_position is
in the critical section.
The best approach is to let the reshape progress a bit and then
go backwards.
If that isn't possible, assembling with --update=revert-reshape and
--invalid-backup should work.

Reported-by-tested-by: George Rapp <george.rapp@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 12:57:08 +11:00
NeilBrown 9f7f28ee50 super0: Fix reporting of devices between 2GB and 4GB
v0.90 metadata can handle devices between 2GB and 4GB, but we need
to treat the 'size' as unsigned.  In a couple of places we don't.

URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=809447
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 11:57:54 +11:00
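
The signedness problem described in the super0 commit above (9f7f28ee50) can be illustrated with a minimal sketch. It assumes only that the size is held in a 32-bit field; the helper names are hypothetical and are not taken from mdadm.

```c
#include <stdint.h>

/* Hypothetical helpers: widen a 32-bit size field to 64 bits. */

/* Wrong: going through a signed 32-bit type turns any value >= 2^31
 * into a negative number (the conversion is implementation-defined in C,
 * but wraps on common compilers such as gcc and clang). */
long long size_as_signed(uint32_t raw)
{
    return (long long)(int32_t)raw;
}

/* Right: keep the field unsigned while widening, so the upper half of
 * the 32-bit range is preserved. */
unsigned long long size_as_unsigned(uint32_t raw)
{
    return (unsigned long long)raw;
}
```

Values below 2^31 come out the same from both helpers, which is why such a bug only shows up on devices in the upper half of the range.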
NeilBrown cec72c071b systemd/mdadm-last-resort: add Conflicts to .service file.
It seems that having the Conflicts in the .timer file is not sufficient.
Sometimes it works, but if the timer gets requested after the conflicting
block device appears (or was it "before" ...) the timer is not aborted.

Having the Conflicts in both files seems to work reliably.

URL: https://bugzilla.suse.com/show_bug.cgi?id=853944
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 11:45:53 +11:00
NeilBrown ac92b44a87 super1: fix calculation of space_before
This code was meant to update 'earliest' but clearly never does.

This bug would only affect an array with a very large bitmap so it is unlikely
to be significant.

Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 11:44:27 +11:00
Guoqing Jiang 32539f74d2 util: fix wrong return value of cluster_get_dlmlock
Whether a node actually got the lock is indicated by lksb.sb_status,
not by the return value of dlm_lock.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
2016-01-27 11:43:02 +11:00
Khem Raj 50d72ed429 Add casts for the addr arg of connect and bind
glibc allows the addr arg to connect and bind to be any of a number
of 'sockaddr_*' types, but musl requires 'const struct sockaddr *'
which is in line with open group specs.  So add casts to allow
compilation with musl.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-15 08:36:45 +11:00
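
The cast described in commit 50d72ed429 above can be sketched as follows. This is an illustration for a Unix-domain socket, not the actual mdadm change; make_unix_addr and connect_unix are hypothetical names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a Unix-domain address; returns 0 on success, -1 if path is too long. */
int make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);   /* length was checked above */
    return 0;
}

/* connect() is specified as taking 'const struct sockaddr *', so the
 * sockaddr_un pointer is cast explicitly instead of relying on a
 * libc-specific extension that accepts any sockaddr_* type. */
int connect_unix(int sfd, const struct sockaddr_un *addr)
{
    return connect(sfd, (const struct sockaddr *)addr, sizeof(*addr));
}
```

The same explicit cast applies to bind(), which takes the address argument with the same type.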
Khem Raj cf80bce8df Define _POSIX_C_SOURCE if undefined
config.c uses _POSIX_C_SOURCE which is defined in features.h when
glibc/uclibc is used, but isn't defined when musl is used.
So provide a reasonable default.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-15 08:36:19 +11:00
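
A guard of the kind commit cf80bce8df above describes might look like the sketch below. The value 200809L (POSIX.1-2008) is an assumption chosen for illustration; the commit message only says "a reasonable default". posix_level is a hypothetical helper that makes the effective value observable.

```c
/* Provide a default only when the C library headers did not define one.
 * The value 200809L is an assumption for this sketch. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

/* Hypothetical helper: report the effective feature-test level. */
long posix_level(void)
{
    return (long)_POSIX_C_SOURCE;
}
```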
NeilBrown dfd7822ca6 Create: minor fix when adding a journal device
The check of "is there a filesystem here" is still appropriate for a
journal device.

Also set active_disks correctly - even though it is ignored.

Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14 14:13:17 +11:00
NeilBrown f170a5a9a0 Create: fix regression in setting raid_disk
Recent commit caused 'missing' declarations to not be handled correctly.

Fixes: cc1799c3dd ("Enable create array with write journal (--write-journal DEVICE).")
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14 13:22:17 +11:00
NeilBrown ef639064b6 restripe: fix compilation of "make test"
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-13 10:01:02 +11:00
Guoqing Jiang b3774a48db Fix wrong description in manpage
The careless change was introduced by commit 7e6e839a26
("mdadm: change the num of cluster node"). It should be
reverted to avoid misunderstanding.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-13 08:19:41 +11:00
Artur Paszkiewicz c85338c675 imsm: don't update migration record when reshape is interrupted
Abort imsm_manage_reshape() without updating the migration record if any
error occurs when checking progress. If reshape is interrupted and the
migration record is then updated, the checkpoint will be wrong and will
cause reshape to fail when the array is restarted.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-07 11:10:34 +11:00
Artur Paszkiewicz 5ff3a780ab imsm: use timeout when waiting for reshape progress
Waiting for reshape progress is done by using select() on sync_completed
to block until an exception condition is signalled on the
filedescriptor. This happens when the attribute's value is updated by
the kernel, but if the array is stopped when mdadm is blocked on
select() this will never happen, because this attribute is then removed
and apparently the kernel doesn't do sysfs_notify() when removing a
sysfs attribute. So set a 3 second timeout for the sysfs_wait() call.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-07 11:09:53 +11:00
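
The waiting pattern described in commit 5ff3a780ab above, select() on the exception set with a timeout so that a removed attribute cannot block the caller forever, can be sketched as below. This is not mdadm's sysfs_wait(); the function names and the millisecond parameter are assumptions for illustration.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until an exceptional condition is signalled on fd or until
 * timeout_ms elapses. Returns 1 if signalled, 0 on timeout, -1 on error. */
int wait_for_exception(int fd, int timeout_ms)
{
    fd_set efds;
    struct timeval tv;
    int n;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_ZERO(&efds);
    FD_SET(fd, &efds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(fd + 1, NULL, NULL, &efds, &tv);
    return n > 0 ? 1 : n;
}

/* After a wake-up, a sysfs attribute is re-read from offset 0.
 * Returns the number of bytes read, or -1 on error. */
ssize_t reread_attr(int fd, char *buf, size_t len)
{
    ssize_t n;

    if (len == 0 || lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    n = read(fd, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

With a 3 second timeout, a caller would invoke wait_for_exception(fd, 3000) in a loop and treat a return of 0 as a cue to check whether the array still exists.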
Pawel Baldysiak 60f0f54d6f IMSM: Add support for VMD
The Intel Volume Management Device (VMD) is an integrated
endpoint on the platform's PCIe root complex that acts
as a host bridge to a secondary PCIe domain.

This patch adds proper handling of NVMe devices attached to VMD domain.
Each VMD domain is treated as a separate controller (HBA).
Spanning between domains is forbidden.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-06 12:22:57 +11:00
Artur Paszkiewicz d7d3809a1b imsm: abort reshape if sync_action is not "reshape"
When reshape was interrupted, an incorrect checkpoint would be saved in
the migration record. Change wait_for_reshape_imsm() to return -1 when
sync_action is not "reshape" to abort early in imsm_manage_reshape()
without writing the migration record.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-24 10:04:24 +11:00
Artur Paszkiewicz 10df72a080 Grow: close file descriptor earlier to avoid "still in use" when stopping
Close fd2 as soon as it is no longer needed, before calling
Grow_continue(). Otherwise, we won't be able to stop an array with
external metadata during reshape, because mdadm running in background
will be keeping it open.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-24 10:00:00 +11:00
NeilBrown e652d1aa29 Detail: fix wrong condition in recent change.
Now that we can print device details with a specific raid_disk but not
disk.number, the condition for "print either disk.number or disk.raid_disk"
must be made more specific.

Reported-by: Coly Li <colyli@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-23 12:15:32 +11:00
Xiao Ni f7cf9699dc Check and remove bitmap first when reshape to raid0
If a raid device with a bitmap is reshaped to raid0, the reshape will
start, but it will then fail and lose some components. So the bitmap
should be removed first.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-22 15:16:08 +11:00
Song Liu 38c2e05b6a in --add assign raid_disk of 0 to journal
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-22 07:50:05 +11:00
Song Liu 6fe4c61603 move journal to end of --detail list
As we give journal device raid_disk of 0, the output of --detail is:

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       5       8       24        0      journal   /dev/sdb8
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5

       4       8       23        -      spare   /dev/sdb7

This patch makes it back to:
    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       18        1      active sync   /dev/sdb2
       2       8       19        2      active sync   /dev/sdb3
       3       8       21        3      active sync   /dev/sdb5

       4       8       23        -      spare   /dev/sdb7
       5       8       24        -      journal   /dev/sdb8

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-22 07:50:03 +11:00
NeilBrown 6dd16dac40 Add --update=force-no-bbl.
This forcibly removes the bad-block log.  There can be situations where it is hard to
remove bad blocks by writing to them - particularly on RAID5.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-21 14:56:38 +11:00
NeilBrown a0d12d51a7 Merge branch 'fix-unlikely-potential-overflows' of https://github.com/sjvs/mdadm 2015-12-21 13:01:10 +11:00
NeilBrown 9da5a4897d Merge https://github.com/makelinux/mdadm
Fixes https://github.com/neilbrown/mdadm/issues/17
2015-12-21 12:57:06 +11:00
NeilBrown 78a5dc039b Detail: don't assume a particular 'disk' number of missing devices.
When a particular raid-disk is missing, we don't know which disk number
it should have, and reporting a number could result in duplicate
numbers (with v1.x metadata - never with the old 0.90).

So set the default to -1 and recognise that when printing.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 13:51:54 +11:00
NeilBrown 9e70a453ed Detail: report correct raid-disk for removed drives.
Back in
  Commit: 8057db46a1 ("Detail: fix handling of 'disks' array.")
when we doubled the size of the 'disks' array to handle primary and
replacement, we should have halved the setting of the default raid_disk
number.

Reported-by: Coly Li <colyli@suse.de>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-18 13:49:30 +11:00
Guoqing Jiang 81a8a69415 mdadm: improve the safeguard for change cluster raid's sb
This commit does the following jobs:

1. rename is_clustered to dlm_funs_ready since it matches the
   function better.
2. st->cluster_name can't be used to identify whether the raid is
   clustered or not; we should check the bitmap's version to
   perform the identification.
3. for cluster_get_dlmlock/cluster_release_dlmlock funcs, both
   of them just need the lockid as parameter since the cluster
   name can be obtained with get_cluster_name().

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-17 09:53:37 +11:00
Guoqing Jiang 1b78e47021 mdadm: do not try to hold dlm lock in free_super1
free_super1 doesn't actually change the sb, it
just frees the addr space of sb. Also free_super1 is
called in lots of places within mdadm, so remove the dlm
lock code since the func doesn't need the protection;
this also reduces latency.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-17 09:51:46 +11:00
Guoqing Jiang 53e76b1def mdadm: do not display bitmap info if it is cleared
"mdadm -X DISK" is used to report information about a bitmap
file. It is better not to display any of the related info if the
bitmap has been cleared with "--bitmap=none" under grow mode.

To do that, the locate_bitmap is changed a little to have a
return value based on MD_FEATURE_BITMAP_OFFSET.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 13:24:04 +11:00
Guoqing Jiang 8b2202ded1 mdadm: don't show cluster name once the bitmap is cleared
Don't show cluster name if bitmap is cleared.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 13:24:01 +11:00
Guoqing Jiang 37d0ca9be6 mdadm: output info more precisely when change bitmap to none
When changing the bitmap to none, the info could be more accurate
based on the existing bitmap type.

And s->bitmap_file is passed from cmd "--bitmap=TYPE", so
remove s->bitmap_file from the err info, since the message should mean
that changing the bitmap to one type failed rather than that the type is
already present.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 13:23:58 +11:00
Guoqing Jiang 41dbb4da22 mdadm: let cluster raid could also add disk within incremental mode
For cluster raid, the disc.state needs to be changed accordingly under
incremental mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 13:23:54 +11:00
Song Liu 01290056d0 recreate journal in mdadm
This patch tries to recreate a missing/faulty journal in mdadm.

Example:

./mdadm --fail /dev/md1 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1

./mdadm --stop /dev/md1
mdadm: stopped /dev/md1

./mdadm -A --scan --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md/1 has been started with 15 drives.

./mdadm --add-journal /dev/md1 /dev/sdb2
mdadm: added /dev/sdb2

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 12:43:56 +11:00
Song Liu 5aa644c68a add sysfs_array_state to struct mdinfo
Add sysfs_array_state to struct mdinfo, and add GET_ARRAY_STATE to
options of sysfs_read.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 12:43:45 +11:00
Deepa Dinamani 26714713cd mdadm: Change timestamps to unsigned data type.
32 bit signed timestamps will overflow in the year 2038.

Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.

Add time_after/time_before and supporting typecheck from
the kernel to take care of unsigned time wraparound.

The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.

v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
Assumption is that the usage of v0.90 will be deprecated by
year 2106.

Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 12:43:25 +11:00
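
The wraparound handling mentioned in commit 26714713cd above can be sketched in the style of the kernel's time_after()/time_before() helpers. The function names below are hypothetical 32-bit variants written for illustration, not the macros added by the patch.

```c
#include <stdint.h>

/* Wraparound-safe comparison of 32-bit unsigned timestamps.
 * The unsigned subtraction wraps modulo 2^32; reading the difference as a
 * signed value tells which side is later, provided the two timestamps are
 * less than 2^31 seconds apart. (The unsigned-to-signed conversion is
 * implementation-defined in C but wraps on common compilers.) */
int time_after_u32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;    /* true if a is later than b */
}

int time_before_u32(uint32_t a, uint32_t b)
{
    return time_after_u32(b, a);    /* true if a is earlier than b */
}
```

An unsigned 32-bit count of seconds since 1970 spans 2^32 seconds, roughly 136 years, which is where the year 2106 in the commit message comes from.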
Constantine Shulyupin cd04f56212 Detail.c --test fix 2015-12-10 16:26:07 +02:00
Song Liu dbfbca4300 fix bug in assemble
In Assemble, getinfo_super() over-writes journal_clean.  To
ensure correct journal_clean, keep it in a local variable
before getinfo_super().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-09 07:35:50 +11:00
Bas van Schaik 1158f25eae make sure 'path' buffer is large enough to fit 200 characters plus null terminator 2015-12-03 13:48:53 +00:00
Bas van Schaik fa9aca4930 avoid confusion with parameter 'devname' with same name, ensure buffer is large enough for two ints plus extras 2015-12-03 13:48:46 +00:00
Bas van Schaik a90ed30e74 ensure buffer is large enough for two ints and some extras 2015-12-03 13:48:37 +00:00
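
The buffer-sizing concern in the three commits above can be sketched as follows: derive the buffer length from the widest possible output and let snprintf() report truncation. The macro and function names are hypothetical, and the "major:minor" format is an assumption for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Upper bound on the characters needed to print one int in decimal:
 * at most 3 digits per 8-bit byte, plus one for a sign. */
#define INT_STRLEN_BOUND (3 * sizeof(int) + 1)

/* Two ints, one ':' separator, and the terminating NUL. */
#define DEVNUM_BUFLEN (2 * INT_STRLEN_BOUND + 2)

/* Format "major:minor" into buf.
 * Returns 0 on success, -1 if the output did not fit. */
int format_devnum(char *buf, size_t len, int major, int minor)
{
    int n = snprintf(buf, len, "%d:%d", major, minor);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Sizing from the type rather than from typical values means even the extreme case of two INT_MIN arguments fits.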
Song Liu 198d54787c add crc32c and use it for r5l checksum
In kernel space, r5l checksum will use crc32c:
http://marc.info/?l=linux-raid&m=144598970529191
mdadm needs to change too.

This patch ports a simplified crc32c algorithm from kernel code,
and uses it in super1.c:write_empty_r5l_meta_block().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-30 17:38:28 +11:00
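
For readers unfamiliar with crc32c (CRC-32 with the Castagnoli polynomial), a minimal bitwise sketch follows. It is not the code ported in commit 198d54787c above; the function names are hypothetical, and the seed and final inversion in the wrapper follow the common CRC-32C convention, which may differ from how the r5l code seeds its checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Feed len bytes into a running CRC-32C state, one bit at a time.
 * 0x82F63B78 is the Castagnoli polynomial in reflected form. */
uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Conventional one-shot form: all-ones seed, inverted result. */
uint32_t crc32c(const void *data, size_t len)
{
    return ~crc32c_update(0xFFFFFFFFu, data, len);
}
```

With these conventions the standard check string "123456789" yields 0xE3069283. A table-driven version computes the same values faster at the cost of a 1 KiB lookup table.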
Song Liu 356e69de79 mdadm: add test script for raid456 journal
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:20:08 +11:00
Song Liu 28f83f6d3b mdadm: Add description of write journal to md.4
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:19:11 +11:00
Song Liu 051f326550 mdadm: refactor write journal code in Assemble and Incremental
As discussed, standalone require_journal() in struct superswitch
is not a very good idea. Instead, journal related information
fits well in struct mdinfo.

This patch simplifies journal support code in Assemble and
Incremental as:

- Add journal_device_required and journal_clean to struct mdinfo;
- Remove function require_journal from struct superswitch;
- Update Assemble and Incremental to use journal_device_required
and journal_clean from struct mdinfo (instead of separate var).

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:19:09 +11:00
Guoqing Jiang e80357f825 Make cmap_* also has same policy as dlm_*
Make the libcmap lib and related funcs need only a one-time
setup for the lifetime of an mdadm run, as for dlm_*.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-21 11:19:35 +11:00
Guoqing Jiang d15a1f72bd Safeguard against writing to an active device of another node
Modifying an existing device's superblock or creating a new superblock
on an existing device needs to be checked because the device could be
in use by another node in another array. So, we check this by taking
all superblock locks in userspace so that we don't step onto an active
device used by another node and safeguard against accidental edits.
After the edit is complete, we release all locks and the lockspace so
that it can be used by the kernel space.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-21 11:19:05 +11:00
Song Liu 28d744468e Add help message and man entry for --write-journal
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:21 +11:00
Song Liu 5c6ad21150 Check write journal in incremental
If the journal device is missing, do not start the array, and show:

./mdadm -I /dev/sdf
mdadm: journal device is missing, not safe to start yet.

The array will be started when the journal device is attached with -I

./mdadm -I /dev/sdb1
mdadm: /dev/sdb1 attached to /dev/md/0_0, which has been started.

To force start without journal device:

./mdadm -I /dev/sdf --run
mdadm: Trying to run with missing journal device
mdadm: /dev/sdf attached to /dev/md/0_0, which has been started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:18 +11:00
Song Liu 69a481166b Assemble array with write journal
Example output:

./mdadm --assemble /dev/md0 /dev/sd[c-f] /dev/sdb1
mdadm: /dev/md0 has been started with 4 drives and 1 journal.

mdadm checks the superblock for journal devices. If the journal device
is missing or faulty, mdadm will show a warning:

./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1
mdadm: Not safe to assemble with missing or stale journal device, consider --force.

The user can insist on starting the array (read only) with --force:

./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md0 has been started with 15 drives.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:15 +11:00