Commit Graph

288 Commits

Author SHA1 Message Date
Jes Sorensen 9db2ab4e9b util: md_array_valid(): Introduce md_array_valid() helper
Using md_get_array_info() to determine if an array is valid is broken
during creation, since the ioctl() returns -ENODEV if the device is
valid but not active.

Where did I leave my stash of brown paper bags?

Fixes: ("40b054e mdopen/open_mddev: Use md_get_array_info() to determine valid array")
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-03 16:15:16 -04:00
Jes Sorensen 44356754ec util: Get rid of unused enough_fd()
enough_fd() is no longer used, so lets get rid of it.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-04-20 11:53:30 -04:00
Jes Sorensen 3ab8f4bf33 util: Introduce md_array_active() helper
Rather than querying md_get_array_info() to determine whether an array
is valid, do the work in md_array_active() using sysfs, and fall back
on md_get_array_info() if sysfs fails.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-04-20 00:12:34 -04:00
Jes Sorensen 32141c1765 Retire mdassemble
mdassemble doesn't handle container based arrays, no support for sysfs,
etc. It has not been actively maintained for years, so time to send it
off to retirement.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-04-11 12:54:26 -04:00
Jes Sorensen 303949f6f0 util: Finally kill off md_get_version()
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-04-05 15:49:18 -04:00
Jes Sorensen 700483a223 util/set_array_info: Simplify code since md_get_version returns a constant
md_get_version() always returns (0 * 1000) + (90 * 100) + 3, so no
point in calling it.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-04-05 15:06:24 -04:00
Jes Sorensen f5c924f441 util/must_be_container: Use sysfs_read(GET_VERSION) to determine valid array
Use sysfs_read() instead of ioctl(RAID_VERSION) to determine this is
in fact a valid raid array fd.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-04-05 14:01:30 -04:00
Jes Sorensen 018a488238 util: Introduce md_set_array_info()
Switch from using ioctl(SET_ARRAY_INFO) to using md_set_array_info()

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 15:43:53 -04:00
Jes Sorensen d97572f5a5 util: Introduce md_get_disk_info()
This removes all the inline ioctl calls for GET_DISK_INFO, allowing us
to switch to sysfs in one place, and improves type checking.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 15:23:50 -04:00
Jes Sorensen 9cd39f0155 util: Introduce md_get_array_info()
Remove most direct ioctl calls for GET_ARRAY_INFO, except for one,
which will be addressed in the next patch.

This is the start of the effort to clean up the use of ioctl calls and
introduce a more structured API, which will use sysfs and fall back to
ioctl for backup.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 14:35:41 -04:00
Jes Sorensen efa295309f util: Cosmetic changes
Fixup a number of indentation and whitespace issues

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 12:05:12 -04:00
NeilBrown 1ab9ed2afb Add 'force' flag to *hot_remove_disk().
In rare circumstances, the short period that *hot_remove_disk()
waits isn't long enough to IO to complete.  This particularly happens
when a device is failing and many retries are still happening.

We don't want to increase the normal wait time for "mdadm --remove"
as that might be use just to test if a device is active or not, and a
delay would be problematic.
So allow "--force" to mean that mdadm should try extra hard for a
--remove to complete, waiting up to 5 seconds.

Note that this patch fixes a comment which claim the previous
wait time was half a second, where it was really 50msec.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-28 14:32:35 -04:00
NeilBrown fdd015696c Introduce sys_hot_remove_disk()
The new hot_remove_disk() will retry HOT_REMOVE_DISK
several times in the face of EBUSY.
However we sometimes remove a device by writing "remove" to the
"state" attributed.  This should be retried as well.
So introduce sys_hot_remove_disk() to repeat this action a few times.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-28 14:30:49 -04:00
NeilBrown 2dd271fe70 Retry HOT_REMOVE_DISK a few times.
HOT_REMOVE_DISK can fail with EBUSY if there are outstanding
IO request that have not completed yet.  It can sometimes
be helpful to wait a little while for these to complete.

We already do this in impose_level() when reshaping a device,
but not in Manage.c in response to an explicit --remove request.

So create hot_remove_disk() to central this code, and call it
where-ever it makes sense to wait for a HOT_REMOVE_DISK to succeed.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-28 14:25:23 -04:00
Xiao Ni ff9239ee31 mdadm: Specify enough length when write to buffer
In Detail.c the buffer path in function Detail is defined as path[200],
in fact the max lenth of content which needs to write to the buffer is
287. Because the length of dname of struct dirent is 255.
During building it reports error:
error: ‘%s’ directive writing up to 255 bytes into a region of size 189
[-Werror=format-overflow=]

In function examine_super0 there is a buffer nb with length 5.
But it need to show a int type argument. The lenght of max
number of int is 10. So the buffer length should be 11.

In human_size function the length of buf is 30. During building
there is a error:
output between 20 and 47 bytes into a destination of size 30.
Change the length to 47.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-17 15:58:16 -04:00
Mariusz Dabrowski 31208db97e Always return last partition end address in 512B blocks
For 4K disks 'endofpart' is an index of the last 4K sector used by partition.
mdadm is using number of 512-byte sectors, so value returned by
get_last_partition_end must be multiplied by 8 for devices with 4K sectors.
Also, unused 'ret' variable has been removed.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-12-13 09:09:25 -05:00
Mariusz Dabrowski 41b06495ba Use disk sector size value to set offset for reading GPT
mdadm is using invalid byte-offset while reading GPT header to get
partition info (size, first sector, last sector etc.). Now this offset
is hardcoded to 512 bytes and it is not valid for disks with sector
size different than 512 bytes because MBR and GPT headers are aligned
to LBA, so valid offset for 4k drives is 4096 bytes.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-12-12 14:26:22 -05:00
Pawel Baldysiak 329715091c Add function for getting member drive sector size
This patch introduces the function for getting sector size of
given device (fd).

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-17 09:24:18 -05:00
James Clarke 8e2bca513e Fix bus error when accessing MBR partition records
Since the MBR layout only has partition records as 2-byte aligned, the
32-bit fields in them are not aligned. Thus, they cannot be accessed on
some architectures (such as SPARC) by using a "struct MBR_part_record *"
pointer, as the compiler can assume that the pointer is properly aligned.
Instead, the records must be accessed by going through the MBR struct
itself every time.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-10-19 12:38:02 -04:00
Mariusz Dabrowski fa219dd26a Fix RAID metadata check
mdadm recognizes devices with partition table as part of an RAID array
and invalid warning message is displayed. After this fix proper warning
messages are being displayed for MBR/GPT disks and devices with RAID
metadata.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-09-22 11:35:02 -04:00
Jes Sorensen c5f71c2417 Introduce random_uuid() helper function
This gets rid of 5 nearly identical copies of the same code, and
reduces the binary size of mdadm by over 700 bytes on x86_64.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-08-15 15:41:34 -04:00
Jes Sorensen 9f0ad56be0 util: Never have if and return on the same line
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-08-11 15:48:47 -04:00
Mike Lovell 13db17bd1f Use dev_t for devnm2devid and devid2devnm
Commit 4dd2df0966 added a trip through makedev(), major(), and minor() for
device major and minor numbers. This would cause mdadm to fail in operating
on a device with a minor number bigger than (2^19)-1 due to it changing
from dev_t to a signed int and back.

Where this was found as a problem was when a array was created with a device
specified as a name like /dev/md/raidname and there were already 128 arrays
on the system. In this case, mdadm would chose 1048575 ((2^20)-1) for the
array and minor number. This would cause the major and minor number to become
negative when generated from devnm2devid() and passed to major() and minor()
in open_dev_excl(). open_dev_excl() would then call dev_open() which would
detect the negative minor number and call open() on the *char containing the
major:minor pair which isn't a valid file.

Signed-off-by: Mike Lovell <mlovell@bluehost.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-06-03 15:35:26 -04:00
Jes Sorensen 15d230f730 util: Remove unnecesary NULL pointer checks when calling sysfs_free()
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-08 12:19:03 -05:00
Guoqing Jiang 21f541cc31 Remove dead code about LKF_CONVERT flag
Since flags is only set as LKF_NOQUEUE, the code
with LKF_CONVERT flag should be delete.

Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-07 15:21:04 -05:00
Maxin B. John 986b868817 util.c: include poll.h instead of sys/poll.h
This fixes a compile warning when building with musl:

 In file included from util.c:27:0:
 |
 qemux86-64/usr/include/sys/poll.h:1:2:
 error: #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
 [-Werror=cpp]
 |  #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
 |   ^

Signed-off-by: Maxin B. John <maxin.john@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-02-08 10:59:00 -05:00
Xiao Ni 1d13b59960 Fix some type comparison problems
As 26714713cd said, 32 bit signed
timestamps will overflow in the year 2038. It already changed the
utime and ctime in struct mdu_array_info_s from int to unsigned
int. So we need to change the values that compared with them to
unsigned int too.

Signed-off-by : Xiao Ni <xni@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-02-08 10:49:22 -05:00
NeilBrown 7071320a18 Assorted fixed for a "make everything" build
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-28 13:28:58 +11:00
Guoqing Jiang 32539f74d2 util: fix wrong return value of cluster_get_dlmlock
Actually lksb.sb_status means that a node got the lock
or not instead of the return value of dlm_lock.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
2016-01-27 11:43:02 +11:00
Guoqing Jiang 81a8a69415 mdadm: improve the safeguard for change cluster raid's sb
This commit does the following jobs:

1. rename is_clustered to dlm_funs_ready since it match the
   function better.
2. st->cluster_name can't be use to identify the raid is a
   clustered or not, we should check the bitmap's version to
   perform the identification.
3. for cluster_get_dlmlock/cluster_release_dlmlock funcs, both
   of them just need the lockid as parameter since the cluster
   name can get by get_cluster_name().

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-17 09:53:37 +11:00
Guoqing Jiang e80357f825 Make cmap_* also has same policy as dlm_*
Let libcmap lib and related funs also only need one-time
setup during mdadm running period.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-21 11:19:35 +11:00
Guoqing Jiang d15a1f72bd Safeguard against writing to an active device of another node
Modifying an exiting device's superblock or creating a new superblock
on an existing device needs to be checked because the device could be
in use by another node in another array. So, we check this by taking
all superblock locks in userspace so that we don't  step onto an active
device used by another node and safeguard against accidental edits.
After the edit is complete, we release all locks and the lockspace so
that it can be used by the kernel space.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-21 11:19:05 +11:00
NeilBrown 7d55dca2cc mdassemble: don't try to perform cluster check.
mdassemble is meant to be small an simple, so avoid
trying to check for a cluster.
Currently it doesn't, but it still includes the code,
which doesn't build because the library isn't provided.

So just exclude the get_cluster_name code from mdassemble.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 11:53:01 +10:00
Guoqing Jiang 4de9091302 Add a new clustered disk
A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:

--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)

The node initiating the --add, has the disk state tagged with
MD_DISK_CLUSTER_ADD and the one confirming tag the disk with
MD_DISK_CANDIDATE.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:21:29 +10:00
Guoqing Jiang 7716570e6d Set home-cluster while creating an array
The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster with the
cluster name same as the one recorded in the bitmap.

If home-cluster is not specified, this is auto-detected using
dlopen corosync cmap library.

neilb: allow code to compile when corosync-devel is not installed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:06:30 +10:00
NeilBrown 330d6900bb Assemble: allow a RAID4 to assemble easily when parity devices is missing.
If the parity device of a RAID4 is missing, then there is no immediate
risk to data.  So it doesn't matter if the array is dirty or not.

This can be important when reshaping a RAID0, and is a much better
solution that that in the resent-reverted.
   b720636a58

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:39:02 +10:00
NeilBrown 7a862a020f Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly one line,
them maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"


Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:46:53 +11:00
NeilBrown 1ade5cc15a Consistently print program Name and __func__ in debug messages.
make dprintf() print program name and __func__, so that
this messaging is consistent.

Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error message.
If we really want function name there, we new pr_XXX might
be wanted.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:21:17 +11:00
NeilBrown 93d3bd3b28 util: remove rounding error where reporting "human sizes".
The division
 1<<20 / 200
is not exact, so dividing by this to convert bytes into half-megs
is wrong and results in incorrect output.

As we are doing "long long" arithmetic, there is no risk of an overflow
until we reach 64 petabytes.
So change to
   * 200 / (1<<20).

Reported-by: Jan Echternach <jan@goneko.de>
Resolved-debian-bug: 763917
URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763917
Signed-off-by: NeilBrown <neilb@suse.de>
2014-12-18 16:58:44 +11:00
NeilBrown cc742d3807 util: split get_maj_min() out from dev_open()
This allows other code to parse "8:3" style device names.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-08-11 10:34:36 +10:00
NeilBrown 85945e1986 install: use BINDIR consistently to locate mdadm and mdmon
Every place where the paths for mdadm or mdmon is explicit,
it should use the BINDIR setting, not "/sbin/".

Reported-by: member graysky <graysky@archlinux.us> (https://bugs.archlinux.org/task/37330)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-22 17:13:02 +10:00
Jes Sorensen 76d0f1886f Work around architectures having statfs.f_type defined as long
Having RAMFS_MAGIC defined as 0x858458f6 causing problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.

This hack is extremly ugly, but it should at least do the right thing
for every situation.

Thanks to Arnd Bergmann for suggesting the fix.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-20 09:24:27 +11:00
NeilBrown 8832342d3a Assemble/Incremental: don't hold O_EXCL on mddev after assembly.
As soon as the array is assembled, udev or systemd might run
fsck and mount it.  So we need to drop O_EXCL promptly.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-12-05 10:35:16 +11:00
NeilBrown 5dd29dafa2 Two small fixes related to enough()
1/ enough_fd doesn't use avail_disks any more, so discard it.

2/ Manage_Add increments 'found' at the wrong place, so it can
   waste time before calling enough().

Signed-off-by: NeilBrown <neilb@suse.de>
2013-12-05 08:58:21 +11:00
NeilBrown 357ac10678 IMSM metadata really should be ignored when found on partitions.
commit b31df43682
changed load_super_imsm to not insist on finding a partition if
ignore_hw_compat was set.
Unfortunately this is set for '--assemble' so arrays could get
assembled badly.

The comment says this was to allow e.g. --examine of image files.
A better fixes for this is to change test_partitions to not report
a regular file as being a partition.
The errors from the BLKPG ioctl are:

 ENOTTY : not a block device.
 EINVAL : not a whole device (probably a partition)
 ENXIO  : partition doesn't exist (so not a partition)

Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-20 10:49:14 +11:00
NeilBrown 6f02172d2e Release mdadm-3.3
(and  various cosmetic fixes)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-03 14:47:47 +10:00
NeilBrown 2f1bcf43d9 Make sure "mdmon" doesn't get called "@dmon".
The Anaconda installer (via its "loader" program) will try to kill
many processes at shutdown, but not "mdmon".

However when mdadm runs mdmon in the Anaconda environment, mdmon
sets argv[0][0] to '@' resulting in "@dmon" which confuses
"loader".

So change mdadm to set argv[0] to a path so that mdmon becomes e.g.
  "@usr/sbin/mdmon"
which "loader" will recognise as being "mdmon".

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-02 11:02:09 +10:00
mwilck@arcor.de 7ac5d47e8a in_initrd: fix gcc compiler error
On some systems, this code caused a "comparison between signed
and unsigned" error.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-28 14:58:56 +10:00
NeilBrown 2538ba2abf Create: fix warning about pre-existing filesystems.
An ext[234] filesystem larger than 2TB was beign reported with
a negative size - which looks odd.

So fix it to use suitably large and unsigned values.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-08 09:16:43 +10:00
NeilBrown 9540cc244d test: ensure testing uses correct mdmon
When testing we want to run mdmon directly, not use
systemctl to get systemd to run it.

So allow an environment variable to make that choice.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-05 14:55:13 +10:00