233 Commits

Author SHA1 Message Date
Mateusz Grzonka b71de056ce Correct checking if file descriptors are valid
In some cases file descriptors equal to 0 are treated as invalid.
Fix it.

Signed-off-by: Mateusz Grzonka <mateusz.grzonka@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2021-11-24 07:07:12 -05:00
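
A minimal sketch of the kind of check this commit describes, assuming a descriptor counts as valid whenever it is non-negative. The helper names fd_is_valid() and fd_close() are hypothetical and are not taken from the commit itself.

```c
#include <unistd.h>

/* Hypothetical helper: a descriptor is valid when it is non-negative.
 * 0 (normally stdin) is a perfectly valid descriptor, so "fd > 0" or
 * "if (fd)" would be the wrong test. */
static inline int fd_is_valid(int fd)
{
	return fd >= 0;
}

/* Hypothetical helper: close a valid descriptor and mark it closed so
 * that a later call becomes a harmless no-op. */
static inline void fd_close(int *fd)
{
	if (fd_is_valid(*fd)) {
		close(*fd);
		*fd = -1;
	}
}
```

Resetting the variable to -1 after closing is a design choice that guards against a double close of the same descriptor.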
Mateusz Grzonka b2e4f08414 Incremental: Close unclosed mdfd in IncrementalScan()
In addition to closing mdfd, propagate helpers to manage file
descriptors across IncrementalScan().

Signed-off-by: Mateusz Grzonka <mateusz.grzonka@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2021-11-24 07:06:12 -05:00
Mariusz Tkaczyk c7b8547c70 imsm: add verbose flag to compare_super
IMSM does more than comparing metadata and errors reported directly
from compare_super_imsm can be useful.

Add verbose flag to compare_super method and make all not critical
error printing configurable.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2021-03-08 10:43:29 -05:00
Mariusz Tkaczyk 69068584f9 Incremental: Remove redundant spare movement logic
If policy is set then mdmonitor is responsible for moving spares.
This logic is redundant and potentially dangerous: a spare could be moved at
initrd stage depending on the order in which drives appear.

Remove it.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2020-12-20 13:45:30 -05:00
Mariusz Tkaczyk ff6bb131a4 mdadm: Unify forks behaviour
If mdadm is run by udev or systemd, it gets a pipe as each stream.
Forks in the background may run after an event or service has been
processed, when udev is detached from the pipe. As a result, the process
fails quietly if any message is written.
To prevent this, each fork has to close all parent streams. Leave
stderr and stdout open only for debug purposes.
Unify this across all forks. Detect other descriptors by
scanning the /proc/self/fd directory. Add a generic method for
managing systemd services.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
2020-11-25 18:15:55 -05:00
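
The descriptor scan described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the function names are hypothetical, and the sketch simply keeps descriptors 0 to 2 and closes everything else it finds in /proc/self/fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse a /proc/self/fd entry name. Returns the descriptor number, or
 * -1 for entries such as "." and ".." that are not plain decimals. */
static int fd_from_name(const char *name)
{
	char *end;
	long val;

	if (*name < '0' || *name > '9')
		return -1;
	errno = 0;
	val = strtol(name, &end, 10);
	if (errno || *end != '\0' || val > INT_MAX)
		return -1;
	return (int)val;
}

/* Close every descriptor above stderr, as a forked child might do.
 * Returns the number of descriptors closed, or -1 if /proc/self/fd
 * cannot be opened. */
static int close_inherited_fds(void)
{
	DIR *dir = opendir("/proc/self/fd");
	struct dirent *de;
	int closed = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		int fd = fd_from_name(de->d_name);

		/* Keep stdin/stdout/stderr and the directory stream's own fd. */
		if (fd <= STDERR_FILENO || fd == dirfd(dir))
			continue;
		if (close(fd) == 0)
			closed++;
	}
	closedir(dir);
	return closed;
}
```

Scanning /proc/self/fd avoids looping over every possible descriptor number up to the process limit, at the cost of depending on procfs being mounted.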
Mariusz Dabrowski b068159891 mdadm: load default sysfs attributes after assembly
Add a new type of line to mdadm.conf that specifies values of
sysfs attributes for MD devices, to be loaded after the array is
assembled. Each line is interpreted as a list of structures containing
the sysname of an MD device (md126 etc.) and a list of sysfs attributes
and their values.

Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Krzysztof Smolinski <krzysztof.smolinski@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2019-07-10 16:12:09 -04:00
NeilBrown cd72f9d114 policy: support devices with multiple paths.
As new releases of Linux sometimes change the name of
a path, some distros keep "legacy" names as well.  This
is useful, but confuses mdadm, which assumes each device has
precisely one path.

So change this assumption:  allow a disk to have several
paths, and allow any to match when looking for a policy
which matches a disk.

Reported-and-tested-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-12-06 07:43:19 -05:00
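
The "any path may match" rule can be illustrated with a short sketch. It assumes policy paths are compared as shell-style globs via fnmatch(); the function name any_path_matches() and the sample path strings in the usage note are hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Returns 1 if at least one of the device's path names matches the
 * glob pattern from a policy line, 0 otherwise. */
static int any_path_matches(const char *const *paths, size_t npaths,
			    const char *pattern)
{
	size_t i;

	for (i = 0; i < npaths; i++)
		if (fnmatch(pattern, paths[i], 0) == 0)
			return 1;
	return 0;
}
```

With a device that carries both a new-style name and a legacy name, for example "pci-0000:00:1f.2-ata-1" and "pci-0000:00:1f.2-scsi-0:0:0:0", a policy written against either form would still apply.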
Mariusz Tkaczyk cb8f537135 Incremental: remove external arrays and devices correctly
The kernel returns EBUSY when failing a device would cause the array to fail.
With external metadata, if the kernel returns it, mdadm doesn't stop member
arrays but tries to stop the container directly. That fails because the
container still has working arrays, so udev remove is triggered.

Try to set the faulty state on the device in member arrays first. If the kernel
returns EBUSY, stop this array. After that, remove the device from the
container.

With external metadata, mdmon has to remove faulty devices from degraded
arrays, so just remove the device from the container.

A raid5 array doesn't return EBUSY; it allows every device to be removed.
mdadm shouldn't block it.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-08-03 10:25:12 -04:00
Guoqing Jiang 898bd1ecef Free map to avoid resource leak issues
1. There are some places which didn't free map as
discovered by coverity.

CID 289661 (#1 of 1): Resource leak (RESOURCE_LEAK) 12. leaked_storage: Variable mapl going out of scope leaks the storage it points to.
CID 289619 (#3 of 3): Resource leak (RESOURCE_LEAK) 63. leaked_storage: Variable map going out of scope leaks the storage it points to.
CID 289618 (#1 of 1): Resource leak (RESOURCE_LEAK) 26. leaked_storage: Variable map going out of scope leaks the storage it points to.
CID 289607 (#1 of 1): Resource leak (RESOURCE_LEAK) 41. leaked_storage: Variable map going out of scope leaks the storage it points to.

2. If we call map_by_* inside a loop, then map_free
should be called in the same loop, and it is better
to set map to NULL after free.

3. And map_unlock is always called with map_lock,
if we don't call map_remove before map_unlock,
then the memory (allocated by  map_lock -> map_read
-> map_add -> xmalloc) could be leaked. So we
need to free it in map_unlock as well.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2018-06-11 06:35:41 -04:00
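
Point 2 of this commit message (free inside the same loop, then reset the pointer to NULL) can be sketched as below. The struct entry list and both helpers are hypothetical stand-ins used only for illustration; they are not mdadm's map structures or its map_* functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a list of map entries. */
struct entry {
	struct entry *next;
	int id;
};

/* Prepend a new entry. Returns 0 on success, -1 on allocation failure. */
static int entry_list_add(struct entry **list, int id)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->id = id;
	e->next = *list;
	*list = e;
	return 0;
}

/* Free the whole list. On return *list is NULL, so the caller cannot
 * reuse a dangling pointer or free the list twice. */
static void entry_list_free(struct entry **list)
{
	while (*list) {
		struct entry *next = (*list)->next;

		free(*list);
		*list = next;
	}
}
```

A caller that rebuilds the list on every loop iteration would call entry_list_free() at the end of each iteration rather than once after the loop, which is the pattern the message recommends.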
NeilBrown 3bc6f786e1 Incremental: Use ->validate_geometry instead of ->avail_size
Since mdadm 3.3 it has not been correct to call ->avail_size if
metadata hasn't been read from the device.  ->validate_geometry
should be used instead.

Unfortunately array_try_spare() didn't get the memo, and it can crash
when adding a spare with no metadata.

So change it to use ->validate_geometry().

Only one place remains that uses ->avail_size(), and that is safe.

Also fix a comment with a typo.

Reported-and-tested-by: Bjørnar Ness <bjornar.ness@gmail.com>
Fixes: 641da74591 ("super1: separate to version of _avail_space1().")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-11-01 17:26:37 -04:00
Song Liu 3b8c712755 mdadm: set journal_clean after scanning all disks
Summary:
In Incremental.c:count_active(), max_events is tracked to show
which devices are up to date. If a device has events==max_events+1,
getinfo_super() is called to reload the superblock from this
device. getinfo_super1() blindly sets journal_clean to 0, which is
wrong.

This patch fixes this by tracking max_journal_events for all the
disks. After scanning all disks, journal_clean is set if
max_journal_events >= max_events-1.

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-09-01 11:12:16 -04:00
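
The rule stated above (decide journal_clean only after all disks have been scanned) can be sketched as follows. The struct and function names are hypothetical, the sketch assumes the array has a journal device, and the comparison is written as max_journal_events + 1 >= max_events so that an event count of 0 does not underflow in unsigned arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-disk summary used only for this illustration. */
struct disk_summary {
	uint64_t events;   /* event counter read from the superblock */
	int is_journal;    /* non-zero if this disk is the journal device */
};

/* Scan every disk first, then decide whether the journal is clean:
 * max_journal_events >= max_events - 1. */
static int journal_is_clean(const struct disk_summary *disks, size_t n)
{
	uint64_t max_events = 0, max_journal_events = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (disks[i].events > max_events)
			max_events = disks[i].events;
		if (disks[i].is_journal &&
		    disks[i].events > max_journal_events)
			max_journal_events = disks[i].events;
	}
	/* Same condition, rearranged to avoid underflow at max_events == 0. */
	return max_journal_events + 1 >= max_events;
}
```

Deciding after the loop means the result no longer depends on the order in which disks are examined.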
Tomasz Majchrzak b13b52c80f Get failed disk count from array state
A recent commit changed the way failed disks are counted. It breaks
recovery for external metadata arrays, as failed disks are not part of
the array and have no corresponding entries in sysfs (they are only
reported for containers), so degraded arrays show no failed disks.

Recent commit overwrites GET_DEGRADED result prior to GET_STATE and it
is not set again if GET_STATE has not been requested. As GET_STATE
provides the same information as GET_DEGRADED, the latter is not needed
anymore. Remove GET_DEGRADED option and replace it with GET_STATE
option.

Don't count number of failed disks looking at sysfs entries but
calculate it at the end. Do it only for arrays as containers report
no disks, just spares.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-06-05 11:11:36 -04:00
Alexey Obitotskiy 4b57ecf6ce Add sector size as spare selection criterion
Add sector size as new spare selection criterion. Assume that 0 means
there is no requirement for the sector size in the array. Skip disks
with unsuitable sector size when looking for a spare to move across
containers.

Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-09 14:18:38 -04:00
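
A minimal sketch of spare selection with a criteria structure, assuming the two criteria named in these commits (minimum size and sector size) and the stated convention that a sector size of 0 means "no requirement". The type and function names are hypothetical.

```c
/* Hypothetical criteria structure for this illustration. */
struct spare_criteria_sketch {
	unsigned long long min_size;   /* smallest acceptable disk size */
	unsigned int sector_size;      /* 0 means any sector size is fine */
};

/* Returns 1 if the candidate disk satisfies every criterion. */
static int disk_fits_criteria(const struct spare_criteria_sketch *sc,
			      unsigned long long disk_size,
			      unsigned int disk_sector_size)
{
	if (disk_size < sc->min_size)
		return 0;
	if (sc->sector_size != 0 && sc->sector_size != disk_sector_size)
		return 0;
	return 1;
}
```

Passing a structure instead of a single size argument lets further criteria be added later without changing the function signature again.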
Alexey Obitotskiy fbfdcb06dc Allow more spare selection criteria
Disks can be moved across containers in order to be used as a spare
drive for rebuild. At the moment the only requirement checked for such
a disk is its size (if it matches donor expectations). In order to
introduce more criteria rename corresponding superswitch method to more
generic name and move function parameter to a structure. This change is
a big edit but it doesn't introduce any changes in code logic, it just
updates function naming and parameters.

Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-09 14:18:36 -04:00
Jes Sorensen 00e56fd953 IncrementalScan: Use md_array_active() instead of md_get_array_info()
This eliminates yet another case where GET_ARRAY_INFO was used to
indicate whether the array was active.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-05 12:18:29 -04:00
Jes Sorensen 74d293a253 container_members_max_degradation: Switch to using sysfs for disk info
With sysfs now providing the necessary active_disks info, switch to
sysfs and eliminate one more use of md_get_array_info(). We can do
this unconditionally since we wouldn't get here without sysfs being
available.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-05 12:06:57 -04:00
Jes Sorensen c2d1a6ec6b Incremental: return is not a function
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-05 11:39:58 -04:00
Zhilong Liu 9e04ac1c43 mdadm/util: unify stat checking blkdev into function
Declare function stat_is_blkdev() to consolidate the repeated stat
checks for block devices. It returns true (1) when the path is a
block device and false (0) when it isn't.
The devname parameter is required; *rdev is optional. If rdev is
a valid pointer, the device number is stored in *rdev; if it is
NULL, it is ignored.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-05 11:05:32 -04:00
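
The behaviour described in this message can be sketched with plain POSIX calls. This is an illustration, not the function from the commit: the name path_is_blkdev() is hypothetical, and errors are reported with fprintf() here as an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if devname is a block device, 0 otherwise. If rdev is not
 * NULL, the device number is stored in *rdev on success. */
static int path_is_blkdev(const char *devname, dev_t *rdev)
{
	struct stat st;

	if (stat(devname, &st) != 0) {
		fprintf(stderr, "stat failed for %s: %s\n",
			devname, strerror(errno));
		return 0;
	}
	if (!S_ISBLK(st.st_mode)) {
		fprintf(stderr, "%s is not a block device.\n", devname);
		return 0;
	}
	if (rdev)
		*rdev = st.st_rdev;
	return 1;
}
```

An fstat()-based variant would take an open descriptor in addition to the name and differ only in the system call used.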
Zhilong Liu 0a6bff09d4 mdadm/util: unify fstat checking blkdev into function
Declare function fstat_is_blkdev() to consolidate the repeated fstat
checks for block devices. It returns true (1) when the descriptor
refers to a block device and false (0) when it doesn't.
The fd and devname parameters are required; *rdev is optional.
If rdev is a valid pointer, the device number is stored in *rdev;
if it is NULL, it is ignored.

Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-05 11:04:02 -04:00
Jes Sorensen 6921010d95 Incremental: Use md_array_active() to determine state of array
One less call to md_get_array_info()

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-02 10:36:51 -04:00
NeilBrown cd6cbb08c4 Create: tell udev md device is not ready when first created.
When an array is created the content is not initialized,
so it could have remnants of an old filesystem or md array
etc on it.
udev will see this and might try to activate it, which is almost
certainly not what is wanted.

So create a mechanism for mdadm to communicate with udev to tell
it that the device isn't ready.  This mechanism is the existence
of a file /run/mdadm/created-mdXXX where mdXXX is the md device name.

When creating an array, mdadm will create the file.
A new udev rule file, 01-md-raid-creating.rules, will detect the
presence of that file and set ENV{SYSTEMD_READY}="0".
This is fairly uniformly used to suppress actions based on the
contents of the device.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-05-02 09:41:39 -04:00
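
A small sketch of how the marker file name from this message could be built. Only the /run/mdadm/created-mdXXX naming comes from the message; the function name created_marker_path() and its error convention are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "/run/mdadm/created-<devnm>" into buf.
 * Returns 0 on success, -1 if devnm is empty or buf is too small. */
static int created_marker_path(char *buf, size_t len, const char *devnm)
{
	int n;

	if (devnm[0] == '\0')
		return -1;
	n = snprintf(buf, len, "/run/mdadm/created-%s", devnm);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

For devnm "md127" this yields /run/mdadm/created-md127, the file whose presence the udev rule is described as checking.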
Jes Sorensen f8c432bfc9 Incremental: Cleanup some if() statement spaghetti
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-04-25 15:07:26 -04:00
Jes Sorensen ff4ad24b1c Incremental: Use md_array_active() where applicable
md_get_array_info() == 0 implies an array is active; however, using
md_array_active() is more correct.

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
2017-04-25 14:57:46 -04:00
Jes Sorensen dae131379f sysfs: Make sysfs_init() return an error code
Rather than have the caller inspect the returned content, return an
error code from sysfs_init(). In addition make all callers actually
check it.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-30 16:52:37 -04:00
Jes Sorensen 5b13d2e1fb Incremental: Remove redundant call for GET_ARRAY_INFO
The code above just called md_get_array_info() and only reached this
point if it returned an error that isn't ENODEV, so it's pointless to
check this again here.

In addition it was incorrectly retrieving ioctl data into a
mdu_bitmap_file_t instead of mdu_array_info_t.

Fixes: 8382f19 ("Add new mode: --incremental")
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 14:40:36 -04:00
Jes Sorensen 9cd39f0155 util: Introduce md_get_array_info()
Remove most direct ioctl calls for GET_ARRAY_INFO, except for one,
which will be addressed in the next patch.

This is the start of the effort to clean up the use of ioctl calls and
introduce a more structured API, which will use sysfs and fall back to
ioctl for backup.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 14:35:41 -04:00
Artur Paszkiewicz e97a7cd011 super1: PPL support
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.

When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.

While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:33:52 -04:00
NeilBrown e22fe3ae15 Introduce enum flag_mode for setting and clearing flags.
We currently use '1' to indicate that a flag (writemostly or failfast)
needs to be set, and '2' to indicate that it needs to be cleared.

Using magic numbers like this is not best practice.

So replace them with values from an enum.

No functional change.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-29 17:12:13 -05:00
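
The idea of replacing the magic values 1 and 2 with an enum can be sketched as below. The enumerator names, the bit definitions and apply_flag() are hypothetical and chosen only for illustration.

```c
/* Hypothetical enum replacing the magic numbers 1 (set) and 2 (clear). */
enum flag_mode_sketch {
	FLAG_DEFAULT,   /* leave the flag as it is */
	FLAG_SET,
	FLAG_CLEAR
};

/* Hypothetical per-device state bits. */
#define DISK_WRITEMOSTLY (1u << 0)
#define DISK_FAILFAST    (1u << 1)

/* Return the new state after applying one flag request. */
static unsigned int apply_flag(unsigned int state, unsigned int bit,
			       enum flag_mode_sketch mode)
{
	switch (mode) {
	case FLAG_SET:
		return state | bit;
	case FLAG_CLEAR:
		return state & ~bit;
	case FLAG_DEFAULT:
	default:
		return state;
	}
}
```

Named enumerators make call sites self-describing and let the compiler warn about unhandled cases in a switch.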
NeilBrown 71574efb07 Add failfast support.
Allow per-device "failfast" flag to be set when creating an
array or adding devices to an array.

When re-adding a device which had the failfast flag, it can be removed
using --nofailfast.

failfast status is printed in --detail and --examine output.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-28 08:50:36 -05:00
Artur Paszkiewicz c012223056 Incremental: don't try to load_container() for a subarray
mdadm -IRs would exit with a non-zero status because of this.

Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-08-09 10:57:15 -04:00
Jes Sorensen fe112c9eba Incremental: Remove unnecessary NULL pointer checks when calling sysfs_free()
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-08 12:19:03 -05:00
NeilBrown a0d12d51a7 Merge branch 'fix-unlikely-potential-overflows' of https://github.com/sjvs/mdadm 2015-12-21 13:01:10 +11:00
Guoqing Jiang 41dbb4da22 mdadm: let cluster raid could also add disk within incremental mode
For cluster raid, the disc.state needs to be changed accordingly under
incremental mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 13:23:54 +11:00
Bas van Schaik fa9aca4930 avoid confusion with parameter 'devname' with same name, ensure buffer is large enough for two ints plus extras 2015-12-03 13:48:46 +00:00
Bas van Schaik a90ed30e74 ensure buffer is large enough for two ints and some extras 2015-12-03 13:48:37 +00:00
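
One way to size a buffer for two formatted ints is sketched below. The "%d:%d" format, the macro names and format_pair() are assumptions made for illustration; the commits above do not state the format or the size they chose.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Upper bound on the characters needed to print one int in decimal:
 * bits/3 over-estimates the digit count, +1 for rounding, +1 for '-'. */
#define INT_STR_MAX (sizeof(int) * CHAR_BIT / 3 + 2)

/* Two ints, one ':' separator and the terminating NUL. */
#define PAIR_BUF_LEN (2 * INT_STR_MAX + 2)

/* Format "a:b" into buf. Returns the length written, or -1 if the
 * output was truncated or snprintf() failed. */
static int format_pair(char *buf, size_t len, int a, int b)
{
	int n = snprintf(buf, len, "%d:%d", a, b);

	return (n >= 0 && (size_t)n < len) ? n : -1;
}
```

Deriving the size from sizeof(int) keeps the buffer large enough on platforms where int is wider than 32 bits, and checking the snprintf() return value catches any remaining truncation.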
Song Liu 051f326550 mdadm: refactor write journal code in Assemble and Incremental
As discussed, standalone require_journal() in struct superswitch
is not a very good idea. Instead, journal related information
fits well in struct mdinfo.

This patch simplifies journal support code in Assemble and
Incremental as:

- Add journal_device_required and journal_clean to struct mdinfo;
- Remove function require_journal from struct superswitch;
- Update Assemble and Incremental to use journal_device_required
and journal_clean from struct mdinfo (instead of separate var).

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-22 12:19:09 +11:00
Song Liu 5c6ad21150 Check write journal in incremental
If the journal device is missing, do not start the array, and show:

./mdadm -I /dev/sdf
mdadm: journal device is missing, not safe to start yet.

The array will be started when the journal device is attached with -I

./mdadm -I /dev/sdb1
mdadm: /dev/sdb1 attached to /dev/md/0_0, which has been started.

To force start without journal device:

./mdadm -I /dev/sdf --run
mdadm: Trying to run with missing journal device
mdadm: /dev/sdf attached to /dev/md/0_0, which has been started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:18 +11:00
Goldwyn Rodrigues 9d9202e301 Fix --incremental handling on cluster array.
Commit 06bd679317 ("Skip clustered devices in incremental")
disabled incremental completely on clustered arrays.
What we really want is that mdadm should not start or create
a clustered array but still be able to add or readd to an existing
device. This would enable udev scripts to automatically add
or re-add a device after transient errors.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-28 14:42:55 +10:00
NeilBrown 5997585200 Merge branch 'mdadm-3.3.x' 2015-08-03 16:21:37 +10:00
NeilBrown 8360760457 Assemble: really don't assemble IMSM array without OROM.
Previous patch missed one case.

Also print more useful information when rejecting
a device with IMSM metadata.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 16:06:51 +10:00
NeilBrown 7eee461e91 Assemble: don't assemble IMSM array without OROM.
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 15:42:16 +10:00
NeilBrown 9f2e55a421 Assemble: don't assemble IMSM array without OROM.
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.

So don't assemble if OROM doesn't confirm it is OK.

There can still be problems for crash-dump not being able to find
the OROM.   Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-29 14:38:37 +10:00
NeilBrown 653299b699 Merge branch 'cluster'
Now that 3.3.3 is out, it is time to include the cluster-support code.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-27 11:01:08 +10:00
NeilBrown 9581efb1ae mdstat: discard 'dev' field, just use 'devnm'
These both have the same value, and have done since the
'devnm' concept was introduced.
So discard the pointless duplicate.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-07-02 08:15:10 +10:00
Guoqing Jiang 06bd679317 Skip clustered devices in incremental
We want the clustered devices to be started exclusively by a cluster
resource-agent. So, avoid starting using the incremental option.

This also skips a clustered md from starting during boot in inactive mode.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:33:18 +10:00
Pawel Baldysiak 4d149ab517 IncRemove: Set "auto-read" only after successful excl open.
"mdadm -If" - triggered from udev rules when disk is removed from OS -
tries to set array in auto-read-only mode. This can interrupt rebuild
process which is started automatically, e.g. if array is mounted and
spare disk is available (I/O error is detected faster than removing
failed disk by mdadm).
This patch prevents "mdadm -If" from setting array into "auto-read-only",
by requiring exclusive open to succeed.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:59:53 +11:00
Jes Sorensen 5d94384e93 IncrementalScan(): Make sure 'st' is valid before dereferencing it
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 15:56:46 +11:00
NeilBrown 7a862a020f Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly on one line,
then maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"


Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:46:53 +11:00
NeilBrown 1ade5cc15a Consistently print program Name and __func__ in debug messages.
make dprintf() print program name and __func__, so that
this messaging is consistent.

Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error messages.
If we really want the function name there, a new pr_XXX might
be wanted.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:21:17 +11:00
Pawel Baldysiak d56dd607ba Change way of printing name of a process
Sometimes mdadm prints messages with the wrong name "mdmon",
and vice versa.
This patch solves this problem by changing the method of determining
the process name.
Now "Name" is set in a const at program start;
previously it was hardcoded as a #define.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 12:11:01 +11:00