map_num() returns NULL if key is not defined. This patch adds
alternative, non NULL version for cases where NULL is not expected.
There are many printf() calls where map_num() is called on variable
without NULL verification. It works, even if NULL is passed because
gcc is able to ignore NULL argument quietly but the behavior is
undefined. For safety reasons such usages will use map_num_s() now.
It is a potential point of regression.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
In some cases file descriptors equal to 0 are treated as invalid.
Fix it.
Signed-off-by: Mateusz Grzonka <mateusz.grzonka@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
During assemblation container with quiet flag, sysfs rules are not applied.
This commit makes sysfs_rules_apply() independent from verbose condition.
Signed-off-by: Kinga Tanska <kinga.tanska@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
This fixes '03r0assem' test as assembly fails when looking for specific
uuid among the device list.
Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
The case when array is already degraded has been omitted
by commit 7b99edab28 ("Assemble.c: respect force flag.").
Appropriative support has been added now.
Handlers for "run" and "force" have been divided into independent
routines. Especially force has to be as meaningless as possible.
It respects following rules:
- user agrees to start array as degraded (by --run) or is already
degraded
- raid456 module is in use
- some drives are missing (to limit potential abuses)
It doesn't allow to skip resync on dirty, but not degraded array.
This patch cleans up message generation for external array and makes it
consistent. Following code could be reused also for native.
In current implementation assemble_container_content is called once, in
both Incremental or Assembly mode. Thus makes that partial assembly is
not likely to happen. It is possible, but requires user input.
Partial assembly during reshape fails (sysfs_set_array
error - not yet investigated). For now I put FIXME to mark current
logic as known to be buggy because preexist_cnt contains both exp_cnt
and new_cnt which may produce an incorrect message.
Check for new disks and runstop is unnecessary, so has been removed.
This allows to print assemble status in every case, even if nothing new
happens.
Reported-by: Devon Beets <devon@sigmalabsinc.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
The patch enables the implementation of a write-intent bitmap for external
metadata.
Configuration of the internal bitmaps for non-native metadata requires the
extension in superswitch to perform an additional sysfs setup before the
array is activated.
Signed-off-by: Jakub Radtke <jakub.radtke@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
IMSM does more than comparing metadata and errors reported directly
from compare_super_imsm can be useful.
Add verbose flag to compare_super method and make all not critical
error printing configurable.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Manual assembly with existing link caused overwriting
this link. Add checking link and block this situation.
Signed-off-by: Kinga Tanska <kinga.tanska@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
If the array is dirty handler will set resync_start to 0 to inform kernel
that resync is needed. RWH affects only raid456 module, for other
levels array will be started even array is degraded and resync cannot be
performed.
Force is really meaningful for raid456. If array is degraded and resync
is requested, kernel will reject an attempt to start the array. To
respect force, it has to be marked as clean (this will be done for each
array without PPL) and remove the resync request (only for raid 456).
Data corruption may occur so proper warning is added.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
When mdadm tries to assemble one working device and one zeroed-out device,
it failed but print successful message because there is --uuid option.
Following script always reproduce it.
dd if=/dev/zero of=/dev/ram0 oflag=direct
dd if=/dev/zero of=/dev/ram1 oflag=direct
./mdadm -C /dev/md111 -e 1.2 --uuid="12345678:12345678:12345678:12345678" \
-l1 -n2 /dev/ram0 /dev/ram1
./mdadm -S /dev/md111
dd if=/dev/zero of=/dev/ram1 oflag=direct
./mdadm -A /dev/md111 --uuid="12345678:12345678:12345678:12345678" \
/dev/ram0 /dev/ram1
Following is message from mdadm.
mdadm: No super block found on /dev/ram1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram1
mdadm: /dev/md111 assembled from 1 drive - need all 2 to start it (use --run to insist).
The mdadm say that it assembled but mdadm does not create /dev/md111.
The message is wrong.
After applying this patch, mdadm reports error correctly as following.
mdadm: No super block found on /dev/ram1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/ram1
mdadm: /dev/ram1 has no superblock - assembly aborted
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
If you have a RAID0 array with varying sized devices
on a kernel before 5.4, you cannot assembling it on
5.4 or later without explicitly setting the layout.
This is now possible with
--update=layout-original (For 3.13 and earlier kernels)
or
--update=layout-alternate (for 3.14 and later kernels)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Added new type of line to mdadm.conf which allows to specify values of
sysfs attributes for MD devices that should be loaded after the array is
assembled. Each line is interpreted as list of structures containing
sysname of MD device (md126 etc.) and list of sysfs attributes and their
values.
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com>
Signed-off-by: Krzysztof Smolinski <krzysztof.smolinski@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
If array was stopped during reshape initialization,
there might be a "0" checkpoint recorded in metadata.
If array with such condition (reshape with position 0)
is passed to kernel - it will refuse to start such array.
Treat such array as normal during assemble, Grow_continue() will
reinitialize and start the reshape.
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
If devices[].i.disk.state has MD_DISK_FAILFAST or MD_DISK_WRITEMOSTLY
flag, it cannot be the most recent device. Both flags should be masked
before checking the state.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Before updating superblock of slave disks, desired_state value
is set for the target state of the slave disks. But it forgets
to check MD_DISK_FAILFAST and MD_DISK_WRITEMOSTLY flags. Then
start_arrays() calls ADD_NEW_DISK ioctl-call and pass the state
without MD_DISK_FAILFAST and MD_DISK_WRITEMOSTLY.
Currenlty it does not generate any problem because kernel does not
care MD_DISK_FAILFAST or MD_DISK_WRITEMOSTLY flags.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Since load_devices frees "devices" when it can't find any
device, we should set it to NULL to avoid double free issue
which can be reproduced by below steps:
mdadm -CR /dev/md/vol -l0 -e 1.2 -n2 /dev/sd[b-c] --assume-clean
mdadm -Ss
mdadm -A /dev/md127 /dev/sd[b-c] --update metadata
Reported-by: Tkaczyk Mariusz <mariusz.tkaczyk@intel.com>
Tested-by: Tkaczyk Mariusz <mariusz.tkaczyk@intel.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Like other failure cases in load_devices, we need
to free those resources as well.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
For HA product, RA (resource agent) assembles cluster raid
through call below cmd:
$MDADM --assemble $mddev --config=$RAIDCONF $MDADM_HOMEHOST
Sometimes node can't assemble array because all the nodes
need to contend dlm lock, which causes node fence in automatic
test.
And in fact, we don't need the protection since the assemble
cmd called by RA doesn't change superblock, so revert the
commit 76781701a4 ("Assemble:
provide protection when clustered raid do assemble") to remove
unneccessary protection.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
1. There are some places which didn't free map as
discovered by coverity.
CID 289661 (#1 of 1): Resource leak (RESOURCE_LEAK)12. leaked_storage: Variable mapl going out of scope leaks the storage it points to.
CID 289619 (#3 of 3): Resource leak (RESOURCE_LEAK)63. leaked_storage: Variable map going out of scope leaks the storage it points to.
CID 289618 (#1 of 1): Resource leak (RESOURCE_LEAK)26. leaked_storage: Variable map going out of scope leaks the storage it points to.
CID 289607 (#1 of 1): Resource leak (RESOURCE_LEAK)41. leaked_storage: Variable map going out of scope leaks the storage it points to.
2. If we call map_by_* inside a loop, then map_free
should be called in the same loop, and it is better
to set map to NULL after free.
3. And map_unlock is always called with map_lock,
if we don't call map_remove before map_unlock,
then the memory (allocated by map_lock -> map_read
-> map_add -> xmalloc) could be leaked. So we
need to free it in map_unlock as well.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Since commit 20dc76d15b ("imsm: Set disk slot number") mdadm
sets slot number for each disk in imsm array. Now auto-assemble determines
devices using slot number and ignores devices on the same slot that have
older generation number.
It causes infinit loop if failed device is still visible in system
(it has metadata, but it is not merged with exisiting array).
To avoid it, out-of-sync device should be added to the best[]. Later
mdadm adds it as spare to the container.
Imsm doesn't support disk replacement feature, so it can use rooms for
replacements.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
There are some failure paths which share common codes
before return, so simplify them by move common codes
to the end of function, and just goto out in case
failure happened.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
The previous patch provides protection for other modes
such as CREATE, MANAGE, GROW and INCREMENTAL. And for
ASSEMBLE mode, we also need to protect during the process
of assemble clustered raid.
However, we can only know the array is clustered or not
when the metadata is ready, so the lock_cluster is called
after select_devices(). And we could re-read the metadata
when doing auto-assembly, so refresh the locking.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
With "--force", we can assemble the array even if some superblocks
appear out-of-date. But their data layout is regarded to make sense.
In reshape cases, if two devices claims different reshape progresses,
we cannot forcely assemble them back to array. Kernel will treat only
one of them as reshape progress. However, their data is still laid on
different layouts. It may lead to disaster if reshape goes on.
Reproducible Steps:
mdadm -C /dev/md0 --assume-clean -l5 -n3 /dev/loop[012]
mdadm -a /dev/md0 /dev/loop3
mdadm -G /dev/md0 -n4
mdadm -f /dev/md0 /dev/loop0 # after a period
mdadm -S /dev/md0 # after another period
mdadm -E /dev/loop[01] # make sure that they claims different ones
mdadm -Af -R /dev/md0 /dev/loop[023] # give no enough devices for
force_array() to pick non-fresh devices
cat /sys/block/md0/md/reshape_position # You can see that Kernel resume
reshape the from any progress of them.
Note: The unit of mdadm -E is KB, but reshape_position's is sector.
In order to prevent disaster, we add logics to prevent devices with
different reshape progress from being added into the array.
Reported-by: Allen Peng <allenpeng@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
declare function stat_is_blkdev() to integrate repeated stat
checking blkdev operations, it returns 'true/1' when it is a
block device, and returns 'false/0' when it isn't.
The devname is necessary parameter, *rdev is optional, parse
the pointer of dev_t *rdev, if valid, assigned device number
to dev_t *rdev, if NULL, ignores.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
declare function fstat_is_blkdev() to integrate repeated fstat
checking block device operations, it returns true/1 when it is
a block device, and returns false/0 when it isn't.
The fd and devname are necessary parameters, *rdev is optional,
parse the pointer of dev_t *rdev, if valid, assigned the device
number to dev_t *rdev, if NULL, ignores.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
When an array is created the content is not initialized,
so it could have remnants of an old filesystem or md array
etc on it.
udev will see this and might try to activate it, which is almost
certainly not what is wanted.
So create a mechanism for mdadm to communicate with udev to tell
it that the device isn't ready. This mechanism is the existance
of a file /run/mdadm/created-mdXXX where mdXXX is the md device name.
When creating an array, mdadm will create the file.
A new udev rule file, 01-md-raid-creating.rules, will detect the
precense of thst file and set ENV{SYSTEMD_READY}="0".
This is fairly uniformly used to suppress actions based on the
contents of the device.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
mdassemble doesn't handle container based arrays, no support for sysfs,
etc. It has not been actively maintained for years, so time to send it
off to retirement.
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
At this point in the code, we know we have a valid array, and any
recent kernel will return 9003, so no point in querying the kernel for
this.
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Any kernel released during the last decade will return 9003 from
md_get_version() so no point in checking that.
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Rather than have the caller inspect the returned content, return an
error code from sysfs_init(). In addition make all callers actually
check it.
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
This can be used with --assemble for super1 and with --update-subarray
for imsm to enable or disable PPL in the metadata.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.
When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.
While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling IMSM raid5 arrays with PPL. Update the
IMSM metadata format to include new fields used for PPL.
Add structures for PPL metadata. They are used also by super1 and shared
with the kernel, so put them in md_p.h.
Write the initial empty PPL header when creating an array. When
assembling an array with PPL, validate the PPL header and in case it is
not correct allow to overwrite it if --force was provided.
Write the PPL location and size for a device to the new rdev sysfs
attributes 'ppl_sector' and 'ppl_size'. Enable PPL in the kernel by
writing to 'consistency_policy' before the array is activated.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
This gets rid of 5 nearly identical copies of the same code, and
reduces the binary size of mdadm by over 700 bytes on x86_64.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
assemble_container_content() doesn't need a dummy NULL pointer
variable for calling map_update. Passing NULL directly is sufficient.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
A simple revert doesn't work here because the reshape_position is
in the critical section.
The best approach is to let the reshape progress a bit and then
go backwards.
If that isn't possible, assembling with --update=revert-reshape and
--invalid-backup should work.
Reported-by-tested-by: George Rapp <george.rapp@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
In Assemble, getinfo_super() over-writes journal_clean. To
ensure correct journal_clean, keep it in a local variable
before getinfo_super().
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
As discussed, standalone require_journal() in struct superswitch
is not a very good idea. Instead, journal related information
fits well in struct mdinfo.
This patch simplifies journal support code in Assemble and
Incremental as:
- Add journal_device_required and journal_clean to struct mdinfo;
- Remove function require_journal from struct superswitch;
- Update Assemble and Incremental to use journal_device_required
and journal_clean from struct mdinfo (instead of separate var).
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Example output:
./mdadm --assemble /dev/md0 /dev/sd[c-f] /dev/sdb1
mdadm: /dev/md0 has been started with 4 drives and 1 journal.
mdadm checks superblock for journal devices. If the journal device
is missing or faulty, mdadm will show warning
./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1
mdadm: Not safe to assemble with missing or stale journal device, consider --force.
User can insist to start the array (read only) with --force
./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md0 has been started with 15 drives.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>