For HA product, RA (resource agent) assembles cluster raid
through call below cmd:
$MDADM --assemble $mddev --config=$RAIDCONF $MDADM_HOMEHOST
Sometimes node can't assemble array because all the nodes
need to contend dlm lock, which causes node fence in automatic
test.
And in fact, we don't need the protection since the assemble
cmd called by RA doesn't change superblock, so revert the
commit 76781701a4 ("Assemble:
provide protection when clustered raid do assemble") to remove
unneccessary protection.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
1. There are some places which didn't free map as
discovered by coverity.
CID 289661 (#1 of 1): Resource leak (RESOURCE_LEAK)12. leaked_storage: Variable mapl going out of scope leaks the storage it points to.
CID 289619 (#3 of 3): Resource leak (RESOURCE_LEAK)63. leaked_storage: Variable map going out of scope leaks the storage it points to.
CID 289618 (#1 of 1): Resource leak (RESOURCE_LEAK)26. leaked_storage: Variable map going out of scope leaks the storage it points to.
CID 289607 (#1 of 1): Resource leak (RESOURCE_LEAK)41. leaked_storage: Variable map going out of scope leaks the storage it points to.
2. If we call map_by_* inside a loop, then map_free
should be called in the same loop, and it is better
to set map to NULL after free.
3. And map_unlock is always called with map_lock,
if we don't call map_remove before map_unlock,
then the memory (allocated by map_lock -> map_read
-> map_add -> xmalloc) could be leaked. So we
need to free it in map_unlock as well.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
Since commit 20dc76d15b ("imsm: Set disk slot number") mdadm
sets slot number for each disk in imsm array. Now auto-assemble determines
devices using slot number and ignores devices on the same slot that have
older generation number.
It causes infinit loop if failed device is still visible in system
(it has metadata, but it is not merged with exisiting array).
To avoid it, out-of-sync device should be added to the best[]. Later
mdadm adds it as spare to the container.
Imsm doesn't support disk replacement feature, so it can use rooms for
replacements.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
There are some failure paths which share common codes
before return, so simplify them by move common codes
to the end of function, and just goto out in case
failure happened.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
The previous patch provides protection for other modes
such as CREATE, MANAGE, GROW and INCREMENTAL. And for
ASSEMBLE mode, we also need to protect during the process
of assemble clustered raid.
However, we can only know the array is clustered or not
when the metadata is ready, so the lock_cluster is called
after select_devices(). And we could re-read the metadata
when doing auto-assembly, so refresh the locking.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
With "--force", we can assemble the array even if some superblocks
appear out-of-date. But their data layout is regarded to make sense.
In reshape cases, if two devices claims different reshape progresses,
we cannot forcely assemble them back to array. Kernel will treat only
one of them as reshape progress. However, their data is still laid on
different layouts. It may lead to disaster if reshape goes on.
Reproducible Steps:
mdadm -C /dev/md0 --assume-clean -l5 -n3 /dev/loop[012]
mdadm -a /dev/md0 /dev/loop3
mdadm -G /dev/md0 -n4
mdadm -f /dev/md0 /dev/loop0 # after a period
mdadm -S /dev/md0 # after another period
mdadm -E /dev/loop[01] # make sure that they claims different ones
mdadm -Af -R /dev/md0 /dev/loop[023] # give no enough devices for
force_array() to pick non-fresh devices
cat /sys/block/md0/md/reshape_position # You can see that Kernel resume
reshape the from any progress of them.
Note: The unit of mdadm -E is KB, but reshape_position's is sector.
In order to prevent disaster, we add logics to prevent devices with
different reshape progress from being added into the array.
Reported-by: Allen Peng <allenpeng@synology.com>
Reviewed-by: Alex Wu <alexwu@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
declare function stat_is_blkdev() to integrate repeated stat
checking blkdev operations, it returns 'true/1' when it is a
block device, and returns 'false/0' when it isn't.
The devname is necessary parameter, *rdev is optional, parse
the pointer of dev_t *rdev, if valid, assigned device number
to dev_t *rdev, if NULL, ignores.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
declare function fstat_is_blkdev() to integrate repeated fstat
checking block device operations, it returns true/1 when it is
a block device, and returns false/0 when it isn't.
The fd and devname are necessary parameters, *rdev is optional,
parse the pointer of dev_t *rdev, if valid, assigned the device
number to dev_t *rdev, if NULL, ignores.
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
When an array is created the content is not initialized,
so it could have remnants of an old filesystem or md array
etc on it.
udev will see this and might try to activate it, which is almost
certainly not what is wanted.
So create a mechanism for mdadm to communicate with udev to tell
it that the device isn't ready. This mechanism is the existance
of a file /run/mdadm/created-mdXXX where mdXXX is the md device name.
When creating an array, mdadm will create the file.
A new udev rule file, 01-md-raid-creating.rules, will detect the
precense of thst file and set ENV{SYSTEMD_READY}="0".
This is fairly uniformly used to suppress actions based on the
contents of the device.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
mdassemble doesn't handle container based arrays, no support for sysfs,
etc. It has not been actively maintained for years, so time to send it
off to retirement.
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
At this point in the code, we know we have a valid array, and any
recent kernel will return 9003, so no point in querying the kernel for
this.
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Any kernel released during the last decade will return 9003 from
md_get_version() so no point in checking that.
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Rather than have the caller inspect the returned content, return an
error code from sysfs_init(). In addition make all callers actually
check it.
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
This can be used with --assemble for super1 and with --update-subarray
for imsm to enable or disable PPL in the metadata.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.
When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.
While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Enable creating and assembling IMSM raid5 arrays with PPL. Update the
IMSM metadata format to include new fields used for PPL.
Add structures for PPL metadata. They are used also by super1 and shared
with the kernel, so put them in md_p.h.
Write the initial empty PPL header when creating an array. When
assembling an array with PPL, validate the PPL header and in case it is
not correct allow to overwrite it if --force was provided.
Write the PPL location and size for a device to the new rdev sysfs
attributes 'ppl_sector' and 'ppl_size'. Enable PPL in the kernel by
writing to 'consistency_policy' before the array is activated.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
This gets rid of 5 nearly identical copies of the same code, and
reduces the binary size of mdadm by over 700 bytes on x86_64.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
assemble_container_content() doesn't need a dummy NULL pointer
variable for calling map_update. Passing NULL directly is sufficient.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
A simple revert doesn't work here because the reshape_position is
in the critical section.
The best approach is to let the reshape progress a bit and then
go backwards.
If that isn't possible, assembling with --update=revert-reshape and
--invalid-backup should work.
Reported-by-tested-by: George Rapp <george.rapp@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
In Assemble, getinfo_super() over-writes journal_clean. To
ensure correct journal_clean, keep it in a local variable
before getinfo_super().
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
As discussed, standalone require_journal() in struct superswitch
is not a very good idea. Instead, journal related information
fits well in struct mdinfo.
This patch simplifies journal support code in Assemble and
Incremental as:
- Add journal_device_required and journal_clean to struct mdinfo;
- Remove function require_journal from struct superswitch;
- Update Assemble and Incremental to use journal_device_required
and journal_clean from struct mdinfo (instead of separate var).
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Example output:
./mdadm --assemble /dev/md0 /dev/sd[c-f] /dev/sdb1
mdadm: /dev/md0 has been started with 4 drives and 1 journal.
mdadm checks superblock for journal devices. If the journal device
is missing or faulty, mdadm will show warning
./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1
mdadm: Not safe to assemble with missing or stale journal device, consider --force.
User can insist to start the array (read only) with --force
./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 --force
mdadm: Journal is missing or stale, starting array read only.
mdadm: /dev/md0 has been started with 15 drives.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Previous patch missed on case.
Also print more useful information when rejecting
a device with IMSM metadata.
Signed-off-by: NeilBrown <neilb@suse.com>
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.
So don't assemble if OROM doesn't confirm it is OK.
There can still be problems for crash-dump not being able to find
the OROM. Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.
Signed-off-by: NeilBrown <neilb@suse.com>
If someone has an IMSM array, and disables RAID in the BIOS
and uses the devices for some other purpose, then they really don't
want mdadm to start syncing the array.
So don't assemble if OROM doesn't confirm it is OK.
There can still be problems for crash-dump not being able to find
the OROM. Some explicit work-around might be needed for that
rather than a more general workaround that can corrupt data.
Signed-off-by: NeilBrown <neilb@suse.com>
If the name in the array has a home-host, then
require that it matches, or is "any", or requested
homehost is "any".
Signed-off-by: NeilBrown <neilb@suse.com>
Earlier patch:
56fcbcbb6f
calculated the proper chunk size - but didn't use it..
Let's actually use it this time.
Signed-off-by: NeilBrown <neilb@suse.com>
This extends nodes option for assemble mode, make the num of
cluster node could be change by user.
Before that, it is necessary to ensure there are enough space
for those nodes, calc_bitmap_size is introduced to calculate
the bitmap size of each node.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
To support change the cluster name, the commit do the followings:
1. extend original write_bitmap function for new scenario.
2. add the scenarion to handle the modification of cluster's name
in write_bitmap1.
3. let the cluster name also show in examine_super1 and detail_super1
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
This is a very corner-case, but the self-tests tripped on it,
and it makes sense not to trust the uuid when it is being changed.
Signed-off-by: NeilBrown <neilb@suse.de>
Normally we do not "force"-assemble devices which are in the
middle of recovery, as they are unlikely to have useful data.
However, when a reshape increases the number of devices,
the newly added devices appear to be recovering because they
do not have complete data on them yet, but then they aren't expected
to until the reshape completes.
So in this case, it can be appropriate to force-assemble them.
Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
This reverts commit b720636a58.
As it said, this was a hack. It causes problems when trying to
--force assemble a RAID4. There is a better way.
Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
Since we introduced replacement devices, the 'i' used in
start_array() is twice the slot number.
So we need to adjust when printing.
Signed-off-by: NeilBrown <neilb@suse.de>
static checkers complain about that.
So change the code to use 'fstat', as we really don't want
to see an error here..
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>