Commit Graph

554 Commits

Author SHA1 Message Date
NeilBrown a740cf6432 MISC: add --action option to set or abort check/repair.
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-22 15:55:31 +10:00
NeilBrown 06e293d097 Grow: fix resent grow_continue breakage.
Commit 5e76dce1ac changed
Grow_continue to assume a fork had already happened, so that
   mdadm --grow --continue

didn't fork.  This is good, but it means that if Grow_continue
is run from Assemble, then
  mdadm --assemble ....

can misbehave if the array was in the middle of a reshape.

So introduce finer control.  Grow_continue only assumes it has
already forked if run from "mdadm --grow --continue".

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-22 14:22:58 +10:00
NeilBrown 54ded86fbd Grow: store a link to current backup file in /run/mdadm or similar.
Subsequent patch will allow the background part of "mdadm --grow" to
be run from systemd.  This can require the passing of a backup file
name.
To do this, store that name as a symlink in /run/mdadm (or MAP_DIR)
and look for it when appropriate.

It might be useful to also store the name across reboot, but that
would be a different patch.  We would need to use the uuid to identify
it, and store it in stable storage.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-15 14:23:16 +10:00
NeilBrown 56cb05c463 DDF: Fix assorted typos and do some reformatting.
..because it is more fun when new patches are harder to apply to old version :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-01 16:02:08 +11:00
NeilBrown 8832342d3a Assemble/Incremental: don't hold O_EXCL on mddev after assembly.
As soon as the array is assembled, udev or systemd might run
fsck and mount it.  So we need to drop O_EXCL promptly.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-12-05 10:35:16 +11:00
NeilBrown b11fe74db0 Incremental: improve support for "DEVICE" based restriction in mdadm.conf
--incremental currently fails if the device name passed does not
textually match the names permitted by the DEVICE line in mdadm.conf.
This is problematic when "mdadm -I" is run by udev as the name given
can be a temp name.

This patch makes two improvements:
1/ We generate a list of all existing devices that match the names
  in mdadm.conf, and allow rdev based matching
2/ We allows extra aliases to be provided on the command line, and
  perform textual matching on those.  This is particularly suitable
  for udev usages as ${DEVLINKS} can be provided even though the links
  make not yet be created.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-12-03 14:01:24 +11:00
NeilBrown 9ca39acb3e Incremental: add --export handling.
If --export is given with --incremental, then
  MD_DEVNAME
is output which gives the name of the device (in /dev/md) that
is the array (or container) that the device would be added to.
Also
  MD_STARTED
is set to one of
  no
  unsafe
  yes
  nothing

to indicate if the array was started.  IF MD_STARTED=unsafe
then it may be appropriate to run
  mdadm -R /dev/md/$MD_DEVNAME
after a timeout to ensure newly degraded array are started.

If
  MD_FOREIGN=yes
it might be appropriate to suppress this as the array is
probably not critical.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-28 15:15:30 +11:00
NeilBrown f33a71f107 Add support for --add-spare
--add-spare is like --add, but a --re-add is never attempted.
So it is equivalent to two separate commands:

 --zero-metadata
 --add

Signed-off-by: NeilBrown <neilb@suse.de>
2013-10-31 10:41:50 +11:00
Jes Sorensen 8a5f247d0a Be consistent in return types from byteswap macros
The bswap_*() macros return int values. Make sure we return the
equivalent types in same byteorder pass-through functions to avoid
problems with the original type leaking through to printf() etc.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-10-16 12:42:29 +11:00
NeilBrown d5a4041647 Make -IRs and --run work properly for containers.
We really need to make sure assemble_container_content()
gets called to finished the assembly of these.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-13 10:51:20 +10:00
NeilBrown 1c0aebc2be Move ARRAY_SIZE macro to common include file.
That was super-ddf can use it.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-10 09:48:06 +10:00
NeilBrown d3786cdcd0 Change "mdadm --run" to use the same code as "mdadm --IRs".
Current "mdadm --run /dev/mdX" will not handle external metadata
properly.  mdmon won't be started etc.

So use the code from "mdadm -IRs" instead - that already does all
the right things.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-26 15:24:53 +10:00
NeilBrown a9c1584757 mdmon: don't lie to systemd.
Now that mdmon responds fairly well to SIGTERM, stop lying to
systemd about being started on the initrd.

Note that if mdmon is rerun (--takeover) for some reason, and systemd
chooses to kill processes before remounting / readonly, then the
unmount will hang.

If systemd ever lets us tell it that we don't want to be killed until
root is readonly, then we should do that.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-01 15:59:24 +10:00
NeilBrown bc17158dcc Introduce devid2kname - slightly different to devid2devnm.
The purpose od devid2devnm is to return a kernel name of an
md device, whether that device is a whole device or a partition,
we want the whole device.  md4, never md4p2.

In one place I was using devid2devnm where I really wanted the
partition if there was one ... and wasn't really interested in it
being an md device.
So introduce a new 'devid2kname' for that case.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-01 14:32:04 +10:00
NeilBrown 23bf42cc79 super1: simplify setting of array size.
Currently the extra space to leave before the data in the array
is calculated in two separate places, and they can be inconsistent.

Instead, do it all in validate_geometry.  This records the
'data_offset' chosen which all other devices then use.

'write_init_super' now just uses the value rather than doing all the
calculations again.

This results in more consistent numbers.

Also, load_super sets st->data_offset so that it is used by "--add",
so the new device has a data offset matching a pre-existing device.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-30 17:05:47 +10:00
NeilBrown a7a0d8a116 Grow: use mdstat_wait to wait for delayed reshape.
Having a fix time for a wait is clumsy and can make us
wait much too long.
So use mdstat_wait and keep the mdstat_fd open.
This requires an 'mdstat_close' so it doesn't stay open
forever.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-10 11:10:54 +10:00
NeilBrown eb2306f841 Config: use better device names for "DEVICES container"
When "containers" appears on the "DEVICES" line (which is does by
default), use names from the mdadm map file instead of kernel names,
when possible.
This mean that the name will be more likely to appear in mdadm.conf
and so more likely to match "container=" tags.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-02 10:46:43 +10:00
NeilBrown 2eba849621 Manage: check alignment when stopping an array undergoing reshape.
To be able to revert-reshape of raid4/5/6 which is changing
the number of devices, the reshape must has been stopped on a multiple
of the old and new stripe sizes.

The kernel only enforces the new stripe size multiple.

So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-01 15:10:05 +10:00
NeilBrown efc67e8e9f New function: sysfs_wait
We have several places that wait for activity on a sysfs
file.  Combine most of these into a single 'sysfs_wait' function.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-01 13:28:13 +10:00
NeilBrown 1011e8344a Remove lots of unnecessary white space.
Now that I am using white-space mode in Emacs I can see all of this,
and I don't like it :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 12:31:45 +10:00
NeilBrown fe7e0e64b0 Manage: split Manage_runstop into Manage_run and Manage_stop
The two branches have virtually nothing in common, so it is simpler if
they are separate.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 11:23:44 +10:00
NeilBrown e2f408a4c0 mdadm.h: add little bits of doco for 'struct superswitch'.
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-17 16:04:59 +10:00
NeilBrown a7dec3fd92 Make sure NOFILE resource limit is big enough.
Some people want to create truely enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit  the NOFILE limit.

So raise the limit if it ever looks like it might be a problem.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-30 14:31:09 +10:00
NeilBrown 199f1a1fad Assemble: allow --update=revert-reshape
This will cause a reshape to start going backwards.
2013-05-28 16:44:23 +10:00
NeilBrown afa368f49a Assemble: --update=metadata converts v0.90 to v1.0
This allows the smooth conversion of legacy 0.90 arrays
to 1.0 metadata.
Old metadata is likely to remain but will be ignored.
It can be removed with
  mdadm --zero-superblock --metadata=0.90 /dev/whatever

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-28 16:44:22 +10:00
NeilBrown 89ecd3cfe4 Grow: introduce min_offset_change to struct reshape.
raid10 currently uses the 'backup_blocks' field to store something
else: a minimum offset change.
This is bad practice, we will shortly need to have both for RAID5/6,
so make a separate field.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-22 12:20:39 +10:00
NeilBrown 74db60b00a Add --dump / --restore functionality.
This allows the metadata on a device to be saved and later restored.
This can be useful before experimenting on an array that is misbehaving.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-16 15:07:16 +10:00
NeilBrown eca944fa9c create_mddev: add support for /dev/md_XXX non-numeric names.
With the 'devnm' infrastructure fixed, it is quite easy to support
names like "md_home" for md arrays.
The currently defaults to "off" and can be enabled in mdadm.conf with
  CREATE names=yes
This is incase other tools get confused by the new names.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-15 11:03:25 +10:00
NeilBrown 4dd2df0966 Discard devnum in favour of devnm
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.

So get rid of devnum (a number) and use devnm (a 32char string).
eg.
  md0
  md_d2
  md_home

Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-21 17:05:23 +11:00
John Spencer 0d35d5c480 mdadm.h: fix ugly glibc specific ifdeffery
the code that was exposed on anything else than dietlibc and klibc
is entirely glibc specific and broke the build on musl libc.

Signed-off-by: John Spencer <maillist-mdadm@barfooze.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-10 15:40:53 +11:00
Jes Sorensen 3e23ba9d7b Remove --offroot argument and default to always setting argv[0] to @
We still allow --offroot to be given - for compatibility with scripts
- but ignore it.

The whole point of --offroot is to get systemd to not auto-kill mdmon,
and we always want that.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-05 15:06:47 +11:00
NeilBrown 06d2ffc3e2 conditionally remove map_dev from find_free_devnum
map_dev can be slow so it is best to not call it when
not necessary.
The final test in "find_free_devnum" is not relevant when
udev is being used, so remove the test in that case.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-01-07 10:17:04 +11:00
NeilBrown 6d388a8816 MISC: Add --examine-badblocks option
This will list the contents of the bad-blocks log, if one is present.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-12-05 12:56:31 +11:00
NeilBrown 70c55e36b7 Add support for --replace and --with
--replace can be used to replace a device without completely failing
it.  Once the replacement completes the device will be failed.
--with can indicate which of several spares to use.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-23 16:27:15 +11:00
NeilBrown 4ec2cbe96d Remove get_one_disk
It has never been used, and there isn't really any place that
could usefully use it.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-22 17:23:15 +11:00
NeilBrown 72ca9bcff3 Allow data-offset to be specified per-device for create
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...

The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown 5e88ab2e2f New RESHAPE_NO_BACKUP flag to track when backup action is needed.
Some arrays (raid10) never need a backup file, so during assembly
we can avoid the whole Grow_continue check in that case.
Achieve this using a flag set by the metadata handler.

Also get "mdadm -I" to fail if a backup process would be
needed.  It currently does fail as the kernel rejects things,
but it is nicer to have this explicit.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown fe384ca0b9 Grow: set new_data_offset if appropriate 2012-10-04 16:34:21 +10:00
NeilBrown 80bf913592 Add space_before/space_after fields to mdinfo
These will be needed to guide changes to data_offset during reshape.
Only set them for super1 for now.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown 40c9a66a5c Add --data-offset flag for Create and Grow
This can be used to over-ride the automatic assignment of
data offset.
For --create, it is useful to re-create old arrays where different
   defaults applied.
For --grow it may be able to force a reshape in the reverse direction.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown 83cd1e97cb Add data_offset arg to ->init_super and use it in super1.c
So if ->data_offset is already set, use that rather than
computing one.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
NeilBrown af4348ddd1 Add data_offset arg to ->validate_geometry.
This is needed to return correct available size.  It isn't
really used yet.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
NeilBrown 387fcd593c Add data_offset arg to ->avail_size
This is currently only useful for 1.x metadata and will allow an
explicit --data-offset request on command line.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
NeilBrown 822e393a05 Allow parse_size to return 0.
We will shortly introduce --data-offset= which is allowed to
be zero.  We will want to use parse_size() so it needs to be
able to return '0' without it being an error.

So define INVALID_SECTORS to be an impossible value (currently '1')
and return and test for it consistently.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
NeilBrown 7103b9b88d Handles spaces in array names better.
1/ When printing the "name=" entry for --brief output,
   enclose name in quotes if it contains spaces etc.
   Quotes are already supported for reading mdadm.conf

2/ When a name is used as a device name, translate spaces
   and tabs to '_', as well as the current translation of
   '/' to '-'.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
Maciej Naruszewicz 9eafa1de73 imsm: Allow to specify controller for --detail-platform.
Usually, 'mdadm --detail-platform -e imsm' scans all the controllers
looking for IMSM capabilities. This patch provides the possibility
to specify a controller to scan, enabling custom usage by other
processes - especially with the --export switch.

$ mdadm --detail-platform
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.2
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.2 --export
MD_FIRMWARE_TYPE=imsm
IMSM_VERSION=9.5.0.1037
IMSM_SUPPORTED_RAID_LEVELS=raid0 raid1 raid10 raid5
IMSM_SUPPORTED_CHUNK_SIZES=4k 8k 16k 32k 64k 128k
IMSM_2TB_VOLUMES=yes
IMSM_2TB_DISKS=no
IMSM_MAX_DISKS=7
IMSM_MAX_VOLUMES_PER_ARRAY=2
IMSM_MAX_VOLUMES_PER_CONTROLLER=4

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.0 # This isn't an IMSM-capable controller
mdadm: no active Intel(R) RAID controller found under /sys/devices/pci0000:00/0000:00:1f.0

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:11 +10:00
Maciej Naruszewicz f0ec67106c Display size with human_size_brief with a chosen prefix
When using human_size_brief, only IEC prefixes were supported. Now
it's possible to specify which format we want to see - either IEC
(kibi, mibi, gibi) or JEDEC (kilo, mega, giga).

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-02 16:41:13 +10:00
Maciej Naruszewicz e50cf22073 imsm: Add --export option for --detail-platform
This option will provide most of information we can get via
mdadm --detail-platform [-e format] in the key=value format.
Example output:

$ mdadm --detail-platform
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform --export
MD_FIRMWARE_TYPE=imsm
IMSM_VERSION=9.5.0.1037
IMSM_SUPPORTED_RAID_LEVELS=raid0 raid1 raid10 raid5
IMSM_SUPPORTED_CHUNK_SIZES=4k 8k 16k 32k 64k 128k
IMSM_2TB_VOLUMES=yes
IMSM_2TB_DISKS=no
IMSM_MAX_DISKS=7
IMSM_MAX_VOLUMES_PER_ARRAY=2
IMSM_MAX_VOLUMES_PER_CONTROLLER=4

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-02 16:28:30 +10:00
Lukasz Dorau 51d4261ca9 fix: adjust parse_size() to the unsigned size variable
An error in parse_size() should be reported by 0, not -1,
because -1 is changed to the max value of unsigned long long
during calculations of size (e.g. at mdadm.c:412).

A negative value of size should be reported as error
(e.g. size equal -1 has been changed to the max value of
unsigned long long so far).

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-20 12:27:17 +10:00
NeilBrown 7bd04da926 Manage: minor cosmetic fixes.
Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-13 08:00:20 +10:00
NeilBrown 50f01ba5a1 Use new struct context and struct shape for Grow_addbitmap
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:22:12 +10:00
NeilBrown 32754b7d84 Use new struct context and struct shape in Grow_reshape
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:22:09 +10:00
NeilBrown 99cc42f4a9 Use new 'struct shape' to pass args to Create
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:22:05 +10:00
NeilBrown a4e8316a75 Use new 'struct shape' to pass args to Build
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:21:57 +10:00
NeilBrown e705e81ba6 Create new 'struct shape' to pass around array details.
This collects to together many of the args given to
create/build/grow

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:21:38 +10:00
NeilBrown d04f65f48c Change the values for "max size" from -1 to 1.
Both are impossible, and '1' allows size to be unsigned,
which is neater.
Also #define MAX_SIZE to be '1' to make it all more explicit.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:20:32 +10:00
NeilBrown 11b6d91dd0 Change Incremental and related functions to take struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:20:22 +10:00
NeilBrown 95c5020544 Change Monitor to take a struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:20:19 +10:00
NeilBrown ef898ce65b Change Detail and misc_scan to take a struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:20:16 +10:00
NeilBrown eec3f88785 change Examine to take a struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:20:00 +10:00
NeilBrown 265460abab Examine: split 'verbose' out from 'brief'.
The value of 'verbose' is sometimes mixed into 'brief', particularly
for Examine.
This is messy and confusing.  So keep them separate.
'brief' still gets assumed when 'scan' is set, unless we are very
verbose.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:19:48 +10:00
NeilBrown 171dccc813 Change Create to take a struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:19:24 +10:00
NeilBrown 0c9e4afb1f Change Build to take a struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:19:21 +10:00
NeilBrown 4977146a84 Convert Assemble() to take a context rather than a list of options.
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:19:07 +10:00
NeilBrown 0937132db1 Discard 'quiet' context variable.
Just use negative verbose, now that we are ready for that.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:18:48 +10:00
NeilBrown ba728be72f Convert 'quiet' to 'not verbose' in various places.
If we change some functions to accept 'verbose', where <0 means to be
quiet, in place of 'quiet', then we will be able to merge
'quiet' and 'verbose' together for simplicity.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:18:09 +10:00
NeilBrown 9e33d55609 Create 'struct context' for ad hoc context option.
Rather than passing a long list of little flags etc to various
functions we will soon pass a small collection of structures.

This first step combines a collection of variables local to
'main' into a single structure.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:17:33 +10:00
NeilBrown 7986889004 Create parse_num() function.
Instead of open-coding this several times, just do it once.

The frees up the name 'c' which I'm about to use.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:17 +10:00
NeilBrown 72d566f68d Create: support --readonly flag.
Allow array to be created read-only

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:17 +10:00
NeilBrown 0ea8f5b167 Assemble: allow arrays to be assembled read-only.
The option was there, but never used.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown 503975b9d5 Remove scattered checks for malloc success.
malloc should never fail, and if it does it is unlikely
that anything else useful can be done.  Best approach is to
abort and let some super-daemon restart.

So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit.  Then use those
removing all the tests for failure.

Also replace all "malloc;memset" sequences with 'xcalloc'.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown c8e1a230b7 Remove re_add flag in favour of new disposition.
Instead of
   disposition == 'a'  re_add == 1
use
   disposition == 'A'

to record that a re-add was requested.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown e7b84f9d50 Introduce pr_err for printing error messages.
'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
cont_err() is also available.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown 5187a38587 Help: use an array to choose which help matches which mode.
Looks cleaner than a big switch statement.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown f7d3febcd6 Use explicit non-char opt for --zero-super
As we don't allow '-K' for '--zero-super' there is no point
using it internally.  Just define a 'KillOpt' like with
other options.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:00:45 +10:00
NeilBrown 9dad51d418 Monitor: fix inconsistencies in values for ->percent
->percent sometimes stores negative values recording states
like 'pending' or 'delayed'.
The value '-2' means both 'delayed' and in Monitor, 'unknown'.
Also, '-1' has a meaning but not #define.

So change the #defines to be prefixed with "RESYNC_", instead
of "PROCESS_", add new "_NONE" and "_UNKNOWN", and use correct
value in each location.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-06-04 12:31:40 +10:00
NeilBrown 96fd06edce Adjust to new standard of /run
Now that /run seems to be a good standard, make that
the default for storing various run-time files, rather than
/var/run or /dev/.mdadm.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-03 14:16:56 +10:00
Jes Sorensen 012a864129 Introduce sysfs_set_num_signed() and use it to set bitmap/offset
mdinfo->bitmap_offset is a signed long and needs to be treated as
such when passed to the kernel.

This resolves the problem with adding internal bitmaps to a 1.0 array.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-30 09:56:22 +10:00
NeilBrown c2ecf5f61a Add --prefer option for --detail and --monitor
Both --detail and --monitor can report the names of member
devices on an array, and do so by searching /dev and finding
the shortest name that matches.

If
   --prefer=foo
is given, they will instead prefer a name that contain /foo/.
So
   mdadm --detail /dev/md0 --prefer=by-path

will list the component devices via their /dev/disk/by-path/xxx
names.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-18 11:00:07 +10:00
NeilBrown 480f356641 Raid limit of 1024 when scanning for devices.
When we can for devices using GET_DISK_INFO we currently
limit to 1024.  But some arrays can have more than this.
So raise it to 4096 and make the constant a #define.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-18 09:06:02 +10:00
Adam Kwolek 016e00f546 FIX: Support metadata changes rollback
Function reshape_super() guards metadata changes.
It is used to apply changes rollback in error case also.
As change (apply and rollback) can be not bi-directional reshape_super()
has to know if current action is metadata change that should be guarded
using metadata restrictions, or this is metadata rollback change
executed due to error occurrence.

In second case change has to be unconditional.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-17 12:33:37 +10:00
NeilBrown fbdef49811 Bitmap_offset is a signed number
As the bitmap can be before the superblock, bitmap_offset is signed.
But some of the code didn't honour that :-(

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-04 14:03:45 +10:00
NeilBrown 2d762ade6a Fix the new ROUND_UP macro.
It was missing a "- 1".

Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-22 19:40:38 +11:00
Jes Sorensen de89706515 Generalize ROUND_UP() macro and introduce matching ROUND_UP_PTR()
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-21 08:04:24 +11:00
NeilBrown de5a472ea3 Remove avail_disks arg from 'enough'.
It can easily be calculated from 'avail' and  'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-07 14:04:47 +11:00
Jes Sorensen a0963a86e1 Spawn mdmon with --offroot if mdadm was launched with --offroot
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-30 12:11:29 +11:00
Jes Sorensen 08ca2adfff Add --offroot argument to mdadm
When --offroot is specified, mdadm will change the first character of
argv[0] to '@'. This is used to signal to systemd that mdadm was
launched from initramfs and should not be shut down before returning
to the initramfs.

Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-30 12:11:16 +11:00
NeilBrown 68226a8081 monitor: ensure we retry soon when 'remove' fails.
If a 'remove' fails there is no certainty that another event will
happen soon, so make sure we retry soon anyway.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-03 11:39:59 +11:00
NeilBrown c0c1acd691 Grow/bitmap: support adding bitmap via sysfs.
Adding a bitmap via ioctl can only add it at a fixed location.
That location is not suitable for 4K-block devices.

So allow setting the bitmap location via sysfs if kernel supports it
and aim to always use 4K alignments.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 14:10:41 +11:00
NeilBrown 81a5b4f52f Remove update_private
This fields doesn't work any more as ->getinfo_super clears the info
structure at an awkward time.  So get rid of it and do it differently.

The issue is that the metadata handler cannot tell if the uuid it has
was randomly generated or explicitly requested, except on the first
call.
And we don't want to accept explicit requests for IMSM.
So when it was auto-generated, make it look distinctive by having the
same int copied in all 4 positions.  If someone requests a uuid like
that, I guess they get away with it.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-20 10:30:34 +11:00
Lukasz Orlowski 7c3367585e fix: Allowed to assemble 2 volumes with the same names from config file.
mdadm allowes to assemble 2 volumes with the same names based on the
config file. The issue is fixed by iterating over the list of md device
identifiers and comparing the names of md devices against each other,
detecting identical names and blocking the assembly should the same names
be found.
Now having detected duplicate names, mdadm terminates without assembling
the container, displaying appropriate prompt.

Signed-off-by: Lukasz Orlowski <lukasz.orlowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-07 12:20:34 +11:00
NeilBrown 2244d1a987 Remove duplicated code: search_mdstat and conf_match
search_mdstat and conf_match are almost identical.

Put all the functionality in conf_match, and remove search_mdstat.

Reported-by: Jes.Sorensen@redhat.com
Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-01 13:30:41 +11:00
Labun, Marcin 81219e70f2 kill-subarray: fix, IMSM cannot kill-subarray with unsupported metadata
container_content retrieves volume information from disks in the
container.  For unsupported volumes the function was not returning
mdinfo. When all volumes were unsupported the function was returning
NULL pointer to block actions on the volumes. Therefore, such volumes
were not activated in Incremental and Assembly. As side effect they
also could not be deleted using kill-subarray since "kill" function
requires to obtain a valid mdinfo from container_content.

This patch fixes the kill-subarray problem by allowing to obtain
mdinfo of all volumes types including unsupported and introducing new
array.status flags.

There are following changes:

1. Added MD_SB_BLOCK_VOLUME for blocking an array, other arrays in the
   container can be activated.

2. Added MD_SB_BLOCK_CONTAINER_RESHAPE block container wide reshapes
   (like changing disk numbers in arrays).

3. IMSM container_content handler is to load mdinfo for all volumes
   and set both blocking flags in array.state field in mdinfo of
   unsupported volumes.  In case of some errors, all volumes can be
   affected. Only blocked array is not activated (also reshaped as
   result). The container wide reshapes are also blocked since by
   metadata definition they require modifications of both arrays.

4. Incremental_container and Assemble functions check array.state and
   do not activate volumes with blocking bits set.

5. assemble_container_content is changed to check container wide reshapes
   before activating reshapes of assembled containers.

6. Grow_reshape and Grow_continue_command checks blocking bits
   before starting reshapes or continueing (-G --continue) reshapes.

7. kill-subarray ignores array.state info and can remove requested array.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-31 11:29:46 +11:00
Adam Kwolek 6e75048bc5 Add recovery blocked field to mdinfo
When container is assembled while reshape is active on one of its member
whole container can be required to be blocked from monitoring.
For such purpose field recovery blocked is added to mdinfo structure.

When metadata handler finds active reshape in container it should set
recovery_blocked field to disable whole container monitoring during
reshape.

For arrays that doesn't use containers, recovery_blocked field
has the same value as reshape_active field e.g. super0/1.
In fact,recovery is blocked during reshape for such arrays.
For ddf, metadata handler doesn't set reshape_active field,
so recovery_blocked is not set also.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-05 13:30:50 +11:00
Adam Kwolek 577e8448e9 Move code to get_data_disks() function
Move code to function for code reuse.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-03 09:57:12 +11:00
Adam Kwolek 2dddadb0f7 Add continue option to grow command
To allow for reshape continuation '--continue' option is added
to grow command.
Function that will be executed in grow-continue case doesn't require
information about reshape geometry. All required information are read
from metadata.
For external metadata reshape can be run for monitored array/container
only. In case when array/container is not monitored run mdmon for it.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-03 09:26:48 +11:00
Adam Kwolek b76b30e0f9 Do not continue reshape during initrd phase
During initrd phase continuing reshape will cause file system context
lost. This blocks ability to control reshape using checkpoints.

To avoid this, during initrd phase assemble has to be executed with
'--freeze-reshape' option. This causes that mdadm restores reshape
critical section only.

Reshape can be continued later after system full boot.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-03 09:15:22 +11:00
Lukasz Dorau cc700db34f fix: correct unlocking of map file
1. Three missing map_unlock() calls were added.
2. Map file must be unlocked on fork, else child will hold lock.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-03 08:55:02 +11:00
NeilBrown ecbd9e8160 Create: improve messages from validate_geometry.
When validate_geometry finds that we haven't committed to
a metadata yet and that the subdev is a member of 'our'
container, it needs to report any errors it finds as Create()
cannot report them effectively.

So make a slight change to the semantics of the 'verbose' flag
and allow validate_geometry to report if it printed any error
messages.

Signed-off-by: NeilBrown  <neilb@suse.de>
2011-09-21 14:39:01 +10:00
Adam Kwolek 3f54bd62dc Move restore backup code to function
Reshape backup should be able to be restored during reshape continuation
also. To reuse already existing code it is moved to function.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21 12:17:30 +10:00
Doug Ledford 16715c01f7 Fix readding of a readwrite drive into a writemostly array
If you create a two drive raid1 array with one device writemostly, then
fail the readwrite drive, when you add a new device, it will get the
writemostly bit copied out of the remaining device's superblock into
it's own.  You can then remove the new drive and readd it as readwrite,
which will work for the readd, but it leaves the stale WriteMostly1 bit
in devflags resulting in the device going back to writemostly on the
next assembly.

The fix is to make sure that A) when we readd a device and we might have
filled the st->sb info from a running device instead of the device being
readded, then clear/set the WriteMostly1 bit in the super1 struct in
addition to setting the disk state (ditto for super0, but slightly
different mechanism) and B) when adding a clean device to an array (when
we most certainly did copy the superblock info from an existing device),
then clear any writemostly bits.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-19 13:06:38 +10:00
NeilBrown 11b391ece9 Discourage large devices from being added to 0.90 arrays.
0.90 arrays can only use up to 4TB per device.  So when a larger
device is added, complain a bit.  Still allow it if --force is given
as there could be a valid use.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-08 13:05:31 +10:00
Krzysztof Wojcik 2d3603ba0c Show DELAYED, PENDING status of resync process in "--detail"
Initially there is no proper translation mdstat's DELAYED/PENDING processes
to "--detail" output.
For example, if we have recover=DELAYED in mdstat, "--detail"
shows "State: recovering" and "Rebuild Status = 0%".
It was incorrect in case of process waiting on checkpoint different
than 0%. In fact rebuild status is differnt than 0% and user is misled.

The patch fix the problem. Current "--detail" command shows
in the exampe: "State: recovering (DELAYED)" and no information
about precentage.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-23 12:06:47 +10:00
Adam Kwolek ba53ea59ad Add reshape restart support for external metadata
Patch introduces support for reshape process restart for external metadata
using metadata specific data handling methods.
It introduces recover_backup() function that restores array to stable state
It is equivalent to Grow_restart() functionality for native metadata.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 17:11:11 +10:00
Adam Kwolek 10f228541c imsm: Implement imsm_manage_reshape(), reshape workhorse
Before reshape is started, mdadm should check again if there is only one
array (in container) under reshape. Then function "divides" array in to
"migration units" that can fits migration copy area and enters main loop.
It checks if current "migration unit" requires to be backed up.
If necessary mdadm saves it to copy area and updates migration record.
Then MD-driver is directed to perform reshape step (by "migration unit" size)
and checkpoint is moved forward. In this way reshape is executed until
array ends.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 17:09:08 +10:00
Adam Kwolek 2fcb75aea1 Support restore_stripes() from the given buffer
For external metadata backup location and saving methods depends
on metadata specific implementation details. Currently restore_stripes()
function is able to restore data only from the given backup file handles
and it is used only for assembling partially reshaped arrays.
As this function will be very helpful for external metadata backup
mechanism, add the support for restoring data from the given source buffer.
Add possibility for save_stripes() to work without designation targets.
Save_stripes() can now prepare data for restore_stripes() only.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 16:24:48 +10:00
NeilBrown 95eeceeb32 getinfo_super now clears the 'info' structure before filling it in.
Some code currently clears 'info' before calling getinfo_super,
some code doesn't.

To be consistent, change it so no caller ever clears 'info',
but ever getinfo_super function must clear it.

Note that ->raid_disk may be meaningful if that 'map' is passed
non-NULL.  In that case it is copied out before the structure
is zeroed.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:54:13 +10:00
NeilBrown ce52f92f04 Grow: accept --assume-clean with --grow --size
When an array is resized to have larger members, --assume-clean will
disable any resync if the kernel supports it (2.6.40 and later).

Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-16 17:28:27 +10:00
Labun, Marcin df3346e675 examine: allows to examine a disk metadata on non-metadata compliant systems
Allow for loading metadata from disk attached to non-metadata compliant
system. Affects mdadm --examine and guess_super.

Added ignore_hw_compat in supertype to pass information to load_super
handler. If ignore_hw_compat is set the handler should load metadata
also from disks that do not comply with metadata requirements (i.e. disk is not
attached to native controller, etc).

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:04:46 +11:00
NeilBrown d998b738f5 mdmon: don't wait for O_EXCL when shutting down.
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
and that can block progress of --grow.

Only wait for O_EXCL if we received a signal.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 16:10:22 +11:00
Krzysztof Wojcik 53ed6ac36e Warn the user about too small array size
If single-disk RAID0 or RAID1 array is created, user may preserve data on
disk. If array given size covers all partitions on disk, all data will be
available on created array. If array size is too small (not covers
all partitions), data will be not accessible.
This patch introduces warning message during array creation if given size
is too small. User may interrupt creation process to avoid data loss.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:21:21 +11:00
NeilBrown e2e53a2da5 Grow: support reshape of RAID0 arrays.
This is done via conversion to RAID4 and back.

To grow the array, extra devices will be needed which cannot
already be present as spares - so allow a list of new devices
to be included in grow request which changed the number of devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 15:05:23 +11:00
NeilBrown 4968025884 Run Grow_restart/Grow_continue when assembling the content of a container.
As containers can now grow, we need to use both Grow_restart (to
replay any backup-file) and Grow_continue when assembling the content
of a container.

Note that we don't pass a backup-file when doing incremental assembly.
If such is needed in that case, the assembly will fail.

To restart such arrays, explicit assembly is required.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-08 17:14:00 +11:00
Czarnowska, Anna c21e737ba1 set default chunk in validate_geometry
When chunk size is not set from command line we need to guess it
depending on metadata given on command line or found on listed devices.

Validate_geometry sets the default for it's metadata if chunk is not set.
For external metadata chunk is set only when creating in a container.
For imsm validate_geometry_imsm_orom is responsible for finding default
chunk depending on container metadata loaded. Container will already know
which controller it is attached to, and have this controllers orom
available.
do_default_chunk indicates that we need to find default chunk and
if validate_geometry fails for some metadata it tells us to reset chunk
that may have been set.

Current solution would set default chunk correctly for imsm only if
container device was given on command line. With the list of devices
chunk was always set to 512.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-22 11:25:07 +11:00
Adam Kwolek 41784c88f3 FIX: delta_disk can have UnSet value
Delta_disk can be set to UnSet value.
This can a cause to pass wrong parameter to reshape_super().
To avoid such situations raid_disks and delta_disks parameters
have to be passed to reshape_super() separately.
It will be up to reshape_super() function validation
and usage of this parameters to avoid not valid values.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-14 11:04:09 +11:00
NeilBrown e5e5d7cea3 Incr: don't exclude 'active' devices from auto inclusion in a container.
For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 13:07:36 +11:00
Czarnowska, Anna bfd76b9309 Monitor: do not move partitions to external container
Arrays on partitions are not supported for external metadata
so do not take such spare from native array.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 10:40:56 +11:00
Labun, Marcin 20b60dcd6c Dynamic hot-plug udev rules for policies
Neil,
Please consider this patch that once was discussed and I think agreed with in general direction. It was sent a while ago
but somehow did not merged into your devel3-2. This patch enables hot-plug of so called bare devices (as understand by domain policies rules in mdadm.conf).
Without this patch we do NOT serve hot-plug of bare devices at all.

Thanks,
Marcin Labun

Subject was: FW: Autorebuild, new dynamic udev rules for hot-plugs

>>From c0aecd4dd96691e8bfa6f2dc187261ec8bb2c5a2 Mon Sep 17 00:00:00 2001
From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Date: Thu, 23 Dec 2010 16:35:01 +0100
Subject: [PATCH] Dynamic hot-plug udev rules for policies
Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>

When introducing policies, new hot-plug rules were added to support
bare disks. Mdadm was started for each hot plugged block device
to determine if it could be used as spare or as a replacement member for
degraded array.
This patch introduces limitation of range of devices that are handled
by mdadm.
It limits them to the ones specified in domains associated with
the actions: spare-same-port, spare and spare-force.
In order to enable hot-plug for bare disks one must update udev rules
with command

        mdadm --activate-domains[=filename]

Above command writes udev rule configuration to stdout. If 'filename'
is given output is written to the file provided as parameter. It is up
to system administrator what should be done later. To make such rule
permanent (i.e. remain after reboot) rule should be writen to
/lib/udev/rules.d directory. Other cases will just need to write it to
/dev/.udev/rules.d directory where temporary rules lies. One should be
aware of the meaning of names/priorities of the udev rules.

After mdadm.conf is changed one is obliged to re-run
"mdadm --activate-domains" command in order to bring the system
configuration up to date.
All hot-plugged disks containing metadata are still handled by existing
rules.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-27 12:48:04 +10:00
NeilBrown a93f87eee6 Add 'restart' arg to various functions used for reshaping.
When we restart an array in the middle of a reshape, we reuse a lot of
the code for starting the reshape, but it needs to know that
circumstances are slightly different.

So add a 'restart' arg which is used:
 - skip checking and adding spares
 - activate the array (rather than start reshape)
 - allow the backup file to already exist

Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-17 09:53:56 +11:00
NeilBrown 999b497251 Make child_monitor a candidate for ->manage_reshape
Child_monitor was design to perform 'manage_reshape' for native
arrays.  So change the signature for ->manage_reshape to match
child_monitor and move the all to the same place that child_monitor
is called from.

Also give super-intel a manage_reshape handler which simple calls
child_monitor.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-12 14:46:17 +11:00
Anna Czarnowska d52bb542d4 move_spare function modified and moved to Manage.c
It will also be needed for Incremental.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-05 14:34:32 +11:00
Anna Czarnowska 326727d9c9 Use one function chosing spares from container
container_chose_spares in Monitor.c and
get_spares_for_grow in super-intel.c
do the same thing: search for spares in a container.

Another version will also be needed for Incremental
so a more general solution is presented here and
applied in two previous contexts.

Normally domlist==NULL would lead an empty list but
this is typically checked earlier so here it is interpreted
as "do not test domains".

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-05 14:34:14 +11:00
Anna Czarnowska 22e263f64a imsm: set imsm spare uuid to 0
uuid_match_any is replaced by uuid_zero for imsm spares.

Function fixup_container_spare_uuid not needed as it gives
unwanted uuid to spares.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-26 21:59:31 +11:00
NeilBrown cb23f1f4c3 Allow a metadata update to have a linked list of allocated spaces.
Sometimes one metadata update will require allocating several
larger data structures.  As 'monitor' cannot allocate, 'manager'
must, so it must be able to attach a list of allocates to the
update, and importantly it must be able to easily free them.

So add a 'space_list' element to metadata updates where each
element on the list starts with a pointer to the next.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 12:10:01 +11:00
NeilBrown 78b10e663c imsm: Prepare reshape_update in mdadm
During Online Capacity Expansion metadata has to be updated to show
array changes and allow for future assembly of array.  To do this
mdadm prepares and sends reshape_update metadata update to mdmon.
The update contains the old and new number of raid disks, and the
indices of the spare disks that will be used to fill the spaces.

This works as follows:
1. reshape_super() prepares metadata update.
2. mdadm discovers the spares and adds them to the array
3. mdadm sends the update to mdmon
4. managemon in prepare_update() allocates required memory for bigger
   device object
5. monitor in process_update() updates the metadata to record the
   new sizes and the newly assigned devices.
6. mdadm initiates the reshape

Based on code From: Adam Kwolek <adam.kwolek@intel.com>

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 11:45:21 +11:00
NeilBrown 11877f4dc2 Split fmt_devnum out from devnum2devname
Sometimes we want to convert a devnum to a devname without allocating
memory.  So provide function to do the formatting without allocation.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 09:07:51 +11:00
Labun, Marcin 1a64be565b IMSM: Fix problem in mdmon monitor of using removed disk in imsm container.
Manager thread shall pass the information to monitor thread (mdmon)
that some devices are removed from container.  Otherwise, monitor
(mdmon) might use such devices (spares) to rebuild the array that has
gone degraded.

This problem happens for imsm containers, since a list of the
container disks is maintained in intel_super structure. When array
goes degraded, the list is searched to find a spare disks to start
rebuild.  Without this fix the rebuild could be stared on the spare
device that was a member of the container, but has been removed from
it.

New super type function handler has been introduced to prepare
metadata format specific information about removed devices.

int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo)

The message prepared in remove_from_super is later processed by
process_update handler in monitor thread.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-15 15:51:51 +11:00
NeilBrown 833bb0f8f6 Allow --update=devicesize with --re-add
This is useful with 1.1 and 1.2 metadata to update the metadata if
the device size has changed.
The same functionality can be achieved by writing to the device size
in sysfs after re-adding normally, but in some cases this might be
easier.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-09 13:06:29 +11:00
NeilBrown 691a36b76f Grow: warn if growing an array will make it degraded.
Growing an array when there aren't enough spares can make the array
degraded.  This works but might not be what is wanted.
So warn the user in this case and require a --force to go ahead
with the reshape.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-09 11:51:13 +11:00
Adam Kwolek e6e9d47b76 Grow: open backup file for reshape as function
Move opening backup file to the function for future reuse during
container reshape.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-03 15:00:16 +11:00
NeilBrown 87f26d14f7 Assemble: allow an array undergoing reshape to be started without backup file
Though not having the proper backup file can cause data corruption, it
is not enough to justify not being able to start the array at all.
So allow "--invalid-backup" to be specified which says "just continue
even if a backup cannot be restored".

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-01 11:47:32 +11:00
NeilBrown ab2bb0b621 mdmon: don't copy an invalid chunk_size
As chunk_size in mdstat_ent is never set, we shouldn't copy
it into a->info.array.
In fact, it is safest to get rid of the field altogether.

Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-30 18:35:36 +11:00
Adam Kwolek 1c009fc218 Compute backup blocks in function.
number of backup blocks evaluation is put in to function for code reuse.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-30 13:30:22 +11:00
Adam Kwolek 130994cb83 Prepare and free fdlist in functions
fd handles table creation is put in to function for code reuse.

In manage_reshape(), child_grow() function from Grow.c will be reused.
To prepare parameters for this function, code from Grow.c can be
reused also.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-30 13:27:08 +11:00
Adam Kwolek 6d11ec6fc2 Treat feature as experimental
Due to fact that IMSM Windows compatibility was not tested yet,
feature has to be treated as experimental until compatibility
verification will be performed.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 12:11:09 +11:00
NeilBrown 746a6567d3 Improve comments for block_monitor.
Also not that the leading '-' on the metadata names now
simply means that mdmon must not reconfiure the array.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 10:32:15 +11:00
Anna Czarnowska 0f0749ad93 Monitor: devid should be dev_t
For consistency with makedev().
int is not sufficient.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:56:28 +11:00
NeilBrown de6ae75015 Incremental - avoid including wayward devices.
If a devices - typically in a mirrored set - is assembled
independently of the other devices, and then attempted to be brought
back into the set, it could contain inconsistent data.  It should not
be included.

So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device.  If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.

This patches fixes --incremental, --assemble was done in an earlier
patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:40:15 +11:00
NeilBrown 1c7a808c4d Improve opt parsing, and distinguish long from short.
In several cases, two different long options map to the same short
option.  So e.g. you could give '--brief' and it would be interpreted
as '--bitmap'.  That isn't really good.

So for every shared short option, define an option number and return
that for the long option instead.  Then always check for both the
short and long options.

Also give some bugs like " mode == 'G'" which should be '== GROW'.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-25 18:58:45 +11:00
Dan Williams 7bc7119671 External reshape (step 1): container reshape and ->reshape_super()
In the native metadata case Grow_reshape() and the kernel validate what
reshapes are possible / supported and the kernel handles all the metadata
updates.  In the external case the metadata format may have specific
constraints above this baseline.  External formats also introduce the
constraint of only permitting some reshapes at container scope versus subarray
scope.  For exmaple imsm changes to 'raiddisks' must be applied to all arrays
in the container.

This operation assumes that its 'st' parameter has been obtained from
super_by_fd() (such that st->subarray is up to date), and that a snapshot of
the metadata has been loaded from the container.

Why a new method, versus extending an existing one?
->validate_geometry: this routine assumes it is being called from Create(),
adding reshape complicates the cases that this routine needs to handle.  Where
we find that checks can be shared between the two cases those routines
refactored into common code internal to the metadata handler, i.e. no need to
provide a unified external interface.  ->validate_geometry() also does not
expect to update the metadata.

->update_super: this is meant to update single fields at Assembly() and only at
the container scope.  Reshape potentially wants to update multiple fields at
either container or subarray scope.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 16:09:27 +11:00
Dan Williams 30f58b2208 Create: cleanup/unify default geometry handling
Support metadata specific level, layout and chunksize defaults.  Kill an
uneeded superswitch methods ahead of adding more for the reshape case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:20:50 +11:00
Dan Williams bc77ed535d block monitor: freeze spare assignment for external arrays
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares.  In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.

When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level.  Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.

Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication.  If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version.  Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation.  This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:00:54 +11:00
Dan Williams e5408a3202 Provide a mdstat_ent to subarray helper
...before introducing another open coded instace of this conversion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 14:44:23 +11:00
Marcin Labun 2cda7640f9 Policy is aware of metadata disk's controller domains.
Platform (metadata) domain let the metadata handlers differentiate
disk domains based on controllers that the disk belongs to.
Platform domain is sub-domain inside user specified domain
in mdadm.conf configuration files inheriting all parameters from it.
The metadata domain name is used disk domain matching functions.
The disk with the same metadata domain name belong to the same metadata
domain.

New metadata handler is added that retrieves platform domain string based
on disk path:
const char *(*get_disk_controller_domain)(const char *path);

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
Anna Czarnowska 80e7f8c31a Monitor: Allow metadata to set minimum size for spare to migrate in.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown e78dda3bf5 Monitor: policy based spare migration.
Rather than only migrating between arrays with the same spare_group,
we now migrate based on domains set in the policy.

In order for spare_group to continue to work, we treat it as a domain
of the destination array, and a domain of any device we might remove
from a source array.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
Anna Czarnowska 5c4cd5da70 imsm: create mdinfo list of disks in a container from supertype
If getinfo_super is called on a container supertype we only get information
on first disk. As a parameter it uses reference to preallocated
mdinfo structure. Amending getinfo_super to return full list of disks
would require ammending all previous calls and subsequently freeing memory
allocated for mdinfo list.
Function container_content that returns a mdinfo list is written
specifically for assembly, performing actions not needed to just fill
mdinfo. It also does not include spares so is unsuitable.
As an alternative a new function getinfo_super_disks is created
to obtain information about all disks states in array.
Existing function sysfs_free is used to free memory
allocated by getinfo_super_disks.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
Anna Czarnowska edde9560fa mdadm: added --no-sharing option for Monitor mode
--no-sharing option disables moving spares between arrays/containers.
Without the option spares are moved if needed according to config rules.
We only allow one process moving spares started with --scan option.
If there is such process running and another instance of Monitor
is starting without --scan, then we issue a warning but allow it
to continue.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
Anna Czarnowska 52d5d101a9 Util: get device size from id
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
NeilBrown d2db304558 Add action=spare-same-slot policy.
When "mdadm -I" is given a device with no metadata, mdadm tries to add
it as a 'spare' somewhere based on policy.

This patch changes the behaviour in two ways:

1/ If the device is at a 'path' where a previous device was removed
  from an array or container, then we preferentially add the spare to
  that array or container.

2/ Previously only 'bare' devices were considered for adding as
  spares.  Now if action=spare-same-slot is active, we will add
  non-bare devices, but *only* if the path was previously in use
  for some array, and the device will only be added to that array.

Based on code
  From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00