Commit Graph

306 Commits

Author SHA1 Message Date
NeilBrown 7d55dca2cc mdassemble: don't try to perform cluster check.
mdassemble is meant to be small and simple, so avoid
trying to check for a cluster.
Currently it doesn't, but it still includes the code,
which doesn't build because the library isn't provided.

So just exclude the get_cluster_name code from mdassemble.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-08-03 11:53:01 +10:00
Guoqing Jiang 4de9091302 Add a new clustered disk
A clustered disk is added by the traditional --add sequence.
However, other nodes need to acknowledge that they can "see"
the device. This is done by --cluster-confirm:

--cluster-confirm SLOTNUM:/dev/whatever (if disk is found)
or
--cluster-confirm SLOTNUM:missing (if disk is not found)

The node initiating the --add, has the disk state tagged with
MD_DISK_CLUSTER_ADD and the one confirming tags the disk with
MD_DISK_CANDIDATE.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:21:29 +10:00
Guoqing Jiang 7716570e6d Set home-cluster while creating an array
The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster with the
cluster name same as the one recorded in the bitmap.

If home-cluster is not specified, this is auto-detected using
dlopen corosync cmap library.

neilb: allow code to compile when corosync-devel is not installed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:06:30 +10:00
NeilBrown 330d6900bb Assemble: allow a RAID4 to assemble easily when the parity device is missing.
If the parity device of a RAID4 is missing, then there is no immediate
risk to data.  So it doesn't matter if the array is dirty or not.

This can be important when reshaping a RAID0, and is a much better
solution than that in the recently reverted
   b720636a58

Reported-by: "Jonathan Harker (Jesusaurus)" <jesusaurus@gentlydownthe.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-08 09:39:02 +10:00
NeilBrown 7a862a020f Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly on one line,
then maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"


Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:46:53 +11:00
NeilBrown 1ade5cc15a Consistently print program Name and __func__ in debug messages.
make dprintf() print program name and __func__, so that
this messaging is consistent.

Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error message.
If we really want the function name there, a new pr_XXX might
be wanted.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:21:17 +11:00
NeilBrown 93d3bd3b28 util: remove rounding error when reporting "human sizes".
The division
 (1<<20) / 200
is not exact, so dividing by this to convert bytes into half-megs
is wrong and results in incorrect output.

As we are doing "long long" arithmetic, there is no risk of an overflow
until we reach 64 petabytes.
So change to
   * 200 / (1<<20).

Reported-by: Jan Echternach <jan@goneko.de>
Resolved-debian-bug: 763917
URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763917
Signed-off-by: NeilBrown <neilb@suse.de>
2014-12-18 16:58:44 +11:00
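
The rounding problem described in 93d3bd3b28 can be illustrated with a minimal sketch. The two helper names are hypothetical; only the two arithmetic forms come from the commit message, and the unit (1/200 of a MiB) is inferred from them.

```c
/* Hypothetical helper: bytes expressed in units of 1/200 MiB, using the
 * old form. (1 << 20) / 200 is really 5242.88, which integer division
 * truncates to 5242, so the result comes out slightly too large. */
static long long units_lossy(long long bytes)
{
	return bytes / ((1LL << 20) / 200);
}

/* Hypothetical helper: the same conversion with the multiplication done
 * first, so nothing is lost in the divisor. */
static long long units_exact(long long bytes)
{
	return bytes * 200 / (1LL << 20);
}
```

For 1 GiB (`1LL << 30`) the first form gives 204834 units and the second gives 204800, which is exactly 1024 MiB.
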
NeilBrown cc742d3807 util: split get_maj_min() out from dev_open()
This allows other code to parse "8:3" style device names.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-08-11 10:34:36 +10:00
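
Parsing an "8:3" style name, as cc742d3807 describes, might look roughly like the sketch below. The function name and return convention are assumptions for illustration and are not taken from the mdadm source.

```c
#include <ctype.h>
#include <stdlib.h>

/* Sketch: parse "MAJOR:MINOR" (for example "8:3"). Returns 1 and stores
 * both numbers on success, 0 if the string is not in that form. */
static int parse_maj_min(const char *dev, int *major, int *minor)
{
	char *end;
	unsigned long maj, min;

	if (!isdigit((unsigned char)dev[0]))
		return 0;		/* must start with a digit */
	maj = strtoul(dev, &end, 10);
	if (*end != ':' || !isdigit((unsigned char)end[1]))
		return 0;		/* no colon, or no digits after it */

	min = strtoul(end + 1, &end, 10);
	if (*end != '\0')
		return 0;		/* trailing junk after the minor */

	*major = (int)maj;
	*minor = (int)min;
	return 1;
}
```
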
NeilBrown 85945e1986 install: use BINDIR consistently to locate mdadm and mdmon
Every place where the path for mdadm or mdmon is explicit,
it should use the BINDIR setting, not "/sbin/".

Reported-by: member graysky <graysky@archlinux.us> (https://bugs.archlinux.org/task/37330)
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-22 17:13:02 +10:00
Jes Sorensen 76d0f1886f Work around architectures having statfs.f_type defined as long
Having RAMFS_MAGIC defined as 0x858458f6 causes problems when trying
to compare it directly against statfs.f_type being cast from long to
unsigned long.

This hack is extremely ugly, but it should at least do the right thing
for every situation.

Thanks to Arnd Bergmann for suggesting the fix.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-03-20 09:24:27 +11:00
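
The sign-extension problem in 76d0f1886f can be sketched as follows. This is one possible form of such a workaround, not necessarily the one committed; the macro and function names are local to the sketch, and only the magic value comes from the commit message.

```c
/* Value quoted in the commit message; the macro name is local to this sketch. */
#define SKETCH_RAMFS_MAGIC 0x858458f6UL

/* Compare an f_type that may have been stored in a signed long.
 * Masking both sides to 32 bits discards any sign extension, so the
 * test behaves the same whether long is 32 or 64 bits wide. */
static int is_ramfs_type(long f_type)
{
	return ((unsigned long)f_type & 0xffffffffUL) ==
	       (SKETCH_RAMFS_MAGIC & 0xffffffffUL);
}
```

A direct `f_type == 0x858458f6` comparison can fail when the value arrived through a signed 32-bit field, because it is then negative and sign-extends when widened.
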
NeilBrown 8832342d3a Assemble/Incremental: don't hold O_EXCL on mddev after assembly.
As soon as the array is assembled, udev or systemd might run
fsck and mount it.  So we need to drop O_EXCL promptly.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-12-05 10:35:16 +11:00
NeilBrown 5dd29dafa2 Two small fixes related to enough()
1/ enough_fd doesn't use avail_disks any more, so discard it.

2/ Manage_Add increments 'found' at the wrong place, so it can
   waste time before calling enough().

Signed-off-by: NeilBrown <neilb@suse.de>
2013-12-05 08:58:21 +11:00
NeilBrown 357ac10678 IMSM metadata really should be ignored when found on partitions.
commit b31df43682
changed load_super_imsm to not insist on finding a partition if
ignore_hw_compat was set.
Unfortunately this is set for '--assemble' so arrays could get
assembled badly.

The comment says this was to allow e.g. --examine of image files.
A better fix for this is to change test_partitions to not report
a regular file as being a partition.
The errors from the BLKPG ioctl are:

 ENOTTY : not a block device.
 EINVAL : not a whole device (probably a partition)
 ENXIO  : partition doesn't exist (so not a partition)

Reported-by: "David F." <df7729@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-20 10:49:14 +11:00
NeilBrown 6f02172d2e Release mdadm-3.3
(and  various cosmetic fixes)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-03 14:47:47 +10:00
NeilBrown 2f1bcf43d9 Make sure "mdmon" doesn't get called "@dmon".
The Anaconda installer (via its "loader" program) will try to kill
many processes at shutdown, but not "mdmon".

However when mdadm runs mdmon in the Anaconda environment, mdmon
sets argv[0][0] to '@' resulting in "@dmon" which confuses
"loader".

So change mdadm to set argv[0] to a path so that mdmon becomes e.g.
  "@usr/sbin/mdmon"
which "loader" will recognise as being "mdmon".

Reported-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-02 11:02:09 +10:00
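
The naming change in 2f1bcf43d9 amounts to starting from a full path before the first character is overwritten with '@'. A minimal sketch, with a hypothetical helper name:

```c
#include <stdio.h>

/* Sketch: build the argv[0] string for mdmon from its full path by
 * replacing the first character with '@', for example
 * "/usr/sbin/mdmon" -> "@usr/sbin/mdmon".
 * Returns 0 on success, -1 if the result does not fit in 'out'. */
static int make_at_name(const char *path, char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "%s", path);

	if (n <= 0 || (size_t)n >= outlen)
		return -1;
	out[0] = '@';
	return 0;
}
```

Starting from the bare name "mdmon" instead would give "@dmon", which is the confusing form the commit message describes.
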
mwilck@arcor.de 7ac5d47e8a in_initrd: fix gcc compiler error
On some systems, this code caused a "comparison between signed
and unsigned" error.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-28 14:58:56 +10:00
NeilBrown 2538ba2abf Create: fix warning about pre-existing filesystems.
An ext[234] filesystem larger than 2TB was being reported with
a negative size - which looks odd.

So fix it to use suitably large and unsigned values.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-08 09:16:43 +10:00
NeilBrown 9540cc244d test: ensure testing uses correct mdmon
When testing we want to run mdmon directly, not use
systemctl to get systemd to run it.

So allow an environment variable to make that choice.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-05 14:55:13 +10:00
NeilBrown a9c1584757 mdmon: don't lie to systemd.
Now that mdmon responds fairly well to SIGTERM, stop lying to
systemd about being started on the initrd.

Note that if mdmon is rerun (--takeover) for some reason, and systemd
chooses to kill processes before remounting / readonly, then the
unmount will hang.

If systemd ever lets us tell it that we don't want to be killed until
root is readonly, then we should do that.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-01 15:59:24 +10:00
NeilBrown 23bf42cc79 super1: simplify setting of array size.
Currently the extra space to leave before the data in the array
is calculated in two separate places, and they can be inconsistent.

Instead, do it all in validate_geometry.  This records the
'data_offset' chosen which all other devices then use.

'write_init_super' now just uses the value rather than doing all the
calculations again.

This results in more consistent numbers.

Also, load_super sets st->data_offset so that it is used by "--add",
so the new device has a data offset matching a pre-existing device.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-30 17:05:47 +10:00
NeilBrown 289c74f8d7 Move find_free_devnum to mdopen.c
There is only one call to find_free_devnum and it is in mdopen.c

This removes a dependency between util.c and config.c which allows
us to now drop config.o from mdmon.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-02 10:24:50 +10:00
NeilBrown 399e0b9709 Subject: Make wait_for and open_dev_excl faster
When we create or assemble an array, we wait for udev to create the
device file in /dev so that as soon as mdadm completes, the device can
be used.

This waiting is performed in multiples of 200ms, which can sometimes
be too long to wait.

So change to an exponential backoff.  Wait 1, then 2, then 4 msec etc.
Once we get to 256msec, stop backing off and continue waiting 256ms at
a time until we reach the limit which is now 4.608sec rather than 5sec
which it was before.

Ditto for open_dev_excl.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-25 15:56:22 +10:00
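
The delay schedule described in 399e0b9709 can be sketched as below. The function only adds up the delays rather than sleeping, and its name is hypothetical. The attempt count is not stated in the commit message; 25 is an assumption chosen because it gives a total close to the quoted 4.608 sec.

```c
/* Sketch of the delay schedule: start at 1 ms, double after each wait,
 * and stop doubling once the delay has reached 256 ms.
 * Returns the total time waited, in ms, after 'attempts' waits. */
static unsigned int backoff_total_ms(int attempts)
{
	unsigned int delay = 1, total = 0;
	int i;

	for (i = 0; i < attempts; i++) {
		total += delay;		/* a real loop would sleep here */
		if (delay < 256)
			delay *= 2;
	}
	return total;
}
```

With 25 attempts the delays are 1, 2, 4, ..., 256 ms (511 ms in total) followed by sixteen more waits of 256 ms, so `backoff_total_ms(25)` is 4607 ms.
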
NeilBrown 1011e8344a Remove lots of unnecessary white space.
Now that I am using white-space mode in Emacs I can see all of this,
and I don't like it :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 12:31:45 +10:00
Bernd Schubert f8fcf7a1c5 raid6check: Fix build of raid6check
After recent git pull 'make raid6check' did not work anymore, as
sysfs_read() was called with a wrong argument and as check_env()
was used by use_udev(), but not defined.

Replace sysfs_read(..., -1, ...) by sysfs_read(..., NULL, ...)

Move check_env() from util.c to lib.c

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:03:12 +10:00
NeilBrown a7dec3fd92 Make sure NOFILE resource limit is big enough.
Some people want to create truly enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit the NOFILE limit.

So raise the limit if it ever looks like it might be a problem.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-30 14:31:09 +10:00
NeilBrown d33f151842 Change some fprintf(stderrs to cont_err()
Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-21 12:51:33 +10:00
NeilBrown 701d5b4ab5 Suppress error messages from systemctl.
We call systemctl to see if systemd will run mdmon for us.
If it cannot, we run mdmon directly, so we aren't interested
in the error message.
So redirect stderr to /dev/null.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-15 11:10:54 +10:00
NeilBrown 5a23a06ea4 mdassemble - fix new compile-time problems.
Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-13 17:05:16 +10:00
NeilBrown 4dd2df0966 Discard devnum in favour of devnm
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.

So get rid of devnum (a number) and use devnm (a 32char string).
eg.
  md0
  md_d2
  md_home

Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-21 17:05:23 +11:00
Jes Sorensen 15c10423aa In case launching mdmon fails, print an error message before exiting
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-05 15:51:48 +11:00
Jes Sorensen 0f7bdf8946 Add support for launching mdmon via systemctl instead of fork/exec
If launching mdmon via systemctl fails, we fall back to the old method
of fork/exec. This allows for having mdmon launched via systemctl
which avoids problems with it getting killed by systemd due to it
ending up in the parent's cgroup (udev).

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-05 15:40:38 +11:00
Jes Sorensen 3e23ba9d7b Remove --offroot argument and default to always setting argv[0] to @
We still allow --offroot to be given - for compatibility with scripts
- but ignore it.

The whole point of --offroot is to get systemd to not auto-kill mdmon,
and we always want that.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-05 15:06:47 +11:00
NeilBrown 9dc7d3576a dev_open - don't bother trying map_dev
map_dev can be slow, and doesn't really provide a better result
than just creating a temporary device.
So discard it and use mknod/open/unlink to open a major:minor device.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-01-10 14:04:54 +11:00
NeilBrown 06d2ffc3e2 conditionally remove map_dev from find_free_devnum
map_dev can be slow so it is best to not call it when
not necessary.
The final test in "find_free_devnum" is not relevant when
udev is being used, so remove the test in that case.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-01-07 10:17:04 +11:00
NeilBrown cb8f6859d1 IMSM - allow assembling any imsm array even without OROM.
It is important to check for compatibility with 'platform' or
Option ROM when creating or changing an array.  However there is no
real need when simply assembling the array.

On some systems there are situations where the platform information is
not available.  e.g. on some UEFI systems, UEFI is not available
during 'kdump' handling.  This makes it impossible to assemble
an IMSM array to receive the dump.

So remove the requirements that the platform be visible to assemble
an IMSM array.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-11-20 12:07:30 +11:00
NeilBrown 4ec2cbe96d Remove get_one_disk
It has never been used, and there isn't really any place that
could usefully use it.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-22 17:23:15 +11:00
NeilBrown a994592d75 Fix open_container
open_container should open a container which contains the device,
but sometimes it would open another volume which contains the
device.  Be more careful in 'holder' selection.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 17:15:55 +11:00
NeilBrown 5d5002289c Replace a lot of leading spaces with tabs.
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-10 18:33:26 +11:00
NeilBrown 72ca9bcff3 Allow data-offset to be specified per-device for create
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...

The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown 822e393a05 Allow parse_size to return 0.
We will shortly introduce --data-offset= which is allowed to
be zero.  We will want to use parse_size() so it needs to be
able to return '0' without it being an error.

So define INVALID_SECTORS to be an impossible value (currently '1')
and return and test for it consistently.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
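
The sentinel approach from 822e393a05 can be sketched as follows. The function name, the suffix handling and the choice of 512-byte sectors as the return unit are assumptions for illustration; only `INVALID_SECTORS` and its value of 1 come from the commit message.

```c
#include <stdlib.h>

/* Sentinel value taken from the commit message. */
#define INVALID_SECTORS 1ULL

/* Sketch: parse a non-negative decimal size with an optional K, M or G
 * suffix (K assumed when absent) and return it in 512-byte sectors.
 * Every valid result here is even, so 1 cannot be a real size. */
static unsigned long long parse_size_sketch(const char *s)
{
	char *end;
	long long n = strtoll(s, &end, 10);

	if (end == s || n < 0)
		return INVALID_SECTORS;	/* not a number, or negative */

	switch (*end) {
	case 'G':
		n *= 1024;
		/* fall through */
	case 'M':
		n *= 1024;
		/* fall through */
	case 'K':
		end++;
		break;
	default:
		break;			/* no suffix: K assumed */
	}
	if (*end != '\0')
		return INVALID_SECTORS;	/* trailing junk */

	return (unsigned long long)n * 2;	/* KiB to sectors */
}
```

With this shape, `parse_size_sketch("0")` returns a valid 0 while `parse_size_sketch("abc")` returns `INVALID_SECTORS`, so callers can tell the two apart.
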
Maciej Naruszewicz 9eafa1de73 imsm: Allow to specify controller for --detail-platform.
Usually, 'mdadm --detail-platform -e imsm' scans all the controllers
looking for IMSM capabilities. This patch provides the possibility
to specify a controller to scan, enabling custom usage by other
processes - especially with the --export switch.

$ mdadm --detail-platform
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.2
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.2 --export
MD_FIRMWARE_TYPE=imsm
IMSM_VERSION=9.5.0.1037
IMSM_SUPPORTED_RAID_LEVELS=raid0 raid1 raid10 raid5
IMSM_SUPPORTED_CHUNK_SIZES=4k 8k 16k 32k 64k 128k
IMSM_2TB_VOLUMES=yes
IMSM_2TB_DISKS=no
IMSM_MAX_DISKS=7
IMSM_MAX_VOLUMES_PER_ARRAY=2
IMSM_MAX_VOLUMES_PER_CONTROLLER=4

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.0 # This isn't an IMSM-capable controller
mdadm: no active Intel(R) RAID controller found under /sys/devices/pci0000:00/0000:00:1f.0

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:11 +10:00
NeilBrown b3ec716d00 Fix 'enough' function for RAID10.
The 'enough' function is written to work with 'near' arrays only
in that it implicitly assumes that the offset from one 'group' of
devices to the next is the same as the number of copies.
In reality it is the number of 'near' copies.

So change it to make this number explicit.

Reported-by: Jakub Husák <jakub@gooseman.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-03 13:53:46 +10:00
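
The RAID10 check that b3ec716d00 corrects can be sketched with the group offset made explicit. This is a simplified illustration under the assumption that the copies of a block sit on consecutive devices (wrapping around); the function name and parameter list are hypothetical.

```c
/* Sketch: return 1 if a RAID10 array still has at least one working copy
 * of every block. 'avail[i]' is non-zero when device i is present.
 * Each group spans near_copies * far_copies consecutive devices, and
 * successive groups start near_copies devices apart. */
static int raid10_enough(int raid_disks, int near_copies, int far_copies,
			 const char *avail)
{
	int copies = near_copies * far_copies;
	int first = 0;

	do {
		int n, cnt = 0, dev = first;

		for (n = 0; n < copies; n++) {
			if (avail[dev])
				cnt++;
			dev = (dev + 1) % raid_disks;
		}
		if (cnt == 0)
			return 0;	/* every copy in this group is missing */
		first = (first + near_copies) % raid_disks;
	} while (first != 0);

	return 1;
}
```

Stepping by the total number of copies instead of `near_copies` would skip groups in 'far' layouts, which is the assumption the commit message says was wrong.
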
Maciej Naruszewicz f0ec67106c Display size with human_size_brief with a chosen prefix
When using human_size_brief, only IEC prefixes were supported. Now
it's possible to specify which format we want to see - either IEC
(kibi, mebi, gibi) or JEDEC (kilo, mega, giga).

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-02 16:41:13 +10:00
Maciej Naruszewicz 570abc6f38 Synchronize size calculation in human_size and human_size_brief
It would be better if two size-calculating methods had the same
calculating algorithm. The human_size way of calculation seems
more readable, so let's use it for both methods.

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-02 16:40:11 +10:00
Lukasz Dorau 51d4261ca9 fix: adjust parse_size() to the unsigned size variable
An error in parse_size() should be reported by 0, not -1,
because -1 is changed to the max value of unsigned long long
during calculations of size (e.g. at mdadm.c:412).

A negative value of size should be reported as error
(e.g. size equal -1 has been changed to the max value of
unsigned long long so far).

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-20 12:27:17 +10:00
Robert Buchholz 1cc101f3f8 Move xmalloc et al into their own file
This avoids code duplication for utilities that do not link to
util.c and everything that comes with it, such as test_restripe and
raid6check

Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-10 17:23:59 +10:00
NeilBrown fb52f2457a find_free_devnum: avoid auto-using names in /etc/mdadm.conf
high-number names like "/dev/md126" shouldn't be in /etc/mdadm.conf,
but if they are they should be ignored when choosing an
unused number.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-20 10:50:42 +10:00
NeilBrown ca3b669603 Minor cosmetic fixes in various files.
Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-13 08:00:21 +10:00
NeilBrown 7986889004 Create parse_num() function.
Instead of open-coding this several times, just do it once.

This frees up the name 'c' which I'm about to use.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:17 +10:00
NeilBrown 503975b9d5 Remove scattered checks for malloc success.
malloc should never fail, and if it does it is unlikely
that anything else useful can be done.  Best approach is to
abort and let some super-daemon restart.

So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit.  Then use those
removing all the tests for failure.

Also replace all "malloc;memset" sequences with 'xcalloc'.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
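
The wrappers introduced in 503975b9d5 follow a common pattern, sketched below. The message text and exit status are assumptions; only the function names and the print-and-exit behaviour come from the commit message.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of allocation wrappers that do not return NULL for a non-zero
 * request: on failure they print a message and exit, so callers need
 * no error checks of their own. */
static void *xmalloc(size_t len)
{
	void *p = malloc(len);

	if (p == NULL) {
		fprintf(stderr, "memory allocation failure - aborting\n");
		exit(4);	/* exit status chosen arbitrarily for the sketch */
	}
	return p;
}

/* Replaces a "malloc; memset" pair: the memory comes back zeroed. */
static void *xcalloc(size_t num, size_t size)
{
	void *p = calloc(num, size);

	if (p == NULL) {
		fprintf(stderr, "memory allocation failure - aborting\n");
		exit(4);
	}
	return p;
}
```
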
NeilBrown e7b84f9d50 Introduce pr_err for printing error messages.
'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
cont_err() is also available.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown 480f356641 Raise limit of 1024 when scanning for devices.
When we scan for devices using GET_DISK_INFO we currently
limit to 1024.  But some arrays can have more than this.
So raise it to 4096 and make the constant a #define.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-18 09:06:02 +10:00
NeilBrown 15632a96f4 parse_size: distinguish between 0 and error.
It isn't sufficient to use '0' for 'error' as we will
later have fields that can validly be '0'.

So return "-1" on error.

Also fix parsing of --bitmap_check so that '0' is treated
as an error: we don't support 512B anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-04 14:03:13 +10:00
Czarnowska, Anna e03640bda5 simplify calculating array_blocks
no point calling info_to_blocks_per_member when it just returns size*2 for level==1
calc_array_size can be used for all levels

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-02 10:16:04 +10:00
Adam Kwolek 92d49ecfaa FIX: NULL pointer to strdup() can be passed
When result from strchr() is NULL and it is assigned to subarray,
NULL pointer can be passed to strdup() function and coredump file
is generated.

Subarray is checked for NULL pointer, so it is assumed that it can
be NULL at this moment.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-09 12:20:51 +11:00
NeilBrown de5a472ea3 Remove avail_disks arg from 'enough'.
It can easily be calculated from 'avail' and  'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-07 14:04:47 +11:00
Jes Sorensen a0963a86e1 Spawn mdmon with --offroot if mdadm was launched with --offroot
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-30 12:11:29 +11:00
Jes Sorensen aabe020dd2 enough_fd(): remember to free buffer for avail array
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-02 10:48:53 +11:00
Jes Sorensen db7fdfe422 Avoid stack overflow if GPT partition entries on disk are > 128 bytes
Per [1] GPT partition table entries are not guaranteed to be 128
bytes, in which case read() straight into a struct GPT_part_entry
would result in a buffer overflow corrupting the stack.

[1] http://en.wikipedia.org/wiki/GUID_Partition_Table

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-31 10:24:55 +11:00
Lukasz Dorau 65c83a8023 util.c: two typos fixed
Two typos fixed.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-26 08:48:31 +11:00
Thomas Jarosch 9cf014ec40 Fix off-by-one in readlink() buffer size handling
readlink() returns the number of bytes in the buffer.

If we do something like

len = readlink(path, buf, sizeof(buf));
buf[len] = '\0';

we might write one byte past the end of the buffer.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-17 11:15:04 +11:00
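
A safe form of the pattern discussed in 9cf014ec40 reserves one byte for the terminator. The helper below is a minimal sketch with a hypothetical name, not the code from the commit.

```c
#include <unistd.h>

/* Sketch: read a symlink target into 'buf' as a NUL-terminated string.
 * One byte is reserved for the terminator, so buf[len] is always in
 * range. Returns the length, or -1 on error or if 'size' is 0. */
static ssize_t read_link_str(const char *path, char *buf, size_t size)
{
	ssize_t len;

	if (size == 0)
		return -1;
	len = readlink(path, buf, size - 1);
	if (len < 0)
		return -1;
	buf[len] = '\0';
	return len;
}
```

Because readlink() silently truncates, a target longer than `size - 1` bytes comes back shortened; a caller that cares can treat `len == size - 1` as possible truncation.
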
Adam Kwolek 577e8448e9 Move code to get_data_disks() function
Move code to function for code reuse.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-03 09:57:12 +11:00
NeilBrown 01619b4818 Fix component size checks in validate_super0.
A 0.90 array can use at most 4TB of each device - 2TB between
2.6.39 and 3.1 due to a kernel bug.

The test for this in validate_super0 is very wrong.  'size' is sectors
and the number it is compared against is just confusing.

So fix it all up and correct the spelling of terabytes and remove
a second redundant test on 'size'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-08 12:20:36 +10:00
Czarnowska, Anna b990032d39 fix: segfault when killing subarray of non-existent container
Negative value must be returned to indicate error in open_subarray

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-07 14:09:43 +10:00
NeilBrown 1913c3256b start_mdmon: provide more dynamic way to close-all-fds
When forking mdmon we need to close all other fds because we don't
use O_CLOEXEC yet.
Any approach will be fairly arbitrary, but as we can expect fds to be
fairly dense, closing until we find a set number that don't need
closing is possibly safer than only closing the first 100.
So keep closing until we find 20 that are already closed.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-07 13:00:32 +10:00
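
The heuristic in 1913c3256b can be sketched as a loop that stops after a run of failed close() calls. Whether the 20 must be consecutive is not spelled out in the commit message; this sketch assumes they are, and the function name and return value are hypothetical.

```c
#include <unistd.h>

/* Sketch: close descriptors starting at 'first' and stop once 'limit'
 * consecutive close() calls have failed, on the assumption that open
 * descriptors are densely packed. Returns how many were closed. */
static int close_dense_fds(int first, int limit)
{
	int fd, misses = 0, closed = 0;

	for (fd = first; misses < limit; fd++) {
		if (close(fd) < 0) {
			misses++;	/* already closed (or invalid) */
		} else {
			misses = 0;	/* found a live one: restart the count */
			closed++;
		}
	}
	return closed;
}
```

A call such as `close_dense_fds(3, 20)` would leave stdin, stdout and stderr alone and sweep everything above them.
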
NeilBrown 4a96d9ff4f Add some more settings of ignore_hw_compat
There are some more times when we don't care that the hardware doesn't
support the metadata:
 - when removing old metadata
 - when reporting the metadata present before over-writing it.

So set ignore_hw_compat in these cases.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-01 12:21:19 +10:00
NeilBrown f161d047ee util: correctly parse shorter linux version numbers.
The next version of Linux might be 3.0.  If it is, get_linux_version
will fail.
So make it more robust.

Reported-by: Namhyung Kim <namhyung@gmail.com>
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-17 22:49:24 +10:00
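
A version parser that tolerates shorter strings, as f161d047ee calls for, might look like the sketch below. The function name and the `a * 1000000 + b * 1000 + c` encoding are assumptions made for illustration.

```c
#include <stdlib.h>

/* Sketch: turn "A.B.C" or a shorter "A.B" / "A" release string into one
 * comparable integer. Missing components count as 0; anything after the
 * parsed numbers (such as "-rc1") is ignored. */
static long version_code(const char *release)
{
	char *cp;
	unsigned long a, b = 0, c = 0;

	a = strtoul(release, &cp, 10);
	if (*cp == '.')
		b = strtoul(cp + 1, &cp, 10);
	if (*cp == '.')
		c = strtoul(cp + 1, &cp, 10);

	return (long)(a * 1000000 + b * 1000 + c);
}
```

A parser that insists on three dot-separated numbers would reject "3.0"; this form simply treats the missing third component as 0.
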
Luca Berra 73e658d8cc Improvements to GPT reading code.
looking at the gpt code in util.c i found i did not like it at all, a
gpt partition entry is currently 128 bytes, but the spec does not say it
is a fixed value, so the code that reads into a buffer in 512-byte
chunks expecting this to be a multiple of part_size is imho incorrect.
my fix was to read each partition entry directly into a struct
GPT_part_entry, the advantage is that the code is very simple to read,
the disadvantage it is 128 reads of 128 bytes each, which is
sub-optimal, but i believe readahead will mitigate this a lot.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-17 14:41:01 +10:00
NeilBrown 9e6d929127 Check all member devices in enough_fd
The loop over all member devices in enough_fd could easily stop
before it had found all devices.  This would cause --re-add to
fail incorrectly.

So change the loop to be based on the reported number of devices
in the device - with a safe-guard limit of 1024.

Change some other loops to be more careful too.

Reported-by: "Schmidt, Annemarie" <Annemarie.Schmidt@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-23 17:21:35 +10:00
NeilBrown 78c0a3b17f Split some of util.c into a new lib.c
Some of util.c is dependent on lots of other code, some of it
is stand-alone.
Move some of the stand-alone stuff into a new lib.c so it can be used
by smaller utilities.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:44:54 +10:00
NeilBrown 32367cb558 split name/number maps into separate file.
This reduced some interdependencies between files.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:40:49 +10:00
NeilBrown 7187750e8d open_dev_excl: allow device to be read-only.
For many operations we don't need a writable device.  So if
opening O_RDWR fails in open_dev_excl, then try again O_RDONLY.

If we really needed write, a subsequent operation will fail.  But
if we didn't, we succeed when otherwise we wouldn't have.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 14:21:58 +11:00
Labun, Marcin df3346e675 examine: allows to examine a disk metadata on non-metadata compliant systems
Allow for loading metadata from disk attached to non-metadata compliant
system. Affects mdadm --examine and guess_super.

Added ignore_hw_compat in supertype to pass information to load_super
handler. If ignore_hw_compat is set the handler should load metadata
also from disks that do not comply with metadata requirements (i.e. disk is not
attached to native controller, etc).

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:04:46 +11:00
NeilBrown d998b738f5 mdmon: don't wait for O_EXCL when shutting down.
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
and that can block progress of --grow.

Only wait for O_EXCL if we received a signal.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 16:10:22 +11:00
Krzysztof Wojcik 53ed6ac36e Warn the user about too small array size
If a single-disk RAID0 or RAID1 array is created, the user may preserve the
data on the disk. If the given array size covers all partitions on the disk,
all data will be available on the created array. If the array size is too
small (does not cover all partitions), data will not be accessible.
This patch introduces a warning message during array creation if the given
size is too small. The user may interrupt the creation process to avoid data loss.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:21:21 +11:00
NeilBrown 82a7851e5f dev_open should always open read-only.
When opening an array to manipulate it we never need to write to the
array and sometimes it might be read-only so the open for write will
fail.
So always open read-only.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 11:41:21 +11:00
NeilBrown 71204a5029 Various compile fixes.
Make "make everything" succeed.
This fixed some real bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 15:48:03 +11:00
NeilBrown e5508b361d Allow domain_test to report that no domains were found.
Sometimes we will need to know the difference between no domains found
and domains didn't match.
So allow domain_test to return different values and fix up all callers
to maintain current behaviour.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 14:44:02 +11:00
NeilBrown e5e5d7cea3 Incr: don't exclude 'active' devices from auto inclusion in a container.
For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 13:07:36 +11:00
Czarnowska, Anna bfd76b9309 Monitor: do not move partitions to external container
Arrays on partitions are not supported for external metadata
so do not take such spare from native array.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 10:40:56 +11:00
Dan Williams aa4cab513d fix extended partition detection
# mdadm --detail --export /dev/md127p1

Before:
MD_LEVEL=raid5
MD_DEVICES=4
MD_METADATA=0.90

After:
MD_LEVEL=raid5
MD_DEVICES=4
MD_CONTAINER=/dev/md0
MD_MEMBER=0
MD_UUID=55746a20:925d24a7:4f9bd7e2:9c9a411f

We parse the symlink target with a format:

../../block/mdXXX/mdXXXpYY

...and need the second '/' from the end of the string to detect an
'md' device.

Reported-by: Krzysztof Wasilewski <krzysztof.wasilewski@intel.com>
Cc: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-27 12:56:51 +10:00
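
The symlink parsing described in aa4cab513d can be sketched as below: locate the last '/', walk back to the one before it, and take the component in between. The function name and return convention are hypothetical, and the real code may differ in detail.

```c
#include <string.h>

/* Sketch: given a link target such as "../../block/md127/md127p1",
 * copy the second-to-last path component ("md127") into 'out'.
 * Returns 0 on success, -1 if that component is missing, does not
 * start with "md", or does not fit. */
static int parent_md_name(const char *target, char *out, size_t outlen)
{
	const char *last = strrchr(target, '/');
	const char *prev;
	size_t len;

	if (last == NULL || last == target)
		return -1;
	/* Walk back from the last '/' to the one before it. */
	prev = last - 1;
	while (prev > target && *prev != '/')
		prev--;
	if (*prev != '/')
		return -1;
	len = (size_t)(last - prev - 1);
	if (len < 2 || len >= outlen || strncmp(prev + 1, "md", 2) != 0)
		return -1;
	memcpy(out, prev + 1, len);
	out[len] = '\0';
	return 0;
}
```
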
Anna Czarnowska 326727d9c9 Use one function choosing spares from container
container_chose_spares in Monitor.c and
get_spares_for_grow in super-intel.c
do the same thing: search for spares in a container.

Another version will also be needed for Incremental
so a more general solution is presented here and
applied in two previous contexts.

Normally domlist==NULL would lead to an empty list but
this is typically checked earlier so here it is interpreted
as "do not test domains".

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-05 14:34:14 +11:00
Anna Czarnowska 22e263f64a imsm: set imsm spare uuid to 0
uuid_match_any is replaced by uuid_zero for imsm spares.

Function fixup_container_spare_uuid not needed as it gives
unwanted uuid to spares.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-26 21:59:31 +11:00
NeilBrown cb23f1f4c3 Allow a metadata update to have a linked list of allocated spaces.
Sometimes one metadata update will require allocating several
larger data structures.  As 'monitor' cannot allocate, 'manager'
must, so it must be able to attach a list of allocations to the
update, and importantly it must be able to easily free them.

So add a 'space_list' element to metadata updates where each
element on the list starts with a pointer to the next.
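
As a rough illustration of that layout (a sketch only; the helper names space_list_push and space_list_free are hypothetical and not taken from mdadm), each block reserves its first pointer-sized bytes for the link, so the whole list can be released without knowing what each block holds:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate `size` bytes and push the block onto the list at *head.
 * The first sizeof(void *) bytes of every block store the next pointer.
 * Returns 0 on success, -1 if the allocation fails. */
int space_list_push(void **head, size_t size)
{
	void **blk;

	assert(head != NULL);
	if (size < sizeof(void *))
		size = sizeof(void *);	/* always leave room for the link */
	blk = malloc(size);
	if (blk == NULL)
		return -1;
	*blk = *head;
	*head = blk;
	return 0;
}

/* Free every block on the list and return how many were freed. */
int space_list_free(void **head)
{
	int n = 0;

	assert(head != NULL);
	while (*head != NULL) {
		void **blk = *head;

		*head = *blk;	/* follow the link before freeing */
		free(blk);
		n++;
	}
	return n;
}
```

In this sketch one side would call space_list_push for each structure it expects to need, and whoever finishes with the update calls space_list_free once.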

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 12:10:01 +11:00
NeilBrown 11877f4dc2 Split fmt_devnum out from devnum2devname
Sometimes we want to convert a devnum to a devname without allocating
memory.  So provide a function to do the formatting without allocation.
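
The split can be pictured roughly as follows. This is a sketch under assumptions, not the code from this commit: the names format_devnum and devnum_to_name are hypothetical, and the naming rule used here ("md%d" for non-negative numbers, "md_d%d" for negative ones) is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEVNAME_MAX 32

/* Format a device number into a caller-supplied buffer; no allocation.
 * Assumed convention: non-negative numbers map to "mdN", negative
 * numbers map to "md_dN" with N = -1 - devnum. */
int format_devnum(char *buf, size_t len, int devnum)
{
	assert(buf != NULL && len > 0);
	if (devnum >= 0)
		return snprintf(buf, len, "md%d", devnum);
	return snprintf(buf, len, "md_d%d", -1 - devnum);
}

/* Allocating wrapper built on the formatter; the caller frees the result. */
char *devnum_to_name(int devnum)
{
	char tmp[DEVNAME_MAX];
	char *name;

	format_devnum(tmp, sizeof(tmp), devnum);
	name = malloc(strlen(tmp) + 1);
	if (name != NULL)
		strcpy(name, tmp);
	return name;
}
```

Callers that must not allocate use the formatter with a stack buffer; everyone else can keep using the allocating wrapper.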

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 09:07:51 +11:00
Adam Kwolek 6d11ec6fc2 Treat feature as experimental
Because IMSM Windows compatibility has not been tested yet, the
feature has to be treated as experimental until compatibility
verification has been performed.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 12:11:09 +11:00
Anna Czarnowska 0f0749ad93 Monitor: devid should be dev_t
For consistency with makedev().
int is not sufficient.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:56:28 +11:00
NeilBrown de6ae75015 Incremental - avoid including wayward devices.
If a device - typically in a mirrored set - is assembled
independently of the other devices, and an attempt is then made to
bring it back into the set, it could contain inconsistent data.  It
should not be included.

So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device.  If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.
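
The check can be sketched as below. This is an illustration under assumptions rather than the logic in Incremental.c: struct dev_view, its fields, and the function names are hypothetical, and "most recent" is taken here to mean the highest event count.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical per-device summary of what its metadata records. */
struct dev_view {
	unsigned long long events;		/* event counter */
	unsigned char sees_active[MAX_DEVS];	/* non-zero if device i is believed active */
};

/* Index of the device with the highest event count (first one on a tie). */
size_t most_recent(const struct dev_view *devs, size_t n)
{
	size_t best = 0, i;

	for (i = 1; i < n; i++)
		if (devs[i].events > devs[best].events)
			best = i;
	return best;
}

/* Return the index of the first device that does not believe the most
 * recent device is active, or -1 if every device agrees. */
int find_wayward(const struct dev_view *devs, size_t n)
{
	size_t best, i;

	assert(devs != NULL && n <= MAX_DEVS);
	if (n == 0)
		return -1;
	best = most_recent(devs, n);
	for (i = 0; i < n; i++)
		if (i != best && !devs[i].sees_active[best])
			return (int)i;
	return -1;
}
```

In a two-disk mirror where each half was run on its own, each device records the other as inactive, so the half with the lower event count is the one reported.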

This patch fixes --incremental; --assemble was fixed in an earlier
patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:40:15 +11:00
Dan Williams 5f7e44b29f Initialize st->devnum and st->container_dev in super_by_fd
Precludes needing to deduce this information later, like in Detail.c and
soon in Grow.c.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:31:18 +11:00
Dan Williams bc77ed535d block monitor: freeze spare assignment for external arrays
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares.  In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.

When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level.  Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.

Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication.  If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version.  Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation.  This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:00:54 +11:00
Dan Williams e5408a3202 Provide a mdstat_ent to subarray helper
...before introducing another open coded instance of this conversion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 14:44:23 +11:00
Anna Czarnowska 52d5d101a9 Util: get device size from id
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
NeilBrown 3a3716107b Add must_be_container helper.
This checks a block device to see if it could be a container, and
in particular cannot be a member device.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
NeilBrown db20d4135e Switch open_subarray to use the new load_container
This removes another user of loaded_container

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown 1f49fb3ae5 Use new load_container in Examine
This makes explicit the two different ways to use Examine
And removes a user of container_loaded.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown 69b2fcc5bb Remove subarray field in supertype.
This is now only ever set, never used.
So remove it.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown d1d599ea0d Create: use container_dev rather than subarray for some tests.
It makes more sense to test for container_dev than for subarray
for several places in Create where it then uses container_dev.

This allows us to subsequently remove subarray.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown a951a4f78f Pass subarray arg explicitly to ->update_subarray.
This is better than hiding it in the supertype structure
where we are never quite sure who needs it.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown 4725bc31fb super_by_fd: return subarray info explicitly.
Rather than hiding this in the 'st', return it explicitly.

In the one case we still need it, copy it into st where needed.
This will disappear in a future patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 19:35:25 +11:00
NeilBrown feab51f8f7 open_subarray: pass subarray name as explicit arg.
Rather than hiding this arg in the 'st' structure, pass it explicitly.

This is a first step to getting rid of 'subarray' from 'supertype'.

The strcpy in open_subarray should have better error checking, but it
will disappear soon so there is little point.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 19:35:25 +11:00