Commit Graph

208 Commits

Author SHA1 Message Date
Jes Sorensen 9cd39f0155 util: Introduce md_get_array_info()
Remove most direct ioctl calls for GET_ARRAY_INFO, except for one,
which will be addressed in the next patch.

This is the start of the effort to clean up the use of ioctl calls and
introduce a more structured API, which will use sysfs and fall back to
ioctl for backup.

Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 14:35:41 -04:00
Artur Paszkiewicz e97a7cd011 super1: PPL support
Enable creating and assembling raid5 arrays with PPL for 1.x metadata.

When creating, reserve enough space for PPL and store its size and
location in the superblock and set MD_FEATURE_PPL bit. Write an initial
empty header in the PPL area on each device. PPL is stored in the
metadata region reserved for internal write-intent bitmap, so don't
allow using bitmap and PPL together.

While at it, fix two endianness issues in write_empty_r5l_meta_block()
and write_init_super1().

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:33:52 -04:00
Artur Paszkiewicz 5308f11727 Generic support for --consistency-policy and PPL
Add a new parameter to mdadm: --consistency-policy=. It determines how
the array maintains consistency in case of unexpected shutdown. This
maps to the md sysfs attribute 'consistency_policy'. It can be used to
create a raid5 array using PPL. Add the necessary plumbing to pass this
option to metadata handlers. The write journal and bitmap
functionalities are treated as different policies, which are implicitly
selected when using --write-journal or --bitmap options.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:32:15 -04:00
NeilBrown e22fe3ae15 Introduce enum flag_mode for setting and clearing flags.
We currently use '1' to indicate that a flag (writemostly or failfast)
needs to be set, and '2' to indicate that it needs to be cleared.

Using magic number like this is not a best-practice.

So replaced them with values from a enum.

No functional change.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-29 17:12:13 -05:00
NeilBrown 71574efb07 Add failfast support.
Allow per-device "failfast" flag to be set when creating an
array or adding devices to an array.

When re-adding a device which had the failfast flag, it can be removed
using --nofailfast.

failfast status is printed in --detail and --examine output.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-28 08:50:36 -05:00
Jes Sorensen 2ec2b7e9d5 mdadm: Make add_internal_bitmap() return 0 on success
add_internal_bitmap() returned 1 on success and 0 on error which is
inconsistent. This changes it to return 0 on success and use more
reasonable error codes on error.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-12 15:19:16 -04:00
Guoqing Jiang 82d9485e06 Create: check the node nums when create clustered raid
It doesn't make sense to create a clustered raid
with only 1 node.

Reported-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-05-09 14:59:01 -04:00
NeilBrown dfd7822ca6 Create: minor fix when adding a journal device
The check of "is there a filesystem here" is still appropriate for a
journal device.

Also set active_disks correctly - even though it is ignored.

Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14 14:13:17 +11:00
NeilBrown f170a5a9a0 Create: fix regression in setting raid_disk
Recent commit caused 'missing' declarations to not be handled correctly.

Fixes: cc1799c3dd ("Enable create array with write journal (--write-journal DEVICE).")
Signed-off-by: NeilBrown <neilb@suse.com>
2016-01-14 13:22:17 +11:00
Song Liu cc1799c3dd Enable create array with write journal (--write-journal DEVICE).
Specify the write journal device with --write-journal DEVICE

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Only one journal device is allowed. If multiple --write-journal
are given, mdadm will use the first and ignore others

./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1 --write-journal /dev/sdx
mdadm: Please specify only one journal device for the array.
mdadm: Ignoring --write-journal /dev/sdx...
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-19 13:06:12 +11:00
Goldwyn Rodrigues 6d9c7c2551 Increment version for clustered bitmaps
Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels
to assemble a clustered device.

In order to maximize compatibility, the major version is set to
BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.

Also, added MD_FEATURE_CLUSTERED in order to return error
for older kernels which would assemble MD in case bitmap is
corrupted.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-28 11:47:04 +10:00
Guoqing Jiang 7716570e6d Set home-cluster while creating an array
The home-cluster is stored in the bitmap super block of the
array. The device can be assembled on a cluster with the
cluster name same as the one recorded in the bitmap.

If home-cluster is not specified, this is auto-detected using
dlopen corosync cmap library.

neilb: allow code to compile when corosync-devel is not installed.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:06:30 +10:00
Guoqing Jiang 529e2aa573 Add nodes option while creating md
Specifies the maximum number of nodes in the cluster that may use
this device simultaneously. This is equivalent to the number of
bitmaps created in the internal superblock (patches to follow).

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 09:04:16 +10:00
Guoqing Jiang 95a05b37e8 Create n bitmaps for clustered mode
For a clustered MD, create bitmaps equal to number of nodes so
each node has an independent bitmap.

Only the first bitmap is has the bits set so that the first node
that assembles the device also performs the sync.

The bitmaps are aligned to 4k boundaries.

On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 07:54:03 +10:00
NeilBrown 7a862a020f Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly one line,
them maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"


Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:46:53 +11:00
NeilBrown 476066a3d5 DDF: add support of --data-offset when creating array.
Infrastructure is there, so use it.

This requires making sure that ->data_offset is correctly set, even
for containers.

Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-21 11:54:48 +10:00
Artur Paszkiewicz 39917e56cc Create: don't default to bitmap=internal when it is not supported
For large arrays (component size > 100GB) if write-intent bitmap is not
enabled, then it is set by default to "internal", even if the metadata
format does support internal bitmaps, which causes Create to fail.

This patch adds checking if add_internal_bitmap is set in the
superswitch before setting bitmap_file to "internal".

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-01 10:14:59 +10:00
Artur Paszkiewicz 19ad4b2cb2 Fix race between --create and --incremental
This modifies locking in Create to eliminate a situation where
--incremental can assemble a device between write_init_super() and
add_disk(), which causes Create to fail.

It sporadically occurs e.g. when metadata is written on a device,
causing an udev change event which triggers mdadm --incremental.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-01 10:14:53 +10:00
NeilBrown 6f02172d2e Release mdadm-3.3
(and  various cosmetic fixes)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-09-03 14:47:47 +10:00
NeilBrown a7dec3fd92 Make sure NOFILE resource limit is big enough.
Some people want to create truely enormous arrays.
As we sometimes need to hold one file descriptor for each
device, this can hit  the NOFILE limit.

So raise the limit if it ever looks like it might be a problem.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-30 14:31:09 +10:00
NeilBrown a21e848a55 Create: over-ride "start_ro" setting when creating an array.
If module parameter start_ro is set, arrays start readonly.
This is OK when assembling, but is very surprising when creating
an array as the resync won't start.
So over-ride the setting (unless --read-only was given) make
arrays RW when created.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-15 11:40:27 +10:00
NeilBrown eca944fa9c create_mddev: add support for /dev/md_XXX non-numeric names.
With the 'devnm' infrastructure fixed, it is quite easy to support
names like "md_home" for md arrays.
The currently defaults to "off" and can be enabled in mdadm.conf with
  CREATE names=yes
This is incase other tools get confused by the new names.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-15 11:03:25 +10:00
NeilBrown 8baab049ce Create: fix bug with --data-offset.
Test for VARIABLE_OFFSET was wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-13 17:26:37 +10:00
NeilBrown 748952f73e Create: default to bitmap=internal for large arrays.
Here, "large" means components are 100G or more.  It is
usually beneficial to have write-intent bitmaps on such arrays.
They can be suppressed with --bitmap=none

Signed-off-by: NeilBrown <neilb@suse.de>
2013-03-05 10:36:21 +11:00
NeilBrown 4dd2df0966 Discard devnum in favour of devnm
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.

So get rid of devnum (a number) and use devnm (a 32char string).
eg.
  md0
  md_d2
  md_home

Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-21 17:05:23 +11:00
Lukasz Dorau 066e92f017 Create.c: check if freesize is equal 0
"freesize" can be equal 0, particularly after rounding to the chunk's size.
Creating should be aborted in such case.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-11-20 12:12:03 +11:00
NeilBrown 5d5002289c Replace a lot of leading spaces with tabs.
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-10 18:33:26 +11:00
NeilBrown 72ca9bcff3 Allow data-offset to be specified per-device for create
mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ...

The size is in K unless a suffix: K M G is given.
The suffix 's' means sectors.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown 40c9a66a5c Add --data-offset flag for Create and Grow
This can be used to over-ride the automatic assignment of
data offset.
For --create, it is useful to re-create old arrays where different
   defaults applied.
For --grow it may be able to force a reshape in the reverse direction.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:21 +10:00
NeilBrown 83cd1e97cb Add data_offset arg to ->init_super and use it in super1.c
So if ->data_offset is already set, use that rather than
computing one.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
NeilBrown af4348ddd1 Add data_offset arg to ->validate_geometry.
This is needed to return correct available size.  It isn't
really used yet.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:20 +10:00
Maciej Naruszewicz 9eafa1de73 imsm: Allow to specify controller for --detail-platform.
Usually, 'mdadm --detail-platform -e imsm' scans all the controllers
looking for IMSM capabilities. This patch provides the possibility
to specify a controller to scan, enabling custom usage by other
processes - especially with the --export switch.

$ mdadm --detail-platform
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.2
       Platform : Intel(R) Matrix Storage Manager
        Version : 9.5.0.1037
    RAID Levels : raid0 raid1 raid10 raid5
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : not supported
      Max Disks : 7
    Max Volumes : 2 per array, 4 per controller
 I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.2 --export
MD_FIRMWARE_TYPE=imsm
IMSM_VERSION=9.5.0.1037
IMSM_SUPPORTED_RAID_LEVELS=raid0 raid1 raid10 raid5
IMSM_SUPPORTED_CHUNK_SIZES=4k 8k 16k 32k 64k 128k
IMSM_2TB_VOLUMES=yes
IMSM_2TB_DISKS=no
IMSM_MAX_DISKS=7
IMSM_MAX_VOLUMES_PER_ARRAY=2
IMSM_MAX_VOLUMES_PER_CONTROLLER=4

$ mdadm --detail-platform /sys/devices/pci0000:00/0000:00:1f.0 # This isn't an IMSM-capable controller
mdadm: no active Intel(R) RAID controller found under /sys/devices/pci0000:00/0000:00:1f.0

Signed-off-by: Maciej Naruszewicz <maciej.naruszewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-04 16:34:11 +10:00
NeilBrown ca3b669603 Minor cosmetic fixes in various files.
Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-13 08:00:21 +10:00
NeilBrown 99cc42f4a9 Use new 'struct shape' to pass args to Create
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:22:05 +10:00
NeilBrown 3c463be459 Create: Remove unnecessary cast from 'size'.
'size' is already unsigned long long, so no need to cast it.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:21:27 +10:00
NeilBrown d04f65f48c Change the values for "max size" from -1 to 1.
Both are impossible, and '1' allows size to be unsigned,
which is neater.
Also #define MAX_SIZE to be '1' to make it all more explicit.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:20:32 +10:00
NeilBrown 171dccc813 Change Create to take a struct context
Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:19:24 +10:00
NeilBrown 72d566f68d Create: support --readonly flag.
Allow array to be created read-only

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:17 +10:00
NeilBrown 503975b9d5 Remove scattered checks for malloc success.
malloc should never fail, and if it does it is unlikely
that anything else useful can be done.  Best approach is to
abort and let some super-daemon restart.

So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit.  Then use those
removing all the tests for failure.

Also replace all "malloc;memset" sequences with 'xcalloc'.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown e7b84f9d50 Introduce pr_err for printing error messages.
'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
cont_err() is also available.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown ae6c05ad83 Create: round off size for RAID1 arrays.
RAID1 arrays don't have a chunk size, but if you ever convert
one to RAID5 you will need at least a small one >= 4K.
So round of size to a multiple of 64K.

This only affect Create, not "--grow --size=max".  The latter
is too hard and with smaller returns.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-22 16:53:51 +11:00
NeilBrown ad6db3cb95 Create: reduce the verbosity of 'default_layout'.
We only need this the first couple of times.  Reporting it for
every device is not necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-08 15:40:54 +11:00
Jes Sorensen 4011421332 Print error message if failing to write super for 1.x metadata
In addition remove attempt to print an error message if
write_init_super() fails, as this is handled in the various
write_init_super() functions. This avoids a segfault on error.

Reported by Jim Meyering in
https://bugzilla.redhat.com/show_bug.cgi?id=795461

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-23 08:55:19 +11:00
Adam Kwolek 3e9df86add FIX: Verify if array name doesn't exist already
When e.g. array name (an) is correct and it is the same as container name (cn),
file element creation /dev/md/an will replace /dev/md/cn.
This can cause that user cannot access container using /dev/md/cn.

Verify during array creation if chosen name is not already existing
one.

[Changed to use map_by_name() rather than stat() to determine prior
 existence - NeilBrown]


Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 07:13:55 +11:00
Jes Sorensen e69104392b Create() check malloc() return value
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-22 11:33:53 +11:00
Jes Sorensen e06af9dd62 Create() don't leave the lock hanging on error
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-22 11:32:52 +11:00
NeilBrown ecbd9e8160 Create: improve messages from validate_geometry.
When validate_geometry finds that we haven't committed to
a metadata yet and that the subdev is a member of 'our'
container, it needs to report any errors it finds as Create()
cannot report them effectively.

So make a slight change to the semantics of the 'verbose' flag
and allow validate_geometry to report if it printed any error
messages.

Signed-off-by: NeilBrown  <neilb@suse.de>
2011-09-21 14:39:01 +10:00
NeilBrown ca0748fa49 imsm: getinfo_super_imsm_volume() doesn't fill all disk information
getinfo_super_imsm_volume doesn't correctly set info.disk fields
because it doesn't know which disk to set them from.
It should be the last disk passed to add_to_super.

So add a field 'current_disk' to record this disk in add_to_super, and
use it in getinfo_super.

This allows us to remove a hack in Create.c

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-14 15:42:10 +10:00
Adam Kwolek 80e4abc99c FIX: Cannot create volume
getinfo_super() can clear entire 'inf' structure before filling with new
information. Disk number required later is lost.

Restore disk number information after getinfo_super() call.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-14 12:42:06 +10:00
NeilBrown 95eeceeb32 getinfo_super now clears the 'info' structure before filling it in.
Some code currently clears 'info' before calling getinfo_super,
some code doesn't.

To be consistent, change it so no caller ever clears 'info',
but ever getinfo_super function must clear it.

Note that ->raid_disk may be meaningful if that 'map' is passed
non-NULL.  In that case it is copied out before the structure
is zeroed.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-08 15:54:13 +10:00