Commit Graph

112 Commits

Author SHA1 Message Date
Artur Paszkiewicz 2432ce9b32 imsm: PPL support
Enable creating and assembling IMSM raid5 arrays with PPL. Update the
IMSM metadata format to include new fields used for PPL.

Add structures for PPL metadata. They are used also by super1 and shared
with the kernel, so put them in md_p.h.

Write the initial empty PPL header when creating an array. When
assembling an array with PPL, validate the PPL header and, if it is
not correct, allow overwriting it if --force was provided.

Write the PPL location and size for a device to the new rdev sysfs
attributes 'ppl_sector' and 'ppl_size'. Enable PPL in the kernel by
writing to 'consistency_policy' before the array is activated.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:32:49 -04:00
Artur Paszkiewicz 5308f11727 Generic support for --consistency-policy and PPL
Add a new parameter to mdadm: --consistency-policy=. It determines how
the array maintains consistency in case of unexpected shutdown. This
maps to the md sysfs attribute 'consistency_policy'. It can be used to
create a raid5 array using PPL. Add the necessary plumbing to pass this
option to metadata handlers. The write journal and bitmap
functionalities are treated as different policies, which are implicitly
selected when using --write-journal or --bitmap options.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@gmail.com>
2017-03-29 11:32:15 -04:00
NeilBrown c07566f14c Make get_component_size() work with named array.
get_component_size() still assumes that all arrays are
 /sys/block/md%d or /sys/block/md_d%d
and so doesn't work with e.g. /sys/block/md_foo.

This causes "mdadm --detail" to report
   Used Dev Size : unknown
and causes problems when adding spares and in other circumstances.

So change it to use stat2devnm() which does the right thing with all
types of array names.

Reported-and-tested-by: Robert LeBlanc <robert@leblancnet.us>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-12-22 12:19:10 -05:00
Tomasz Majchrzak bb758ccad0 mdadm: bad block support for external metadata - initialization
If metadata handler provides support for bad blocks, tell md by writing
'external_bbl' to rdev state file (both on create and assemble),
followed by a list of known bad blocks written via sysfs 'bad_blocks'
file.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-28 17:44:45 -05:00
Tomasz Majchrzak bbb52f2b1d Increase buffer for sysfs path
'unacknowledged_bad_blocks' is a long name for a sysfs property and it
makes the sysfs path over 50 characters long. Increase the buffer to double
the length of the longest path available in sysfs at the moment.

Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-11-17 09:43:44 -05:00
Jes Sorensen 36138e4e4b sysfs: Avoid if and return on the same line
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-08-11 15:52:48 -04:00
Jes Sorensen 193b6c0b26 load_sys(): Add a buffer size argument
This adds a buffer size argument to load_sys(), rather than relying on
a hard coded buffer size. The old behavior was safe because we knew
the kernel would never return strings overrunning the buffers, however
it was ugly, and would cause code checking tools to spit out warnings.

This caused a Coverity warning over the read into
sra->sysfs_array_state which is only 20 bytes.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
2016-03-09 11:35:34 -05:00
Song Liu 5aa644c68a add sysfs_array_state to struct mdinfo
Add sysfs_array_state to struct mdinfo, and add GET_ARRAY_STATE to
options of sysfs_read.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-12-16 12:43:45 +11:00
Guoqing Jiang 9465f17058 re-add: make re-add try to write sysfs node first
If the sysfs node exists, we should try to write "re-add" to it.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-10-08 11:08:40 +11:00
NeilBrown 5418499ae4 sysfs: reject reads that use the whole buffer.
If a read fills the whole buffer, then we possibly
missed something off the end, and we definitely shouldn't
put a '\0' beyond the end, so just return an error.
This should never happen anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-06 13:21:33 +10:00
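The rule described in this commit, combined with the explicit buffer size argument added later to load_sys(), can be sketched roughly as follows. This is a minimal illustration, not mdadm's actual code: the function name read_small_file() and its return convention are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not mdadm's load_sys()): read a small text file
 * into buf, which has room for buf_len bytes, and NUL-terminate it.
 * Returns 0 on success, -1 on error or possible truncation. */
int read_small_file(const char *path, char *buf, size_t buf_len)
{
	int fd;
	ssize_t n;

	assert(buf != NULL && buf_len > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, buf_len);
	close(fd);

	/* A read that fills the whole buffer leaves no room for '\0' and
	 * may have missed trailing data, so report it as an error. */
	if (n < 0 || (size_t)n >= buf_len)
		return -1;

	buf[n] = '\0';
	/* Attribute files usually end with a newline; strip one if present. */
	if (n > 0 && buf[n - 1] == '\n')
		buf[n - 1] = '\0';
	return 0;
}
```

Passing the capacity explicitly lets the caller use a small buffer (such as a 20-byte state field) without the helper silently assuming a larger one.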
NeilBrown 7a862a020f Don't break long strings onto multiple lines.
It is best to keep strings all together so that they
are easier to search for in the source code.
If a string is so long that it looks ugly on one line,
then maybe it should be broken into multiple lines
for display too.

Only strings which contain a newline can be broken
into multiple lines:

 "It is OK to\n"
 "break this string\n"


Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:46:53 +11:00
NeilBrown 1ade5cc15a Consistently print program Name and __func__ in debug messages.
Make dprintf() print program name and __func__, so that
this messaging is consistent.

Also remove all __func__ messages from pr_err(). We shouldn't
leak that internal data in error messages.
If we really want the function name there, a new pr_XXX might
be wanted.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 13:21:17 +11:00
Pawel Baldysiak d56dd607ba Change way of printing name of a process
Sometimes mdadm prints messages with the wrong name "mdmon",
and vice versa.
This patch solves this problem by changing the method of determining
the process name.
Now "Name" will be set in a const at the start of a program;
previously it was hardcoded as a #define.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-12 12:11:01 +11:00
NeilBrown bc17158dcc Introduce devid2kname - slightly different to devid2devnm.
The purpose of devid2devnm is to return a kernel name of an
md device, whether that device is a whole device or a partition,
we want the whole device.  md4, never md4p2.

In one place I was using devid2devnm where I really wanted the
partition if there was one ... and wasn't really interested in it
being an md device.
So introduce a new 'devid2kname' for that case.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-08-01 14:32:04 +10:00
NeilBrown 4bffc964b9 sysfs: fix bugs in new sysfs_wait function.
- 'tv' isn't initialised properly.
- 100?  I'm sure I fixed that already! Seems not.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-02 16:08:34 +10:00
NeilBrown 2eba849621 Manage: check alignment when stopping an array undergoing reshape.
To be able to revert-reshape of raid4/5/6 which is changing
the number of devices, the reshape must have been stopped on a multiple
of the old and new stripe sizes.

The kernel only enforces the new stripe size multiple.

So we enforce the old-stripe-size multiple by careful use of
"sync_max" and monitoring "reshape_position".

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-01 15:10:05 +10:00
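The alignment requirement in this commit amounts to stopping at a position that is a multiple of both stripe sizes, i.e. a multiple of their least common multiple. The arithmetic can be sketched as below. The function names are hypothetical, rounding down is an assumption (the right direction depends on which way the reshape is moving), and mdadm itself enforces the constraint through "sync_max" and "reshape_position" rather than through a helper like this.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
unsigned long long stripe_gcd(unsigned long long a, unsigned long long b)
{
	while (b != 0) {
		unsigned long long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Least common multiple of the old and new stripe sizes (in sectors).
 * Overflow is not handled in this sketch. */
unsigned long long stripe_lcm(unsigned long long old_stripe,
			      unsigned long long new_stripe)
{
	assert(old_stripe > 0 && new_stripe > 0);
	return old_stripe / stripe_gcd(old_stripe, new_stripe) * new_stripe;
}

/* Hypothetical helper: round pos down to the nearest position that is
 * a multiple of both the old and the new stripe size. */
unsigned long long align_down_to_both(unsigned long long pos,
				      unsigned long long old_stripe,
				      unsigned long long new_stripe)
{
	unsigned long long unit = stripe_lcm(old_stripe, new_stripe);

	return pos - pos % unit;
}
```

For example, with an old stripe of 384 sectors and a new stripe of 512 sectors the common unit is 1536 sectors, so position 5000 rounds down to 4608.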
NeilBrown efc67e8e9f New function: sysfs_wait
We have several places that wait for activity on a sysfs
file.  Combine most of these into a single 'sysfs_wait' function.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-07-01 13:28:13 +10:00
NeilBrown dea3786ae2 Grow: fix bug in raid0 -> raid5 conversion.
The moment we change a RAID0 to a RAID5 it will try to recover.  This
will abort quite quickly as there are no spare devices, but it could
confuse the attempt to freeze the array.

So allow 'freeze' to work even on a recovering array.

Signed-off-by: NeilBrown  <neilb@suse.de>
2013-06-25 15:52:58 +10:00
NeilBrown 1011e8344a Remove lots of unnecessary white space.
Now that I am using white-space mode in Emacs I can see all of this,
and I don't like it :-)

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 12:31:45 +10:00
NeilBrown 64e103fe19 sysfs_read: return devices in same order as in filesystem.
When we read devices from sysfs (../md/dev-*), store them in the same
order that they appear.  That makes more sense when exposed to a
human (as the next patch will).

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-19 10:33:47 +10:00
NeilBrown e6fc80a895 Detail: report on inactive arrays.
Arrays can be inactive when e.g. -I is in the process of assembling them.
This change allows --detail to report limited information about
these arrays.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-05-13 16:57:10 +10:00
NeilBrown 4dd2df0966 Discard devnum in favour of devnm
We widely use a "devnum" which is 0 or +ve for md%d devices
and -ve for md_d%d devices.
But I want to be able to use md_%s device names.

So get rid of devnum (a number) and use devnm (a 32char string).
eg.
  md0
  md_d2
  md_home

Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-21 17:05:23 +11:00
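The numeric scheme being retired here can be illustrated with a small formatter that turns a devnum into the kind of name string that replaces it. This is only a sketch: the function name is hypothetical, and the mapping of negative numbers to md_d indices (-1 maps to md_d0, -3 to md_d2) is an assumption, since the commit message only says that negative values denote md_d%d devices.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: write the kernel-style name for a legacy
 * devnum into devnm. Non-negative numbers become "md%d"; negative
 * numbers become "md_d%d" (index = -1 - devnum is an assumption).
 * Returns the snprintf() result. */
int devnum_to_devnm(char *devnm, size_t len, int devnum)
{
	assert(devnm != NULL && len > 0);

	if (devnum >= 0)
		return snprintf(devnm, len, "md%d", devnum);
	return snprintf(devnm, len, "md_d%d", -1 - devnum);
}
```

A named array such as md_home has no natural number in this scheme, which is the limitation that motivates carrying the string around instead.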
NeilBrown 6a67848ab6 Grow: fix reshape from RAID5 to RAID1.
Commit 5da9ab9874
       Grow_reshape re-factor
in mdadm-3.2 broke conversion from RAID5 to RAID1 - and we
never noticed.

This fixes it.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-11-20 12:06:34 +11:00
NeilBrown fe384ca0b9 Grow: set new_data_offset if appropriate
2012-10-04 16:34:21 +10:00
NeilBrown aab15415ed Manage: fix checks for removal from a container.
We must only remove from a container if the device isn't a
member of any member array.
To check we look at the 'holders' directory in sysfs.

We currently skip that check if ->devname is "detached", however
that can never be true since the change that introduced
add_detached().

Also sysfs_unique_holder returns status in 'errno' which isn't
entirely safe as e.g. closedir() is probably allowed to clear it.

So make sysfs_unique_holder return an unambiguous value, and use
it to decide what to report.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-09-24 12:26:03 +10:00
NeilBrown 503975b9d5 Remove scattered checks for malloc success.
malloc should never fail, and if it does it is unlikely
that anything else useful can be done.  Best approach is to
abort and let some super-daemon restart.

So define xmalloc, xcalloc, xrealloc, xstrdup which don't
fail but just print a message and exit.  Then use those
removing all the tests for failure.

Also replace all "malloc;memset" sequences with 'xcalloc'.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
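The wrapper approach described in this commit can be sketched as follows. This is an illustrative version, not a copy of mdadm's implementation: the message text, the exit code, and the die_no_memory() helper are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical failure path: report and exit instead of returning NULL.
 * The message and exit status are illustrative choices. */
static void die_no_memory(void)
{
	fprintf(stderr, "memory allocation failure - aborting\n");
	exit(4);
}

/* malloc() that never returns NULL. Requests at least one byte because
 * malloc(0) may legitimately return NULL. */
void *xmalloc(size_t len)
{
	void *p = malloc(len ? len : 1);

	if (p == NULL)
		die_no_memory();
	return p;
}

/* calloc() that never returns NULL; replaces "malloc; memset" pairs. */
void *xcalloc(size_t num, size_t size)
{
	void *p = calloc(num ? num : 1, size ? size : 1);

	if (p == NULL)
		die_no_memory();
	return p;
}

/* strdup() that never returns NULL. */
char *xstrdup(const char *str)
{
	size_t len;
	char *copy;

	assert(str != NULL);
	len = strlen(str) + 1;
	copy = xmalloc(len);
	memcpy(copy, str, len);
	return copy;
}
```

Callers can then use the result directly, for example `struct foo *f = xcalloc(1, sizeof(*f));`, with no failure branch.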
NeilBrown e7b84f9d50 Introduce pr_err for printing error messages.
'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
cont_err() is also available.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
Jes Sorensen 012a864129 Introduce sysfs_set_num_signed() and use it to set bitmap/offset
mdinfo->bitmap_offset is a signed long and needs to be treated as
such when passed to the kernel.

This resolves the problem with adding internal bitmaps to a 1.0 array.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-30 09:56:22 +10:00
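Why the signed variant matters can be seen by formatting a negative offset both ways. The sketch below only builds the text that would be written to a sysfs attribute; the two function names are hypothetical and are not mdadm's sysfs_set_num() or sysfs_set_num_signed().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: format a value as an unsigned decimal string. */
int format_num_unsigned(char *buf, size_t len, unsigned long long val)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%llu", val);
}

/* Hypothetical: format a value as a signed decimal string. */
int format_num_signed(char *buf, size_t len, long long val)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%lld", val);
}
```

With a bitmap placed 8 sectors before the superblock, format_num_signed() yields "-8", whereas pushing the same value through the unsigned path yields a 20-digit number (on platforms with a 64-bit long long) that the kernel cannot interpret as the intended offset.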
NeilBrown fbdef49811 Bitmap_offset is a signed number
As the bitmap can be before the superblock, bitmap_offset is signed.
But some of the code didn't honour that :-(

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-04 14:03:45 +10:00
NeilBrown fd324b08db sysfs: fix sysfs_freeze_array to work properly with Manage_subdevs.
If the array is already frozen when Manage_subdevs is called we don't
want it to unfreeze the array.
This is because Grow calls Manage_subdevs to add devices to an array
being reshaped, and the array must stay frozen over this call.

So if sysfs_freeze_array finds the array to be frozen it returns '0',
meaning that it didn't and cannot freeze it.  Then the caller will not
try to unfreeze, which is good.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-28 17:29:37 +11:00
NeilBrown c0c1acd691 Grow/bitmap: support adding bitmap via sysfs.
Adding a bitmap via ioctl can only add it at a fixed location.
That location is not suitable for 4K-block devices.

So allow setting the bitmap location via sysfs if kernel supports it
and aim to always use 4K alignments.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 14:10:41 +11:00
Jes Sorensen 99f6e52159 get_component_size(): Check read() return value for error before using it
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-03 08:07:39 +11:00
Jes Sorensen 93f1df3355 sysfs_unique_holder(): Check read() return value before using as buffer index
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-02 10:48:53 +11:00
Thomas Jarosch 9cf014ec40 Fix off-by-one in readlink() buffer size handling
readlink() returns the number of bytes in the buffer.

If we do something like

len = readlink(path, buf, sizeof(buf));
buf[len] = '\0';

we might write one byte past the end of the buffer.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-17 11:15:04 +11:00
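One common way to avoid the overrun described above is to pass one byte less than the buffer size to readlink(), so that the terminating '\0' always fits. The wrapper below is a minimal sketch of that pattern; its name and return convention are assumptions, not the code from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: read the target of a symlink into buf (which
 * has room for buf_len bytes) and NUL-terminate it.
 * Returns the number of bytes stored, or -1 on error. A target longer
 * than buf_len - 1 bytes is silently truncated by readlink(). */
ssize_t readlink_str(const char *path, char *buf, size_t buf_len)
{
	ssize_t len;

	assert(buf != NULL && buf_len > 0);

	/* Reserve the last byte so that buf[len] is always in bounds. */
	len = readlink(path, buf, buf_len - 1);
	if (len < 0)
		return -1;
	buf[len] = '\0';
	return len;
}
```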
Adam Kwolek ddb12f6ca6 FIX: Do not unblock array accidentally
When sysfs_set_array() function is called, it tests if array
can be configured using sysfs. Setting metadata_version entry
can accidentally unblock mdmon when array is under reshape.
To avoid this, the blocking character '-' is checked and, if it is set,
it is used for the array test.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21 11:55:08 +10:00
Dan Williams d8924477b7 sysfs: fix sysfs_disk_to_scsi_id
Not sure how this ever worked, but now we just try to parse a directory
name that looks like <host>:<bus>:<target>:<lun>.

Array creation segfaults on Fedora 14 without this.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-30 10:49:42 +10:00
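Parsing a directory name of the form <host>:<bus>:<target>:<lun> can be sketched with sscanf(). This is an illustration of the parsing step only: the function name and the strict rejection of trailing characters are assumptions, and it does not reproduce how sysfs_disk_to_scsi_id() locates the directory or combines the fields.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for a name such as "2:0:1:0".
 * Returns 0 and fills the four outputs on success, -1 otherwise. */
int parse_scsi_address(const char *name, unsigned int *host,
		       unsigned int *bus, unsigned int *target,
		       unsigned int *lun)
{
	char trailing;

	assert(name != NULL);

	/* The extra %c catches trailing garbage: exactly four conversions
	 * must succeed for the name to be accepted. */
	if (sscanf(name, "%u:%u:%u:%u%c", host, bus, target, lun,
		   &trailing) != 4)
		return -1;
	return 0;
}
```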
NeilBrown 6560987b25 Grow: ensure clean abort if we cannot read the 'completed' file.
If a read of 'completed' returns an error, select will never fail, so
this loop would never exit.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-27 17:26:12 +10:00
Luca Berra e4c72d1dc6 Fix some compiler warnings.
Original by Luca, with various changes by Neil

Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-17 14:35:06 +10:00
Adam Kwolek a5062b1cb6 FIX: Set proper raid disks during migration
During migration the raid_disks field now contains the new number of disks.
It should be set to the old number of disks first and then to the new number,
to allow md to calculate e.g. the delta_disks parameter.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-18 10:31:15 +10:00
NeilBrown 679eb882fc Move WaitClean from sysfs to Monitor.c
It might not really belong in Monitor, but it really doesn't
belong in sysfs.c, and fits well with Wait()

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:21:03 +10:00
NeilBrown 20a40eca4b Change way that reshaping arrays with external-metadata are assembled.
Now that the external metadata handler must provide an md-compatible
old/new geometry, sys_set_array can do all of the array set-up for
an array that is undergoing reshape.
That leaves less for reshape_array to do.

Also clean up how reshape_array tells if the reshape has started or
not.
Don't use ->reshape_active as that doesn't tell us anything consistent
at this stage, only use the 'restart' flag passed in.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-08 16:10:29 +11:00
Krzysztof Wojcik fa89bdeeaf FIX: sysfs_disk_to_scsi_id() adapted to current sysfs format
Problem: sysfs_disk_to_scsi_id() does not return the correct scsi_id value.
Reason: the sysfs format has been changed.

This patch adapts sysfs_disk_to_scsi_id() to the new sysfs format.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-18 23:51:34 +11:00
NeilBrown f897078e8b Fix some issues with setting 'new' state of a reshape
- when reshaping a container, ->reshape_active is already set
  even though it isn't really active yet, so we need to set
  the new geometry even when reshape_active is set.  This is safe.

- When restarting a reshape, make sure the reshape_position is set
  appropriately when external metadata is used.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-26 08:50:28 +10:00
NeilBrown 11877f4dc2 Split fmt_devnum out from devnum2devname
Sometimes we want to convert a devnum to a devname without allocating
memory.  So provide function to do the formatting without allocation.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 09:07:51 +11:00
Adam Kwolek 899aead007 Add support to skip slot configuration
When disk is added, set valid slot numbers (positive) only.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-03 14:49:01 +11:00
Dan Williams bc77ed535d block monitor: freeze spare assignment for external arrays
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares.  In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.

When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level.  Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.

Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication.  If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version.  Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation.  This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:00:54 +11:00
NeilBrown f21e18ca89 Compile with -Wextra by default
This produced lots of warnings, some of which pointed to actual bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-05 13:13:02 +10:00
Dan Williams d19e3cfb66 Merge branch 'fixes' into for-neil
2010-07-01 17:36:11 -07:00
Dan Williams b526e52dc7 Always assume SKIP_GONE_DEVS behaviour and kill the flag
...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS)

A null pointer dereference in Incremental.c can be triggered by
replugging a disk while the old name is in use.  When mdadm -I is called
on the new disk we fail the call to sysfs_read().  I audited all the
locations that use GET_DEVS and it appears they can tolerate missing a
drive.  So just make SKIP_GONE_DEVS the default behaviour.

Also fix up remaining unchecked usages of the sysfs_read() return value.

Reported-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-16 17:26:04 -07:00
Dan Williams 6a0ee6a077 Remove 'checkpointing' side effect of --wait-clean
Now that mdmon records periodic checkpoints, and checkpoints every
->set_array_state() event we no longer need to 'idle' sync_action from
--wait-clean.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-15 18:41:57 -07:00