Commit Graph

280 Commits

Author SHA1 Message Date
NeilBrown 5527fc7462 Add policy framework.
Policy can be stated as lines in mdadm.conf like:

POLICY  type=disk path=pci-0000:00:1f.2-* action=ignore domain=onboard

This defines two distinct policies which apply to any disk (but not
partition) device reached through the pci device 0000:00:1f.2.
The policies are "action=ignore" which means certain actions will
ignore the device, and "domain=onboard" which means all such devices
as treated as being united under the name 'onboard'.

This patch just adds data structures and code to read and
manipulate them.  Future patches will actually use them.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-09-06 11:03:43 +10:00
NeilBrown f21e18ca89 Compile with -Wextra by default
This produced lots of warning, some of which pointed to actual bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-05 13:13:02 +10:00
NeilBrown 7f5de63d51 Switch from /lib/init/rw to /dev for early-boot files.
It turns out that /lib/init/rw doesn't exist in early boot
like I thought.  So give up on that idea and just use
/dev/.mdadm/ for files that must persist from early-boot
to regular boot.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-28 17:41:35 +10:00
Doug Ledford 753cf90512 Fix all the confusion over directories once and for all.
We now have 3 directory definitions: mdmon directory for its pid and
sock files (compile time define, not changable at run time), mdmonitor
directory which is for the mdadm monitor mode pid file (can only be
passed in via command line at the time mdadm is invoked in monitor mode),
and the directory for the mdadm incremental assembly map file (compile
time define, not changable at run time).  Only the mdadm map file still
hunts multiple locations, and the number of locations has been reduced
to /var/run and the compile time specified location.  Re-use of similar
sounding defines that actually didn't denote their actual usage at
compile time made it more difficult for a person to know what affect
changing the compile time defines would have on the resulting programs.

This patch renames the various defines to clearly identify which item
the define affects.  It also reduces the number of various directories
which will be searched for these files as this has lead to confusion
in mdadm and mdmon in terms of which files should take precedence when
files exist in multiple locations, etc.  It's best if the person
compiling the program intentionally and with planning selects the
right directories to be used for the various purposes.  Which directory
is right depends on which items you are talking about and what boot
loader your system uses and what initramfs generation program your
system uses.  Because of the inter-dependency of all these items it
would typically be up to the distribution that mdadm is being integrated
into to select the correct values for these defines.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2010-07-22 10:16:30 -04:00
Dan Williams 1dccfff910 Incremental: restore assembly for inactive containers, block active
GET_ARRAY_INFO always succeeds on an inactive container, so we need to
be a bit more diligent about adding a disk to an active container.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-07-19 14:59:25 -07:00
NeilBrown 1538aca5cb Merge branch 'master' of git://github.com/djbw/mdadm 2010-07-06 14:46:47 +10:00
NeilBrown 7d2e6486e3 Add --test option to --re-add and similar
--test can be given in Manage mode.
This can be used when there is an attempt to fail or remove 'faulty',
'failed' or 'detached' devices, or to re-add 'missing' devices.
If no devices were failed, removed, or re-added, then mdadm will
exit with status '2'.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-06 12:07:07 +10:00
Dan Williams d19e3cfb66 Merge branch 'fixes' into for-neil 2010-07-01 17:36:11 -07:00
Dan Williams 8cfc801c72 Merge branch 'subarray' into for-neil
Conflicts:
	mdadm.h
	super-intel.c
2010-07-01 17:36:05 -07:00
NeilBrown 29ba480497 Add -fail support to --incremental
This can be used for hot-unplug.  When a device has been remove,
udev can call
   mdadm --incremental --fail sda

and mdadm will find the array holding sda and remove sda from
the array.

Based on code from  Doug Ledford <dledford@redhat.com>

Signed-off-by: NeilBrown <neilb@suse.de>
2010-06-30 16:55:17 +10:00
NeilBrown 3b57c4661a Add mdstat_by_component
This allows finding the array which contains a given component.
Components are named using the kernel-internal string name such
as "sda1" or "hdb".
Don't return member arrays, only the contain that contains them.

Also tidy up the parsing of 'inactive' arrays in /proc/mdstat.
If we see 'inactive' we need to set 'in_devs' immediately as there
is no level coming.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-06-30 16:55:17 +10:00
Dan Williams aa534678ba Rename subarray v2
Allow the name of the array stored in the metadata to be updated.  In
some cases the metadata format may not be able to support this rename
without modifying the UUID.  In these cases the request will be blocked.
Otherwise we allow the rename to take place, even for active arrays.
This assumes that the user understands the difference between the kernel
node name, the device node symlink name, and the metadata specific name.

Anticipating further need to modify subarrays in-place, introduce the
->update_subarray() superswitch method.  A future potential use
case is setting storage pool (spare-group) identifiers.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-22 16:30:59 -07:00
Dan Williams b526e52dc7 Always assume SKIP_GONE_DEVS behaviour and kill the flag
...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS)

A null pointer dereference in Incremental.c can be triggered by
replugging a disk while the old name is in use.  When mdadm -I is called
on the new disk we fail the call to sysfs_read().  I audited all the
locations that use GET_DEVS and it appears they can tolerate missing a
drive.  So just make SKIP_GONE_DEVS the default behaviour.

Also fix up remaining unchecked usages of the sysfs_read() return value.

Reported-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-16 17:26:04 -07:00
Dave Jiang 0bd16cf217 create: Check with OROM limit before setting default chunk size
Make create check with the appropriate meta data handler and see what the
largest chunk size is supported. The current 512K default is not supported
by existing imsm OROM.

[dan.j.williams@intel.com: trim the upper limit to 512k for future oroms]
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-15 18:41:53 -07:00
Dan Williams 33414a0182 Kill subarray v2
Support for deleting a subarray out of a container.  When all subarrays
are deleted the component devices are converted back into spares, a
--zero-superblock is still needed to kill the remaining metadata at this
point.  This operation is blocked when the subarray is active and may
also be blocked by the metadata handler when deleting the subarray might
change the uuid of other active subarrays.  For example, with imsm,
deleting subarray 'n' may change the uuid of subarrays with indexes > n.

Deleting a subarray needs to be a container wide event to ensure
disks that record the modified subarray list perceive other disks that
did not receive this change as out of date.

Notes:
The st->subarray parsing in super-intel.c and super-ddf.c is updated to
be more strict now that we are reading user supplied subarray values.

Offline container modification shares actions that mdmon typically
handles so promote is_container_member() and version_to_superswitch()
(formerly find_metadata_methods()) to generic utility functions for the
cases where mdadm performs the operation.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-15 17:55:41 -07:00
Dan Williams 97b4d0e971 Incremental: honor an 'enough' flag from external handlers
This is needed for imsm where:
1/ we want to report raid_disks as zero to allow mdadm -As to
   incorporate all spares
2/ we can't determine stale disks by looking at the event counts.
3/ we can't see per-subarray expectations with the info returned from
   the container level ->getinfo_super()

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-05-26 13:22:36 -07:00
NeilBrown 691c6ee1b6 IMSM/DDF: don't recognised these metadata on partitions.
These metadata are not expected on partitions, and they have
no way of differentiation whether which is correct if they
are found both on the device and on the last partition.

So if the device is a partition, refuse to read the metadata.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-04-29 16:09:59 +10:00
Dan Williams 4eb269706f Create: cleanup after failed create in duplicated array member case
mdadm prevents creation when device names are duplicated on the command
line, but leaves the partially created array intact.  Detect this case
in the error code from add_to_super() and cleanup the partially created
array.  The imsm handler is updated to report this conflict in
add_to_super_imsm_volume().

Note that since neither mdmon, nor userspace for that matter, ever saw an
active array we only need to perform a subset of the cleanup actions.
So call ioctl(STOP_ARRAY) directly and arrange for Create() to cleanup
the map file rather than calling Manage_runstop().

Reported-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-04-19 15:28:07 +10:00
Doug Ledford e259df4e63 mapfile: if we putting the mapfile in a custom location via ALT_RUN, allow
a custom filename too.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2010-03-24 09:37:34 +11:00
NeilBrown d1d3482b56 config: add 'homehost' option to 'AUTO' line.
This allows basing auto-assembly decisions on whether
the array is recorded as belonging to this host or not.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-03-03 14:33:55 +11:00
Luca Berra c132678b18 allow redefinition of VAR_RUN
having mdmon socket under var is painful at shutdown time

Signed-off-by: Luca Berra <bluca@comedia.it>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-03-03 12:23:30 +11:00
NeilBrown b179246f4f Assemble: Handle assembling from config file which is out of order.
Currently "mdadm -As" will process the entries in the config
file in order.  If any array is a component or member of a preceding
array, that array will not be assembled.

So if there are any failures during assembly, retry those arrays,
and look until everything is assembled, or nothing more can
be assembled.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-24 11:16:56 +11:00
NeilBrown 58a4ba2a6b mdmon: don't monitor /proc/mounts to decide when to create .pid file.
Monitoring /proc/mounts and creating a .pid file as soon as /var/run
is writable is racy.  Most distros clean all non-directories from
/var/run early in boot and if mdmon races with this it could
lose the files as soon as they are created.

Instead require that "mdmon --takeover" be run after /var is writable.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-08 17:26:18 +11:00
NeilBrown 5d4d1b26d3 mdmon: allow pid to be stored in different directory.
/var/run probably doesn't persist from early boot.
So if necessary, store in in /lib/init/rw or somewhere else
that does persist.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-04 16:47:28 +11:00
NeilBrown 24f6f99b36 Having single function to read mdmon pid file.
We don't need three.
One (signal_mdmon) wasn't even being used.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-04 16:47:21 +11:00
NeilBrown 8409bc51e8 Merge branch 'klockwork' of git://github.com/djbw/mdadm
Conflicts:
	super-intel.c
2009-12-30 13:46:52 +11:00
NeilBrown c1e3ab8c1e Merge branch 'master' of git://github.com/djbw/mdadm 2009-12-30 13:42:37 +11:00
Dan Williams 1e5c69836d imsm: add support for checkpointing via 'curr_migr_unit'
Unlike native md checkpointing some data about the geometry and type of
the migration process is coded into curr_migr_unit.  Provide logic to
convert between md/{resync_start|recovery_start} and imsm/curr_migr_unit.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 17:54:32 -07:00
Dan Williams 2904b26f05 Support external metadata recovery-resume
Minimal changes needed to permit reassembling partially recovered
external metadata arrays.  The biggest logical change is that
->container_content() can now surface partially rebuilt members rather
than omitting them from the disk list.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 12:51:57 -07:00
Dan Williams d23534e464 Teach sysfs_add_disk() callers to use ->recovery_start versus 'insync' parameter
Also fixup 'in_sync' versus 'insync' typo.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 11:26:21 -07:00
Dan Williams b7528a20cc Introduce MaxSector
Replace occurrences of ~0ULL to make it clear we are talking about maximal
resync/recovery position.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 10:23:26 -07:00
Dan Williams e1516be1db Add scaffolding for handling md/dev-XXX/recovery_start
Prepare the code to handle saving a recovery checkpoint.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 10:06:14 -07:00
Artur Wojcik 33a6535d00 Fix required to enable RAID arrays on SAS disks.
The patch increases the capacity of buffers used to store
sysfs path names. Originally the buffers were too small to
hold the canonical representation of sysfs path (in case
of a SAS device, especially a device installed behind an
expander).

Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-10 12:03:40 -07:00
Trela, Maciej 034b203a47 Check partition tables when creating array.
When creating an array, check if the devices have partition
tables and print a warning if the table or the partitions might be
destroyed by array creation.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-08 16:07:47 +11:00
NeilBrown 9277cc7752 Various fixes for --kill
- When --kill-superblock is used with --metadata, find every
  different superblock if there are several and kill them all.
- When creating a new array, kill off any old metadata.  The code
  to do this was already present but has become broken over time.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-11-24 16:32:01 +11:00
NeilBrown 4a997737a1 Merge branch 'master' into devel-3.1 2009-10-22 11:13:13 +11:00
NeilBrown ea0ebe9685 Assemble: print more verbose messages about restarting a reshape
Signed-off-by: NeilBrown <neilb@suse.de>
2009-10-20 16:23:45 +11:00
Zdenek Behan 9a36a9b713 Monitor: add option to specify rebuild increments
ie. the percent increments after which RebuildNN event is generated

This is particulary useful when using --program option, rather than
(only) syslog for alerts.

Signed-off-by: Zdenek Behan <rain@matfyz.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-10-19 13:13:58 +11:00
Dan Williams 9f1da82421 mdmon: preserve socket over chroot
Connect to the monitor in the old namespace and use that connection for
WaitClean requests when stopping the victim mdmon instance.  This allows
ping_monitor() to work post chroot().

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-10-13 17:41:58 -07:00
Dan Williams aae5a11207 Detail: export MD_UUID from mapfile
The load_super() from an mdadm --detail call may race against an mdmon
update.  When this happens the load_super sees an inconsistent metadata
block and returns an error.  The fallback path to use the map file
contents lacks uuid reporting, so provide __fname_from_uuid for
generically printing a uuid.

Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-10-13 17:41:57 -07:00
Dan Williams 6e46bf344b imsm: add --update=uuid support
When disks have conflicting container memberships (same container ids
but incompatible member arrays) --update=uuid can be used to move
offenders to a new container id by changing 'orig_family_num'.

Note that this only supports random updates of the uuid as the actual
uuid is synthesized.  We also need to communicate the new
'orig_family_num' value to all disks involved in the update.  A new
field 'update_private' is added to struct mdinfo to allow this
information to be transmitted.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-10-13 17:41:53 -07:00
NeilBrown ca4f89a3b7 Merge branch 'master' into devel-3.1
Conflicts:
	mdadm.8
2009-10-01 16:58:40 +10:00
NeilBrown e9e43ec367 Grow: support restart of new migrations. 2009-08-13 11:12:54 +10:00
NeilBrown 7236ee7ad4 Handle extra 'grow' variations.
UNFINISHED
2009-08-11 13:02:49 +10:00
NeilBrown 4737ae25de Exmaine/brief: put member arrays after container arrays.
A previous patch moved move the '--examine --brief' reporting of
member arrays to before their containers.  This breaks "mdadm -As"
assembly.  So put them back, but still fix the problem addressed by
previous patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-07 14:17:40 +10:00
Dan Williams 148acb7baa imsm: fix family number handling
The family_number field can change.  The option-rom will change the
family number when it starts a rebuild process (flags a container for
rebuild).  This was not seen previously as mdadm would usually start the
rebuild process, preserving the family number.

This is the mechanism that helps to prevent a prodigal array member from
being returned to its original system and cause a rebuild to go in the
wrong direction.  With the change we will end up with a container that
will fail to assemble unless the device with the incompatible family
number is left out of the assembly.

So, take several actions:
1/ Convert uuid generation to use orig_family_num, being careful to
   preserve the existing uuid in the case where orig_family_num is not
   set (i.e. previous mdadm created imsm arrays)
2/ Set orig_family_num at Create.  For arrays created by mdadm prior to
   this release orig_family_num will be zero, so set it to family_num at
   the first metadata write.
3/ Add checks for orig_family_num to compare_super_imsm
4/ Update the family number when initiating rebuild
5/ The option-rom mixes some random data into the family number, add
   this functionality to the mdadm implementation.

Reported-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-07-31 17:11:41 -07:00
NeilBrown a628848379 restripe: support saving when not all devices are present. 2009-07-14 15:12:30 +10:00
NeilBrown 19678e536d Grow: pass layout as a string rather than a number.
This allows the layout to be parsed after the current level of the
array is know, so that the level doesn't need to be given (otherwise
pointlessly) on the command line.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-14 12:13:29 +10:00
NeilBrown d823a6c872 Remove Manage_reconfing in favour of Grow_reshape
Bother Manage_reconfig and Grow_reshape provide for changing
the 'layout' of a faulty array.  This is no necessary.
So discard Manage_reconfig and just use Grow_reshape

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-14 12:11:31 +10:00
NeilBrown 4a06e2c270 main: factor out code to parse layout for raid10 and faulty.
This will soon be called from multiple places.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-14 11:29:20 +10:00