Commit Graph

306 Commits

Author SHA1 Message Date
NeilBrown e7b84f9d50 Introduce pr_err for printing error messages.
'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '
cont_err() is also available.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09 17:14:16 +10:00
NeilBrown 480f356641 Raise limit of 1024 when scanning for devices.
When we scan for devices using GET_DISK_INFO we currently
limit to 1024.  But some arrays can have more than this.
So raise it to 4096 and make the constant a #define.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-18 09:06:02 +10:00
NeilBrown 15632a96f4 parse_size: distinguish between 0 and error.
It isn't sufficient to use '0' for 'error' as we will
later have fields that can validly be '0'.

So return "-1" on error.

Also fix parsing of --bitmap_check so that '0' is treated
as an error: we don't support 512B anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-04 14:03:13 +10:00
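The idea of returning -1 so that a valid 0 stays distinguishable from a parse failure can be sketched as follows. This is a hypothetical helper, not the mdadm source: the name parse_size_sketch, the K/M/G suffixes, the KiB default unit and the result in 512-byte sectors are all assumptions made for illustration, and overflow is not checked.

```c
#include <stdlib.h>

/* Hypothetical sketch: parse "<number>[K|M|G]" into 512-byte sectors.
 * A bare number is taken as KiB (an assumption made here).
 * Returns -1 on a parse error, so a valid result of 0 is not ambiguous. */
static long long parse_size_sketch(const char *text)
{
    char *end;
    long long n;
    long long sectors_per_unit = 2;     /* 1 KiB = 2 sectors */

    if (text == NULL || *text < '0' || *text > '9')
        return -1;                      /* empty, signed or non-numeric */

    n = strtoll(text, &end, 10);
    if (*end == 'K') {
        end++;
    } else if (*end == 'M') {
        sectors_per_unit = 2 * 1024;
        end++;
    } else if (*end == 'G') {
        sectors_per_unit = 2 * 1024 * 1024;
        end++;
    }
    if (*end != '\0')
        return -1;                      /* unknown suffix or trailing junk */
    return n * sectors_per_unit;
}
```

A caller can then treat a negative result as an error and still accept 0 as a real value.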
Czarnowska, Anna e03640bda5 simplify calculating array_blocks
no point calling info_to_blocks_per_member when it just returns size*2 for level==1
calc_array_size can be used for all levels

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-02 10:16:04 +10:00
Adam Kwolek 92d49ecfaa FIX: NULL pointer to strdup() can be passed
When result from strchr() is NULL and it is assigned to subarray,
NULL pointer can be passed to strdup() function and coredump file
is generated.

Subarray is checked for NULL pointer, so it is assumed that it can
be NULL at this moment.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-09 12:20:51 +11:00
NeilBrown de5a472ea3 Remove avail_disks arg from 'enough'.
It can easily be calculated from 'avail' and  'raid_disks', and we
will soon have a case where we don't have it easily available to pass
in.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-07 14:04:47 +11:00
Jes Sorensen a0963a86e1 Spawn mdmon with --offroot if mdadm was launched with --offroot
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-30 12:11:29 +11:00
Jes Sorensen aabe020dd2 enough_fd(): remember to free buffer for avail array
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-02 10:48:53 +11:00
Jes Sorensen db7fdfe422 Avoid stack overflow if GPT partition entries on disk are > 128 bytes
Per [1] GPT partition table entries are not guaranteed to be 128
bytes, in which case read() straight into a struct GPT_part_entry
would result in a buffer overflow corrupting the stack.

[1] http://en.wikipedia.org/wiki/GUID_Partition_Table

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-31 10:24:55 +11:00
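The hazard can be illustrated with an in-memory variant: copy at most sizeof(struct) bytes per entry, and use the on-disk entry size only as the stride between entries. The struct layout and the names gpt_entry_sketch and gpt_entry_copy are assumptions for this sketch; the actual fix operates on read() calls against the device.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative layout only: the first 128 bytes of a partition entry. */
struct gpt_entry_sketch {
    unsigned char type_guid[16];
    unsigned char partition_guid[16];
    unsigned char starting_lba[8];
    unsigned char ending_lba[8];
    unsigned char attributes[8];
    unsigned char name[72];
};

/* Copy entry `index` out of `table`, whose entries are `entry_size` bytes
 * apart.  Only sizeof(*out) bytes are copied, so an entry_size larger than
 * the struct cannot overflow it.  Returns 0 on success, -1 on bad input. */
static int gpt_entry_copy(struct gpt_entry_sketch *out,
                          const unsigned char *table, size_t table_len,
                          size_t entry_size, size_t index)
{
    if (entry_size < sizeof(*out))
        return -1;              /* entry too small to fill the struct */
    if (index >= table_len / entry_size)
        return -1;              /* entry lies outside the table */
    memcpy(out, table + index * entry_size, sizeof(*out));
    return 0;
}
```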
Lukasz Dorau 65c83a8023 util.c: two typos fixed
Two typos fixed.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-26 08:48:31 +11:00
Thomas Jarosch 9cf014ec40 Fix off-by-one in readlink() buffer size handling
readlink() returns the number of bytes in the buffer.

If we do something like

len = readlink(path, buf, sizeof(buf));
buf[len] = '\0';

we might write one byte past the end of the buffer.

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-17 11:15:04 +11:00
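One common way to avoid the overrun is to pass one byte less than the buffer size to readlink(), so there is always room for the terminator. The wrapper below is a sketch under that assumption; the name readlink_terminated is hypothetical and the commit itself may have fixed each call site differently.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read a symlink target into buf and NUL-terminate it.
 * Returns the target length (possibly truncated), or -1 on error. */
static ssize_t readlink_terminated(const char *path, char *buf, size_t bufsize)
{
    ssize_t len;

    if (bufsize == 0)
        return -1;
    len = readlink(path, buf, bufsize - 1);   /* reserve one byte for '\0' */
    if (len < 0)
        return -1;
    buf[len] = '\0';                          /* len <= bufsize - 1 here */
    return len;
}
```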
Adam Kwolek 577e8448e9 Move code to get_data_disks() function
Move code to function for code reuse.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-03 09:57:12 +11:00
NeilBrown 01619b4818 Fix component size checks in validate_super0.
A 0.90 array can use at most 4TB of each device - 2TB between
2.6.39 and 3.1 due to a kernel bug.

The test for this in validate_super0 is very wrong.  'size' is sectors
and the number it is compared against is just confusing.

So fix it all up and correct the spelling of terabytes and remove
a second redundant test on 'size'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-08 12:20:36 +10:00
Czarnowska, Anna b990032d39 fix: segfault when killing subarray of non-existent container
Negative value must be returned to indicate error in open_subarray

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-07 14:09:43 +10:00
NeilBrown 1913c3256b start_mdmon: provide more dynamic way to close-all-fds
When forking mdmon we need to close all other fds because we don't
use O_CLOEXEC yet.
Any approach will be fairly arbitrary, but as we can expect fds to be
fairly dense, closing until we find a set number that don't need
closing is possibly safer than only closing the first 100.
So keep closing until we find 20 that are already closed.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-07 13:00:32 +10:00
NeilBrown 4a96d9ff4f Add some more settings of ignore_hw_compat
There are some more times when we don't care that the hardware doesn't
support the metadata:
 - when removing old metadata
 - when reporting the metadata present before over-writing it.

So set ignore_hw_compat in these cases.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-08-01 12:21:19 +10:00
NeilBrown f161d047ee util: correctly parse shorter linux version numbers.
The next version of Linux might be 3.0.  If it is, get_linux_version
will fail.
So make it more robust.

Reported-by: Namhyung Kim <namhyung@gmail.com>
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-17 22:49:24 +10:00
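A tolerant parser treats the second and third components as optional. The sketch below assumes the version is packed as major*1000000 + minor*1000 + patch; both that encoding and the name parse_kernel_release are assumptions made here, and a real caller would obtain the release string from uname().

```c
#include <stdlib.h>

/* Parse "A", "A.B" or "A.B.C" (any trailing text is ignored) into
 * A*1000000 + B*1000 + C.  Returns -1 if no leading number is present. */
static int parse_kernel_release(const char *release)
{
    char *cp;
    long a, b = 0, c = 0;

    a = strtol(release, &cp, 10);
    if (cp == release)
        return -1;                      /* no digits at all */
    if (*cp == '.')
        b = strtol(cp + 1, &cp, 10);    /* optional minor number */
    if (*cp == '.')
        c = strtol(cp + 1, &cp, 10);    /* optional patch level */
    return (int)(a * 1000000 + b * 1000 + c);
}
```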
Luca Berra 73e658d8cc Improvements to GPT reading code.
Looking at the GPT code in util.c, I found I did not like it at all. A
GPT partition entry is currently 128 bytes, but the spec does not say it
is a fixed value, so the code that reads into a buffer in 512-byte
chunks, expecting this to be a multiple of part_size, is IMHO incorrect.
My fix was to read each partition entry directly into a struct
GPT_part_entry. The advantage is that the code is very simple to read;
the disadvantage is that it takes 128 reads of 128 bytes each, which is
sub-optimal, but I believe readahead will mitigate this a lot.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-06-17 14:41:01 +10:00
NeilBrown 9e6d929127 Check all member devices in enough_fd
The loop over all member devices in enough_fd could easily stop
before it had found all devices.  This would cause --re-add to
fail incorrectly.

So change the loop to be based on the reported number of devices
in the device - with a safe-guard limit of 1024.

Change some other loops to be more careful too.

Reported-by: "Schmidt, Annemarie" <Annemarie.Schmidt@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-05-23 17:21:35 +10:00
NeilBrown 78c0a3b17f Split some of util.c into a new lib.c
Some of util.c is dependent on lots of other code, some of it
is stand-alone.
Move some of the stand-alone stuff into a new lib.c so it can be used
by smaller utilities.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:44:54 +10:00
NeilBrown 32367cb558 split name/number maps into separate file.
This reduced some interdependencies between files.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-05 08:40:49 +10:00
NeilBrown 7187750e8d open_dev_excl: allow device to be read-only.
For many operations we don't need a writable device.  So if
opening O_RDWR fails in open_dev_excl, then try again O_RDONLY.

If we really needed write, a subsequent operation will fail.  But
if we didn't, we succeed when otherwise we wouldn't have.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-24 14:21:58 +11:00
Labun, Marcin df3346e675 examine: allow examining disk metadata on non-metadata-compliant systems
Allow for loading metadata from disk attached to non-metadata compliant
system. Affects mdadm --examine and guess_super.

Added ignore_hw_compat in supertype to pass information to load_super
handler. If ignore_hw_compat is set the handler should load metadata
also from disks that do not comply with metadata requirements (i.e. disk is not
attached to native controller, etc).

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 12:04:46 +11:00
NeilBrown d998b738f5 mdmon: don't wait for O_EXCL when shutting down.
If mdmon is shutting down because there are no devices
left to look at, then don't wait 5 seconds for an O_EXCL open,
as that can block progress of --grow.

Only wait for O_EXCL if we received a signal.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 16:10:22 +11:00
Krzysztof Wojcik 53ed6ac36e Warn the user about too small array size
If a single-disk RAID0 or RAID1 array is created, the user may want to
preserve the data on the disk. If the given array size covers all partitions
on the disk, all data will be available on the created array. If the array
size is too small (does not cover all partitions), the data will not be
accessible.
This patch introduces a warning message during array creation if the given
size is too small. The user may interrupt the creation process to avoid data loss.

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:21:21 +11:00
NeilBrown 82a7851e5f dev_open should always open read-only.
When opening an array to manipulate it we never need to write to the
array and  sometimes it might be read-only so the open for write will
fail.
So always open read-only.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 11:41:21 +11:00
NeilBrown 71204a5029 Various compile fixes.
Make "make everything" succeed.
This fixed some real bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 15:48:03 +11:00
NeilBrown e5508b361d Allow domain_test to report that no domains were found.
Sometime we will need to know the difference between no domains found
and domains didn't match.
So allow domain_test to return different values and fix up all callers
to maintain current behaviour.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 14:44:02 +11:00
NeilBrown e5e5d7cea3 Incr: don't exclude 'active' devices from auto inclusion in a container.
For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 13:07:36 +11:00
Czarnowska, Anna bfd76b9309 Monitor: do not move partitions to external container
Arrays on partitions are not supported for external metadata
so do not take such spare from native array.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 10:40:56 +11:00
Dan Williams aa4cab513d fix extended partition detection
# mdadm --detail --export /dev/md127p1

Before:
MD_LEVEL=raid5
MD_DEVICES=4
MD_METADATA=0.90

After:
MD_LEVEL=raid5
MD_DEVICES=4
MD_CONTAINER=/dev/md0
MD_MEMBER=0
MD_UUID=55746a20:925d24a7:4f9bd7e2:9c9a411f

We parse the symlink target with a format:

../../block/mdXXX/mdXXXpYY

...and need the second '/' from the end of the string to detect an
'md' device.

Reported-by: Krzysztof Wasilewski <krzysztof.wasilewski@intel.com>
Cc: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-27 12:56:51 +10:00
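The "second '/' from the end" rule can be sketched as a backwards scan over the symlink target. The helper names below are hypothetical and the check is reduced to a simple "md" prefix test, which is an assumption rather than the exact logic of the fix.

```c
#include <string.h>

/* Return a pointer just past the second '/' counted from the end of
 * `target`, e.g. "md127/md127p1" for "../../block/md127/md127p1".
 * Returns NULL if the string has fewer than two slashes. */
static const char *parent_component(const char *target)
{
    const char *p = target + strlen(target);
    int slashes = 0;

    while (p > target) {
        p--;
        if (*p == '/' && ++slashes == 2)
            return p + 1;
    }
    return NULL;
}

/* True when the parent directory name of the target starts with "md". */
static int parent_is_md(const char *target)
{
    const char *c = parent_component(target);

    return c != NULL && strncmp(c, "md", 2) == 0;
}
```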
Anna Czarnowska 326727d9c9 Use one function for choosing spares from a container
container_chose_spares in Monitor.c and
get_spares_for_grow in super-intel.c
do the same thing: search for spares in a container.

Another version will also be needed for Incremental
so a more general solution is presented here and
applied in two previous contexts.

Normally domlist==NULL would lead to an empty list but
this is typically checked earlier so here it is interpreted
as "do not test domains".

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-05 14:34:14 +11:00
Anna Czarnowska 22e263f64a imsm: set imsm spare uuid to 0
uuid_match_any is replaced by uuid_zero for imsm spares.

Function fixup_container_spare_uuid not needed as it gives
unwanted uuid to spares.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-26 21:59:31 +11:00
NeilBrown cb23f1f4c3 Allow a metadata update to have a linked list of allocated spaces.
Sometimes one metadata update will require allocating several
larger data structures.  As 'monitor' cannot allocate, 'manager'
must, so it must be able to attach a list of allocations to the
update, and importantly it must be able to easily free them.

So add a 'space_list' element to metadata updates where each
element on the list starts with a pointer to the next.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 12:10:01 +11:00
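A list in which every block begins with a pointer to the next block needs no separate node type and can be freed in one loop. The sketch below illustrates that layout; the function names are hypothetical, and only the "first word is the next pointer" convention comes from the message.

```c
#include <stdlib.h>

/* Allocate a block of at least `size` bytes and push it on the list.
 * The first pointer-sized word of each block links to the next block.
 * Returns 0 on success, -1 if allocation fails (list left unchanged). */
static int space_list_push(void ***headp, size_t size)
{
    void **space;

    if (size < sizeof(void *))
        size = sizeof(void *);  /* always leave room for the link */
    space = malloc(size);
    if (space == NULL)
        return -1;
    *space = *headp;            /* link to the previous head */
    *headp = space;
    return 0;
}

/* Free every block on the list.  Returns the number of blocks freed. */
static int space_list_free(void **head)
{
    int freed = 0;

    while (head != NULL) {
        void **next = *head;    /* read the link before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

In this arrangement the allocating side would build the list up front, and the side that cannot allocate would only pop blocks from it.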
NeilBrown 11877f4dc2 Split fmt_devnum out from devnum2devname
Sometimes we want to convert a devnum to a devname without allocating
memory.  So provide function to do the formatting without allocation.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-16 09:07:51 +11:00
Adam Kwolek 6d11ec6fc2 Treat feature as experimental
Because IMSM Windows compatibility has not been tested yet,
the feature has to be treated as experimental until compatibility
verification is performed.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 12:11:09 +11:00
Anna Czarnowska 0f0749ad93 Monitor: devid should be dev_t
For consistency with makedev().
int is not sufficient.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:56:28 +11:00
NeilBrown de6ae75015 Incremental - avoid including wayward devices.
If a device - typically in a mirrored set - is assembled
independently of the other devices, and then attempted to be brought
back into the set, it could contain inconsistent data.  It should not
be included.

So detect this situation by ensuring that the 'most recent' device is
believed to be active by every other device.  If a device is wayward,
it will only consider fellow wayward devices to be active and will
think all others are failed or missing.

This patch fixes --incremental; --assemble was fixed in an earlier
patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:40:15 +11:00
Dan Williams 5f7e44b29f Initialize st->devnum and st->container_dev in super_by_fd
Precludes needing to deduce this information later, like in Detail.c and
soon in Grow.c.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:31:18 +11:00
Dan Williams bc77ed535d block monitor: freeze spare assignment for external arrays
In order to support reshape and atomic removal of spares from containers
we need to prevent mdmon from activating spares.  In the reshape case we
additionally need to freeze sync_action while the reshape transaction is
initiated with the kernel and recorded in the metadata.

When reshaping a raid0 array we need to freeze the array *before* it is
transitioned to a redundant raid level.  Since sync_action does not exist
at this point we extend the '-' prefix of a subarray string to flag
mdmon not to activate spares.

Mdadm needs to be reasonably certain that the version of mdmon in the
system honors this 'freeze' indication.  If mdmon is not already active
then we assume the version that gets started is the same as the mdadm
version.  Otherwise, we check the version of mdmon as returned by the
extended ping_monitor() operation.  This is to catch cases where mdadm
is upgraded in the filesystem, but mdmon started in the initramfs is
from a previous release.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 15:00:54 +11:00
Dan Williams e5408a3202 Provide a mdstat_ent to subarray helper
...before introducing another open coded instance of this conversion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 14:44:23 +11:00
Anna Czarnowska 52d5d101a9 Util: get device size from id
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
NeilBrown 3a3716107b Add must_be_container helper.
This checks a block device to see if it could be a container, and
in particular cannot be a member device.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
NeilBrown db20d4135e Switch open_subarray to use the new load_container
This removes another user of loaded_container

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown 1f49fb3ae5 Use new load_container in Examine
This makes explicit the two different ways to use Examine
And removes a user of container_loaded.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown 69b2fcc5bb Remove subarray field in supertype.
This is now only ever set, never used.
So remove it.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown d1d599ea0d Create: use container_dev rather than subarray for some tests.
It makes more sense to test for container_dev than for subarray
for several places in Create where it then uses container_dev.

This allows us to subsequently remove subarray.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown a951a4f78f Pass subarray arg explicitly to ->update_subarray.
This is better than hiding it in the supertype structure
where we are never quite sure who needs it.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:24:50 +11:00
NeilBrown 4725bc31fb super_by_fd: return subarray info explicitly.
Rather than hiding this in the 'st', return it explicitly.

In the one case we still need it, copy it into st where needed.
This will disappear in a future patch.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 19:35:25 +11:00
NeilBrown feab51f8f7 open_subarray: pass subarray name as explicit arg.
Rather than hiding this arg in the 'st' structure, pass it explicitly.

This is a first step to getting rid of 'subarray' from 'supertype'.

The strcpy in open_subarray should have better error checking, but it
will disappear soon so there is little point.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 19:35:25 +11:00
NeilBrown a5d85af748 get_info_super: report which other devices are thought to be working/failed.
To accurately detect when an array has been split and is now being
recombined, we need to track which other devices each thinks is
working.

We should never include a device in an array if it thinks that the
primary device has failed.

This patch just allows get_info_super to return a list of devices
and whether they are thought to be working or not.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 19:35:25 +11:00
NeilBrown 8453e70430 Manage: be more careful about --add attempts.
If an --add is requested and a re-add looks promising but fails or
cannot possibly succeed, then don't try the add.  This avoids
inadvertently turning devices into spares when an array is failed but
the devices seem to actually work.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 19:35:25 +11:00
NeilBrown 54887ad8cb Add guess_super_type
This can select to only guess array types,
or only guess partition types.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-09-06 11:26:28 +10:00
NeilBrown 0592faeb5e Add gpt pseudo-metadata
This allows mdadm to work with gpt metadata to a limited extent.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-09-06 11:26:28 +10:00
NeilBrown 0f22b998fb Add mbr pseudo metadata handler.
To support incorporating a new bare device into a collection of arrays -
one partition each - mdadm needs a modest understanding of partition
tables.
The main need is to be able to recognise a partition table on one device
and copy it onto another.

This will be done using pseudo metadata types 'mbr' and 'gpt'.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-09-06 11:26:28 +10:00
NeilBrown 6df6a774bf Allow dev_open to work on read-only /dev
/dev could be read-only in which case we cannot make devices
there.
So dev_open should first try to use an existing device name,
and if that doesn't work try creating a node in /dev or /tmp.

Reported-by:  Paweł Sikora <pluto@agmk.net>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-30 08:48:48 +10:00
NeilBrown f21e18ca89 Compile with -Wextra by default
This produced lots of warnings, some of which pointed to actual bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-05 13:13:02 +10:00
Doug Ledford 753cf90512 Fix all the confusion over directories once and for all.
We now have 3 directory definitions: mdmon directory for its pid and
sock files (compile time define, not changeable at run time), mdmonitor
directory which is for the mdadm monitor mode pid file (can only be
passed in via command line at the time mdadm is invoked in monitor mode),
and the directory for the mdadm incremental assembly map file (compile
time define, not changeable at run time).  Only the mdadm map file still
hunts multiple locations, and the number of locations has been reduced
to /var/run and the compile time specified location.  Re-use of similar
sounding defines that actually didn't denote their actual usage at
compile time made it more difficult for a person to know what effect
changing the compile time defines would have on the resulting programs.

This patch renames the various defines to clearly identify which item
the define affects.  It also reduces the number of various directories
which will be searched for these files as this has led to confusion
in mdadm and mdmon in terms of which files should take precedence when
files exist in multiple locations, etc.  It's best if the person
compiling the program intentionally and with planning selects the
right directories to be used for the various purposes.  Which directory
is right depends on which items you are talking about and what boot
loader your system uses and what initramfs generation program your
system uses.  Because of the inter-dependency of all these items it
would typically be up to the distribution that mdadm is being integrated
into to select the correct values for these defines.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2010-07-22 10:16:30 -04:00
Dan Williams 1dccfff910 Incremental: restore assembly for inactive containers, block active
GET_ARRAY_INFO always succeeds on an inactive container, so we need to
be a bit more diligent about adding a disk to an active container.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-07-19 14:59:25 -07:00
Dan Williams d19e3cfb66 Merge branch 'fixes' into for-neil 2010-07-01 17:36:11 -07:00
Dan Williams 33414a0182 Kill subarray v2
Support for deleting a subarray out of a container.  When all subarrays
are deleted the component devices are converted back into spares, a
--zero-superblock is still needed to kill the remaining metadata at this
point.  This operation is blocked when the subarray is active and may
also be blocked by the metadata handler when deleting the subarray might
change the uuid of other active subarrays.  For example, with imsm,
deleting subarray 'n' may change the uuid of subarrays with indexes > n.

Deleting a subarray needs to be a container wide event to ensure
disks that record the modified subarray list perceive other disks that
did not receive this change as out of date.

Notes:
The st->subarray parsing in super-intel.c and super-ddf.c is updated to
be more strict now that we are reading user supplied subarray values.

Offline container modification shares actions that mdmon typically
handles so promote is_container_member() and version_to_superswitch()
(formerly find_metadata_methods()) to generic utility functions for the
cases where mdadm performs the operation.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-06-15 17:55:41 -07:00
Przemyslaw Hawrylewicz Czarnowski 10013317ce fix: memory leak in mdmon_pid()
devnum2devname() returns pointer to memory allocated with strdup.
It must be released to prevent memory leak.

Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-05-17 15:38:34 -07:00
NeilBrown 691c6ee1b6 IMSM/DDF: don't recognise these metadata on partitions.
These metadata are not expected on partitions, and they have
no way of telling which is correct if they
are found both on the device and on the last partition.

So if the device is a partition, refuse to read the metadata.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-04-29 16:09:59 +10:00
NeilBrown 86983cce34 2010-03-24 09:07:02 +11:00
NeilBrown 056b331efe Improve partition table code.
Code to check partition tables used some needless casts
and was broken, using a u8 when a u32 was wanted.

So create structure describing the tables rather than using offset,
and read into those tables instead.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-03-09 10:57:57 +11:00
Luca Berra cf55630357 fix mdmon takeover
- when we waited for the old mdmon to exit, we didn't look
  for the socket in the right place

- when we failed to find a pid file, we returned the wrong
  value (code expected <0, but got ==0).

Signed-off-by: Luca Berra <bluca@comedia.it>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-03-03 11:35:26 +11:00
NeilBrown bde713f015 fix gcc warnings about strict-aliasing rules
Original-by: Luca Berra <bluca@comedia.it>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-03-03 10:17:41 +11:00
NeilBrown 5d4d1b26d3 mdmon: allow pid to be stored in different directory.
/var/run probably doesn't persist from early boot.
So if necessary, store it in /lib/init/rw or somewhere else
that does persist.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-04 16:47:28 +11:00
NeilBrown 24f6f99b36 Have a single function to read mdmon pid file.
We don't need three.
One (signal_mdmon) wasn't even being used.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-04 16:47:21 +11:00
NeilBrown c1e3ab8c1e Merge branch 'master' of git://github.com/djbw/mdadm 2009-12-30 13:42:37 +11:00
Dan Williams 1e5c69836d imsm: add support for checkpointing via 'curr_migr_unit'
Unlike native md checkpointing some data about the geometry and type of
the migration process is coded into curr_migr_unit.  Provide logic to
convert between md/{resync_start|recovery_start} and imsm/curr_migr_unit.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 17:54:32 -07:00
Dan Williams 2904b26f05 Support external metadata recovery-resume
Minimal changes needed to permit reassembling partially recovered
external metadata arrays.  The biggest logical change is that
->container_content() can now surface partially rebuilt members rather
than omitting them from the disk list.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 12:51:57 -07:00
Dan Williams d23534e464 Teach sysfs_add_disk() callers to use ->recovery_start versus 'insync' parameter
Also fixup 'in_sync' versus 'insync' typo.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-21 11:26:21 -07:00
Dan Williams 1f0769d768 util: fix devnum2devname for devnum == 0
devnum 0 is md0, not md_d-1

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-12-12 13:57:28 -07:00
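The boundary case can be made explicit with a small formatter that treats every devnum >= 0 as a plain md device. The negative-number mapping (-1 becomes md_d0) is inferred from the "md_d-1" symptom in the message and is an assumption here, as is the function name.

```c
#include <stdio.h>

/* Format a device number into `name`.  Non-negative numbers, including 0,
 * are plain "mdN"; negative numbers map to partitionable "md_dN" names
 * with N = -(num + 1). */
static void fmt_devname_sketch(char *name, size_t len, int num)
{
    if (num >= 0)
        snprintf(name, len, "md%d", num);
    else
        snprintf(name, len, "md_d%d", -(num + 1));
}
```

Testing with `num > 0` instead of `num >= 0` would send devnum 0 down the second branch and yield "md_d-1", which is the bug the message describes.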
Trela, Maciej 034b203a47 Check partition tables when creating array.
When creating an array, check if the devices have partition
tables and print a warning if the table or the partitions might be
destroyed by array creation.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-08 16:07:47 +11:00
NeilBrown df0d4ea04e Replace all relevant occurrences of -4 with LEVEL_MULTIPATH
Also -1 -> LEVEL_LINEAR.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-11-17 12:31:12 +11:00
NeilBrown 4a997737a1 Merge branch 'master' into devel-3.1 2009-10-22 11:13:13 +11:00
NeilBrown eb3929a47f Compile fixes for mdassemble
Signed-off-by: NeilBrown <neilb@suse.de>
2009-10-20 16:53:43 +11:00
Dan Williams aae5a11207 Detail: export MD_UUID from mapfile
The load_super() from an mdadm --detail call may race against an mdmon
update.  When this happens the load_super sees an inconsistent metadata
block and returns an error.  The fallback path to use the map file
contents lacks uuid reporting, so provide __fname_from_uuid for
generically printing a uuid.

Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-10-13 17:41:57 -07:00
NeilBrown ca4f89a3b7 Merge branch 'master' into devel-3.1
Conflicts:
	mdadm.8
2009-10-01 16:58:40 +10:00
Dan Williams 436305c690 Detail: fix for an imsm container with a spare
Spares for imsm arrays do not have any info about the container in their
metadata records.  If Detail() inadvertently picks such a device for
->get_array_info() it will end up with less than useful info for the
container.  So, continue to read from the disks until a non-spare device
is found.

This bug was found by timeouts waiting for udev to create the
user-friendly container name.  To detect future UUID reporting problems,
add a debug print to the timeout case in wait_for().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-15 11:34:20 -07:00
Dan Williams 148acb7baa imsm: fix family number handling
The family_number field can change.  The option-rom will change the
family number when it starts a rebuild process (flags a container for
rebuild).  This was not seen previously as mdadm would usually start the
rebuild process, preserving the family number.

This is the mechanism that helps to prevent a prodigal array member from
being returned to its original system and causing a rebuild to go in the
wrong direction.  With the change we will end up with a container that
will fail to assemble unless the device with the incompatible family
number is left out of the assembly.

So, take several actions:
1/ Convert uuid generation to use orig_family_num, being careful to
   preserve the existing uuid in the case where orig_family_num is not
   set (i.e. previous mdadm created imsm arrays)
2/ Set orig_family_num at Create.  For arrays created by mdadm prior to
   this release orig_family_num will be zero, so set it to family_num at
   the first metadata write.
3/ Add checks for orig_family_num to compare_super_imsm
4/ Update the family number when initiating rebuild
5/ The option-rom mixes some random data into the family number, add
   this functionality to the mdadm implementation.

Reported-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-07-31 17:11:41 -07:00
NeilBrown 4a06e2c270 main: factor out code to parse layout for raid10 and faulty.
This will soon be called from multiple places.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-14 11:29:20 +10:00
NeilBrown 84e11361aa Grow: support --array-size changes
With 2.6.30 it is possible to tell the md driver to clip an array to a
size smaller than the real size of the array.  This option gives
access to that feature.  The size change does not persist
across restarts.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-13 15:00:02 +10:00
NeilBrown e736b62389 Update copyright dates and remove references to @cse.unsw.edu.au
Also removed 'paper' addresses.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-02 14:35:45 +10:00
NeilBrown 70ef16dbcb map_dev: prefer names in /dev/md/
Rather than preferring non-standard names (of which there are
many, like /dev/block/9:1), prefer names in /dev/md/ when finding
the name of an md device.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-05-11 15:47:10 +10:00
NeilBrown 603f24a05f util: fix test for text_version
as text_version is a char array (not a pointer), testing the
address against NULL is the wrong thing to do.  Test the
content instead.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-05-11 15:21:43 +10:00
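The distinction matters because an array member of a struct always has a non-NULL address. The struct and function below are illustrative stand-ins (the 50-byte size is an assumption), showing the content test the message recommends.

```c
/* Stand-in for a structure that embeds the version string as an array. */
struct array_info_sketch {
    char text_version[50];
};

/* `info->text_version != NULL` would always be true for an array member,
 * so check whether the string is non-empty instead. */
static int has_text_version(const struct array_info_sketch *info)
{
    return info->text_version[0] != '\0';
}
```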
NeilBrown 462906cdee incremental_container: preserve 'in_sync' flag when adding to existing array.
When building container members with -IR, we need to ensure that
devices added to an active array preserve the 'in_sync' status so they
don't needlessly get rebuilt.

So allow sysfs_add_disk to do this (only works in kernels since
2.6.30) and pass the relevant flag down.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-14 10:19:02 +10:00
NeilBrown a7c6e3fb24 wait_for improvement.
Wait not only for the name to appear, but for it to refer to the
correct device.
Sometimes old symlinks left lying around can be confusing.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-04-07 17:34:38 +10:00
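The check described in a7c6e3fb24 could look roughly like the following. This is a sketch under assumptions: the helper name `name_is_device` is hypothetical, and a real caller would retry it in a loop until udev has created the node.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if 'path' exists, is a block device, and its device number
 * equals 'want'; return 0 otherwise.  stat() follows symlinks, so a
 * stale link that resolves to some other device is rejected.
 */
static int name_is_device(const char *path, dev_t want)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	if (!S_ISBLK(st.st_mode))
		return 0;
	return st.st_rdev == want;
}
```

Comparing `st_rdev` rather than only testing for existence is what distinguishes a fresh name from an old symlink left lying around.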
NeilBrown a56fb7ec54 util.c: use correct range for minor numbers when finding free device.
Minor numbers are 20 bits, not 22.
So when looking for a free, high minor number, try (1<<20)-1,
not (1<<22)-1.
2009-04-06 15:50:56 +10:00
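The arithmetic in a56fb7ec54 can be sketched as follows. The macro `MINOR_BITS` and both helper names are hypothetical and not taken from util.c.

```c
/* Hypothetical names, for illustration only. */
#define MINOR_BITS 20	/* minor numbers are 20 bits wide */

/* Highest minor number that fits in MINOR_BITS bits. */
static unsigned int highest_minor(void)
{
	return (1U << MINOR_BITS) - 1;
}

/* Return 1 if 'minor' is representable, 0 otherwise. */
static int minor_in_range(unsigned long minor)
{
	return minor <= highest_minor();
}
```

With 20 bits the search for a free, high minor starts at 1048575; the old starting point of (1<<22)-1 lies outside the valid range.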
NeilBrown e8a70c8958 mdmon: pass symbolic name to mdmon instead of device name.
Now that names in /dev are usually created (eventually) by udev,
it isn't really safe to rely on finding a name in /dev to pass to
mdmon to identify which array to monitor.
And it isn't really necessary to have a name in /dev.
So just pass the symbolic name, e.g. md127 or md123.

Change util.c to pass that name, and change mdmon to process the
name sensibly.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-20 14:51:42 +11:00
Dan Williams bf68e9d9ab fix add_dev() handling of broken links
Resolves issues like:
mdadm -Ss
mdadm: unable to open /dev/md/r1: No such file or directory

...where /dev/md/r1 points to a removed device.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-11-10 09:30:07 -07:00
NeilBrown a714580e02 Wait for name to appear after create/assemble etc.
We don't really want mdadm to exit until udev has
created the names in /dev.  So wait.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-04 21:56:42 +11:00
NeilBrown 9008ed1c96 Assemble: allow members of containers to be assembled and auto-assembled.
Try to treat members of containers much like other arrays for
assembly.
We still look through the list of devices for a match (it will be
the container), then find the relevant 'info' and try to assemble
the array.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-04 20:51:12 +11:00
Dan Williams ce744c97bc Assemble: revert preliminary -As support
I have seen the light.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-11-04 20:51:11 +11:00
NeilBrown 40ebbb9cfe util: make env checking more generic
Change the "env_check_mdmon" function to be more generic, accepting
an environment variable name, as we will soon have a new use for it.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-04 10:35:43 +11:00
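A generic environment check of the kind 40ebbb9cfe describes might look like this. The names `value_enabled` and `check_env_sketch` are hypothetical, and treating only the value "1" as enabled is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Decide whether a variable's value counts as "enabled".
 * Accepting only "1" is an assumption made for this sketch. */
static int value_enabled(const char *val)
{
	return val != NULL && strcmp(val, "1") == 0;
}

/* Generic check: the caller supplies the variable name, so the same
 * helper serves any number of environment switches. */
static int check_env_sketch(const char *name)
{
	return value_enabled(getenv(name));
}
```

A caller would then write something like `check_env_sketch("MDADM_NO_MDMON")`, using the variable name that appears in the test script of c94709e83f.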
NeilBrown d7ab966bb8 Move recently merged /sys/dev/ lookup into stat2devnum.
Both sysfs_init and stat2devnum try to convert stat information
into an md devnum.  Combine the value of both pieces of code
into stat2devnum and have sysfs_init call that.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-04 10:35:08 +11:00
NeilBrown 6c766cf101 Merge branch 'master' into devel-3.0
Conflicts:

	Incremental.c
	super0.c
	super1.c
2008-10-30 13:59:11 +11:00
NeilBrown 2b4ca8f079 Fix --incremental assembly of partitioned arrays.
If incremental assembly finds an array mentioned in mdadm.conf,
with a 'standard partitioned' name like /dev/md_d0 or /dev/md/d0,
it will not create a partitioned array like it should.
This is because it mishandled the 'devnum' returned by
is_standard.
That is a devnum that does not have the partition-or-not encoded
into it.  So we need to check the actual return value of
is_standard and encode the partition-or-not info into the devnum.

Also fix a couple of comments.


Signed-off-by: NeilBrown <neilb@suse.de>
2008-10-30 09:34:04 +11:00
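To illustrate what "encoding the partition-or-not info into the devnum" in 2b4ca8f079 can mean, here is a sketch. The convention used (a partitioned array N stored as -1 - N) and all three helper names are assumptions made for illustration; the commit message does not spell out the encoding.

```c
/*
 * Assumed convention for this sketch only: a non-partitioned array N
 * keeps devnum N, a partitioned array N is stored as -1 - N.
 */
static int encode_devnum(int num, int partitioned)
{
	return partitioned ? -1 - num : num;
}

/* Return 1 if the encoded devnum denotes a partitioned array. */
static int devnum_is_partitioned(int devnum)
{
	return devnum < 0;
}

/* Recover the plain array number from an encoded devnum. */
static int devnum_number(int devnum)
{
	return devnum < 0 ? -1 - devnum : devnum;
}
```

Under this convention a bare number carries no partition information, so the caller has to combine it with a separate partitioned-or-not indication before using it as a devnum.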
Dan Williams 71d60c480a Preliminary -As support for container member arrays
Given an mdadm.conf like the following allow /dev/imsm and /dev/md/r1 to be
created by "mdadm -As".

DEVICES partitions 
ARRAY /dev/imsm metadata=imsm auto=md UUID=b98f5dbe-aa859e7b-0e369b89-a80986d4 
ARRAY /dev/md/r1 container=/dev/imsm member=0 auto=mdp UUID=3538e39c-b397c2e9-1aa031f9-2bc0eca4 
   spares=1

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-28 10:55:31 -07:00
NeilBrown 492350045c Merge branch 'master' into devel-3.0
Conflicts:

	Manage.c
2008-10-17 12:46:23 +11:00
Dan Williams 36ba7d4849 Allow a uuid of all f's to always match
The uuid returned for an imsm spare device will never match the uuid of an
active disk.  So make mdadm interpret a uuid of all f's as "match any".

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:43:57 -07:00
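The "match any" rule from 36ba7d4849 can be sketched as below. The function name `uuid_matches` and the choice to accept the wildcard on either side are assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the two 128-bit uuids match.  A uuid with every bit set
 * (all f's) on either side is treated as a wildcard.
 */
static int uuid_matches(const uint32_t a[4], const uint32_t b[4])
{
	static const uint32_t any[4] = {
		0xffffffffU, 0xffffffffU, 0xffffffffU, 0xffffffffU
	};

	if (memcmp(a, any, sizeof(any)) == 0 ||
	    memcmp(b, any, sizeof(any)) == 0)
		return 1;
	return memcmp(a, b, sizeof(any)) == 0;
}
```

This lets an imsm spare, whose reported uuid never equals that of an active disk, still be accepted when matching against an array.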
Dan Williams 9968e376a1 fname_as_uuid: print uuids msb first
The sha1 routines store the uuids in little endian byte-order, so always
print from msb to lsb. This allows imsm containers to be assembled with
-As.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-10-15 14:26:51 -07:00
NeilBrown e4965ef846 Improve reporting of layout for raid10.
Showing e.g.

   near=1, far=2

for the 'far2' layout of raid10 is confusing even though there is a
sense in which it is correct.

Make it less confusing by only printing whichever number is not 1.
If both are 1, make that clear too (i.e. no redundancy).
2008-10-13 16:15:18 +11:00
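The reporting rule from e4965ef846 can be sketched as follows. It assumes the md raid10 layout word keeps the number of near copies in the low byte and the number of far copies in the next byte; the function name and the exact output strings are hypothetical, and the offset flag is ignored.

```c
#include <stdio.h>

/*
 * Format a raid10 layout word into 'buf', printing only whichever
 * copy count is not 1.  Assumes: low byte = near copies, next byte =
 * far copies.
 */
static void format_raid10_layout(int layout, char *buf, size_t len)
{
	int near_copies = layout & 255;
	int far_copies = (layout >> 8) & 255;

	if (near_copies == 1 && far_copies == 1)
		snprintf(buf, len, "near=1, far=1 (no redundancy)");
	else if (far_copies == 1)
		snprintf(buf, len, "near=%d", near_copies);
	else if (near_copies == 1)
		snprintf(buf, len, "far=%d", far_copies);
	else
		snprintf(buf, len, "near=%d, far=%d", near_copies, far_copies);
}
```

For the 'far2' layout (0x201 under the assumed encoding) this yields just "far=2" instead of the confusing "near=1, far=2".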
NeilBrown ff54de6e47 Report uuid in --detail --brief for ddf and intel
The uuid is slightly fictitious but needed for array matching.
2008-09-18 16:11:40 +10:00
NeilBrown d7288ddc3a Use uuid as /dev name when assembling array of uncertain origin.
If we aren't sure that the array belongs to 'this' host, use the
uuid to choose a name to avoid any conflict.
2008-09-18 16:08:10 +10:00
NeilBrown f35f252592 Move calls to SET_ARRAY_INFO to common helper.
When we assemble an array, there are three different approaches
depending on whether metadata is internal or external, and on
kernel version.

Move all this to a common helper instead of duplicating in 3 places.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-09-18 16:01:55 +10:00
NeilBrown 7801ac2092 Factor out add-disk code
The variety of approaches to 'add_disk' are factored out into
a separate function, and Incremental mode benefits by being
closer to supporting the assembly of containers.

Also move the adding-to-array-data-structure out of sysfs_add_disk
and into add_disk.

And add some tests for --incremental mode to make sure we don't break it.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-09-18 15:13:32 +10:00
NeilBrown 0e60042683 Compile fixes, particularly moving more stuff under MDASSEMBLE
Now 'make everything' works again.
2008-09-18 15:04:47 +10:00
Dan Williams c94709e83f Add ping_monitor() to mdadm --wait
The action we are waiting for may not be complete until the monitor has
had a chance to take action on the result.

The following script can now remove the device on the first attempt,
versus a few attempts with the original Wait():
#!/bin/bash
#export MDADM_NO_MDMON=1
export IMSM_DEVNAME_AS_SERIAL=1
./mdadm -Ss
./mdadm --zero-superblock /dev/loop[0-3]
echo 2 > /proc/sys/dev/raid/speed_limit_max
./mdadm --create /dev/imsm /dev/loop[0-3] -n 4 -e imsm -a md
./mdadm --create /dev/md/r1 /dev/loop[0-3] -n 4 -l 5 --force -a mdp
./mdadm --fail /dev/md/r1 /dev/loop3
./mdadm --wait /dev/md/r1
x=0
while  ! ./mdadm --remove /dev/imsm /dev/loop3 > /dev/null 2>&1
do
        x=$((x+1))
done
echo "removed after $x attempts"
./mdadm --add /dev/imsm /dev/loop3

Include 2 small cleanups:
* remove the almost open coded fd2devnum() in Wait() by introducing a
  new utility routine stat2devnum()
* teach connect_monitor() to parse the container device from a subarray
  string

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-09-15 20:58:42 -07:00
NeilBrown 3c558363a1 Factor out test for subarray version string.
We are about to change the syntax of the version string
for 'subarray's.  So factor out the test into a single function.

Signed-off-by: NeilBrown <neilb@suse.de>
2008-08-19 17:55:15 +10:00
NeilBrown 9fe3204317 mdmon: fork and run as a daemon.
start_mdmon now waits for mdmon to complete initialisation and,
importantly, listen on the socket, before continuing.

Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:20 +10:00
NeilBrown 44d2e36556 Don't assume that mdmon is in the current directory.
Rather, assume that it is in the same directory from which
mdadm was run.  If not, then maybe /sbin or current directory.

Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:13 +10:00
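Looking for mdmon next to the running mdadm, as 44d2e36556 describes, comes down to replacing the last path component of the executable's path. The sketch below shows only that string step; the helper name `sibling_path` is hypothetical, and how the path of the running binary is obtained is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the path of 'sibling' in the same directory as 'exe'.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
static int sibling_path(const char *exe, const char *sibling,
			char *out, size_t len)
{
	const char *slash = strrchr(exe, '/');
	/* Keep everything up to and including the last '/', if any. */
	size_t dirlen = slash ? (size_t)(slash - exe) + 1 : 0;
	int n;

	n = snprintf(out, len, "%.*s%s", (int)dirlen, exe, sibling);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

If the resulting path does not exist, the caller would fall back to /sbin or the current directory, as the commit message states.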
NeilBrown 8850ee3e1e Factor common code into new "start_mdmon".
Signed-off-by: Neil Brown <neilb@suse.de>
2008-07-18 16:37:11 +10:00
Dan Williams 5dcfcb715d mdadm: add an environment variable to prevent auto-launching mdmon
Useful for attaching gdb to mdmon before any action is taken on the array.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-07-14 14:59:32 -07:00
Neil Brown 77472ff8d0 Introduce devname2devnum
and use it instead of opencoding.
2008-07-12 20:28:38 +10:00
Neil Brown 6416d5275d Use O_DIRECT for all IO to devices.
Using buffered IO risks non-atomic updates to parts of the
device that we don't actually want to write to.  This isn't in
general safe.
So switch to O_DIRECT for all that IO and make sure we have
properly aligned buffers.
2008-07-12 20:28:33 +10:00
Neil Brown edd8d13c02 Create arrays via metadata-update
Support creating arrays inside an active ddf container by
sending a metadata update over a pipe to mdmon.
2008-07-12 20:27:40 +10:00
Neil Brown d2ca644994 Remove getinfo_super_n and do some other cleaning up.
Getting close to a sensible description of what some of the
superswitch methods are supposed to do!
2008-07-12 20:27:39 +10:00
Neil Brown f7e7067b47 Add subarray field to supertype.
When loading the metadata for a subarray (super_by_fd), we set
->subarray to be the name read from md/metadata_version so that
getinfo_super can return info about the correct array.

With this we can differentiate between a container and
an array within the container by looking at ->subarray[0].
2008-07-12 20:27:38 +10:00
Neil Brown ef60947720 Always initialise a struct super_type to zero 2008-07-12 20:27:36 +10:00
Neil Brown 159c3a1a77 Remove st->text_version in favour of info->text_version
I want the metadata handler to have more control over the 'version',
particularly for arrays which are members of containers.
So discard st->text_version and instead use info->text_version
which getinfo_super can initialise.
2008-05-27 09:18:55 +10:00
Neil Brown a931db9ed7 auto-start mdmon on --create
FIXME: still uses hardcoded path.

Need --assemble too.
2008-05-27 09:18:42 +10:00
Neil Brown 355726fa01 Remember to close directories when we are finished with them. 2008-05-27 09:18:34 +10:00
Neil Brown 8c21018330 Always use a unique file name for opendev
Else mdadm and mdmon running in parallel can tread on each other.
2008-05-27 09:18:33 +10:00
Dan Williams f7dd881f90 handle Manage_subdevs() for 'external' arrays
From: Dan Williams <dan.j.williams@intel.com>

1/ Block attempts to add/remove devices from container members
2/ Forward add/remove requests to containers

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-05-15 16:48:35 +10:00
Dan Williams cdddbdbca0 imsm: initial Intel(R) Matrix Storage Manager support
From: Dan Williams <dan.j.williams@intel.com>

The following now work:
--examine
--examine --brief

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-05-15 16:48:22 +10:00
Neil Brown 2f6079dc96 Create a container member
From: Neil Brown <neilb@suse.de>
2008-05-15 16:48:21 +10:00
Neil Brown 598f0d58ac Can now mostly assemble DDF arrays 2008-05-15 16:48:19 +10:00
Neil Brown 5f8097beb9 more ddf stuff
Create a BVD in a DDF

Do not actually assemble it yet...
2008-05-15 16:48:15 +10:00
Dan Williams a322f70c41 Initial DDF support code.
Create a ddf array by naming the device /dev/ddf* or
specifying metadata 'ddf'.

If ddf is specified with no level, assume a container (indeed,
anything else would be wrong).

**Need to use text_version to set external metadata...

More ddf support

Load a ddf container.  Now
   --examine /dev/ddf
works.
super-ddf: fix compile warning

From: Dan Williams <dan.j.williams@intel.com>

super-ddf.c:723: format %lu expects type long unsigned int, but argument 3 has type unsigned int

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-05-15 16:48:14 +10:00
Neil Brown d03373f1de Some support for external metadata.
Allow specifying metadata type when creating arrays etc.
2008-05-15 16:48:13 +10:00
Neil Brown ea24acd073 Compile fixes for mdassemble and diet-libc 2008-05-15 15:50:56 +10:00
Neil Brown ff1f6545db Fix support for --update=swapsuper
The use of dup_super broke it.
2008-05-15 15:50:48 +10:00
Neil Brown 3b0896f899 Fix possible NULL dereference in super_by_fd 2008-05-15 15:50:45 +10:00
Dan Williams 95b79df03e let '-a' be specified for Incremental mode
From: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-05-05 21:55:37 +10:00
Neil Brown 63152c1b33 Unify code into find_free_devnum.
Two places have code to find a free md device number.  Make this
a subroutine.
2008-05-05 21:55:36 +10:00
Dan Williams 5e747af24a fix load_super/free_super mismatch in util.c
From: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2008-05-05 21:55:35 +10:00
Neil Brown c2c9bb6fe0 diff -ru mdadm-2.6.4-orig/Query.c mdadm-2.6.4/Query.c 2008-04-29 17:13:55 +10:00
Neil Brown 142cb9e181 Use sysfs info for metadata version info in Detail and elsewhere. 2007-12-14 20:15:21 +11:00
Neil Brown 7e0f69790c Replace sysarray with mdinfo
Sure, mdinfo is bigger, but having a uniform structure for lots of things
will make life easier.
2007-12-14 20:14:59 +11:00
Neil Brown 1686dc25ec Find super from fd on an array.
We used to use the major/minor numbers, but that isn't sufficient
any more, so pass the fd, and possibly check 'text' version.
2007-12-14 20:14:38 +11:00
Neil Brown 3da92f272d Drop the superblock arg from all metadata methods.
It is now in the 'supertype'
2007-12-14 20:14:33 +11:00
Neil Brown 68c7d6d790 Add 'supertype' arg to almost all metadata methods.
The 'superblock' will be moved into this structure soon.
2007-12-14 20:14:16 +11:00
Neil Brown df37ffc039 Allow metadata handlers to free their own superblock.
As the metadata handler allocates the superblock, it should free it
too.  DDF will have a more complex 'superblock' which needs more complex
freeing.
2007-12-14 20:14:00 +11:00
Neil Brown aba69144fd Remove spaces/tabs from ends of lines. 2007-12-14 20:13:43 +11:00
Neil Brown 8e22992203 Remove bogus add_dev definition.
If neither ftw nor nftw is available, add_dev gets defined twice.
Fix that...
2007-05-08 17:12:33 +10:00
Neil Brown 8382f19bdc Add new mode: --incremental
--incremental allows arrays to be assembled one device at a time.
This is expected to be used with udev.
2006-12-21 17:10:52 +11:00
Neil Brown 350f29f90d Centralise code for copying uuid
Rather than opencoding the byteswap all the time.
2006-12-14 17:33:14 +11:00
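A centralised uuid copy with an optional byteswap, in the spirit of 350f29f90d, might look like this. The function name, the signature, and the choice to swap bytes within each 32-bit word are assumptions of this sketch.

```c
#include <string.h>

/*
 * Copy a 16-byte uuid from 'src' to 'dst' (which must not overlap).
 * If 'swap' is non-zero, reverse the byte order inside each 32-bit
 * word; otherwise copy the bytes unchanged.
 */
static void copy_uuid_sketch(unsigned char dst[16],
			     const unsigned char src[16], int swap)
{
	int i;

	if (!swap) {
		memcpy(dst, src, 16);
		return;
	}
	for (i = 0; i < 16; i += 4) {
		dst[i + 0] = src[i + 3];
		dst[i + 1] = src[i + 2];
		dst[i + 2] = src[i + 1];
		dst[i + 3] = src[i + 0];
	}
}
```

Keeping the swap in one helper means each caller only has to decide whether swapping is needed, not how to do it.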
Neil Brown beae1dfe2e Centralise calls to ioctl BLKGETSIZE
Instead of opencoding the same thing everywhere.
2006-12-14 17:32:57 +11:00