Commit Graph

1388 Commits

Author SHA1 Message Date
NeilBrown 945f0fcd01 Release mdadm-3.1.5
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 15:50:18 +11:00
NeilBrown ebdc4d1253 Incr: don't exclude 'active' devices from auto inclusion in a container.
For containers, it is always appropriate to include a device in the
container.
Whether it should then be included in an array is a separate question.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 15:42:35 +11:00
NeilBrown fb0d4b9ca2 --stop: separate 'is busy' test for 'did it stop properly'.
Stopping an md array requires that there is no other user of it.
However with udev and udisks and such there can be transient other
users of md devices which can interfere with stopping the array.

If there is a transient users, we really want "mdadm --stop" to wait a
little while and retry.
However if the array is genuinely in-use (e.g. mounted), then we
don't want to wait at all - we want to fail immediately.

So before trying to stop, re-open device with O_EXCL.  If this fails
then the device is probably in use, so give up.

If it succeeds, but a subsequent STOP_ARRAY fails, then it is possibly
a transient failure, so try again for a few seconds.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 15:42:24 +11:00
NeilBrown ce7a187b9a Monitor: handle v.quick removal of devices better.
If a device fails and then is removed before Monitor sees
the failure, GET_DISK_INFO returns nothing so Monitor relies
on mdstat info where '_' is incorrectly interpreted as 'a spare'.

We should treat '_' as 'removed' - that is safer.

Without this, a v.quick fail+remove gets reported as 'Failed' then
'SpareActive'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:09:48 +11:00
NeilBrown 7c19b781d5 ddf: fix up detection of failed/missing devices.
If a device hasn't been found yet we can still tell if it is
expected to be working, and we must to do to make sure
'working_disks' is correct.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:09:38 +11:00
Piergiorgio Sartor b306624e2b restripe: allow test code to have an offset on each device.
If device name ends :number, e.g.
   /dev/sda0:1234

then assume the RAID data starts that many sectors from start of
device.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:09:29 +11:00
NeilBrown 89ced23b58 Assemble: improve efficacy of -Af in assembling degraded dirty arrays.
If a degraded dirty array has some superblocks which are clean and
others that are dirty, and the dirty ones are newer by precisely '1'
in the event count, then the current code to force the array to be
clean will not work.
We need to make sure to find a superblock with most recent event count
and force that one to be 'clean'.

Reported-by: A J Wyborny <ajwyborny@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-23 11:07:27 +11:00
NeilBrown 666bba9b50 mdadm.man: added encouragement to shrink filesystem before array.
Suggesting by Rory Jaffe <rsjaffe@gmail.com> to make the danger
of shrinking, and to recommended avoidance technique, more explicit.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 15:24:03 +11:00
NeilBrown b0edee6efb ddf: implement remove_from_super
This is needed to remove devices from mdmon's knowledge when the
device is removed from the md container.

Now that ddf have a remove_from_super we don't need the code
that allows some personalities not to implement this.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 15:10:32 +11:00
Labun, Marcin 258c3e921b IMSM: Fix problem in mdmon monitor of using removed disk in imsm container.
Manager thread shall pass the information to monitor thread (mdmon)
that some devices are removed from container.  Otherwise, monitor
(mdmon) might use such devices (spares) to rebuild the array that has
gone degraded.

This problem happens for imsm containers, since a list of the
container disks is maintained in intel_super structure. When array
goes degraded, the list is searched to find a spare disks to start
rebuild.  Without this fix the rebuild could be stared on the spare
device that was a member of the container, but has been removed from
it.

New super type function handler has been introduced to prepare
metadata format specific information about removed devices.

int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo)

The message prepared in remove_from_super is later processed by
process_update handler in monitor thread.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 15:09:31 +11:00
NeilBrown 33b0edd78a DDF Allow a RAID1 to be 'partially optimal'.
If a RAID1 is meant to have more than 2 device and while it doesn't
have that many, it still has more than 1, then according to the
DDF spec it is "partially optional" rather than "degraded"
So make that so.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 15:09:24 +11:00
NeilBrown c7079c8441 ddf: remove failed devices that are no longer in use.
The DDF spec requires we have a phys disk record for every physically
attached device.  But it isn't clear what that means in the case
of soft raid in a general purpose Linux computer.
So remove phys disk records for any failed device that is not
active in any array.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 15:02:49 +11:00
NeilBrown 8401644c3a ddf: set Rebuilding flag when adding devices to a degraded array
This is a big fragile, but DDF has wierd rules that we aren't really
set up to handle properly.

When we add a device to a degraded array it must be a spare, so
mark it as Rebuilding.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 14:57:46 +11:00
NeilBrown e5cc7d469f ddf: use correct loop variable in activate_spare
Using 'i' when you mean 'j' just shows how silly it is to use
variables named 'i' and 'j'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 14:54:46 +11:00
NeilBrown 77632af906 ddf: Don't consider 'dl' entries with state_fd < 0
These have been marked as invalid (recently failed) so
don't trust the major/minor associated with them.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 14:53:00 +11:00
NeilBrown 0c4f6e378b managemon: Don't do spare assignment while any updates are pending.
Spare assignment requires full knowledge of array state.  A pending
update might modify that state (such as a pending spare assignment)
so don't try while there are updates pending.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 14:51:12 +11:00
NeilBrown 02c39ab1d5 Manage/external: for external metadata, add_to_super needs lock on container.
add_to_super could use information from the current superblock (ddf
does), so add_to_super for external metadata should be called with
the O_EXCL lock held on the container to ensure the update is complete
before any other process tries to make any changes (like adding
another device to array).

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-15 14:48:20 +11:00
NeilBrown 1502a43a08 ddf: set vcnum correctly when creating a new virtual device in conflist
We weren't setting ->vcnum at all when an array was added.  This
meant that a subsequent device failure could be assigned to the
wrong array.

Reported-by: Albert Pauw <albert.pauw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:47:47 +11:00
NeilBrown e1316fab98 ddf: teach set_disk to cope with new or changed devices.
When set_disk is called, we need to check if the disk has changed or
recently appeared, and update everything properly if it has.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:45:26 +11:00
NeilBrown 8a38cb04de ddf: free_super should be add_list as well.
It is possible there is data and even an open file descriptor
on 'add_list' - so it must be freed too.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:32:38 +11:00
NeilBrown 7590d5623b ddf: minor activate_super fixes.
1/ ignore devices with "state_fd < 0" as these have been removed.
2/ Set update 'length' properly and clear 'space'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:30:34 +11:00
NeilBrown e40512fddb monitor: close recovery_fd when closing state_Fd
These should be open or closed together.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-14 18:24:01 +11:00
NeilBrown 18cb44962d ddf: Failed should suppress Online and others.
so the notes say, so make it so.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 18:14:43 +11:00
NeilBrown d6508f0cfb Manage: be more careful about --add attempts.
If an --add is requested and a re-add looks promising but fails or
cannot possibly succeed, then don't try the add.  This avoids
inadvertently turning devices into spares when an array is failed but
the devices seem to actually work.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:25:40 +11:00
NeilBrown 37e430d163 ddf: remove duplicate container_member setting.
We were setting ->container_member twice in ddf get_info.
Once to currentconf->vcnum,
once to atoi(st->subarray).

Both should be the same.
For consistency with super-intel, use the first.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:24:44 +11:00
NeilBrown 01afe516ce Fix warning about host-endian bitmaps.
Hostendian bitmaps should be warned about on all arch's.
And fix a speeling mistake.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:22:25 +11:00
NeilBrown 0d924c9a93 Grow: give useful message when adding bitmap gives EBUSY.
If adding a bitmap fails with EBUSY, then it is because the array is
currently resyncing/recovering/reshaping.
As this is non-obvious, give a message explaining the fact.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:21:57 +11:00
NeilBrown 1f9476aaf8 Assemble: add --update=no-bitmap
This allows an array with a corrupt internal bitmap to be assembled
without the bitmap.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:21:43 +11:00
NeilBrown 321e575910 Assemble: call remove_partitions later.
We shouldn't call remove_partitions until we have made a really firm
decision to include the device into the array.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:21:19 +11:00
NeilBrown 515afde355 mdmon: don't copy an invalid chunk_size
As chunk_size in mdstat_ent is never set, we shouldn't copy
it into a->info.array.
In fact, it is safest to get rid of the field altogether.

Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:21:03 +11:00
NeilBrown 002a3de3d4 ddf: fail creation of new subarray with same name as old.
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:20:25 +11:00
NeilBrown e42eb35545 Create: report failure if array cannot be started.
We weren't checking the result of writing 'active' to array_state

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:20:07 +11:00
NeilBrown 48b1fc9ddb Grow: disallow placing backup file on array being reshaped.
the tests here aren't perfect, but they could catch some cases.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:19:20 +11:00
NeilBrown 0f3bd5b6c4 Create/grow: improve checks on number of devices.
Check on upper limit of number of devices was in the wrong place.
Result was could not create array with more than 27 devices without
explicitly setting metadata, even though default metadata allows more.

Fixed, and also perform check when growing an array.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:18:26 +11:00
NeilBrown 5101e4681f error check reading of 'degraded' from sysfs.
I'm seen mdadm spinning while failing to read 'degraded'.
This doesn't really fix it, but is a reminder that it needs to be
fixed.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:07:53 +11:00
Krzysztof Wojcik 3275e05ec1 FIX: Reset disk state if disk is missing
If we can't read actual disk state, it shoud be initiated
to 0.
Overwise it may be out of date value resulting false action
later in code (e.g. set disk to improper state).

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:07:04 +11:00
NeilBrown abe7250bc7 open_mddev: open RDONLY if RDWR doesn't work.
If an array is read-only then "mdadm -S"
cannot open it to stop it without this fix.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:07:04 +11:00
NeilBrown a4622ac62c Initialise all of file when opening backup file for reshape.
Due to a miscalculation we didn't initialise the whole file.
There is 4K (8 sectors) for the metadata, then the data.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 17:06:59 +11:00
NeilBrown 66dedd88e6 mdadm.man add encouragement to shrink filesystem before shrinking array.
Before resizing an array with --size or --array-size, then filesystem
should be resized.  mdadm cannot do this so the user should.

Reported-by: Gavin Flower <gavinflower@yahoo.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 16:58:48 +11:00
NeilBrown d4f9ad2de8 Detail: report subarrays of a container properly.
Due to the wrong variable being used, this part of --detail
wasn't working at all.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 16:55:26 +11:00
NeilBrown a821dc7fdf dev_open should always open read-only.
When opening an array to manipulate it we never need to write to the
array and  sometimes it might be read-only so the open for write will
fail.
So always open read-only.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-10 16:54:32 +11:00
NeilBrown fa033beca9 ddf: exclude failed devices from container_content
If a device is failed, then don't include it in the reported
container_content, else it might get included in the array.

Reported-by: Albert Pauw <albert.pauw@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-03 09:04:42 +11:00
NeilBrown c870b7dda3 mdadm.man: remove duplicate documentation for --array-size
We somehow got to version of documentation for --array-size.
So merge them it one.

Reported-by:  Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-02 11:34:49 +11:00
Albert Pauw 866f509fb9 FIX: ReadMe.c -Y option missing in short_options
Hi Neil,

I noticed that the -Y option, as in mdadm -D -Y /dev/md0, doesn't work
but used as --export it works.

So I made a little patch to fix it, but it is simply sticking a Y in the
list of short_options in ReadMe.c.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-27 15:56:54 +11:00
NeilBrown d43494fc3c Teach --assemble --force to handle reshapes a little better.
When we force-assemble an array which is in the middle of a reshape,
we should repeat the reshape of any parts that aren't recorded in
the oldest superblock.

This is unlikely to make a significant difference, but could make
a small difference, and is safer.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-21 11:41:01 +11:00
NeilBrown 47573b0015 Fix regression with removing 'failed' and 'detached' devices.
If a request to remove all 'failed' or 'detached' devices chooses to
remove the first device, it will not actually try the removal and will
skip any following devices.

This fixes it.

Reported-by:  Rémi Rérolle <rrerolle@lacie.com>
Tested-by:  Rémi Rérolle <rrerolle@lacie.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-15 10:45:42 +11:00
Dan Williams f58dd43741 fix extended partition detection
# mdadm --detail --export /dev/md127p1

Before:
MD_LEVEL=raid5
MD_DEVICES=4
MD_METADATA=0.90

After:
MD_LEVEL=raid5
MD_DEVICES=4
MD_CONTAINER=/dev/md0
MD_MEMBER=0
MD_UUID=55746a20:925d24a7:4f9bd7e2:9c9a411f

We parse the symlink target with a format:

../../block/mdXXX/mdXXXpYY

...and need the second '/' from the end of the string to read detect a
'md' device.

Reported-by: Krzysztof Wasilewski <krzysztof.wasilewski@intel.com>
Cc: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-15 10:45:33 +11:00
Luca Berra a2973b6af2 segfault in imsm create with wrong arguments
When calling mdadm -C --metadata=imsm -l 1 /dev/sd..
mdadm segfaults in default_chunk_imsm()
above syntax is incorrect, but mdadm should error instead of segfaulting

Signed-off-by: Luca Berra <bluca@comedia.it>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-13 13:51:07 +11:00
John Robinson cd19c0cf1c mdadm.8: man page improvements concerning reshape and metadata types.
- other differences between 0.90 and 1.x metadata explained
- reshape text enhanced to properly acknowledge shrinks and in-place
  reshapes, particularly in the context of --backup-file.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-02 23:08:00 -04:00
NeilBrown a2ce5a1af1 Fix byte-order conversion in update_super1("assemble")
This code is wrong is several ways, and failed on big-endian machines.
Put in correct endian coversions: 'want' is cpu-order, dev_roles[] is little-endian,
16 bit.

Reported-by: Doug Nazar <nazard.michi@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-09-16 20:58:31 +10:00