Commit Graph

110 Commits

Author SHA1 Message Date
NeilBrown 38376c2e51 Monitor: handle v.quick removal of devices better.
If a device fails and then is removed before Monitor sees
the failure, GET_DISK_INFO returns nothing so Monitor relies
on mdstat info where '_' is incorrectly interpreted as 'a spare'.

We should treat '_' as 'removed' - that is safer.

Without this, a v.quick fail+remove gets reported as 'Failed' then
'SpareActive'.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-22 14:47:55 +11:00
Adam Kwolek 983fff45a1 FIX: ping_monitor() usage causes memory leaks
When for ping_monitor() input devnum2devname() is used,
received string pointer should be passed to free() for memory release.
It is not made in several places. This use case should have function
to avoid memory leak.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-03-18 12:32:16 +11:00
NeilBrown 71204a5029 Various compile fixes.
Make "make everything" succeed.
This fixed some real bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 15:48:03 +11:00
NeilBrown e5508b361d Allow domain_test to report that no domains were found.
Sometime we will need to know the difference between no domains found
and domains didn't match.
So allow domain_test to return different values and fix up all callers
to maintain current behaviour.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 14:44:02 +11:00
Czarnowska, Anna bfd76b9309 Monitor: do not move partitions to external container
Arrays on partitions are not supported for external metadata
so do not take such spare from native array.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-02-01 10:40:56 +11:00
Czarnowska, Anna a1e49d6956 Monitor: avoid adding too many spares to container
Tests revealed that sometimes there are still more spares taken
than needed. The reason for this is that after adding one spare
to container with degraded subarray
if between ioctl in main loop and load_container in try_spare_migration
mdmon activates the spare we see active<raid but find no spares in parent
container and so add an extra spare.

To prevent such behaviour we count active disks in the list returned
by getinfo_super_disks and compare it with subarray->active.
If the number has increased it means new spare was added and activated
so there is no need for more.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-28 11:18:57 +10:00
Czarnowska, Anna 300f503323 fix: Monitor: min_size must be set to 0
Otherwise a random value will be used for comparison later
for native and ddf metadata (until min_acceptable_spare_size is defined).

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-17 12:46:14 +11:00
Czarnowska, Anna c0dc0ad5f8 fix: segfault if subarray is monitored but container is not
In this situation to->parent is null so "to" doesn't change to
parent container and to->metadata is still null.
This results in segmentation fault when checking
to->metadata->ss->external.
We should just skip this array as container is needed to move spares to.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-17 12:01:11 +11:00
Anna Czarnowska de697acc4c Monitor: skip array if error getting size
load_super tries to load container first anyway but if it fails
eg. after physically removing a disk
then it tries to read metadata from container device.
This will always fail and print confusing errors.
So use load_container instead of load_super on container.

On failure to read metadata we should skip this array.
It will be dealt with the next time round.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-12 16:16:14 +11:00
Anna Czarnowska d52bb542d4 move_spare function modified and moved to Manage.c
It will also be needed for Incremental.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-05 14:34:32 +11:00
Anna Czarnowska 326727d9c9 Use one function chosing spares from container
container_chose_spares in Monitor.c and
get_spares_for_grow in super-intel.c
do the same thing: search for spares in a container.

Another version will also be needed for Incremental
so a more general solution is presented here and
applied in two previous contexts.

Normally domlist==NULL would lead an empty list but
this is typically checked earlier so here it is interpreted
as "do not test domains".

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-01-05 14:34:14 +11:00
Marcin Labun 5ec0f3738a Monitor: Check destination array domain early.
Destination arrays that do not have any domains are excluded
from spare sharing. We can check it early, without searching
for donor arrays.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-21 09:07:08 +11:00
Anna Czarnowska 44d337f04d fix: Monitor doesn't return after starting daemon
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-15 15:51:53 +11:00
NeilBrown 833bb0f8f6 Allow --update=devicesize with --re-add
This is useful with 1.1 and 1.2 metadata to update the metadata if
the device size has changed.
The same functionality can be achieved by writing to the device size
in sysfs after re-adding normally, but in some cases this might be
easier.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-09 13:06:29 +11:00
Anna Czarnowska e9a2ac028e Monitor: don't add more spares than needed
When we add a spare to a container it takes a while
before it is noticed by mdmon and recovery starts.
During this time the array remains degraded but we don't want to add
any more spares to this container. Therefore we must check container
with degraded array if it doesn't already have a suitable spare.
container_choose_spare is reused with from=to
Domain check is not needed in this situation.

Ping_manager after moving disk is needed to be able to see
newly added disk in container after coming back through the loop.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-03 14:11:29 +11:00
Anna Czarnowska f0b8530630 Monitor: only get min_size once
We may call chose_spare several times before we find a suitable one
so it is better to get the size beforehand.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-03 14:05:11 +11:00
Anna Czarnowska 83f3bc5f04 Monitor: pass statelist reference when adding new arrays
Otherwise it will not get updated.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-12-03 14:03:35 +11:00
Anna Czarnowska ef15641fb5 Monitor: array that has disappeared doesn't need spares
If a degraded array disappears we still have it in statelist
with active<raid but it is pointless to look for spares for it.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:58:22 +11:00
Anna Czarnowska a1bb206520 Monitor: fix writing autorebuild.pid
If /var/run/mdadm doesn't exist we can never succeed writing
so we should try to create it first. When we make sure it is there we
write pid file as before.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:57:41 +11:00
Anna Czarnowska 24baa548c4 Monitor: reset dev when size too small
Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>

Otherwise spare will be considered good anyway.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:56:48 +11:00
Anna Czarnowska 0f0749ad93 Monitor: devid should be dev_t
For consistency with makedev().
int is not sufficient.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:56:28 +11:00
Anna Czarnowska ff044d6ba7 Monitor: few bug fixes for spare migration
1. If array not changed we should still report any degraded
    - another array may have a new spare that we can move.
2. Array with err=1 can't give a spare.
3. We look for spares in "from" not "st" which is supertype
   and has devname=NULL.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-29 09:51:27 +11:00
NeilBrown 5739e0d007 Monitor: choose spare correctly for external metadata.
When metadata is managed externally - probably as a container - we
need to examine that metadata to see which devices are spares.

So use the getinfo_super_disk message and use the info returned.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-25 18:58:27 +11:00
NeilBrown 0fa21e8522 Monitor: separate 'choose_spare' out from 'move_spare'
choosing a spare from a container is more complicated that
from a native array.  So separate out choose_spare to make it easier
to use an alternate implementation

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-25 18:37:23 +11:00
NeilBrown 062dc4817d Monitor: check spare group is non-NULL before adding to domain list
... otherwise we crash.

Reported-by: "Labun, Marcin" <Marcin.Labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-23 11:11:45 +11:00
Anna Czarnowska 80e7f8c31a Monitor: Allow metadata to set minimum size for spare to migrate in.
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown 66f5c4b665 Monitor: teach spare migration about containers
When trying to move a spare, move to the container of a degraded
array, not to the array itself.
And don't try to move from a subarray, only from a native or container
array.
And don't move from a container which contains degraded subarrays.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown e78dda3bf5 Monitor: policy based spare migration.
Rather than only migrating between arrays with the same spare_group,
we now migrate based on domains set in the policy.

In order for spare_group to continue to work, we treat it as a domain
of the destination array, and a domain of any device we might remove
from a source array.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown 2feb22efbc Monitor: split out check_donor
Checking compatibility between arrays for spare migration is going to
become a little more complicated, so split it out into a separate
function.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown 6d3d44d98c Monitor: split out move_spare in spare migration.
This is a simple refactoring with no functionality change.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown e0bd6a9637 Monior: create struct for holding alert info.
Rather than passing mailaddr, mailfrom, cmd, dosyslog around in
argument lists, create a structure to hold them all.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown 9bfc6a7d1a Monitor: use calloc rather than malloc
calloc zeros the memory allocated, which is safer, particularly as
we add more things to struct state.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown a90e1050b5 Monitor: minor optimisation to spare migration.
Only try spare migration if we know that at least one array is
degraded.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
Marcin Labun c3621c0a5f Monitor: link containers with subarrays in statelist
Each containers has list of its subarrays. Each subarray
has back link to its parent container.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown 2e0172b110 Break Monitor into smaller functions.
Monitor() has become way too big.  Break it up into multiple smaller
functions that are all called from the main loop.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown ca750d9830 Monitor: track metadata type or parent/container of arrays.
For subarrays, record the devid of the parent.
For others arrays, record the metadata type.

This will be used in a subsequent patch to link related arrays
together and allow spare migration between containers.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
Anna Czarnowska 5d19bb23dd Monitor: include containers in scan mode
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:07 +11:00
NeilBrown b1717f0afc Monitor: avoid skipping checks on external arrays
utime is not correct for external metadata so we must
not risk the observed time ever matching the old time.

Reported-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
Anna Czarnowska edde9560fa mdadm: added --no-sharing option for Monitor mode
--no-sharing option disables moving spares between arrays/containers.
Without the option spares are moved if needed according to config rules.
We only allow one process moving spares started with --scan option.
If there is such process running and another instance of Monitor
is starting without --scan, then we issue a warning but allow it
to continue.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
Anna Czarnowska 0eac199a2c Monitor: set err on arrays not in mdstat
mse can be NULL when the array was not in mdstat when we read it
but existed in statelist and was recreated after reading mdstat.
In this case we set err as we can't get full update on this array
this time.
If the same array is given twice in command line it appears twice
in statelist. The first one will mark mse->devnum=INT_MAX
so the second one can't find mse. We set err on the second one as
it's not needed. Also if it becomes degraded we would look for spares
twice for the same array.

Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:06 +11:00
NeilBrown a655e55064 Improve type names for mddev_dev
Remove the _t pointer typedef and remove the _s suffix for the
structure,

These things do not help readability.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:05 +11:00
NeilBrown fa56eddbd1 Improve mddev_ident type definitions.
Remove the _t typedef and remove the _s suffix from the struct name.

These things do not help readability.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-11-22 20:58:05 +11:00
NeilBrown f21e18ca89 Compile with -Wextra by default
This produced lots of warning, some of which pointed to actual bugs.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-08-05 13:13:02 +10:00
NeilBrown 7d2e6486e3 Add --test option to --re-add and similar
--test can be given in Manage mode.
This can be used when there is an attempt to fail or remove 'faulty',
'failed' or 'detached' devices, or to re-add 'missing' devices.
If no devices were failed, removed, or re-added, then mdadm will
exit with status '2'.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-06 12:07:07 +10:00
NeilBrown 4460f8f7c3 Monitor: don't report the disappearance of a faulty device as SpareActive.
Normally Monitor doesn't see faulty devices in active slots - they get
moved away too quickly.
But if it does, it reports the "faulty device disappeared" event (when
it finally does get moved away) as SpareActive due to insufficient
checking.

So add a better check.

Reported-by:  Pierre Vignéras <pierre@vigneras.name>
2010-05-18 12:31:29 +10:00
Zdenek Behan 9a36a9b713 Monitor: add option to specify rebuild increments
ie. the percent increments after which RebuildNN event is generated

This is particulary useful when using --program option, rather than
(only) syslog for alerts.

Signed-off-by: Zdenek Behan <rain@matfyz.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-10-19 13:13:58 +11:00
NeilBrown 6278fb3af7 Monitor: use pclose rather than fclose
Using pclose is probably the right thing to do seeing that we
used popen, but as there is no clear need to wait for sendmail
to finish, it isn't really important.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-10 14:39:20 +10:00
NeilBrown 3b435195fc Merge branch 'master' into devel-3.0
Conflicts:
	super0.c
	super1.c
2009-06-02 15:28:36 +10:00
NeilBrown 38a07ed61e Move WaitClean from Monitor.c to sysfs.c
That way mdmon doesn't need to include Monitor.o

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-02 15:27:16 +10:00
NeilBrown e736b62389 Update copyright dates and remove references to @cse.unsw.edu.au
Also removed 'paper' addresses.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-02 14:35:45 +10:00