Update external reshape documentation.
Revise documentation for external reshape, correcting some problems and clarifying some issues.

Signed-off-by: NeilBrown <neilb@suse.de>
parent 6c93202898
commit 8bd67e345e
@@ -35,7 +35,7 @@ raid5 in a container that also houses a 4-disk raid10 array could not be
 reshaped to 5 disks as the imsm format does not support a 5-disk raid10
 representation. This requires the ->reshape_super method to check the
 contents of the array and ask the user to run the reshape at container
-scope (if both subarrays are agreeable to the change), or report an
+scope (if all subarrays are agreeable to the change), or report an
 error in the case where one subarray cannot support the change.
 
 1.3 Monitoring / checkpointing
@@ -77,7 +77,7 @@ specific areas for managing reshape. The implementation also needs to spawn a
 reshape-manager per subarray when the reshape is being carried out at the
 container level. For these two reasons the ->manage_reshape() method is
 introduced. This method in addition to base tasks mentioned above:
-1/ Spawns a manager per-subarray, when necessary
+1/ Processes each subarray one at a time in series - where appropriate.
 2/ Uses either generic routines in Grow.c for md-style backup file
 support, or uses the metadata-format specific location for storing
 recovery data.
@@ -98,6 +98,22 @@ running concurrently with a Create() event.
 
 2.1 Freezing sync_action
 
+Before making any attempt at a reshape we 'freeze' every array in
+the container to ensure no spare assignment or recovery happens.
+This involves writing 'frozen' to sync_action and changing the '/'
+after 'external:' in metadata_version to a '-'. mdmon knows that
+this means not to perform any management.
+
+Before doing this we check that all sync_actions are 'idle', which
+is racy but still useful.
+Afterwards we check that all member arrays have no spares
+or partial spares (recovery_start != 'none') which would indicate a
+race. If they do, we unfreeze again.
+
+Once this completes we know all the arrays are stable. They may
+still have failed devices as devices can fail at any time. However
+we treat those like failures that happen during the reshape.
+
 2.2 Reshape size
 
 1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
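The freeze step described in section 2.1 boils down to a handful of sysfs reads and writes per member array. The following Python sketch runs the sequence against a stand-in directory rather than a real /sys path; `freeze_member` and the file layout are illustrative only, not mdadm's actual implementation:

```python
import os

def freeze_member(md_dir):
    """Sketch of the per-array freeze described in 2.1.

    md_dir stands in for an array's sysfs directory (illustrative,
    not a real /sys path).  Steps: check sync_action is 'idle'
    (racy, as the text notes), write 'frozen', then change the '/'
    after 'external:' in metadata_version to '-' so that mdmon
    performs no management on this array."""
    def read(name):
        with open(os.path.join(md_dir, name)) as f:
            return f.read().strip()

    def write(name, value):
        with open(os.path.join(md_dir, name), "w") as f:
            f.write(value)

    if read("sync_action") != "idle":
        return False                  # a resync/recovery is running: bail out
    write("sync_action", "frozen")
    mv = read("metadata_version")     # e.g. "external:/md127/0"
    if mv.startswith("external:/"):
        write("metadata_version", "external:-" + mv[len("external:/"):])
    return True
```

After freezing every member, the caller would re-check for spares or partial spares (recovery_start != 'none') on all arrays and unfreeze again if a race is detected, as the section above describes.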
@@ -134,24 +150,52 @@ sync_action
 because only redundant raid levels can modify the number of raid disks
 2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the level
 change is allowed (being performed at proper scope / permissible
-geometry / proper spares available in the container) prepares a metadata
-update.
+geometry / proper spares available in the container), chooses
+the spares to use, and prepares a metadata update.
 3/ mdadm::Grow_reshape(): Converts each subarray in the container to the
 raid level that can perform the reshape and starts mdmon.
-4/ mdadm::Grow_reshape(): Pushes the update to mdmon...
-4a/ mdmon::process_update(): marks the array as reshaping
-4b/ mdmon::manage_member(): adds the spares (without assigning a slot)
-5/ mdadm::Grow_reshape(): Notes that mdmon has assigned spares and invokes
-->manage_reshape()
-5/ mdadm::<format>->manage_reshape(): (for each subarray) sets sync_max to
-zero, starts the reshape, and pings mdmon
-5a/ mdmon::read_and_act(): notices that reshape has started and notifies
-the metadata handler to record the slots chosen by the kernel
-6/ mdadm::<format>->manage_reshape(): saves data that will be overwritten by
+4/ mdadm::Grow_reshape(): Pushes the update to mdmon.
+5/ mdadm::Grow_reshape(): uses container_content to find details of
+the spares and passes them to the kernel.
+6/ mdadm::Grow_reshape(): gives the raid_disks update to the kernel,
+sets sync_max, sync_min, suspend_lo, suspend_hi all to zero,
+and starts the reshape by writing 'reshape' to sync_action.
+7/ mdmon::monitor notices the sync_action change and tells
+managemon to check for new devices. managemon notices the new
+devices, opens the relevant sysfs files, and passes them all to
+monitor.
+8/ mdadm::Grow_reshape() calls ->manage_reshape() to oversee the
+rest of the reshape.
+
+9/ mdadm::<format>->manage_reshape(): saves data that will be overwritten by
 the kernel to either the backup file or the metadata specific location,
 advances sync_max, waits for reshape, ping mdmon, repeat.
-6a/ mdmon::read_and_act(): records checkpoints
-7/ mdadm::<format>->manage_reshape(): Once reshape completes changes the raid
+Meanwhile mdmon::read_and_act(): records checkpoints.
+Specifically:
+
+9a/ if the 'next' stripe to be reshaped will over-write
+itself during reshape then:
+9a.1/ increase suspend_hi to cover a suitable number of
+stripes.
+9a.2/ back those stripes up safely.
+9a.3/ advance sync_max to allow those stripes to be reshaped.
+9a.4/ when sync_completed indicates that those stripes have
+been reshaped, manage_reshape must ping_manager.
+9a.5/ when mdmon notices that sync_completed has been updated,
+it records the new checkpoint in the metadata.
+9a.6/ after the ping_manager, manage_reshape will increase
+suspend_lo to allow access to those stripes again.
+
+9b/ if the 'next' stripe to be reshaped will over-write unused
+space during reshape then we apply the same process as above,
+except that there is no need to back anything up.
+Note that we *do* need to keep suspend_hi progressing as
+it is not safe to write to the area-under-reshape. For
+kernel-managed-metadata this protection is provided by
+->reshape_safe, but that does not protect us in the case
+of user-space-managed-metadata.
+
+10/ mdadm::<format>->manage_reshape(): Once reshape completes, changes the raid
 level back to the nominal raid level (if necessary)
 
 FIXME: native metadata does not have the capability to record the original
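The 9a loop added above can be modeled in a few lines. This Python sketch simulates the interplay of suspend_hi, sync_max, sync_completed and suspend_lo over an in-memory list of stripes; the names mirror the sysfs attributes, but the kernel side is faked (sync_completed simply jumps to sync_max), so this illustrates the ordering of the steps, not mdadm's real manage_reshape:

```python
def reshape_with_backup(stripes, window):
    """Toy model of the 9a checkpoint loop: suspend a window of
    stripes, back it up, let the (simulated) kernel reshape it,
    record a checkpoint, then lift the suspension."""
    suspend_lo = suspend_hi = sync_max = 0
    backups, checkpoints = [], []
    while sync_max < len(stripes):
        # 9a.1: extend the suspended region over the next window
        suspend_hi = min(suspend_hi + window, len(stripes))
        # 9a.2: back those stripes up safely before they are over-written
        backups.append(list(stripes[sync_max:suspend_hi]))
        # 9a.3: advance sync_max so the kernel may reshape them
        sync_max = suspend_hi
        # 9a.4: wait for sync_completed (here the "kernel" is instant)
        sync_completed = sync_max
        # 9a.5: mdmon records the new checkpoint in the metadata
        checkpoints.append(sync_completed)
        # 9a.6: raise suspend_lo to allow access to those stripes again
        suspend_lo = sync_completed
    return checkpoints, backups
```

For ten stripes and a four-stripe window this yields checkpoints at 4, 8 and 10, with each backed-up window released for access only after its checkpoint is recorded. The 9b case is the same loop with step 9a.2 omitted.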