mdadm

Author	SHA1	Message	Date
Song Liu	198d54787c	add crc32c and use it for r5l checksum In kernel space, r5l checksum will use crc32c: http://marc.info/?l=linux-raid&m=144598970529191 mdadm needs to change too. This patch ports a simplified crc32c algorithm from kernel code, and uses it in super1.c:write_empty_r5l_meta_block(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-30 17:38:28 +11:00
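A minimal sketch of a bitwise CRC-32C (Castagnoli) in C, for illustration only; it is not the code this commit ported from the kernel. The function name `crc32c_update` is hypothetical, and the sketch assumes the caller supplies the seed and handles any pre/post inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise CRC-32C using the reflected Castagnoli
 * polynomial 0x82F63B78. No inversion is applied here; the caller passes
 * the seed and inverts the result if it wants the standard CRC-32C. */
uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* Shift right; XOR in the polynomial if the low bit was set. */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}
```

With the common convention of seeding with `~0` and inverting the result, `~crc32c_update(~0u, "123456789", 9)` yields the standard CRC-32C check value `0xE3069283`. A table-driven version, or the SSE4.2 `crc32` instruction, would be faster but produce the same values.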
Song Liu	051f326550	mdadm: refactor write journal code in Assemble and Incremental As discussed, standalone require_journal() in struct superswitch is not a very good idea. Instead, journal related information fits well in struct mdinfo. This patch simplifies journal support code in Assemble and Incremental as: - Add journal_device_required and journal_clean to struct mdinfo; - Remove function require_journal from struct superswitch; - Update Assemble and Incremental to use journal_device_required and journal_clean from struct mdinfo (instead of separate var). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-22 12:19:09 +11:00
Guoqing Jiang	d15a1f72bd	Safeguard against writing to an active device of another node Modifying an existing device's superblock or creating a new superblock on an existing device needs to be checked because the device could be in use by another node in another array. So, we check this by taking all superblock locks in userspace so that we don't step onto an active device used by another node and safeguard against accidental edits. After the edit is complete, we release all locks and the lockspace so that it can be used by the kernel space. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-21 11:19:05 +11:00
Song Liu	69a481166b	Assemble array with write journal Example output: ./mdadm --assemble /dev/md0 /dev/sd[c-f] /dev/sdb1 mdadm: /dev/md0 has been started with 4 drives and 1 journal. mdadm checks the superblock for journal devices. If the journal device is missing or faulty, mdadm will show a warning: ./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 mdadm: Not safe to assemble with missing or stale journal device, consider --force. The user can insist on starting the array (read-only) with --force: ./mdadm --assemble /dev/md0 /dev/sd[c-q] /dev/sdb1 --force mdadm: Journal is missing or stale, starting array read only. mdadm: /dev/md0 has been started with 15 drives. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-19 13:06:15 +11:00
Song Liu	cc1799c3dd	Enable create array with write journal (--write-journal DEVICE). Specify the write journal device with --write-journal DEVICE ./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1 mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md0 started. Only one journal device is allowed. If multiple --write-journal options are given, mdadm will use the first and ignore the others: ./mdadm --create -f /dev/md0 --assume-clean -c 32 --raid-devices=4 --level=5 /dev/sd[c-f] --write-journal /dev/sdb1 --write-journal /dev/sdx mdadm: Please specify only one journal device for the array. mdadm: Ignoring --write-journal /dev/sdx... mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md0 started. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-19 13:06:12 +11:00
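A sketch of the "first one wins" rule for repeated `--write-journal` options. The helper `pick_journal` and its simple argv scan are hypothetical; mdadm itself parses options with `getopt_long`, and its real messages carry an `mdadm:` prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scan: returns the first device given to --write-journal,
 * warning about (and ignoring) any later ones. Returns NULL if none. */
const char *pick_journal(int argc, char **argv)
{
	const char *journal = NULL;

	assert(argc >= 0);
	for (int i = 0; i + 1 < argc; i++) {
		if (strcmp(argv[i], "--write-journal") != 0)
			continue;
		if (!journal) {
			journal = argv[i + 1];   /* keep the first device */
		} else {
			fprintf(stderr, "Please specify only one journal device for the array.\n");
			fprintf(stderr, "Ignoring --write-journal %s...\n", argv[i + 1]);
		}
		i++;   /* skip over the device argument */
	}
	return journal;
}
```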
Song Liu	ed94976d84	Show device as journal in --detail --examine Example output: ./mdadm --detail /dev/md127 /dev/md127: Version : 1.2 Creation Time : Wed May 13 17:01:12 2015 Raid Level : raid5 Array Size : 11720662464 (11177.69 GiB 12001.96 GB) Used Dev Size : 3906887488 (3725.90 GiB 4000.65 GB) Raid Devices : 4 Total Devices : 5 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Wed May 13 17:01:12 2015 State : clean Active Devices : 4 Working Devices : 5 Failed Devices : 0 Spare Devices : 1 Layout : left-symmetric Chunk Size : 32K Name : 0 UUID : 8fb9ee05:3831d52f:e5c23825:28cd6881 Events : 0 Number Major Minor RaidDevice State 0 8 32 0 active sync /dev/sdc 1 8 48 1 active sync /dev/sdd 2 8 64 2 active sync /dev/sde 3 8 80 3 active sync /dev/sdf 4 8 17 - journal /dev/sdb1 ./mdadm -E /dev/sdb2 /dev/sdb2: Magic : a92b4efc Version : 1.2 Feature Map : 0x201 Array UUID : 562b2334:35b9bcc1:add50892:1f30c4bd Name : 0 Creation Time : Thu Aug 27 12:55:26 2015 Raid Level : raid5 Raid Devices : 15 Avail Dev Size : 249796608 (119.11 GiB 127.90 GB) Array Size : 54696423936 (52162.57 GiB 56009.14 GB) Used Dev Size : 7813774848 (3725.90 GiB 4000.65 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : active Device UUID : 5015e522:d39ba566:5909cf3c:9c51f2ff Internal Bitmap : 8 sectors from superblock Update Time : Thu Aug 27 13:16:55 2015 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 4e6fd76d - correct Events : 262 Layout : left-symmetric Chunk Size : 256K Device Role : Journal Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing) Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-19 13:06:07 +11:00
Song Liu	fa7574f6d4	add macros for MD_DISK_ROLE_(SPARE/FAULTY) Replace special disk roles (0xffff, 0xfffe) with macros: #define MD_DISK_ROLE_SPARE 0xffff #define MD_DISK_ROLE_FAULTY 0xfffe Will add macro for journal device in next patch: #define MD_DISK_ROLE_JOURNAL 0xfffd Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-10-19 13:05:59 +11:00
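These role values occupy entries of the 16-bit per-device role table in v1.x metadata. A small C sketch using the three macro values given in the commit; `role_name` is a hypothetical helper, and treating every other value as a slot number is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MD_DISK_ROLE_SPARE   0xffff
#define MD_DISK_ROLE_FAULTY  0xfffe
#define MD_DISK_ROLE_JOURNAL 0xfffd

/* Hypothetical helper: label a role-table entry. Values other than the
 * three special roles are assumed to be the device's raid slot number. */
const char *role_name(uint16_t role)
{
	switch (role) {
	case MD_DISK_ROLE_SPARE:   return "spare";
	case MD_DISK_ROLE_FAULTY:  return "faulty";
	case MD_DISK_ROLE_JOURNAL: return "journal";
	default:                   return "active slot";
	}
}
```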
NeilBrown	86a406c226	super1: Do not create bad block log for clustered devices. We currently have no synchronization techniques for the bad block log, so disable it for the cluster. Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-09-28 12:27:37 +10:00
Goldwyn Rodrigues	6d9c7c2551	Increment version for clustered bitmaps Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels from assembling a clustered device. In order to maximize compatibility, the major version is set to BITMAP_MAJOR_CLUSTERED only if the bitmap is clustered. Also, added MD_FEATURE_CLUSTERED in order to return an error for older kernels which would assemble MD in case the bitmap is corrupted. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-09-28 11:47:04 +10:00
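A hedged sketch of the compatibility scheme: the clustered major version is written only for clustered bitmaps, and a reader that knows only older versions rejects it. `bitmap_major_for`, `reader_accepts`, and the `normal_major`/`max_known` parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BITMAP_MAJOR_CLUSTERED 5

/* Only clustered bitmaps get the new major; everything else keeps the
 * version it would have used before (passed in as normal_major). */
unsigned int bitmap_major_for(bool clustered, unsigned int normal_major)
{
	return clustered ? BITMAP_MAJOR_CLUSTERED : normal_major;
}

/* A reader that only understands versions up to max_known refuses
 * anything newer, which is how an older kernel rejects a clustered bitmap. */
bool reader_accepts(unsigned int major, unsigned int max_known)
{
	return major <= max_known;
}
```

For example, with an older reader whose highest known version is 4 (an assumed value), a clustered bitmap is refused while a non-clustered one is unaffected.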
Guoqing Jiang	2cf42394f0	md-cluster: use %-64s to print cluster_name Left alignment is better for clusters whose names are shorter than 64 characters. It also keeps the output of the cluster name aligned with the other fields. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.com>	2015-07-29 17:26:12 +10:00
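In printf-style formats, `%-64s` sets a minimum field width of 64 and left-aligns the text by padding it on the right. A quick check in C; `format_cluster_name` and the trailing `|` marker are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Pads the name with spaces to 64 columns; the '|' marks the field end. */
int format_cluster_name(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "%-64s|", name);
}
```

The width is a minimum, not a limit: a longer name is printed in full unless a precision such as `%-64.64s` is also given.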
Guoqing Jiang	4a3d29edce	Reuse calc_bitmap_size to reduce code size We can use the newly added calc_bitmap_size function to remove some redundant lines. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 09:44:38 +10:00
Guoqing Jiang	7e6e839a26	mdadm: change the num of cluster node This extends the nodes option for assemble mode so that the number of cluster nodes can be changed by the user. Before that, it is necessary to ensure there is enough space for those nodes; calc_bitmap_size is introduced to calculate the bitmap size of each node. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 09:43:31 +10:00
Guoqing Jiang	0aa2f15b20	mdadm: add the ability to change cluster name To support changing the cluster name, the commit does the following: 1. extend the original write_bitmap function for the new scenario. 2. add a scenario to handle modification of the cluster's name in write_bitmap1. 3. let the cluster name also show in examine_super1 and detail_super1 Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 09:33:39 +10:00
Guoqing Jiang	06bd679317	Skip clustered devices in incremental We want the clustered devices to be started exclusively by a cluster resource-agent. So, avoid starting them with the incremental option. This also keeps a clustered md from starting in inactive mode during boot. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 09:33:18 +10:00
Guoqing Jiang	7716570e6d	Set home-cluster while creating an array The home-cluster is stored in the bitmap super block of the array. The device can be assembled on a cluster whose cluster name is the same as the one recorded in the bitmap. If home-cluster is not specified, it is auto-detected by loading the corosync cmap library with dlopen. neilb: allow code to compile when corosync-devel is not installed. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 09:06:30 +10:00
Guoqing Jiang	529e2aa573	Add nodes option while creating md Specifies the maximum number of nodes in the cluster that may use this device simultaneously. This is equivalent to the number of bitmaps created in the internal superblock (patches to follow). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 09:04:16 +10:00
Guoqing Jiang	95a05b37e8	Create n bitmaps for clustered mode For a clustered MD, create bitmaps equal to the number of nodes so each node has an independent bitmap. Only the first bitmap has the bits set so that the first node that assembles the device also performs the sync. The bitmaps are aligned to 4k boundaries. On-disk format: 0 4k 8k 12k ------------------------------------------------------------------- | idle | md super | bm super [0] + bits | | bm bits[0, contd] | bm super[1] + bits | bm bits[1, contd] | | bm super[2] + bits | bm bits [2, contd] | bm super[3] + bits | | bm bits [3, contd] | | | Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-06-17 07:54:03 +10:00
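A minimal sketch of the per-node layout arithmetic, assuming one bit per bitmap chunk and a bitmap superblock whose size is passed in. `node_bitmap_bytes` and `node_bitmap_offset` are hypothetical helpers, not mdadm's `calc_bitmap_size`; each node's region is rounded up to 4 KiB, as the commit describes.

```c
#include <assert.h>
#include <stdint.h>

#define BM_ALIGN 4096u  /* per-node regions start on 4 KiB boundaries */

/* Hypothetical sizing: superblock plus one bit per chunk, rounded up. */
uint64_t node_bitmap_bytes(uint64_t array_sectors, uint64_t chunk_sectors,
			   uint64_t sb_bytes)
{
	assert(chunk_sectors > 0);
	uint64_t bits = (array_sectors + chunk_sectors - 1) / chunk_sectors;
	uint64_t bytes = sb_bytes + (bits + 7) / 8;

	/* Round up so the next node's bitmap starts on a 4 KiB boundary. */
	return (bytes + BM_ALIGN - 1) / BM_ALIGN * BM_ALIGN;
}

/* Byte offset of node i's bitmap superblock relative to node 0's. */
uint64_t node_bitmap_offset(unsigned int node, uint64_t per_node_bytes)
{
	return (uint64_t)node * per_node_bytes;
}
```

Under these assumptions the space needed for N nodes is N times the per-node size, which is the kind of check required before the number of nodes can be increased.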
NeilBrown	7a862a020f	Don't break long strings onto multiple lines. It is best to keep strings all together so that they are easier to search for in the source code. If a string is so long that it looks ugly on one line, then maybe it should be broken into multiple lines for display too. Only strings which contain a newline can be broken into multiple lines: "It is OK to\n" "break this string\n" Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-12 13:46:53 +11:00
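The rule relies on C's adjacent string literal concatenation: the compiler joins the pieces into a single string, so splitting only at `\n` keeps each displayed line searchable as one piece of source text.

```c
#include <assert.h>
#include <string.h>

/* The two literals below become one 30-character string at compile time;
 * each piece ends at a newline, so each output line stays greppable. */
static const char msg[] = "It is OK to\n"
			  "break this string\n";
```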
NeilBrown	1ade5cc15a	Consistently print program Name and __func__ in debug messages. Make dprintf() print the program name and __func__, so that this messaging is consistent. Also remove all __func__ messages from pr_err(). We shouldn't leak that internal data in error messages. If we really want the function name there, a new pr_XXX might be wanted. Signed-off-by: NeilBrown <neilb@suse.de>	2015-02-12 13:21:17 +11:00
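A sketch of a debug macro that prefixes the program name and `__func__`. `DBG_FMT` is a hypothetical name, chosen because `dprintf` is also a POSIX function declared in `<stdio.h>`; the sketch formats into a buffer so the result is easy to inspect, and it requires at least one argument after the format string.

```c
#include <assert.h>
#include <stdio.h>

#define PROG_NAME "mdadm"

/* fmt must be a string literal so it can be concatenated with the prefix. */
#define DBG_FMT(buf, len, fmt, ...) \
	snprintf((buf), (len), PROG_NAME ": %s: " fmt, __func__, __VA_ARGS__)

/* Hypothetical caller: __func__ expands to "demo_assemble" here. */
int demo_assemble(char *buf, size_t len, int ndevs)
{
	return DBG_FMT(buf, len, "found %d devices\n", ndevs);
}
```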
NeilBrown	21dc47172d	super1: remove some debugging printfs in update_super1 These should never have been there. Signed-off-by: NeilBrown <neilb@suse.de>	2014-11-03 12:56:37 +11:00
NeilBrown	6ac17e734b	super1: make sure 'room' includes 'bbl_size' when creating array. This is needed because we then go ahead and subtract bbl_size from room. Signed-off-by: NeilBrown <neilb@suse.de>	2014-08-21 10:57:55 +10:00
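Why the ordering matters: if `room` and `bbl_size` are unsigned, subtracting a larger `bbl_size` wraps around to a huge value instead of going negative. A hedged sketch of the guard; `room_after_bbl` is hypothetical and simply grows `room` to cover `bbl_size` before subtracting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ensure room covers bbl_size, then subtract, so the
 * unsigned result can never wrap around. Units are sectors. */
uint64_t room_after_bbl(uint64_t room, uint64_t bbl_size)
{
	if (room < bbl_size)
		room = bbl_size;   /* make sure 'room' includes 'bbl_size' */
	return room - bbl_size;
}
```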
NeilBrown	268cccac2e	super1: don't allow adding a bitmap if there is no space. If the data is too close to the superblock there may be no space for a bitmap. If that happens, fail the adding of the bitmap rather than corrupt data. Reported-by: Lars Wijtemans <rhelbugzilla@lars.wijtemans.nl> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=922944	2014-08-15 15:45:54 +10:00
NeilBrown	e2efe9e7bc	config: new option to suppress adding bad block lists. CREATE bbl=no in mdadm.conf will cause any devices added to an array to not have a bad block list. By default they do for 1.x metadata. This is useful if you are suspicious of the bad-block-list implementation. Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org> Signed-off-by: NeilBrown <neilb@suse.de>	2014-08-07 12:23:45 +10:00
NeilBrown	f4dc5e9b7f	super: make sure to ignore disk state flags that we don't understand. This makes it easier to add new flags that some super-types don't understand. Signed-off-by: NeilBrown <neilb@suse.de>	2014-08-07 11:34:50 +10:00
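One way to ignore unknown disk-state flags is to mask them off before inspecting the state. In this sketch the bit numbers are assumed to follow the kernel's `md_p.h`, and `KNOWN_DISK_STATE` and `known_disk_state` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers assumed to match the kernel's md_p.h. */
#define MD_DISK_FAULTY  0
#define MD_DISK_ACTIVE  1
#define MD_DISK_SYNC    2
#define MD_DISK_REMOVED 3

/* Hypothetical mask of the flags this code understands. */
#define KNOWN_DISK_STATE ((1u << MD_DISK_FAULTY) | (1u << MD_DISK_ACTIVE) | \
			  (1u << MD_DISK_SYNC) | (1u << MD_DISK_REMOVED))

/* Drop any state bits we do not recognise before interpreting the rest. */
uint32_t known_disk_state(uint32_t state)
{
	return state & KNOWN_DISK_STATE;
}
```

After masking, a check such as `state == ((1u << MD_DISK_ACTIVE) | (1u << MD_DISK_SYNC))` still holds for a disk that also carries a newer flag.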
Cristian Rodríguez	04f903b21a	mdadm: Do not reimplment offsetof Proper implementations have offsetof in stddef.h Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-22 14:29:14 +10:00
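`offsetof` from `<stddef.h>` gives the byte offset of a struct member, which makes a hand-rolled `((size_t)&((type *)0)->member)`-style macro unnecessary. The struct below is illustrative and is not mdadm's superblock layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative struct only. */
struct demo_sb {
	uint32_t magic;
	uint32_t major_version;
	uint64_t data_offset;
};

/* Standard offsetof: portable, and needs no null-pointer tricks. */
size_t data_offset_pos(void)
{
	return offsetof(struct demo_sb, data_offset);
}
```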
NeilBrown	4c0ea7b0d9	super1: fix setting of data_offset for 1.0 metadata. commit `23bf42cc79` super1: simplify setting of array size. removed the setting for sb->data_offset for 1.0 metadata for some reason, and messed up the size calculation for 1.0 metadata too. Signed-off-by: NeilBrown <neilb@suse.de>	2013-08-14 17:16:35 +10:00
NeilBrown	23bf42cc79	super1: simplify setting of array size. Currently the extra space to leave before the data in the array is calculated in two separate places, and they can be inconsistent. Instead, do it all in validate_geometry. This records the 'data_offset' chosen which all other devices then use. 'write_init_super' now just uses the value rather than doing all the calculations again. This results in more consistent numbers. Also, load_super sets st->data_offset so that it is used by "--add", so the new device has a data offset matching a pre-existing device. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 17:05:47 +10:00
NeilBrown	641da74591	super1: separate to version of _avail_space1(). _avail_space1() is called from both avail_space1() and validate_geometry1() and does slightly different things. The partial code sharing doesn't really help. In particular the responsibility for setting the size of the array is currently confused. So duplicate the code into the two locations - one where 'super' is always NULL (validate_geometry1) and one where it is never NULL (avail_space1), and simplify. No behaviour change - just code re-organisation. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 15:59:03 +10:00
NeilBrown	7ccc4cc4fc	Manage: remove call to validate_geometry. This call to validate_geometry is really rather gratuitous. It is purely about the fact that super0 cannot use more than 4TB. So just make it an explicit test - less confusing that way. With this, validate_geometry is only called from Create, which makes it easier to reason about. Also validate_geometry is now never passed NULL for the 'chunk' parameter, so we can remove those annoying tests for NULL. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-30 13:45:22 +10:00
NeilBrown	2bf62891c1	super0/1: fix typo in error messages. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 12:22:58 +10:00
NeilBrown	419e018284	super1: update data_size when performing "revert-reshape". The "data_size" is with respect to "data_offset". When the kernel changes "data_offset" it modifies "data_size" to match - see md_finish_reshape() in the kernel. So when mdadm switches the data_offset for the new data_offset, it must update data_size correspondingly. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-24 10:21:27 +10:00
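A sketch of the invariant, assuming the end of the data area (`data_offset + data_size`) is what must stay fixed when the offset changes. `struct data_region` and `switch_data_offset` are hypothetical; the field names follow the commit message and the units are sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of fields; data_size is measured from data_offset. */
struct data_region {
	uint64_t data_offset; /* sectors from the start of the device */
	uint64_t data_size;   /* sectors of data after data_offset */
};

/* Move the start of the data area while keeping its end fixed. */
void switch_data_offset(struct data_region *r, uint64_t new_offset)
{
	uint64_t end = r->data_offset + r->data_size;

	assert(new_offset <= end);
	r->data_size = end - new_offset;
	r->data_offset = new_offset;
}
```

Moving the offset from 2048 to 4096 sectors shrinks `data_size` by 2048 sectors, and moving it back restores the original size.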
NeilBrown	efb3994e48	revert-reshape: only impose reshape_position tests on raid[456] This test is irrelevant for RAID10, so restrict it to those levels in which it is meaningful. Signed-off-by: NeilBrown <neilb@suse.de>	2013-07-02 16:10:27 +10:00
NeilBrown	a2836f12c4	revert-reshape: make sure reshape_position is acceptable. We can only revert a reshape if the reshape_position aligns properly for the old geometry. If it doesn't we just fail for now. Also fix a +/- error with updating raid_disks for super1.c Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-27 16:38:53 +10:00
NeilBrown	0ddc35beed	super1: fix space_{before,after} for RAID0 For RAID0 we need to use 'data_size', not 'size', as the latter is 0. Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-24 16:24:08 +10:00
NeilBrown	ccec2685ab	Add test for --update=metadata and fix bug it found. We were not setting device size correctly for raid0. Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-19 16:28:05 +10:00
NeilBrown	1011e8344a	Remove lots of unnecessary white space. Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-) Signed-off-by: NeilBrown <neilb@suse.de>	2013-06-19 12:31:45 +10:00
NeilBrown	26bf55874d	super1: set RESHAPE_NO_BACKUP based on new_offset. We need to check for a backup iff the data_offset has changed. Testing against level==10 was an effective but short-sighted approach. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-28 16:58:18 +10:00
NeilBrown	199f1a1fad	Assemble: allow --update=revert-reshape This will cause a reshape to start going backwards.	2013-05-28 16:44:23 +10:00
NeilBrown	afa368f49a	Assemble: --update=metadata converts v0.90 to v1.0 This allows the smooth conversion of legacy 0.90 arrays to 1.0 metadata. Old metadata is likely to remain but will be ignored. It can be removed with mdadm --zero-superblock --metadata=0.90 /dev/whatever Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-28 16:44:22 +10:00
NeilBrown	d6e4b44fdb	super1: fix some casts of signed superblock fields. These need to be cast to int32_t before being cast to 'long', else sign extension doesn't happen on 64bit hosts. And bitmap_offset is le32, not le64 !! Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-28 16:43:03 +10:00
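A sketch of the conversion, assuming the value has already been converted from little-endian to CPU order. A signed 32-bit field such as the offset of a bitmap placed before the superblock holds a negative number; going through `int32_t` before widening to `long` preserves the sign, whereas widening the unsigned value directly yields a large positive number on 64-bit hosts. `bitmap_offset_sectors` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* cpu_value: the field after le32-to-CPU conversion (byte swap omitted).
 * Converting an out-of-range uint32_t to int32_t is implementation-defined
 * in C; GCC and Clang reduce it modulo 2^32, giving the expected value. */
long bitmap_offset_sectors(uint32_t cpu_value)
{
	return (long)(int32_t)cpu_value;
}
```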
NeilBrown	5e1863d49d	Examine/super1 - report Unused space, before and after. Might be confusing, or might be useful when reshaping. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-22 16:37:19 +10:00
NeilBrown	f79bbf4f69	super1: don't put the bblog at the end of the free space. It seems like a nice location, but it means that we cannot decrease the data_offset during a reshape. So put it just after the bitmap, leaving 32K. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-22 16:00:21 +10:00
NeilBrown	c4b26c643d	Grow: allow metadata to indicate that changing data_offset not supported. If space_after and space_before are zero (the default) then assume that metadata doesn't support changing data_offset. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-22 12:26:19 +10:00
NeilBrown	cc3130a786	super1: improve calculation of space_before/space_after 1/ these must allow for bad-block-list 2/ they must match the kernel, which has a 32k buffer after the superblock. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-21 15:38:49 +10:00
NeilBrown	8772113ab2	Examine/super1: don't report "New Offset" when feature not set. The "new_offset" field may be non-zero, but if the feature flag is not set, it should be ignored. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-21 15:37:20 +10:00
NeilBrown	74db60b00a	Add --dump / --restore functionality. This allows the metadata on a device to be saved and later restored. This can be useful before experimenting on an array that is misbehaving. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-16 15:07:16 +10:00
NeilBrown	0cf8322999	Always test return value of posix_memalign. FORTIFY_SOURCE likes this, and it is good practice. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-13 17:09:55 +10:00
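A sketch of the checked pattern. `posix_memalign` reports failure by returning an error number rather than by setting `errno`, and POSIX does not promise a usable pointer on failure, so the buffer must not be touched until the result is checked. `alloc_aligned_zeroed` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a zeroed buffer aligned to 'align', or NULL on any failure
 * (e.g. ENOMEM, or EINVAL for an alignment that is not a power of two
 * multiple of sizeof(void *)). Free with free(). */
void *alloc_aligned_zeroed(size_t align, size_t size)
{
	void *buf;

	if (posix_memalign(&buf, align, size) != 0)
		return NULL;   /* buf is not reliable here: do not use it */
	memset(buf, 0, size);
	return buf;
}
```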
NeilBrown	5a23a06ea4	mdassemble - fix new compile-time problems. Signed-off-by: NeilBrown <neilb@suse.de>	2013-05-13 17:05:16 +10:00
NeilBrown	4dd2df0966	Discard devnum in favour of devnm We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices. But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string). eg. md0 md_d2 md_home Signed-off-by: NeilBrown <neilb@suse.de>	2013-02-21 17:05:23 +11:00
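A hedged sketch of mapping an old numeric devnum onto the new string form. It assumes the old convention encoded md_dN as devnum = -1 - N; `devnum_to_devnm` is hypothetical, while the 32-byte buffer follows the commit's "32char string".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical conversion: 0 or positive -> "md%d", negative -> "md_d%d". */
void devnum_to_devnm(int devnum, char devnm[32])
{
	if (devnum >= 0)
		snprintf(devnm, 32, "md%d", devnum);
	else
		snprintf(devnm, 32, "md_d%d", -1 - devnum);
}
```

A name such as md_home has no numeric equivalent, which is why the string form replaces the number.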
NeilBrown	def1133297	make --update=homehost work again Commit `1e2b276535` (Report error in --update string is not recognised) broke homehost updating functionality because it depended on each string comparison being done even after we already found a match. Make it work again by restructuring code. Reported-by: (and original version by) Justin Maggard <jmaggard10@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2013-02-10 15:40:42 +11:00

267 Commits