{A} Introduction
Here's a short description of what is supported by the Linux RAID drivers. RAID is not a guarantee of data integrity; it just allows you to keep your data if a disk dies.
The current RAID drivers in Linux support the following levels: linear mode, RAID-0 (striping), RAID-1 (mirroring), RAID-4, RAID-5, RAID-6 and RAID-10.
{B} Swapping on RAID
Swapping on a mirrored RAID can help you survive a failing disk. If a disk fails, data for swapped-out processes would be inaccessible in a non-mirrored environment. In a mirrored environment, the system can keep running even if a disk fails in service.
There's not much reason to use RAID-0 for swap for performance reasons. The kernel itself can stripe swapping across several devices if you just give them the same priority in the /etc/fstab file.
A nice /etc/fstab could look like:
/dev/sda2 none swap defaults,pri=4 0 0
/dev/sdb2 none swap defaults,pri=4 0 0
/dev/sdc2 none swap defaults,pri=4 0 0
/dev/sdd2 none swap defaults,pri=4 0 0
/dev/sde2 none swap defaults,pri=4 0 0
/dev/sdf2 none swap defaults,pri=4 0 0
/dev/sdg2 none swap defaults,pri=4 0 0
This setup lets the machine swap in parallel across seven SAS devices. There is no need for RAID-0, since this has been a kernel feature for a long time.
Another reason to use RAID for swap is high availability. If you set up a system to boot from e.g. a RAID-1 device, the system should be able to survive a disk crash. But if a system without mirrored swap has been swapping on the now-faulty device, it will most likely go down. Swapping on a mirrored RAID partition, such as RAID-1, raid10,n2 or raid10,f2, solves this problem.
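A minimal sketch of such a mirrored swap setup (the names /dev/sda2, /dev/sdb2 and /dev/md1 are placeholders, not from the layout above):
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2   # mirror two swap-sized partitions
mkswap /dev/md1
swapon /dev/md1
# /etc/fstab entry so it comes back after a reboot:
# /dev/md1   none   swap   defaults   0 0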
{C} Spare disks
Spare disks (often called hot spares) are disks that do not take part in the RAID set until one of the active disks fails. When a device failure is detected, that device is marked as "faulty" and reconstruction is immediately started on the first spare disk available.
Once reconstruction to a hot spare begins, the RAID layer will start reading from all the other disks to re-create the redundant information. If multiple disks have built up bad blocks over time, the reconstruction itself can actually trigger a failure on one of the "good" disks. This can lead to a complete RAID failure and is the major reason for using RAID-6 in preference to RAID-5 plus a hot spare.
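For example (device names are hypothetical), a disk added to a healthy, non-degraded array simply becomes a hot spare, which mdadm reports:
mdadm --add /dev/md0 /dev/sde1            # on a non-degraded array this becomes a spare
mdadm --detail /dev/md0 | grep -i spare   # should show /dev/sde1 in the "spare" role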
{D} Faulty disks
The RAID layer handles device failures just fine: crashed disks are marked as faulty, and reconstruction is immediately started on the first spare disk available. If no spare is available, the array runs in "degraded" mode.
Faulty disks still appear and behave as members of the array. The RAID layer just avoids reading/writing them.
If a device needs to be removed from an array for any reason (e.g. pro-active replacement prompted by SMART reports), it must be marked as faulty before it can be removed.
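A sketch of such a pro-active replacement (member name is a placeholder):
mdadm --manage /dev/md0 --fail /dev/sdb1     # mark the member as faulty
cat /proc/mdstat                             # the member now carries an (F) flag
mdadm --manage /dev/md0 --remove /dev/sdb1   # now it may be removed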
{E} RAID setup
Install the "mdadm" package and load the needed modules ("modprobe raid456", "modprobe raid10", etc.). Then you will see:
[root@6 ~]# cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
mdadm has 7 major modes of operation. Normal operation just uses the 'Create', 'Assemble' and 'Monitor' commands - the rest is typically used for fixing or changing your array.
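As one illustration of the 'Monitor' mode (mail address and delay are just example values), mdadm can watch all configured arrays and mail you on failure events:
mdadm --monitor --scan --daemonise --delay=300 --mail=root@localhost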
Create the Partition Table (GPT)
It is highly recommended to pre-partition the disks to be used in the array.
Note: It is also possible to create a RAID directly on raw disks (without partitions), but this is not recommended because it can cause problems when replacing a failed disk.
parted -a optimal /dev/vdX mklabel gpt
parted -a optimal /dev/vdX mkpart primary 1M xM    # x = total_MB - 100
parted -a optimal /dev/vdX set 1 raid on
...
parted -a optimal /dev/vdZ mklabel gpt
parted -a optimal /dev/vdZ mkpart primary 1M xM    # x is the previous x, do not recalculate!
parted -a optimal /dev/vdZ set 1 raid on
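To double-check the result before building the array (vdX is the same placeholder as above), print the table and confirm the kernel sees the new partition:
parted /dev/vdX print    # should list one partition carrying the "raid" flag
cat /proc/partitions     # vdX1 should be visible to the kernel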
Create RAID device
Raid0
mdadm --create --auto=p /dev/mdX --level=0 --raid-devices=26 /dev/vd{a..z}1
#If --auto is not given on the command line or in the config file, then the default will be --auto=yes
#"mdp", "part" or "p" causes a partitionable array (2.6 and later) to be used
Raid1
mdadm --create /dev/mdX --level=1 --raid-devices=2 /dev/vd{a,b}1 --spare-devices=2 /dev/vd{c,d}1
Raid6
mdadm --create /dev/mdX --level=6 --raid-devices=4 /dev/vd{a..d}1 --spare-devices=1 /dev/vde1
Raid10    # RAID-10 with the "--layout=f2" algorithm performs best when reading data
mdadm --create --verbose /dev/mdX --metadata=1.2 --chunk=256 --level=10 --raid-devices=6 --layout=f2 /dev/vd{a..f}1 --spare-devices=2 /dev/vd{g,h}1
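Whichever level you create, verify the new array and watch the initial sync (mdX as above):
mdadm --detail /dev/mdX        # level, state, active and spare members
cat /proc/mdstat               # resync/rebuild progress
watch -n 5 cat /proc/mdstat    # optional: refresh every 5 seconds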
Remember to do this for possible re-assembly in the future:
# echo 'DEVICE partitions' > /etc/mdadm.conf
# mdadm --detail --scan >> /etc/mdadm.conf
This results in something like the following:
root # cat /etc/mdadm.conf
DEVICE partitions
ARRAY /dev/md/0 metadata=1.2 name=pine:0 UUID=27664f0d:111e493d:4d810213:9f291abe
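Depending on the distribution, the initramfs may also need to be refreshed so the array is assembled at boot; a sketch (use the command your distro actually provides):
update-initramfs -u    # Debian/Ubuntu
dracut -f              # RHEL/Fedora/CentOS
mkinitcpio -P          # Arch Linux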
Create partitions on the array (or put LVM on top of it, which is discussed in chapter {H})
Same as with normal disk partitions: use parted OR gdisk, and then format them:
mke2fs -t ext4 -b 4096 /dev/md0_pX ...
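A sketch of mounting one of the new filesystems persistently (the mount point is a placeholder); referring to it by filesystem UUID is more robust than the /dev/md name:
blkid /dev/md0_pX               # get the filesystem UUID
mkdir -p /srv/data
mount /dev/md0_pX /srv/data
# /etc/fstab entry:
# UUID=<uuid-from-blkid>  /srv/data  ext4  defaults  0 2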
Removing devices from an array
mdadm --fail /dev/md0 /dev/sdxx              # mark the member as faulty
mdadm -r /dev/md0 /dev/sdxx                  # remove it from the array
mdadm --zero-superblock /dev/sdxx            # OR: dd if=/dev/zero of=/dev/sdxx bs=1M count=10
Warning: Reusing the removed disk without zeroing the superblock WILL CAUSE LOSS OF ALL DATA on the next boot, because mdadm will try to use it as part of the RAID array again.
mdadm --stop /dev/md0                        # stops the whole array (only when it is no longer needed)
Adding a new device to an array for repair or as a spare (this does not mean growing the number of devices in the array!)
Adding new devices with mdadm can be done on a running system with the devices mounted. Partition the new device using the same layout as others in the same array.
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1
OR
mdadm --assemble --scan --uuid=27664f0d:111e493d:4d810213:9f291abe    # needs an "mdadm.conf" prepared in advance
mdadm --add /dev/md0 /dev/sdc1
Syncing can take a while. If the machine is not needed for other tasks the speed limit can be increased.
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[2] sdb3[1]
      155042219 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  0.0% (77696/155042219) finish=265.8min speed=9712K/sec

unused devices: <none>
Check the current speed limit.
# cat /proc/sys/dev/raid/speed_limit_min
1000
# cat /proc/sys/dev/raid/speed_limit_max
200000
Increase the limits.
# echo 400000 >/proc/sys/dev/raid/speed_limit_min
# echo 400000 >/proc/sys/dev/raid/speed_limit_max
Then check the syncing speed and estimated finish time:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[2] sdb3[1]
      155042219 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  1.3% (2136640/155042219) finish=158.2min speed=16102K/sec

unused devices: <none>
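Values written to /proc do not survive a reboot. The same knobs are exposed as sysctls, so they can be made persistent (the file name below is just an example):
sysctl -w dev.raid.speed_limit_min=400000
sysctl -w dev.raid.speed_limit_max=400000
echo "dev.raid.speed_limit_min = 400000" >> /etc/sysctl.d/90-raid-resync.conf
echo "dev.raid.speed_limit_max = 400000" >> /etc/sysctl.d/90-raid-resync.conf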
{F} Further reading
Calculating the Stride and Stripe-width
The array will have an entry in
# /sys/devices/virtual/block/mdX/queue/optimal_io_size
(where mdX is the name of your array). It will give the stripe-width in bytes. Divide by the block size to get the stripe width in blocks, then divide by number of data disks to get the stride. The following calculations should match this.
Stride = (chunk size/block size)
What is a reasonable chunk size?
Next, calculate:
Stripe-width = (# of physical data disks * stride)
Example: RAID10,far2 (formatting to ext4 with the correct stripe-width and stride)
# cat /sys/devices/virtual/block/md0/queue/optimal_io_size
1048576
The hypothetical RAID10 array is composed of 2 physical disks. Because of the properties of RAID10 in far2 layout, both count as data disks. The chunk size is 512k and the block size is 4k. So the stripe-width should match 1048576 / 4096 = 256, and the stride should match 256 / 2 = 128.
Stride = (chunk size / block size) = 512 / 4 = 128
Stripe-width = (# of physical data disks * stride) = 2 * 128 = 256
# mkfs.ext4 -v -L myarray -m 0.01 -b 4096 -E stride=128,stripe-width=256 /dev/md0
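If the filesystem already exists, the same hints can be updated in place with tune2fs instead of reformatting (a sketch; note that tune2fs spells the option stripe_width):
tune2fs -E stride=128,stripe_width=256 /dev/md0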
{G} How to replace a broken disk?
Remove all usage of the failed disk
- mdadm --manage /dev/mdX --remove /dev/sdX
- umount /dev/sdX*
(FIRST) Remove the data cable of the failed disk
(SECOND) Remove the power cable of the failed disk
- Force system to re-scan
- echo "- - -" > /sys/class/scsi_host/hostX/scan # For all "X"
- tail -f /var/log/syslog OR journalctl -kf # is a good idea
Replace the failed disk
(FIRST) Connect the power cable of the new disk (and wait some seconds)
(SECOND) Connect the data cable of the new disk
- Force system to re-scan
- echo "- - -" > /sys/class/scsi_host/hostX/scan # For all "X"
- tail -f /var/log/syslog OR journalctl -kf # is a good idea
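Once the new disk has been detected (its name will show up in the syslog/journal output above; /dev/sdY below is a placeholder), give it the same partition layout as the surviving members and add it back; the array then rebuilds onto it:
parted -a optimal /dev/sdY mklabel gpt
parted -a optimal /dev/sdY mkpart primary 1M xM    # same x as the other members
parted -a optimal /dev/sdY set 1 raid on
mdadm --manage /dev/mdX --add /dev/sdY1
cat /proc/mdstat                                   # watch the rebuild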
{H} Linux LVM
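LVM is not covered in detail here (see the references below), but a minimal sketch of putting LVM on top of the array looks like this; the volume-group and logical-volume names vg0/lv0 and the size are placeholders:
pvcreate /dev/md0              # make the whole array a physical volume
vgcreate vg0 /dev/md0          # one volume group on top of it
lvcreate -L 20G -n lv0 vg0     # a 20 GiB logical volume
mkfs.ext4 /dev/vg0/lv0
mount /dev/vg0/lv0 /mnt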
REFERENCE
- https://wiki.archlinux.org/index.php/RAID
- https://raid.wiki.kernel.org/index.php/Linux_Raid
- https://wiki.gentoo.org/wiki/LVM
- https://wiki.archlinux.org/index.php/LVM