Managing hardware RAID
Objective
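On a server with a hardware RAID configuration, the RAID array is managed by a physical component called a RAID controller. This guide explains how to check the state of the RAID array, its disks and its controller from the command line.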
Requirements
- a dedicated server with a hardware RAID configuration
- administrative (sudo) access to the server via SSH
It is not advisable to reconfigure your RAID controller using MegaCli and lsiutil if you're unfamiliar with these tools, as you could risk losing your data. Please make a backup before making any changes.
Instructions
Using the MegaRaid RAID controller
Step 1: Retrieve RAID information
Before checking your RAID state, make sure that your server has a MegaRaid controller:
This confirms the server has a MegaRaid controller installed.
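You can wrap this check in a small helper for scripts. The sketch below is minimal, and the function name has_megaraid is hypothetical. It reads lspci output on standard input and matches "megaraid" case-insensitively, on the assumption that the controller appears under that name in lspci.

```bash
# Hypothetical helper: succeeds (exit 0) if the lspci output read on stdin
# mentions a MegaRAID controller, fails otherwise.
has_megaraid() {
    grep -qi 'megaraid'
}

# Example:
#   lspci | has_megaraid && echo "MegaRaid controller found"
```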
To gather and list available RAID arrays, you can use the MegaCli command:
In this example, we can see two virtual drives, each composed of two physical hard drives, for a total of four physical disks. The RAID status is "Optimal", which means the RAID is functioning correctly.
If the RAID status is "Degraded", we recommend that you also check the state of the hard drives.
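The following sketch automates the "Optimal" check. It assumes the virtual drive listing is produced by MegaCli -LDInfo -Lall -aALL, with each state printed on a line such as "State               : Optimal". The binary may also be installed as MegaCli64 or megacli. The function name check_ld_states is hypothetical. It prints every state it finds and returns 1 if any of them is not "Optimal", or 2 if no state line was found.

```bash
# Hypothetical helper: reads virtual drive info on stdin (assumed format:
# "State    : <value>"), prints each state, and returns non-zero if any
# virtual drive is not Optimal (1) or if no state line was found (2).
check_ld_states() {
    awk '
        /^State[ \t]*:/ {
            state = $0
            sub(/^State[ \t]*:[ \t]*/, "", state)   # drop the label
            sub(/[ \t\r]+$/, "", state)             # drop trailing blanks
            print state
            seen = 1
            if (state != "Optimal") bad = 1
        }
        END {
            if (!seen) exit 2
            exit bad + 0
        }
    '
}

# Example (binary name is an assumption):
#   MegaCli -LDInfo -Lall -aALL | check_ld_states || echo "check the drives"
```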
Step 2: Determine the disk's state
First, list the device ID of each drive so that you can test them fully with smartmontools:
Then, test each hard drive with the smartctl command from smartmontools:
In this example, /dev/sda is the first RAID array and /dev/sdb is the second.
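To extract only the IDs, you can filter the physical drive listing. This sketch assumes the listing comes from MegaCli -PDList -aALL and that each drive has a line of the form "Device Id: N". The function name megaraid_device_ids is hypothetical.

```bash
# Hypothetical helper: prints one MegaRaid device ID per line from a
# physical drive listing read on stdin (assumed "Device Id: N" lines).
megaraid_device_ids() {
    awk -F':' '/^Device Id[ \t]*:/ { id = $2; gsub(/[ \t\r]/, "", id); print id }'
}

# Example (binary name is an assumption):
#   MegaCli -PDList -aALL | megaraid_device_ids
```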
In some situations, you may receive this output:
```console
/dev/sda [megaraid_disk_00] [SAT]: Device open changed type from 'megaraid' to 'sat'
```
In that case, replace megaraid with sat+megaraid in the device type:
```bash
smartctl -d sat+megaraid,N -a /dev/sdX
```
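You can automate this fallback in a script. The sketch below runs smartctl with -d megaraid,N first. If the output contains the "changed type from 'megaraid' to 'sat'" message shown above, it retries with -d sat+megaraid,N. The function name smart_megaraid_auto and the SMARTCTL override are hypothetical. Note that the exit status of smartctl is not preserved when the first run's output is simply printed.

```bash
# Hypothetical helper: SMART report for device ID $2 behind block device $1,
# retrying with sat+megaraid when smartctl reports the type change.
smart_megaraid_auto() {
    local dev="$1" id="$2" out
    out=$("${SMARTCTL:-smartctl}" -d "megaraid,$id" -a "$dev" 2>&1)
    if printf '%s\n' "$out" | grep -q "changed type from 'megaraid' to 'sat'"; then
        "${SMARTCTL:-smartctl}" -d "sat+megaraid,$id" -a "$dev"
    else
        # First attempt worked: show its output (exit status not preserved).
        printf '%s\n' "$out"
    fi
}

# Example with an illustrative device ID:
#   smart_megaraid_auto /dev/sda 0
```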
If one of your hard drives is showing SMART errors, you should perform a full backup of your data as soon as possible and contact our support team. Our support team will need the slot number and device ID in order to identify the faulty disk.
Step 3: Verify the health of the RAID controller
To make sure that your RAID controller is working correctly, list all of its information with this command:
The most important section of the output is the error counter:
If any error counter is greater than zero, create a backup of your data and contact our support team with the full output. The support team will then schedule an intervention to replace the RAID controller.
For a more concise output that shows only the error counters, pipe the command through grep:
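To turn the "more than zero" rule into a single pass/fail check, you can add up the counters. This sketch assumes the adapter information comes from MegaCli -AdpAllInfo -aALL and that each counter is printed as a line ending in "Errors : N", such as "Memory Correctable Errors : 0". The function name controller_error_total is hypothetical. It prints the total and returns 1 when the total is greater than zero.

```bash
# Hypothetical helper: sums "... Errors : N" counters read on stdin,
# prints the total, and returns 1 if it is greater than zero.
controller_error_total() {
    awk -F':' '
        /Errors[ \t]*:[ \t]*[0-9]+[ \t]*$/ {
            n = $2
            gsub(/[ \t]/, "", n)
            total += n
        }
        END { print total + 0; exit (total > 0) }
    '
}

# Example (binary name is an assumption):
#   MegaCli -AdpAllInfo -aALL | controller_error_total || echo "contact support"
```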
Step 4: Resynchronising the RAID
If one or more hard drives have been replaced, the RAID will resynchronise automatically. Use this command to see which hard drives are currently rebuilding:
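To list only the drives that are rebuilding, in the "enclosure:slot" form that the progress command needs, you can filter the physical drive listing. This sketch assumes that MegaCli -PDList -aALL prints "Enclosure Device ID:", "Slot Number:" and "Firmware state:" lines for each drive, and that a rebuilding drive shows "Firmware state: Rebuild". The function name rebuilding_drives is hypothetical.

```bash
# Hypothetical helper: prints "enclosure:slot" for every drive whose
# firmware state is Rebuild, from a physical drive listing on stdin.
rebuilding_drives() {
    awk -F':' '
        /^Enclosure Device ID[ \t]*:/          { enc = $2;  gsub(/[ \t\r]/, "", enc) }
        /^Slot Number[ \t]*:/                  { slot = $2; gsub(/[ \t\r]/, "", slot) }
        /^Firmware state[ \t]*:[ \t]*Rebuild/  { print enc ":" slot }
    '
}

# Example (binary name is an assumption):
#   MegaCli -PDList -aALL | rebuilding_drives
```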
To monitor the progress of the rebuild operation, you can use this command:
This command requires the enclosure ID and slot ID of the drive, which are shown in the output of the previous command.
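If several drives are rebuilding, you can query each one in turn. This sketch assumes the progress command is MegaCli -PDRbld -ShowProg -PhysDrv [E:S] -aALL. The function name rebuild_progress and the MEGACLI override variable are hypothetical. The override exists because the binary may be installed as MegaCli64 or megacli.

```bash
# Hypothetical helper: shows rebuild progress for each "enclosure:slot"
# passed as an argument. Set MEGACLI if your binary has another name.
rebuild_progress() {
    for drive in "$@"; do
        "${MEGACLI:-MegaCli}" -PDRbld -ShowProg -PhysDrv "[$drive]" -aALL
    done
}

# Example with illustrative enclosure and slot IDs:
#   rebuild_progress 252:1
```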
Step 5a: Using CacheCade
CacheCade is an LSI feature that improves the random read performance of hard drives by using an SSD as a front-end cache.
To check the CacheCade configuration, use the following command:
To see which RAID array is associated with CacheCade, use this command:
Step 5b: Checking the status of the battery backup unit (BBU)
To get a full list of status parameters for the BBU, use this command:
The most important value to check is whether Battery State is Optimal. If there are signs of a failing battery, create a backup of your data and include the output of this command in your support ticket.
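The Battery State check can be scripted. This sketch assumes the BBU status comes from MegaCli -AdpBbuCmd -GetBbuStatus -aALL and contains a line such as "Battery State: Optimal". Output can differ between firmware versions. The function name bbu_state_ok is hypothetical. It prints the state and returns 1 if the state is not "Optimal", or 2 if no Battery State line was found.

```bash
# Hypothetical helper: reads BBU status on stdin, prints Battery State,
# returns 1 if it is not Optimal and 2 if the line is missing.
bbu_state_ok() {
    awk -F':' '
        /^Battery State[ \t]*:/ {
            s = $2
            gsub(/^[ \t]+|[ \t\r]+$/, "", s)   # trim surrounding blanks
            print s
            found = 1
        }
        END {
            if (!found) exit 2
            exit (s != "Optimal")
        }
    '
}

# Example (binary name is an assumption):
#   MegaCli -AdpBbuCmd -GetBbuStatus -aALL | bbu_state_ok || echo "check the BBU"
```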
Using the LSI RAID controller
This RAID controller card is deprecated and no longer available for new servers. It is gradually being replaced by MegaRaid controllers.
Step 1: Retrieve RAID information
Before checking the RAID state, make sure that an LSI RAID controller card is installed with the following command:
This confirms the presence of an LSI RAID controller.
The grep -v megaraid command removes the MegaRaid RAID controller card from the lspci output, as MegaRaid cards are made by LSI Corporation as well.
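The same filter can be expressed as a reusable check. In this minimal sketch, the function name has_lsi_non_megaraid is hypothetical. It reads lspci output on standard input and keeps lines mentioning LSI. It succeeds only if at least one of those lines is not a MegaRaid card. Both matches are case-insensitive, so "MegaRAID" is excluded too.

```bash
# Hypothetical helper: succeeds if the lspci output on stdin lists an LSI
# device that is not a MegaRaid card.
has_lsi_non_megaraid() {
    grep -i 'lsi' | grep -qvi 'megaraid'
}

# Example:
#   lspci | has_lsi_non_megaraid && echo "LSI RAID controller found"
```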
To gather and list available RAID arrays, you can use the lsiutil command:
Caution: the values (1,0 21) may differ depending on the version. Be very careful when using these commands.
In the example above, we can see one virtual drive composed of two physical hard drives. The RAID status is "Optimal", which means the RAID is functioning correctly.
If the RAID status is "Degraded", we recommend that you also check the state of the hard drives.
On a newly provisioned server, you may see this message: [In Progress: data scrub]. This is not an error, but an automated process started by the controller's firmware to minimise uncorrectable errors.
Step 2: Determine the disk's state
To check the state of the hard drives as reported by the RAID controller, use this command:
In this case, both drives show as "Optimal".
Since the LSI card exposes the hard drives through sg (SCSI generic) devices, as listed by sg_map, we must test the /dev/sgX devices that correspond to the hard drives with smartmontools (X being the device number, for example /dev/sg1).
Here's how to list them:
Each line represents an sg device, mapped in the same order as the devices shown here:
To list only the relevant devices with a single command, use the following:
Then, test each hard drive with the smartctl command from smartmontools:
Use the sg device numbers shown in the output of the previous command.
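You can check several sg devices at once with a loop. This sketch runs smartctl -a on each /dev/sgX path you pass. The function name smart_check_sg and the SMARTCTL override variable are hypothetical. The override only exists to allow a dry run, for example SMARTCTL=echo.

```bash
# Hypothetical helper: full SMART report for each sg device passed as an
# argument, e.g. smart_check_sg /dev/sg1 /dev/sg2
smart_check_sg() {
    for sg in "$@"; do
        echo "=== $sg ==="
        "${SMARTCTL:-smartctl}" -a "$sg"
    done
}

# Example with illustrative device numbers:
#   smart_check_sg /dev/sg1 /dev/sg2
```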
If one of your hard drives is showing SMART errors, you should perform a full backup of your data as soon as possible and contact our support team.
Step 3: Resynchronise the RAID
If one or more hard drives have been replaced, the RAID will resynchronise automatically. To check whether the RAID is resynchronising and to monitor its progress, use this command:
Caution: the values (3,0 21) may differ depending on the version. Be very careful when using these commands.
The percentage shown in the command output is NOT the completion percentage; it is the percentage remaining.
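Because the value is the remaining percentage, the completion is 100 minus that value. For example, 30% remaining means 70% complete. The helper below is a trivial sketch of that conversion. The name completion_from_remaining is hypothetical, and it assumes an integer percentage, with or without a trailing "%".

```bash
# Hypothetical helper: converts a remaining percentage (integer, optional
# trailing "%") into the completion percentage.
completion_from_remaining() {
    local remaining
    remaining=${1%\%}          # strip a trailing "%" if present
    echo $((100 - remaining))
}

# Example:
#   completion_from_remaining 30%
```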
Using the 3ware RAID controller
This RAID controller card is deprecated. We highly recommend that you contact the OVHcloud support team to schedule the replacement of your 3ware RAID controller with a MegaRaid controller, as 3ware controllers have proven to be rather unstable. This intervention requires a reinstallation of your server, so be sure to back up your data first.
Go further
Configuring MegaRAID for RAID Level 0