Article: 100019230
Last Published: 2019-04-03
Product(s): InfoScale & Storage Foundation
Problem
This article discusses how to replace a failed disk that is under Volume Manager control.
Solution
There are several scenarios for replacing a failed disk.
- Disk failed, but came back online after some time.
- Disk failed, corrupt, needs to be replaced, and spare Disk is already available in the configuration.
- Disk failed, corrupt, needs to be replaced, and new Disk needs to be added to the configuration.
Note: To replace a disk, a disk must be available that is not already in a disk group.
Disk failed, but came back online after some time
Let's assume we have the following configuration.
Two Disks (c1t13d0, c1t14d0) in a Disk group (testdg), one mirrored Volume (testvol01):
c1t13d0s2 auto:cdsdisk disk01 testdg online
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
dg testdg default default 24000 1223026544.30.jerome
dm disk01 c1t13d0s2 auto 65536 35774960 -
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 c1t13d0 ENA
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA
Now, c1t13d0 goes offline because of a SAN issue.
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
- - disk01 testdg failed was:c1t13d0s2
dg testdg default default 24000 1223026544.30.jerome
dm disk01 - - - - NODEVICE
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 DISABLED NODEVICE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 - RLOC
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA
You can see that c1t13d0 is now showing as failed, and the plex is disabled. After some time, the SAN is back online, and the disk is available again on the System.
c1t13d0s2 auto:cdsdisk - (testdg) online
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
- - disk01 testdg failed was:c1t13d0s2
You can see the device is back online, but the disk media record disk01 still shows as failed. First, run vxreattach with the -c option to check whether the disk can be reattached.
# vxreattach -c c1t13d0s2
The following command then reattaches the disk to its old disk media name inside its old disk group, and runs vxrecover in the background if needed.
# vxreattach -rb c1t13d0s2
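When an outage affects several disks, it can help to list every failed disk media record before reattaching. The following is a minimal sketch, assuming the disk listings in this article come from `vxdisk list`. The function name list_failed_disks is hypothetical and is not part of Volume Manager.

```shell
# Hypothetical helper (not a VxVM command): reads `vxdisk list` style output
# on stdin and prints "media_name disk_group old_device" for each failed record.
list_failed_disks() {
    awk '$5 == "failed" && $6 ~ /^was:/ {
        old = $6
        sub(/^was:/, "", old)   # strip the was: prefix from the last column
        print $3, $4, old
    }'
}

# Example (on a host with VxVM installed):
#   vxdisk list | list_failed_disks
```

The old device name in the last column is the name passed to vxreattach above.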
Use vxtask to check the status:
# vxtask list
TASKID PTID TYPE/STATE PCT PROGRESS
206 PARENT/R 0.00% 1/0(1) VXRECOVER disk01 testdg
207 207 ATCOPY/R 04.10% 0/2097152/86016 PLXATT testvol01 testvol01-01 testdg
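To follow the recovery from a script, the percentage column can be pulled out of the vxtask output. This is a sketch only, based on the output format shown above. The function name task_progress is hypothetical.

```shell
# Hypothetical helper: reads `vxtask list` output on stdin and prints
# "task_id percent" for each task line. The PTID column can be empty, so the
# percentage is located by its trailing "%" rather than by column position.
task_progress() {
    awk '$1 != "TASKID" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /%$/) { print $1, $i; break }
        }
    }'
}

# Example (on a host with VxVM installed):
#   vxtask list | task_progress
```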
We should be back to normal after the vxrecover is finished.
c1t13d0s2 auto:cdsdisk disk01 testdg online
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
Disk failed, corrupt, needs to be replaced, and spare Disk is already available in the configuration.
We are taking the same configuration as before, but this time the disk is corrupt and needs to be replaced.
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
- - disk01 testdg failed was:c1t13d0s2
As we already have a disk available (c1t15d0), we can use that for replacement.
There are two ways to achieve this.
- Using vxdiskadm, which replaces the disk by running all the needed commands in the background.
- Running the commands yourself on the command line.
We will show you both.
First, the vxdiskadm way:
Run vxdiskadm and select option 5.
# vxdiskadm
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Get the newly connected/zoned disks in VxVM view
22 Change/Display the default disk layouts
23 Mark a disk as allocator-reserved for a disk group
24 Turn off the allocator-reserved flag on a disk
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: 5
On the next page we select the disk to be replaced, and the disk which we are going to use for the replacement:
Replace a failed or removed disk
Menu: VolumeManager/Disk/ReplaceDisk
Use this menu operation to specify a replacement disk for a disk
that you removed with the "Remove a disk for replacement" menu
operation, or that failed during use. You will be prompted for
a disk name to replace and a disk device to use as a replacement.
You can choose an uninitialized disk, in which case the disk will
be initialized, or you can choose a disk that you have already
initialized using the Add or initialize a disk menu operation.
Select a removed or failed disk [<disk>,list,q,?] list
Disk group: testdg
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
dm disk01 - - - - NODEVICE
Select a removed or failed disk [<disk>,list,q,?] disk01
The following devices are available as possible replacements after being
initialized (or reinitiliazed):
c0t0d0 c1t15d0
You can choose one of these devices to replace disk01.
Choose "none" to abort the replacement of disk01.
Choose a device, or select "none"
[<device>,none,q,?] (default: c0t0d0) c1t15d0
VxVM INFO V-5-2-378
The requested operation is to initialize disk device c1t15d0 and
to then use that device to replace the removed or failed disk
disk01 in disk group testdg.
Continue with operation? [y,n,q,?] (default: y) y
Use FMR for plex resync? [y,n,q,?] (default: n) n
VxVM INFO V-5-2-282
Replacement of disk disk01 in group testdg with disk device
c1t15d0 completed successfully.
Replace another disk? [y,n,q,?] (default: n) n
This initializes the disk, adds it to the disk group with the old disk media name, and runs vxrecover on the volume.
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:cdsdisk disk01 testdg online
dg testdg default default 24000 1223026544.30.jerome
dm disk01 c1t15d0s2 auto 65536 35774960 -
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 c1t15d0 ENA
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA
The only thing left here is to remove the disk to get it physically replaced. You can do this with option 3 in vxdiskadm. We will show you how to add a new disk in the next scenario.
Now, the same can be done manually on the command line.
If we go back to our problem:
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
- - disk01 testdg failed was:c1t13d0s2
Here are the steps needed to replace the Disk on the command line.
We initialize the disk, add it to the disk group with the same disk media name, and then run a recovery in the background:
# vxdisksetup -i c1t15d0 format=cdsdisk
# vxdg -g testdg -k adddisk disk01=c1t15d0s2
# vxrecover -b testvol01
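The three commands above differ only in the disk group, disk media name, replacement device, and volume. The sketch below prints them for review instead of running them. The function name print_replace_cmds is hypothetical, and appending "s2" to the device name is an assumption taken from the Solaris-style names in this example.

```shell
# Hypothetical dry-run helper: prints the commands, it does not run them.
# Arguments: disk group, disk media name, replacement device (without slice),
# and volume. Assumes the disk access name is the device name plus "s2",
# as in the example above.
print_replace_cmds() {
    dg=$1 dm=$2 dev=$3 vol=$4
    echo "vxdisksetup -i ${dev} format=cdsdisk"
    echo "vxdg -g ${dg} -k adddisk ${dm}=${dev}s2"
    echo "vxrecover -b ${vol}"
}

# Example:
#   print_replace_cmds testdg disk01 c1t15d0 testvol01
```

Printing the commands first makes it easy to confirm the device and media names before anything is initialized.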
After the recovery is done:
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:cdsdisk disk01 testdg online
dg testdg default default 24000 1223026544.30.jerome
dm disk01 c1t15d0s2 auto 65536 35774960 -
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 c1t15d0 ENA
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA
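Before removing the old disk entry, it is worth confirming that every plex is back in the ENABLED ACTIVE state. The following is a minimal sketch, assuming the configuration listings in this article come from `vxprint -ht`. The function name unhealthy_plexes is hypothetical.

```shell
# Hypothetical helper: reads `vxprint -ht` style output on stdin and prints
# "plex volume kstate state" for every plex that is not ENABLED/ACTIVE.
# No output means all plexes in the listing look healthy.
unhealthy_plexes() {
    awk '$1 == "pl" && !($4 == "ENABLED" && $5 == "ACTIVE") {
        print $2, $3, $4, $5
    }'
}

# Example (on a host with VxVM installed):
#   vxprint -g testdg -ht | unhealthy_plexes
```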
Finally, we need to remove the corrupt disk from Volume Manager so that it can be physically replaced. We will show you how to add a new disk in the next scenario.
# vxdisk rm c1t13d0s2
Disk failed, corrupt, needs to be replaced, and new Disk needs to be added to the configuration.
In the last two scenarios we replaced the failed disk either with the same disk or with one that was already in the configuration. Now we will show you how to replace a failed disk by physically removing it and installing a new disk.
Here is our configuration:
c1t13d0s2 auto:cdsdisk disk01 testdg online
c1t14d0s2 auto:cdsdisk disk02 testdg online
dg testdg default default 24000 1223026544.30.jerome
dm disk01 c1t13d0s2 auto 65536 35774960 -
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 c1t13d0 ENA
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA
Now, again, c1t13d0 fails.
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
- - disk01 testdg failed was:c1t13d0s2
We first need to remove the failed disk for replacement.
This is option 4 in vxdiskadm:
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Get the newly connected/zoned disks in VxVM view
22 Change/Display the default disk layouts
23 Mark a disk as allocator-reserved for a disk group
24 Turn off the allocator-reserved flag on a disk
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: 4
Then we select the failed disk to be removed.
Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace
Use this menu operation to remove a physical disk from a disk
group, while retaining the disk name. This changes the state
for the disk name to a "removed" disk. If there are any
initialized disks that are not part of a disk group, you will be
given the option of using one of these disks as a replacement.
Enter disk name [<disk>,list,q,?] list
Disk group: testdg
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
dm disk01 - - - - NODEVICE
dm disk02 c1t14d0s2 auto 65536 35774960 -
Enter disk name [<disk>,list,q,?] disk01
VxVM NOTICE V-5-2-371
The following volumes will lose mirrors as a result of this
operation:
testvol01
No data on these volumes will be lost.
VxVM NOTICE V-5-2-381
The requested operation is to remove disk disk01 from disk group
testdg. The disk name will be kept, along with any volumes using
the disk, allowing replacement of the disk.
Select "Replace a failed or removed disk" from the main menu
when you wish to replace the disk.
Continue with operation? [y,n,q,?] (default: y) y
VxVM INFO V-5-2-265 Removal of disk disk01 completed successfully.
Remove another disk? [y,n,q,?] (default: n) n
After this we should see the following output:
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
- - disk01 testdg removed was:c1t13d0s2
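A script can confirm that the disk media record really is in the removed state before anyone pulls the hardware. This is a sketch under the assumption that the listing above comes from `vxdisk list`. The function name dm_status is hypothetical.

```shell
# Hypothetical helper: reads `vxdisk list` style output on stdin and prints
# the status column for the given disk media name (for example "removed").
dm_status() {
    awk -v dm="$1" '$3 == dm { print $5 }'
}

# Example (on a host with VxVM installed):
#   [ "$(vxdisk list | dm_status disk01)" = "removed" ] && echo "safe to pull"
```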
Now you can physically remove the disk and replace it with another one. Once it is replaced, Volume Manager needs to be told that there is a new disk.
Run the following:
# vxdiskconfig
VxVM INFO V-5-2-1401 This command may take a few minutes to complete execution
Executing Solaris command: devfsadm (part 1 of 2) at 10:35:05 BST
Executing VxVM command: vxdctl enable (part 2 of 2) at 10:35:18 BST
Command completed at 10:35:21 BST
vxdiskconfig uses devfsadm to scan the OS for new devices, and then runs vxdctl enable to add them to Volume Manager.
Now we should see a new disk.
c1t13d0s2 auto - - error
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:none - - online invalid
- - disk01 testdg removed was:c1t13d0s2
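The new disk is the one reported as "online invalid", meaning Volume Manager sees it but it is not yet initialized. The sketch below picks such devices out of the listing, assuming the listing comes from `vxdisk list`. The function name uninitialized_disks is hypothetical.

```shell
# Hypothetical helper: reads `vxdisk list` style output on stdin and prints
# the device name of every disk whose status is "online invalid".
uninitialized_disks() {
    awk 'NF >= 2 && $(NF-1) == "online" && $NF == "invalid" { print $1 }'
}

# Example (on a host with VxVM installed):
#   vxdisk list | uninitialized_disks
```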
Run vxdiskadm again, and select option 5 to replace the removed disk:
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Get the newly connected/zoned disks in VxVM view
22 Change/Display the default disk layouts
23 Mark a disk as allocator-reserved for a disk group
24 Turn off the allocator-reserved flag on a disk
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: 5
Now select the removed disk, and the new one to be used as a replacement:
Replace a failed or removed disk
Menu: VolumeManager/Disk/ReplaceDisk
Use this menu operation to specify a replacement disk for a disk
that you removed with the "Remove a disk for replacement" menu
operation, or that failed during use. You will be prompted for
a disk name to replace and a disk device to use as a replacement.
You can choose an uninitialized disk, in which case the disk will
be initialized, or you can choose a disk that you have already
initialized using the Add or initialize a disk menu operation.
Select a removed or failed disk [<disk>,list,q,?] list
Disk group: testdg
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
dm disk01 - - - - REMOVED
Select a removed or failed disk [<disk>,list,q,?] disk01
The following devices are available as possible replacements after being
initialized (or reinitiliazed):
c0t0d0 c1t15d0
You can choose one of these devices to replace disk01.
Choose "none" to abort the replacement of disk01.
Choose a device, or select "none"
[<device>,none,q,?] (default: c0t0d0) c1t15d0
VxVM INFO V-5-2-378
The requested operation is to initialize disk device c1t15d0 and
to then use that device to replace the removed or failed disk
disk01 in disk group testdg.
Continue with operation? [y,n,q,?] (default: y) y
Use FMR for plex resync? [y,n,q,?] (default: n) n
VxVM INFO V-5-2-282
Replacement of disk disk01 in group testdg with disk device
c1t15d0 completed successfully.
Replace another disk? [y,n,q,?] (default: n) n
vxdiskadm automatically runs vxrecover in the background. Once the recovery is done, we can remove the old disk entry.
# vxdisk rm c1t13d0s2
And then we should be back to normal:
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:cdsdisk disk01 testdg online
dg testdg default default 24000 1223026544.30.jerome
dm disk01 c1t15d0s2 auto 65536 35774960 -
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 c1t15d0 ENA
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA
You can do the same from the command line. Instead of using option 4 in vxdiskadm, you use this command to remove the disk:
# vxdg -g testdg -k rmdisk disk01
Then run this to get the OS to scan for the new disk, add it to Volume Manager, and initialize it for use.
# vxdiskconfig
# vxdisksetup -i c1t15d0 format=cdsdisk
Then, run the following to add the new disk with the old disk media name, recover the mirror, and remove the old disk entry from Volume Manager.
# vxdg -g testdg -k adddisk disk01=c1t15d0s2
# vxrecover -b testvol01
# vxdisk rm c1t13d0s2
Once the recovery is done, we should see the following:
c1t14d0s2 auto:cdsdisk disk02 testdg online
c1t15d0s2 auto:cdsdisk disk01 testdg online
dg testdg default default 24000 1223026544.30.jerome
dm disk01 c1t15d0s2 auto 65536 35774960 -
dm disk02 c1t14d0s2 auto 65536 35774960 -
v testvol01 - ENABLED ACTIVE 2097152 SELECT - fsgen
pl testvol01-01 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk01-01 testvol01-01 disk01 0 2097152 0 c1t15d0 ENA
pl testvol01-02 testvol01 ENABLED ACTIVE 2097152 CONCAT - RW
sd disk02-02 testvol01-02 disk02 2097152 2097152 0 c1t14d0 ENA