Description
This article explains the steps required to replace a faulty boot disk (boot device) under Veritas Volume Manager (VxVM) control on Solaris.
Figure 1.0
In the above example, disk media (dm) name "rootdg01" has failed and needs to be replaced.
Configuration details
# modinfo | grep vx
35 1347360 37f28 308 1 vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
37 7c002000 337840 309 1 vxio (VxVM 5.0-2006-05-11a I/O driver)
39 137b4e0 d48 310 1 vxspec (VxVM 5.0-2006-05-11a control/st)
188 7b7ff338 c30 311 1 vxportal (VxFS 5.0_REV-5.0A55_sol portal )
189 7ae00000 1ba6d0 21 1 vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
# uname -a
SunOS dopey 5.10 Generic_138888-01 sun4v sparc SUNW,T5140
# cat /etc/release
Solaris 10 10/08 s10s_u6wos_07b SPARC
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 27 October 2008
Veritas Volume Manager (VxVM) disk content
# vxdisk -eo alldgs list
DEVICE TYPE DISK GROUP STATUS OS_NATIVE_NAME
c1t0d0s2 auto - - error c1t0d0s2 <<<<<<<<<<< disk to replace
c1t1d0s2 auto - - online c1t1d0s2
c1t2d0s2 auto rootdg02 rootdg online c1t2d0s2
c1t3d0s2 auto rootdg03 rootdg online c1t3d0s2
- - rootdg01 rootdg failed was:c1t0d0s2
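The failed record can also be picked out of the listing with a small filter. The following is a minimal sketch, assuming a POSIX shell and awk; the function name failed_dm_records is hypothetical and not part of VxVM. It only reads text on stdin and changes nothing.

```shell
# List failed disk media (dm) records and the disk access (da) name each one last used.
# Reads `vxdisk list`-style output on stdin; prints "<dm-name> <former-da-name>".
failed_dm_records() {
    awk '$1 == "-" && $5 == "failed" && $6 ~ /^was:/ {
        # strip the leading "was:" (4 characters) from the last column
        print $3, substr($6, 5)
    }'
}

# Hypothetical usage on a live system:
#   vxdisk -eo alldgs list | failed_dm_records
```

For the listing above, this would report rootdg01 together with c1t0d0s2.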
Disk Healthcheck
Boot disk c1t0d0s2 has failed; the disk VTOC can no longer be read.
# prtvtoc /dev/rdsk/c1t0d0s2
prtvtoc: /dev/rdsk/c1t0d0s2: Unable to read Disk geometry errno = 0x5
Current boot device
The system is booted from c1t2d0 in this instance, as shown by the Solaris prtconf command:
# prtconf -vp | grep boot
bootarchive: '/ramdisk-root'
bootfs: fe942968
bootargs: 00
bootpath: '/pci@400/pci@0/pci@8/scsi@0/disk@2,0:a' <<<<<< The Solaris server is currently booted from c1t2d0s2 ( aka rootdg02 )
reboot-command:
auto-boot-on-error?: 'false'
auto-boot?: 'false'
network-boot-arguments:
boot-command: 'boot'
boot-file:
boot-device: '/pci@400/pci@0/pci@8/scsi@0/disk@0,0:a disk net'
multipath-boot?: 'false'
boot-device-index: '0'
error-reset-recovery: 'boot'
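To compare the device the server actually booted from with the first configured boot device, the two values can be extracted from the prtconf output. This is a minimal sketch, assuming a POSIX shell and a sed that understands [[:space:]] (for example /usr/xpg4/bin/sed); both function names are hypothetical.

```shell
# Print the bootpath value from `prtconf -vp`-style output read on stdin.
bootpath_from_prtconf() {
    sed -n "s/^[[:space:]]*bootpath:[[:space:]]*'\([^']*\)'.*/\1/p"
}

# Print the first entry of the boot-device list from the same output.
first_boot_device() {
    sed -n "s/^[[:space:]]*boot-device:[[:space:]]*'\([^' ]*\).*/\1/p"
}

# Hypothetical usage on a live system:
#   prtconf -vp | bootpath_from_prtconf
#   prtconf -vp | first_boot_device
```

In the output above, bootpath ends in disk@2,0:a while the first boot-device entry ends in disk@0,0:a, which shows that the server came up from the mirror rather than from the failed disk.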
NEW to VxVM 6.0
From VxVM 6.0 onwards, the bootpath (the disk the server is actually booted from) can be displayed using the VxVM vxeeprom command:
Sample output
# vxeeprom bootpath
/pci@1c,600000/scsi@2/disk@1,0:a
This saves the need to run O/S specific commands (SOLARIS SPARC only) such as prtconf.
Diskgroup configuration prior to disk replacement
# vxprint -qhtg rootdg
dg rootdg default default 72000 1232444437.8.dopey
dm rootdg01 - - - - NODEVICE <<<<< disk needs to be replaced
dm rootdg02 c1t2d0s2 auto 101759 286596864 -
dm rootdg03 c1t3d0s2 auto 81151 286596864 SPARE
v rootdg017vol - ENABLED ACTIVE 1444992 ROUND - gen
pl rootdg017vol-01 rootdg017vol ENABLED ACTIVE 1444992 CONCAT - RW
sd rootdg03-03 rootdg017vol-01 rootdg03 285151872 1444992 0 c1t3d0 ENA
pl rootdg017vol-02 rootdg017vol ENABLED ACTIVE 1444992 CONCAT - RW
sd rootdg02-03 rootdg017vol-02 rootdg02 285151872 1444992 0 c1t2d0 ENA
v rootvol - ENABLED ACTIVE 251693184 ROUND - root
pl rootvol-02 rootvol ENABLED ACTIVE 251693184 CONCAT - RW
sd rootdg02-02 rootvol-02 rootdg02 33458688 251693184 0 c1t2d0 ENA
pl rootvol-03 rootvol ENABLED ACTIVE 251693184 CONCAT - RW
sd rootdg03-01 rootvol-03 rootdg03 0 251693184 0 c1t3d0 ENA
v swapvol - ENABLED ACTIVE 33458688 ROUND - swap
pl swapvol-02 swapvol ENABLED ACTIVE 33458688 CONCAT - RW
sd rootdg02-01 swapvol-02 rootdg02 0 33458688 0 c1t2d0 ENA
pl swapvol-03 swapvol ENABLED ACTIVE 33458688 CONCAT - RW
sd rootdg03-02 swapvol-03 rootdg03 251693184 33458688 0 c1t3d0 ENA
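Before pulling the disk, it can be worth confirming that every volume still has healthy mirrors on the surviving disks. The following is a minimal sketch, assuming a POSIX shell and awk; the function name active_plexes_per_volume is hypothetical. It counts the plexes reported as ENABLED ACTIVE for each volume. A volume with no such plex does not appear in the result at all.

```shell
# Count ENABLED/ACTIVE plexes per volume from `vxprint -qhtg <dg>`-style output on stdin.
# Prints "<volume> <count>", sorted by volume name.
active_plexes_per_volume() {
    awk '$1 == "pl" && $4 == "ENABLED" && $5 == "ACTIVE" { n[$3]++ }
         END { for (v in n) print v, n[v] }' | sort
}

# Hypothetical usage on a live system:
#   vxprint -qhtg rootdg | active_plexes_per_volume
```

For the configuration shown above, each of the three volumes would be reported with two active plexes, on rootdg02 and rootdg03.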
Steps
1.] As the disk is reported with status "failed was:", its Veritas Disk Access (DA) name can be removed from VxVM's view.
Remove the Veritas Disk Access (DA) name of the disk to be replaced, i.e. c1t0d0s2 in this instance.
# vxdisk rm c1t0d0s2
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t1d0s2 auto:none - - online invalid
c1t2d0s2 auto:sliced rootdg02 rootdg online
c1t3d0s2 auto:sliced rootdg03 rootdg online spare
- - rootdg01 rootdg failed was:c1t0d0s2
2.] View the O/S device handles prior to removing the faulty disk.
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown <<<<<< access path to be removed
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
usb2/1 unknown empty unconfigured ok
usb2/2 usb-storage connected configured ok
usb2/3 unknown empty unconfigured ok
usb2/4 usb-hub connected configured ok
usb2/4.1 unknown empty unconfigured ok
usb2/4.2 unknown empty unconfigured ok
usb2/4.3 unknown empty unconfigured ok
usb2/4.4 unknown empty unconfigured ok
usb2/5 unknown empty unconfigured ok
3.] Disable all the paths relating to the faulty boot disk. In this instance, there is a single path to c1t0d0s2.
# vxdmpadm -f disable path=c1t0d0s2
4.] Unconfigure the O/S device handles.
In this instance, the cfgadm interface can be used to unconfigure the internal boot device instance.
# cfgadm -c unconfigure c1::dsk/c1t0d0
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected unconfigured unknown <<<<<<<<<<<< unconfigured
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
usb2/1 unknown empty unconfigured ok
usb2/2 usb-storage connected configured ok
usb2/3 unknown empty unconfigured ok
usb2/4 usb-hub connected configured ok
usb2/4.1 unknown empty unconfigured ok
usb2/4.2 unknown empty unconfigured ok
usb2/4.3 unknown empty unconfigured ok
usb2/4.4 unknown empty unconfigured ok
usb2/5 unknown empty unconfigured ok
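The Occupant column for a single attachment point can be checked without reading the whole table. This is a minimal sketch, assuming a POSIX shell and an awk that supports -v (nawk or /usr/xpg4/bin/awk on Solaris); the function name occupant_state is hypothetical.

```shell
# Print the Occupant state for one attachment point (Ap_Id)
# from `cfgadm -al`-style output read on stdin.
occupant_state() {
    awk -v ap="$1" '$1 == ap { print $4 }'
}

# Hypothetical usage on a live system:
#   cfgadm -al | occupant_state c1::dsk/c1t0d0
```

After the unconfigure step above, the expected state for c1::dsk/c1t0d0 is unconfigured.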
5.] Clean-up the stale O/S device handles.
# devfsadm -Cvc disk
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s0
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s1
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s2
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s3
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s4
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s5
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s6
devfsadm[27465]: verbose: removing file: /dev/dsk/c1t0d0s7
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s0
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s1
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s2
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s3
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s4
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s5
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s6
devfsadm[27465]: verbose: removing file: /dev/rdsk/c1t0d0s7
6.] Remove the faulty disk.
Faulty disk removed
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
usb2/1 unknown empty unconfigured ok
usb2/2 usb-storage connected configured ok
usb2/3 unknown empty unconfigured ok
usb2/4 usb-hub connected configured ok
usb2/4.1 unknown empty unconfigured ok
usb2/4.2 unknown empty unconfigured ok
usb2/4.3 unknown empty unconfigured ok
usb2/4.4 unknown empty unconfigured ok
usb2/5 unknown empty unconfigured ok
7.] Refresh the VxVM details before the new disk is inserted.
# vxdctl enable
Note: Make sure the Veritas Disk Access (da) name (c1t0d0s2) relating to the faulty boot device is no longer listed by "vxdisk list", prior to inserting the replacement disk.
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t1d0s2 auto:none - - online invalid
c1t2d0s2 auto:sliced rootdg02 rootdg online
c1t3d0s2 auto:sliced rootdg03 rootdg online spare
- - rootdg01 rootdg failed was:c1t0d0s2
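A plain grep for c1t0d0s2 would still match the "failed was:c1t0d0s2" line, so the check in the note is better made against the DEVICE column only. The following is a minimal sketch, assuming a POSIX shell and an awk that supports -v; the function name da_is_listed is hypothetical.

```shell
# Succeed (exit status 0) only if the given DA name appears in the DEVICE column
# of `vxdisk list`-style output read on stdin.
da_is_listed() {
    awk -v da="$1" '$1 == da { found = 1 } END { exit !found }'
}

# Hypothetical usage on a live system:
#   if vxdisk list | da_is_listed c1t0d0s2; then
#       echo "c1t0d0s2 is still known to VxVM; do not insert the new disk yet"
#   fi
```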
8.] Insert the replacement disk
NEW disk inserted
# tail -f /var/adm/messages
<snippet>
Sep 2 21:16:53 dopey genunix: [ID 408114 kern.info] /pci@400/pci@0/pci@8/scsi@0/sd@0,0 (sd0) offline
Sep 2 21:27:36 dopey SC Alert: [ID 394168 daemon.notice] IPMI | minor: ID = 3 : 09/02/2011 : 11:46:55 : Entity Presence : /HDD0/PRSNT : Device Absent
Sep 2 21:27:48 dopey genunix: [ID 408114 kern.info] /pci@400/pci@0/pci@8/scsi@0/sd@0,0 (sd0) offline
Sep 2 21:28:29 dopey SC Alert: [ID 404314 daemon.notice] IPMI | minor: ID = 4 : 09/02/2011 : 11:48:00 : Entity Presence : /HDD0/PRSNT : Device Present
Sep 2 21:28:46 dopey scsi: [ID 193665 kern.info] sd0 at mpt0: target 0 lun 0
Sep 2 21:28:46 dopey genunix: [ID 936769 kern.info] sd0 is /pci@400/pci@0/pci@8/scsi@0/sd@0,0
Sep 2 21:28:46 dopey genunix: [ID 408114 kern.info] /pci@400/pci@0/pci@8/scsi@0/sd@0,0 (sd0) online
Sep 2 21:28:49 dopey SC Alert: [ID 624537 daemon.error] Chassis | major: Hot insertion of HDD0
Sep 2 21:28:51 dopey vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 308/0x18
Sep 2 21:28:51 dopey vxdmp: [ID 736771 kern.notice] NOTICE: VxVM vxdmp V-5-0-148 enabled path 32/0x0 belonging to the dmpnode 308/0x18
<snippet>
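The online/offline transitions in the log can be reduced to the most recent state of the device instance. This is a minimal sketch, assuming a POSIX shell, an awk that supports -v, and the message layout shown in the snippet above; the function name last_sd_state is hypothetical.

```shell
# Print the most recent "online" or "offline" state logged for a device instance
# (for example sd0) in /var/adm/messages-style text read on stdin.
last_sd_state() {
    awk -v dev="$1" '
        index($0, "(" dev ") ") && ($NF == "online" || $NF == "offline") { s = $NF }
        END { if (s != "") print s }'
}

# Hypothetical usage on a live system:
#   last_sd_state sd0 < /var/adm/messages
```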
9.] View the revised O/S device handle content, following the disk replacement.
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown <<<<< New disk seen by leadville (cfgadm) stack
c1::dsk/c1t1d0 disk connected configured unknown
c1::dsk/c1t2d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb1/1 unknown empty unconfigured ok
usb1/2 unknown empty unconfigured ok
usb2/1 unknown empty unconfigured ok
usb2/2 usb-storage connected configured ok
usb2/3 unknown empty unconfigured ok
usb2/4 usb-hub connected configured ok
usb2/4.1 unknown empty unconfigured ok
usb2/4.2 unknown empty unconfigured ok
usb2/4.3 unknown empty unconfigured ok
usb2/4.4 unknown empty unconfigured ok
usb2/5 unknown empty unconfigured ok
9a.] In some cases, you may need to run the cfgadm command manually to pick up the newly presented disk.
# cfgadm -c configure c1::dsk/c1t0d0
10.] Create the OS device handles for the replacement disk.
# devfsadm
# echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848> <<<<< new disk seen by format
/pci@400/pci@0/pci@8/scsi@0/sd@0,0
1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@400/pci@0/pci@8/scsi@0/sd@1,0
2. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@400/pci@0/pci@8/scsi@0/sd@2,0
3. c1t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@400/pci@0/pci@8/scsi@0/sd@3,0
Specify disk (enter its number): Specify disk (enter its number):
11.] Label the new (replacement) disk (i.e. c1t0d0) using the Solaris format utility.
Label the new disk using format
# format c1t0d0
selecting c1t0d0
[disk formatted]
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> p
PARTITION MENU:
0 - change `0' partition
1 - change `1' partition
2 - change `2' partition
3 - change `3' partition
4 - change `4' partition
5 - change `5' partition
6 - change `6' partition
7 - change `7' partition
select - select a predefined table
modify - modify a predefined partition table
name - name the current table
print - display the current table
label - write partition map and label to the disk
!<cmd> - execute <cmd>, then return
quit
partition> p
Current partition table (original):
Total disk cylinders available: 14087 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 2060 20.00GB (2061/0/0) 41945472
1 swap wu 2061 - 3091 10.01GB (1031/0/0) 20982912
2 backup wm 0 - 14086 136.71GB (14087/0/0) 286698624
3 usr wm 3092 - 5152 20.00GB (2061/0/0) 41945472
4 var wm 5153 - 7213 20.00GB (2061/0/0) 41945472
5 unassigned wm 7214 - 9274 20.00GB (2061/0/0) 41945472
6 home wm 9275 - 14086 46.70GB (4812/0/0) 97933824
7 unassigned wm 0 0 (0/0/0) 0
partition> l
Ready to label disk, continue? yes
partition> q
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> q
12.] Refresh VxVM with the new disk content.
# vxdisk scandisks
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 auto:none - - online invalid <<<<<<<<< New disk seen by VxVM
c1t1d0s2 auto:none - - online invalid
c1t2d0s2 auto:sliced rootdg02 rootdg online
c1t3d0s2 auto:sliced rootdg03 rootdg online spare
- - rootdg01 rootdg failed was:c1t0d0s2
13.] Prepare the new disk for VxVM use.
# /etc/vx/bin/vxdisksetup -i c1t0d0 format=sliced noreserve
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 auto:sliced - - online <<<<<<<<< New disk ready for use
c1t1d0s2 auto:none - - online invalid
c1t2d0s2 auto:sliced rootdg02 rootdg online
c1t3d0s2 auto:sliced rootdg03 rootdg online spare
- - rootdg01 rootdg failed was:c1t0d0s2
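Add the initialized disk back into the rootdg diskgroup under the existing disk media name rootdg01 (the -k option reuses the failed disk media record).
# vxdg -g rootdg -k adddisk rootdg01=c1t0d0s2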
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 auto:sliced rootdg01 rootdg online <<<<<<<<< New disk assigned to rootdg diskgroup
c1t1d0s2 auto:none - - online invalid
c1t2d0s2 auto:sliced rootdg02 rootdg online
c1t3d0s2 auto:sliced rootdg03 rootdg online spare
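The final state of step 13 can be verified with one check against the listing. The following is a minimal sketch, assuming a POSIX shell and an awk that supports -v; the function name dm_is_online is hypothetical.

```shell
# Succeed (exit status 0) only if disk access name <da> is online in diskgroup <dg>
# under disk media name <dm>, according to `vxdisk list`-style output on stdin.
# Arguments: <da> <dm> <dg>
dm_is_online() {
    awk -v da="$1" -v dm="$2" -v dg="$3" '
        $1 == da && $3 == dm && $4 == dg && $5 == "online" { ok = 1 }
        END { exit !ok }'
}

# Hypothetical usage on a live system:
#   vxdisk list | dm_is_online c1t0d0s2 rootdg01 rootdg && echo "rootdg01 restored"
```

The check fails for the intermediate state in which the disk is initialized but not yet assigned to a diskgroup.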
Figure 2.0
14.] Change the spare flag status if applicable.
# vxprint -qhtg rootdg
dg rootdg default default 72000 1232444437.8.dopey
dm rootdg01 c1t0d0s2 auto 81407 286617216 - <<<<<< Make the New disk, the spare disk
dm rootdg02 c1t2d0s2 auto 101759 286596864 -
dm rootdg03 c1t3d0s2 auto 81151 286596864 SPARE
v rootdg017vol - ENABLED ACTIVE 1444992 ROUND - gen
pl rootdg017vol-01 rootdg017vol ENABLED ACTIVE 1444992 CONCAT - RW
sd rootdg03-03 rootdg017vol-01 rootdg03 285151872 1444992 0 c1t3d0 ENA
pl rootdg017vol-02 rootdg017vol ENABLED ACTIVE 1444992 CONCAT - RW
sd rootdg02-03 rootdg017vol-02 rootdg02 285151872 1444992 0 c1t2d0 ENA
v rootvol - ENABLED ACTIVE 251693184 ROUND - root
pl rootvol-02 rootvol ENABLED ACTIVE 251693184 CONCAT - RW
sd rootdg02-02 rootvol-02 rootdg02 33458688 251693184 0 c1t2d0 ENA
pl rootvol-03 rootvol ENABLED ACTIVE 251693184 CONCAT - RW
sd rootdg03-01 rootvol-03 rootdg03 0 251693184 0 c1t3d0 ENA
v swapvol - ENABLED ACTIVE 33458688 ROUND - swap
pl swapvol-02 swapvol ENABLED ACTIVE 33458688 CONCAT - RW
sd rootdg02-01 swapvol-02 rootdg02 0 33458688 0 c1t2d0 ENA
pl swapvol-03 swapvol ENABLED ACTIVE 33458688 CONCAT - RW
sd rootdg03-02 swapvol-03 rootdg03 251693184 33458688 0 c1t3d0 ENA
Toggle spare flag from rootdg03 to rootdg01
# vxedit -g rootdg set spare=off rootdg03
# vxedit -g rootdg set spare=on rootdg01
# vxprint -qhtg rootdg
dg rootdg default default 72000 1232444437.8.dopey
dm rootdg01 c1t0d0s2 auto 81407 286617216 SPARE <<<<< SPARE flag set against the newly replaced disk
dm rootdg02 c1t2d0s2 auto 101759 286596864 -
dm rootdg03 c1t3d0s2 auto 81151 286596864 -
v rootdg017vol - ENABLED ACTIVE 1444992 ROUND - gen
pl rootdg017vol-01 rootdg017vol ENABLED ACTIVE 1444992 CONCAT - RW
sd rootdg03-03 rootdg017vol-01 rootdg03 285151872 1444992 0 c1t3d0 ENA
pl rootdg017vol-02 rootdg017vol ENABLED ACTIVE 1444992 CONCAT - RW
sd rootdg02-03 rootdg017vol-02 rootdg02 285151872 1444992 0 c1t2d0 ENA
v rootvol - ENABLED ACTIVE 251693184 ROUND - root
pl rootvol-02 rootvol ENABLED ACTIVE 251693184 CONCAT - RW
sd rootdg02-02 rootvol-02 rootdg02 33458688 251693184 0 c1t2d0 ENA
pl rootvol-03 rootvol ENABLED ACTIVE 251693184 CONCAT - RW
sd rootdg03-01 rootvol-03 rootdg03 0 251693184 0 c1t3d0 ENA
v swapvol - ENABLED ACTIVE 33458688 ROUND - swap
pl swapvol-02 swapvol ENABLED ACTIVE 33458688 CONCAT - RW
sd rootdg02-01 swapvol-02 rootdg02 0 33458688 0 c1t2d0 ENA
pl swapvol-03 swapvol ENABLED ACTIVE 33458688 CONCAT - RW
sd rootdg03-02 swapvol-03 rootdg03 251693184 33458688 0 c1t3d0 ENA
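To confirm which disk currently carries the spare flag, the dm records can be filtered on their last column. This is a minimal sketch, assuming a POSIX shell and awk; the function name spare_disks is hypothetical.

```shell
# Print the disk media names flagged SPARE in `vxprint -qhtg <dg>`-style output on stdin.
spare_disks() {
    awk '$1 == "dm" && $7 == "SPARE" { print $2 }'
}

# Hypothetical usage on a live system:
#   vxprint -qhtg rootdg | spare_disks
```

After the toggle above, only rootdg01 should be reported.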
Process complete.
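For reference, the command sequence used in this article can be printed for a given disk without running anything. The following is a minimal sketch, assuming a POSIX shell and the cXtYdZ device naming used here; the function name print_replace_plan is hypothetical, and the output is meant for review, not for piping into a shell.

```shell
# Print (do not run) the command sequence from this article for one disk.
# Arguments: <disk> <diskgroup> <dm-name>, e.g. c1t0d0 rootdg rootdg01
print_replace_plan() {
    disk=$1
    dg=$2
    dm=$3
    ctrl=${disk%%t*}            # controller part, c1t0d0 -> c1
    ap="${ctrl}::dsk/${disk}"   # cfgadm attachment point
    da="${disk}s2"              # VxVM disk access name
    cat <<EOF
vxdisk rm $da
vxdmpadm -f disable path=$da
cfgadm -c unconfigure $ap
devfsadm -Cvc disk
# physically remove the faulty disk
vxdctl enable
# insert the replacement disk
cfgadm -c configure $ap   # only if the new disk is not configured automatically
devfsadm
format $disk   # label the disk interactively
vxdisk scandisks
/etc/vx/bin/vxdisksetup -i $disk format=sliced noreserve
vxdg -g $dg -k adddisk $dm=$da
EOF
}

# Example:
#   print_replace_plan c1t0d0 rootdg rootdg01
```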