04-02-2013 10:31 AM
Hi all,
I'm in need of some advice, if anyone can help:
I have a few clustered environments of Sun Solaris servers connected to Sun storage arrays.
The version is Storage Foundation HA 5.1 SP1.
The environment contains three volumes, each mirrored across disks from two storage arrays.
It appears that every time I restart the environment, or even just run hastop and hastart, the following happens:
[root@VN-HAN01-PTXASDB-02: /]# vxtask list
TASKID PTID TYPE/STATE PCT PROGRESS
165 PARENT/R 0.00% 1/0(1) VXRECOVER
166 165 ATCOPY/R 00.72% 0/419346432/3010560 PLXATT account_vol account_vol-02 dg_acct
This causes the DG resources in the cluster to not complete the online procedure and eventually fail due to a timeout, even though I can manually start the volumes and mount them.
To work around this, I freeze the SG and start things manually until the task finishes, and then start the resources in the cluster.
My question is: what could be causing this issue?
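The "wait until the task finishes" step of this workaround can be scripted. Below is a minimal Python sketch, not a Veritas tool: it runs "vxtask list", treats every row whose first field is a numeric TASKID as a pending task (as in the output above), and sleeps between polls until no task rows remain. The function names, the 60-second default interval and the optional poll limit are illustrative assumptions.

```python
import subprocess
import time
from typing import Callable, List, Optional


def pending_task_ids(vxtask_output: str) -> List[str]:
    """Return the TASKID of every task row in "vxtask list" output.

    Assumes task rows start with a numeric TASKID, as in the output quoted
    in this thread; the TASKID/PTID/... header line is skipped.
    """
    ids = []
    for line in vxtask_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.append(fields[0])
    return ids


def run_vxtask_list() -> str:
    """Run "vxtask list" and return its standard output."""
    result = subprocess.run(["vxtask", "list"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def wait_until_no_tasks(fetch: Callable[[], str] = run_vxtask_list,
                        poll_seconds: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep,
                        max_polls: Optional[int] = None) -> int:
    """Poll until no tasks are listed; return how many times we slept.

    Raises TimeoutError once max_polls sleeps have passed with tasks
    still listed (no limit when max_polls is None).
    """
    polls = 0
    while pending_task_ids(fetch()):
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"tasks still running after {polls} polls")
        sleep(poll_seconds)
        polls += 1
    return polls
```

On a cluster node, calling wait_until_no_tasks() with no arguments would poll the real vxtask command every 60 seconds; the fetch and sleep parameters exist so the loop can be exercised without VxVM installed.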
Any help would be much appreciated.
Thanks,
Yair
04-02-2013 11:12 AM
Can you provide your main.cf and "vxprint -th" output?
Mike
04-10-2013 03:03 AM
This means plex account_vol-02 of volume account_vol is not fully synchronized with the other plex of the mirror. If you see exactly the same thing every time you run hastart, I think it is related to the way you stop VCS or shut down.
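The PROGRESS column shows how far along that copy is. Judging from the outputs in this thread, it appears to read start/end/current offset: the middle number for backup_vol (733902848) equals the volume length reported by "vxprint -th" later in the thread, and current divided by end reproduces the PCT column (3010560 / 419346432 is about 0.72%, 38580224 / 733902848 about 5.26%). The Python sketch below parses a child task row on that assumption; the field layout is inferred from this output, not taken from Veritas documentation.

```python
from typing import NamedTuple


class Progress(NamedTuple):
    """Offsets from a vxtask PROGRESS field, assumed to be start/end/current."""
    start: int
    end: int
    current: int

    @property
    def percent(self) -> float:
        """Share of the start..end range already covered, in percent."""
        span = self.end - self.start
        return 100.0 * (self.current - self.start) / span if span else 100.0

    @property
    def remaining(self) -> int:
        """Offsets still to copy (sectors, if the volume length is in sectors)."""
        return self.end - self.current


def parse_progress(field: str) -> Progress:
    """Split a field such as "0/419346432/3010560" into its three offsets."""
    start, end, current = (int(part) for part in field.split("/"))
    return Progress(start, end, current)


def child_task_progress(row: str) -> Progress:
    """Parse a child task row: TASKID PTID TYPE/STATE PCT PROGRESS ...

    Parent rows (e.g. the VXRECOVER row) have no PTID and a different
    PROGRESS format, so they are not handled here.
    """
    return parse_progress(row.split()[4])
```

For the ATCOPY row in the original question, child_task_progress(...).percent rounds to 0.72, matching the PCT column.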
Usually it means the data was not in sync between the two plexes of a mirrored volume when the volume was stopped or the DG was deported. Since your mirror spans two disk arrays, poor I/O performance between them may be preventing the mirror from syncing in time before the DG is deported or stopped.
You need to wait for the task on volume account_vol to complete, which you can confirm with "vxprint -ht": the volume's KSTATE and STATE should both show ENABLED ACTIVE. Then check whether the same event happens the next time.
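That check can be automated by reading the volume ("v") records of "vxprint -ht" output. A minimal Python sketch, assuming the column order shown in the "vxprint -th" header later in this thread (V NAME RVG/VSET/CO KSTATE STATE ...); it lists every volume whose KSTATE/STATE pair is anything other than ENABLED ACTIVE, such as backup_vol while it is still ENABLED SYNC. Capturing the real output (for example via subprocess) is left to the caller.

```python
from typing import Dict, List, Tuple


def volume_states(vxprint_output: str) -> Dict[str, Tuple[str, str]]:
    """Map each volume name to its (KSTATE, STATE) pair.

    Volume records start with a lowercase "v"; the uppercase "V" line is
    the column header and is ignored.
    """
    states = {}
    for line in vxprint_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "v":
            states[fields[1]] = (fields[3], fields[4])
    return states


def volumes_not_ready(vxprint_output: str) -> List[str]:
    """Sorted names of volumes that are not ENABLED ACTIVE."""
    return sorted(name for name, state in volume_states(vxprint_output).items()
                  if state != ("ENABLED", "ACTIVE"))
```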
04-14-2013 10:37 AM
Hi Mike and Stinsong,
Thanks for your replies.
The system was on its way to its target datacenter, so I did not have access to the servers until today.
I performed the same test and had the same result. The output Mike requested is detailed below.
I usually either run the standard shutdown command to reboot the servers, or run "hastop -all" and then hastart on each system. That is no different from any other setup I have, and I haven't seen this behaviour elsewhere.
I'll take Stinsong's remark about I/O between the storage arrays into consideration and check it.
Below is an excerpt of main.cf with the service groups containing the DGs that have this issue:
group ACCT-SG (
    SystemList = { PreProd-PTXDB-01 = 0, PreProd-PTXDB-02 = 1 }
    AutoStartList = { PreProd-PTXDB-01, PreProd-PTXDB-02 }
    )

    Application acctd_app (
        User = iprs
        StartProgram = "/usr/local/iprs/bin/acctd_runner.sh start"
        StopProgram = "/usr/local/iprs/bin/acctd_runner.sh stop"
        PidFiles = { "/var/iprs/acctd.pid" }
        RestartLimit = 3
        )

    DiskGroup acct-dg (
        DiskGroup = dg_acct
        )

    IPMultiNIC acct-vip (
        Address = "172.19.41.71"
        NetMask = "255.255.255.240"
        MultiNICResName = IPRS_MNIC_res
        IfconfigTwice = 1
        )

    Mount acct-mnt (
        MountPoint = "/export/account"
        BlockDevice = "/dev/vx/dsk/dg_acct/account_vol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    Proxy acct-nic (
        TargetResName = IPRS_MNIC_res
        )

    acct-mnt requires acct-dg
    acct-vip requires acct-nic
    acctd_app requires acct-mnt
    acctd_app requires acct-vip
.
.
.
.
group IPRS-DB (
    SystemList = { PreProd-PTXDB-01 = 0, PreProd-PTXDB-02 = 1 }
    AutoStartList = { PreProd-PTXDB-01, PreProd-PTXDB-02 }
    )

    DiskGroup backup-dg (
        Critical = 0
        DiskGroup = dg_backup
        )

    DiskGroup ora-dg (
        DiskGroup = dg_data
        )

    IPMultiNIC ora-vip (
        Address = "172.19.41.70"
        NetMask = "255.255.255.240"
        MultiNICResName = IPRS_MNIC_res
        IfconfigTwice = 1
        )

    Mount backup-mnt (
        Critical = 0
        MountPoint = "/export/backup"
        BlockDevice = "/dev/vx/dsk/dg_backup/backup_vol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    Mount ora-mnt (
        MountPoint = "/data1"
        BlockDevice = "/dev/vx/dsk/dg_data/data_vol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    Netlsnr ora-lsnr (
        Owner = oracle
        Home = "/export/oracle/product/11.2.0/dbhome_1"
        TnsAdmin = "/export/oracle/product/11.2.0/dbhome_1/network/admin/"
        Listener = LISTENER
        )

    Oracle ora-iprsdb (
        Sid = IPRSDB
        Owner = oracle
        Home = "/export/oracle/product/11.2.0/dbhome_1"
        Pfile = "/export/oracle/product/11.2.0/dbhome_1/dbs/initIPRSDB.ora"
        )

    Proxy ora-nic (
        TargetResName = IPRS_MNIC_res
        )

    backup-mnt requires backup-dg
    ora-iprsdb requires ora-mnt
    ora-lsnr requires ora-iprsdb
    ora-lsnr requires ora-vip
    ora-mnt requires ora-dg
    ora-vip requires ora-nic
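In both service groups above, each Mount resource requires its DiskGroup resource directly, with no Volume resource between them. A small Python sketch for listing what each Mount depends on; it is regex-based and only understands one-line resource declarations ("Type name (") and simple "parent requires child" lines like those in this excerpt, so it is a quick check, not a general main.cf parser.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# "Mount acct-mnt (" -> type "Mount", name "acct-mnt"
DECL_RE = re.compile(r"^\s*(\w+)\s+([\w.-]+)\s*\(\s*$")
# "acct-mnt requires acct-dg" -> parent "acct-mnt", child "acct-dg"
REQUIRES_RE = re.compile(r"^\s*([\w.-]+)\s+requires\s+([\w.-]+)\s*$")


def mount_children(main_cf: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each Mount resource to its direct children as (name, type)."""
    types: Dict[str, str] = {}
    children: Dict[str, List[str]] = defaultdict(list)
    for line in main_cf.splitlines():
        decl = DECL_RE.match(line)
        if decl:
            types[decl.group(2)] = decl.group(1)
            continue
        dep = REQUIRES_RE.match(line)
        if dep:
            children[dep.group(1)].append(dep.group(2))
    # Unknown child types (declared outside the excerpt) show up as "?".
    return {name: [(child, types.get(child, "?")) for child in children[name]]
            for name, rtype in types.items() if rtype == "Mount"}
```

Run against the ACCT-SG excerpt, this should report acct-mnt with the single child acct-dg of type DiskGroup.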
Below is the output of "vxdg list", "vxtask list", and "vxprint -th", as Mike requested:
[root@PreProd-PTXDB-01: /]# vxdg list
NAME STATE ID
dg_backup enabled,cds 1356387368.26.PreProd-PTXDB-01
[root@PreProd-PTXDB-01: /]#
[root@PreProd-PTXDB-01: /]# vxtask list
TASKID PTID TYPE/STATE PCT PROGRESS
161 PARENT/R 0.00% 1/0(1) VXRECOVER
162 161 RDWRBACK/R 05.26% 0/733902848/38580224 RESYNC backup_vol dg_backup
[root@PreProd-PTXDB-01: /]# vxprint -th
Disk group: dg_backup
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME RVG KSTATE STATE NVOLUME
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
EX NAME ASSOC VC PERMS MODE STATE
SR NAME KSTATE
dg dg_backup default default 4000 1356387368.26.PreProd-PTXDB-01
dm dg_backup01 st2540-3_1 auto 65536 733904640 -
dm st2540-2_1 st2540-2_1 auto 65536 733904640 -
v backup_vol - ENABLED SYNC 733902848 SELECT - fsgen
pl backup_vol-01 backup_vol ENABLED ACTIVE 733902848 CONCAT - RW
sd st2540-2_1-01 backup_vol-01 st2540-2_1 0 733902848 0 st2540-2_1 ENA
pl backup_vol-02 backup_vol ENABLED ACTIVE 733902848 CONCAT - RW
sd dg_backup01-01 backup_vol-02 dg_backup01 0 733902848 0 st2540-3_1 ENA
I only listed the parts related to dg_backup; I can provide similar information for the other DGs if required.
I'll be happy to hear your thoughts.
Thanks,
Yair
04-15-2013 02:52 AM
Yes, yariz. I believe that once the sync task completes, restarting VCS (deporting and importing the DG) will not show the sync again.
04-16-2013 02:43 AM
Please try adding a Volume resource between the Mount and DiskGroup resources. It must be a child of Mount and a parent of DiskGroup. Also set the StartVolumes and StopVolumes attributes of the DiskGroup resource to 0.
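One way to apply this suggestion to ACCT-SG from the command line, sketched in Python so the commands can be reviewed before anyone runs them: the function only builds the list of haconf/hares commands and executes nothing. The resource name "acct-vol" is hypothetical; dg_acct, account_vol, acct-mnt and acct-dg come from the main.cf and vxtask output above. Doing this in a maintenance window, with the group frozen or offline, is a cautious assumption, not part of the original advice.

```python
from typing import List


def volume_resource_commands(group: str, mount_res: str, dg_res: str,
                             vol_res: str, dg_name: str,
                             vol_name: str) -> List[str]:
    """Return (not run) the commands that place a Volume resource
    between an existing Mount and DiskGroup resource."""
    return [
        "haconf -makerw",                                # open config for writing
        f"hares -add {vol_res} Volume {group}",          # new Volume resource
        f"hares -modify {vol_res} DiskGroup {dg_name}",
        f"hares -modify {vol_res} Volume {vol_name}",
        f"hares -unlink {mount_res} {dg_res}",           # drop Mount -> DiskGroup
        f"hares -link {mount_res} {vol_res}",            # Mount -> Volume
        f"hares -link {vol_res} {dg_res}",               # Volume -> DiskGroup
        f"hares -modify {dg_res} StartVolumes 0",
        f"hares -modify {dg_res} StopVolumes 0",
        f"hares -modify {vol_res} Enabled 1",            # new resources start disabled
        "haconf -dump -makero",                          # save and close config
    ]


if __name__ == "__main__":
    for cmd in volume_resource_commands("ACCT-SG", "acct-mnt", "acct-dg",
                                        "acct-vol", "dg_acct", "account_vol"):
        print(cmd)
```

After "haconf -dump -makero", main.cf should contain a Volume resource acct-vol with DiskGroup = dg_acct and Volume = account_vol, plus "acct-mnt requires acct-vol" and "acct-vol requires acct-dg" in place of "acct-mnt requires acct-dg". The same pattern would apply to each Mount/DiskGroup pair in IPRS-DB.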