Drives disappearing from fabric
Hi all,
We are also facing drive issues in our environment. The weekend full backups are large database backups, and on those days all drives go down on the master server as well as on the media servers. When I checked /var/adm/messages I noticed invalid drives, and also found that some drives are disappearing from and reappearing in the fabric.
Please find the output of the commands below:
bash-3.00$ cat /var/adm/messages |grep -i disappear
Sep 10 04:12:41 dm4cfi fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=180cc, PWWN=500507630f50f601 disappeared from fabric
Sep 10 16:51:18 dm4cfi fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=180cc, PWWN=500507630f50f601 disappeared from fabric
cat /var/adm/messages |grep -i reappeared
Sep 10 04:12:51 dm4cfi fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=180cc, PWWN=500507630f50f601 reappeared in fabric
Sep 10 16:51:28 dm4cfi fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=180cc, PWWN=500507630f50f601 reappeared in fabric
------------------------------------------------------------------------------------------------------------------------------
The output of cat /var/adm/messages |grep -i drive has been attached below.
Any help with this would be much appreciated.
Master server: Solaris 5.10
NetBackup version: 7.0
Tape library: IBMTS3580
Please let me know if more information is required.
Thanks Marianne for moving this to a new post.
Ashok,
You have a SAN issue, not a NetBackup issue. You should speak to your SAN team to troubleshoot.
Could well be a faulty HBA or switch.
You have many errors ...
1
Sep 10 04:12:41 dm4cfi fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=180cc, PWWN=500507630f50f601 disappeared from fabric
2
Sep 10 03:03:41 dm4cfi tldd[24118]: [ID 316020 daemon.notice] TLD(0) unload==TRUE, but no unload, drive 11 (device 19)
Sep 10 03:04:00 dm4cfi tldd[24118]: [ID 921149 daemon.notice] TLD(0) [24118] waited 8 seconds for ready, drive 11
Sep 10 04:12:54 dm4cfi tldcd[19981]: [ID 718619 daemon.error] TLD(0) drive does not exist in robot, drive = 16
Sep 10 04:12:54 dm4cfi tldcd[19990]: [ID 587315 daemon.error] TLD(0) drive does not exist in robot, drive = 8
NBU will never work properly if there are issues at the OS level. The errors in 1 (above) are the OS-level errors, and therefore these MUST be fixed before looking at the errors shown in 2.
In fact, when 1 is fixed, I suspect 2 will disappear.
So, the correct way to look at this is to completely forget about NBU, because you will ALWAYS have errors in NBU until you fix the OS-level stuff.
The errors you see in 1 are caused by the drives becoming disconnected from the SAN.
The cause can be anything on the SAN, for example:
HBA fault
Firmware issue
Switch fault
GBIC fault
You will need to speak with your SAN team, and investigate with them. There is a high possibility you will have to swap parts to see if they are faulty, for example, inserting a new HBA into the server.
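Before going to the SAN team, it may help to summarise how often each port dropped off the fabric. The following is only a rough sketch, assuming the fctl message format shown in the excerpts above; the function name fabric_flaps is made up for illustration, and on Solaris 10 you would probably need to swap awk for nawk or /usr/xpg4/bin/awk, since the default /usr/bin/awk is the old awk.

```shell
# Summarise "disappeared from fabric" / "reappeared in fabric" events per port WWN.
# Hypothetical helper; assumes the fctl log format quoted earlier in this thread.
fabric_flaps() {
    # $1: path to a syslog file, e.g. /var/adm/messages
    awk '
        /disappeared from fabric|reappeared in fabric/ {
            # Pull the PWWN=... token out of the line.
            if (match($0, /PWWN=[0-9a-fA-F]+/)) {
                wwn = substr($0, RSTART + 5, RLENGTH - 5)
                if ($0 ~ /disappeared from fabric/) gone[wwn]++
                else back[wwn]++
                seen[wwn] = 1
            }
        }
        END {
            # One summary line per WWN; order of WWNs is not guaranteed.
            for (w in seen)
                printf "%s disappeared=%d reappeared=%d\n", w, gone[w], back[w]
        }
    ' "$1"
}
```

Run it as fabric_flaps /var/adm/messages. For the four fctl lines quoted in this thread it would print 500507630f50f601 disappeared=2 reappeared=2. A port WWN that shows up repeatedly gives the SAN team a concrete device and set of timestamps to start from.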
Martin