MSEO production issues

maheshes · ‎06-06-2009

Dear Techies,

I have an MSEO issue which is causing me great pain.

I have a simple MSEO setup, i.e. the master server is the security server, plus 2 PEM agents, all on Solaris.

The master server is the device/SSO host for 1 of the PEM agents.

Whenever we recycle NBU services, I have noticed that MSEO gets unconfigured from the tape devices.

Is this behaviour normal across all production setups?

Is there any way I can migrate the security server to a different media server? Note: the master will still remain the device host in this case.

Please let me know your expert views/comments.

Regards

Mahesh

Abesama · ‎06-07-2009

Hi Mahesh,

I try to provide answers and try not to ask questions in this forum, but this time I have to ask a few questions first.

You said master server is SS - that's fine.
You said 2 PEM agents - do you mean 2 media servers?
All Solaris - do you mean all 3 servers (master-ss, media-pem1, media-pem2) are Solaris based?

Master is device/sso for one of pem agents - I think you mean the master server and the first media server are sharing tape drives, while the second media server has its own dedicated tape drives - right?

If your answer is all yes-yes-yes, then let me proceed.

I think Solaris should not unconfigure MSEO when NBU services get recycled.
(The fact that the drives have been shared should not matter, I believe)
After NetBackup services have been recycled, if you really see MSEO (cgsb) has been unconfigured, then you will see /dev/rmt/0cbn - is that what you see from tpconfig?

That also means either the PEM daemon crashed or the kernel module for cgsb has been unloaded or both - which again should not happen as part of NetBackup service recycle.
Linux used to re-configure all the devices whenever NBU services get recycled, but I think it does not do that any more with 6.5.x ...

What command/method do you use to recycle NBU services?
S77netbackup, goodies/netbackup, bp.kill_all - these all work differently, you know.
Do you recycle services on the SS first or on the PEM first, or on one of them only?

The troubleshooting method I can think of is enabling MSEO auditing, as the manual says, and cross-checking its log lines against the NBU log lines with matching timestamps when recycling services.
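
To make the cross-check less manual, something like the following could pair up log lines that fall within a few seconds of each other. This is only a rough sketch: it assumes both logs have been reduced to lines that start with a syslog-style stamp ("Mon DD HH:MM:SS"). The MSEO audit log format is an assumption here, and `parse_ts` and `near_matches` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def parse_ts(line, year=2009):
    """Parse a leading syslog-style stamp such as 'Jan 22 16:10:53'.

    Syslog stamps carry no year, so one is supplied (assumption).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def near_matches(nbu_lines, audit_lines, window_seconds=5):
    """Return (nbu_line, audit_line) pairs whose stamps differ by <= window."""
    window = timedelta(seconds=window_seconds)
    # Parse the audit stamps once instead of once per NBU line.
    stamped_audit = [(parse_ts(a), a) for a in audit_lines]
    pairs = []
    for n in nbu_lines:
        t = parse_ts(n)
        for ta, a in stamped_audit:
            if abs(ta - t) <= window:
                pairs.append((n, a))
    return pairs
```

Feed it the NBU lines from around the recycle time and the audit lines from the same period, then widen or narrow `window_seconds` until the pairs make sense.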

I think I gave you more questions than answers, but that's all I can think of ...

About "moving" the SS to a media server - yes, you can use cgconfig and cgadmin to specify another host (this time, the local host name) to be used for SS auth for the local PEM - see page 93 of the MSEO guide PDF.

Hope it helps.

Cheers,

A

maheshes · ‎06-08-2009

Hi A,

Please find the details below.

> You said 2 PEM agents - do you mean 2 media servers?

2 PEM agents = 2 media servers.

> All Solaris - do you mean all 3 servers (master-ss, media-pem1, media-pem2) are Solaris based?

Yes.

> Master is device/sso for one of pem agents - I think you mean master server and the first media server are sharing tape drives, while the second media server has its own dedicated tape drives - right?

pem2 has dedicated drives.

master (SS) = mstnetbkp (device host for pem1 also)
pem1 = netbackup6
pem2 = netbackup8 (device host of a remote site)

> If your answer is all yes-yes-yes, then let me proceed.
> I think Solaris should not unconfigure MSEO when NBU services get recycled.
> (The fact that the drives have been shared should not matter, I believe)
> After NetBackup services have been recycled, if you really see MSEO (cgsb) has been unconfigured, then you will see /dev/rmt/0cbn - is that what you see from tpconfig?

I have a total of 20 drives configured for MSEO. I have selected a few drives as examples for you.

Output from tpconfig after a recycle/reboot of the master is as below:

2 HPUltrium3-SCSI1 hcart3 TLD(5) DRIVE=10 /dev/cgsb/73cbn DOWN
4 IBMULTRIUM-TD34 hcart3 TLD(0) DRIVE=8 /dev/cgsb/64cbn DOWN
5 IBMULTRIUM-TD32 hcart3 TLD(0) DRIVE=6 /dev/cgsb/65cbn DOWN
7 HPUltrium3-SCSI7 hcart3 TLD(0) DRIVE=12 /dev/cgsb/68cbn DOWN
8 HPUltrium3-SCSI8 hcart3 TLD(0) DRIVE=13 /dev/cgsb/69cbn DOWN

From /var/adm/messages:

Jan 22 16:10:53 mstnetbkp ltid[18009]: [ID 255353 daemon.error] Drive index 2 is incorrect, drive path /dev/cgsb/73cbn is incorrect, No such file or directory
Jan 22 16:10:53 mstnetbkp ltid[18009]: [ID 259945 daemon.error] Drive index 4 is incorrect, drive path /dev/cgsb/64cbn is incorrect, No such file or directory
Jan 22 16:10:53 mstnetbkp ltid[18009]: [ID 262505 daemon.error] Drive index 5 is incorrect, drive path /dev/cgsb/65cbn is incorrect, No such file or directory
Jan 22 16:10:53 mstnetbkp ltid[18009]: [ID 268137 daemon.error] Drive index 7 is incorrect, drive path /dev/cgsb/68cbn is incorrect, No such file or directory
Jan 22 16:10:53 mstnetbkp ltid[18009]: [ID 270697 daemon.error] Drive index 8 is incorrect, drive path /dev/cgsb/69cbn is incorrect, No such file or directory
Jan 22 16:10:53 mstnetbkp ltid[18009]: [ID 268137 daemon.error] Drive index 9 is incorrect, drive path /dev/cgsb/60cbn is incorrect, No such file or directory

> That also means either the PEM daemon crashed or the kernel module for cgsb has been unloaded or both - which again should not happen as part of NetBackup service recycle.
> Linux used to re-configure all the devices whenever NBU services get recycled, but I think it does not do that any more with 6.5.x ...

For me to believe that the cgsb daemons have crashed or the kernel module has been unloaded, the services below should not be running on the SS/PEM servers. But they are running all the time unless the box is rebooted.

On the SS server:

root@mstnetbkp # ps -ef |grep nbu
root 1828 1 0 Jun 03 ? 0:23 /opt/vormetric/cgsb/server/bin/sbnbusd
root 1509 1 0 Jun 04 ? 1:00 /opt/vormetric/cgsb/pem/bin/sbnbucd

On the PEM server:

root@mstnetbkp # rsh netbackup8 ps -ef |grep nbu
root 3409 1 0 Jun 04 ? 0:00 /opt/vormetric/cgsb/pem/bin/sbnbucd

> What's the command/method you recycle NBU service?
> S77netbackup, goodies/netbackup, bp.kill_all these all work differently, you know.

Usually I use goodies/netbackup stop. Once it completes, if there are any processes yet to be killed, I use bp.kill_all. The pbx process keeps running throughout.

> Do you recycle service on SS first or on PEM first, or on one of them only?

Usually, if there are any config changes on the master/device host, I have to recycle the SS first.

> Troubleshooting method I can think of is enabling MSEO auditing, as the manual says - and cross-check the log message lines with matching timestamp with NBU logs, when recycling services.
> I think I gave you more questions than answers, but that's all I can think of ...
> About "moving" SS to media server - yes, you can use cgconfig and cgadmin to specify another (this time, local host name) host to be used for SS auth, for the local PEM - see page 93 of MSEO guide pdf.

When I move the SS to a media server, do I have to configure cgsb devices on that new SS server?

Regards

Mahesh
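
With 20 drives, picking the affected ones out of /var/adm/messages by eye gets tedious. A small sketch like the one below could list every drive index and path that ltid reports as incorrect. The regular expression is modelled only on the message lines quoted in this post, and `missing_drive_paths` is a hypothetical helper name, so treat it as an illustration rather than a supported tool.

```python
import re

# Pattern modelled on the ltid lines quoted from /var/adm/messages above.
LTID_ERROR = re.compile(
    r"Drive index (\d+) is incorrect, drive path (\S+) is incorrect, (.+)$"
)


def missing_drive_paths(lines):
    """Return (drive_index, drive_path, reason) for each ltid error line."""
    found = []
    for line in lines:
        m = LTID_ERROR.search(line.rstrip("\n"))
        if m:
            found.append((int(m.group(1)), m.group(2), m.group(3)))
    return found
```

Calling `missing_drive_paths(open("/var/adm/messages"))` on the master would give one tuple per reported drive, which can then be compared against the tpconfig listing.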

Satkay_Satish · ‎10-15-2009

Hi Mahesh,

Comment out (hash out) the crontab entry for the MSEO monitor script and this should resolve your problem. Version 1.0, which you are using, has this issue.

/opt/vormetric/mseo/agent/bin/mseo_monitor.sh
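
If the same change has to be made on several servers, the edit could be scripted along these lines. This is a minimal sketch, not a vendor procedure: it works on crontab text you have already saved (for example with `crontab -l > old.cron`), and `comment_out` is a hypothetical helper name. Review the result before loading it back with `crontab new.cron`.

```python
MONITOR_SCRIPT = "/opt/vormetric/mseo/agent/bin/mseo_monitor.sh"


def comment_out(crontab_text, needle=MONITOR_SCRIPT):
    """Prefix '#' to every active crontab line that mentions `needle`.

    Lines that are already commented out are left untouched, so running
    the function twice gives the same result as running it once.
    """
    out = []
    for line in crontab_text.splitlines():
        if needle in line and not line.lstrip().startswith("#"):
            line = "#" + line
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Commenting the line out instead of deleting it keeps the original schedule on record in case the entry needs to be restored later.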

Regards,
Satkay Satish

Satkay_Satish · ‎11-20-2009

Hi All,

Symantec has published a tech note for this issue at the URL below.

http://seer.entsupport.symantec.com/docs/327624.htm

-Satkay Satish
