BasicDisk going down / up every few minutes
Hey Kids!
Old guy here with an *old* system on its last legs. It's causing me a little grief before it's shown the door permanently, and, me being me, I just want to get to the bottom of it before our final farewells.
Master/media server: Solaris 9
NetBackup version: 6.5.6
We only back up a few old OSes to BasicDisk storage, and every now & again backups (including the catalog backup) fail with "Disk Volume is down(2074)". They retry 10 minutes or so later, more often than not successfully, but occasionally the failures persist until a service restart or a system reboot. Most recently, the only way I could get backups working again was with an nbrbutil -resetAll.
The storage is simply an NFS mount on the master/media server of a share from one of our NetApp filers. It has been used successfully for the last few years and has only recently started to display 'issues'.
As it currently stands:
# nbdevquery -listdv -stype BasicDisk -U
Disk Pool Name : ERYCSV02_Disk
Disk Type : BasicDisk
Disk Volume Name : Internal_16
Disk Media ID : @aaaah
Total Capacity (GB) : 3072.00
Free Space (GB) : 1624.73
Use% : 47
Status : UP
Flag : OkOnRoot
Flag : ReadOnWrite
Flag : AdminUp
Flag : InternalUp
Num Read Mounts : 0
Num Write Mounts : 1
Cur Read Streams : 0
Cur Write Streams : 0
bperror -disk shows entries such as:
1517281124 1 1536 8 cream 0 0 0 *NULL* nbemm Volume ERYCSV02_Disk:Internal_16 marked down, Storage server cream
1517281872 1 1536 8 cream 0 0 0 *NULL* nbemm Volume cream:ERYCSV02_Disk:Internal_16 marked up
and these down/up transitions happen frequently, as the following timestamps show:
1517281124 = Tue Jan 30 02:58:44 2018
1517281872 = Tue Jan 30 03:11:12 2018
1517282547 = Tue Jan 30 03:22:27 2018
1517283367 = Tue Jan 30 03:36:07 2018
1517284040 = Tue Jan 30 03:47:20 2018
1517284383 = Tue Jan 30 03:53:03 2018
1517285714 = Tue Jan 30 04:15:14 2018
1517285758 = Tue Jan 30 04:15:58 2018
1517287107 = Tue Jan 30 04:38:27 2018
1517287868 = Tue Jan 30 04:51:08 2018
1517289141 = Tue Jan 30 05:12:21 2018
1517289208 = Tue Jan 30 05:13:28 2018
1517289816 = Tue Jan 30 05:23:36 2018
1517290568 = Tue Jan 30 05:36:08 2018
1517291245 = Tue Jan 30 05:47:25 2018
1517291519 = Tue Jan 30 05:51:59 2018
1517292136 = Tue Jan 30 06:02:16 2018
1517292755 = Tue Jan 30 06:12:35 2018
1517294084 = Tue Jan 30 06:34:44 2018
1517294358 = Tue Jan 30 06:39:18 2018
1517295021 = Tue Jan 30 06:50:21 2018
1517295083 = Tue Jan 30 06:51:23 2018
1517295753 = Tue Jan 30 07:02:33 2018
1517296025 = Tue Jan 30 07:07:05 2018
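Rather than converting the timestamps by hand, a small filter over the bperror -disk output can list each down/up transition together with the gap since the previous one, which makes the "every few minutes" pattern easy to see. This is a sketch that assumes, as in the sample lines above, that the first field is the epoch time, that the message text contains "marked down" or "marked up", and that the output is in chronological order; summarise_disk_events is just a hypothetical helper name.

```shell
#!/bin/sh
# summarise_disk_events (hypothetical helper): reads `bperror -disk` output
# on stdin and prints "<epoch> <DOWN|UP> +<gap since previous event>".
# Assumes field 1 is the epoch time and lines are in chronological order.
summarise_disk_events() {
    awk '
        { state = "" }                      # reset for every input line
        /marked down/ { state = "DOWN" }
        /marked up/   { state = "UP" }
        state != "" {
            if (prev != "") {
                gap = $1 - prev             # seconds since the last transition
                printf "%s %-4s +%dm%02ds\n", $1, state, int(gap / 60), gap % 60
            } else {
                printf "%s %-4s\n", $1, state
            }
            prev = $1
        }
    '
}

# Usage on the master/media server:
#   /usr/openv/netbackup/bin/admincmd/bperror -disk | summarise_disk_events
```

For human-readable times on a box whose date command lacks GNU's -d @epoch option, perl -le 'print scalar localtime shift' 1517281124 does the conversion, if perl is available.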
There is certainly something 'weird' going on, although I'm not sure for how long, as things have run very smoothly for years without any need for intervention. Additionally, I get "error connecting to oprd on cream: network protocol error(39)" when first connecting to the STU via the Admin Console (remember that?)... but that's probably another story, as it does connect once the pop-up has been dismissed.
If I've missed any info that you feel may be useful, let me know, as I've been out of the loop for sooooo long. Any clue as to where to look next would be much appreciated.
Hey, an old friend has shown up!
Take a look at OID 220 (DPS, the disk polling service) with vxlogview, if you still remember it :-D
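Something along these lines should pull the DPS entries out of the unified logs. This is a sketch, assuming the default UNIX install path and that 51216 is the NetBackup product ID; the four-hour -t window and the grep on the volume name are illustrative choices, so check the vxlogview usage on a 6.5.6 master before relying on the exact options.

```shell
# Unified-log entries from originator 220 (disk polling service)
# for the last four hours; 51216 is the NetBackup product ID.
/usr/openv/netbackup/bin/vxlogview -p 51216 -o 220 -t 04:00:00

# Same window, narrowed to the volume that keeps bouncing.
/usr/openv/netbackup/bin/vxlogview -p 51216 -o 220 -t 04:00:00 | grep Internal_16
```

Lining up the DPS messages against the "marked down" times from bperror -disk should show what the poller saw (for example a slow or failed check of the NFS mount) just before each transition.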