01-03-2017 06:41 AM
master Linux, nbu 7.7.3; media Appliance 5230 in 7.7.3; vSpheare 6.0.0;
Hi all, we have this Linux VM that fails backup by some weeks, do you have any troubleshooting suggestion ? tks in advance
Jan 1, 2017 9:29:55 AM - Info bpbrm (pid=78045) salclt4220 is the host to backup data from Jan 1, 2017 9:29:55 AM - Info bpbrm (pid=78045) reading file list for client Jan 1, 2017 9:29:55 AM - Info bpbrm (pid=78045) accelerator enabled Jan 1, 2017 9:29:55 AM - Info bpbrm (pid=78045) There is no complete backup image match with track journal, a regular full backup will be performed. Jan 1, 2017 9:29:55 AM - Info bpbrm (pid=78045) starting bpbkar on client Jan 1, 2017 9:29:56 AM - Info bpbkar (pid=78061) Backup started Jan 1, 2017 9:29:56 AM - Info bpbrm (pid=78045) bptm pid: 78062 Jan 1, 2017 9:29:57 AM - Info bptm (pid=78062) start Jan 1, 2017 9:29:57 AM - Info bptm (pid=78062) using 1048576 data buffer size Jan 1, 2017 9:29:57 AM - Info bptm (pid=78062) using 512 data buffers Jan 1, 2017 9:29:59 AM - Info bpbkar (pid=78061) INF - Backing up vCenter server svvcmot11.syssede.systest.sanpaoloimi.com, ESX host svwmop920.sede.corp.sanpaoloimi.com, BIOS UUID 4228ea9b-ac41-9a9a-c429-1ba02e6386bf, Instance UUID 502804b9-75eb-1926-353e-26d1f4a34ae9, Display Name salclt4220, Hostname salclt4220 Jan 1, 2017 9:30:02 AM - Info bptm (pid=78062) start backup Jan 1, 2017 9:30:22 AM - Info nbjm (pid=9940) starting backup job (jobid=3246268) for client salclt4220, policy CCM_svvcmot11_Settimanale, schedule Full_Weekly Jan 1, 2017 9:30:22 AM - Info nbjm (pid=9940) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=3246268, request id:{BA4002A9-7A0B-4C38-B226-88355A31C313}) Jan 1, 2017 9:30:22 AM - requesting resource ccmnbu04 Jan 1, 2017 9:30:22 AM - requesting resource scapmop0307.NBU_CLIENT.MAXJOBS.salclt4220 Jan 1, 2017 9:30:22 AM - requesting resource scapmop0307.VMware.Datastore.svvcmot11.syssede.systest.sanpaoloimi.com/Moncalieri_CCM/A_024_PURE_BRONZE_WEBLOGIC_BRZ_SIL_CCM Jan 1, 2017 9:30:22 AM - requesting resource scapmop0307.VMware.ESXserver.svwmop920.sede.corp.sanpaoloimi.com Jan 1, 2017 9:30:22 AM - requesting resource scapmop0307.VMware.vCenter.svvcmot11.syssede.systest.sanpaoloimi.com Jan 1, 2017 9:30:22 AM - granted resource scapmop0307.NBU_CLIENT.MAXJOBS.salclt4220 Jan 1, 2017 9:30:22 AM - granted resource scapmop0307.VMware.Datastore.svvcmot11.syssede.systest.sanpaoloimi.com/Moncalieri_CCM/A_024_PURE_BRONZE_WEBLOGIC_BRZ_SIL_CCM Jan 1, 2017 9:30:22 AM - granted resource scapmop0307.VMware.ESXserver.svwmop920.sede.corp.sanpaoloimi.com Jan 1, 2017 9:30:22 AM - granted resource scapmop0307.VMware.vCenter.svvcmot11.syssede.systest.sanpaoloimi.com Jan 1, 2017 9:30:22 AM - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=ccmnbu04_DP;Path=PureDiskVolume;StorageServer=ccmnbu04;MediaServer=ccmnbu04 Jan 1, 2017 9:30:22 AM - granted resource ccmnbu04 Jan 1, 2017 9:30:23 AM - estimated 0 kbytes needed Jan 1, 2017 9:30:23 AM - Info nbjm (pid=9940) started backup (backupid=salclt4220_1483259422) job for client salclt4220, policy CCM_svvcmot11_Settimanale, schedule Full_Weekly on storage unit ccmnbu04 using backup host ccmnbu04 Jan 1, 2017 9:30:33 AM - started process bpbrm (pid=78045) Jan 1, 2017 9:30:36 AM - connecting Jan 1, 2017 9:30:36 AM - connected; connect time: 0:00:00 Jan 1, 2017 9:30:47 AM - begin writing Jan 1, 2017 9:30:55 AM - Error bpbrm (pid=78045) db_FLISTsend failed: file read failed (13) Jan 1, 2017 9:30:57 AM - Error bptm (pid=78062) media manager terminated by parent process Jan 1, 2017 9:31:42 AM - Info ccmnbu04 (pid=78062) StorageServer=PureDisk:ccmnbu04; Report=PDDO Stats (multi-threaded stream used) for (ccmnbu04): scanned: 82947 KB, CR sent: 10031 KB, CR sent over FC: 0 KB, dedup: 87.9%, cache disabled Jan 1, 2017 9:31:46 AM - Info bpbkar (pid=0) done Jan 1, 2017 9:31:46 AM - Info bpbkar (pid=0) done. status: 13: file read failed Jan 1, 2017 9:32:27 AM - end writing; write time: 0:01:40 file read failed (13)
01-03-2017 06:55 AM
1) Have you checked whether the file-systems within the Linux VM are supported for VM style backups?
Can you tell us more about the VM, such as...
2) OS type and version?
3) Any RDMs?
4) File-systems types and versions?
5) File-system "bit-ness" - i.e. 32bit or 64-bit file systems?
6) Any other VMs of same OS and same layout and same file-systems which do work for backups?
01-03-2017 08:32 AM
Hello sdo,
I think that it is all supported, I do more backup of similar VM without any issue, I tried a manual snapshot on vcenter and run properly.
[root@salclt4220 ~]# uname -a Linux salclt4220 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux [root@salclt4220 ~]# [root@salclt4220 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.8 (Santiago)
01-03-2017 09:51 AM
Ok - if you are 100% absolutely definitely certain that there are no differences in backup policy configuration and no differences in VM technology and guest OS versions and VMtools versions and file system types and versions (bit-ness) between the "failing for backup VM" and other "working for backups VMs"... then, for starters, you/we will probably need to dig in to the bpfis log on the backup proxy/host. Are you ok doing that?
01-03-2017 10:02 AM
01-04-2017 01:34 AM
hello, I just tried without accelerator but the issue is the same, now I gather the logs, and I will update you
Jan 4, 2017 10:07:30 AM - Info bpbrm (pid=182310) salclt4220 is the host to backup data from Jan 4, 2017 10:07:30 AM - Info bpbrm (pid=182310) reading file list for client Jan 4, 2017 10:07:30 AM - Info bpbrm (pid=182310) starting bpbkar on client Jan 4, 2017 10:07:30 AM - Info bpbkar (pid=182322) Backup started Jan 4, 2017 10:07:30 AM - Info bpbrm (pid=182310) bptm pid: 182323 Jan 4, 2017 10:13:41 AM - Info nbjm (pid=9940) starting backup job (jobid=3258060) for client salclt4220, policy Esecuzioni_spot_settimo, schedule full Jan 4, 2017 10:13:42 AM - estimated 0 kbytes needed Jan 4, 2017 10:13:42 AM - Info nbjm (pid=9940) started backup (backupid=salclt4220_1483521221) job for client salclt4220, policy Esecuzioni_spot_settimo, schedule full on storage unit setnbu04 using backup host setnbu04 Jan 4, 2017 10:13:52 AM - started process bpbrm (pid=182310) Jan 4, 2017 10:07:30 AM - Info bptm (pid=182323) start Jan 4, 2017 10:07:31 AM - Info bptm (pid=182323) using 1048576 data buffer size Jan 4, 2017 10:07:31 AM - Info bptm (pid=182323) using 512 data buffers Jan 4, 2017 10:13:54 AM - connecting Jan 4, 2017 10:13:54 AM - connected; connect time: 0:00:00 Jan 4, 2017 10:07:32 AM - Info bpbkar (pid=182322) INF - Backing up vCenter server svvcmot11.syssede.systest.sanpaoloimi.com, ESX host svwmop920.sede.corp.sanpaoloimi.com, BIOS UUID 4228ea9b-ac41-9a9a-c429-1ba02e6386bf, Instance UUID 502804b9-75eb-1926-353e-26d1f4a34ae9, Display Name salclt4220, Hostname salclt4220 Jan 4, 2017 10:07:49 AM - Info bptm (pid=182323) start backup Jan 4, 2017 10:14:21 AM - begin writing Jan 4, 2017 10:16:13 AM - Error bpbrm (pid=182310) db_FLISTsend failed: file read failed (13) Jan 4, 2017 10:16:15 AM - Error bptm (pid=182323) media manager terminated by parent process Jan 4, 2017 10:16:49 AM - Info setnbu04 (pid=182323) StorageServer=PureDisk:setnbu04; Report=PDDO Stats (multi-threaded stream used) for (setnbu04): scanned: 64515 KB, CR sent: 7576 KB, CR sent over FC: 0 KB, dedup: 88.3%, cache disabled Jan 4, 2017 10:16:54 AM - Info bpbkar (pid=0) done Jan 4, 2017 10:16:54 AM - Info bpbkar (pid=0) done. status: 13: file read failed Jan 4, 2017 10:23:19 AM - end writing; write time: 0:08:58 file read failed (13)
01-04-2017 03:02 AM
hello, in attachment you can find "/usr/openv/netbackup/logs/bpbrm" log. I tried to find on internet something for help me without success. Below you can find some extracts with alert upper than <4>
10:35:57.697 [241932.241932] <16> non_mpx_backup_archive_verify_import: db_FLISTsend failed: file read failed (13) 10:36:33.732 [264551.264551] <8> vnet_check_user_certificate: [vnet_vxss_helper.c:3728] vnet_get_user_credential_path failed 35 0x23 10:36:33.890 [264551.264551] <8> do_pbx_service: [vnet_connect.c:2302] via PBX VNETD CONNECT FROM 10.254.255.202.40089 TO 10.254.255.202.1556 fd = 5 10:36:33.891 [264551.264551] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:455] VN_REQUEST_CONNECT_FORWARD_SOCKET 10 0xa 10:36:33.931 [264551.264551] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:480] ipc_string /usr/openv/var/tmp/vnet-64569483522593930667000000761-jty4DA
01-04-2017 04:24 AM
I am honestly battling with level 5 logs...
Veritas Support engineers have tools to sift through these large files...
What I have found was this:
10:35:55.885 [241932.241932] <2> non_mpx_backup_archive_verify_import: from client salclt4220: INF - Files System Name to catalog /opt/oracle
10:35:55.885 [241932.241932] <2> non_mpx_backup_archive_verify_import: start to write catalog
...
10:35:55.885 [241932.241932] <2> ConnectionCache::connectAndCache: Acquiring new connection for host scapmop0307, query type 78
10:35:55.901 [241932.241932] <2> vnet_pbxConnect: pbxConnectEx Succeeded
10:35:55.902 [241932.241932] <2> logconnections: BPDBM CONNECT FROM 10.254.171.33.54938 TO 10.254.168.215.1556 fd = 8
10:35:56.026 [241932.241932] <2> db_end: Need to collect reply
10:35:57.697 [241932.241932] <2> db_end: no DONE from db_getreply(): file read failed
10:35:57.697 [241932.241932] <16> non_mpx_backup_archive_verify_import: db_FLISTsend failed: file read failed (13)
What I see here is that the attempt from the media server to update the catalog on the master server failed.
You need bpdbm log on the master server to see what happened with the update attempt between 10:35:55 and 10:35:57.
Probably good idea to log a Support call with Veritas as well - they have the time and the tools to sift through level 5 logs...
01-05-2017 01:40 AM
hello Marianne,
I searched also in bpdbm (master server) without success, nothing that can help me.
I found something in bpbkar log of the media server, maybe can help us, what do you think about:
3e-26d1f4a34ae9, Display Name salclt4220, Hostname salclt4220 16:53:02.118 [61200.61200] <2> vnet_pbxConnect: pbxConnectEx Succeeded 16:53:02.118 [61200.61200] <2> logconnections: BPRD CONNECT FROM 10.254.171.33.52372 TO 10.254.168.215.1556 fd = 7 16:53:02.743 [61200.61200] <4> stage_one_virtual: INF - Snapshot mapping type: ON 16:53:02.743 [61200.61200] <4> stage_one_virtual: INF - Snapshot BLIB type: OFF 16:53:02.815 [61200.61200] <4> NbuVddkInitializer: Using created vixDiskLib config file: /usr/openv/var/tmp/vixDiskLib_61200_ 1483545182.ini 16:53:03.899 [61200.61200] <4> getSSLThumbprint: SSL Thumbprint for server svvcmot11.syssede.systest.sanpaoloimi.com = E2:F5: 23:6D:97:C5:0F:B0:EE:30:6B:37:58:56:B9:04:7A:1D:08:D0 17:01:04.322 [61200.61200] <4> get_virtual_volumes: INF - Found 14 virtual volume(s) 17:01:05.706 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog / 17:01:05.723 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog / 17:01:31.956 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog / 17:01:31.956 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog / 17:01:31.957 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog / 17:01:33.325 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /boot 17:01:33.325 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /boot 17:01:33.346 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /boot 17:01:33.346 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /boot 17:01:33.346 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /boot 17:01:34.797 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /home 17:01:34.797 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /home 17:01:37.549 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /home 17:01:37.550 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /home 17:01:37.550 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /home 17:01:38.874 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt 17:01:38.877 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt 17:01:46.995 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt 17:01:46.996 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt 17:01:46.996 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt 17:01:48.434 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/jdk 17:01:48.435 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/jdk 17:01:57.032 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/jdk 17:01:57.032 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/jdk 17:01:57.033 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/jdk 17:02:01.183 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle 17:02:01.184 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle 17:02:07.844 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle 17:02:07.844 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle 17:02:07.844 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle 17:02:09.216 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle-mw 17:02:09.217 [61200.61200] <4> catalogName_PS: INF - Files System Name to catalog /opt/oracle-mw 17:02:32.659 [61200.61200] <8> ~NbuVddkInitializer: Unable to remove temporary file /usr/openv/var/tmp/vixDiskLib_61200_14835 45182.ini, errno 0 17:02:32.728 [61200.61200] <8> bpbkar Exit: FTL - cleanup() failed, status 63 17:02:32.728 [61200.61200] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 63: process was killed by a signal 17:02:32.728 [61200.61200] <2> keep_alive_log: KeepAliveThread::stopKeepAlive Waiting for keepalive thread to finish 17:02:32.728 [61200.61200] <4> bpbkar Exit: INF - EXIT STATUS 63: process was killed by a signal 17:02:32.745 [61200.61200] <4> bpbkar Exit: INF - setenv FINISHED=0 setnbu04:/usr/openv/netbackup/logs/bpbkar #
Naturally, I will proceed to open a case in Veritas, thank you very much for your support.
regards
01-05-2017 01:51 AM
I saw the status 63 in bpbrm log as well.
If you Google status 63, you will see that there is an issue with catalog update on the master server.
If there is nothing useful in bpdbm log, then your logging level is probably not high enough.
For bpdbm log, you need to restart NBU when the folder is created as well as each time the logging level is changed.
I suggest that that you log a Support call with Veritas ASAP.
They WILL request a full set of level 5 logs.
01-05-2017 02:04 AM
Have a look at this TN:
http://www.veritas.com/docs/000076426
Touch file may help but I don't see the attempt to add file to catalog taking more than 10 minutes.
Extracts from TN:
You can change the batch size on the Media Server that is used to send metadata to the NetBackup Catalog during backups.
A change to the batch size can help in the following cases:
If backups fail because a query to add files to the catalog takes more than 10 minutes to complete.
In this case, the backup job fails, and the bpbrm
log contains a message that indicates a failed attempt to add files to the catalog. Note that the bpdbm
log does not contain a similar message.
To improve backup performance when the folders to back up contain a large number of small files or subfolders.
To adjust the batch size for sending metadata to the catalog:
Create the following file on the NetBackup Media Server:
For FlashBackup-Windows policies (including NetBackup for VMware and Hyper-V):
/usr/openv/netbackup/FBU_MAX_FILES_PER_ADD
For all other policy types (including FlashBackup for UNIX):
/usr/openv/netbackup/MAX_ENTRIES_PER_ADD
In the file, enter a value for the number of metadata entries to be sent to the catalog in each batch. The allowed values are from 1 to 100,000.
Note: |
In NetBackup versions 7.5.0.0 through 7.5.0.6 and in version 7.6.0.1, the In NetBackup version 7.5.0.7 and in versions 7.6.0.2 and later, the |
01-17-2017 08:33 AM
hi all,
update, we have opened a veritas case and it seems that on this machine has mounted an unsupported filesystem EXT4. We are waiting of OS support confirm, I will update you