cancel
Showing results for 
Search instead for 
Did you mean: 

Big size backup issue

stucci
Level 6

Hallo guys,

I must backup a big size volume (11 TB) but when the backup is almost finished the job end bad.

Some information about the environment:

Master Server: Linux RHE, NBU 7.6.1.2

Media Server: Windows 2012 standard edition, NBU 7.6.1

On client I have installed NBU Media Server so the client can write directly on storage (HP Store Once). The backup work fine and fast, but I can not complete fine, below the job details:

13-mar-2016 17.49.58 - Info bpbrm (pid=25348) pdwbck-bk860 is the host to backup data from
13-mar-2016 17.49.58 - Info bpbrm (pid=25348) reading file list for client
13-mar-2016 17.49.58 - Info bpbrm (pid=25348) accelerator enabled
13-mar-2016 17.49.58 - Info bpbrm (pid=25348) There is no complete backup image match with track journal, a regular full backup will be performed.
13-mar-2016 17.49.59 - Info bpbrm (pid=25348) starting bpbkar32 on client
13-mar-2016 17.49.59 - Info bpbkar32 (pid=22536) Backup started
13-mar-2016 17.49.59 - Info bpbkar32 (pid=22536) change time comparison:<enabled>
13-mar-2016 17.49.59 - Info bpbkar32 (pid=22536) accelerator enabled backup, archive bit processing:<disabled>
13-mar-2016 17.49.59 - Info bptm (pid=22140) start
13-mar-2016 17.50.00 - Info bpbkar32 (pid=22536) will attempt to use change journal data for <E:\backups\LIQ70>
13-mar-2016 17.50.05 - Info bptm (pid=22140) using 262144 data buffer size
13-mar-2016 17.50.05 - Info bptm (pid=22140) setting receive network buffer to 1049600 bytes
13-mar-2016 17.50.05 - Info bptm (pid=22140) using 30 data buffers
13-mar-2016 17.50.05 - Info bptm (pid=22140) start backup
13-mar-2016 17.50.42 - Info bpbkar32 (pid=22536) not using change journal data for <E:\backups\LIQ70>: no previous track log
13-mar-2016 18.00.00 - Info nbjm (pid=5272) starting backup job (jobid=3996) for client pdwbck-bk860, policy PDWH_db_LIQ70, schedule FULL
13-mar-2016 18.00.00 - Info nbjm (pid=5272) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=3996, request id:{058E9956-E93D-11E5-A80E-BB18F9129F2E})
13-mar-2016 18.00.00 - requesting resource STU_ccmso01-bk860_PDWH
13-mar-2016 18.00.00 - requesting resource nbumaster02.NBU_CLIENT.MAXJOBS.pdwbck-bk860
13-mar-2016 18.00.00 - requesting resource nbumaster02.NBU_POLICY.MAXJOBS.PDWH_db_LIQ70
13-mar-2016 18.00.00 - granted resource  nbumaster02.NBU_CLIENT.MAXJOBS.pdwbck-bk860
13-mar-2016 18.00.00 - granted resource  nbumaster02.NBU_POLICY.MAXJOBS.PDWH_db_LIQ70
13-mar-2016 18.00.00 - granted resource  MediaID=@aaabl;DiskVolume=CCMSO01_PDWH;DiskPool=DP_ccmso01-bk860_PDWH;Path=CCMSO01_PDWH;StorageServer=ccmso01-bk860.sede.corp.sanpaoloimi.com;MediaServer=pdwbck
13-mar-2016 18.00.00 - granted resource  STU_ccmso01-bk860_PDWH
13-mar-2016 18.00.00 - estimated 0 kbytes needed
13-mar-2016 18.00.00 - Info nbjm (pid=5272) started backup (backupid=pdwbck-bk860_1457888400) job for client pdwbck-bk860, policy PDWH_db_LIQ70, schedule FULL on storage unit STU_ccmso01-bk860_PDWH
13-mar-2016 18.00.07 - started process bpbrm (pid=25348)
13-mar-2016 18.00.07 - connecting
13-mar-2016 18.00.07 - connected; connect time: 0:00:00
13-mar-2016 18.00.15 - begin writing
15-mar-2016 1.31.09 - Error bpbrm (pid=25348) from client pdwbck-bk860: ERR - failure reading file: E:\backups\LIQ70\LIQ70_full_2016_03_06_ 1.05\userdata_4_0.bak (WIN32 2: The system cannot find the file specified. )
15-mar-2016 1.31.15 - Error bpbrm (pid=25348) from client pdwbck-bk860: ERR - Snapshot Error while reading file: GLOBALROOT\Device\HarddiskVolumeShadowCopy69\backups\LIQ70\LIQ70_full_2016_03_06_ 1.05\userdata_4_0.bak
15-mar-2016 1.31.19 - Critical bpbrm (pid=25348) from client pdwbck-bk860: FTL - Backup operation aborted!
15-mar-2016 1.31.19 - Info bpbkar32 (pid=22536) accelerator sent 11987746816000 bytes out of 11986283363328 bytes to server, optimization 0.0%
15-mar-2016 1.31.19 - Error bpbrm (pid=25348) bpcd on nbumaster02 exited with status = 48
15-mar-2016 1.31.19 - Info bpbkar32 (pid=22536) bpbkar waited 424308 times for empty buffer, delayed 2215030 times.
15-mar-2016 1.31.26 - Error bpbrm (pid=25348) could not send server status message
15-mar-2016 1.31.26 - Info bpbkar32 (pid=22536) done. status: 156: snapshot error encountered
15-mar-2016 1.31.26 - Info bpbrm (pid=23188) Starting delete snapshot processing
15-mar-2016 1.31.27 - Error bpbrm (pid=23188) from client pdwbck-bk860: Get bpfis state from nbumaster02 failed. status = 25
15-mar-2016 1.31.27 - Info bpfis (pid=13296) Backup started
15-mar-2016 1.31.27 - Critical bpbrm (pid=23188) from client pdwbck-bk860: cannot open C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\bpfis.fim.pdwbck-bk860_1457888400.1.0
15-mar-2016 1.31.27 - Info bpfis (pid=13296) done. status: 4207

15-mar-2016 1.31.27 - Info bpfis (pid=13296) done. status: 4207: Could not fetch snapshot metadata or state files
15-mar-2016 1.41.35 - end writing; write time: 31:41:20
snapshot error encountered  (156)

 I tried to use the multistream for increase the speed up but it does not help me because there is only one path. I have also enabled accelerator but if I cannot to do first full backup :( have you some idea or suggestion ?

 

thank you very much

2 ACCEPTED SOLUTIONS

Accepted Solutions

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified
I have experienced the same problem with backup actually completing, but fails to update snapshot status on the master server. Have a look at this post : https://www.veritas.com/community/forums/netbackup-windows-code-156#comment-11132251 Either fix comms with master or change snapshot setting to disable and continue.

View solution in original post

Nicolai
Moderator
Moderator
Partner    VIP   

Check the windows event log - I suspect there be messages from VSS saying the VSS snapshot has been deleted because the maximum shadow storage pool has been exhausted

If that the case adjust the VSS pool size:

https://technet.microsoft.com/en-us/library/cc788050.aspx

A somehat related tech note form Veritas

https://www.veritas.com/support/en_US/article.000067950

View solution in original post

16 REPLIES 16

revarooo
Level 6
Employee

Would need to see the bpfis log as the issue was with a snapshot

 

stucci
Level 6

Hello Revarooo, thks.

I watched bpfis log and I think that you have reason and I think there is some snapshot issue.

I attached the bpfis log, below the last part, when the backup fail:

01:31:28.460 [22288.3756] <2> bprd_connect: vnet_connect_to_service(nbumaster02) failed: 6
01:31:28.460 [22288.3756] <2> bprd_bpfis_state_xfer_to_client_rqst: bprd_connect(nbumaster02) failed, cannot connect on socket (25)
01:31:28.460 [22288.3756] <16> bpfis: ERR - Get bpfis state from nbumaster02 failed. status = 25
01:31:28.460 [22288.3756] <4> bpfis: INF - BACKUP START 22288
01:31:28.460 [22288.3756] <32> bpfis: FTL - cannot open C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\bpfis.fim.pdwbck-bk860_1457989209.1.0
01:31:28.460 [22288.3756] <4> bpfis: Starting keep alive thread.
01:31:28.460 [22288.3756] <2> deleteReference:    reference not found in the list
01:31:28.460 [22288.3756] <4> bpfis: Delete reference: status [-4]
01:31:28.460 [22288.3756] <2> writeSnapshotLockFile:    Writing type[2] refcount[0]
01:31:28.460 [22288.3756] <2> writeSnapshotLockFile:    Writing Flags [0] [0]
01:31:28.460 [22288.3756] <2> writeSnapshotLockFile:   Writing Last addReffernt Time [Thu Jan 01 01:00:00 1970
]
01:31:28.460 [22288.3756] <2> writeSnapshotLockFile:   Writing Last Hostname []
01:31:28.460 [22288.3756] <2> getFileName:  FileName is [C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\pdwbck-bk860_1457989209_copy1.lock]
01:31:28.460 [22288.3756] <2> unlockSnapshot:    file[C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\pdwbck-bk860_1457989209_copy1.lock] : zero reference count 
01:31:28.460 [22288.3756] <4> bpfis: Released snapshot lock: status [0] 
01:31:28.460 [22288.3756] <4> do_post_processing: Deleting snapshot lock files from client for id [pdwbck-bk860_1457989209] and copy_number [1]
01:31:28.460 [22288.3756] <2> getFileName:  FileName is [C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\pdwbck-bk860_1457989209_copy1.lock]
01:31:28.460 [22288.3756] <2> deleteSnapshotLockFile:   Renamed lock file from [C:\Program Files\Veritas\NetBackup\online_util\fi_cntl\pdwbck-bk860_1457989209_copy1.lock] to [\slog.] for deletion
01:31:28.460 [22288.3756] <2> deleteSnapshotLockFile:   File[\slog.] deleted: status[0]
01:31:28.460 [22288.3756] <4> bpfis: INF - EXIT STATUS 4207: Could not fetch snapshot metadata or state files

My backup started at:

Cattura_1.JPG

Today I have relaunched backup at 9.43AM.

Have you some idea or test that I can try ?

I add that on the same client/media I can end fine other smaller backup.

thank you

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified
I have experienced the same problem with backup actually completing, but fails to update snapshot status on the master server. Have a look at this post : https://www.veritas.com/community/forums/netbackup-windows-code-156#comment-11132251 Either fix comms with master or change snapshot setting to disable and continue.

Nicolai
Moderator
Moderator
Partner    VIP   

Check the windows event log - I suspect there be messages from VSS saying the VSS snapshot has been deleted because the maximum shadow storage pool has been exhausted

If that the case adjust the VSS pool size:

https://technet.microsoft.com/en-us/library/cc788050.aspx

A somehat related tech note form Veritas

https://www.veritas.com/support/en_US/article.000067950

stucci
Level 6

Hello Marianne,

I think is the same situation, thank you.

And I think also that is better the solution number one, and I checked the connection between client and master, I found that client does'not solve the master server in Short name, but only in FQDN and I added in host file, I hope to resolve.

A curiosity, the solution number 2, "Disable Snapshot & Continue", can worsen backup performance ?

 

 

thank you very much,

 

stucci
Level 6

Hello Nicolai,

I checked and I found this volsnap event:

3/15/2016 1:31:09 AM:   "The shadow copies of volume E: were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied."

but it doesn't speak about MAX limit reached. Have you some idea ? now I will google search about this event.

 

tks !!

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified
Option 2 will not affect performance. You can read up about all the snapshot options in NBU Admin Guide I under Host Properties section.

stucci
Level 6

great, I read Admin Guide, in particular:

"Snapshots are a point-in-time view of a source volume. NetBackup uses snapshots

to access busy or active files during a backup job. Without a snapshot provider,

active files are not accessible for backup."

and in my case the files that I must backup, don't need of "windows Open File Backup" so I disabled Snap.

 

Now, the backup in progress I hope that end fine for etcHosts update, and from next backup I will not use more snapshot.

I update you, thank you very much

Nicolai
Moderator
Moderator
Partner    VIP   

shadow copy storage could not grow in time

How full is the e:\ drive, by default shadow copy storage is taken from same drives, however you can re-assign shadow storage to another drives using the vssadmin command, alternative free up space on e:\

 

stucci
Level 6

tks Nicolai,

but now I decided to disable the snapshot, I learn your suggestion and I take it for future

Nicolai
Moderator
Moderator
Partner    VIP   

OK - please close off the thread by marking a post  that helped you to resolve the issue or  use the "request split solution"  if multiple answers deserve the solution marking.

Don't mark your own reply unless you delivered the solution yourself.

Have a great day smiley

~Nicolai

stucci
Level 6

Hello guys,

I want update you.

The full backup started before my changes, I have disabled the snapshot "windows Open File Backup" during backup running, it is ended fine, finally I have a FULL backup in the catalog :)

But naturally, now I have a new problem, I was indecided if to open a new post, but I prefered to continue here ( please tell me if I have wrong tks).

The next backups ended in Status Job 12, below the job detail:

20-mar-2016 20.56.33 - Info bpbrm (pid=18628) pdwbck-bk860 is the host to backup data from
20-mar-2016 20.56.33 - Info bpbrm (pid=18628) reading file list for client
20-mar-2016 20.56.33 - Info bpbrm (pid=18628) accelerator enabled
20-mar-2016 20.56.34 - Info bpbrm (pid=18628) starting bpbkar32 on client
20-mar-2016 20.56.34 - Info bpbkar32 (pid=20512) Backup started
20-mar-2016 20.56.39 - Info bpbkar32 (pid=20512) change time comparison:<enabled>
20-mar-2016 20.56.39 - Info bpbkar32 (pid=20512) accelerator enabled backup, archive bit processing:<disabled>
20-mar-2016 20.56.39 - Info bptm (pid=3068) start
20-mar-2016 20.56.42 - Info bpbkar32 (pid=20512) will attempt to use change journal data for <E:\backups\LIQ70>
20-mar-2016 20.56.43 - Info bptm (pid=3068) using 262144 data buffer size
20-mar-2016 20.56.43 - Info bptm (pid=3068) setting receive network buffer to 1049600 bytes
20-mar-2016 20.56.43 - Info bptm (pid=3068) using 512 data buffers
20-mar-2016 20.56.49 - Info bpbkar32 (pid=20512) accelerator sent 0 bytes out of 0 bytes to server, optimization 0.0%
20-mar-2016 20.56.49 - Info bpbkar32 (pid=20512) bpbkar waited 0 times for empty buffer, delayed 0 times.
20-mar-2016 20.57.44 - Error bptm (pid=3068) The "Global\NetBackup Media Manager SHM Info Path pdwbck-bk860_1458493205" shared memory file already exists.
20-mar-2016 20.57.44 - Info bptm (pid=3068) EXITING with status 12 <----------
20-mar-2016 20.57.47 - Error bpbrm (pid=18628) could not send server status message
20-mar-2016 20.57.47 - Info bpbkar32 (pid=20512) done. status: 12: file open failed
20-mar-2016 21.06.41 - Info nbjm (pid=15837) starting backup job (jobid=9495) for client pdwbck-bk860, policy PDWH_db_LIQ70, schedule FULL
20-mar-2016 21.06.41 - Info nbjm (pid=15837) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=9495, request id:{42C9FF32-EED7-11E5-A734-5ABDB30269FF})
20-mar-2016 21.06.41 - requesting resource STU_ccmso01-bk860_PDWH
20-mar-2016 21.06.41 - requesting resource nbumaster02.NBU_CLIENT.MAXJOBS.pdwbck-bk860
20-mar-2016 21.06.41 - requesting resource nbumaster02.NBU_POLICY.MAXJOBS.PDWH_db_LIQ70
20-mar-2016 21.06.41 - granted resource  nbumaster02.NBU_CLIENT.MAXJOBS.pdwbck-bk860
20-mar-2016 21.06.41 - granted resource  nbumaster02.NBU_POLICY.MAXJOBS.PDWH_db_LIQ70
20-mar-2016 21.06.41 - granted resource  MediaID=@aaabl;DiskVolume=CCMSO01_PDWH;DiskPool=DP_ccmso01-bk860_PDWH;Path=CCMSO01_PDWH;StorageServer=ccmso01-bk860.sede.corp.sanpaoloimi.com;MediaServer=pdwbck
20-mar-2016 21.06.41 - granted resource  STU_ccmso01-bk860_PDWH
20-mar-2016 21.06.41 - estimated 0 kbytes needed
20-mar-2016 21.06.41 - Info nbjm (pid=15837) resumed backup (backupid=pdwbck-bk860_1458493205) job for client pdwbck-bk860, policy PDWH_db_LIQ70, schedule FULL on storage unit STU_ccmso01-bk860_PDWH
20-mar-2016 21.06.46 - started process bpbrm (pid=18628)
20-mar-2016 21.06.46 - connecting
20-mar-2016 21.06.46 - connected; connect time: 0:00:00
20-mar-2016 21.08.00 - end writing
20-mar-2016 23.57.47 - Info bpbrm (pid=25004) Starting delete snapshot processing
20-mar-2016 23.57.52 - Error bpbrm (pid=25004) from client pdwbck-bk860: Get bpfis state from nbumaster02 failed. status = 25
20-mar-2016 23.57.52 - Info bpfis (pid=24508) Backup started
20-mar-2016 23.57.52 - Error bpbrm (pid=25004) from client pdwbck-bk860: Delete bpfis state from nbumaster02 failed. status = 71
20-mar-2016 23.57.53 - Info bpfis (pid=24508) done. status: 0
20-mar-2016 23.57.53 - Info bpfis (pid=24508) done. status: 0: the requested operation was successfully completed
file open failed  (12)

I read some post about this job error but I cannot understand, have you some idea ?

thank you very much, regards

Nicolai
Moderator
Moderator
Partner    VIP   

Please open a new thread - you will get more attention in a new thread instead of a old one where you already stated the problem is solved ...

I have the power to move a post to a new thread, but opening text will be odd since text is written with this thread in mind.

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified
As I've said before , Client needs to update snapshot status on the master server. In this case, client was unable to do so. If backup is done with default WOFB option enabled, client must be able to communicate with the master. Nothing we can do to change this. Either fix comms or disable snapshot. If backup started with snapshot option enabled, then disabling it while backup is running will not help.

stucci
Level 6

Tks Nicolai,

I will open a new post

regards

stucci
Level 6

Hello Marianne,

sorry but I think that you have misunderstand me.

Now I have disabled the snapshot, but the backups are finished with error status job 12:

pippo.JPG

anyway I will open a new post for this, thank you!!