
NetBackup backup fails with "checkpoint version mismatch"

Thomas_Anthony
Level 5

Hello Forum,

We had a recent backup failure. The error "checkpoint version mismatch" appeared in the Job Detailed Status, and searches haven't returned anything relevant. Any suggestions on what this error means and/or which logs might provide more info? Thanks!

 

09/01/2014 1:23:45 - Info bpbrm (pid=30219) starting bpbkar on client

09/01/2014 1:23:45 - Info bpbkar (pid=17664) Backup started

09/01/2014 1:23:45 - Info bpbrm (pid=30219) bptm pid: 13025

09/01/2014 1:23:45 - Error bpbrm (pid=30219) from client ServerName: ERR - checkpoint version mismatch

09/01/2014 1:23:45 - Info bptm (pid=30252) start

09/01/2014 1:23:45 - Info bptm (pid=30252) using 1048576 data buffer size

09/01/2014 1:23:45 - Info bptm (pid=30252) using 128 data buffers

09/01/2014 1:23:45 - connected; connect time: 0:00:00

09/01/2014 1:23:47 - Info bptm (pid=30252) resuming backup-id ServerName_1407047411 copy 1 at fragment 48, 512 blocks: 3415050240

09/01/2014 1:23:47 - Info bptm (pid=30252) start backup

09/01/2014 1:23:47 - Info bptm (pid=30252) backup child process is pid 30260

09/01/2014 1:23:47 - Info bptm (pid=30252) waited for full buffer 0 times, delayed 0 times

09/01/2014 1:23:47 - Info bptm (pid=30252) EXITING with status 90 <----------

09/01/2014 1:23:47 - Info bpbkar (pid=17664) done. status: 6: the backup failed to back up the requested files

09/01/2014 1:23:47 - begin writing

09/01/2014 1:23:47 - end writing; write time: 0:00:00

the backup failed to back up the requested files (6)

(NOTE: the dates and names were changed)

1 ACCEPTED SOLUTION


Andrew21
Level 3
Employee
Hi Thomas,

The issue occurs when the NetBackup client version differs between the suspended job and the resumed job. Typically, it happens when a user updates the NBU client after the backup job is suspended. For example:

o) Install NBU 7.5 on the client machine and start a backup job.
o) After the checkpoint is generated, the information is transferred to the NBU server. You can check it under C:\Program Files\Veritas\NetBackup\db\jobs\restart\job_ID.ckpts. The information looks like this:
--------------------------
CPR - 696832 2856 0 0 38 0 0 0 1 356777984 0 1 512 0 1 750000 0 0 0 25 /C/data/0/000/000/033.10M
--------------------------
'750000' is the NBU version on the client.
o) Suspend the backup job and update the NBU client to 7.6.
o) Resume the backup job; it errors out with 'checkpoint version mismatch'.

So, if you need to update the NBU client, please wait until the job has finished. Otherwise, the update may cause the job to fail. Hope it helps!
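For anyone who wants to check this without counting fields by eye, here is a minimal Python sketch that pulls the version value out of a CPR line. It assumes the field position matches the single sample line in this post (the 18th whitespace-separated token, counting "CPR" and "-"); the .ckpts layout is not documented in this thread, so treat the index as a guess. The function names and the "760000" value are hypothetical, used only for illustration.

```python
from typing import Optional

# Assumption: the client version is the 18th whitespace-separated token
# (index 17), as in the sample CPR line quoted in this thread. The real
# .ckpts layout may differ between NetBackup releases.
VERSION_FIELD_INDEX = 17


def checkpoint_client_version(cpr_line: str) -> Optional[str]:
    """Return the version token from a CPR line, or None if it does not fit."""
    tokens = cpr_line.split()
    if len(tokens) <= VERSION_FIELD_INDEX or tokens[0] != "CPR":
        return None
    return tokens[VERSION_FIELD_INDEX]


def versions_match(cpr_line: str, current_version: str) -> bool:
    """Compare the checkpointed version token with the current one."""
    return checkpoint_client_version(cpr_line) == current_version


sample = (
    "CPR - 696832 2856 0 0 38 0 0 0 1 356777984 0 1 512 0 1 "
    "750000 0 0 0 25 /C/data/0/000/000/033.10M"
)

print(checkpoint_client_version(sample))
# "760000" is a hypothetical value standing in for an upgraded client.
print(versions_match(sample, "760000"))
```

If the value in the checkpoint file differs from what the client currently reports, that would be consistent with the mismatch scenario described above.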


11 REPLIES

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

When did you actually get this error? It looks like a backup that originally started on Aug 3 2014 at 06:30 (GMT) was being resumed, but something was wrong with the checkpoint information. Any clues in the debug logs?

I suppose the next regular backup will run successfully. If it has already run, please tell us whether it ended successfully or not.

Marianne
Level 6
Partner    VIP    Accredited Certified

NOTE: the dates and names were changed

Changing names is not a problem, but changing dates makes it difficult to assist or even guess what caused the problem.

You show a date in the future in the Job details (1 September), while the resumed job shows a timestamp of 1407047411, which is 3 August.
How long after the initial failure (supposedly on 3 Aug) was the job resumed?
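As a quick way to double-check the date, the numeric suffix of the backup ID can be decoded as a Unix timestamp. The sketch below assumes the backup ID has the form "<client>_<epoch seconds>", which is what the IDs in this thread look like; the function name is made up for illustration.

```python
from datetime import datetime, timezone


def backup_id_start_time(backup_id: str) -> datetime:
    """Decode the trailing epoch seconds of a backup ID as a UTC datetime.

    Assumption: the ID is "<client>_<unix epoch seconds>"; client names may
    themselves contain underscores, so only the last "_" is used as the split.
    """
    _client, _, epoch = backup_id.rpartition("_")
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc)


print(backup_id_start_time("ServerName_1407047411").isoformat())
# prints 2014-08-03T06:30:11+00:00
```

That agrees with the 3 August (06:30 GMT) start time mentioned earlier in the thread.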

 

Thomas_Anthony
Level 5

Oops, too much editing. Here are the job details with only the names modified:

=========

08/03/2014 16:32:23 - Info nbjm (pid=7144) starting backup job (jobid=1165740) for client ClientName1, policy Backup_Policy1, schedule Full

08/03/2014 16:32:23 - Info nbjm (pid=7144) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1165740, request id:{A86578B4-1B55-11E4-BE15-1AF9A0246D25})

08/03/2014 16:32:23 - requesting resource Storage_Unit1

08/03/2014 16:32:23 - requesting resource Master_Server1.NBU_CLIENT.MAXJOBS.ClientName1

08/03/2014 16:32:23 - requesting resource Master_Server1.NBU_POLICY.MAXJOBS.Backup_Policy1

08/03/2014 16:32:24 - granted resource Master_Server1.NBU_CLIENT.MAXJOBS.ClientName1

08/03/2014 16:32:24 - granted resource Master_Server1.NBU_POLICY.MAXJOBS.Backup_Policy1

08/03/2014 16:32:24 - granted resource MediaID=@aaaaa;DiskVolume=Disk_Pool1;DiskPool=Disk_Pool1;Path=Disk_Pool1;StorageServer=Storage_Server1;MediaServer=MEDIA_SERVER1

08/03/2014 16:32:24 - granted resource  DiskStorageUnit1-Media_Server1

08/03/2014 16:32:24 - estimated 4655922319 kbytes needed

08/03/2014 16:32:24 - Info nbjm (pid=7144) resumed backup (backupid=ClientName1_1407047411) job for client ClientName1, policy Backup_Policy1, schedule Full on storage unit  Media_Server1-DDSU

08/03/2014 16:32:24 - started process bpbrm (pid=30219)

08/03/2014 16:32:25 - connecting

08/03/2014 16:32:25 - Info bpbrm (pid=30219) starting bpbkar on client

08/03/2014 16:32:25 - Info bpbkar (pid=17664) Backup started

08/03/2014 16:32:25 - Info bpbrm (pid=30219) bptm pid: 30252

08/03/2014 16:32:25 - Error bpbrm (pid=30219) from client ClientName1: ERR - checkpoint version mismatch

08/03/2014 16:32:25 - Info bptm (pid=30252) start

08/03/2014 16:32:25 - Info bptm (pid=30252) using 1048576 data buffer size

08/03/2014 16:32:25 - Info bptm (pid=30252) using 128 data buffers

08/03/2014 16:32:25 - connected; connect time: 0:00:00

08/03/2014 16:32:27 - Info bptm (pid=30252) resuming backup-id ClientName1_1407047411 copy 1 at fragment 48, 512 blocks: 3415050240

08/03/2014 16:32:27 - Info bptm (pid=30252) start backup

08/03/2014 16:32:27 - Info bptm (pid=30252) backup child process is pid 30260

08/03/2014 16:32:27 - Info bptm (pid=30252) waited for full buffer 0 times, delayed 0 times

08/03/2014 16:32:27 - Info bptm (pid=30252) EXITING with status 90 <----------

08/03/2014 16:32:27 - Info bpbkar (pid=17664) done. status: 6: the backup failed to back up the requested files

08/03/2014 16:32:27 - begin writing

08/03/2014 16:32:27 - end writing; write time: 0:00:00

the backup failed to back up the requested files (6)

-------

1: (50) client process aborted

2: (6) the backup failed to back up the requested files

 

 


Thomas_Anthony
Level 5

Forgot the basic environment info:

The Master and Media servers are RedHat Linux 2.6 running NetBackup 7.6.0.2

The Client is a Windows 2008 64-bit server running NetBackup 7.1.0.4

The backups go to a Data Domain disk STU running DD OS 5.4.1.1 (w/OST 2.6.2.0)

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Maybe you can give us all relevant info related to this job?

Did it fail or was it suspended?
Status 50 normally indicates that "something" happened on the client side, e.g.:
someone killed the bpbkar processes, or
the system crashed or was rebooted while the backup ran.

Do you have logs enabled on the client and media server?
You will need the bpbrm log on the media server and the bpbkar log on the client for a better understanding of the initial backup, up to the point where it first failed and was then resumed.

Andrew21
Level 3
Employee

 

Please check my comment and verify whether your issue is caused by a client version inconsistency.

Thomas_Anthony
Level 5

Thanks for the excellent info regarding the checkpoint files. A review of the checkpoint files for this problem shows the client version, in the position you described, as 710000. The actual client version is 7.1.0.4.

I'm going to update the client to 7.6.0.2 to match the Master and Media servers.

Thomas_Anthony
Level 5

Thank you, Marianne. Both the initial and auto-restart jobs failed without a client service failure, system crash, reboot, or job cancel/suspend/abort. I will enable the bpbkar log on the client. The logs are already enabled on the media server.

Marianne
Level 6
Partner    VIP    Accredited Certified

The 1st attempt in this case failed with status 50, meaning 'something' on the client side killed the backup. 
Without any logs I guess we will never know.

Curious to know if this was a once-off issue?

What happened to subsequent backups for this client?

Thomas_Anthony
Level 5

Thanks... there was no change or upgrade of the NetBackup versions on the Master, Media or clients during this backup.