Making sense of checkpoint limitations NDMP / EMC /

jim_dalton
Level 6

We are having some issues with NetBackup when backing up via NDMP on our EMC storage.

The issue seems to revolve around creating SnapSure snapshots (a NetBackup recommendation).

The filesystems we are taking NDMP backups of are also covered by EMC's checkpoints, so we have online copies and we take additional offline copies.

This appears to contradict a statement made by EMC here:

https://community.emc.com/docs/DOC-50718

"SnapSure creation will fail if enabled during NDMP backup of a checkpoint file system. This is because creating a checkpoint of another checkpoint is not possible and fails. This may occur more commonly when backing up a NDMP Client in NetWorker with SnapSure enabled and save set "All" - where all checkpoints may be included for backup along with regular file systems. In that case, multiple SnapSure failures may be observed, one for each checkpoint file system being backed up through the NDMP Client in NetWorker. "

Our backups start, a tape is mounted, and the next step is the snapshot phase. This phase regularly fails and we are trying to figure out why.

01/27/2017 11:33:55 - positioned 004822; position time: 0:00:05
01/27/2017 11:33:55 - begin writing
01/27/2017 11:39:13 - Error bpbrm (pid=10233) socket read failed: errno = 62 - Timer expired
01/27/2017 11:39:13 - Error ndmpagent (pid=10238) ndmp_data_get_state_failed, status = 18 (NDMP_XDR_DECODE_ERR)
01/27/2017 11:39:14 - Error ndmpagent (pid=10238) ndmp_mover_get_state failed, status = 12 (NDMP_EOF_ERR)

There's a 5-minute timeout (client_read_timeout), but we are wondering whether the snapshot phase is the root cause, exacerbated by our splitting the data into multiple streams, each of which tries to create its own SnapSure checkpoint, e.g.

/root_vdm/toplevel_data/level_1

/root_vdm/toplevel_data/level_2

etc.
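
As a rough check on the timeout theory, the gap between "begin writing" and the first socket read error in the job log above can be compared with the 5-minute timeout. This is only a sketch: the two log lines are copied from the job details above, the 300-second value is simply the 5 minutes mentioned above, and the helper names are made up for illustration.

```python
from datetime import datetime

# Two lines copied from the job details above.
LOG = """\
01/27/2017 11:33:55 - begin writing
01/27/2017 11:39:13 - Error bpbrm (pid=10233) socket read failed: errno = 62 - Timer expired
"""

# Assumption: the 5-minute client_read_timeout mentioned above, in seconds.
CLIENT_READ_TIMEOUT = 300


def parse_ts(line):
    """Parse the leading 'MM/DD/YYYY HH:MM:SS' timestamp of a job log line."""
    return datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S")


def gap_seconds(log_text, start_marker="begin writing",
                end_marker="socket read failed"):
    """Seconds between the first start_marker line and the next end_marker line."""
    start = end = None
    for line in log_text.splitlines():
        if start is None and start_marker in line:
            start = parse_ts(line)
        elif start is not None and end_marker in line:
            end = parse_ts(line)
            break
    if start is None or end is None:
        raise ValueError("could not find both markers in the log text")
    return int((end - start).total_seconds())


gap = gap_seconds(LOG)
print(f"gap: {gap}s, overshoot past timeout: {gap - CLIENT_READ_TIMEOUT}s")
```

The failure lands a few seconds after the timeout would have expired, which fits the idea that no data arrived at all between "begin writing" and the timer running out. It does not show what the filer was doing during that time.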

Has anyone out there dealt with this kind of issue?

Thanks in advance, Jim  

3 REPLIES

Genericus
Moderator
   VIP   

I have an Isilon, and I have this as the first line of my selections:

SET BACKUP_MODE=SNAPSHOT

I assume that is what you are doing as well?

 

Based on your notes, it SHOULD fail during snapshot creation:

The filesystems we are taking NDMP backups of are also covered by EMC's checkpoints

SnapSure creation will fail if enabled during NDMP backup of a checkpoint file system

 

 

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Precisely so, Genericus, unless I'm reading or understanding it incorrectly. Yet we do this by default across all of our NDMP backups and many/most succeed. I'm trying to figure out whether it's something we shouldn't even be attempting (which raises other issues...)

quebek
Moderator
   VIP    Certified

Hello

To isolate the error, please run the following from the Control Station of this EMC NDMP filer:

server_log server_2 |grep -i -e NDMP -e PAX

If you do not find a clue in that output, you will most likely have to enable more verbose logging for the NDMP and PAX facilities on the EMC by executing:

.server_config server_2 -v "logsys set severity PAX=LOG_DBG3"
.server_config server_2 -v "logsys set severity NDMP=LOG_DBG3"

Then rerun the backup while this is running:

server_log server_2 -f 

so that it puts a 'tail' on the output, which will be updated continuously.

Hopefully you will find some clues about what to check or do.

Once done, please revert the logging level to save space:

.server_config server_2 -v "logsys set severity PAX=LOG_ERR"
.server_config server_2 -v "logsys set severity NDMP=LOG_ERR"
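
Since it is easy to forget the revert step, the whole sequence can be written out as a checklist before starting. The Python sketch below only builds the command strings quoted in this post (it does not run anything on the Control Station); the function names are hypothetical, and `server_2` is just the data mover name used above.

```python
# Facilities whose log severity is raised and later reverted, as in this post.
FACILITIES = ("PAX", "NDMP")


def severity_commands(data_mover, level, facilities=FACILITIES):
    """Build one 'logsys set severity' command string per facility."""
    return [
        f'.server_config {data_mover} -v "logsys set severity {facility}={level}"'
        for facility in facilities
    ]


def debug_session_plan(data_mover="server_2"):
    """Ordered checklist: raise verbosity, follow the log, then revert."""
    return (
        severity_commands(data_mover, "LOG_DBG3")
        + [f"server_log {data_mover} -f"]
        + severity_commands(data_mover, "LOG_ERR")
    )


for step, command in enumerate(debug_session_plan(), start=1):
    print(f"{step}. {command}")
```

Printing the plan for a different data mover is just a matter of passing its name to `debug_session_plan`.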

You can post the logs from any failed NDMP attempt (server_log server_2).

BTW, my take is that you are backing up a file system, so SnapSure (checkpoint) should work. I have lots of such configurations and they work just fine.