Making sense of checkpoint limitations (NDMP / EMC)
We are having some issues with NetBackup when backing up via NDMP on our EMC storage.
The issue seems to revolve around creating SnapSure snapshots (a NetBackup recommendation).
The filesystems we are taking NDMP backups of are also covered by EMC's checkpoints, so we have online copies and we take additional offline copies.
This arrangement appears to conflict with a statement made by EMC here:
https://community.emc.com/docs/DOC-50718
"SnapSure creation will fail if enabled during NDMP backup of a checkpoint file system. This is because creating a checkpoint of another checkpoint is not possible and fails. This may occur more commonly when backing up a NDMP Client in NetWorker with SnapSure enabled and save set "All" - where all checkpoints may be included for backup along with regular file systems. In that case, multiple SnapSure failures may be observed, one for each checkpoint file system being backed up through the NDMP Client in NetWorker. "
Our backups start, a tape is mounted, and the next step is the snapshot phase. This phase regularly fails and we are trying to figure out why.
01/27/2017 11:33:55 - positioned 004822; position time: 0:00:05
01/27/2017 11:33:55 - begin writing
01/27/2017 11:39:13 - Error bpbrm (pid=10233) socket read failed: errno = 62 - Timer expired
01/27/2017 11:39:13 - Error ndmpagent (pid=10238) ndmp_data_get_state_failed, status = 18 (NDMP_XDR_DECODE_ERR)
01/27/2017 11:39:14 - Error ndmpagent (pid=10238) ndmp_mover_get_state failed, status = 12 (NDMP_EOF_ERR)
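As a sanity check on the timing, here is a minimal Python sketch that measures the gap between "begin writing" and the first error in the job log above. It is not part of NetBackup; the function name and the assumption that every line starts with a 19-character `MM/DD/YYYY HH:MM:SS` timestamp are mine, based only on the three lines pasted here.

```python
from datetime import datetime

# Job log lines copied verbatim from the failed backup above.
LOG = """\
01/27/2017 11:33:55 - positioned 004822; position time: 0:00:05
01/27/2017 11:33:55 - begin writing
01/27/2017 11:39:13 - Error bpbrm (pid=10233) socket read failed: errno = 62 - Timer expired
01/27/2017 11:39:13 - Error ndmpagent (pid=10238) ndmp_data_get_state_failed, status = 18 (NDMP_XDR_DECODE_ERR)
01/27/2017 11:39:14 - Error ndmpagent (pid=10238) ndmp_mover_get_state failed, status = 12 (NDMP_EOF_ERR)
"""


def parse_timestamp(line):
    """Parse the leading 'MM/DD/YYYY HH:MM:SS' timestamp (assumed format)."""
    return datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S")


def seconds_until_first_error(log_text):
    """Return seconds between 'begin writing' and the first Error line, or None."""
    start = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp = parse_timestamp(line)
        if "begin writing" in line:
            start = stamp
        elif " - Error " in line and start is not None:
            return (stamp - start).total_seconds()
    return None


gap = seconds_until_first_error(LOG)
print(f"Gap before first error: {gap:.0f} s")
```

For this log the gap comes out at 318 seconds, so the job fails a little over five minutes after it starts writing, with nothing logged in between.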
There's a 5-minute timeout (client_read_timeout), but we are wondering whether the snapshot phase is the root cause, exacerbated by our splitting the data into multiple streams, with each stream trying to create its own SnapSure checkpoint, e.g.:
/root_vdm/toplevel_data/level_1
/root_vdm/toplevel_data/level_2
etc.
Does anyone out there have experience with this kind of issue?
Thanks in advance, Jim