For the past two weeks I have been facing an issue: backups fail for all VMs hosted on one of the hosts in our 8-host cluster (Windows VMs with error code 13, Linux VMs with error code 6).
I can ping that host from my media and master servers, and all LUNs are visible on the host. I defined a new policy for those VMs and ran it, but it still failed. Quiescing has also been disabled in the meantime, but the problem still remains.
Backups complete successfully for the VMs on the other hosts.
Are there existing snapshots? That may result in backup failures.
Have you tried doing a vMotion to another ESXi host and running a backup? If that is successful, move the VM back to the original ESXi host, run a backup and see if it works.
Thanks for your reply.
There are no previous snapshots. I have already moved the affected VMs to another ESXi host, and the backup completed successfully. But when I moved them back to the original ESXi host, the backups failed again.
OK, so the issue points to one ESXi host. Are the ESXi hosts on the same build level as well?
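One way to check the build-level question is to collect the version line from each host (for example with `vmware -v` in the ESXi shell) and compare them. The sketch below is only an illustration: it assumes each line has the form `VMware ESXi <version> build-<number>`, and the host names and build numbers in `SAMPLE_HOSTS` are made-up placeholders, not values from this cluster.

```python
import re
from collections import Counter

# Assumed format of one version line per host: "VMware ESXi <version> build-<number>"
BUILD_RE = re.compile(r"ESXi\s+(\d+(?:\.\d+)*)\s+build-(\d+)")


def parse_version(line):
    """Return (version, build) from a version line, or None if it does not match."""
    m = BUILD_RE.search(line)
    return (m.group(1), m.group(2)) if m else None


def odd_hosts(version_lines):
    """Return the hosts whose (version, build) differs from the most common one."""
    parsed = {host: parse_version(line) for host, line in version_lines.items()}
    most_common, _ = Counter(parsed.values()).most_common(1)[0]
    return {host: v for host, v in parsed.items() if v != most_common}


# Hypothetical placeholder data: replace with the lines collected from your hosts.
SAMPLE_HOSTS = {
    "esx01": "VMware ESXi 5.5.0 build-1111111",
    "esx02": "VMware ESXi 5.5.0 build-1111111",
    "esx03": "VMware ESXi 5.5.0 build-2222222",
}

if __name__ == "__main__":
    for host, version in odd_hosts(SAMPLE_HOSTS).items():
        print(f"{host} differs from the rest of the cluster: {version}")
```

If the failing host shows up in that list, the build difference is worth following up at the VMware level.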
Have you looked at the VMware logs on the ESX host when you start a backup?
The problem is clearly with the ESX host and not with NBU.
NBU is merely reporting the error and troubleshooting from within NBU will not tell us where exactly the issue is.
You will need to troubleshoot at VMware level.
Can you take snapshots directly from VMware?
Is there sufficient disk space to take snapshots?
Thanks, dear Marianne.
Taking a snapshot directly from VMware is possible.
The disks are LUNs shared among all ESXi hosts within the cluster, and there is enough space.
The ESXi logs look normal and report that the snapshot was taken and removed successfully.
The log below is for one of the Linux servers hosted on the ESXi host in question.
04/25/2016 18:49:05 - Info nbjm (pid=4820) starting backup job (jobid=263667) for client linux.text.com, policy Linux1_VMs_Backup, schedule Full
04/25/2016 18:49:05 - estimated 132836064 kbytes needed
04/25/2016 18:49:05 - Info nbjm (pid=4820) started backup (backupid= linux.text.com _1461593945) job for client linux.text.com, policy Linux1_VMs_Backup, schedule Full on storage unit masterserver-hcart2-robot-tld-0
04/25/2016 18:49:08 - Info bpbrm (pid=6276) mediaserver.text.com is the host to backup data from
04/25/2016 18:49:08 - Info bpbrm (pid=6276) telling media manager to start backup on client
04/25/2016 18:49:08 - Info bptm (pid=7984) using 65536 data buffer size
04/25/2016 18:49:08 - Info bptm (pid=7984) using 12 data buffers
04/25/2016 18:49:32 - Info bpbrm (pid=1804) sending bpsched msg: CONNECTING TO CLIENT FOR linux.text.com_1461593945
04/25/2016 18:49:32 - connecting
04/25/2016 18:49:33 - Info bpbrm (pid=1804) start bpbkar32 on client
04/25/2016 18:49:33 - Info bptm (pid=10244) setting receive network buffer to 263168 bytes
04/25/2016 18:49:33 - connected; connect time: 0:00:00
04/25/2016 18:49:33 - begin writing
04/25/2016 18:49:34 - Info bpbkar32 (pid=23876) Backup started
04/25/2016 18:49:36 - Info bpbkar32 (pid=23876) CONTINUE BACKUP received.
04/25/2016 18:49:36 - Info bpbrm (pid=1804) Sending the file list to the client
04/25/2016 18:56:41 - Info bpbrm (pid=6276) child done, status 6
04/25/2016 18:56:41 - Info bpbrm (pid=6276) sending message to media manager: STOP BACKUP linux.text.com_1461593945
04/25/2016 18:56:43 - Info bpbrm (pid=6276) media manager for backup id linux.text.com_1461593945 exited with status 90: media manager received no data for backup image
04/25/2016 18:56:43 - end writing; write time: 0:07:10
the backup failed to back up the requested files (6)
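
When comparing job details from several failed VMs, it can help to pull out the status codes and the longest silent period automatically. The following is a minimal sketch, not a NetBackup tool: it assumes the job details were saved as plain text in the `MM/DD/YYYY HH:MM:SS - message` layout shown above, and the function names are my own. `SAMPLE` is copied from the log in this post.

```python
import re
from datetime import datetime

# One job-detail line: "MM/DD/YYYY HH:MM:SS - message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (.*)$")
STATUS_RE = re.compile(r"\bstatus (\d+)")


def parse_job_details(text):
    """Return a list of (timestamp, message) for every timestamped line."""
    events = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def status_codes(events):
    """Collect every 'status N' number mentioned in the messages, in order."""
    return [int(code) for _, msg in events for code in STATUS_RE.findall(msg)]


def longest_gap(events):
    """Return (gap, message_before, message_after) for the longest pause, or None."""
    gaps = [(b[0] - a[0], a[1], b[1]) for a, b in zip(events, events[1:])]
    return max(gaps, key=lambda g: g[0]) if gaps else None


SAMPLE = """\
04/25/2016 18:49:36 - Info bpbkar32 (pid=23876) CONTINUE BACKUP received.
04/25/2016 18:49:36 - Info bpbrm (pid=1804) Sending the file list to the client
04/25/2016 18:56:41 - Info bpbrm (pid=6276) child done, status 6
04/25/2016 18:56:43 - Info bpbrm (pid=6276) media manager for backup id linux.text.com_1461593945 exited with status 90: media manager received no data for backup image
"""

if __name__ == "__main__":
    events = parse_job_details(SAMPLE)
    print("status codes:", status_codes(events))
    gap, before, after = longest_gap(events)
    print(f"longest gap: {gap} between '{before}' and '{after}'")
```

For this job the longest pause is the roughly seven minutes between sending the file list and the child exiting with status 6, so that is the window to look at in the logs on the backup host and on the ESXi host.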