Forum Discussion

ViktorBalogh
8 years ago

Media server reporting SCSI reservation conflict to backup source LUN

We have a NetBackup 7.6 environment with two media servers that back up the VMs of a vSphere cluster. It is configured to use SAN transport, so the media servers access the VMFS FC LUNs directly. During the backup window we see many SCSI reservation conflicts on the second media server; they are reported for the backup source:

Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict
Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict
Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict
Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict
Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict

My question is about NBU scalability: may the media servers share the same set of LUNs, or should we create a dedicated group of LUNs for each media server? The current setup is meant to avoid a single media server becoming the SPOF, so both media servers currently see all the LUNs to be backed up.

Some more information about this issue:

- the reservation conflicts are mainly reported on the second media server
- we have the same issue at the second site, so this should rule out a hardware failure as the root cause
- only the LUNs from one storage array are reported, although we also back up data from another array
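To quantify how the conflicts are distributed across media servers and LUNs, the kernel lines quoted above can be tallied per host and per SCSI address. The following is only a minimal sketch: it assumes syslog lines in exactly the format shown in this post, and the function name `count_conflicts` and the sample data are illustrative, not part of NetBackup or any vendor tool. The `1:0:1:2` field is the Linux SCSI address (host:channel:target:LUN) of the device.

```python
import re
from collections import Counter

# Matches kernel lines in the format quoted in this post, e.g.:
#   Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict
# Assumption: the classic syslog timestamp layout (month, day, HH:MM:SS).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+(?P<host>\S+)\s+kernel:\s+"
    r"sd\s+(?P<hctl>\d+:\d+:\d+:\d+):\s+reservation conflict\s*$"
)


def count_conflicts(lines):
    """Count reservation conflicts per (hostname, host:channel:target:lun)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("host"), match.group("hctl"))] += 1
    return counts


# Sample input: the five lines from the post plus one unrelated line
# (the unrelated line is made up for illustration and must be ignored).
sample = [
    "Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict",
    "Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict",
    "Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict",
    "Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict",
    "Jul 6 22:08:04 media-02-elf kernel: sd 1:0:1:2: reservation conflict",
    "Jul 6 22:08:05 media-02-elf kernel: some unrelated message",
]

if __name__ == "__main__":
    for (host, hctl), n in count_conflicts(sample).most_common():
        print(f"{host} sd {hctl}: {n} reservation conflicts")
```

Running the same tally over the full log of each media server would show whether the conflicts really cluster on one server and on the LUNs of one array.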

2 Replies

  • Caveat: I am not a VMware backup expert.

    However, I believe that when multiple servers are zoned to the same disks (including the ESX server itself), each OS attempts to access the disks at its own level and cannot, because one of the servers holds a reservation. The server holding it would be either the ESX host writing the snapshot image or the media server doing the backup. At any other time the OS logs reservation conflict messages because it is trying to "autoconnect" to these devices while a different server holds the reservation.

    These messages look normal to me; I come across them frequently in customer environments that do SAN-based VMware backups.

      ViktorBalogh
      Level 2

      Thank you for your reply. What concerns me is that these are reported over 200 times, which looks rather abnormal to me. We currently have only 2 media servers, but we are planning to add many more because this has to scale out. What would happen with, e.g., 8 media server nodes? The storage vendor has already investigated and asked us whether any clustering software coordinates the LUN access. On the vSphere stack there is, but I am not sure about NetBackup.

      Moreover, we have PowerPath installed on the media servers, and PowerPath marks those paths with iopf (I/O path failure) status without any specific cause. If a path remains in iopf for a longer time, PowerPath does a "garbage collection" and removes it completely. I also cannot understand why this issue only affects the LUNs from one specific array (VNX), whereas all the VMAX LUNs are fine during backup.