NetBackup causing Host instability... maybe

  • Question

  • So I have been battling a ghost for quite some time and at this point I may need some guidance.

    Equipment:

    HP BL460 (maxed out) Hyper-V environment across 3 chassis (hosts spread throughout the chassis in 5-node clusters), all running Windows Server 2012 R2 and patched to the latest.

    Hitachi G1000 SAN, all fiber to Brocade, then Brocade to the chassis

    Hitachi G200 SAN for backups

    HP ProLiant DL380 (maxed out)

    4 CSVs for around 75 VMs in each cluster

    Scenario

    I received a call that VMs had started to become unresponsive. I got on VMM and saw that some VMs showed an incomplete configuration and some stated that the host was not responding. I was able to quickly refresh the incomplete ones and their status changed to running, but the host they had previously been on was still not responding. I tried to migrate them, to no avail, then had to do a hard reboot, after which the host became responsive and the VMs were back to normal. At that point I noticed that the backups had been running at the same time the host started to become unresponsive. I chalked this one up as a one-time issue... Then it happened again. Logs on the host and VMs do not appear to show any errors except for a missing disk, which leads me to believe there is too much of a hit on the SAN.

    When a separate host became non-responsive, I immediately looked at the backups and saw that one had been running for over 12 hours. We stopped the job and things became stable, but unfortunately that wasn't the end of it.

    It happened again, on yet another host and cluster. We verified that backups were running and stopped them, but this time that did not have the same effect. The host had to be restarted before the VMs became responsive again.
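
    To check the backup correlation more rigorously than by eyeballing it, something like the following could be used. This is only a rough sketch: the job names and timestamps are made-up placeholders (not our actual logs), and `backups_running_at` is a hypothetical helper, not anything from NetBackup or VMM.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for this sketch


def parse(ts):
    """Parse a timestamp string in the assumed format."""
    return datetime.strptime(ts, FMT)


def backups_running_at(incident, backup_jobs):
    """Return names of backup jobs whose start/end window contains the incident time."""
    t = parse(incident)
    return [name for name, start, end in backup_jobs
            if parse(start) <= t <= parse(end)]


# Placeholder data only (hypothetical names and times, not real logs)
backup_jobs = [
    ("cluster1-csv1", "2017-09-18 20:00", "2017-09-19 08:30"),
    ("cluster2-csv3", "2017-09-19 22:00", "2017-09-20 01:00"),
]
incidents = ["2017-09-19 03:15", "2017-09-19 12:00"]

for incident in incidents:
    # An empty list means no backup job overlapped that incident
    print(incident, backups_running_at(incident, backup_jobs))
```

    If every host hang falls inside a backup window, that supports the backup theory; an incident with an empty list would point somewhere else.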

    Things we changed:

    Changed the snapshot count per host to 1

    Changed the larger VMs including all SQL servers to Agent backups

    Changed the schedule to even out the hit on the chassis, SAN, and CSVs

    It is still happening. Am I overlooking something? Could this be a queue depth issue on the HBA?

    We have an identical setup in another DC without this issue. Any help is appreciated.

    Thank You


    • Edited by John D. Cole Wednesday, September 20, 2017 12:49 PM
    Wednesday, September 20, 2017 12:19 PM

All replies