VSS issues with CentOS 6.6 x64
-
Hello,
We're trying to enable VSS backups through Windows Server Backup, but still having issues.
We're running a few VMs on CentOS 6.6 x64, fully updated / patched, hyperv modules installed, cpanel installed.
However, when we try to back up those VMs, *some* of them crash or their load spikes sharply.
We also have a few VMs running Ubuntu 14.04 LTS, fully updated / patched as well, and we can back those up without any issues.
We've been able to isolate the issue, and it turns out that the VMs that get backed up successfully are the ones with a VHDX file smaller than 50GB.
The VMs that are having issues are the ones with a bigger VHDX file (>100GB).
Both types of VMs are running the exact same OS, patches, modules and services.
So it really appears to be the size of the disk that is causing issues.
Some snippet of our logfiles when a backup is working:
Jun 9 08:39:02 dev Hyper-V VSS: VSS: freeze of /boot: Success
Jun 9 08:39:02 dev Hyper-V VSS: VSS: freeze of /: Success
Jun 9 08:39:02 dev Hyper-V VSS: VSS: thaw of /boot: Success
Jun 9 08:39:02 dev Hyper-V VSS: VSS: thaw of /: Success
And when backup isn't working:
Jun 9 12:18:04 dev Hyper-V VSS: VSS: freeze of /boot: Success
[...] and it hangs / load spikes like crazy.
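The difference between the two excerpts above is that the failing run logs a freeze with no matching thaw. A minimal Python sketch for spotting that pattern in saved /var/log/messages lines follows. It assumes the daemon's message format is exactly as quoted in this post; the function name `unthawed_mounts` is made up for illustration and is not part of any Hyper-V tooling.

```python
import re

# Matches daemon lines in the format quoted in this thread, e.g.
#   "Jun 9 08:39:02 dev Hyper-V VSS: VSS: freeze of /boot: Success"
# Assumption: other LIS versions may log a different format (see later posts).
VSS_LINE = re.compile(r"VSS: (freeze|thaw) of (\S+): (\w+)")


def unthawed_mounts(lines):
    """Return mount points that logged a successful freeze but no later thaw."""
    frozen = []
    for line in lines:
        match = VSS_LINE.search(line)
        if not match:
            continue
        operation, mount_point, status = match.groups()
        if status != "Success":
            continue
        if operation == "freeze" and mount_point not in frozen:
            frozen.append(mount_point)
        elif operation == "thaw" and mount_point in frozen:
            frozen.remove(mount_point)
    return frozen
```

Feeding it the four lines from the working backup returns an empty list, while the single line from the failing backup leaves /boot listed as still frozen.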
Does this sound familiar to anyone?
Are any fixes available?
This is quite crucial as we need to get those backups up and running.
Thanks,
- Luc
- Changed type ltellier Tuesday, June 09, 2015 4:41 PM
Question
All replies
-
-
Hello Joshua,
I'm currently using the hyperv modules that come with the distro:
# rpm -qa |grep hyperv
hypervfcopyd-0-0.15.20130826git.el6.x86_64
hypervkvpd-0-0.15.20130826git.el6.x86_64
hyperv-daemons-license-0-0.15.20130826git.el6.noarch
hypervvssd-0-0.15.20130826git.el6.x86_64
hyperv-daemons-0-0.15.20130826git.el6.x86_64
# lsmod |grep hyperv
hyperv_keyboard 3196 0
hid_hyperv 4278 0
hyperv_fb 8309 1
hv_vmbus 211327 6 hyperv_keyboard,hid_hyperv,hv_netvsc,hv_utils,hyperv_fb,hv_storvsc
# dmesg |grep hv_
hv_vmbus: Hyper-V Host Build:9600-6.3-17-0.17039; Vmbus version:3.0
hv_vmbus: child device vmbus_0_1 registered
hv_vmbus: child device vmbus_0_2 registered
hv_vmbus: child device vmbus_0_3 registered
hv_vmbus: child device vmbus_0_4 registered
hv_vmbus: child device vmbus_0_5 registered
hv_vmbus: child device vmbus_0_6 registered
hv_vmbus: child device vmbus_0_7 registered
hv_vmbus: child device vmbus_0_8 registered
hv_vmbus: child device vmbus_0_9 registered
hv_vmbus: child device vmbus_0_10 registered
hv_vmbus: child device vmbus_0_11 registered
hv_vmbus: child device vmbus_0_12 registered
hv_vmbus: child device vmbus_0_13 registered
hv_vmbus: child device vmbus_0_14 registered
hv_vmbus: registering driver hv_storvsc
hv_vmbus: registering driver hyperv_fb
hv_utils: Registering HyperV Utility Driver
hv_vmbus: registering driver hv_util
hv_vmbus: registering driver hv_netvsc
hv_netvsc: hv_netvsc channel opened successfully
hv_netvsc vmbus_0_14: Send section size: 6144, Section count:170
hv_netvsc vmbus_0_14: Device MAC 02:00:00:06:24:3c link state up
hv_netvsc vmbus_0_14: real num tx,rx queues:1, 1
hv_vmbus: registering driver hid_hyperv
hv_vmbus: registering driver hyperv_keyboard
hv_utils: KVP: user-mode registering done.
hv_utils: VSS daemon registered
As for the errors, "wbadmin" shows the following:
And on the guest's console:
The output of /var/log/messages:
Jun 30 16:00:58 dev Hyper-V VSS: VSS: freeze of /boot: Success
Jun 30 16:00:58 dev Hyper-V VSS: VSS: freeze of /tmp: Success
Jun 30 16:00:58 dev Hyper-V VSS: VSS: freeze of /var/tmp: Success
And the dmesg's output:
sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
IPv6 addrconf: prefix with wrong length 56
INFO: task loop0:724 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
loop0 D 0000000000000000 0 724 2 0x00000000
ffff880102ffdac0 0000000000000046 ffff880100a20040 000000018122c550
ffffffffa0121200 0000000000001000 ffff8801020d02c0 ffff8801038d2608
ffff880102ffdce0 0000000000000001 ffff880100a205f8 ffff880102ffdfd8
Call Trace:
[<ffffffff8109eede>] ? prepare_to_wait+0x4e/0x80
[<ffffffff8119038c>] __sb_start_write+0xdc/0x120
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81126a79>] generic_file_aio_write+0x69/0x100
[<ffffffffa00e4e08>] ext4_file_write+0x58/0x190 [ext4]
[<ffffffff8118e00a>] do_sync_write+0xfa/0x140
[<ffffffff81064a2e>] ? try_to_wake_up+0x24e/0x3e0
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81063c23>] ? perf_event_task_sched_out+0x33/0x70
[<ffffffff8106cc03>] ? dequeue_entity+0x113/0x2e0
[<ffffffff81379b54>] __do_lo_send_write+0x54/0xa0
[<ffffffff81379f81>] do_lo_send_direct_write+0x81/0xa0
[<ffffffff8137b0e5>] do_bio_filebacked+0x205/0x330
[<ffffffff81379f00>] ? do_lo_send_direct_write+0x0/0xa0
[<ffffffff8137b2e1>] loop_thread+0xd1/0x270
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8137b210>] ? loop_thread+0x0/0x270
[<ffffffff8109e71e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109e680>] ? kthread+0x0/0xc0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task flush-7:0:736 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-7:0 D 0000000000000000 0 736 2 0x00000000
ffff880100795870 0000000000000046 0000000000000002 ffff8801010d7314
ffff8800d88dd3c0 ffff8801010d7200 0000000000000001 00000000ef313a60
ffff880100afac00 0000000100046703 ffff8801008b1068 ffff880100795fd8
Call Trace:
[<ffffffff810aaad1>] ? ktime_get_ts+0xb1/0xf0
[<ffffffff811c51f0>] ? sync_buffer+0x0/0x50
[<ffffffff8152a613>] io_schedule+0x73/0xc0
[<ffffffff811c5230>] sync_buffer+0x40/0x50
[<ffffffff8152aeaa>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff8106d1a5>] ? enqueue_entity+0x125/0x450
[<ffffffff811c51f0>] ? sync_buffer+0x0/0x50
[<ffffffff8152af88>] out_of_line_wait_on_bit_lock+0x78/0x90
[<ffffffff8109ec30>] ? wake_bit_function+0x0/0x50
[<ffffffff811c5560>] ? end_buffer_async_write+0x0/0x190
[<ffffffff811c53d6>] __lock_buffer+0x36/0x40
[<ffffffff811c66d5>] __block_write_full_page+0x305/0x330
[<ffffffff811c5560>] ? end_buffer_async_write+0x0/0x190
[<ffffffff811ca460>] ? blkdev_get_block+0x0/0x20
[<ffffffff811ca460>] ? blkdev_get_block+0x0/0x20
[<ffffffff811c67e0>] block_write_full_page_endio+0xe0/0x120
[<ffffffff81123e90>] ? find_get_pages_tag+0x40/0x130
[<ffffffff811c6835>] block_write_full_page+0x15/0x20
[<ffffffff811cb5f8>] blkdev_writepage+0x18/0x20
[<ffffffff811381a7>] __writepage+0x17/0x40
[<ffffffff8113946d>] write_cache_pages+0x1fd/0x4c0
[<ffffffff81138190>] ? __writepage+0x0/0x40
[<ffffffff811339d2>] ? free_pcppages_bulk+0x392/0x460
[<ffffffff81450097>] ? skb_dequeue+0x67/0x90
[<ffffffff81139754>] generic_writepages+0x24/0x30
[<ffffffff81139781>] do_writepages+0x21/0x40
[<ffffffff811bb07d>] writeback_single_inode+0xdd/0x290
[<ffffffff811bb47d>] writeback_sb_inodes+0xbd/0x170
[<ffffffff811bb5db>] writeback_inodes_wb+0xab/0x1b0
[<ffffffff811bb9d3>] wb_writeback+0x2f3/0x410
[<ffffffff81088092>] ? del_timer_sync+0x22/0x30
[<ffffffff811bbc95>] wb_do_writeback+0x1a5/0x240
[<ffffffff811bbd93>] bdi_writeback_task+0x63/0x1b0
[<ffffffff8109ea37>] ? bit_waitqueue+0x17/0xd0
[<ffffffff81148490>] ? bdi_start_fn+0x0/0x100
[<ffffffff81148516>] bdi_start_fn+0x86/0x100
[<ffffffff81148490>] ? bdi_start_fn+0x0/0x100
[<ffffffff8109e71e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109e680>] ? kthread+0x0/0xc0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task auditd:936 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
auditd D 0000000000000000 0 936 1 0x00000000
ffff880102231c98 0000000000000086 0000000000000000 ffff880102231ca0
ffffc9000105b080 ffff8801007df520 ffff880102231c38 ffffffff810b231a
ffff880102231eb8 ffff880102231c88 ffff8801007dfad8 ffff880102231fd8
Call Trace:
[<ffffffff810b231a>] ? futex_wait_queue_me+0xba/0xf0
[<ffffffff8109eede>] ? prepare_to_wait+0x4e/0x80
[<ffffffff8119038c>] __sb_start_write+0xdc/0x120
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81126a79>] generic_file_aio_write+0x69/0x100
[<ffffffffa00e4e08>] ext4_file_write+0x58/0x190 [ext4]
[<ffffffff8118e00a>] do_sync_write+0xfa/0x140
[<ffffffffa00ff04f>] ? ext4_statfs+0xef/0x200 [ext4]
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811c2e18>] ? do_statfs_native+0x98/0xb0
[<ffffffff8123aa6b>] ? selinux_file_permission+0xfb/0x150
[<ffffffff8122d866>] ? security_file_permission+0x16/0x20
[<ffffffff8118e308>] vfs_write+0xb8/0x1a0
[<ffffffff8118ecd1>] sys_write+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task rs:main Q:Reg:999 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rs:main Q:Reg D 0000000000000000 0 999 1 0x00000080
ffff8801028edc98 0000000000000082 0000000000000000 ffff8801028edca0
ffffc9000111b9c0 ffff8801007deab0 ffff8801028edc38 ffffffff810b231a
ffff880101a3aae8 ffff8801028edc88 ffff8801007df068 ffff8801028edfd8
Call Trace:
[<ffffffff810b231a>] ? futex_wait_queue_me+0xba/0xf0
[<ffffffff8109eede>] ? prepare_to_wait+0x4e/0x80
[<ffffffff8119038c>] __sb_start_write+0xdc/0x120
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81126a79>] generic_file_aio_write+0x69/0x100
[<ffffffffa00e4e08>] ext4_file_write+0x58/0x190 [ext4]
[<ffffffff8118e00a>] do_sync_write+0xfa/0x140
[<ffffffff811d22ed>] ? fsnotify_add_notify_event+0x12d/0x280
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8123aa6b>] ? selinux_file_permission+0xfb/0x150
[<ffffffff8122d866>] ? security_file_permission+0x16/0x20
[<ffffffff8118e308>] vfs_write+0xb8/0x1a0
[<ffffffff8118ecd1>] sys_write+0x51/0x90
[<ffffffff810e5b6e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task hv_vss_daemon:1203 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hv_vss_daemon D 0000000000000000 0 1203 1 0x00000080
ffff880100065ce8 0000000000000082 ffff88010065f900 ffff8801010d7314
ffff8800e2a28840 ffff8801010d7200 ffff880100a20088 000000009c2ccfe9
ffff880100afac00 000000010003f25b ffff880037c39ad8 ffff880100065fd8
Call Trace:
[<ffffffff810aaad1>] ? ktime_get_ts+0xb1/0xf0
[<ffffffff811c51f0>] ? sync_buffer+0x0/0x50
[<ffffffff8152a613>] io_schedule+0x73/0xc0
[<ffffffff811c5230>] sync_buffer+0x40/0x50
[<ffffffff8152b0df>] __wait_on_bit+0x5f/0x90
[<ffffffff811c51f0>] ? sync_buffer+0x0/0x50
[<ffffffff8152b188>] out_of_line_wait_on_bit+0x78/0x90
[<ffffffff8109ec30>] ? wake_bit_function+0x0/0x50
[<ffffffff811c51e6>] __wait_on_buffer+0x26/0x30
[<ffffffff811c5e51>] __sync_dirty_buffer+0x71/0xf0
[<ffffffff811c5ee3>] sync_dirty_buffer+0x13/0x20
[<ffffffffa0198511>] ext3_commit_super.clone.0+0x71/0x100 [ext3]
[<ffffffffa0198e5e>] ext3_unfreeze+0x3e/0x70 [ext3]
[<ffffffff811cb833>] thaw_bdev+0xb3/0x1e0
[<ffffffff811a3e2d>] do_vfs_ioctl+0x2cd/0x580
[<ffffffff811a4161>] sys_ioctl+0x81/0xa0
[<ffffffff810e5b6e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task queueprocd - qu:1513 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
queueprocd - D 0000000000000000 0 1513 1 0x00000080
ffff8800e28e9ce8 0000000000000086 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
ffff8800e28e9ca0 0000000100043a79 ffff8800e292b068 ffff8800e28e9fd8
Call Trace:
[<ffffffff8109eede>] ? prepare_to_wait+0x4e/0x80
[<ffffffff8119038c>] __sb_start_write+0xdc/0x120
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff811b0e14>] mnt_want_write+0x24/0x50
[<ffffffff8118f8a0>] ? get_empty_filp+0xa0/0x180
[<ffffffff811a0f79>] do_filp_open+0x2b9/0xd20
[<ffffffff810a3b63>] ? __hrtimer_start_range_ns+0x1a3/0x460
[<ffffffff810a3221>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff8129935a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811ae272>] ? alloc_fd+0x92/0x160
[<ffffffff8118b0b7>] do_sys_open+0x67/0x130
[<ffffffff8118b1c0>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task anacron:1872 blocked for more than 120 seconds.
Not tainted 2.6.32-504.16.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
anacron D 0000000000000000 0 1872 1 0x00000080
ffff8800d8957ce8 0000000000000086 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
ffff88000001d6c0 000000010003fd17 ffff8800f6eb9068 ffff8800d8957fd8
Call Trace:
[<ffffffff8109eede>] ? prepare_to_wait+0x4e/0x80
[<ffffffff8119038c>] __sb_start_write+0xdc/0x120
[<ffffffff8109ebb0>] ? autoremove_wake_function+0x0/0x40
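To make a dump like the one above easier to scan, here is a rough Python sketch that lists each hung task and flags whether its trace passes through `__sb_start_write`, the kernel function in which writers wait while a filesystem is frozen. The helper name `blocked_tasks` is hypothetical, and the sketch assumes the "INFO: task NAME:PID blocked for more than N seconds" wording shown in this dmesg output.

```python
import re

# Header printed by the kernel's hung-task detector, as seen in this thread.
# Task names may themselves contain colons or spaces ("rs:main Q:Reg").
HUNG_TASK = re.compile(
    r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds"
)


def blocked_tasks(dmesg_text):
    """Summarise hung-task reports found in a block of dmesg text."""
    headers = list(HUNG_TASK.finditer(dmesg_text))
    tasks = []
    for index, header in enumerate(headers):
        # A task's trace runs from its header up to the next header (or EOF).
        if index + 1 < len(headers):
            end = headers[index + 1].start()
        else:
            end = len(dmesg_text)
        trace = dmesg_text[header.end():end]
        tasks.append({
            "name": header.group(1),
            "pid": int(header.group(2)),
            # True when the task is waiting for a frozen filesystem to thaw.
            "waits_on_freeze": "__sb_start_write" in trace,
        })
    return tasks
```

In the output above, loop0, auditd, rs:main Q:Reg, queueprocd and anacron would all be flagged as waiting on a freeze, while hv_vss_daemon itself is stuck inside ext3_unfreeze rather than in a write path.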
Let me know if you need anything else!
Thanks,
- Luc
-
-
Hello,
It is still not working.
Here is what we've done so far.
Removing the LIS built into CentOS 6.6:
root@dev [~]# rpm -qa |grep hyperv
hyperv-daemons-0-0.15.20130826git.el6.x86_64
hypervkvpd-0-0.15.20130826git.el6.x86_64
hypervfcopyd-0-0.15.20130826git.el6.x86_64
hypervvssd-0-0.15.20130826git.el6.x86_64
hyperv-daemons-license-0-0.15.20130826git.el6.noarch
root@dev [~]# rpm -e hyperv-daemons hypervkvpd hypervfcopyd hypervvssd hyperv-daemons-license
root@dev [~]# rpm -qa |grep hyperv
root@dev [~]#
Installing the latest LIS:
root@dev [~]# cd /mnt/RHEL66/
root@dev [/mnt/RHEL66]# ./install.sh
Installing the Linux Integration Services for Microsoft Hyper-V...
Preparing... ########################################### [100%]
1:kmod-microsoft-hyper-v ########################################### [100%]
Preparing... ########################################### [100%]
1:microsoft-hyper-v ########################################### [100%]
Saving old initramfs
Installing new initramfs
Adding KVP Daemon to Chkconfig....
Starting KVP Daemon....
Adding VSS Daemon to Chkconfig....
Starting VSS Daemon....
Adding FCOPY Daemon to Chkconfig....
Starting FCOPY Daemon....
Linux Integration Services for Hyper-V has been installed. Please reboot your system.
We then rebooted the VM and started a backup; unfortunately, it didn't work at all.
Processes are getting hung all around and the only way to make the VM reachable is to reboot it.
Unfortunately, we're not seeing any errors within the VM log files.
Please let me know if you need anything else.
Thanks,
- Edited by ltellier Thursday, July 02, 2015 6:47 PM
-
Let's verify that you are running VSS from LIS4. When you do
ps aux | grep vss
What daemon and path do you see? For LIS4 it should be only hv_vss_daemon.
In addition, can you provide more details about your VM configuration? I understand it's a larger (>100GB) VHDX you are trying to back up, but it would help us with reproduction if we knew the memory size, number of vCPUs, number and type of NICs, and storage details as well.
Thanks! --jrp
Joshua R. Poulson, Program Manager, Microsoft Open Source Technology Center
- Edited by Josh Poulson [MSFT] Wednesday, July 08, 2015 6:10 PM
-
Hello Joshua,
Here you go:
root@dev [~]# ps aux | grep vss
root 1256 0.0 0.0 4056 468 ? Ss 13:55 0:00 /usr/sbin/hv_vss_daemon
root 1905 0.0 0.0 103248 876 pts/0 S+ 14:08 0:00 grep vss
root@dev [~]# lsof -p 1256
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
hv_vss_da 1256 root cwd DIR 253,0 4096 2 /
hv_vss_da 1256 root rtd DIR 253,0 4096 2 /
hv_vss_da 1256 root txt REG 253,0 9744 1835719 /usr/sbin/hv_vss_daemon
hv_vss_da 1256 root mem REG 253,0 1921176 392461 /lib64/libc-2.12.so
hv_vss_da 1256 root mem REG 253,0 154624 392821 /lib64/ld-2.12.so
hv_vss_da 1256 root 0u CHR 1,3 0t0 4165 /dev/null
hv_vss_da 1256 root 1u CHR 1,3 0t0 4165 /dev/null
hv_vss_da 1256 root 2u CHR 1,3 0t0 4165 /dev/null
hv_vss_da 1256 root 3u unix 0xffff8801012fea40 0t0 11580 socket
hv_vss_da 1256 root 4u sock 0,6 0t0 11581 can't identify protocol
As for the VM specs:
GEN: Version 1
CPU: 1x
RAM: 4GB (static)
HDD: 100GB (Configured on IDE Controller 0:0)
vNIC: 1x (Configured as non-Legacy, VMQ enabled, IPSec offloading enabled, Max Number offloaded 512, Static MAC address specified, Protected network enabled)
Please let me know if you need further information, I will be happy to help!
- Luc
- Edited by ltellier Friday, July 10, 2015 6:18 PM
-
Hi,
Has there been any progress on this?
Thanks !
-
-
Is SELINUX enabled? Does it work with SELINUX disabled?
Had similar issues before. If you need SELINUX, this may help.
yum -y install policycoreutils-python
semanage permissive -a hypervvssd_t
reboot
Yep, I've already done that, but backups still aren't working.
Thanks,
-
Just FYI, I've updated to LIS 4.0.11 and still got the same issue; the VM just hangs and becomes unusable.
/var/log/messages:
Aug 15 21:43:14 dev HV_FCOPY: HV_FCOPY starting; pid is:1261
Aug 15 21:43:14 dev HV_FCOPY: open /dev/vmbus/hv_fcopy failed; error: 2 No such file or directory
Aug 15 21:43:14 dev KVP: KVP starting; pid is:1273
Aug 15 21:43:14 dev Hyper-V VSS: VSS starting; pid is:1285
Aug 15 21:43:14 dev kernel: hv_utils: VSS daemon registered
Aug 15 21:43:14 dev KVP: KVP LIC Version: 4.0.11
Aug 15 21:43:14 dev kernel: hv_utils: KVP: user-mode registering done.
dmesg:
hv_utils: VSS daemon registered
hv_utils: KVP: user-mode registering done.
hv_utils: VSS: timeout waiting for daemon to reply
hv_utils: VSS: timeout waiting for daemon to reply
[...]
RPMs:
# rpm -qa |grep hyper
kmod-microsoft-hyper-v-4.0.11-20150728.x86_64
microsoft-hyper-v-4.0.11-20150728.x86_64
Please let me know if you guys need more information.
Thanks,
- Luc
- Edited by ltellier Sunday, August 16, 2015 1:54 AM
-
-
Not sure if this is useful.
It looks like your hv_vss_daemon is not running. Query its status, and try to start it if it is not running:
/etc/init.d/hv_vss_daemon status
/etc/init.d/hv_vss_daemon stop
/etc/init.d/hv_vss_daemon start
Thanks Mike, unfortunately, it is still not working.
VSS is still running but for some reason, it times out:
hv_utils: VSS daemon registered
hv_utils: KVP: user-mode registering done.
hv_utils: VSS: timeout waiting for daemon to reply
hv_utils: VSS: timeout waiting for daemon to reply
-
Are you backing up a VHD on C: to a location on C:? This is not supported and may be causing problems for you. You can back up to a different partition, or to a VHD, to get around this.
Joshua R. Poulson, Program Manager, Microsoft Open Source Technology Center
-
VHDx are stored on a different drive (E:).
We're backing up to our main drive (C:).
Also, please note that our other VMs are being backed up just fine.
Thanks,
-
Are there any updates regarding this issue?
Please let us know if we can help!
Thanks,
-
Are there any other messages in the logs? We are not able to reproduce this problem, and we regularly test VSS backups with our releases of LIS as well as our upstream kernel testing.
Joshua R. Poulson, Program Manager, Microsoft Open Source Technology Center
-
-
Hi friends,
I'm also facing the same issue. I have a Hyper-V Windows 2012 R2 cluster, and backups are taken using Veeam software. I'm seeing the same logs for some Linux VMs running RHEL 6.3.
Oct 19 08:11:53 CSHSDVLSYMPOC01 Hyper-V VSS: VSS: freeze of /boot: Success
Oct 19 08:11:53 CSHSDVLSYMPOC01 Hyper-V VSS: VSS: freeze of /: Success
Oct 19 08:11:53 CSHSDVLSYMPOC01 kernel: hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x82
Oct 19 08:11:53 CSHSDVLSYMPOC01 kernel: hv_storvsc vmbus_0_1: stor pkt ffff8801fe56d180 autosense data valid - len 18
Oct 19 08:11:53 CSHSDVLSYMPOC01 kernel: storvsc: Sense Key : Unit Attention [current]
Oct 19 08:11:53 CSHSDVLSYMPOC01 kernel: storvsc: Add. Sense: Changed operating definition
Oct 19 08:11:53 CSHSDVLSYMPOC01 kernel: sd 0:0:0:0: [sda] Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Oct 19 08:11:53 CSHSDVLSYMPOC01 Hyper-V VSS: VSS: thaw of /boot: Success
Oct 19 08:11:53 CSHSDVLSYMPOC01 Hyper-V VSS: VSS: thaw of /: Success
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: INFO: task auditd:2490 blocked for more than 120 seconds.
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: auditd D 0000000000000003 0 2490 1 0x00000080
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: ffff8801fe5bbcc8 0000000000000082 0000000000000000 0000000000000286
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: 0000000000000282 0000000000000003 0000000000000001 0000000000000282
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: ffff8801feb71098 ffff8801fe5bbfd8 000000000000fb88 ffff8801feb71098
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: Call Trace:
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811140c0>] ? sync_page+0x0/0x50
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff814fdfc3>] io_schedule+0x73/0xc0
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811140fd>] sync_page+0x3d/0x50
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff814fe97f>] __wait_on_bit+0x5f/0x90
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff81114333>] wait_on_page_bit+0x73/0x80
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff81092110>] ? wake_bit_function+0x0/0x50
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff8112a835>] ? pagevec_lookup_tag+0x25/0x40
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811147ab>] wait_on_page_writeback_range+0xfb/0x190
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811299e1>] ? do_writepages+0x21/0x40
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811148fb>] ? __filemap_fdatawrite_range+0x5b/0x60
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff81114978>] filemap_write_and_wait_range+0x78/0x90
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811a9fae>] vfs_fsync_range+0x7e/0xe0
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811aa07d>] vfs_fsync+0x1d/0x20
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811aa0be>] do_fsync+0x3e/0x60
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff811aa110>] sys_fsync+0x10/0x20
Oct 19 08:14:37 CSHSDVLSYMPOC01 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Oct 19 08:16:37 CSHSDVLSYMPOC01 kernel: INFO: task flush-8:0:445 blocked for more than 120 seconds.
Oct 19 08:16:37 CSHSDVLSYMPOC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 19 08:16:37 CSHSDVLSYMPOC01 kernel: flush-8:0 D 0000000000000003 0 445 2 0x00000080
Oct 19 08:16:37 CSHSDVLSYMPOC01 kernel: ffff8802014d1560 0000000000000046 0000000000000000 0000000000000020
Oct 19 08:16:37 CSHSDVLSYMPOC01 kernel: ffff8801fde9b000 ffff880000029d80 ffff8802014d14f0 ffffffff81113ece
Any idea how this can be fixed? I tried installing Linux Integration Services 4, but the same issue persists :-(
-
-
For anyone still having this issue - please reply back posting your output from 'lsblk' from the command line
as root or with sudo on affected machines, and also the full output from the 'mount' command ('mount' on its own, no switches/parameters).
I have a feeling I know what is causing this.
-
Sure:
root@web [~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 977M 0 loop /tmp
vda 252:0 0 30G 0 disk
└─vda1 252:1 0 30G 0 part /
root@web [~]# mount
/dev/vda1 on / type ext4 (rw,usrquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/usr/tmpDSK on /tmp type ext3 (rw,noexec,nosuid,loop=/dev/loop0)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
root@web [~]#
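One thing that stands out in this mount output is that /tmp is an ext3 filesystem on a loop device whose backing file, /usr/tmpDSK, lives on the ext4 root. The earlier dmesg dump shows loop0 blocked in `__sb_start_write` on ext4 and hv_vss_daemon blocked in ext3_unfreeze. Nobody in this thread has confirmed that this is the cause, so treat the following only as a sketch for spotting the same layout on other machines. It assumes the older mtab-style `mount` output shown here (backing file as the source plus a `loop=` option); the function names are made up for illustration.

```python
import re

# One line of classic `mount` output: "SOURCE on TARGET type FSTYPE (OPTIONS)".
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def parse_mounts(text):
    """Parse `mount` output into a list of dicts, skipping unparseable lines."""
    entries = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            source, target, fstype, options = match.groups()
            entries.append({
                "source": source,
                "target": target,
                "fstype": fstype,
                "options": options.split(","),
            })
    return entries


def loop_backed_mounts(entries):
    """Return (mount point, backing file, host mount point) per loop mount."""
    results = []
    for entry in entries:
        is_loop = any(o == "loop" or o.startswith("loop=")
                      for o in entry["options"])
        if not is_loop:
            continue
        backing_file = entry["source"]
        host = None
        # The host filesystem is the longest mount point prefixing the file.
        for other in entries:
            if other is entry:
                continue
            prefix = other["target"].rstrip("/") + "/"
            if backing_file.startswith(prefix):
                if host is None or len(other["target"]) > len(host):
                    host = other["target"]
        results.append((entry["target"], backing_file, host))
    return results
```

Run against the output above, this reports /tmp as backed by /usr/tmpDSK on /, meaning two filesystems that the VSS daemon freezes separately depend on each other.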
Thanks!
- Luc
-
I have the same issue
and my results are:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 127G 0 disk
├─sda1 8:1 0 512M 0 part /boot
├─sda2 8:2 0 2G 0 part /tmp
├─sda3 8:3 0 992M 0 part [SWAP]
├─sda4 8:4 0 1K 0 part
└─sda5 8:5 0 123.5G 0 part /
sdb 8:16 0 300G 0 disk
└─sdb1 8:17 0 300G 0 part /www
sdc 8:32 0 250G 0 disk
└─sdc1 8:33 0 250G 0 part /backup
mount:
/dev/sda5 on / type ext4 (rw,usrjquota=quota.user,jqfmt=vfsv0)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext3 (rw)
/dev/sda2 on /tmp type ext3 (rw,noexec,nosuid)
/dev/sdb1 on /www type ext4 (rw,nosuid,nodev,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/tmp on /var/tmp type none (rw,noexec,nosuid,bind)
/dev/sdc1 on /backup type ext4 (rw,nosuid,nodev,noatime)
Thanks
-
Similar issue...HyperV Windows 2012 R2 cluster and backup is taken using Veeam software. Backing up a linux VM running RHEL 5.11. LIS 4.0.11 installed. The backup was successful (apparently), but the VM rebooted immediately after:
Feb 15 23:10:51 Srvr01 Hyper-V VSS: VSS: op=FREEZE: succeeded
Feb 15 23:10:51 Srvr01 kernel: storvsc: Current: sense key: Unit Attention
Feb 15 23:10:51 Srvr01 kernel: Add. Sense: Changed operating definition
Feb 15 23:10:51 Srvr01 kernel:
Feb 15 23:10:51 Srvr01 kernel: sd 0:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Feb 15 23:10:51 Srvr01 kernel: storvsc: Current: sense key: Unit Attention
Feb 15 23:10:51 Srvr01 kernel: Add. Sense: Changed operating definition
Feb 15 23:10:51 Srvr01 kernel:
Feb 15 23:10:51 Srvr01 kernel: sd 1:0:0:1: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Feb 15 23:10:51 Srvr01 Hyper-V VSS: VSS: op=THAW: succeeded
Feb 15 23:13:35 Srvr01 kernel: storvsc: Current: sense key: Unit Attention
Feb 15 23:13:35 Srvr01 kernel: Add. Sense: Changed operating definition
Feb 15 23:13:35 Srvr01 kernel:
Feb 15 23:13:35 Srvr01 kernel: sd 0:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automatically adjust these parameters.
Feb 15 23:14:26 Srvr01 syslogd 1.4.1: restart.
Any updates on this issue?
-
Thank you for reporting these VSS issues. Please try LIS 4.1 and see if that improves your situation. A number of storvsc fixes have gone into this release.
Joshua R. Poulson, Program Manager, Microsoft Open Source Technology Center
- Edited by Josh Poulson [MSFT] Friday, March 25, 2016 6:31 PM
-
I've been having this problem too, since August last year when I moved a CentOS 7.2 Gen2 guest from my Windows 8.1 Hyper-V host onto a production 2012 R2 host. I've tried every version of LIS and moved hosts twice. Nothing has worked, so I ended up going back to the built-in version of LIS and running without an image backup.
I was very pleased to see the last post about installing LIS 4.1, which came out a few weeks ago. However, I have just deployed it and am still getting the same errors when backing up the guest OS using Windows Backup (which works for all other guests, Windows and Ubuntu; it seems to be just CentOS 7.x servers that are affected).
I followed the instructions exactly, including the custom SELinux policy. It all went OK with no problems, everything still checks out OK after a reboot.
On the host (which also runs the backup) I see
Application backup
Writer Id: {66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}
Component: 327F722B-EFCC-48C3-BB18-E169B82A616B
Caption : Online\servername
Logical Path:
Error : 80780175
Error Message : Component was skipped from volume shadow copy.
Detailed Error : 800423F4
Detailed Error Message : The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur.
On the guest I see
Mar 31 06:01:42 servername kernel: hv_utils: VSS: timeout waiting for daemon to reply
There are no other errors logged anywhere (nothing in audit/log), presumably as the filesystem is locked by VSS on the host. Although the system is still running, any process that attempts to write to disk will freeze. The only fix is to hit the reset button.
I would really like to be able to snapshot this server and use VSS/Windows Backup. Can Microsoft acknowledge this issue and give some kind of feedback?
Thanks.
For further system info below:
# lsmod | grep hv
hv_netvsc 39797 0
hv_utils 24309 2
hv_storvsc 22535 3
scsi_transport_fc 64056 1 hv_storvsc
hv_vmbus 398110 6 hyperv_keyboard,hv_netvsc,hid_hyperv,hv_utils,hyperv_fb,hv_storvsc
# modinfo hv_vmbus
filename: /lib/modules/3.10.0-327.4.4.el7.x86_64/weak-updates/microsoft-hyper-v/hv_vmbus.ko
version: 4.1.0
license: GPL
rhelversion: 7.2
srcversion: C978A35E28EF56CCF9D56F5
alias: acpi*:VMBus:*
alias: acpi*:VMBUS:*
depends:
vermagic: 3.10.0-327.el7.x86_64 SMP mod_unload modversions
# cat /etc/*release*
CentOS Linux release 7.2.1511 (Core)
Derived from Red Hat Enterprise Linux 7.2 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)".../snip
# service --status-all
snip
● hv_fcopy_daemon.service - LSB: hv_fcopy_daemon provides info to the host
Loaded: loaded (/etc/rc.d/init.d/hv_fcopy_daemon)
Active: active (exited) since Thu 2016-03-31 08:30:22 BST; 22min ago
Docs: man:systemd-sysv-generator(8)
Process: 2650 ExecStart=/etc/rc.d/init.d/hv_fcopy_daemon start (code=exited, status=0/SUCCESS)
Mar 31 08:30:22 servername systemd[1]: Starting LSB: hv_fcopy_daemon provides info to the host...
Mar 31 08:30:22 servername hv_fcopy_daemon[2650]: Starting Hyper-V FCOPY daemon [ OK ]
Mar 31 08:30:22 servername systemd[1]: Started LSB: hv_fcopy_daemon provides info to the host.
● hv_kvp_daemon.service - LSB: hv_kvp_daemon provides info to the host
Loaded: loaded (/etc/rc.d/init.d/hv_kvp_daemon)
Active: active (running) since Thu 2016-03-31 08:30:22 BST; 22min ago
Docs: man:systemd-sysv-generator(8)
Process: 2648 ExecStart=/etc/rc.d/init.d/hv_kvp_daemon start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/hv_kvp_daemon.service
└─2670 /usr/sbin/hv_kvp_daemon
Mar 31 08:30:22 servername systemd[1]: Starting LSB: hv_kvp_daemon provides info to the host...
Mar 31 08:30:22 servername systemd[1]: Started LSB: hv_kvp_daemon provides info to the host.
Mar 31 08:30:22 servername hv_kvp_daemon[2648]: Starting Hyper-V KVP daemon [ OK ]
Mar 31 08:30:22 servername KVP[2670]: KVP starting; pid is:2670
Mar 31 08:30:22 servername KVP[2670]: KVP LIC Version: 4.1.0
● hv_vss_daemon.service - LSB: hv_vss_daemon provides info to the host
Loaded: loaded (/etc/rc.d/init.d/hv_vss_daemon)
Active: active (running) since Thu 2016-03-31 08:30:22 BST; 22min ago
Docs: man:systemd-sysv-generator(8)
Process: 2649 ExecStart=/etc/rc.d/init.d/hv_vss_daemon start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/hv_vss_daemon.service
└─2669 /usr/sbin/hv_vss_daemon
Mar 31 08:30:22 servername systemd[1]: Starting LSB: hv_vss_daemon provides info to the host...
Mar 31 08:30:22 servername hv_vss_daemon[2669]: Hyper-V VSS: VSS starting; pid is:2669
Mar 31 08:30:22 servername hv_vss_daemon[2669]: Hyper-V VSS: VSS: kernel module version: 129
Mar 31 08:30:22 servername systemd[1]: Started LSB: hv_vss_daemon provides info to the host.
Mar 31 08:30:22 servername hv_vss_daemon[2649]: Starting Hyper-V VSS daemon [ OK ] /snip
# ll /usr/sbin | grep hv
-rwxr-xr-x. 1 root root 11600 Mar 15 00:27 hv_fcopy_daemon
-rwxr-xr-x. 1 root root 895 Mar 15 00:27 hv_get_dhcp_info
-rwxr-xr-x. 1 root root 622 Mar 15 00:27 hv_get_dns_info
-rwxr-xr-x. 1 root root 28632 Mar 15 00:27 hv_kvp_daemon
-rwxr-xr-x. 1 root root 1853 Mar 15 00:27 hv_set_ifconfig
-rwxr-xr-x. 1 root root 11600 Mar 15 00:27 hv_vss_daemon
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 127G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 126.5G 0 part
├─centos_servername-swap 253:0 0 2G 0 lvm [SWAP]
├─centos_servername-root 253:1 0 50G 0 lvm /
└─centos_servername-home 253:3 0 74.5G 0 lvm /home
sdb 8:16 0 1000G 0 disk
└─sdb1 8:17 0 1000G 0 part
└─centos_servername_storage-centos_servername_storage--volume 253:2 0 999.9G 0 lvm /opt/servername/storage
Note: sdb is new and has been added since the problem started.
- Edited by RichardSGriffiths Thursday, March 31, 2016 8:07 AM
-
Richard,
You mention that you are going back to the built-in LIS and not running backups. Don't forget that when using built-in LIS on CentOS, the "hyperv-daemons" package needs to be installed to get the daemons. You should still be able to back up CentOS with built-in LIS.
Thanks, --jrp
Joshua R. Poulson, Program Manager, Microsoft Open Source Technology Center
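A quick way to confirm that a VSS daemon is actually running in the guest is to look for its process. The sketch below is only an illustration: the function name `vss_daemon_status` is hypothetical, and the two process names are assumptions (`hv_vss_daemon` as shown in the service output earlier in this thread, and `hypervvssd` as the name used by the distro's hyperv-daemons packages).

```shell
#!/bin/sh
# Report whether a Hyper-V VSS daemon process is running in the guest.
# Assumed process names: hv_vss_daemon (LIS download) and
# hypervvssd (distro hyperv-daemons packages).
vss_daemon_status() {
    for name in hv_vss_daemon hypervvssd; do
        # pgrep -x matches the process name exactly and exits 0 on a match.
        if pgrep -x "$name" >/dev/null 2>&1; then
            echo "running: $name"
            return 0
        fi
    done
    echo "not running"
    return 1
}
```

If this reports "not running", the host's backup request has no daemon to answer it in the guest, so checking this before a backup run may save a reset.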
-
Hello Team,
Did anyone find a solution? I am facing a similar issue where RHEL servers hang when the backup starts. The system info is below:
[root@SERVERNAME ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)
[root@SERVERNAME ~]#
/var/log/messages
==
Aug 21 22:24:10 SERVERNAME kernel: hv_utils: VSS: timeout waiting for daemon to reply
Aug 22 07:30:27 SERVERNAME kernel: imklog 5.8.10, log source = /proc/kmsg started. << SERVER REBOOTED
Aug 22 07:30:27 SERVERNAME rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1460" x-info="http://www.rsyslog.com"] start
==
Regards, Milind Koyande
- Edited by Milind Koyande Monday, August 22, 2016 2:22 PM
-
-
Hello Josh,
Thanks for your reply. I have checked and found that hyperv-daemons is not installed on the server:
[root@SERVERNAME ~]# rpm -ivh | grep hyperv
rpm: no packages given for install
[root@SERVERNAME ~]#
This may sound basic, but I have a few questions:
- Is hyperv-daemons something that needs to be installed separately from the Linux Integration Services?
- Where can I download the hyperv-daemons rpm? (The server doesn't have an internet connection or a local repo for yum installation.)
- When we uncheck Backup for the VM in the Integration Services settings (Hyper-V Manager), the VM doesn't hang and the backup completes as well. Is there a specific reason for this?
Thanks again for your reply and help.
Regards, Milind Koyande
- Edited by Milind Koyande Wednesday, August 24, 2016 12:41 PM
-
-
Milind,
"hyperv-daemons" is in the Red Hat repository starting with RHEL 6.6, and you install it by running "yum install hyperv-daemons" as root. It is Red Hat's version of the daemons for Linux Integration Services that matches the kernel you are running.
If you install the Linux Integration Services download, it will install new versions of the Hyper-V kernel modules as well as the daemons, but Red Hat will not support your system if your kernel is tainted by this package. This is why there are two different ways to get LIS on Red Hat, CentOS, or Oracle Linux with the RHCK: built-in or download.
If you do not have Red Hat installation media or a local yum repository, you might have to get the file from access.redhat.com. Go to Downloads, look for Packages, choose your version, and search for "hyperv"; "hyperv-daemons" should be on the list.
Red Hat announced the inclusion of this package here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.6_Technical_Notes/RHEA-2014-1439.html
Joshua R. Poulson, Program Manager, Microsoft Open Source Technology Center
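Note that `rpm -ivh` with no package file runs rpm in install mode, so the "no packages given for install" message above does not show whether the package is present. A minimal sketch of a query-mode check follows; the function name `hyperv_daemons_status` is hypothetical, and the package name is the one given in this reply.

```shell
#!/bin/sh
# Check whether the hyperv-daemons package is present in the RPM database.
# rpm -q queries an installed package by name and exits non-zero if it is absent;
# rpm -i (install mode) expects a package file instead.
hyperv_daemons_status() {
    if rpm -q hyperv-daemons >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing: install with 'yum install hyperv-daemons' as root"
    fi
}
```

On a server without a repository, the same check can be repeated after installing the rpm file obtained from the Red Hat download page.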
-
-
Hello Josh,
Thanks again for your reply, and I apologize for the delay in replying.
We checked with the backup vendor and found an old checkpoint that had been created a long time ago. The vendor suggested deleting it, after which we no longer see the hang, but CPU utilization on the VM now increases when we take a snapshot.
Any specific reason for this behavior?
Regards, Milind Koyande
-