CentOS 6.3 & Fast Clock (>1 sec per min)

  • Question

  • I hope to get some assistance as I have been searching the web high and low and just can't seem to find an answer that works for me. I find plenty of folks with similar problems, but all the combinations of solutions I have tried just don't seem to make a difference on my system.

    A little background. I am running CentOS 6.3 on Windows 2008 R2 Hyper-V

    uname -a:
    Linux linux01.home 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

    My system clock seems to gain 1 second every minute, which is well beyond what ntpd can keep in check. Based on forum searches, I have tried several different clock sources, including:
    tsc
    acpi_pm
    hyperv_clocksource - I would expect this one to work.

    I am running the latest version of the Hyper-V Integration Components from MS downloads (Linux Integration Services Version 3.4 for Hyper-V).
    I have also tried removing them, in combination with the various clock sources.

    I also tried several other kernel options (in various combinations based on forum posts):
    notsc divider=10 noapm acpi.power_nocheck=1 nousb 

    The dmesg output when trying to force tsc is funny...
    [root@linux01 ~]# dmesg |grep clock
    Command line: ro root=/dev/mapper/vg_linux01-lv_root rd_NO_LUKS rd_LVM_LV=vg_linux01/lv_swap LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_linux01/lv_root SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb divider=10 nousb clocksource=tsc
    Kernel command line: ro root=/dev/mapper/vg_linux01-lv_root rd_NO_LUKS rd_LVM_LV=vg_linux01/lv_swap LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=vg_linux01/lv_root SYSFONT=latarcyrheb-sun16 crashkernel=129M@0M KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb divider=10 nousb clocksource=tsc
    * this clock source is slow. Consider trying other clock sources
    Switching to clocksource jiffies
    Switching to clocksource acpi_pm
    rtc_cmos 00:02: setting system clock to 2012-12-19 22:45:21 UTC (1355957121)
    Refined TSC clocksource calibration: 3072.584 MHz.
    Override clocksource tsc is not HRT compatible. Cannot switch while in HRT/NOHZ mode
    Switching to clocksource tsc
    hv_timesource: Registering HyperV clock source
    Switching to clocksource hyperv_clocksource

    If I could only find a slower clock 

    PHYSICAL HARDWARE:
    ASUS P6X58D-E MotherBoard
    Intel i7-950 Quad Core (HT) @ 3.07GHz
    24GB RAM

    VIRTUAL HARDWARE:
    2 CPU (SMP)
    2GB RAM

    [root@linux01 ~]# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
    tsc acpi_pm 

    [root@linux01 ~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
    hyperv_clocksource
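
The two sysfs files above disagree (the active source is not in the advertised list), and that comparison is easy to script. A minimal sketch, not from the original post: the function name `clocksource_status` is made up for illustration, and it only reads the two files shown above.

```shell
#!/bin/sh
# Report the active clocksource and whether the kernel lists it as available.
# The sysfs directory is a parameter so the function can be pointed at test data.
clocksource_status() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    current=$(cat "$dir/current_clocksource") || return 2
    available=$(cat "$dir/available_clocksource") || return 2
    listed=no
    for cs in $available; do
        if [ "$cs" = "$current" ]; then
            listed=yes
        fi
    done
    echo "current=$current listed=$listed"
}
```

Calling `clocksource_status` with no argument reads the live sysfs directory; with the values shown above it would report `listed=no` for hyperv_clocksource.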

    Output of ntpdate showing how consistently fast she is 
    19 Dec 22:22:06 ntpdate[14773]: step time server 208.87.107.28 offset -1.035373 sec
    19 Dec 22:23:06 ntpdate[14823]: step time server 209.114.111.1 offset -1.035550 sec
    19 Dec 22:24:06 ntpdate[14873]: step time server 208.68.36.196 offset -1.016289 sec
    19 Dec 22:25:06 ntpdate[14927]: step time server 69.65.40.29 offset -1.023044 sec
    19 Dec 22:26:06 ntpdate[14977]: step time server 173.255.224.22 offset -1.041329 sec
    19 Dec 22:27:06 ntpdate[15027]: step time server 209.114.111.1 offset -1.032835 sec
    19 Dec 22:28:06 ntpdate[15077]: step time server 72.26.198.233 offset -1.012450 sec
    19 Dec 22:29:06 ntpdate[15127]: step time server 72.26.198.233 offset -1.045312 sec
    19 Dec 22:30:06 ntpdate[15181]: step time server 209.114.111.1 offset -1.036409 sec
    19 Dec 22:31:06 ntpdate[15231]: step time server 72.26.198.233 offset -1.022421 sec
    19 Dec 22:32:06 ntpdate[15281]: step time server 208.68.36.196 offset -1.024732 sec
    19 Dec 22:33:06 ntpdate[15331]: step time server 173.255.224.22 offset -1.035198 sec
    19 Dec 22:34:06 ntpdate[15381]: step time server 209.177.158.233 offset -1.015114 sec
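
To put a number on the drift, the offsets in output like the above can be averaged and converted to parts per million. This is a rough sketch, not part of the original post: the function name `ntpdate_drift_ppm` is hypothetical, and it assumes ntpdate steps the clock at a fixed interval (60 seconds here), so each offset is the drift accumulated since the previous run.

```shell
#!/bin/sh
# Average the "offset" values from ntpdate log lines on stdin and convert
# the mean to parts per million, given the seconds between runs (default 60).
ntpdate_drift_ppm() {
    interval="${1:-60}"
    awk -v interval="$interval" '
        {
            # The offset value is the field right after the word "offset".
            for (i = 1; i < NF; i++)
                if ($i == "offset") { sum += $(i + 1); n++ }
        }
        END {
            if (n == 0) { print "no offsets found"; exit 1 }
            mean = sum / n
            # A negative offset means the local clock is ahead of the server.
            printf "samples=%d mean_offset=%.6f ppm=%.0f\n", n, mean, -mean / interval * 1e6
        }'
}
```

Usage would be something like `ntpdate_drift_ppm 60 < ntpdate.log`, where the log file name is only an example. Offsets of about -1.03 seconds per minute work out to roughly 17,000 ppm, far above the 500 ppm that ntpd can correct.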

    [root@linux01 ~]# hwclock ; date
    Wed 19 Dec 2012 11:29:47 PM EST -1.014702 seconds
    Wed Dec 19 23:29:14 EST 2012


    The clock on the host is rock solid, and other guests (including a FreeBSD one) don't seem to have this issue. I also installed a second CentOS guest, plain Jane and unpatched, and had the same issue right out of the box.

    Any help would be appreciated. I would be glad to send more data.
    Thursday, December 20, 2012 5:00 AM

All replies

  • I also have this problem on the same setup. I did a fresh install of CentOS 6.3 running on Hyper-V 2008 R2 with LIC 3.4. The CentOS VM gains about as much time as you are describing, so I'm guessing there's a bug here.

    If anyone has an idea or needs some more information, I would also be happy to supply it.


    Monday, January 14, 2013 6:29 AM
  • Glad to see I'm not the only one with the problem.  Hopefully it will be marked as a bug and someone will look into it.  Seems to me the host integration services should take care of ensuring that the guest clock stays in sync with the host.

    I was able to find a workaround, but I'm not sure if it will affect stability of the system.  So far it seems to be working for me but I have to keep close watch on the clock as it still goes out of control on some occasions. 

    The short answer is I used 'tickadj' to slow down my clock.  You will need to play with values to find what works for you.  The default starts at 10000.  To slow your clock, reduce this number. 

    This forum won't let me post links, but if you do a google search on NTP-s-trouble.htm#Q-CORRECT-TICK the first link will show details on where I found my fix, and also teach you more than you ever wanted to know about NTP.  Scroll down to the section called "8.2.5.1. How do I set the correct value for tick?"

    To give it a try, follow these steps:

    service ntpd stop
    rm /var/lib/ntp/drift
    tickadj 9985
    ntpdate pool.ntp.org
    hwclock -w
    service ntpd start
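
The arithmetic behind choosing a tick value can be sketched as follows. This is only an illustration under a stated assumption: the nominal tick is 10000 microseconds (USER_HZ=100, matching the default of 10000 mentioned above), so each unit of tick is about 100 ppm, or roughly 8.6 seconds per day. The function names are hypothetical.

```shell
#!/bin/sh
# Nominal tick length in microseconds (assumes USER_HZ=100).
NOMINAL_TICK=10000

# Print the tick value that would cancel a clock running fast by PPM
# parts per million (use a negative PPM for a slow clock).
tick_for_ppm() {
    awk -v ppm="$1" -v nominal="$NOMINAL_TICK" \
        'BEGIN { printf "%d\n", nominal / (1 + ppm / 1e6) + 0.5 }'
}

# Print the approximate slowdown, in ppm, that a given tick value applies.
ppm_for_tick() {
    awk -v tick="$1" -v nominal="$NOMINAL_TICK" \
        'BEGIN { printf "%.0f\n", (nominal - tick) / nominal * 1e6 }'
}
```

For example, `ppm_for_tick 9985` gives 1500, meaning the value used in the steps above slows the clock by about 1500 ppm (around 0.09 seconds per minute). Your own drift will differ, so measure first and then run `tick_for_ppm` on the result.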


    Since the tickadj command is not persistent, I added it to my ntpd startup script:

    # Start daemons.
     echo -n $"Starting $prog: "
     # Slow the kernel tick before ntpd starts; tickadj does not persist across reboots.
     tickadj 9985
     daemon $prog $OPTIONS
     RETVAL=$?
     echo
     [ $RETVAL -eq 0 ] && touch $lockfile
     return $RETVAL

    The other thing you are going to want to do is add the following to your /etc/ntp.conf file:

    tinker panic 0
    server 0.centos.pool.ntp.org iburst
    server 1.centos.pool.ntp.org iburst
    server 2.centos.pool.ntp.org iburst
    server 3.centos.pool.ntp.org iburst
    

    The first line disables ntpd's panic threshold, which lets NTP make wild adjustments to your clock.  Be aware this is not good at all, but otherwise ntpd will just give up once the offset grows past that threshold (1000 seconds by default).

    The "iburst" parameter tells NTP to do an aggressive startup to get the clock aligned at startup (or something like that :-).

    ***Use at your own risk*** Some blogs say that messing with the tickadj value could lead to system instability.  Hopefully a real fix will be released soon.

    To monitor, I used 'ntpdate -q pool.ntp.org' on an interval of about once a minute for a few hours to see what direction the clock was moving and refine the adjustment.  Monitor your drift file, and once it settles on a sensible value (between -500 and 500 ppm) you should be good.  If you change the tick value, repeat the steps to reset everything, including deleting the drift file.
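
The drift file check can be scripted too. A small sketch, assuming the drift file holds a single frequency value in ppm on its first line and lives at /var/lib/ntp/drift as in the steps above; the function name `drift_in_range` and the default limit of 500 are illustrative choices.

```shell
#!/bin/sh
# Check whether the frequency recorded in an ntpd drift file is strictly
# inside +/-LIMIT ppm. Prints the value and a verdict; returns 2 if the
# file cannot be read.
drift_in_range() {
    file="${1:-/var/lib/ntp/drift}"
    limit="${2:-500}"
    [ -r "$file" ] || { echo "cannot read $file"; return 2; }
    awk -v limit="$limit" 'NR == 1 {
        v = $1 + 0
        verdict = (v > -limit && v < limit) ? "ok" : "out-of-range"
        printf "drift=%.3f verdict=%s\n", v, verdict
    }' "$file"
}
```

A value sitting exactly at the limit is reported as out-of-range on purpose, on the assumption that it means ntpd has run out of correction range and the tick value needs another adjustment.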

    Good luck!

    Monday, January 14, 2013 3:56 PM
  • Same problem running CentOS 5.8 on Hyper-V 2008 R2.

    Friday, January 18, 2013 11:33 AM
  • Hi there! I am from the Linux Integration Services team at Microsoft. We noticed your post. Apologies for the delay in replying to this. Our first guess is that it could be a setup issue or something related to the hardware you are using. In our testing we have run a CentOS 6.3 x64 VM (with LIS 3.4, NTP configured) at 80% CPU utilization continuously for 20 hours without losing a single second. So we will have to work a bit with you to sort out the problem.

    We have a couple of follow up questions:

    a) What workload are you testing, and how many vCPUs are attached to the VM?

    b) Is it possible for you to run an emulated (no LIS) test on CentOS6.3 with one cpu and no NTP? Later could you repeat the test with LIS drivers installed with one CPU and no NTP? A final test we would like to do is to repeat the test with LIS drivers installed with one CPU and NTP enabled. Once all the tests are run please could you summarize the differences you see?

    Thank you for helping us with this effort.

    Abhishek Gupta

    PM, Linux Integration Services,

    Microsoft Corporation


    Wednesday, January 23, 2013 8:12 PM
  • @Abhishek

    NTP was NOT required in the older versions of LIC for time sync, yet I see recommendations from Microsoft to use NTP these days.  The only reason for this advice that I can think of is that the current time sync driver does not work effectively.

    My findings are with openSUSE, but the problem is also confirmed in SLES, which is a supported OS like CentOS.

    openSUSE 11.4 + LIC 2.1: time sync perfect, uptime 119 days

    openSUSE 12.2 (drivers in kernel): have to use NTP, gaining about 40 secs/day

    Both of the above are on the same host; the host OS is 2008 R2 with up-to-date patches.

    Also confirmed by Olaf Hering:

    sles11sp2: uptime 23:45 hours, it's 38 seconds ahead
    12.2: uptime 1 day, 5:17 hours, it's 47 seconds ahead

    which of course breaks kerberos after  a while.
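
For comparison with other reports, figures like these can be converted to parts per million. A minimal sketch; the function name `uptime_drift_ppm` is made up, and it simply divides the seconds gained by the uptime.

```shell
#!/bin/sh
# Convert "gained SECS seconds over HOURS:MINUTES of uptime" into ppm.
# Arguments: seconds gained, uptime hours, uptime minutes.
uptime_drift_ppm() {
    awk -v gained="$1" -v h="$2" -v m="$3" \
        'BEGIN { printf "%.0f\n", gained / (h * 3600 + m * 60) * 1e6 }'
}
```

With the numbers above, `uptime_drift_ppm 38 23 45` gives 444 and `uptime_drift_ppm 47 29 17` gives 446, while 40 seconds per day (`uptime_drift_ppm 40 24 0`) is 463. All three sit just under the 500 ppm that ntpd can correct, which fits with NTP being enough to mask the problem on these guests.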

    Thanks Mike

    Wednesday, January 23, 2013 10:32 PM
  • Hi Mike, Thanks for replying. Let me check with my team to see if this was indeed the case.

    Abhishek Gupta

    PM, Linux Integration Services,

    Microsoft Corporation

    Wednesday, January 23, 2013 10:59 PM
  • Hi again, is it possible for either of you to post the output of cat /proc/cpuinfo for your system? It is likely that your systems do not support invariant TSC and that is why you are seeing the time drift. Please paste the output of this command so we can make a determination.

    Thanks,

    Abhishek Gupta

    PM, Linux Integration Services,

    Microsoft Corporation

    Monday, January 28, 2013 11:02 PM
  • My apologies, I have not been able to run through the test scenario you provided, but I hope to get to it this week.  Here is the output from cpuinfo:

     cat /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 26
    model name      : Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz
    stepping        : 5
    cpu MHz         : 3072.547
    cache size      : 8192 KB
    physical id     : 0
    siblings        : 2
    core id         : 0
    cpu cores       : 2
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 11
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
    bogomips        : 6145.09
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:

    processor       : 1
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 26
    model name      : Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz
    stepping        : 5
    cpu MHz         : 3072.547
    cache size      : 8192 KB
    physical id     : 0
    siblings        : 2
    core id         : 1
    cpu cores       : 2
    apicid          : 1
    initial apicid  : 1
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 11
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good unfair_spinlock pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
    bogomips        : 6145.09
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
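
Since the question was about invariant TSC, the flags line can be checked with a short script. This is a sketch under an assumption: on Linux an invariant TSC usually shows up as both constant_tsc and nonstop_tsc, and a guest may not see every flag the host CPU has. The function name `tsc_flags` is hypothetical.

```shell
#!/bin/sh
# List which TSC-related flags appear on the first "flags" line of a
# cpuinfo-style file (defaults to /proc/cpuinfo).
tsc_flags() {
    file="${1:-/proc/cpuinfo}"
    flags=$(grep -m1 '^flags' "$file" | cut -d: -f2)
    for want in constant_tsc nonstop_tsc; do
        # Pad with spaces so only whole flag names match.
        case " $flags " in
            *" $want "*) echo "$want=present" ;;
            *)           echo "$want=absent" ;;
        esac
    done
}
```

Run against the output above, this would report constant_tsc as present and nonstop_tsc as absent.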

    Monday, January 28, 2013 11:09 PM
  • Please note a guest on the same host keeps perfect time with LIC 2.1.  I only have this problem with the later Hyper-V drivers.

    Many Thanks Mike (cpuinfo below)

    aeslinux01:~ # cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 44
    model name : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
    stepping : 2
    microcode : 0xffffffff
    cpu MHz : 2393.372
    cache size : 12288 KB
    physical id : 0
    siblings : 1
    core id : 0
    cpu cores : 1
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 11
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc up rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
    bogomips : 4786.74
    clflush size : 64
    cache_alignment : 64
    address sizes : 40 bits physical, 48 bits virtual
    power management:

    aeslinux01:~ # 

    Tuesday, January 29, 2013 10:31 PM
  • Hi Abhishek, it's been a while. Just wondering if anyone had any ideas on this. Thanks, Mike
    Monday, February 4, 2013 11:27 PM
  • Hi guys, Sorry for the delay. We are looking for a machine inside Microsoft to repro the issue. We have tried with various machines in our possession but have had no luck so far reproducing it. Seems like this is some new hardware. Is it possible for either of you to participate in an Easy Assist session? We will have one of our developers take a look at your box and possibly load a VM with a custom kernel to see if he can repro the problem. Please let me know your comfort level on this and I will try to arrange a debug session.

    Many thanks for your patience!

    Abhishek Gupta

    PM, Linux Integration Services,

    Microsoft Corporation.

    Tuesday, February 5, 2013 10:09 PM
  • Hi Abhishek, I have an openSUSE 12.2 guest you can use, or you can load a different OS/kernel if you like, as it is not in production. I am in the GMT time zone, though. My email is mike @ surcoufdot co uk if you want to arrange it. Thanks, Mike
    Tuesday, February 5, 2013 10:50 PM
  • Thanks Mike! I will follow up with you in a day or two.

    Abhishek Gupta

    PM, Linux Integration Services,

    Microsoft Corporation

    Wednesday, February 6, 2013 8:07 PM
  • Hi folks,

    I need some more information. Are you guys trying out this scenario on an HP Z400 desktop machine? Can you get the output of msinfo32.exe from the host partition (instead of the VM)? We have found a machine similar to the configuration you describe within Microsoft but would like to be sure. If we still cannot repro the issue then I will get in touch with one of you to set up an Easy Assist session this week. Once again, apologies for the significant delay in the resolution of this issue.

    Thanks,

    Abhishek Gupta

    PM, Linux Integration Services,

    Microsoft Corporation


    Monday, February 11, 2013 6:50 PM
  • Hi Abhishek

    My machine is a Dell PowerEdge R710.

    OS Name Microsoft Windows Server 2008 R2 Enterprise
    Version 6.1.7601 Service Pack 1 Build 7601
    Other OS Description  Not Available
    OS Manufacturer Microsoft Corporation
    System Name AESVIRTUAL01
    System Manufacturer Dell Inc.
    System Model PowerEdge R710
    System Type x64-based PC
    Processor Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz, 2261 Mhz, 4 Core(s), 8 Logical Processor(s)
    Processor Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz, 2261 Mhz, 4 Core(s), 8 Logical Processor(s)
    BIOS Version/Date Dell Inc. 2.1.9, 21/05/2010
    SMBIOS Version 2.6
    Windows Directory C:\Windows
    System Directory C:\Windows\system32
    Boot Device \Device\HarddiskVolume1
    Locale United Kingdom
    Hardware Abstraction Layer Version = "6.1.7601.17514"
    User Name Not Available
    Time Zone GMT Standard Time
    Installed Physical Memory (RAM) 48.0 GB
    Total Physical Memory 48.0 GB
    Available Physical Memory 2.24 GB
    Total Virtual Memory 52.0 GB
    Available Virtual Memory 5.59 GB
    Page File Space 4.00 GB
    Page File C:\pagefile.sys

    Tuesday, February 12, 2013 11:59 AM
  • Hi Abhishek

    It's been almost 2 weeks since I posted the info you required.

    Sorry for chasing, but did you get any further with this?

    Many Thanks

    Mike

    Monday, February 25, 2013 12:52 PM
  • Hi Mike, Sorry for the delay. My support personnel will get to you right away. I am forwarding your email to him.

    Abhishek

    Thursday, February 28, 2013 8:19 PM
  • Hi,

    Is there any news on this issue? I've been following this thread for a while in the hope of a public hotfix becoming available. Can you tell me if that's likely to happen?

    Monday, March 18, 2013 7:35 AM
  • Try the following link:

    Correcting Clock Drift On A CentOS VM Under Hyper-V On Server 2008 R2

    http://hardanswers.net/correct-clock-drift-in-centos-hyper-v

    I also had this problem and resolved it successfully with the information above.

    Thursday, March 28, 2013 8:36 AM
  • Abhishek,

    We are experiencing this issue with Hyper-V 2008 R2 and CentOS 6.4 on an HP server.  Is there any progress on this issue?

    Thanks,

    Dan

    Monday, April 29, 2013 7:06 PM