Bluescreen on NETIO.SYS after upgrade to Server 2008 R2 SP1 on UAG DA Cluster


  • Tuesday, March 15, 2011 9:38 AM
     
     

    Hi there,

    We upgraded our UAG DA cluster to Windows Server 2008 R2 SP1 last week (the UAG servers are Hyper-V guests and use the dynamic memory feature). Now the array master bluescreens multiple times a day with stop 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL) in netio.sys. Is anyone familiar with this problem?

    Regards,

    Alfred



All Replies

  • Tuesday, March 15, 2011 10:44 AM
     
     

    We have the same issue. Dump details below:

    DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high.  This is usually
    caused by drivers using improper addresses.
    If kernel debugger is available get stack backtrace.
    Arguments:
    Arg1: 00000008000000e1, memory referenced
    Arg2: 0000000000000002, IRQL
    Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
    Arg4: fffff88001001bd0, address which referenced memory


    STACK_COMMAND:  kb

    FOLLOWUP_IP:
    NETIO!WfpNblInfoGet+0
    fffff880`01001bd0 488b81e0000000  mov     rax,qword ptr [rcx+0E0h]
    SYMBOL_STACK_INDEX:  3
    SYMBOL_NAME:  NETIO!WfpNblInfoGet+0
    FOLLOWUP_NAME:  MachineOwner
    MODULE_NAME: NETIO
    IMAGE_NAME:  NETIO.SYS
    DEBUG_FLR_IMAGE_TIMESTAMP:  4ce79381
    FAILURE_BUCKET_ID:  X64_0xD1_NETIO!WfpNblInfoGet+0
    BUCKET_ID:  X64_0xD1_NETIO!WfpNblInfoGet+0

  • Tuesday, March 15, 2011 10:51 AM
     
     

    Well, here's mine; it does seem to be the same issue.

    DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high.  This is usually
    caused by drivers using improper addresses.
    If kernel debugger is available get stack backtrace.
    Arguments:
    Arg1: 0000000b000000e8, memory referenced
    Arg2: 0000000000000002, IRQL
    Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
    Arg4: fffff88000e01bd0, address which referenced memory

    Debugging Details:
    ------------------

    READ_ADDRESS: GetPointerFromAddress: unable to read from fffff800018c70e8
     0000000b000000e8

    CURRENT_IRQL:  2

    FAULTING_IP:
    NETIO!WfpNblInfoGet+0
    fffff880`00e01bd0 488b81e0000000  mov     rax,qword ptr [rcx+0E0h]

    CUSTOMER_CRASH_COUNT:  1

    DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

    BUGCHECK_STR:  0xD1

    PROCESS_NAME:  System

    STACK_COMMAND:  kb

    FOLLOWUP_IP:
    NETIO!WfpNblInfoGet+0
    fffff880`00e01bd0 488b81e0000000  mov     rax,qword ptr [rcx+0E0h]

    SYMBOL_STACK_INDEX:  3
    SYMBOL_NAME:  NETIO!WfpNblInfoGet+0
    FOLLOWUP_NAME:  MachineOwner
    MODULE_NAME: NETIO
    IMAGE_NAME:  NETIO.SYS
    DEBUG_FLR_IMAGE_TIMESTAMP:  4ce79381
    FAILURE_BUCKET_ID:  X64_0xD1_NETIO!WfpNblInfoGet+0
    BUCKET_ID:  X64_0xD1_NETIO!WfpNblInfoGet+0
    Followup: MachineOwner


    Regards, Alfred
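Both dumps above fault on the same instruction, `mov rax,qword ptr [rcx+0E0h]`, at offset +0 of NETIO!WfpNblInfoGet. Since Arg1 is the address that was read, subtracting the 0xE0 displacement recovers the value of rcx at the moment of the crash. Under the Windows x64 calling convention rcx carries a function's first argument, so at +0 it should still hold whatever the caller passed in (presumably a NET_BUFFER_LIST pointer, going only by the function name; that is an assumption the dumps do not confirm). A short Python sketch of the arithmetic:

```python
# Faulting instruction in both dumps (NETIO!WfpNblInfoGet+0):
#   mov rax, qword ptr [rcx+0E0h]
# The address the CPU tried to read (bug check Arg1) is therefore rcx + 0xE0.
DISPLACEMENT = 0xE0

# Arg1 ("memory referenced") from the two dumps posted in this thread.
DUMPS = {
    "first dump": 0x00000008000000E1,
    "second dump": 0x0000000B000000E8,
}


def register_value(memory_referenced, displacement=DISPLACEMENT):
    """Subtract the instruction's displacement to recover rcx at the fault."""
    return memory_referenced - displacement


def is_kernel_half(address):
    """True if a 64-bit address lies in the upper (kernel-mode) half on x64 Windows."""
    return address >= 0xFFFF800000000000


for name, arg1 in DUMPS.items():
    rcx = register_value(arg1)
    print(f"{name}: rcx = {rcx:#018x}, kernel-mode pointer: {is_kernel_half(rcx)}")
# first dump: rcx = 0x0000000800000001, kernel-mode pointer: False
# second dump: rcx = 0x0000000b00000008, kernel-mode pointer: False
```

In both cases the recovered value is a small number in the lower half of the address space rather than a kernel-mode pointer, which is consistent with NETIO being handed a bogus pointer by its caller rather than failing on its own.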
  • Tuesday, March 15, 2011 11:26 AM
     
     

    We are also using dynamic memory. Next time it goes down, I will turn it off for the VM and see if anything changes.

    Marcus

  • Tuesday, March 15, 2011 11:55 AM
     
     

    Nope, it still BSODs with a static memory configuration for the VM.

  • Tuesday, March 15, 2011 11:58 AM
     
     
    OK, thanks, that saves me some time testing the same thing. I have the server in drain mode now to see if it keeps running.
    Regards, Alfred
  • Tuesday, March 15, 2011 1:36 PM
     
     

    Hi all. I had an issue with netio.sys, but it was caused by an antivirus running on the UAG server that conflicted with the TMG packet filter (the antivirus had a network protection feature). To recover from the BSOD I had to boot into safe mode and manually uninstall the antivirus.

    Hope it helps


    // Raúl - I love this game
  • Tuesday, March 15, 2011 1:40 PM
     
     

    I have ruled out AV on the server; there is nothing installed on it now.

    Although I had a BSOD soon after turning dynamic memory off, I haven't had one since. EDIT: actually, ignore that; it has just gone down again.

    It only seems to crash when DirectAccess is in use. It is fine overnight, but starts crashing around 9 AM each day.

    Marcus

  • Tuesday, March 15, 2011 1:58 PM
     
     
    Same here: in drain mode it seems to keep running. As soon as we put the server back into production, the BSODs start.
    Regards, Alfred
  • Thursday, March 17, 2011 8:32 AM
     
     
    Well, in drain mode it keeps running without any BSODs. I'll open a case with MS support.
    Regards, Alfred
  • Thursday, March 17, 2011 10:05 AM
     
     

    Let me know how you get on. If you need another instance to compare against and confirm it's not your setup, let me know.

    Marcus

  • Thursday, March 17, 2011 11:28 AM
     
     
    OK, thanks, I'll keep you posted.
    Regards, Alfred
  • Tuesday, March 22, 2011 9:10 AM
     
     

    Any progress? I may have to open a case myself.

    Marcus

  • Tuesday, March 22, 2011 2:45 PM
     
     

    Well, I opened a case last Friday and still have no response. I'll call them tomorrow.


    Regards, Alfred
  • Thursday, March 24, 2011 12:59 PM
     
     

    Well, the UAG support department seems to be quite busy; we're third on the priority list.

     


    Regards, Alfred
  • Friday, March 25, 2011 10:36 AM
     
     

    Not good support! Some days it doesn't go down at all. Yesterday was fine all day; today it has gone down twice in 20 minutes.

    Marcus

  • Tuesday, March 29, 2011 2:23 PM
     
     

    OK, our case is under investigation right now. The engineer suggested a possible workaround to maintain high availability, which we are testing now.

    Here is the suggested workaround:

    Run the following command from an elevated command line:

    netsh tmg set global name=disablendisregistration value=1 persistent

     

    Restart all TMG services (including the fweng driver) by running "net stop fweng" and then "net start fwsrv". There is no need to reboot.

     

    This workaround may reduce network performance on the UAG server, but will hopefully at least let you run your full array as a short term workaround while we analyse the data.
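The workaround above comes down to three commands run in order from an elevated prompt. A minimal Python sketch that lists them and, by default, only prints them; the helper names are illustrative, and the command text is copied from the support engineer's instructions in this post rather than from TMG documentation:

```python
import subprocess

# Commands from the suggested workaround, in the order given.
WORKAROUND_COMMANDS = [
    ["netsh", "tmg", "set", "global", "name=disablendisregistration",
     "value=1", "persistent"],
    # Per the engineer's note, stopping the fweng driver takes the TMG services
    # down with it; net stop may ask for confirmation about dependent services.
    ["net", "stop", "fweng"],
    ["net", "start", "fwsrv"],
]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs elevation)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a command returns a non-zero exit code.
            subprocess.run(cmd, check=True)


run(WORKAROUND_COMMANDS)
```

Actually applying it would mean calling `run(WORKAROUND_COMMANDS, dry_run=False)` from an elevated session on the UAG server itself.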

    I'll keep you posted.


    Regards, Alfred
  • Wednesday, March 30, 2011 8:59 AM
     
     

    Hi,

    I had the same issue. A detailed look at the dump showed that the cause was not netio.sys but fweng.sys, even though netio.sys is the module that is 'seen' on the BSOD.

    After some research I found that the TMG files weren't up to date; in particular, fweng.sys was too old. Manually installing this Update Rollup http://support.microsoft.com/kb/2498770/en-us solved my problem: the UAG is not crashing any more!

  • Tuesday, May 03, 2011 9:51 AM
     
     

    Just received an update from MS support. This issue has been confirmed as a bug. A fix would require both a Windows fix and a TMG fix, so I guess this will take some time.

    As a temporary workaround:

    "netsh tmg set global name=disablendisregistration value=1 persistent"


    Regards, Alfred
  • Tuesday, May 03, 2011 2:40 PM
     
     

    We have implemented this technology at my company.  We have already had two servers BSOD with the NETIO.sys and D1 issue.  Can you give me any more information about the bug fix you are working with MS on?  I would like to engage my MS resources to investigate this as well.

    Thanks,

    Barry

  • Wednesday, May 04, 2011 2:52 PM
     
     

    Hi Barry,

    From my latest contact with MS, I can tell you that there won't be a fix soon, and probably no fix at all apart from the workaround of disabling NDIS registration, because the cause of the issue is so complex. We are currently testing the suggested workaround.

     

     


    Regards, Alfred
  • Thursday, May 05, 2011 3:56 PM
     
     

    Alfred, is there any other information you can give me so that my MS tech, who is currently here on site, can look up the case? He has looked in the bug fix area and a couple of other areas and cannot find this case. I just need something so he can find it and report our issues as well; I'd hate to create a case and start troubleshooting all over again if you already have a working case.

    Thanks,

    Barry

  • Friday, May 06, 2011 12:58 PM
     
     
    Sure, it's case "111031747139416". It is currently assigned to the TMG development team.
    Regards, Alfred
  • Monday, May 09, 2011 1:41 PM
     
     Answered

    Hello,

    Bug check code 0xD1: http://msdn.microsoft.com/fr-fr/library/ff560244(v=VS.85).aspx

    That means a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.
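To connect that description to the dumps posted earlier in the thread, the four Arg values can be labeled directly. A small Python sketch using the first dump's values; the IRQL table covers only levels 0 to 2 and is an illustrative subset, and `decode_d1` is a hypothetical helper, not a debugger API:

```python
# Low IRQL levels on x64 Windows (illustrative subset, not the full table).
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}
ACCESS_TYPES = {0: "read", 1: "write"}
DISPATCH_LEVEL = 2


def decode_d1(arg1, arg2, arg3, arg4):
    """Label the four parameters of bug check 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL)."""
    return {
        "memory_referenced": hex(arg1),
        "irql": IRQL_NAMES.get(arg2, f"IRQL {arg2}"),
        "access": ACCESS_TYPES.get(arg3, f"other ({arg3})"),
        "faulting_instruction": hex(arg4),
        # Page faults cannot be resolved at DISPATCH_LEVEL or above, so touching
        # pageable or invalid memory there ends in this bug check.
        "raised_irql": arg2 >= DISPATCH_LEVEL,
    }


# Arguments from the first dump posted in this thread.
for key, value in decode_d1(0x00000008000000E1, 2, 0, 0xFFFFF88001001BD0).items():
    print(f"{key}: {value}")
```

Read this way, both dumps show a read at DISPATCH_LEVEL of an address that is not valid kernel memory.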

    Please use Microsoft SkyDrive to upload the dump files (C:\Windows\Minidump) and post a link here.

    Start by updating all possible drivers.

    You can also contact Microsoft CSS.

     


    This posting is provided "AS IS" with no warranties or guarantees, and confers no rights.

    Microsoft Student Partner
    Microsoft Certified Professional
    Microsoft Certified Systems Administrator: Security
    Microsoft Certified Systems Engineer: Security
    Microsoft Certified Technology Specialist: Windows Server 2008 Active Directory, Configuration
    Microsoft Certified Technology Specialist: Windows Server 2008 Network Infrastructure, Configuration

    Microsoft Certified Technology Specialist: Windows Server 2008 Applications Infrastructure, Configuration

  • Wednesday, May 11, 2011 11:31 AM
     
     

    Hello,

    We had exactly the same symptoms after installing 2008 R2 SP1 yesterday (three random BSODs in 6 hours). The BSOD pointed to NETIO.sys (in the full dump) and fweng.sys (in the minidump) as the cause. We didn't have Software Update 1 Rollup 3 for TMG installed. Since installing Software Update 1 Rollup 3 for TMG, the server has not crashed. I'm keeping my fingers crossed that the BSOD problem is gone.

    Our setup is:

    - Single server: Windows Server 2008 R2 SP1 with all the latest Windows updates
    - UAG SP1 with KB2475733 hotfix
    - TMG SP1 with Software Update 1 Rollup 3 for SP1


    -Hude-

  • Tuesday, May 24, 2011 4:06 PM
     
     

    "netsh tmg set global name=disablendisregistration value=1 persistent"

     

    How do we reverse this fix?

     

    Thanks

  • Tuesday, May 24, 2011 5:18 PM
     
     

    Have you had any other BSODs since implementing Software Update 1 Rollup 3 for TMG?

    Thanks,

    Barry

  • Tuesday, May 24, 2011 5:18 PM
     
     
    Just curious, did this fix not work for you?  Why are you wanting to reverse it?
  • Tuesday, May 24, 2011 6:23 PM
     
     

    Hi,

    Yes, our BSODs have been gone since installing Update 1 Rollup 3.

    To reverse the disablendisregistration fix, run "netsh tmg set global name=disablendisregistration value=0 persistent".

    To check the current status of this setting, run "netsh tmg show global name=disablendisregistration".
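The revert and the status check pair up naturally. A minimal Python sketch that generates both netsh command lines for a given value, using only the syntax quoted in this post; `disablendisregistration_commands` is a hypothetical helper, and since the output of `show` is not reproduced in this thread, the sketch does not try to parse it:

```python
def disablendisregistration_commands(value):
    """Return the netsh commands that set the flag (1 = workaround on, 0 = off)
    and then display its current value."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    return [
        ["netsh", "tmg", "set", "global", "name=disablendisregistration",
         f"value={value}", "persistent"],
        ["netsh", "tmg", "show", "global", "name=disablendisregistration"],
    ]


# Revert the workaround, then check the setting.
for cmd in disablendisregistration_commands(0):
    print(" ".join(cmd))
```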

     

    I never even tried this disablendisregistration fix because it didn't sound like a real fix.


    -Hude-

  • Wednesday, May 25, 2011 12:39 PM
     
     

    Hi,

    This fix allowed me to boot the server so that I could install Update 1 Rollup 3. From what is stated here, the fix degrades performance, so I would like to reverse it to see whether Rollup 3 fixed the original issue without degrading performance.

    Thanks for the reply...

  • Thursday, May 26, 2011 8:07 AM
     
     

    Initially an MS engineer suggested that implementing the fix reduces network performance, but we now have a confirmed statement from the TMG development team that there is no performance impact when disabling NDIS registration on UAG. According to their explanation, our issue was a known bug and will not be resolved by any fix other than disabling NDIS registration.


    Regards, Alfred
  • Thursday, May 26, 2011 12:56 PM
     
     

    Alfred,

    Thank you for the information. It is nice to be able to find such quick responses from people who have experienced these issues. Interestingly, after installing TMG on Windows Server 2008 (rather than 2008 R2) on another drive in the same server, I have not experienced the same issues.

    I also have unresolved issues with reports that do not appear in the older Server 2008 version.  I was just hoping that these issues would be resolved quickly.

    Thanks again for your feedback.

    Mike

     

     

  • Thursday, May 26, 2011 12:58 PM
     
     

    We have a six-server farm, all Windows Server 2008 R2 on identical hardware, and this issue affected only two of the servers; the other four have never experienced it.

    Barry

  • Monday, June 06, 2011 6:17 PM
     
     

    What service needs to be restarted to make this take effect? 

    Thanks,

    Barry

  • Sunday, June 19, 2011 11:28 AM
     
     
    Apparently you stop and start all the Microsoft Forefront TMG services. Note that this workaround disables the 6to4 interface, so it is no good if you run IPv6-only DirectAccess without UAG, as we do (wrong forum section, I know, but I did not find any other references). MS, any news on a permanent fix?
    BobK ;)
  • Monday, June 20, 2011 11:26 AM
     
     

    The server has now been up for 24+ hours with no apparent problems.

    History:

    Installed SP1 and two further updates.

    After 15 minutes, blue screen 0x000000D1, then again after 10 minutes.

    Removed the latest post-SP1 security update; blue screen again after 15 minutes.

    Removed the second post-SP1 security update; blue screen after 9 hours and again after 1 hour.

    Disabled NDIS registration (as above) and rebooted; this disabled the 6to4 interface and thus DirectAccess.

    Re-enabled NDIS registration and re-ran DirectAccess setup, no reboot.

    Server still up.


    BobK ;)
  • Tuesday, October 04, 2011 4:19 PM
     
     Proposed

    Hello Everyone,

    This is Balint from MS support. Although we have not finished our investigation, I would like to provide some information on this issue.

    If you are running into the blue screen referenced above, please try updating to the latest bits first; as of now, that means applying:

    - Forefront UAG SP1 Rollup 1 KB2475733

    - Forefront TMG 2010 Rollup 4 KB2517957

    If the issue is not fixed, you can try disabling NDIS integration at any time, as mentioned before: "netsh tmg set global name=disablendisregistration value=1 persistent". As others have stated, after this (perhaps only after a reboot) you might see that the 6to4 adapter is disconnected, and you will therefore also get IPsec DoSP errors. This seems to be caused by a boot-time timing issue; the workaround is to schedule an "advanced task" that runs 15 minutes after reboot and restarts the IP Helper service. The bottom line, though, is that the first step (updating to the latest bits) should eliminate the issue, and disabling NDIS integration should only be a last resort.
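The task definition is not spelled out above. One way to create such a task is with `schtasks`, which accepts an `/sc onstart` trigger together with a `/delay` in `mmmm:ss` format. The Python sketch below only builds that command line; the task name is hypothetical, `iphlpsvc` is the IP Helper service name, and restarting it through PowerShell's `Restart-Service -Force` (instead of `net stop`/`net start`) is an assumption of this sketch, chosen to avoid an interactive confirmation prompt. Treat it as untested and create the task from an elevated prompt:

```python
import subprocess

TASK_NAME = "RestartIpHelperAfterBoot"  # hypothetical name; choose your own


def schtasks_delay(minutes, seconds=0):
    """Format a delay the way schtasks /delay expects it (mmmm:ss)."""
    return f"{minutes:04d}:{seconds:02d}"


def create_task_command(delay_minutes=15):
    """Build the schtasks command for a boot-time task that restarts IP Helper."""
    return [
        "schtasks", "/create",
        "/tn", TASK_NAME,
        # Assumed action: restart the IP Helper service (iphlpsvc) without prompting.
        "/tr", "powershell -NoProfile -Command Restart-Service iphlpsvc -Force",
        "/sc", "onstart",
        "/delay", schtasks_delay(delay_minutes),
        "/ru", "SYSTEM",
    ]


# Print the command line as it would be typed at an elevated prompt.
print(subprocess.list2cmdline(create_task_command()))
```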

    • Proposed As Answer by Ronny de Jong Tuesday, October 04, 2011 10:15 PM
  • Tuesday, October 04, 2011 10:17 PM
     
     

    After applying both the Forefront UAG and TMG updates mentioned by Balint, the situation has been stable for at least a day, compared with approximately 100 BSODs over 2 days while the problem was occurring.
    Ronny de Jong | inovativ.nl | Blog: donnystyle.wordpress.com | Twitter: twitter.com/ronnydejong
  • Tuesday, December 06, 2011 3:40 PM
     
     
    If we implement Forefront UAG SP1 Rollup 1 (KB2475733) and Forefront TMG 2010 Rollup 4 (KB2517957), do we have to reverse the disablendisregistration setting ("netsh tmg set global name=disablendisregistration value=1 persistent")? If so, is the value supposed to be 0?
  • Wednesday, February 08, 2012 8:41 PM
     
     
     

    Hello,

    We experienced the same problem today on Windows Server 2008 R2 SP1 with
    UAG 2010 SP1 Update 1
    TMG 2010 SP2

    We kept getting blue screens (BSODs). If we started the machine in Safe Mode and set
    the Microsoft TMG Firewall service to manual, we were able to bring the server online.
    As soon as we started the Microsoft TMG Firewall service, the machine blue-screened.
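Setting a service to manual start from an elevated command prompt is normally done with `sc config <service> start= demand`. A minimal sketch that builds that command, assuming `fwsrv` is the service name of the Microsoft TMG Firewall service (it is the name used with `net start` earlier in this thread); `start_type_command` is a hypothetical helper:

```python
# Start types accepted by "sc config" that matter here (subset).
START_TYPES = {"auto", "demand", "disabled"}


def start_type_command(service, start):
    """Build 'sc config <service> start= <type>' as an argument list."""
    if start not in START_TYPES:
        raise ValueError(f"unsupported start type: {start}")
    # sc expects "start=" and the value as separate tokens (note the space).
    return ["sc", "config", service, "start=", start]


print(" ".join(start_type_command("fwsrv", "demand")))  # set to manual
print(" ".join(start_type_command("fwsrv", "auto")))    # restore automatic start
```

Keeping `start=` and the value as separate list items mirrors sc's own parsing rule, which is the usual stumbling block when typing the command by hand.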

    Thanks to good support from Balint, we tested his workaround, which works.

    But we also discovered a possibly better workaround: upgrading the NIC drivers.


    We had very old drivers for the HP NC532i Dual Port 10GbE NIC, version 5.0.13.0 dated 30.07.2009. Upgrading them to version 6.2.9.0 dated 04.02.2011 resolved our issue.

    We still have to run this in production for a couple of days before I can guarantee that it is stable, but the BSODs disappeared after the NIC driver upgrade.

    Those of you using Balint's great workaround: check whether you have old NIC drivers.

    That might help :)


    Best Regards Anders Horgen

    • Proposed As Answer by Anders Horgen Wednesday, February 08, 2012 8:41 PM
    • Unproposed As Answer by Anders Horgen Thursday, February 09, 2012 11:38 AM
  • Thursday, February 09, 2012 11:39 AM
     
     

    Hello,

    Regarding my post above:
    upgrading the NIC drivers to the latest versions
    unfortunately worked for only 12 hours before the BSODs returned.

    We are now using the suggested workaround:

    netsh tmg set global name=disablendisregistration value=1 persistent


    Best Regards Anders Horgen

  • Thursday, February 09, 2012 1:42 PM
     
     

    I don't believe the above are the latest updates. According to my records (and what I am getting ready to update to), the latest updates are:

    -Forefront UAG SP1 Update 1 KB2585140

    -Forefront TMG 2010 SP2 KB2555840


    • Edited by barmour88 Thursday, February 09, 2012 4:36 PM
  • Thursday, April 12, 2012 8:53 AM
     
     

    I just updated to Windows Server 2008 R2 SP1 and it looks like I have the same issue.

    UAG version: SP1 without rollup

    TMG version: SP2 without rollup

    Should I apply both rollup updates as well?

    At the moment, I am using the workaround provided by Balint_PSS.

    • Proposed As Answer by Michael Joss Thursday, April 12, 2012 3:16 PM
    • Unproposed As Answer by Michael Joss Thursday, April 12, 2012 3:16 PM
  • Thursday, April 12, 2012 3:16 PM
     
     Proposed

    For your information, I applied this fix and I'm not seeing the BSOD any more.

    http://support.microsoft.com/?id=2664888

    Regards,

    Mike

    • Proposed As Answer by Michael Joss Thursday, April 12, 2012 3:16 PM
  • Thursday, April 12, 2012 4:59 PM
     
     

    I applied the following and have not had a BSOD on any of our 5 servers

    TMG 2010 SP2

    Forefront UAG SP1 Update 1

    Rollup 1 for Forefront Unified Access Gateway (UAG) 2010 Service Pack 1 Update 1

    Barry

  • Tuesday, June 05, 2012 7:10 AM
     
     

    Hi there,

    We also have BSODs here, the last one this morning, with the latest TMG and UAG patches and updates. I cannot install http://support.microsoft.com/?id=2664888; it says the update is not applicable to this computer.

    Any ideas? I did not try the workaround yet...

    Marcus

  • Friday, August 10, 2012 1:07 AM
     
     

    Marcus,

    You need to install SP1 and SP1 Update 1 before you can install SP2.

    All,

    Fresh build: Windows Server 2008 R2 on VMware.

    BSOD.

    1. Safe Mode: applied the netsh command as above (reboot).

    2. Normal mode: installed SP1, SP1 Update 1, then SP2.

    3. Then installed the hotfix above and rebooted.

    4. Booted fine. Tried to back out the netsh setting as above (rebooted into a BSOD).

    5. Safe Mode (no networking): put the netsh setting BACK IN. Reboot.

    6. Working now, and I'm leaving the crazy netsh command in there, screw it.


    if my post is helpful - please click on the green arrow. (please excuse, in advance, any perceived sarcasm/humor - as I often forget it does not translate through text) :)

  • Monday, September 17, 2012 8:47 AM
     
     

    Hi all,

    The following article has been released documenting this problem:

    http://support.microsoft.com/kb/2732485

    Thanks

    Balint