Direct Access & SSTP VPN Server (2012 R2) with random BSODs / Blue Screens

  • Question

  • Hey All,

    We have a virtualized (Hyper-V Server 2012 R2) Windows Server 2012 R2 Datacenter machine with GUI that hosts DirectAccess and SSTP VPN.

    The VM randomly throws BSODs every few days.

    I attached two minidumps: https://www.dropbox.com/s/guzppeo2jeayq1c/TPGDA01-010814-16109-01.zip

    I believe the BSODs have occurred since we installed the machine, but I'm not sure. They may have started after the VM was restored: when the VM was restored, the virtual NIC changed and we had to re-assign the Network Location Server's certificate to the new NIC.

    The NIC's (Hyper-V Network Adapter) driver version is 6.3.9600.16384

    Is any additional information needed?

    Things we tried already:

    • Moved the VM to a different Hyper-V host, without success
    • Disabled QoS Packet Scheduler in the properties of the NIC

    Thx a lot!

    Monday, January 20, 2014 1:07 PM

All replies

  • Anyone?

    Here are a few more dumps: https://www.dropbox.com/s/46850gu6s1kz4y6/TPGDA01-020314-22171-01.dmp

    The event log shows a few Schannel errors like this one, 3 minutes before the reboot:

    A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 40. The Windows SChannel error state is 1205.
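
As a minimal sketch, the event text above can be split into its two numeric fields. The alert names are taken from the TLS 1.2 specification (RFC 5246), where alert 40 is `handshake_failure`. The Schannel error state (1205) is an internal value and is left undecoded here. The function name `parse_schannel_event` is hypothetical.

```python
import re

# Subset of the TLS alert descriptions defined in RFC 5246, section 7.2.
TLS_ALERTS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    46: "certificate_unknown",
    48: "unknown_ca",
    70: "protocol_version",
    80: "internal_error",
}

def parse_schannel_event(message):
    """Return (alert_code, alert_name, schannel_state) from the event text."""
    alert = re.search(r"fatal error code is (\d+)", message)
    state = re.search(r"SChannel error state is (\d+)", message)
    if alert is None or state is None:
        raise ValueError("unrecognized Schannel event text")
    code = int(alert.group(1))
    return code, TLS_ALERTS.get(code, "unknown"), int(state.group(1))

EVENT = (
    "A fatal alert was generated and sent to the remote endpoint. "
    "This may result in termination of the connection. "
    "The TLS protocol defined fatal error code is 40. "
    "The Windows SChannel error state is 1205."
)

print(parse_schannel_event(EVENT))
```

A `handshake_failure` alert only says that a TLS handshake with some client was rejected; on its own it does not show a link to the crash.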

    This is the debugger output for the minidump:

    Loading Dump File [C:\Users\adm_floriano\Desktop\Minidump\TPGDA01-020314-22171-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available
    
    Symbol search path is: srv*http://msdl.microsoft.com/download/symbols
    Executable search path is: 
    Windows 8 Kernel Version 9600 MP (2 procs) Free x64
    Product: Server, suite: TerminalServer DataCenter SingleUserTS
    Built by: 9600.16452.amd64fre.winblue_gdr.131030-1505
    Machine Name:
    Kernel base = 0xfffff801`aac74000 PsLoadedModuleList = 0xfffff801`aaf38990
    Debug session time: Mon Feb  3 20:22:59.224 2014 (UTC + 1:00)
    System Uptime: 15 days 16:53:01.514
    Loading Kernel Symbols
    ...............................................................
    ................................................................
    ........
    Loading User Symbols
    Loading unloaded module list
    ....
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    Use !analyze -v to get detailed debugging information.
    
    BugCheck D1, {10, 2, 0, fffff800011338b0}
    
    Probably caused by : wfplwfs.sys ( wfplwfs!WfpRioNblSendComplete+28 )
    
    Followup: MachineOwner
    ---------
    
    1: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high.  This is usually
    caused by drivers using improper addresses.
    If kernel debugger is available get stack backtrace.
    Arguments:
    Arg1: 0000000000000010, memory referenced
    Arg2: 0000000000000002, IRQL
    Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
    Arg4: fffff800011338b0, address which referenced memory
    
    Debugging Details:
    ------------------
    
    
    READ_ADDRESS: fffff801aaf26340: Unable to get special pool info
    fffff801aaf26340: Unable to get special pool info
    GetUlongFromAddress: unable to read from fffff801aafc1208
     0000000000000010 
    
    CURRENT_IRQL:  2
    
    FAULTING_IP: 
    wfplwfs!WfpRioNblSendComplete+28
    fffff800`011338b0 488b4910        mov     rcx,qword ptr [rcx+10h]
    
    CUSTOMER_CRASH_COUNT:  1
    
    DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT_SERVER
    
    BUGCHECK_STR:  AV
    
    PROCESS_NAME:  System
    
    TRAP_FRAME:  ffffd000207fefa0 -- (.trap 0xffffd000207fefa0)
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed or incorrect.
    rax=ffffd000207ff198 rbx=0000000000000000 rcx=0000000000000000
    rdx=ffffe00003643020 rsi=0000000000000000 rdi=0000000000000000
    rip=fffff800011338b0 rsp=ffffd000207ff130 rbp=0000000000000000
     r8=0000000000000000  r9=ffffe000009a21a0 r10=ffffe000007f9600
    r11=ffffe00003644226 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei pl zr na po nc
    wfplwfs!WfpRioNblSendComplete+0x28:
    fffff800`011338b0 488b4910        mov     rcx,qword ptr [rcx+10h] ds:00000000`00000010=????????????????
    Resetting default scope
    
    LAST_CONTROL_TRANSFER:  from fffff801aadcd7e9 to fffff801aadc1ca0
    
    STACK_TEXT:  
    ffffd000`207fee58 fffff801`aadcd7e9 : 00000000`0000000a 00000000`00000010 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
    ffffd000`207fee60 fffff801`aadcc03a : 00000000`00000000 00000000`00000000 ffffd000`00000000 ffffd000`207fefa0 : nt!KiBugCheckDispatch+0x69
    ffffd000`207fefa0 fffff800`011338b0 : ffffe000`009bce20 ffffe000`00ae71c0 ffffe000`00a96020 ffffe000`00a9a310 : nt!KiPageFault+0x23a
    ffffd000`207ff130 fffff800`017dce6b : 00000000`00000000 00000000`00000002 00000000`0007c000 fffff800`00d5de0c : wfplwfs!WfpRioNblSendComplete+0x28
    ffffd000`207ff1a0 fffff800`00677120 : ffffe000`03643020 ffffe000`009a21a0 fffff800`01011370 ffffe000`00878d70 : rassstp! ?? ::KNDHLJHB::`string'+0x2b97
    ffffd000`207ff1d0 fffff800`0067709d : 00000000`00000001 ffffe000`00ae7001 00000000`00000000 00000000`00000001 : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
    ffffd000`207ff260 fffff800`00676f42 : ffffe000`009a21a0 00000000`00000000 ffffe000`03643020 ffffe000`01142b80 : NDIS!ndisReturnNetBufferListsInternal+0x12d
    ffffd000`207ff2c0 fffff800`0045f649 : 00000000`00000000 ffffe000`03751390 ffffe000`03643020 ffffe000`00991910 : NDIS!NdisReturnNetBufferLists+0x72
    ffffd000`207ff320 fffff800`00677120 : ffffe000`02917030 00000000`00000002 ffffe000`009a41a0 ffffe000`009a41a0 : ndiswan!MPReturnNetBufferListChain+0x109
    ffffd000`207ff370 fffff800`0067709d : 00000000`00000000 ffffe000`00a9b418 ffffe000`009bce20 ffffe000`00a9a020 : NDIS!ndisInvokeNextReceiveCompleteHandler+0x30
    ffffd000`207ff400 fffff800`00676f42 : ffffe000`009a41a0 00000000`00000000 ffffe000`02917030 ffffe000`01143590 : NDIS!ndisReturnNetBufferListsInternal+0x12d
    ffffd000`207ff460 fffff800`0167c0a0 : ffffe000`03d4a310 00000000`00000000 00000000`00000000 00000000`00000000 : NDIS!NdisReturnNetBufferLists+0x72
    ffffd000`207ff4c0 fffff800`00555501 : 00000000`00000000 00000000`00000000 ffffe000`02917030 00000000`00000000 : wanarp!FlpReturnNetBufferListChain+0x90
    ffffd000`207ff4f0 fffff800`005523e7 : 00000000`00000000 00000000`00000000 00000000`00000000 ffffe000`02917030 : NETIO!NetioDereferenceNetBufferList+0xb1
    ffffd000`207ff560 fffff800`00e54d28 : 00000000`00000000 ffffe000`023f0500 00000000`00000000 00000000`00000000 : NETIO!NetioDereferenceNetBufferListChain+0x2a7
    ffffd000`207ff620 fffff800`00eea4b9 : ffffe000`03dabc38 ffffe000`023f0590 ffffe000`03980540 ffffe000`00860010 : tcpip!IppCompleteAndFreePacketList+0xf8
    ffffd000`207ff6b0 fffff800`00ee5967 : 00000000`00000000 ffffd000`207ff810 00000000`002967db 00000000`00000001 : tcpip!IppCleanupMfe+0x39
    ffffd000`207ff6e0 fffff800`00e92372 : 00000000`00000001 ffffd000`207ff810 00000000`002967db ffffe000`00860010 : tcpip!IppDereferenceMfe+0x1f
    ffffd000`207ff710 fffff801`aad21f64 : ffffd000`207ffc60 fffff800`0101b880 ffffd000`207d5180 00000000`00000000 : tcpip!IppTimeout+0x96c
    ffffd000`207ffa00 fffff801`aad22478 : 00000000`00000001 ffffe000`00a81328 ffffd000`207d5180 ffffd000`207d8ae0 : nt!KiProcessExpiredTimerList+0x1d8
    ffffd000`207ffae0 fffff801`aad7e478 : ffffd000`207d5180 00000000`00989680 ffffd000`0315c033 00000000`0315c04b : nt!KiExpireTimerTable+0x218
    ffffd000`207ffb80 fffff801`aacceabc : ffffe000`00000000 00001f80`00000001 00000000`00000000 00000000`00000002 : nt!KiTimerExpiration+0x148
    ffffd000`207ffc30 fffff801`aadc57ea : ffffd000`207d5180 ffffd000`207d5180 00000000`00000000 ffffd000`207e11c0 : nt!KiRetireDpcList+0x19c
    ffffd000`207ffda0 00000000`00000000 : ffffd000`20800000 ffffd000`207fa000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a
    
    
    STACK_COMMAND:  kb
    
    FOLLOWUP_IP: 
    wfplwfs!WfpRioNblSendComplete+28
    fffff800`011338b0 488b4910        mov     rcx,qword ptr [rcx+10h]
    
    SYMBOL_STACK_INDEX:  3
    
    SYMBOL_NAME:  wfplwfs!WfpRioNblSendComplete+28
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: wfplwfs
    
    IMAGE_NAME:  wfplwfs.sys
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  5259df5e
    
    BUCKET_ID_FUNC_OFFSET:  28
    
    FAILURE_BUCKET_ID:  AV_wfplwfs!WfpRioNblSendComplete
    
    BUCKET_ID:  AV_wfplwfs!WfpRioNblSendComplete
    
    Followup: MachineOwner
    ---------
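
The numbers in the analysis above can be cross-checked with a short sketch. It parses the `BugCheck` summary line and compares Arg1 with the effective address of the faulting instruction (`mov rcx,qword ptr [rcx+10h]`, with `rcx=0` in the trap frame). The helper names are hypothetical, and the reading of the result as a NULL pointer dereference is an interpretation, not something the dump states.

```python
import re

def parse_bugcheck(line):
    """Parse a WinDbg summary line such as 'BugCheck D1, {10, 2, 0, ...}'."""
    m = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args

def effective_address(base, displacement):
    """Address read by an instruction of the form [base + displacement]."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF

code, args = parse_bugcheck("BugCheck D1, {10, 2, 0, fffff800011338b0}")
referenced, irql, access, faulting_ip = args

# Values copied from the trap frame and the faulting instruction above.
rcx = 0x0
displacement = 0x10

print("bug check code : 0x%X" % code)
print("memory referenced: 0x%X at IRQL %d (%s)"
      % (referenced, irql, "read" if access == 0 else "write"))
print("faulting IP    : 0x%X" % faulting_ip)
print("rcx + 0x10 matches Arg1:",
      effective_address(rcx, displacement) == referenced)
```

Since the base register is zero and the referenced address equals the displacement, the crash looks like a read through a NULL pointer inside `wfplwfs!WfpRioNblSendComplete`.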
    

    Tuesday, February 4, 2014 12:05 AM
  • UPDATE:

    I have observed that the blue screen occurs when many SSTP VPN clients disconnect or are disconnected from the admin console.

    Any ideas?

    Wednesday, April 2, 2014 1:33 PM
  • Any Microsoft employees?
    Monday, April 7, 2014 8:36 AM
  • Still no update?
    Tuesday, June 10, 2014 11:27 AM
  • Having the exact same issue (except not virtualizing).

    There typically aren't very many simultaneous VPN connections (2-3 at most). The crashes generally happen at night, between 10 PM and 6 AM, when there is not much use, although there may be some idle connections that users haven't terminated.

    At most, it's been about a week without a crash; sometimes it happens within 24 hours.

    Windows Server 2012 R2 Essentials is running on an HP DL320e server with Broadcom NICs.



    • Edited by Murst Tuesday, June 17, 2014 2:50 PM
    Tuesday, June 17, 2014 2:46 PM
  • The BSODs happen randomly here:

    Date Entity Message
    2014-06-12 08:08:44 System Device rebooted after 2 days, 21m 4s
    2014-06-12 08:08:44 System Device status changed to Up
    2014-06-12 08:04:09 System Device status changed to Down (ping)
    2014-06-10 07:42:58 System Device rebooted after 26 days, 17h 42m 56s
    2014-06-09 02:33:10 System Device status changed to Up
    2014-06-09 02:28:14 System Device status changed to Down (ping)
    2014-05-27 02:48:18 System Device status changed to Up
    2014-05-27 02:43:22 System Device status changed to Down (ping)
    2014-05-25 02:13:17 System Device status changed to Up
    2014-05-25 02:08:31 System Device status changed to Down (ping)
    2014-05-22 10:08:46 System Device status changed to Up
    2014-05-21 13:46:29 System Device status changed to Down (ping)
    2014-05-14 13:58:43 System Device rebooted after 5h 11m 37s
    2014-05-14 08:43:24 System Device rebooted after 17h 48m 15s
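
A rough sketch of how the monitoring entries above could be turned into outage durations. It assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that each "Down" entry is closed by the next "Up" entry; the function names are hypothetical.

```python
from datetime import datetime

LOG = """\
2014-06-12 08:08:44 System Device rebooted after 2 days, 21m 4s
2014-06-12 08:08:44 System Device status changed to Up
2014-06-12 08:04:09 System Device status changed to Down (ping)
2014-06-10 07:42:58 System Device rebooted after 26 days, 17h 42m 56s
2014-06-09 02:33:10 System Device status changed to Up
2014-06-09 02:28:14 System Device status changed to Down (ping)
2014-05-27 02:48:18 System Device status changed to Up
2014-05-27 02:43:22 System Device status changed to Down (ping)
2014-05-25 02:13:17 System Device status changed to Up
2014-05-25 02:08:31 System Device status changed to Down (ping)
2014-05-22 10:08:46 System Device status changed to Up
2014-05-21 13:46:29 System Device status changed to Down (ping)
2014-05-14 13:58:43 System Device rebooted after 5h 11m 37s
2014-05-14 08:43:24 System Device rebooted after 17h 48m 15s
"""

def parse_events(text):
    """Return (timestamp, message) tuples sorted oldest first."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, message = line[:19], line[20:]
        events.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message))
    return sorted(events)

def outages(events):
    """Pair each 'Down' entry with the next 'Up' entry."""
    result = []
    down_at = None
    for when, message in events:
        if "changed to Down" in message:
            down_at = when
        elif "changed to Up" in message and down_at is not None:
            result.append((down_at, when - down_at))
            down_at = None
    return result

for down_at, duration in outages(parse_events(LOG)):
    print(down_at, "down for", duration)
```

Listing the hour of each "Down" entry alongside the duration makes it easier to see whether the crashes cluster at a particular time of day.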

    Since the last BSOD I have turned off hardware acceleration and IPsec offload in Hyper-V for the virtual NIC. We will see what that brings.

    Host information:

    Windows Server 2012 R2 Datacenter

    HP ProLiant ML110 G6

    Intel Xeon X3430

    Windows NIC Team (1x HP NC107i, 1x Intel Pro 1000 PT, not connected at the moment)

    Virtual switch for VMs with management OS allowed

    Tuesday, June 17, 2014 3:00 PM
  • Forget about virtual machine queue: a BSOD happened just a minute ago. Is there a chance to open a call and do a remote session? Is any MS employee watching these forums?

    Tuesday, June 17, 2014 3:09 PM
  • I moved the VM to a completely different hardware/Hyper-V host. We will see :)
    Wednesday, June 18, 2014 10:34 AM
  • The host hardware move didn't change anything. Bluescreened 2 days ago.
    Monday, June 23, 2014 8:36 PM
  • The networking card was replaced in our HP server with an Intel i210 card. Since the errors seemed to be related to networking, the hope was that a different card would fix the problem.

    However, that did not fix the problem as the system just crashed. This seems to be a software bug with Anywhere Access / VPN.

    Friday, June 27, 2014 6:09 PM
  • I agree, a complete hardware change of the Hyper-V host didn't help either.
    Saturday, June 28, 2014 7:16 AM
  • I'm suffering from exactly the same issue as you. Have you managed to resolve it? I don't have DA set up; it's just an SSTP VPN server on R2, virtualised on R2 as well, so the situation is otherwise identical. Please let me know if you succeeded in resolving the problem.
    Tuesday, August 19, 2014 1:42 PM
  • Hey, no, still the same issue!

    Any ideas on this, Microsoft?

    Tuesday, August 19, 2014 2:09 PM
  • Same error here.

    A 2012 R2 Essentials VM with SSTP VPN configured. I do have NIC Teaming.

    A couple of extra lines in the output, but otherwise the same.

    STACK_COMMAND:  kb

    FOLLOWUP_IP: 
    wfplwfs!WfpRioNblSendComplete+28
    fffff801`73cf0838 488b4910        mov     rcx,qword ptr [rcx+10h]

    SYMBOL_STACK_INDEX:  3

    SYMBOL_NAME:  wfplwfs!WfpRioNblSendComplete+28

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: wfplwfs

    IMAGE_NAME:  wfplwfs.sys

    DEBUG_FLR_IMAGE_TIMESTAMP:  531ae165

    BUCKET_ID_FUNC_OFFSET:  28

    FAILURE_BUCKET_ID:  AV_wfplwfs!WfpRioNblSendComplete

    BUCKET_ID:  AV_wfplwfs!WfpRioNblSendComplete

    ANALYSIS_SOURCE:  KM

    FAILURE_ID_HASH_STRING:  km:av_wfplwfs!wfprionblsendcomplete

    FAILURE_ID_HASH:  {acd39db7-d991-0a63-cc4b-f64ce58b7083}

    Followup: MachineOwner

    ---------
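
The two `DEBUG_FLR_IMAGE_TIMESTAMP` values in this thread (`5259df5e` in the first dump, `531ae165` here) can be compared with a small sketch. It assumes the value is the driver's PE header timestamp stored as hexadecimal seconds since the Unix epoch, which is how WinDbg normally reports it for builds of this era; the function name is hypothetical.

```python
from datetime import datetime, timezone

def image_timestamp_to_utc(hex_value):
    """Interpret a hex DEBUG_FLR_IMAGE_TIMESTAMP as seconds since 1970 (UTC)."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)

# Values copied from the two dumps in this thread.
for label, value in (("first dump", "5259df5e"), ("this dump", "531ae165")):
    print(label, value, image_timestamp_to_utc(value).isoformat())
```

If that assumption holds, the two dumps come from different builds of `wfplwfs.sys`, and the newer build still fails in the same function.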



    • Edited by Jim Forde Sunday, September 14, 2014 12:47 AM
    Sunday, September 14, 2014 12:45 AM
  • Same problem here. 2012R2, SSTP VPN, virtualized on VMware ESXi 5.5 Update 1.


    Microsoft (R) Windows Debugger Version 6.3.9600.17237 AMD64
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [D:\Temp\MEMORY.DMP]
    Kernel Bitmap Dump File: Only kernel address space is available


    ************* Symbol Path validation summary **************
    Response                         Time (ms)     Location
    Deferred                                       http://msdl.microsoft.com/download/symbols
    Symbol search path is: http://msdl.microsoft.com/download/symbols
    Executable search path is: 
    Windows 8 Kernel Version 9600 UP Free x64
    Product: Server, suite: TerminalServer SingleUserTS
    Built by: 9600.17238.amd64fre.winblue_gdr.140723-2018
    Machine Name:
    Kernel base = 0xfffff800`9aa80000 PsLoadedModuleList = 0xfffff800`9ad4a350
    Debug session time: Mon Sep 29 09:09:15.205 2014 (UTC + 2:00)
    System Uptime: 0 days 11:22:29.000
    Loading Kernel Symbols
    ...............................................................
    ................................................................
    ......
    Loading User Symbols

    Loading unloaded module list
    ....
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck 1E, {ffffffffc0000005, fffff80106577838, 0, ffffffffffffffff}

    Probably caused by : wfplwfs.sys ( wfplwfs!WfpRioNblSendComplete+28 )

    Followup: MachineOwner
    ---------
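
This dump reports bug check 1E rather than D1, but the first argument can be decoded in a few lines. The sketch assumes the documented layout of bug check 0x1E, where parameter 1 is the exception code (sign-extended to 64 bits in the output) and parameter 2 is the address at which the exception occurred. The function name and the one-entry status table are hypothetical.

```python
import re

# Only the status code seen in this thread; not a complete table.
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

def decode_bugcheck_1e(line):
    """Decode the arguments of a 'BugCheck 1E, {...}' summary line."""
    m = re.search(r"BugCheck\s+1E,\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck 1E summary line: %r" % line)
    args = [int(a.strip(), 16) for a in m.group(1).split(",")]
    status = args[0] & 0xFFFFFFFF  # drop the sign extension
    return {
        "status": status,
        "status_name": NTSTATUS_NAMES.get(status, "unknown"),
        "exception_address": args[1],
        "exception_parameters": (args[2], args[3]),
    }

info = decode_bugcheck_1e(
    "BugCheck 1E, {ffffffffc0000005, fffff80106577838, 0, ffffffffffffffff}"
)
print("0x%08X %s at 0x%X"
      % (info["status"], info["status_name"], info["exception_address"]))
```

Both variants are access violations attributed to `wfplwfs!WfpRioNblSendComplete+28`, which suggests the same underlying fault surfacing under a different bug check code.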

    Monday, September 29, 2014 9:26 AM
  • We gave up on waiting for MS to solve this problem. The server is too critical to let it crash randomly like this. We switched to SonicWALL VPN. The server hasn't crashed since (over 2 weeks now).

    Anywhere Access is still enabled, but since no one is actually using VPN on the server, the server doesn't seem to crash.



    • Edited by Murst Monday, October 13, 2014 2:02 PM
    Monday, October 13, 2014 1:58 PM
  • Hi, Microsoft is full of surprises, indeed. I had to downgrade to Windows Server 2012 and, voilà, not a single BSOD for a month. They changed something in the network stack of the R2 edition and made it worse. Checked: WS2012 RRAS works flawlessly.
    • Proposed as answer by chatski Monday, November 17, 2014 8:19 AM
    Monday, November 17, 2014 8:19 AM
  • Good to know, but not an acceptable solution for us.
    Monday, November 17, 2014 10:28 AM
  • Same for me. Server 2012 Essentials R2, ESXi 5.5 Update 2, SSTP VPN.

    STACK_COMMAND:  kb

    FOLLOWUP_IP: 
    wfplwfs!WfpRioNblSendComplete+28
    fffff801`fc93d838 488b4910        mov     rcx,qword ptr [rcx+10h]

    SYMBOL_STACK_INDEX:  7

    SYMBOL_NAME:  wfplwfs!WfpRioNblSendComplete+28

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: wfplwfs

    IMAGE_NAME:  wfplwfs.sys

    DEBUG_FLR_IMAGE_TIMESTAMP:  531ae165

    BUCKET_ID_FUNC_OFFSET:  28

    FAILURE_BUCKET_ID:  0x1E_c0000005_R_wfplwfs!WfpRioNblSendComplete

    BUCKET_ID:  0x1E_c0000005_R_wfplwfs!WfpRioNblSendComplete

    Followup: MachineOwner

    Monday, November 17, 2014 11:04 AM
  • I opened a case with Microsoft. I'll let you know about every step I go through with them ;)
    Tuesday, November 25, 2014 10:42 AM
  • We're still putting up with random reboots... Did you make any headway with Microsoft?
    Thursday, December 4, 2014 3:12 PM
  • Yes: no blue screen for 7 days, but that says nothing, since we have had periods of more than 14 days of uptime.

    Microsoft says the latest November update rollup contains a fix:

    Thursday, December 4, 2014 3:15 PM
  • Thanks, I'll give this a try. If I get more than one week's uptime I'd be confident that it has worked.
    Friday, December 5, 2014 12:36 PM
  • Still no BSOD, so it seems to be fixed. What's your status?
    Thursday, December 18, 2014 4:00 PM