Unwanted reboots

  • Question

  • Hello everyone

    We have a Windows Server 2003 SP2 machine with 4 GB of RAM that reboots once a day, for unknown reasons, between 15:40 and 20:00. After ruling out software, the UPS, and power cuts as the cause of the shutdown, we looked for other explanations.

    We checked that the c:\windows\minidump folder contained no dump files, so we suspected a hardware problem. The server's memory had been expanded four months earlier, so we decided to remove the modules and test. That did not help, so we tried the other, original modules without changing the banks; the reboots did not go away either.

    Finally, on subsequent reboots the system did start generating dump files, which we have analyzed:

    The first one points to a file belonging to a network backup product:

    Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\WINDOWS\Minidump\Mini022511-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available

    Symbol search path is: SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols
    Executable search path is:
    Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
    Product: LanManNt, suite: TerminalServer
    Built by: 3790.srv03_sp2_gdr.101019-0340
    Machine Name:
    Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
    Debug session time: Fri Feb 25 16:50:30.328 2011 (GMT+1)
    System Uptime: 1 days 2:29:45.312
    Loading Kernel Symbols
    ...............................................................
    ..................................................
    Loading User Symbols
    Loading unloaded module list
    ...................
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {a050052, d000001b, 0, 8082e1c2}

    Unable to load image \SystemRoot\system32\DRIVERS\lassobackup.sys, Win32 error 0n2
    *** WARNING: Unable to verify timestamp for lassobackup.sys
    *** ERROR: Module load completed but symbols could not be loaded for lassobackup.sys
    Probably caused by : lassobackup.sys ( lassobackup+6354 )

    Followup: MachineOwner
    ---------

    1: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high.  This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 0a050052, memory referenced
    Arg2: d000001b, IRQL
    Arg3: 00000000, bitfield :
        bit 0 : value 0 = read operation, 1 = write operation
        bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
    Arg4: 8082e1c2, address which referenced memory

    Debugging Details:
    ------------------


    READ_ADDRESS:  0a050052

    CURRENT_IRQL:  1b

    FAULTING_IP:
    nt!KiInsertQueueApc+7c
    8082e1c2 8b4104          mov     eax,dword ptr [ecx+4]

    CUSTOMER_CRASH_COUNT:  1

    DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

    BUGCHECK_STR:  0xA

    PROCESS_NAME:  SVFwdServer.exe

    TRAP_FRAME:  f4900630 -- (.trap 0xfffffffff4900630)
    ErrCode = 00000000
    eax=00000000 ebx=00000000 ecx=0a05004e edx=00000001 esi=89cbed18 edi=8848c230
    eip=8082e1c2 esp=f49006a4 ebp=f49006bc iopl=0         nv up ei pl zr na pe nc
    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
    nt!KiInsertQueueApc+0x7c:
    8082e1c2 8b4104          mov     eax,dword ptr [ecx+4] ds:0023:0a050052=????????
    Resetting default scope

    LAST_CONTROL_TRANSFER:  from 8082e1c2 to 8088c9eb

    STACK_TEXT: 
    f4900630 8082e1c2 badb0d00 00000001 8a165ec8 nt!KiTrap0E+0x2a7
    f49006bc 8082ab87 8848c230 8848c1f0 00000000 nt!KiInsertQueueApc+0x7c
    f49006dc 8081e318 8848c230 88604910 00000000 nt!KeInsertQueueApc+0x43
    f4900710 f717c1dc 00000000 e1690380 f4900804 nt!IopfCompleteRequest+0x2c2
    f4900720 f71fdca8 00000000 8848c1f0 00000000 Ntfs!NtfsCompleteRequest+0xc8
    f4900804 f71fe9b6 f4900ac0 8848c1f0 00000000 Ntfs!NtfsReadUsnJournal+0x620
    f4900848 f71fd194 f4900ac0 e1690380 00000000 Ntfs!NtOfsPostNewLength+0x9d
    f4900864 f71d0301 f4900ac0 00000000 8a169100 Ntfs!NtfsProcessNewLengthQueue+0x23
    f4900894 f71c1f77 f4900ac0 e2987be0 e2987ca8 Ntfs!NtfsCommitCurrentTransaction+0x241
    f49008a8 f71ca905 f4900ac0 00000000 f4900bc8 Ntfs!NtfsCheckpointCurrentTransaction+0x21
    f4900aa4 f71be8d9 f4900ac0 8833e008 8a16ace0 Ntfs!NtfsCommonCleanup+0x220a
    f4900c14 8081df85 8a169020 8833e008 8a319860 Ntfs!NtfsFsdCleanup+0xcf
    f4900c28 f7235c45 8a319860 883c7118 0000000b nt!IofCallDriver+0x45
    f4900c50 8081df85 8a16ace0 8833e008 89dfa2d0 fltmgr!FltpDispatch+0x6f
    f4900c64 f393e354 8081df85 89caf020 8833e008 nt!IofCallDriver+0x45
    WARNING: Stack unwind information not available. Following frames may be wrong.
    f4900c7c 808f980e 883c7100 8a3f0040 883c7118 lassobackup+0x6354
    f4900cac 80934caa 885858e8 89caf020 0012019f nt!IopCloseFile+0x2ae
    f4900cdc 809345ab 885858e8 00000001 8a3f0040 nt!ObpDecrementHandleCount+0xcc
    f4900d04 80934644 e10fdd40 883c7118 0000070c nt!ObpCloseHandleTableEntry+0x131
    f4900d48 80934761 0000070c 00000001 f4900d64 nt!ObpCloseHandle+0x82
    f4900d58 808897ec 0000070c 0012f300 7c82847c nt!NtClose+0x1b
    f4900d58 7c82847c 0000070c 0012f300 7c82847c nt!KiFastCallEntry+0xfc
    0012f300 00000000 00000000 00000000 00000000 0x7c82847c


    STACK_COMMAND:  kb

    FOLLOWUP_IP:
    lassobackup+6354
    f393e354 ??              ???

    SYMBOL_STACK_INDEX:  f

    SYMBOL_NAME:  lassobackup+6354

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: lassobackup

    IMAGE_NAME:  lassobackup.sys

    DEBUG_FLR_IMAGE_TIMESTAMP:  4511bb90

    FAILURE_BUCKET_ID:  0xA_lassobackup+6354

    BUCKET_ID:  0xA_lassobackup+6354

    Followup: MachineOwner
    ---------
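
The parameters of this bugcheck can be cross-checked by hand against the trap frame. The sketch below is only an illustration: the function names `parse_bugcheck` and `describe_irql_fault` are hypothetical, the field meanings are taken from the `!analyze -v` text above, and treating the low byte of Arg2 as the IRQL is an assumption based on the `CURRENT_IRQL:  1b` line in the same dump.

```python
import re

def parse_bugcheck(line):
    """Parse a 'BugCheck CODE, {a1, a2, a3, a4}' summary line into integers."""
    m = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck summary line")
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args

def describe_irql_fault(args):
    """Interpret the four 0xA arguments as labelled by !analyze -v."""
    referenced, irql_raw, access, ip = args
    return {
        "memory_referenced": referenced,
        # Assumption: the dump prints CURRENT_IRQL 1b, matching the low byte.
        "irql_low_byte": irql_raw & 0xFF,
        "operation": "write" if access & 0x1 else "read",  # bit 0
        "execute": bool(access & 0x8),                     # bit 3
        "faulting_ip": ip,
    }

code, args = parse_bugcheck("BugCheck A, {a050052, d000001b, 0, 8082e1c2}")
info = describe_irql_fault(args)

# The faulting instruction is 'mov eax,dword ptr [ecx+4]' with ecx=0a05004e,
# so the address it touched should equal Arg1.
ecx = 0x0A05004E
print(hex(code), info["operation"], hex(ecx + 4) == hex(info["memory_referenced"]))
```

The same arithmetic applies to any dump with this bugcheck code: add the displacement in the faulting instruction to the register value in the trap frame and compare the result with Arg1.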

    The second one points to the network card driver file:

    Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\WINDOWS\Minidump\Mini022711-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available

    Symbol search path is: SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols
    Executable search path is:
    Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
    Product: LanManNt, suite: TerminalServer
    Built by: 3790.srv03_sp2_gdr.101019-0340
    Machine Name:
    Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
    Debug session time: Sun Feb 27 16:43:29.046 2011 (GMT+1)
    System Uptime: 0 days 11:41:52.656
    Loading Kernel Symbols
    ...............................................................
    ...................................................
    Loading User Symbols
    Loading unloaded module list
    .........
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {104, d000001b, 0, 8082e1c2}

    Unable to load image \SystemRoot\system32\DRIVERS\b57xp32.sys, Win32 error 0n2
    *** WARNING: Unable to verify timestamp for b57xp32.sys
    *** ERROR: Module load completed but symbols could not be loaded for b57xp32.sys
    Probably caused by : b57xp32.sys ( b57xp32+5faa )

    Followup: MachineOwner
    ---------

    3: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high.  This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 00000104, memory referenced
    Arg2: d000001b, IRQL
    Arg3: 00000000, bitfield :
        bit 0 : value 0 = read operation, 1 = write operation
        bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
    Arg4: 8082e1c2, address which referenced memory

    Debugging Details:
    ------------------


    READ_ADDRESS:  00000104

    CURRENT_IRQL:  1b

    FAULTING_IP:
    nt!KiInsertQueueApc+7c
    8082e1c2 8b4104          mov     eax,dword ptr [ecx+4]

    CUSTOMER_CRASH_COUNT:  1

    DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

    BUGCHECK_STR:  0xA

    PROCESS_NAME:  Idle

    TRAP_FRAME:  f78c2588 -- (.trap 0xfffffffff78c2588)
    ErrCode = 00000000
    eax=00000000 ebx=00000000 ecx=00000100 edx=00000002 esi=88ed3b40 edi=88494e40
    eip=8082e1c2 esp=f78c25fc ebp=f78c2614 iopl=0         nv up ei pl zr na pe nc
    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
    nt!KiInsertQueueApc+0x7c:
    8082e1c2 8b4104          mov     eax,dword ptr [ecx+4] ds:0023:00000104=????????
    Resetting default scope

    LAST_CONTROL_TRANSFER:  from 8082e1c2 to 8088c9eb

    STACK_TEXT: 
    f78c2588 8082e1c2 badb0d00 00000002 c07ac498 nt!KiTrap0E+0x2a7
    f78c2614 8082ab87 88494e40 88494e00 00000000 nt!KiInsertQueueApc+0x7c
    f78c2634 8081e318 88494e40 89c2ab18 00000000 nt!KeInsertQueueApc+0x43
    f78c2668 f33f8a59 f78c2768 88edf848 00000000 nt!IopfCompleteRequest+0x2c2
    f78c2690 f3e144b1 88ef0008 f3e144b1 88edf848 afd!AfdReceiveDatagramEventHandler+0x204
    f78c2734 f3e08862 88ef0008 02fda8c0 0000d008 tcpip!UDPDeliver+0x1be
    f78c278c f3e0d101 89e71b68 0d17a8c0 02fda8c0 tcpip!UDPRcv+0x164
    f78c27ec f3e0b326 00000020 89e71b68 f3e14105 tcpip!DeliverToUser+0x189
    f78c287c f3e0b54c 89e71b68 8975222c 00000039 tcpip!IPRcvPacket+0x686
    f78c28bc f3e0b3df 00000000 89fa6eb0 8975220a tcpip!ARPRcvIndicationNew+0x167
    f78c2914 f7166208 89c3c008 89fa6eb0 00000000 tcpip!ARPRcvPacket+0x2f9
    f78c2968 f5791faa 8a0d3130 f78c2b18 00000001 NDIS!ethFilterDprIndicateReceivePacket+0x385
    WARNING: Stack unwind information not available. Following frames may be wrong.
    f78c2cb0 f57a4f69 00000001 0000003e 89b58000 b57xp32+0x5faa
    00000000 00000000 00000000 00000000 00000000 b57xp32+0x18f69


    STACK_COMMAND:  kb

    FOLLOWUP_IP:
    b57xp32+5faa
    f5791faa ??              ???

    SYMBOL_STACK_INDEX:  c

    SYMBOL_NAME:  b57xp32+5faa

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: b57xp32

    IMAGE_NAME:  b57xp32.sys

    DEBUG_FLR_IMAGE_TIMESTAMP:  422de9f1

    FAILURE_BUCKET_ID:  0xA_b57xp32+5faa

    BUCKET_ID:  0xA_b57xp32+5faa

    Followup: MachineOwner
    ---------

    3: kd> lmvm b57xp32
    start    end        module name
    f578c000 f57ab580   b57xp32  T (no symbols)          
        Loaded symbol image file: b57xp32.sys
        Image path: \SystemRoot\system32\DRIVERS\b57xp32.sys
        Image name: b57xp32.sys
        Timestamp:        Tue Mar 08 19:07:45 2005 (422DE9F1)
        CheckSum:         00021923
        ImageSize:        0001F580
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
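
The `DEBUG_FLR_IMAGE_TIMESTAMP` values in these dumps are the drivers' link times, stored as hexadecimal seconds since 1970-01-01 UTC. As a minimal sketch, the hypothetical helper `link_time` below converts one; the one-hour offset is an assumption taken from the `(GMT+1)` shown in the debug session time.

```python
from datetime import datetime, timedelta, timezone

def link_time(hex_stamp, utc_offset_hours=0):
    """Convert a hex image timestamp (seconds since the Unix epoch) to a datetime."""
    seconds = int(hex_stamp, 16)
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz)

# b57xp32.sys timestamp from the lmvm output above, shown in GMT+1.
built = link_time("422DE9F1", utc_offset_hours=1)
print(built.isoformat())

# Age of the driver on the day of the second crash (Feb 27 2011, GMT+1).
crash = datetime(2011, 2, 27, 16, 43, 29, tzinfo=timezone(timedelta(hours=1)))
print((crash - built).days, "days old at crash time")
```

Running the same conversion on the other timestamps (`4511bb90` for lassobackup.sys, `4cbda48f` for the kernel) gives a quick view of how old each image is compared with the rest of the system.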

    At this point we thought the network card driver was causing some problem that made the network backup fail, so we uninstalled the backup software, installed the latest network card driver, and put all the memory back in.

    The server went two days without rebooting, but in the end it rebooted on its own with the following dump file:

    Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\WINDOWS\Minidump\Mini030111-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available

    Symbol search path is: SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols
    Executable search path is:
    Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (4 procs) Free x86 compatible
    Product: LanManNt, suite: TerminalServer
    Built by: 3790.srv03_sp2_gdr.101019-0340
    Machine Name:
    Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
    Debug session time: Tue Mar  1 20:00:08.328 2011 (GMT+1)
    System Uptime: 1 days 5:33:56.313
    Loading Kernel Symbols
    ...............................................................
    ...................................................
    Loading User Symbols
    Loading unloaded module list
    ....
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck 1000008E, {c0000005, 80934490, f4e09c84, 0}

    Probably pool corruption caused by Tag:  NpFr

    Followup: MachineOwner
    ---------

    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
    This is a very common bugcheck.  Usually the exception address pinpoints
    the driver/function that caused the problem.  Always note this address
    as well as the link date of the driver/image that contains this address.
    Some common problems are exception code 0x80000003.  This means a hard
    coded breakpoint or assertion was hit, but this system was booted
    /NODEBUG.  This is not supposed to happen as developers should never have
    hardcoded breakpoints in retail code, but ...
    If this happens, make sure a debugger gets connected, and the
    system is booted /DEBUG.  This will let us see why this breakpoint is
    happening.
    Arguments:
    Arg1: c0000005, The exception code that was not handled
    Arg2: 80934490, The address that the exception occurred at
    Arg3: f4e09c84, Trap Frame
    Arg4: 00000000

    Debugging Details:
    ------------------


    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - La instrucción en "0x%08lx" hace referencia a la memoria en "0x%08lx". La memoria no se puede "%s".

    FAULTING_IP:
    nt!ObpCloseHandleTableEntry+16
    80934490 8b80a8000000    mov     eax,dword ptr [eax+0A8h]

    TRAP_FRAME:  f4e09c84 -- (.trap 0xfffffffff4e09c84)
    ErrCode = 00000000
    eax=00000000 ebx=e2741e20 ecx=00000000 edx=88c97ce1 esi=88c97ce0 edi=e3122660
    eip=80934490 esp=f4e09cf8 ebp=f4e09d04 iopl=0         nv up ei ng nz na po nc
    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010282
    nt!ObpCloseHandleTableEntry+0x16:
    80934490 8b80a8000000    mov     eax,dword ptr [eax+0A8h] ds:0023:000000a8=00000000
    Resetting default scope

    CUSTOMER_CRASH_COUNT:  1

    DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

    BUGCHECK_STR:  0x8E

    CURRENT_IRQL:  0

    CORRUPTING_POOL_ADDRESS:  88c97c40

    CORRUPTING_POOL_TAG:  NpFr

    LAST_CONTROL_TRANSFER:  from 80934644 to 80934490

    STACK_TEXT: 
    f4e09d04 80934644 e2741e20 e3122660 00002330 nt!ObpCloseHandleTableEntry+0x16
    f4e09d48 80934761 00002330 00000001 f4e09d64 nt!ObpCloseHandle+0x82
    f4e09d58 808897ec 00002330 07a2f9e0 7c82847c nt!NtClose+0x1b
    f4e09d58 7c82847c 00002330 07a2f9e0 7c82847c nt!KiFastCallEntry+0xfc
    WARNING: Frame IP not in any known module. Following frames may be wrong.
    07a2f9e0 00000000 00000000 00000000 00000000 0x7c82847c



    STACK_COMMAND:  kb

    FOLLOWUP_IP:
    nt!ObpCloseHandleTableEntry+16
    80934490 8b80a8000000    mov     eax,dword ptr [eax+0A8h]

    SYMBOL_STACK_INDEX:  0

    SYMBOL_NAME:  nt!ObpCloseHandleTableEntry+16

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: nt

    IMAGE_NAME:  ntkrpamp.exe

    DEBUG_FLR_IMAGE_TIMESTAMP:  4cbda48f

    FAILURE_BUCKET_ID:  CORRUPTING_POOLTAG_NpFr

    BUCKET_ID:  CORRUPTING_POOLTAG_NpFr

    Followup: MachineOwner
    ---------

    0: kd> lmvm nt
    start    end        module name
    80800000 80a5b000   nt       # (pdb symbols)          c:\localsymbols\ntkrpamp.pdb\B30640A84F1046C880BA546891EC17211\ntkrpamp.pdb
        Loaded symbol image file: ntkrpamp.exe
        Mapped memory image file: c:\localsymbols\ntkrpamp.exe\4CBDA48F25b000\ntkrpamp.exe
        Image path: ntkrpamp.exe
        Image name: ntkrpamp.exe
        Timestamp:        Tue Oct 19 16:00:47 2010 (4CBDA48F)
        CheckSum:         0024A530
        ImageSize:        0025B000
        File version:     5.2.3790.4789
        Product version:  5.2.3790.4789
        File flags:       0 (Mask 3F)
        File OS:          40004 NT Win32
        File type:        1.0 App
        File date:        00000000.00000000
        Translations:     040e.04b0
        CompanyName:      Microsoft Corporation
        ProductName:      Microsoft® Windows® operációs rendszer
        InternalName:     ntkrpamp.exe
        OriginalFilename: ntkrpamp.exe
        ProductVersion:   5.2.3790.4789
        FileVersion:      5.2.3790.4789 (srv03_sp2_gdr.101019-0340)
        FileDescription:  NT - kernel és rendszer
        LegalCopyright:   © Microsoft Corporation. Minden jog fenntartva.
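
The `start`/`end` ranges printed by `lmvm` make it possible to check by hand which module an address from a stack trace belongs to. The sketch below is only illustrative: the `MODULES` table holds just the two ranges quoted in this thread, and `resolve` is a hypothetical helper, not a debugger command.

```python
# Module ranges copied from the two lmvm outputs in this thread.
MODULES = {
    "nt": (0x80800000, 0x80A5B000),
    "b57xp32": (0xF578C000, 0xF57AB580),
}

def resolve(address, modules=MODULES):
    """Return (module name, offset from module start), or None if no range matches."""
    for name, (start, end) in modules.items():
        if start <= address < end:
            return name, address - start
    return None

for addr in (0xF5791FAA, 0x80934490, 0x7C82847C):
    hit = resolve(addr)
    label = "no known module" if hit is None else f"{hit[0]}+0x{hit[1]:x}"
    print(f"{addr:08x} -> {label}")
```

The last address, 0x7c82847c, matches neither range and lies below the kernel base reported in the dump (0x80800000). That suggests it may simply be a user-mode return address that a kernel minidump cannot resolve, though this is an inference and not something the dump states.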

    It looks like the problem is a faulty network card, because of this:

    WARNING: Frame IP not in any known module. Following frames may be wrong.
    07a2f9e0 00000000 00000000 00000000 00000000 0x7c82847c

    Arguments:
    Arg1: c0000005, The exception code that was not handled
    Arg2: 80934490, The address that the exception occurred at
    Arg3: f4e09c84, Trap Frame
    Arg4: 00000000

    I am not sure whether the fault is in the card's own memory, where it processes frames, or in the system's main memory.

    CORRUPTING_POOL_TAG:  NpFr

    NpFr - npfs.sys   - DATA_ENTRY records (read/write buffers)

    Can you help me?

    Wednesday, March 2, 2011 10:07

Answers

  • Hello jctortola

    It looks like the network card is what is causing the errors.

    Try swapping the card and see whether the errors come back.

    Regards


    Ismael Borche - LATAM Forum Support Engineer
    Microsoft Corporation
    • Proposed as answer by Ismael Borche Monday, March 14, 2011 18:46
    • Marked as answer by Ismael Borche Friday, March 18, 2011 14:56
    Wednesday, March 9, 2011 18:17