SQL 2005 cluster on Windows 2003 Enterprise hangs frequently
One node of a Windows 2003 x64 cluster hangs randomly [no specific time frame]
Issue in brief
We have a 2-node Active/Passive SQL cluster environment for our client; node 2 [SQL named instance] hangs.
When we try to access the server, we see the following:
# The keyboard and mouse do not respond.
# We have to hard reboot the server to recover from the hang.
Operating System Details
Windows 2003 Enterprise Edition SP2 with MS Cluster installed
The cluster is configured in an Active-Active setup with the SQL resource installed
Resource installed in cluster
SQL 2005 SP2 is installed
Points
No changes have been made on either of the cluster nodes.
The issue has been occurring for the last 13 days.
There are approximately 20 jobs scheduled to run at night.
About 1 to 5 jobs run between 1:00 AM and 4:00 AM.
A few of the jobs fail, and a few run for about 1 to 1½ hours.
All the jobs perform SQL-related functions such as indexing, data updates and data shrinking.
When the issue occurs, event ID 19019 (source MSSQL) is logged. The description states that the SQL driver connection failed.
Information to be gathered at the time of hang
When troubleshooting server hangs, there are some things we check:
1. Are you able to ping the server remotely? Yes
2. Consult your hardware vendor to run diagnostics on the server to ensure that there is no underlying hardware issue. Reported to Dell; as per their instructions, we updated all the firmware on that server.
3. At the console, are you able to use the NumLock or CapsLock keys? Not able to use either key.
4. At the console, are you able to bring up the GINA screen using Ctrl+Alt+Del? No.
We have tried the steps below to reduce downtime:
1] Checked Cluster Administrator and found the resource in a failed state.
2] Tried to bring the SQL instance online, but it would not come online.
3] When we try to move the resource group, it fails to move to the other node and hangs.
It would be a great help if someone could guide us.
All Replies
- What are the physical memory settings for the nodes and for the SQL Instances? It could be that the machine has overcommitted memory and simply cannot respond while it resolves the issue.
You need to leave 2-3 GB free on machines up to 32 GB and 3-4 GB free on machines over 32GB for the OS and non-SQL tasks (SQL Agent jobs count as non-SQL).
Please post back here with the physical RAM and the MAX SQL Server memory settings.
Geoff N. Hiten Principal Consultant Microsoft SQL Server MVP
- We have 32 GB physical memory on the server.
Max SQL Server memory was not set previously, so it was taking the maximum memory available on the server.
We have now set Max SQL Server memory to 12 GB.
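
As a rough illustration of the headroom rule quoted in the first reply, the arithmetic can be sketched in Python. The function name and the choice to return a low/high range are assumptions made for this example and are not part of any SQL Server tooling; the reserve figures are the ones given in that reply.

```python
def suggested_max_server_memory_gb(physical_gb):
    """Return a (low, high) range in GB for the SQL Server memory cap.

    Rule of thumb from the reply: leave 2-3 GB free on machines with
    up to 32 GB of RAM, and 3-4 GB free on machines with more than 32 GB.
    This is a hypothetical helper, not an official formula.
    """
    if physical_gb <= 32:
        reserve_low, reserve_high = 2, 3
    else:
        reserve_low, reserve_high = 3, 4
    # The larger reserve gives the lower, more conservative cap.
    return physical_gb - reserve_high, physical_gb - reserve_low


if __name__ == "__main__":
    low, high = suggested_max_server_memory_gb(32)
    print(f"32 GB node: cap SQL Server at roughly {low}-{high} GB")
    # The server option is expressed in MB, so convert for reference.
    print(f"Equivalent in MB: {low * 1024}-{high * 1024}")
```

For the 32 GB node described here, this works out to roughly 29 to 30 GB for a single instance, so the 12 GB cap now in place leaves much more headroom than the rule asks for. If more than one instance can end up on the same node after a failover, the caps of those instances together would presumably need to fit under that figure.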

