We've got a Windows Server 2003 box, 64-bit, with 64 GB of RAM, connected to an EMC VNX SAN via a QLogic HBA. The SAN storage is 15k rpm SAS drives. Event ID 2021 is logged frequently in the system event log: "the server was unable to allocate a work item x times in the last 60 seconds," where x varies, generally between 1 and 10, sometimes between 10 and 20.
This is our main SQL server, and though I'm unaware of any complaints about performance that correspond to the times these events are logged, I do know that they occur with great regularity during the period when the server is being backed up, and the backup performance on this server is abysmal. A similar server with a copy of the production SQL database backs up 5-6 times faster than this one.
The only official MS reference I can find on this issue (KB317429) relates to the 32-bit version of Server 2003, and I am hesitant to make those registry mods to this server, as it is not 32-bit.
I have looked at disk performance counters for the internal HDDs and the storage in the SAN (queue length, disk sec/read and disk sec/write), and I don't see any glaring performance issues in the disk subsystem. The server is heavily used, but its resources do not appear to be overburdened.
Can anyone suggest a possible solution, or anything else I should be investigating?
We have a similar platform and similar problems, in our case with SMB shares. We have improved the situation to some extent. I'm with you: the tuning advice in KB317429 is not appropriate for 2003 x64 on today's servers, since the 32-bit limits on non-paged pool are greatly relaxed, to 40% of physical memory by default. If you have 8GB of RAM on x64, the default non-paged pool maximum would be 3.2GB, whereas 317429 recommends 8k x 64k = 512MB total at most. We are experimenting by still setting a fixed MaxWorkItems, but making it 48k x 16k = 768MB so far. This has eliminated the 2021s, but we still experience very occasional failures to map. Poolmon then shows non-paged pool use near the limit, so we are raising it again.
So, I would install the x64 build of poolmon, leave it running, and look at where the pool memory is going when this happens.
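If it helps, here is a quick way to sanity-check the work-item arithmetic in the reply before picking a value. It is only a rough sketch: it assumes each work item costs one full request buffer of non-paged pool (which is what the "items x buffer size" products imply) and it takes the 40% default limit from the figure quoted in the reply. The function names are made up for illustration, and the real per-item overhead may differ.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def work_item_pool_bytes(max_work_items, bytes_per_item):
    """Worst-case pool used if every work item holds one full buffer (assumption)."""
    return max_work_items * bytes_per_item


def default_nonpaged_pool_limit(ram_bytes, fraction=0.40):
    """Default non-paged pool ceiling, using the 40% figure quoted in the reply."""
    return int(ram_bytes * fraction)


def report(label, max_work_items, bytes_per_item, ram_bytes):
    """Return a one-line summary of work-item pool demand versus the pool limit."""
    need = work_item_pool_bytes(max_work_items, bytes_per_item)
    limit = default_nonpaged_pool_limit(ram_bytes)
    share = 100.0 * need / limit
    return (f"{label}: {need / MIB:.0f} MiB of work items, "
            f"{share:.1f}% of a {limit / GIB:.1f} GiB pool limit")


if __name__ == "__main__":
    ram = 8 * GIB  # the 8GB example from the reply
    print(report("KB317429 style (8k x 64k)", 8 * KIB, 64 * KIB, ram))
    print(report("Experimental (48k x 16k)", 48 * KIB, 16 * KIB, ram))
```

Swap in 64 * GIB for the server in the original question to see how much headroom the same settings would leave there. Poolmon is still the only way to see what is actually consuming the pool.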