VMQ + Broadcom crashes servers

Yesterday I rebooted one of my hosts, and another host just crashed out of nowhere.
After some digging, I found this event in Event Viewer:

Log Name:      System
Source:        Microsoft-Windows-Hyper-V-VmSwitch
Date:          8/22/2013 3:59:10 PM
Event ID:      113
Task Category: None
Level:         Error
Keywords:      
User:          S-1-5-83-1-3346485260-1331800698-2366333576-1743364617
Computer:      Hv01
Description:
Failed to allocate VMQ for NIC C777500C-AA7A-4F61-8862-0B8D09A2E967--A6F60352-0C31-4F30-9F27-F3FC8C1D5F87 (Friendly Name: VM) on switch DA1BEF5D-95DD-46FE-9E9E-43DE05FFCDDB (Friendly Name: Local). Reason - Maximum number of VMQs supported on the Protocol NIC is exceeded. Status = Insufficient system resources exist to complete the API.

After a second, third and fourth crash, I was able to capture the bluescreen and look up the error code:

Full_DPC_Watchdog_Violation

I found this knowledge base article and hotfix (December 2012!), http://support.microsoft.com/kb/2789962, which says: “Assume that you have a Windows Server 2012-based computer that has many third-party drivers installed.” And yes, we have some Broadcom drivers!
After installing this hotfix, the server still crashed. After disabling VMQ (Disable-NetAdapterVmq), the crashing stopped.

Dump file:

Probably caused by : tcpip.sys ( tcpip+6b869 )


DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000504, The DPC time count (in ticks).
Arg3: 0000000000000503, The DPC time allotment (in ticks).
Arg4: 0000000000000000
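The bugcheck arguments are hexadecimal. Below is a small Python sketch that decodes them for the Arg1 = 0 case shown in the dump (a single DPC or ISR exceeded its time allotment). The helper name `decode_dpc_watchdog` is hypothetical. The sketch deliberately leaves the values in ticks, because the tick length is not shown in the dump. Other Arg1 values use a different argument layout and are not handled.

```python
from __future__ import annotations

# Bugcheck arguments copied from the DPC_WATCHDOG_VIOLATION (133) dump above.
DUMP_ARGS = {
    "Arg1": "0000000000000000",
    "Arg2": "0000000000000504",
    "Arg3": "0000000000000503",
    "Arg4": "0000000000000000",
}


def decode_dpc_watchdog(args: dict[str, str]) -> dict[str, int | str]:
    """Decode bugcheck 0x133 arguments for the Arg1 == 0 case only."""
    arg1 = int(args["Arg1"], 16)
    if arg1 != 0:
        # Other Arg1 values use a different argument layout; not handled in this sketch.
        return {"kind": f"unhandled Arg1 value {arg1:#x}"}
    count = int(args["Arg2"], 16)      # DPC time count, in ticks
    allotment = int(args["Arg3"], 16)  # DPC time allotment, in ticks
    return {
        "kind": "single DPC or ISR exceeded its time allotment",
        "count_ticks": count,
        "allotment_ticks": allotment,
        "overrun_ticks": count - allotment,
    }


if __name__ == "__main__":
    print(decode_dpc_watchdog(DUMP_ARGS))
```

For this dump, 0x504 is 1284 ticks against an allotment of 0x503 (1283 ticks). In other words, the DPC was stopped as soon as it ran one tick over its budget.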

So it seems to be the combination of VMQ and Broadcom NICs.
When the maximum number of VMQs is reached, problems occur.
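The trigger in event 113 is a simple budget: each VMQ-enabled vNIC on the switch wants a queue, but the physical NIC only exposes a fixed number. Below is a Python sketch of that check. It assumes that every VMQ-enabled vNIC asks for exactly one queue, which is a simplification. The queue count and vNIC names are hypothetical. On a real host, the available queue count is shown in the NumberOfReceiveQueues column of `Get-NetAdapterVmq`.

```python
from dataclasses import dataclass


@dataclass
class VmqBudget:
    """Result of comparing requested VMQs with what the physical NIC offers."""
    available: int
    requested: int

    @property
    def shortfall(self) -> int:
        # Number of vNICs that would not get a dedicated queue (0 if within budget).
        return max(0, self.requested - self.available)

    @property
    def exceeded(self) -> bool:
        return self.shortfall > 0


def check_vmq_budget(available_queues: int, vmq_enabled_vnics: list[str]) -> VmqBudget:
    """Simplified model: assumes every VMQ-enabled vNIC asks for exactly one queue."""
    if available_queues < 0:
        raise ValueError("available_queues must be non-negative")
    return VmqBudget(available=available_queues, requested=len(vmq_enabled_vnics))


if __name__ == "__main__":
    # Hypothetical numbers: a NIC exposing 8 queues and 10 VMQ-enabled vNICs.
    vnics = [f"VM{i:02d}" for i in range(1, 11)]
    budget = check_vmq_budget(8, vnics)
    if budget.exceeded:
        print(f"VMQ limit exceeded by {budget.shortfall} vNIC(s); expect event 113")
```

In this situation, rebooting or migrating VMs onto an already full host is exactly the kind of event that pushes the requested count over the limit.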

6 thoughts to “VMQ + Broadcom crashes servers”

  1. Thanks for sharing this. We found out about that bug the hard way while live migrating a VM from one host to another. End result: 70 VMs failed over and caused a nice boot storm…

  2. Hi,
    I have two processor sockets per server, each with 18 cores, and I have multiple networks set up on a dual-port 40 Gb converged Mellanox NIC. I have tried to configure RSS and VMQ as explained in this post:

    https://www.darrylvanderpeijl.com/windows-server-2016-networking-optimizing-network-settings/#comment-398709

    and I have been having issues when I set it up as described in that article. First of all, setting a NUMA node does not change the processor group, at least for me, so this is how I managed to do it.
    With hyper-threading enabled, 18 cores per socket gives 36 cores and 72 logical CPUs. I have 6 converged networks on two NIC ports (a single NIC with dual ports) assigned to a SET vSwitch. The networks are Management, LM, Cluster, Management, SMB1, SMB2. Can you please take a look and let me know if this is OK?
    Set-NetAdapterRss -Name "vEthernet (SMB1)" -NumaNode 0 -BaseProcessorNumber 18 -MaxProcessorNumber 32 -BaseProcessorGroup 0
    Set-NetAdapterRss -Name "vEthernet (SMB2)" -NumaNode 1 -BaseProcessorNumber 18 -MaxProcessorNumber 32 -BaseProcessorGroup 1

    Set-NetAdapterRss -Name NIC1 -BaseProcessorNumber 2 -NumaNode 0 -MaxProcessors 8 -MaxProcessorNumber 16 -BaseProcessorGroup 0
    Set-NetAdapterRss -Name NIC2 -BaseProcessorNumber 2 -NumaNode 1 -MaxProcessors 8 -MaxProcessorNumber 16 -BaseProcessorGroup 1

    Set-NetAdapterVmq -Name "NIC1" -NumaNode 0 -BaseProcessorNumber 2 -MaxProcessors 8
    Set-NetAdapterVmq -Name "NIC2" -NumaNode 1 -BaseProcessorNumber 2 -MaxProcessors 8

    Set-NetAdapterRss -Name "vEthernet (LM)" -NumaNode 1 -BaseProcessorNumber 34 -BaseProcessorGroup 1
    Also, how do I reset everything back to defaults?

    1. Raj, this config looks OK to me. Do you see the traffic nicely spread across the processors? That's the only way to know for sure.

      Also, how do I reset everything back to defaults?

      Good question 🙂 Uninstalling the NIC?

  3. Do you by any chance still have the hotfix file that you can share? Microsoft appears to have broken their website: it goes into an infinite redirect loop, so I can't download the file.
