Npcap's pcap_sendqueue_transmit Causes Windows10 Blue Screen #374

Closed
WangChengZhang opened this issue Nov 27, 2018 · 29 comments

Comments

@WangChengZhang

I need the gigabit Ethernet adapter to run at full speed (about 950 Mbit/s), so I use the pcap_sendqueue_transmit function.
But:
Npcap causes a blue screen.
WinPcap (wpcap) works well but probably loses some packets.

@WangChengZhang changed the title from "pcap_sendqueue_transmit Causes Windows10 Blue Screen" to "Npcap's pcap_sendqueue_transmit Causes Windows10 Blue Screen" on Nov 27, 2018
@dmiller-nmap
Contributor

Thanks for the report! We'll look into it. Does this happen every time you use pcap_sendqueue_transmit, or only when you approach the limits of your network device?

@dmiller-nmap
Contributor

I'm doing some initial testing, and while I am running into problems getting it to work, I have not seen any blue screen errors. Can you get a minidump crash dump and DiagReport output so we can diagnose the problem? https://nmap.org/npcap/guide/npcap-users-guide.html#npcap-issues

@dmiller-nmap
Contributor

@WangChengZhang Would you mind sharing a bit of your sample code that uses pcap_sendqueue_transmit? I suspect there is a problem with the order in which the sendqueue functions are being called. Obviously, we don't want Npcap to crash no matter how the user program is written, so this would help us find a reproducible case for your blue screen crash.

@WangChengZhang
Author

WangChengZhang commented Dec 4, 2018

Okay Daniel...

Some explanations:

1. For this sender, I use one circular linked list of three pcap_send_queue structures to store data, and two threads to fill and send it. In the main thread, I fill a queue by calling pcap_sendqueue_queue. In the sending thread, I send the queue by calling pcap_sendqueue_transmit.

2. Before each call to pcap_sendqueue_transmit, I call pcap_sendpacket first, because I need to reopen the adapter if there is an error. Unfortunately, pcap_sendqueue_transmit does not report a failure in some cases (for example, when the computer wakes from sleep and restarts the adapter), so I call pcap_sendpacket to check the state of the adapter.
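
The three-queue ring described in point 1 could be tracked roughly as in the sketch below. This is not the reporter's code: the names `queue_ring`, `slot_state`, and the `ring_*` helpers are hypothetical, each slot stands in for one pcap_send_queue, and the sketch is a single-threaded model only. A real two-thread sender would need a mutex or atomics around these calls.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 3 /* three queues, as in the description above */

/* Hypothetical slot state; each slot would own one pcap_send_queue. */
typedef enum { SLOT_EMPTY, SLOT_READY } slot_state;

typedef struct {
    slot_state state[SLOT_COUNT];
    size_t fill_idx; /* next slot the filling thread writes */
    size_t send_idx; /* next slot the sending thread transmits */
} queue_ring;

void ring_init(queue_ring *r)
{
    assert(r != NULL);
    for (size_t i = 0; i < SLOT_COUNT; i++)
        r->state[i] = SLOT_EMPTY;
    r->fill_idx = 0;
    r->send_idx = 0;
}

/* Returns the slot to fill next, or -1 if the sender has not freed it yet. */
int ring_next_to_fill(const queue_ring *r)
{
    return r->state[r->fill_idx] == SLOT_EMPTY ? (int)r->fill_idx : -1;
}

/* Called by the filler once the slot's queue is fully populated. */
void ring_mark_ready(queue_ring *r)
{
    assert(r->state[r->fill_idx] == SLOT_EMPTY);
    r->state[r->fill_idx] = SLOT_READY;
    r->fill_idx = (r->fill_idx + 1) % SLOT_COUNT;
}

/* Returns the slot to transmit next, or -1 if nothing is ready. */
int ring_next_to_send(const queue_ring *r)
{
    return r->state[r->send_idx] == SLOT_READY ? (int)r->send_idx : -1;
}

/* Called by the sender after the slot's queue has been transmitted. */
void ring_mark_sent(queue_ring *r)
{
    assert(r->state[r->send_idx] == SLOT_READY);
    r->state[r->send_idx] = SLOT_EMPTY;
    r->send_idx = (r->send_idx + 1) % SLOT_COUNT;
}
```

Keeping the fill and send positions separate means the filler blocks only when all three queues are waiting to be sent.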

@dmiller-nmap
Contributor

Thanks for the update. I'll see what I can figure out. I wonder if there's a better way to determine if the adapter handle is still valid?

@dmiller-nmap
Contributor

Still investigating. I think that in the case where there is a power event that causes Npcap handles to be invalidated, GetLastError can be used to get the error set by the DeviceIoControl function within Packet.dll. You can also check pcap_geterr to see if there is an error set. In this case, pcap_sendqueue_transmit will return a number less than the length of the sendqueue, which I see you are already checking for.

We will work on improving the errors that are returned, but also look for a cause on the BSOD crash.
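
The short-send check mentioned above can be sketched as a small helper. The name `classify_transmit` and the `send_outcome` enum are hypothetical; in real code `bytes_sent` would be the value returned by pcap_sendqueue_transmit and `bytes_queued` would be the `len` field of the pcap_send_queue.

```c
#include <assert.h>

/* Hypothetical classification of one transmit attempt. */
typedef enum { SEND_COMPLETE, SEND_PARTIAL, SEND_NOTHING } send_outcome;

/* Compares the byte count reported as sent with the byte count queued. */
send_outcome classify_transmit(unsigned int bytes_sent, unsigned int bytes_queued)
{
    assert(bytes_sent <= bytes_queued);
    if (bytes_sent == bytes_queued)
        return SEND_COMPLETE;
    return bytes_sent == 0 ? SEND_NOTHING : SEND_PARTIAL;
}
```

On anything other than `SEND_COMPLETE`, the caller would then consult pcap_geterr (and possibly GetLastError) before deciding whether to reopen the adapter.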

@WangChengZhang
Author

Thanks, I'll try it.

@WangChengZhang
Author

@dmiller-nmap, I have looked through the Npcap code recently.
I didn't find any difference in the NPF_IoControl function in Packet.c between the WinPcap and Npcap projects. Maybe the blue screen is caused by a difference in the VS2015 compiler?

case BIOCSENDPACKETSSYNC:
		SyncWrite = TRUE;

	case BIOCSENDPACKETSNOSYNC:
		TRACE_MESSAGE(PACKET_DEBUG_LOUD, "BIOCSENDPACKETSNOSYNC");

		NdisAcquireSpinLock(&Open->WriteLock);
		if (Open->WriteInProgress)
		{
			NdisReleaseSpinLock(&Open->WriteLock);
			//
			// Another write operation is currently in progress
			//
			SET_FAILURE_UNSUCCESSFUL();
			break;
		}
		else
		{
			Open->WriteInProgress = TRUE;
		}
		NdisReleaseSpinLock(&Open->WriteLock);

		WriteRes = NPF_BufferedWrite(Irp,
			(PUCHAR) Irp->AssociatedIrp.SystemBuffer,
			IrpSp->Parameters.DeviceIoControl.InputBufferLength,
			SyncWrite);

		NdisAcquireSpinLock(&Open->WriteLock);
		Open->WriteInProgress = FALSE;
		NdisReleaseSpinLock(&Open->WriteLock);

		if (WriteRes != -1)
		{
			SET_RESULT_SUCCESS(WriteRes);
		}
		else
		{
			SET_FAILURE_UNSUCCESSFUL();
		}
		break;

I also found that pcap_sendqueue_transmit calls PacketSendPackets in Packet32.cpp to send packets, but it breaks out of the loop and returns an error as soon as DeviceIoControl returns FALSE. Could you consider adding an interface that resends the packet until it succeeds or times out?

do{
			// Send the data to the driver
			//TODO Res is NEVER checked, this is REALLY bad.
			Res = (BOOLEAN)DeviceIoControl(AdapterObject->hFile,
				(Sync)?BIOCSENDPACKETSSYNC:BIOCSENDPACKETSNOSYNC,
				(PCHAR)PacketBuff + TotBytesTransfered,
				Size - TotBytesTransfered,
				NULL,
				0,
				&BytesTransfered,
				NULL);

			// Exit from the loop on error
			if(Res != TRUE)
				break;

			TotBytesTransfered += BytesTransfered;

			// Exit from the loop if we have transferred everything
			if(TotBytesTransfered >= Size)
				break;

			// calculate the time interval to wait before sending the next packet
			TargetTicks.QuadPart = StartTicks.QuadPart +
			(LONGLONG)
			((((struct timeval*)((PCHAR)PacketBuff + TotBytesTransfered))->tv_sec - BufStartTime.tv_sec) * 1000000 +
			(((struct timeval*)((PCHAR)PacketBuff + TotBytesTransfered))->tv_usec - BufStartTime.tv_usec)) *
			(TimeFreq.QuadPart) / 1000000;
			
			// Wait until the time interval has elapsed
			while( CurTicks.QuadPart <= TargetTicks.QuadPart )
				QueryPerformanceCounter(&CurTicks);

		}
		while(TRUE);
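
The "resend until it succeeds or times out" behaviour requested above could look roughly like the sketch below. It is not part of Packet.dll: `send_with_retry`, `send_fn`, `clock_fn`, and the `fake_*` helpers are hypothetical, and the send and clock callbacks are stand-ins so the loop can be exercised without a driver. It also retries immediately with no back-off, which a real implementation would probably want to avoid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical send callback: returns bytes accepted, 0 on failure. */
typedef size_t (*send_fn)(const unsigned char *buf, size_t len, void *ctx);
/* Hypothetical clock callback: returns monotonic milliseconds. */
typedef unsigned long (*clock_fn)(void *ctx);

/* Keeps sending the unsent remainder until done or timeout_ms has elapsed.
 * Returns the total number of bytes accepted. */
size_t send_with_retry(send_fn send, clock_fn now, void *ctx,
                       const unsigned char *buf, size_t len,
                       unsigned long timeout_ms)
{
    assert(send != NULL && now != NULL);
    unsigned long start = now(ctx);
    size_t done = 0;

    while (done < len) {
        size_t n = send(buf + done, len - done, ctx);
        if (n > len - done)
            n = len - done; /* defensive clamp on the callback's result */
        done += n;
        if (done < len && now(ctx) - start >= timeout_ms)
            break; /* give up; the caller sees a short count */
    }
    return done;
}

/* Fake link for demonstration: fails a fixed number of times, then succeeds. */
typedef struct {
    int failures_left;
    unsigned long clock_ms; /* advances by 1 ms on every clock read */
} fake_link;

size_t fake_send(const unsigned char *buf, size_t len, void *ctx)
{
    fake_link *link = ctx;
    (void)buf;
    if (link->failures_left > 0) {
        link->failures_left--;
        return 0;
    }
    return len;
}

unsigned long fake_now(void *ctx)
{
    fake_link *link = ctx;
    return link->clock_ms++;
}
```

Returning the byte count rather than a boolean keeps the same contract as the existing loop: a result shorter than the requested length signals that the caller should inspect the error.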

@dmiller-nmap
Contributor

In the crash dumps @PiAil provided on #370, the backtrace comes directly from rt640x64.sys, a RealTek NIC driver. This doesn't mean that it's not a problem with Npcap, just that the crash dumps don't point to any particular place within Npcap. The crash is DRIVER_IRQL_NOT_LESS_OR_EQUAL with a read operation.

Pure speculation: Maybe Npcap is injecting NBLs allocated from a pageable pool, and when the NIC driver gets around to sending them, they've been paged out?

@dmiller-nmap
Contributor

I've tried to reproduce this crash and have not been able to. My testing was done on a Windows 10 VM in Hyper-V, with Npcap 0.99-r9 running with standard Driver Verifier settings. I captured a speed test on fast.com, resulting in a 13.8 MB pcap file with 27573 packets. Then I played that back on the network with sendcap.exe. It reported a packet rate of 28000 packets per second, so significantly faster than the original packets were received.

@WangChengZhang are you able to submit a crash minidump file that we could inspect?

@PiAil Does the crash happen on different hardware, especially a network card from a different manufacturer?

@PiAil

PiAil commented Feb 4, 2019

I tried with two different network cards (ASIX AX88179 USB 3.0 to Gigabit Ethernet Adapter and Intel(R) Ethernet Connection (2) I219-V). I also tried both my DLL called through ctypes and sendcap.exe, and got similar results.

With the first card, I got a BSoD; here is a minidump:
020419-15937-01.zip

With the second one, a long freeze, a reboot, and a minidump:
020119-7328-01.zip

The file I'm trying to send is bigger than yours (~150 MB). I don't know the limit, but it seems that with a small file the packets are sent without any problem.

An example of my file:
test.zip

Note: It is a UDP flood with spoofed IPs, for learning purposes.

@dmiller-nmap
Contributor

Thanks for the additional info! I'm still not able to trigger with your file example, but that may be because of mismatching MAC addresses. The first minidump has a short stack trace without Npcap in it, but the second one shows a crash in the NIC driver with Npcap functions in the stack trace. I do have a further question based on this stack trace, though: Are you sending the packets on the Npcap Loopback Adapter? The function in the stack trace is NPF_LoopbackSendNetBufferLists, which ought to be called only when the adapter being used is the Npcap Loopback Adapter, so that might indicate a problem.

@PiAil

PiAil commented Feb 4, 2019

Nope, I'm not using the Npcap Loopback Adapter, but the one linked to my NIC (found with iflist.exe).

@dmiller-nmap
Contributor

Ok, I've audited all the places the internal Loopback flag is used and set, as well as all the code paths that lead to that function, and I don't see anything wrong. This might not be even relevant to your issue, so I'm going to suggest a partial workaround to see if we can get a better crash dump:

Please reinstall the latest Npcap, but install without loopback capture support. Then try to send your packet dump. If it crashes, we've narrowed it down to a different portion of the code (and maybe we will get a helpful crash dump). If it does not crash, then we've proven the problem is related to the loopback code, and we can investigate further there.

@PiAil

PiAil commented Feb 25, 2019

I reinstalled Npcap without the loopback adapter and nothing changed.

I couldn't get access again to the machine that produced the loopback-related error, but here is a minidump from the ASIX AX88179 USB 3.0 to Gigabit Ethernet Adapter. It doesn't look very helpful, but it was the best I could do.

021819-16843-01.zip

@dmiller-nmap
Contributor

@PiAil Thanks for the dump. I agree, there's nothing specifically implicating Npcap in this one either, but it certainly does seem suspicious that entirely different drivers both crash with a page fault under these circumstances. Thanks for your patience while we try to sort this out.

@dmiller-nmap
Contributor

Ok, it looks like this will likely be fixed in the next release. The issue is that when Npcap sends packets, it has to create a NET_BUFFER_LIST structure and send it through the stack. But the memory for the network data buffers has to be allocated from non-paged memory; otherwise it could get paged out, and the pointers would be invalid when the NIC driver tries to read them. For ordinary single-packet write operations, NDIS handles the memory allocation because the adapter device is created for Direct I/O. But for buffered writes, the data is sent in an IoCtl input buffer which is allocated from user space. Npcap (and WinPcap before it) just used the address of that buffer in the packet description structure. The fix was to allocate a new buffer from non-paged memory and copy the contents of the user buffer into it. This might end up slowing down sends a little bit, but it's better to not crash, of course!
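
The copy-before-send idea described above can be illustrated with a user-mode analogue. This is not the Npcap driver code: `owned_packet` and its helpers are hypothetical names, and `malloc` merely stands in for an allocation from non-paged memory, which only exists in kernel mode. The point of the sketch is just that the send request owns its own copy, so it no longer depends on the caller's buffer staying valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical send request that owns a private copy of the packet data. */
typedef struct {
    unsigned char *data;
    size_t len;
} owned_packet;

/* Copies caller_buf into storage owned by the request.
 * Returns 0 on success, -1 if the allocation fails. */
int owned_packet_init(owned_packet *pkt, const unsigned char *caller_buf, size_t len)
{
    assert(pkt != NULL && caller_buf != NULL);
    pkt->data = malloc(len ? len : 1); /* stand-in for a non-paged allocation */
    if (pkt->data == NULL) {
        pkt->len = 0;
        return -1;
    }
    memcpy(pkt->data, caller_buf, len);
    pkt->len = len;
    return 0;
}

/* Releases the private copy once the send has completed. */
void owned_packet_release(owned_packet *pkt)
{
    assert(pkt != NULL);
    free(pkt->data);
    pkt->data = NULL;
    pkt->len = 0;
}
```

The extra copy is the cost mentioned above: every buffered write now pays one allocation and one `memcpy` in exchange for a buffer whose lifetime the sender controls.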

@dmiller-nmap
Contributor

Npcap 0.991 was just released and should fix this problem. We would be very grateful if @WangChengZhang or @PiAil can confirm that the problem is solved.

@PiAil

PiAil commented Mar 20, 2019

Thanks for your work on the problem!

Unfortunately, I tried the new version with fewer packets (10 * 1500 bytes) to begin with, and it crashed with a new error message: BAD_POOL_CALLER.

Note that it didn't crash previously with this small a number of packets. I didn't try with bigger files, to spare my laptop a bit!

Minidump:
032019-9625-01.zip

Edit: Tested with another NIC, same crash, same error message.

@ghost

ghost commented Mar 22, 2019

I would also like to report the same problem with npcap-0.991.
BAD_POOL_CALLER happens only with pcap_sendqueue_transmit().

Minidump:
032219-9046-01.zip

@gvanem

gvanem commented Mar 22, 2019

From the Bugcheck Analysis of your .dmp file:

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 00000000000919b7, Pool tag value from the pool header
Arg3: 000000005c3ef920, Contents of the first 4 bytes of the pool header
Arg4: ffffaf84f8c00010, Address of the block of pool being deallocated

I'm not surprised.

@dmiller-nmap
Contributor

Thanks for the updates. We have this additional crash fixed and will release shortly. See 8ecd3d8 for the fix.

@dmiller-nmap
Contributor

This should be fixed in Npcap 0.992. Thanks!

@PiAil

PiAil commented Mar 25, 2019

Problem still there for me (BAD_POOL_CALLER)!

@dmiller-nmap
Contributor

@PiAil I'm sorry to hear it didn't work for you. For some reason, I'm not able to get sendqueue to work on any of my test systems right now, so I'm trying to debug that. I do get some crashes when trying to send on Loopback Adapter, which I'm investigating also, but if you have a crash dump for Npcap 0.992 on an ordinary network adapter, I'd be very grateful if you could send it to me.

@dmiller-nmap reopened this on Mar 25, 2019
@PiAil

PiAil commented Mar 26, 2019

Here it is

032619-9296-01.zip

@dmiller-nmap
Contributor

Ok, I think I have the fix in now. I have some more testing to do and some improvements to the returned error values for buffered writes, but this should be taken care of in the next release. Thanks again for responding so quickly.

@dmiller-nmap
Contributor

This ought to have been fixed in Npcap 0.993. We have included testing of pcap_sendqueue_transmit() in our test suite for new releases since then and have not seen any crashes. Please let us know if Npcap 0.9983 still exhibits the problem.

@PiAil

PiAil commented Sep 5, 2019

The bug seems fixed for me, at least in version 0.9983. Thanks for your great work!
