Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PacketGetStats() sometimes fails with ERROR_OPERATION_ABORTED #1650

Closed
guyharris opened this issue Jun 30, 2019 · 14 comments
Closed

PacketGetStats() sometimes fails with ERROR_OPERATION_ABORTED #1650

guyharris opened this issue Jun 30, 2019 · 14 comments

Comments

@guyharris
Copy link

guyharris commented Jun 30, 2019

A number of Wireshark bugs have been filed due to a

Can't get packet-drop statistics: PacketGetStats error: The I/O operation has been aborted because of either a thread exit or an application request. (995)

error.

That comes from a pcap_stats() call in the dumpcap program (the program that does traffic capture for Wireshark and TShark). That ultimately turns into a BIOCGSTATS DeviceIoControl() call; that call isn't overlapped, so:

  • I'm not sure how it'd be aborted by an application request unless another thread closes the HANDLE in the pcap_t, and I don't see any path to close the pcap_t before the pcap_stats() call;
  • the thread making the call hasn't exited as 1) it's in the middle of the call and 2) when the call fails it continues and reports the error.

Looking at the driver (from current Git master), I don't see any obvious way that ERROR_OPERATION_ABORTED would be returned.

I can reproduce this with Wireshark 3.0.2 and Npcap 0.995, running on Windows 7 in a virtual machine on my Mac, with a USB Wi-Fi adapter attached to the VM, by starting a capture on the Wi-Fi adapter and then suspending and restarting the virtual machine.

The first error I get is

"The network adapter on which the capture was being done is no longer running; the capture has stopped."

which is probably dumpcap's translation of a "read error: PacketReceivePacket failed" error reported by Npcap (dumpcap maps various errors reported by libpcap/WinPcap/Npcap for an interface going down into that message; I need to arrange for a way for pcap to return a common "interface went down, capture stopped" error).

When I dismiss that dialog, I get the

"Can't get packet-drop statistics: PacketGetStats error: The I/O operation has been aborted because of either a thread exit or an application request. (995)"

error.

@guyharris
Copy link
Author

I could also reproduce it by capturing on the (virtual) Bluetooth device.

@guyharris
Copy link
Author

I suspect this is a suspend/resume issue of some sort.

@dmiller-nmap
Copy link

When Windows suspends, NDIS filter driver modules are detached from the adapters, which means any open pcap handles ought to be invalid. I'm not sure how to communicate this to libpcap-using software. Maybe there's a useful error (GetLastError) from the PacketReceivePacket call? That basically returns the result of ReadFile on the handle, but I don't see the pcap's errbuf being filled in pcap_read_npf. Basically, once there's been a suspend, you have to re-open the handle with PacketCloseAdapter and PacketOpenAdapter.

@guyharris
Copy link
Author

When Windows suspends, NDIS filter driver modules are detached from the adapters, which means any open pcap handles ought to be invalid.

"Invalid" in the sense that the HANDLE stored in the pcap_t is still a valid device handle but many calls passing that HANDLE to a Windows API will cause the driver for that device to return errors, or in the sense that the HANDLE itself will be invalid and errors will be returned at the Windows API layer before it even makes it to the driver?

I'm not sure how to communicate this to libpcap-using software. Maybe there's a useful error (GetLastError) from the PacketReceivePacket call? That basically returns the result of ReadFile on the handle,

If either 1) the USB Wi-Fi device is unplugged from the system or 2) I do a suspend and resume on the virtual machine, ERROR_GEN_FAILURE is returned by PacketReceivePacket().

Given that NPF_IoControl() returns STATUS_CANCELLED if NPF_StartUsingOpenInstance(Open) returns FALSE, I'm guessing that the HANDLE is still valid, so calls still make it to your driver, but your driver returns an error. In that particular case, it might be a good idea to still allow BIOCGSTATS ioctls, so that if a capture is forcibly terminated due to the adapter going away, you can still get the statistics on packets captured up to the point at which the device became unavailable.

but I don't see the pcap's errbuf being filled in pcap_read_npf.

It is, but it's not filled in with a string for the error code, it's just filled in with a generic error; I changed that to find out what error was returned from PacketReceivePacket() in the device-removed and suspend/resume cases, but haven't checked that in yet, because I wanted to treat that particular error specially, as the equivalent errors are treated in the *BSDs and macOS.

@guyharris
Copy link
Author

Also, is there some way for some part of the NPF kernel code to catch suspends or resumes and re-plumb the filter so that capturing can continue in the suspend/resume case (it obviously can't continue if you unplug the device from the system)?

@dmiller-nmap
Copy link

Ok, this is starting to make sense! The Packet API gets NPF device handles with CreateFile, reads them with ReadFile, writes packets with WriteFile, and closes with CloseFile. These are all delivered to the driver as IRPs, and handled appropriately. NDIS binds the filter driver modules to adapters, detaches them on power events or unplug events, reattaches them on resume, etc. Npcap is getting mixed up between the two sides of things (device driver and filter driver), so even though the device is still open by processes, it burns down the filter driver side and returns inconsistent things like STATUS_CANCELLED.

I think there may be a way to preserve state on the filter driver side and re-attach the open device handles to the appropriate adapters when they are restored, but it would require a lot of changes and still we'd need a way to tell the owning process that the handle isn't available right now. I'm not sure that's worth it.

I will be checking in a change that might help a bit: in the NPF_ReleaseOpenInstanceResources routine that basically tears down an "Open instance" (device handle associated with a filter module instance on an adapter), I'm setting the OpenSignature to 0 so that NPF_IsOpenInstance returns false. In the places this is checked, it results in the pending IRP returning with STATUS_INVALID_HANDLE, which ought to make more sense.

@guyharris
Copy link
Author

As long as the device driver is open, a DeviceIoControl() call with BIOCGSTATS should be made to work, even if this means storing the statistics in a structure that's associated with the device driver rather than the filter driver; otherwise, if a capture fails due to the device being unplugged, you can't get the packet statistics, such as packet drops, from the capture.

It would also be good if the packet buffer also remained available, so that ReadFile() call on a non-empty packet buffer can return the last of the packets captured; once the buffer is empty, if the filter driver is no longer attached, an appropriate error should be returned.

Replumbing the filter driver is less important - an application doing a long-running capture should try to prevent sleep during the capture.

@guyharris
Copy link
Author

The appropriate error for the interface going away would probably be ERROR_DEV_NOT_EXIST - "The specified network resource or device is no longer available." That's correct for the "hot-pluggable interface was unplugged" case; it's not ideal for the "sleep/wakeup dismantled the plumbing" case, but if that can't be distinguished from the "hot-pluggable device was unplugged" case, so it goes (and "sleep/unplug/wakeup" is an example of the "hot-pluggable interface was unplugged" case).

@dmiller-nmap
Copy link

Yes, that seems doable. We have a release in testing already, so it won't be in 0.997, but I hope to work on it for 0.998. And one of these days, I suppose we'll have to have a 1.0 release :)

@guyharris
Copy link
Author

Is there anything else - other than a sleep/wakeup pair or the removal of the device - that would cause 1) ReadFile() on the device to fail and 2) a subsequent BIOCGSTATS DeviceIoControl() call to fail with the "cancelled" error? Somebody claims that a capture that Just Stopped 2 minutes after it started and that the machine didn't go to sleep.

dmiller-nmap pushed a commit to nmap/npcap that referenced this issue Jul 9, 2019
…ale ones. See nmap/nmap#1650"

This reverts commit 34fd597.

Because NPF_Close (IRP_MJ_CLOSE handler) checks NPF_IsOpenInstance, this
resulted in Driver Verifier bugcheck due to not freeing the
OPEN_INSTANCE. Reverting this change until a more complete solution is
done, probably after the next release.
@dmiller-nmap
Copy link

This should be fixed in Npcap 0.9983, released today. Statistics and some of the other functions can still be called on a paused or removed network adapter, and it may be possible to continue capturing after a sleep, though I haven't tested that particularly.

If there are additional changes related to this issue that you would like to see, please open a new issue to describe them. Thanks again!

@guyharris
Copy link
Author

Thanks! I've updated Wireshark bug 14101 to be a bug for updating the version of Npcap that Wireshark bundles into the installer to 0.9983 (and to tell users seeing this problem to update to 0.9983).

@guyharris
Copy link
Author

guyharris commented Sep 16, 2019

...and it may be possible to continue capturing after a sleep, though I haven't tested that particularly.

Wireshark bug 14101 is now RESOLVED/FIXED; one person reporting this problem says:

I recently upgraded to Wireshark Version 3.0.4 (v3.0.4-0-g71591544b8d6 and
Npcap 0.9983. I'm no longer able to reproduce the issue (I previously
commented on bug #15555) -- I've been leaving Wireshark running a capture for
several days through numerous sleep/wake cycles, and the issue has not
reoccurred.

so 0.9983 seems to have made a major improvement here. Kudos!

@fyodor
Copy link
Member

fyodor commented Sep 16, 2019

Thanks for the update, @guyharris. We're delighted to hear it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants