Do we still need all the time stamping modes? #1829

guyharris · 2019-11-17T03:47:11Z

While looking at #1407 and the fix for it in pull request nmap/npcap/#19, and working on changes to make the time stamp type per-instance rather than global (as per my comments in the pull request), I was looking at packetWin7/npf/npf/time_calls.h, and wondered whether all the code there is still needed, given that Npcap either doesn't support anything prior to Vista or doesn't support Vista (the home page says both "Npcap works on Windows 7 and later by making use of the new NDIS 6 Light-Weight Filter (LWF) API." and "Npcap 0.9984 installer for Windows Vista/2008, 7/2008R2, 8/2012, 8.1/2012R2, 10/2016 (x86 and x64).").

The Windows System Information page "Acquiring high-resolution time stamps" has a bunch of information on time stamps; it's oriented towards userland, but some applies to kernel-mode code as well.

It says, for the following platforms ("QPC" means "QueryPerformanceCounter()"; presumably what it says also applies to KeQueryPerformanceCounter() in kernel mode):

"Windows Vista and Windows Server 2008

All computers that shipped with Windows Vista and Windows Server 2008 used a platform counter (High Precision Event Timer (HPET)) or the ACPI Power Management Timer (PM timer) as the basis for QPC. Such platform timers have higher access latency than the TSC and are shared between multiple processors. This limits scalability of QPC if it is called concurrently from multiple processors."

So it sounds as if, on Vista/WS2008, the performance counter is in sync on all CPUs, but that may impose a latency slowing down fetches, and may further slow down if multiple processors are accessing the QPC.

"Windows 7 and Windows Server 2008 R2

The majority of Windows 7 and Windows Server 2008 R2 computers have processors with constant-rate TSCs and use these counters as the basis for QPC. TSCs are high-resolution per-processor hardware counters that can be accessed with very low latency and overhead (in the order of 10s or 100s of machine cycles, depending on the processor type). Windows 7 and Windows Server 2008 R2 use TSCs as the basis of QPC on single-clock domain systems where the operating system (or the hypervisor) is able to tightly synchronize the individual TSCs across all processors during system initialization. On such systems, the cost of reading the performance counter is significantly lower compared to systems that use a platform counter.

Furthermore, there is no added overhead for concurrent calls and user-mode queries often bypass system calls, which further reduces overhead. On systems where the TSC is not suitable for timekeeping, Windows automatically selects a platform counter (either the HPET timer or the ACPI PM timer) as the basis for QPC."

So it sounds as if, on W7/WS2008R2, the performance counter is in sync on all CPUs, and on newer systems, the performance hit for using it may be reduced.

"Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2

Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2 use TSCs as the basis for the performance counter. The TSC synchronization algorithm was significantly improved to better accommodate large systems with many processors. In addition, support for the new precise time-of-day API was added, which enables acquiring precise wall clock time stamps from the operating system. For more info, see GetSystemTimePreciseAsFileTime. ..."

So it sounds as if, on Windows 8/WS2012 and later, all machines use TSCs, so the performance hit on constant-rate TSC-less machines goes away as those machines aren't supported, and the multiprocessor overhead may be further reduced. In addition, there's a kernel equivalent to GetSystemTimePreciseAsFileTime(), namely KeQuerySystemTimePrecise().

All the extra work here may date back to Windows NT 3.x, Windows NT 4.0, and Windows 2000/Windows XP/Windows Server 2003, where they had to deal with machines that didn't have high-resolution counters that would deliver results synchronized across processors, so they offered timers based on:

raw KeQueryPerformanceCounter() - high precision, but doesn't necessarily give consistent results on an MP machine;
KeQuerySystemTime() - low precision, but gives consistent results on an MP machine (as it fetches a shared clock value updated by clock interrupts);
KeQueryPerformanceCounter() with separate per-CPU time bases - high precision, may give less inconsistent results on an MP machine if different per-CPU performance counters have different time bases; may give out-of-order time stamps;
KeQueryPerformanceCounter() with separate per-CPU time bases and a check to make sure time doesn't go backwards from the last result - high precision, may give less inconsistent results on an MP machine if different per-CPU performance counters have different time bases; won't give out-of-order time stamps;
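The last two variants differ only in the "time doesn't go backwards" check. A minimal single-threaded sketch of that check in portable C follows; it is not the code from time_calls.h, and `tstamp_state` and `clamp_monotonic` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: the last time stamp handed out, in microseconds.
 * A real driver would protect this with a spin lock or an interlocked
 * compare-exchange; this sketch is single-threaded. */
typedef struct {
    uint64_t last_us;
} tstamp_state;

/* Return a time stamp that is never earlier than the previous one.
 * If the candidate (for example, one derived from another CPU's counter
 * with a slightly different time base) is behind the last value handed
 * out, the last value is reused instead. */
static uint64_t clamp_monotonic(tstamp_state *st, uint64_t candidate_us)
{
    assert(st != NULL);
    if (candidate_us < st->last_us)
        candidate_us = st->last_us;
    st->last_us = candidate_us;
    return candidate_us;
}
```

The cost of this kind of fixup is a piece of shared state that every CPU has to touch for every packet, which is presumably part of why it is attractive to drop it when the counter is already consistent across CPUs.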

I couldn't find any mail on any WinPcap mailing lists about this (maybe I'll ask Loris Degioanni to see if he remembers any of that), but Loris sent a message to winpcap-users and another message to winpcap-bugs (scroll down to Loris Degioanni's mail from 2005-04-27) talking about time stamping.

At least in Windows 7 and later, it may be possible to just use KeQueryPerformanceCounter(), without the extra per-CPU stuff (TIMESTAMPMODE_SINGLE_SYNCHRONIZATION), to get high-precision time stamps that, on most machines, are synchronized across CPUs and are reasonably quick to fetch. On Vista/WS2008, it appears that it'll give time stamps synchronized across CPUs but they might not be quick to fetch (and less so the more CPUs/cores you have).
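To make "just use the performance counter" concrete: one common approach is to sample the system clock and the counter once, then derive every later time stamp from the counter delta. The sketch below shows only that arithmetic in portable C. It is an assumption about the general technique rather than a copy of the Npcap code, and `sync_point`, `counter_to_wall_us`, and the microsecond units are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One-time synchronization point (hypothetical structure). */
typedef struct {
    uint64_t base_wall_us;   /* system clock at sync time, in microseconds */
    uint64_t base_ticks;     /* performance counter value at sync time */
    uint64_t ticks_per_sec;  /* performance counter frequency */
} sync_point;

/* Convert a later counter reading into a wall-clock-style time stamp. */
static uint64_t counter_to_wall_us(const sync_point *sp, uint64_t now_ticks)
{
    assert(sp->ticks_per_sec > 0);
    assert(now_ticks >= sp->base_ticks);

    uint64_t delta = now_ticks - sp->base_ticks;

    /* Split into whole seconds and a remainder so that the multiplication
     * by 1,000,000 cannot overflow for realistic counter frequencies. */
    uint64_t secs = delta / sp->ticks_per_sec;
    uint64_t rem  = delta % sp->ticks_per_sec;

    return sp->base_wall_us
         + secs * 1000000u
         + (rem * 1000000u) / sp->ticks_per_sec;
}
```

Because the wall-clock base is sampled only once, time stamps produced this way do not follow later adjustments to the system clock.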

TIMESTAMPMODE_SYNCHRONIZATION_ON_CPU_WITH_FIXUP and TIMESTAMPMODE_SYNCHRONIZATION_ON_CPU_NO_FIXUP, neither of which appears to have been documented, may have been experiments to try to synchronize KeQueryPerformanceCounter() across CPUs, but may have been abandoned.

I'm not sure TIMESTAMPMODE_RDTSC is useful any more. It was only supported on 32-bit x86, it didn't deal with cross-CPU synchronization, and it may not have dealt with variable CPU speed machines (hence the reference to "SpeedStep machines"); KeQueryPerformanceCounter() may have at least worked around the latter, and, on Windows 7, is apparently highly likely to be running on machines where the time stamp counters can be synchronized across CPUs without too much pain (and falls back on a slower-to-access per-system timer for machines where that can't be done). Perhaps its only advantage was speed, because it just went straight to the (current) CPU's time stamp counter.

So perhaps we can leave just TIMESTAMPMODE_SINGLE_SYNCHRONIZATION and TIMESTAMPMODE_QUERYSYSTEMTIME. If there's a backwards compatibility concern, we could also leave TIMESTAMPMODE_RDTSC for 32-bit x86.

So what you'd get for time stamps with those three modes is:

TIMESTAMPMODE_SINGLE_SYNCHRONIZATION: high-precision, guaranteed to be monotonic (never going backwards) and advancing at a constant rate(?), not guaranteed to be synchronized with the system clock, may be slow to fetch on Windows Vista and on some machines with Windows 7, guaranteed(?) to be consistent between CPUs;
TIMESTAMPMODE_QUERYSYSTEMTIME: low-precision, not guaranteed to be monotonic (the system clock can be turned backwards), not guaranteed to advance at a constant rate (if time adjustments to sync with UTC are done either by slowing down or speeding up the clock), guaranteed to be synchronized with the system clock, not sure how fast it'd be to fetch relative to the others, guaranteed to be consistent between CPUs;
TIMESTAMPMODE_RDTSC: high-precision, guaranteed to be monotonic (never going backwards) and advancing at a constant rate(?), not guaranteed to be synchronized with the system clock, probably pretty fast to fetch, not guaranteed to be consistent between CPUs.
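The same comparison can be written down as a small lookup table, which makes it easier to see which guarantee each mode gives up. This is only a sketch: `tstamp_mode_traits` and `find_mode_traits` are hypothetical names, and the properties marked "(?)" above are recorded as true here even though they are unconfirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the properties listed in the text. */
typedef struct {
    const char *name;
    bool high_precision;
    bool monotonic;               /* "(?)" entries are assumed true */
    bool follows_system_clock;
    bool consistent_across_cpus;  /* "(?)" entries are assumed true */
} tstamp_mode_traits;

static const tstamp_mode_traits mode_traits[] = {
    /* name                                  hiprec  mono   sysclk cpus  */
    { "TIMESTAMPMODE_SINGLE_SYNCHRONIZATION", true,  true,  false, true  },
    { "TIMESTAMPMODE_QUERYSYSTEMTIME",        false, false, true,  true  },
    { "TIMESTAMPMODE_RDTSC",                  true,  true,  false, false },
};

/* Look up a mode by name; returns NULL if the name is not in the table. */
static const tstamp_mode_traits *find_mode_traits(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof mode_traits / sizeof mode_traits[0]; i++) {
        if (strcmp(mode_traits[i].name, name) == 0)
            return &mode_traits[i];
    }
    return NULL;
}
```

No row has both high precision and synchronization with the system clock, which is the gap a mode based on KeQuerySystemTimePrecise() would fill.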

A fourth mode could be added for #1407, and we could work on allowing per-instance modes rather than a single global mode set from the Registry (I'm in the middle of that; most of this text was extracted from a huge comment I was putting into npf/time_calls.h), adding an ioctl to set the per-instance mode, adding a packet.dll routine to do that ioctl, and using that routine in packet-npc.c to support pcap_set_tstamp_type().


Nazardo · 2019-12-02T15:17:16Z

Not sure if you already got any feedback about this, but to me it looks like a great idea that will remove a lot of old (unused? experimental? undocumented?) code from time_calls.h.

One suggestion: I think that TIMESTAMPMODE_QUERYSYSTEMTIME should be low-precise only on systems that do not support the precise time-of-day API. On newer systems it must default to the high-precision timestamp calls (KeQuerySystemTimePrecise). What would be the point of having low precision timestamps?

guyharris · 2019-12-02T22:17:52Z

> One suggestion: I think that TIMESTAMPMODE_QUERYSYSTEMTIME should be low-precise only on systems that do not support the precise time-of-day API. On newer systems it must default to the high-precision timestamp calls (KeQuerySystemTimePrecise). What would be the point of having low precision timestamps?

Those #defines would be part of the interface to the packet.dll library; that's not the library that most software would use - they'd go through the pcap API. An additional mode, TIMESTAMPMODE_QUERYSYSTEMTIMEPRECISE, would be added when support for KeQuerySystemTimePrecise() is added.

The pcap API has pcap_list_tstamp_types() and pcap_set_tstamp_type(); the set of time stamp types includes:

PCAP_TSTAMP_HOST_LOWPREC - this corresponds to TIMESTAMPMODE_QUERYSYSTEMTIME;
PCAP_TSTAMP_HOST_HIPREC_UNSYNCED - this would correspond to one or more of TIMESTAMPMODE_SINGLE_SYNCHRONIZATION and TIMESTAMPMODE_RDTSC;
PCAP_TSTAMP_HOST_HIPREC - this would correspond to a new TIMESTAMPMODE_QUERYSYSTEMTIMEPRECISE mode.

If the type isn't set, it's not guaranteed to be any particular type, so it could default to:

PCAP_TSTAMP_HOST_HIPREC if available and PCAP_TSTAMP_HOST_LOWPREC otherwise;
the value corresponding to the time stamp type specified by the registry key, and something determined by the OS's capabilities if the registry key isn't present.
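A sketch of the mapping and of the first default option, in portable C so it can be compiled without libpcap or packet.dll. The `PCAP_TSTAMP_*` values are the ones defined in libpcap's pcap/pcap.h; the `EX_MODE_*` enumerators are hypothetical stand-ins for the packet.dll modes with placeholder numeric values, and mapping PCAP_TSTAMP_HOST_HIPREC_UNSYNCED to the single-synchronization mode rather than RDTSC is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Time stamp type values as defined in libpcap's pcap/pcap.h. */
#define PCAP_TSTAMP_HOST_LOWPREC          1
#define PCAP_TSTAMP_HOST_HIPREC           2
#define PCAP_TSTAMP_HOST_HIPREC_UNSYNCED  5

/* Hypothetical stand-ins for the packet.dll modes; the numeric values
 * are placeholders, not the real TIMESTAMPMODE_* values. */
enum ex_mode {
    EX_MODE_NONE = -1,
    EX_MODE_SINGLE_SYNCHRONIZATION,
    EX_MODE_QUERYSYSTEMTIME,
    EX_MODE_QUERYSYSTEMTIMEPRECISE   /* the proposed new mode */
};

/* Map a pcap time stamp type to a driver mode, as described above. */
static enum ex_mode mode_for_tstamp_type(int tstamp_type)
{
    switch (tstamp_type) {
    case PCAP_TSTAMP_HOST_LOWPREC:
        return EX_MODE_QUERYSYSTEMTIME;
    case PCAP_TSTAMP_HOST_HIPREC_UNSYNCED:
        return EX_MODE_SINGLE_SYNCHRONIZATION;
    case PCAP_TSTAMP_HOST_HIPREC:
        return EX_MODE_QUERYSYSTEMTIMEPRECISE;
    default:
        return EX_MODE_NONE;   /* type not offered by this sketch */
    }
}

/* First default option: HIPREC if available, LOWPREC otherwise. */
static int default_tstamp_type(bool hiprec_available)
{
    int type = hiprec_available ? PCAP_TSTAMP_HOST_HIPREC
                                : PCAP_TSTAMP_HOST_LOWPREC;
    /* Sanity check: the default must always map to a real mode. */
    assert(mode_for_tstamp_type(type) != EX_MODE_NONE);
    return type;
}
```

The second default option (honouring the registry key) would replace `default_tstamp_type()` with a lookup of the configured value and fall back to something like the above only when the key is absent.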

dmiller-nmap · 2020-03-13T22:47:34Z

This is very interesting! I'm removing the TIMESTAMPMODE_SYNCHRONIZATION_ON_CPU_* modes, which has the added benefit of reducing the size of the data structure used to keep synchronization info. I'll revisit this issue after the next release to see what other changes might be warranted.

guyharris · 2020-03-14T03:35:42Z

> One suggestion: I think that TIMESTAMPMODE_QUERYSYSTEMTIME should be low-precise only on systems that do not support the precise time-of-day API. On newer systems it must default to the high-precision timestamp calls (KeQuerySystemTimePrecise). What would be the point of having low precision timestamps?

On newer systems (note that Windows Vista isn't supported in Npcap any more, and that Windows 7 either isn't supported by Microsoft any more or is only supported with some magical special super-extended support) the system should 1) default to KeQuerySystemTimePrecise() if the Registry key for the time stamp type isn't present at all and 2) use KeQuerySystemTimePrecise() if it's set to TIMESTAMPMODE_QUERYSYSTEMTIME.

If the Registry key is set to some value other than TIMESTAMPMODE_QUERYSYSTEMTIME, should it still use KeQuerySystemTimePrecise() if it's available?

guyharris · 2020-03-14T07:03:53Z

> One suggestion: I think that TIMESTAMPMODE_QUERYSYSTEMTIME should be low-precise only on systems that do not support the precise time-of-day API. On newer systems it must default to the high-precision timestamp calls (KeQuerySystemTimePrecise). What would be the point of having low precision timestamps?

The only potential benefit would be if there were a performance overhead for KeQuerySystemTimePrecise() sufficient to make it worth using KeQuerySystemTime() if synchronization is important but precision isn't or using the performance counter if precision is important but synchronization isn't.

Note that most UN*X kernels' internal API for drivers, system calls, etc. is like KeQuerySystemTimePrecise(), and that's what's used for time stamping packets in capture mechanisms, so its performance is good enough for those purposes. I suspect that's true of KeQuerySystemTimePrecise() as well.

Use of the RDTSC instruction is discouraged by Microsoft (https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps). If the TimeStampMode Registry entry is set to 3, it will fall back to TIMESTAMPMODE_SINGLE_SYNCHRONIZATION, which uses KeQueryPerformanceCounter() and has similar characteristics to the RDTSC instruction.
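Putting the registry handling discussed in this thread into one place, a resolution function might look roughly like the sketch below. It is an illustration under assumptions, not the driver's code: `resolve_registry_mode` and the `REG_MODE_*` names are hypothetical, only the value 3 for RDTSC comes from the text (the other numbers are assumed), and the choices for "key absent without precise-time support" and for unknown values are guesses.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    REG_MODE_SINGLE_SYNCHRONIZATION = 0, /* assumed value */
    REG_MODE_QUERYSYSTEMTIME        = 2, /* assumed value */
    REG_MODE_RDTSC                  = 3, /* value given in the text */
    REG_MODE_QUERYSYSTEMTIMEPRECISE = 4  /* assumed value for the proposed mode */
};

/* Decide which mode to actually use, given the registry setting and
 * whether KeQuerySystemTimePrecise() is available on this system. */
static int resolve_registry_mode(bool key_present, int value,
                                 bool precise_available)
{
    int resolved;

    if (!key_present) {
        /* Default to the precise system time when the OS offers it;
         * the fallback here is an assumption. */
        resolved = precise_available ? REG_MODE_QUERYSYSTEMTIMEPRECISE
                                     : REG_MODE_SINGLE_SYNCHRONIZATION;
    } else {
        switch (value) {
        case REG_MODE_QUERYSYSTEMTIME:
        case REG_MODE_QUERYSYSTEMTIMEPRECISE:
            /* System-clock modes use the precise call when available. */
            resolved = precise_available ? REG_MODE_QUERYSYSTEMTIMEPRECISE
                                         : REG_MODE_QUERYSYSTEMTIME;
            break;
        case REG_MODE_RDTSC:
            /* RDTSC is no longer offered; use the performance counter. */
            resolved = REG_MODE_SINGLE_SYNCHRONIZATION;
            break;
        case REG_MODE_SINGLE_SYNCHRONIZATION:
        default:
            /* Unknown values are treated like mode 0 (an assumption). */
            resolved = REG_MODE_SINGLE_SYNCHRONIZATION;
            break;
        }
    }

    /* The RDTSC mode must never be returned. */
    assert(resolved != REG_MODE_RDTSC);
    return resolved;
}
```

Whether an explicit non-system-time setting should also be upgraded to the precise call is the open question raised earlier in the thread; this sketch leaves such settings alone.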

guyharris · 2020-03-20T20:22:06Z

OK, they're all gone, so this is done.

dmiller-nmap pushed a commit to nmap/npcap that referenced this issue Mar 13, 2020

Remove TIMESTAMPMODE_SYNCHRONIZATION_ON_CPU* modes. See nmap/nmap#1829

bc57ba3

guyharris closed this as completed Mar 20, 2020
