Empathy List Archives

Trevor N.

Tue, Feb 21, 2017 2:26 AM

SA6CID wrote:

....
So, I thought actually of the jitter added on the way between our
accurate source (GPS rx), until we can capture our timer. How much can
this be? As far as I see we don't have a capture mode for the HPET. But,
if we have to do it in software, we get more than 100 ns jitter. I just
measured 60-80 ns for an instruction cache miss with Intel's mlc software.
Overall I would guess > 500 ns, are there measurements on this?

This then defines some lower bound of what can be achieved for
synchronizing the clock off the OS. Also hardware time stamping on a
dedicated PPS card (or PTP ethernet card) does not help unless the clock
on the card is synchronized to the clock used by the OS.
....

On Intel processors newer than the core2 it appears that there are no
input pins that can be polled or used as an interrupt source. Current
AMD processors still have LINT0 and LINT1 pins that can be polled
through the XAPIC interface, but using them would require modifying
the motherboard to temporarily disconnect them from their usual
sources. It seems likely that a transition on those pins could be
timestamped within a few tens of ns.

An option for Linux kernel 4.6 or newer on Skylake processors when
using PTP on the chipset ethernet would be to use the
PTP_SYS_OFFSET_PRECISE ioctl to eliminate the problem you mentioned in
the quoted text. See
https://patchwork.kernel.org/patch/8497611/
https://patchwork.kernel.org/patch/8497621/
https://lwn.net/Articles/670081/
It looks like the chipset and CPU share an "Always Running Timer"
which the chipset can sample in sync with the ethernet PTP, USB[2] and
audio timing registers, and it has a defined relationship to the CPU
TSC. I didn't spot a way to timestamp a GPIO pin transition in the
chipset datasheet, so this sort of kicks the can down the road when
using PTP. It might be possible to use this to improve PPS over USB.

Skylake chipsets have three 16550-compatible UARTs, so depending on
how they are connected internally (might be on the LPC bus) the line
status signals may have very low read latency. It looks like they are
supported in Linux 4.3 and newer; has anyone tested them?

For now I've focused on using parallel port cards for PPS capture. The
machine I'm using has a Q67 chipset and i5-2500 processor (the Q67 is
one of the last chipsets with native parallel PCI). There is no option
in the BIOS to disable clock spread-spectrum so it's probably active.
The system XO was replaced with an IDT525 so I can use 10MHz sources.
I added a polling mode and PPS echo to the Linux 4.1 pps_parport
driver.

Using a Lava parallel PCI card, port reads and writes take an average
of 962ns when done in a calibration loop with the processor TSC. For
measurements I'm using a TIC hooked to the PPS input and echo output,
and Miroslav Lichvar's ppsallan program[3]. The counter I'm using
only has GPIB and I don't yet have an interface card, so I can only do
short tests with it at the moment since I have to watch it. I've
listed some results below with a test time of 1 minute. The system
oscillator and PPS source is an Endrun Technologies Praecis Cf, so all
the data below should be showing only the PPS capture error.

--Driver in polling mode with no system load:
The minimum observed interval between the PPS input and echo was
1.3us, maximum interval was 2.3us, eyeball-average 1.9us, and the 1t
adev was 513ns. attachment:
adev-praecis-lavanew-poll-noload-1min-04.plog

--Driver in polling mode with high system load:
min 1.3us, max 2.4us, avg 1.9us, 1t adev 549ns
adev-praecis-lavanew-poll-load-1min-01.plog

--Driver in interrupt mode with no system load:
min 2.4us, max 3.1us, avg 2.7us, 1t adev 467ns
adev-praecis-lavanew-int-noload-1min-01.plog

--Driver in interrupt mode with high system load:
min 2.4us, max 4.1us, avg 2.8us, 1t adev 533ns
adev-praecis-lavanew-int-load-1min-01.plog
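
For reference, an adev figure like the ones above can be computed from a series of time-error samples with the standard overlapping Allan deviation estimator. This is a generic sketch, not ppsallan's implementation; it assumes that "1t" means an averaging time of one second and that samples are taken once per PPS edge. The function name and the sample values are made up for illustration.

```python
import math


def allan_deviation(x, tau0=1.0, m=1):
    """Overlapping Allan deviation at tau = m * tau0.

    x    -- time-error (phase) samples in seconds, one every tau0 seconds
    tau0 -- sampling interval in seconds (1.0 for a PPS signal)
    m    -- averaging factor
    """
    n = len(x)
    if n < 2 * m + 1:
        raise ValueError("need at least 2*m + 1 samples")
    tau = m * tau0
    # Sum of squared second differences of the phase data.
    total = sum(
        (x[i + 2 * m] - 2 * x[i + m] + x[i]) ** 2 for i in range(n - 2 * m)
    )
    return math.sqrt(total / (2 * (n - 2 * m) * tau ** 2))


# Hypothetical capture errors in seconds, one per PPS edge.
samples = [1.9e-6, 1.4e-6, 2.2e-6, 1.7e-6, 2.0e-6, 1.5e-6]
print(f"adev at 1 s: {allan_deviation(samples) * 1e9:.0f} ns")
```

A constant capture delay drops out of the second differences, so only the variation in delay from edge to edge contributes to the result.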

After subtracting the echo's port write time, polling mode can go
below 400ns delay. This is below the 962ns read time, but is probably
valid since the PPS edge could occur between when the card has decoded
the read command and when it samples the port pins to place on the
bus. 962ns latency seems very high for parallel PCI -- it should be
able to achieve below 300ns. The Lava card may be using wait cycles to
throttle down to standard LPT speeds. The card uses an FPGA so it
might be possible to improve its performance. It's also likely the
processor-to-chipset link is adding significant latency.
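
The subtraction described above can be written out for the four Lava runs. This is only an arithmetic sketch: it assumes the echo's port write costs the 962ns average from the calibration loop, and the variable and function names are invented for the example.

```python
PORT_WRITE_NS = 962  # average Lava port read/write time from the calibration loop

# (min, max, avg) PPS-to-echo delays in ns, copied from the 1-minute runs.
runs = {
    "polling, no load": (1300, 2300, 1900),
    "polling, high load": (1300, 2400, 1900),
    "interrupt, no load": (2400, 3100, 2700),
    "interrupt, high load": (2400, 4100, 2800),
}


def capture_delay(echo_ns, write_ns=PORT_WRITE_NS):
    """Estimated input capture delay once the echo write time is removed."""
    return echo_ns - write_ns


for name, (lo, hi, avg) in runs.items():
    print(
        f"{name}: min {capture_delay(lo)} ns, "
        f"max {capture_delay(hi)} ns, avg {capture_delay(avg)} ns"
    )
```

The polling minimum comes out at 1300 - 962 = 338ns, which is the "below 400ns" figure.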

I also tested a PCIe parallel card that uses an OXPCIe952 chip.
When attached to a PCH PCIe lane, read time was 1614ns and write time
was 1626ns. Minimum observed PPS echo delay was 1.9us when polling.
When attached to a CPU lane, input and output time was 832ns.

Over a 1-minute test in polling mode with no system load I observed a
min. echo delay of 1.1us, max of 2.0us, average 1.6us
adev-praecis-pciecpu-poll-noload-1min-01.plog

24 hours with no system load:
adev-praecis-pciecpu-poll-noload-24h-01.plog
I'll show some longer-term results from the counter once I get a GPIB
card and write a logging program.

[1] "Motivating Future Interconnects: A Differential Measurement
Analysis of PCI Latency", Miller 2009

[2] 100-series-chipset-datasheet-vol-2.pdf section 18.2.83

[3] https://github.com/mlichvar/ppsallan


Bob Camp

Tue, Feb 21, 2017 12:37 PM

Hi

On Feb 20, 2017, at 9:26 PM, Trevor N. qb4@comcast.net wrote:

SA6CID wrote:

....
So, I thought actually of the jitter added on the way between our
accurate source (GPS rx), until we can capture our timer. How much can
this be? As far as I see we don't have a capture mode for the HPET. But,
if we have to do it in software, we get more than 100 ns jitter. I just
measured 60-80 ns for an instruction cache miss with Intel's mlc software.
Overall I would guess > 500 ns, are there measurements on this?

This then defines some lower bound of what can be achieved for
synchronizing the clock off the OS. Also hardware time stamping on a
dedicated PPS card (or PTP ethernet card) does not help unless the clock
on the card is synchronized to the clock used by the OS.
....

On Intel processors newer than the core2 it appears that there are no
input pins that can be polled or used as an interrupt source. Current
AMD processors still have LINT0 and LINT1 pins that can be polled
through the XAPIC interface, but using them would require modifying
the motherboard to temporarily disconnect them from their usual
sources. It seems likely that a transition on those pins could be
timestamped within a few tens of ns.

An option for Linux kernel 4.6 or newer on Skylake processors when
using PTP on the chipset ethernet would be to use the
PTP_SYS_OFFSET_PRECISE ioctl to eliminate the problem you mentioned in
the quoted text. See
https://patchwork.kernel.org/patch/8497611/
https://patchwork.kernel.org/patch/8497621/
https://lwn.net/Articles/670081/
It looks like the chipset and CPU share an "Always Running Timer"
which the chipset can sample in sync with the ethernet PTP, USB[2] and
audio timing registers, and it has a defined relationship to the CPU
TSC. I didn't spot a way to timestamp a GPIO pin transition in the
chipset datasheet, so this sort of kicks the can down the road when
using PTP. It might be possible to use this to improve PPS over USB.

Skylake chipsets have three 16550-compatible UARTs, so depending on
how they are connected internally (might be on the LPC bus) the line
status signals may have very low read latency. It looks like they are
supported in Linux 4.3 and newer; has anyone tested them?

For now I've focused on using parallel port cards for PPS capture. The
machine I'm using has a Q67 chipset and i5-2500 processor (the Q67 is
one of the last chipsets with native parallel PCI). There is no option
in the BIOS to disable clock spread-spectrum so it's probably active.
The system XO was replaced with an IDT525 so I can use 10MHz sources.
I added a polling mode and PPS echo to the Linux 4.1 pps_parport
driver.

Using a Lava parallel PCI card, port reads and writes take an average
of 962ns when done in a calibration loop with the processor TSC. For
measurements I'm using a TIC hooked to the PPS input and echo output,
and Miroslav Lichvar's ppsallan program[3]. The counter I'm using
only has GPIB and I don't yet have an interface card, so I can only do
short tests with it at the moment since I have to watch it. I've
listed some results below with a test time of 1 minute. The system
oscillator and PPS source is an Endrun Technologies Praecis Cf, so all
the data below should be showing only the PPS capture error.

Some 1588 chipsets have (or had, I haven’t looked recently) external sync pins.
This does get into the whole "what’s a motherboard, what’s a peripheral"
debate. Plugging in a 1588 card to get that pin probably no longer counts
as a simple solution. If plugging in a card does count, then that opens up
a lot of possible options.

Bob