SA6CID wrote:
....
So, I thought actually of the jitter added on the way between our
accurate source (GPS rx), until we can capture our timer. How much can
this be? As far as I see we don't have a capture mode for the HPET. But,
if we have to do it in software, we get more than 100 ns jitter. I just
measured 60-80 ns for an instruction cache miss, with Intel's mlc software.
Overall I would guess > 500 ns, are there measurements on this?
This then defines some lower bound of what can be achieved for
synchronizing the clock of the OS. Also hardware time stamping on a
dedicated PPS card (or PTP ethernet card) does not help unless the clock
on the card is synchronized to the clock used by the OS.
....
On Intel processors newer than the Core 2 it appears that there are no
input pins that can be polled or used as an interrupt source. Current
AMD processors still have LINT0 and LINT1 pins that can be polled
through the XAPIC interface, but using them would require modifying
the motherboard to temporarily disconnect them from their usual
sources. It seems likely that a transition on those pins could be
timestamped within a few tens of ns.
An option for Linux kernel 4.6 or newer on Skylake processors when
using PTP on the chipset ethernet would be to use the
PTP_SYS_OFFSET_PRECISE ioctl to eliminate the problem you mentioned in
the quoted text. See
https://patchwork.kernel.org/patch/8497611/
https://patchwork.kernel.org/patch/8497621/
https://lwn.net/Articles/670081/
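
For anyone who wants to try the ioctl from user space, here is a minimal
sketch in Python. It is not taken from the patches above. The struct
layout and request number follow my reading of linux/ptp_clock.h (three
ptp_clock_time entries plus four reserved words, request number 8 under
magic '='), the request encoding assumes the common Linux _IOC bit
layout (x86 and ARM; some architectures differ), and the /dev/ptp0 path
is only an assumption about which device node belongs to the chipset
NIC.

```python
import fcntl
import os
import struct

# Layout of struct ptp_sys_offset_precise: three ptp_clock_time entries
# (s64 sec, u32 nsec, u32 reserved) followed by four reserved u32 words.
PRECISE_FMT = "=" + "qII" * 3 + "4I"
PTP_CLK_MAGIC = ord("=")


def iowr(magic, nr, size):
    """Encode a read/write ioctl number (assumes the common _IOC layout)."""
    return (3 << 30) | (size << 16) | (magic << 8) | nr


PTP_SYS_OFFSET_PRECISE = iowr(PTP_CLK_MAGIC, 8, struct.calcsize(PRECISE_FMT))


def decode_precise(buf):
    """Return (device, realtime, monotonic_raw) timestamps in nanoseconds."""
    f = struct.unpack(PRECISE_FMT, buf)
    device_ns = f[0] * 1_000_000_000 + f[1]
    realtime_ns = f[3] * 1_000_000_000 + f[4]
    monoraw_ns = f[6] * 1_000_000_000 + f[7]
    return device_ns, realtime_ns, monoraw_ns


def read_precise(path="/dev/ptp0"):
    """Ask the kernel for one cross-timestamp; path is an assumption."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(struct.calcsize(PRECISE_FMT))
        fcntl.ioctl(fd, PTP_SYS_OFFSET_PRECISE, buf)  # filled in place
        return decode_precise(bytes(buf))
    finally:
        os.close(fd)
```

Calling read_precise() should raise OSError on hardware or drivers
without cross-timestamp support. The difference between the first two
returned values is the PHC-to-system offset at a single instant.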
It looks like the chipset and CPU share an "Always Running Timer"
which the chipset can sample in sync with the ethernet PTP, USB[2] and
audio timing registers, and it has a defined relationship to the CPU
TSC. I didn't spot a way to timestamp a GPIO pin transition in the
chipset datasheet, so this sort of kicks the can down the road when
using PTP. It might be possible to use this to improve PPS over USB.
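
As I understand the Intel SDM, the "defined relationship" is a fixed
ratio reported by CPUID leaf 15H: TSC = ART * EBX / EAX + K, where K is
an offset that is zero unless software has adjusted the TSC. A small
Python sketch of that conversion follows. The function and parameter
names are mine, and the numbers in the usage note are made up for
illustration, not read from any real processor.

```python
def art_to_tsc(art, numerator, denominator, offset=0):
    """Convert an Always Running Timer count to a TSC count.

    numerator and denominator stand for CPUID.15H:EBX and CPUID.15H:EAX;
    offset stands for K (assumed 0 when the TSC was never adjusted).
    Integer arithmetic avoids rounding error on large counter values.
    """
    if denominator == 0:
        raise ValueError("CPUID.15H:EAX is zero, ratio not enumerated")
    return art * numerator // denominator + offset
```

With a hypothetical 24 MHz ART and a ratio of 250/2, one second of ART
ticks gives art_to_tsc(24_000_000, 250, 2) = 3_000_000_000 TSC ticks.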
Skylake chipsets have three 16550-compatible UARTs, so depending on
how they are connected internally (might be on the LPC bus) the line
status signals may have very low read latency. It looks like they are
supported in Linux 4.3 and newer; has anyone tested them?
For now I've focused on using parallel port cards for PPS capture. The
machine I'm using has a Q67 chipset and i5-2500 processor (the Q67 is
one of the last chipsets with native parallel PCI). There is no option
in the BIOS to disable clock spread-spectrum so it's probably active.
The system XO was replaced with an IDT525 so I can use 10MHz sources.
I added a polling mode and PPS echo to the Linux 4.1 pps_parport
driver.
Using a Lava parallel PCI card, port reads and writes take an average
of 962ns when done in a calibration loop with the processor TSC. For
measurements I'm using a TIC hooked to the PPS input and echo output,
and Miroslav Lichvar's ppsallan program[3]. The counter I'm using
only has GPIB and I don't yet have an interface card, so I can only do
short tests with it at the moment since I have to watch it. I've
listed some results below with a test time of 1 minute. The system
oscillator and PPS source is an Endrun Technologies Praecis Cf, so all
the data below should be showing only the PPS capture error.
--Driver in polling mode with no system load:
The minimum observed interval between the PPS input and echo was
1.3us, maximum interval was 2.3us, eyeball-average 1.9us, and the 1t
adev was 513ns. attachment:
adev-praecis-lavanew-poll-noload-1min-04.plog
--Driver in polling mode with high system load:
min 1.3us, max 2.4us, avg 1.9us, 1t adev 549ns
adev-praecis-lavanew-poll-load-1min-01.plog
--Driver in interrupt mode with no system load:
min 2.4us, max 3.1us, avg 2.7us, 1t adev 467ns
adev-praecis-lavanew-int-noload-1min-01.plog
--Driver in interrupt mode with high system load:
min 2.4us, max 4.1us, avg 2.8us, 1t adev 533ns
adev-praecis-lavanew-int-load-1min-01.plog
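
For readers who want to reproduce the adev figures from their own
counter readings, here is a minimal Python sketch. It assumes "1t adev"
means the Allan deviation at tau = 1 s and that the input is a list of
time-error (phase) readings taken once per second. It is not ppsallan's
implementation, just the textbook second-difference formula.

```python
import math


def adev_tau0(phase_s, tau0_s=1.0):
    """Allan deviation at tau = tau0 from evenly spaced phase samples.

    phase_s holds time-error readings in seconds, one every tau0_s
    seconds. The result is a dimensionless fractional frequency.
    """
    n = len(phase_s)
    if n < 3:
        raise ValueError("need at least three phase samples")
    # Second differences of phase remove any constant frequency offset.
    second_diffs = [
        phase_s[i + 2] - 2.0 * phase_s[i + 1] + phase_s[i]
        for i in range(n - 2)
    ]
    avar = sum(d * d for d in second_diffs) / (2.0 * (n - 2) * tau0_s ** 2)
    return math.sqrt(avar)
```

A steady phase ramp (pure frequency offset) gives zero, so only the
reading-to-reading scatter of the capture contributes to the result.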
After subtracting the echo's port write time, polling mode can go
below 400ns delay (the 1.3us minimum less the 962ns write leaves about
340ns). This is below the 962ns read time, but is probably
valid since the PPS edge could occur between when the card has decoded
the read command and when it samples the port pins to place on the
bus. 962ns latency seems very high for parallel PCI -- it should be
able to achieve below 300ns. The Lava card may be using wait cycles to
throttle down to standard LPT speeds. The card uses an FPGA so it
might be possible to improve its performance. It's also likely the
processor-to-chipset link is adding significant latency.
I also tested a PCIe parallel card that uses an OXPCIe952 chip.
When attached to a PCH PCIe lane, read time was 1614ns and write time
was 1626ns. Minimum observed PPS echo delay was 1.9us when polling.
When attached to a CPU lane, input and output time was 832ns.
Over a 1-minute test in polling mode with no system load I observed a
min. echo delay of 1.1us, max of 2.0us, average 1.6us
adev-praecis-pciecpu-poll-noload-1min-01.plog
24 hours with no system load:
adev-praecis-pciecpu-poll-noload-24h-01.plog
I'll show some longer-term results from the counter once I get a GPIB
card and write a logging program.
[1] "Motivating Future Interconnects: A Differential Measurement
Analysis of PCI Latency", Miller 2009
[2] 100-series-chipset-datasheet-vol-2.pdf section 18.2.83
Hi
On Feb 20, 2017, at 9:26 PM, Trevor N. qb4@comcast.net wrote:
....
Some 1588 chip sets have (or had, I haven’t looked recently) external sync pins.
This does get into the whole, what’s a motherboard / what’s a peripheral
debate. Plugging in a 1588 card to get that pin probably no longer counts
as a simple solution. If plugging in a card does count then that opens up
a lot of possible options.
Bob
[3] https://github.com/mlichvar/ppsallan
_______________________________________________
time-nuts mailing list -- time-nuts@febo.com
To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
and follow the instructions there.
On Tue, 21 Feb 2017 07:37:34 -0500, BC wrote:
Some 1588 chip sets have (or had, I haven't looked recently) external sync pins.
This does get into the whole, what's a motherboard / what's a peripheral
debate. Plugging in a 1588 card to get that pin probably no longer counts
as a simple solution. If plugging in a card does count then that opens up
a lot of possible options.
Bob
I found while looking at the datasheets for newer Intel server
ethernet cards that they have the ability to timestamp GPIO pin
transitions, but none of them have their internal timebase
synchronized to a counter in the CPU. It looks like they are clocked
from a separate XO on the card. Maybe if it was synchronized to the
PCIe clock / BCLK they could take advantage of that new Always
Running Timer in Skylake processors. I'm surprised that Intel hasn't
made a big deal about it. Support for it was added in the Linux e1000e
driver early last year.
Hi
On Feb 21, 2017, at 9:50 PM, Trevor N. qb4@comcast.net wrote:
I found while looking at the datasheets for newer Intel server
ethernet cards that they have the ability to timestamp GPIO pin
transitions, but none of them have their internal timebase
synchronized to a counter in the CPU. It looks like they are clocked
from a separate XO on the card.
Everything on a motherboard traces back to this or that XO. None of the time
sources are anything that would get you excited as a frequency standard. Linking
them together can be exciting. Replacing this or that one with a better frequency
source is possible in some cases. If you are going to replace one, why not
replace several :)
Bob