Hi Dan,
Replying on the mailing list, because I think it's of general interest.
On Fri, 16 Dec 2016 08:41:37 -0500
Dan Kemppainen dan@irtelemetrics.com wrote:
My question is in regards to the FPGA and how the logic was configured.
I have not seen examples of asynchronous logic, as a true ring
oscillator would require (maybe they're out there, and I've missed them!).
Are you playing some tricks in the FPGA, or running a true ring oscillator?
The general approach is to take some logic resource and configure it as
a single combinatorial path that delays the signal. The idea of using
FPGAs instead of ASICs has been around for a long time (see e.g. [1]).
The early attempts used LUTs connected into a chain.
I don't know who came up with it, but at some point (early 2000s)
people switched to the adder carry chain that most (all?) modern
FPGAs have and got a boost in resolution, as the adder structure uses
short-cuts through the logic fabric.
Specifying such a structure is usually easy. The hard part is
convincing the synthesizer not to "optimize" it away. You can see
two ways to achieve that in [2]:
* at line 74 there are the "preserve" attributes, telling the
  synthesizer that those signals are to be kept, no matter what.
* at line 95 the carry adder chain is instantiated explicitly
  as a low-level primitive.
The latter is there so I really get the adder chain I was looking
for (it also tells the synthesizer that those cells should stick around).
The former prevents any derived signals from being optimized away
because they look "constant".
To get good performance (i.e. uniform and small bin size) one needs to
hand place the adder chain (or at least restrict its placement).
This is done in the synthesis tool directly (i.e. there are no VHDL
directives for that).
Depending on the software you are using, you might also need to
tell the timing analysis to ignore all signals involved in the
delay line up to the capture register.
As I understand, the GP2x chips have a true ring oscillator, and its
period PLL'd to the main crystal. I'm not completely sure how control of
the period of the ring oscillator is achieved, but I believe it's by
drive voltage, and/or varactor diode type components in the ring.
There is no direct control. The ring defines the oscillation frequency
by its length. Yes, all changes in temperature and voltage affect the
delay line and thus the TDC.
The CERN TDC Core user guide[3] explains how an additional ring oscillator
is used to compensate for the change in delay of the delay line at runtime.
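A minimal sketch of how such a runtime compensation could look, assuming a first-order model in which every delay on the die stretches by the same factor as the period of the reference ring oscillator. The function name, the frequencies and the table values are hypothetical and are not taken from the TDC Core:

```python
def rescale_lut(lut_ps, ro_freq_calibration_hz, ro_freq_now_hz):
    """Scale calibrated bin timestamps by the drift seen on a reference ring oscillator.

    Assumption (first-order model): every delay on the die stretches by the
    same factor, so delay is proportional to 1 / ring-oscillator frequency.
    """
    if ro_freq_now_hz <= 0:
        raise ValueError("ring oscillator frequency must be positive")
    scale = ro_freq_calibration_hz / ro_freq_now_hz
    return [t * scale for t in lut_ps]


# Hypothetical numbers: the oscillator ran at 100 MHz during calibration and
# has slowed to 80 MHz, so every delay is assumed to be 1.25x longer.
startup_lut_ps = [0.0, 22.0, 66.0, 166.0]
current_lut_ps = rescale_lut(startup_lut_ps, 100e6, 80e6)
```

The ring oscillator frequency can be measured by counting its cycles against the system clock, which is why it can serve as a cheap on-die thermometer for the delay line.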
In any case, if you are able to achieve the same results with an Altera
device, I'd be interested in learning how that's done.
Our code works with a Cyclone4 (EP4CE22 to be exact). I'm pretty sure
it should work the same on a Cyclone5. For the other families you would
probably need to modify the delay line and the ring oscillator.
BTW: In case I was not clear enough, the 22ps I mentioned is the "minimum"
bin size of the delay line. Most of the bins have a size of either 22ps or 44ps.
There are several with 0ps size and several with ~100ps size. The control logic of
the TDC Core does an automatic calibration step at startup to measure the
bin sizes and compensate for them. The final resolution is bounded by the maximum
bin size encountered, but most of the range will have a better resolution.
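One common way to measure bin sizes is a code-density test: feed the delay line with edges that are uncorrelated with the sampling clock, so that each bin is hit in proportion to its width. The sketch below assumes that approach and simulates it. The function name, the bin widths and the 300ps span are made up for illustration, and it is not the TDC Core's actual calibration code:

```python
import bisect
import random


def calibrate_bins(hit_codes, n_bins, clock_period_ps):
    """Estimate bin widths from hits that are uniformly distributed in time."""
    counts = [0] * n_bins
    for code in hit_codes:
        counts[code] += 1
    total = len(hit_codes)
    # A bin that collects a fraction f of the hits covers f of the period.
    widths = [clock_period_ps * c / total for c in counts]
    # Lookup table: a raw code is reported as the centre of its bin.
    lut = []
    edge = 0.0
    for w in widths:
        lut.append(edge + w / 2.0)
        edge += w
    return widths, lut


# Simulated delay line (hypothetical widths, including one 0ps and one
# 100ps bin) that spans exactly one 300ps sampling period.
TRUE_WIDTHS_PS = [22, 44, 0, 100, 22, 44, 22, 46]
PERIOD_PS = sum(TRUE_WIDTHS_PS)
upper_edges = []
acc = 0
for w in TRUE_WIDTHS_PS:
    acc += w
    upper_edges.append(acc)

rng = random.Random(1)  # fixed seed so the run is repeatable
codes = []
for _ in range(100_000):
    t = rng.random() * PERIOD_PS
    # Index of the first bin whose upper edge lies beyond t.
    code = bisect.bisect_right(upper_edges, t)
    codes.append(min(code, len(TRUE_WIDTHS_PS) - 1))

widths, lut = calibrate_bins(codes, len(TRUE_WIDTHS_PS), PERIOD_PS)
```

The 0ps bin never collects a hit and so drops out of the table automatically, while the ~100ps bin still limits the worst-case resolution, as described above.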
We did not measure the performance of the TDC itself (lack of equipment and
the setup was more than just iffy), so I cannot give you any hard numbers on
what can be achieved with a Cyclone4.
Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
they didn't employ any method for increasing the resolution by compensating
for different bin sizes, as they got pretty close to the minimal bin size
anyway. But for the Cyclone4 I guess using something like Wu's Wave Union[4]
approach might be a good idea. Unfortunately, we didn't have the time to
try it.
Attila Kinali
[1] "Field-Programmable-Gate-Array-Based Time-to-Digital Converter with
200-ps Resolution", by Kalisz, Szplet, Pasierbinski, Poniecki, 1997
[2] http://git.kinali.ch/attila/nios2_clocksync/blob/master/fpga/cores/tdc/core/tdc_delayline.vhd
[3] "Time to Digital Converter Core for Spartan-6 FPGAs",
by Sébastien Bourdeauducq, 2011
http://www.ohwr.org/documents/98
[4] "The 10-ps Wavelet TDC: Improving FPGA TDC Resolution beyond
Its Cell Delay", by Wu and Shi, 2008
http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/WaveletTDC_abs08.pdf
--
Malek's Law:
Any simple idea will be worded in the most complicated way.
On Sat, 17 Dec 2016 12:44:55 +0100
Attila Kinali attila@kinali.ch wrote:
As I understand, the GP2x chips have a true ring oscillator, and its
period PLL'd to the main crystal. I'm not completely sure how control of
the period of the ring oscillator is achieved, but I believe it's by
drive voltage, and/or varactor diode type components in the ring.
There is no direct control. The ring defines the oscillation frequency
by its length. Yes, all changes in temperature and voltage affect the
delay line and thus the TDC.
Addendum:
In ASICs, the inverter in the delay line is directly modified to
control the switching speed. Most methods work by changing or limiting
the current through one part of the inverter to slow it down.
A good overview of the many ways to achieve that with differential
inverters can be found in [1].
Attila Kinali
[1] "CMOS Differential Ring Oscillators", by Jalil, Ibne Reaz, Mohd Ali, 2013
https://doi.org/10.1109/MMM.2013.2259401
Hi
On Dec 17, 2016, at 6:44 AM, Attila Kinali attila@kinali.ch wrote:
Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
they didn't employ any method for increasing the resolution by compensating
for different bin sizes, as they got pretty close to the minimal bin size
anyway. But for the Cyclone4 I guess using something like Wu's Wave Union[4]
approach might be a good idea. Unfortunately, we didn't have the time to
try it.
If you do go to the Wave Union, it does work on Cyclone 5’s and Spartan 6’s. I suspect
it will work on the newer stuff as well. The big issue you run into is “decoding” the patterns
you get. The data is not a nice looking set of “all zeros before” and “all ones after” some
point. Figuring out which bin came first with a random calibration approach requires a
bit of a leap of faith (count the zeros / count the ones) to order them. Even with that you
still get bins that appear to be equivalent (same number of zeros), but have them in a
different order.
Bob
time-nuts mailing list -- time-nuts@febo.com
To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
and follow the instructions there.
On Sat, 17 Dec 2016 10:48:35 -0500
Bob Camp kb8tq@n1k.org wrote:
If you do go to the Wave Union, it does work on Cyclone 5’s and Spartan 6’s. I suspect
it will work on the newer stuff as well. The big issue you run into is “decoding” the patterns
you get. The data is not a nice looking set of “all zeros before” and “all ones after” some
point. Figuring out which bin came first with a random calibration approach requires a
bit of a leap of faith (count the zeros / count the ones) to order them. Even with that you
still get bins that appear to be equivalent (same number of zeros), but have them in a
different order.
Yes, the decoding of wave union is anything but trivial. The main issue
is that the ordering of the thermometer encoded bits is not the same as
the ordering of the delay line. This is due to the badly controlled delays
within the delay line and the clock tree causing unequal capture times for
the registers. This leads to "bubbles" in the encoding. The original TDC Core
code reordered the bits according to the timing simulation results.
For the Cyclone4 we encountered the problem that the ordering is not stable,
i.e. the ordering changes over time and is dependent on the logic
surrounding the delay line. Thus we reverted to only counting the zeros/ones as
an approximation. This of course does not work with wave union, and
another scheme has to be devised. One way to do it would probably be to make
the distance between the edges long enough that the bubbles can be
matched to a single edge. Then one could count the zeros/ones in the area
around the presumed edge position. But as I have not tried this, I do not
know how well it would work in reality.
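The two decoding ideas can be sketched as follows. This is only an illustration of the untested proposal above: the function names, the 16-tap word and the window boundaries are hypothetical, and the windows are assumed to be known in advance and wide enough to contain all bubbles of their edge:

```python
def decode_popcount(bits):
    """Single-edge decode: approximate the edge position by counting the ones.

    The count is unchanged when a bubble swaps a one and a zero near the edge.
    """
    return sum(bits)


def decode_edge_in_window(bits, lo, hi, ones_first):
    """Decode one edge of a multi-edge word from the taps in bits[lo:hi].

    ones_first=True  : the window goes from ones to zeros, so count the ones.
    ones_first=False : the window goes from zeros to ones, so count the zeros.
    """
    window = bits[lo:hi]
    ones = sum(window)
    return lo + (ones if ones_first else len(window) - ones)


# Single edge: clean thermometer code and the same word with a bubble.
clean = [1, 1, 1, 1, 0, 0, 0, 0]
bubbled = [1, 1, 1, 0, 1, 0, 0, 0]
pos_clean = decode_popcount(clean)
pos_bubbled = decode_popcount(bubbled)

# Two edges in one captured word, each with a bubble. Counting over the
# whole word mixes both edges, counting per window separates them.
word = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
mixed = decode_popcount(word)
first_edge = decode_edge_in_window(word, 0, 8, ones_first=True)
second_edge = decode_edge_in_window(word, 8, 16, ones_first=False)
```

Here the clean and the bubbled word both decode to position 4, the whole-word count of the two-edge word gives 9, and the windowed decode gives positions 4 and 11.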
Attila Kinali
Hi
On Dec 18, 2016, at 7:46 AM, Attila Kinali attila@kinali.ch wrote:
For the Cyclone4 we encountered the problem that the ordering is not stable,
i.e. the ordering changes over time and is dependent on the logic
surrounding the delay line. Thus we reverted to only counting the zeros/ones as
an approximation. This of course does not work with wave union, and
another scheme has to be devised. One way to do it would probably be to make
the distance between the edges long enough that the bubbles can be
matched to a single edge. Then one could count the zeros/ones in the area
around the presumed edge position. But as I have not tried this, I do not
know how well it would work in reality.
It does work, but it is a bit messy. The number of “indeterminate” states starts to get
a bit alarming. Without a fancy system to actually figure out what is what ( = a generator
that steps off at 1 ps resolution with jitter < 0.1 ps) there is a lot of “faith” involved.
You can also do the same things with multiple delay lines. The assumption is that the
odd stuff will not happen at the same places. At least with my limited skills, that turned
out to be a poor assumption. The problems occurred too often and they pretty much
always lined up.
The scary part is that you can come up with issues like crosstalk that can turn the
system into something a bit non-random. There have to be sensitivities like that
on the die. It’s just a question of whether they are large enough to matter in this case …
Back to shopping the auction sites for that generator. I’ve got $20 as my max bid :)
Bob