time-nuts@lists.febo.com

Discussion of precise time and frequency measurement

FPGA based TDC (was: Time nuts, off line questions.)

Attila Kinali
Sat, Dec 17, 2016 11:44 AM

Hi Dan,

Answering on the mailing list, because I think it's of general interest.

On Fri, 16 Dec 2016 08:41:37 -0500
Dan Kemppainen <dan@irtelemetrics.com> wrote:

> My question is in regard to the FPGA and how the logic was configured.
> I have not seen examples of asynchronous logic, as a true ring
> oscillator would require (maybe they're out there, and I've missed them!).
> Are you playing some tricks in the FPGA, or running a true ring oscillator?

The general way is to use some logic resource and configure it as
a single combinatorial line that delays the signal. The idea of using
FPGAs instead of ASICs has been around for a long time (see e.g. [1]).
The early attempts used LUTs and connected them into a string.
I don't know who came up with it, but at some point (early 2000s)
people switched over to using the adder chain that most (all?) modern
FPGAs have and got a boost in resolution, as the adder structure uses
short-cuts through the logic fabric.
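
To make the idea concrete, here is a minimal Python model of a tapped delay line (the function names are hypothetical, not taken from the TDC Core). It assumes every cell has a fixed delay and that all capture registers sample at exactly the same instant, which real FPGAs do not guarantee. The number of taps the hit edge has reached when the clock samples them forms a thermometer code proportional to the time from the hit to the clock edge.

```python
from typing import List


def capture_thermometer(hit_ps: float, clock_edge_ps: float,
                        cell_delays_ps: List[float]) -> List[int]:
    """Idealized tapped delay line: which taps has the hit edge reached?

    The edge enters the chain at hit_ps and reaches tap k after the summed
    delay of cells 0..k. All taps are sampled at clock_edge_ps.
    Assumes fixed cell delays and a skew-free capture clock (no bubbles).
    """
    elapsed = clock_edge_ps - hit_ps
    taps, arrival = [], 0.0
    for delay in cell_delays_ps:
        arrival += delay
        taps.append(1 if arrival <= elapsed else 0)
    return taps


def fine_time_ps(taps: List[int], nominal_cell_ps: float) -> float:
    """Uncalibrated decode: number of taps reached times the nominal delay."""
    return sum(taps) * nominal_cell_ps


# Usage: eight 22 ps cells, clock edge 100 ps after the hit.
cells = [22.0] * 8
taps = capture_thermometer(0.0, 100.0, cells)
print(taps)                      # [1, 1, 1, 1, 0, 0, 0, 0]
print(fine_time_ps(taps, 22.0))  # 88.0
```

The 100 ps interval comes out as 88 ps because the result is quantized to whole cells; that quantization step is the bin size discussed further down.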

Specifying such a structure is usually easy. The hard part is to
convince the synthesizer not to "optimize" it away. You can see
two ways to achieve that in [2]:

  • at line 74 there are the "preserve" attributes, telling the
    synthesizer that those signals are to be kept, no matter what.

  • at line 95 the carry adder chain is instantiated explicitly
    as a low-level primitive.

The latter is there so I really get the adder chain I was looking
for (it also tells the synthesizer that it should stick around).
The former prevents any derived signals from being optimized away
because they look "constant".

To get good performance (i.e. uniform and small bin sizes) one needs to
hand-place the adder chain (or at least restrict its placement).
This is done in the synthesis tool directly (i.e. there are no VHDL
directives for that).

Depending on the software you are using, you might also need to
tell the timing analysis to ignore all signals involved in the
delay line up to the capture register.

> As I understand, the GP2x chips have a true ring oscillator, and its
> period is PLL'd to the main crystal. I'm not completely sure how control of
> the period of the ring oscillator is achieved, but I believe it's by
> drive voltage, and/or varactor diode type components in the ring.

There is no direct control. The ring defines the oscillation frequency
by its length. Yes, all changes in temperature and voltage affect the
delay line and thus the TDC.

The CERN TDC Core user guide [3] explains how an additional ring oscillator
is used to compensate for the change in delay of the delay line at runtime.
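
A first-order sketch of why a ring oscillator helps with drift: its period is set by the same kind of gate delays as the delay line, so measuring its frequency tracks temperature and voltage changes. The assumption that every delay on the chip scales by the same factor is ours, and the function names are hypothetical; see [3] for how the TDC Core actually applies the correction.

```python
def stage_delay_ps(ring_freq_hz: float, n_stages: int) -> float:
    """Average stage delay of a ring oscillator from its measured frequency.

    An edge has to travel around the ring twice per period (once as a
    rising and once as a falling edge), so T = 2 * n_stages * t_stage.
    """
    period_ps = 1e12 / ring_freq_hz
    return period_ps / (2 * n_stages)


def rescale_bins(widths_at_cal_ps, f_at_cal_hz, f_now_hz):
    """First-order drift compensation.

    Assumption: every delay scales by the same factor as the ring period,
    so a slower ring (lower frequency) means proportionally wider bins.
    """
    scale = f_at_cal_hz / f_now_hz
    return [w * scale for w in widths_at_cal_ps]


# Usage with made-up numbers: a 50-stage ring measured at 100 MHz,
# later slowed down to 80 MHz by a temperature change.
print(stage_delay_ps(100e6, 50))                # 100.0
print(rescale_bins([20.0, 40.0], 100e6, 80e6))  # [25.0, 50.0]
```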

> In any case, if you are able to achieve the same results with an Altera
> device, I'd be interested in learning how that's done.

Our code works with a Cyclone4 (EP4CE22 to be exact). I'm pretty sure
it should work the same on a Cyclone5. For the other families you would
probably need to modify the delay line and the ring oscillator.

BTW: In case I was not clear enough, the 22ps I mentioned is the "minimum"
bin size of the delay line. Most of the bins are either 22ps or 44ps wide,
but several are 0ps wide and several are ~100ps wide. The control logic of
the TDC Core does an automatic calibration step at startup to measure the
bin sizes and compensate for them. The final resolution is bounded by the
largest bin size encountered, but most of the range will have a better
resolution. We did not measure the performance of the TDC itself (lack of
equipment, and the setup was more than just iffy), so I cannot give you any
hard numbers on what can be achieved with a Cyclone4.
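
Startup calibration of bin sizes is commonly done as a code density test: feed the TDC hits that are uncorrelated with its clock, and the fraction of hits falling in each bin is proportional to that bin's width. The sketch below shows that principle under the assumption of uniformly distributed hits; it is not the TDC Core's code, and `simulate_codes` with its bin widths is a hypothetical stand-in for real hardware.

```python
import random
from typing import List


def code_density_widths(codes: List[int], n_bins: int,
                        clock_period_ps: float) -> List[float]:
    """Code density test: estimate every bin width from a hit histogram.

    Assumes hits are uniformly distributed over one clock period and
    uncorrelated with the clock, so the hit count of bin k is proportional
    to its width. A bin that never gets a hit comes out as 0 ps wide.
    """
    counts = [0] * n_bins
    for code in codes:
        counts[code] += 1
    return [clock_period_ps * c / len(codes) for c in counts]


def bin_centers_ps(widths: List[float]) -> List[float]:
    """Calibrated timestamp per code: start of its bin plus half its width."""
    centers, start = [], 0.0
    for w in widths:
        centers.append(start + w / 2)
        start += w
    return centers


def simulate_codes(true_widths: List[float], n_hits: int,
                   seed: int = 1) -> List[int]:
    """Hypothetical hit source: random hits through bins of known width."""
    rng = random.Random(seed)
    period = sum(true_widths)
    codes = []
    for _ in range(n_hits):
        t = rng.uniform(0.0, period)
        start = 0.0
        for k, w in enumerate(true_widths):
            start += w
            if t < start:
                codes.append(k)
                break
        else:  # t == period due to rounding: count it in the last bin
            codes.append(len(true_widths) - 1)
    return codes


# Usage: made-up bins mixing 22, 44, 0 and ~100 ps widths (250 ps period).
true_widths_ps = [22.0, 44.0, 0.0, 100.0, 22.0, 44.0, 18.0]
hits = simulate_codes(true_widths_ps, 200_000)
est = code_density_widths(hits, len(true_widths_ps), sum(true_widths_ps))
print([round(w, 1) for w in est])    # close to the true widths
print(bin_centers_ps([22.0, 44.0]))  # [11.0, 44.0]
```

The estimate sharpens with more hits; the statistical error per bin shrinks roughly with the square root of the hit count.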

Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
they didn't employ any method for increasing the resolution by compensating
for different bin sizes, as they got pretty close to the minimal bin size
anyway. But for the Cyclone4 I guess using something like Wu's Wave Union [4]
approach might be a good idea. Unfortunately, we didn't have the time to
try it.
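
The principle behind Wave Union [4], sketched in Python: one hit launches several edges with known relative delays, each edge lands at a different spot in the bin structure, and averaging their calibrated positions partially cancels the quantization error. This assumes the edge offsets are known exactly and that each edge can be located in the captured word, which is the hard part in practice; the names are hypothetical and this is not Wu and Shi's implementation.

```python
import bisect
from typing import List, Sequence, Tuple


def bin_lookup(widths: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Upper edge and center of every calibrated bin."""
    uppers, centers, acc = [], [], 0.0
    for w in widths:
        centers.append(acc + w / 2)
        acc += w
        uppers.append(acc)
    return uppers, centers


def measure(pos_ps: float, uppers: List[float], centers: List[float]) -> float:
    """Calibrated position of one edge: center of the bin containing it.
    bisect_right never selects a zero-width bin (duplicate upper edge)."""
    k = min(bisect.bisect_right(uppers, pos_ps), len(uppers) - 1)
    return centers[k]


def wave_union_estimate(hit_ps: float, offsets_ps: Sequence[float],
                        widths: Sequence[float]) -> float:
    """Average over several edges launched by one hit.

    offsets_ps are the (assumed known) delays of each edge relative to the
    first one; subtracting them maps every edge back onto the hit time.
    """
    uppers, centers = bin_lookup(widths)
    return sum(measure(hit_ps + off, uppers, centers) - off
               for off in offsets_ps) / len(offsets_ps)


# Usage: uniform 40 ps bins; worst-case error of one edge vs two edges
# launched 20 ps (half a bin) apart.
widths = [40.0] * 20
one = max(abs(wave_union_estimate(t, [0.0], widths) - t) for t in range(700))
two = max(abs(wave_union_estimate(t, [0.0, 20.0], widths) - t) for t in range(700))
print(one, two)  # 20.0 10.0
```

With two edges half a bin apart the worst-case error is halved; with unequal bins the gain depends on how the edges fall relative to the wide bins.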

		Attila Kinali

[1] "Field-Programmable-Gate-Array-Based Time-to-Digital Converter with
200-ps Resolution", by Kalisz, Szplet, Pasierbinski, Poniecki, 1997

[2] http://git.kinali.ch/attila/nios2_clocksync/blob/master/fpga/cores/tdc/core/tdc_delayline.vhd

[3] "Time to Digital Converter Core for Spartan-6 FPGAs",
by Sébastien Bourdeauducq, 2011
http://www.ohwr.org/documents/98

[4] "The 10-ps Wavelet TDC: Improving FPGA TDC Resolution beyond
Its Cell Delay", by Wu and Shi, 2008
http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/WaveletTDC_abs08.pdf

--
Malek's Law:
Any simple idea will be worded in the most complicated way.

Attila Kinali
Sat, Dec 17, 2016 12:33 PM

On Sat, 17 Dec 2016 12:44:55 +0100
Attila Kinali <attila@kinali.ch> wrote:

> > As I understand, the GP2x chips have a true ring oscillator, and its
> > period is PLL'd to the main crystal. I'm not completely sure how control of
> > the period of the ring oscillator is achieved, but I believe it's by
> > drive voltage, and/or varactor diode type components in the ring.
>
> There is no direct control. The ring defines the oscillation frequency
> by its length. Yes, all changes in temperature and voltage affect the
> delay line and thus the TDC.

Addendum:

In ASICs, the inverters in the delay line are modified directly to
control their switching speed. Most methods work by changing or limiting
the current through one part of the inverter to slow it down.
A good overview of the many ways to achieve this with differential
inverters can be found in [1].
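
A textbook first-order model of current starving, assuming the limited current charges or discharges the load capacitance linearly and the next stage switches at half the supply voltage. It shows why limiting the current slows each stage and lowers the ring frequency roughly in proportion; it says nothing about how the GP2x or any specific chip implements this, and all values in the usage are made up.

```python
def starved_stage_delay_s(load_cap_f: float, vdd_v: float,
                          current_a: float) -> float:
    """First-order delay of a current-starved inverter stage.

    Assumes a constant current I slews the load C from one rail to the
    switching threshold of the next stage, taken as VDD/2:
    t_d = C * (VDD / 2) / I.
    """
    return load_cap_f * (vdd_v / 2) / current_a


def ring_frequency_hz(n_stages: int, load_cap_f: float, vdd_v: float,
                      current_a: float) -> float:
    """Ring oscillator frequency f = 1 / (2 * N * t_d)."""
    t_d = starved_stage_delay_s(load_cap_f, vdd_v, current_a)
    return 1.0 / (2 * n_stages * t_d)


# Usage with made-up values: 5 stages, 10 fF load, 1.2 V supply.
# Doubling the bias current doubles the frequency in this model.
for i_ua in (5, 10, 20):
    f = ring_frequency_hz(5, 10e-15, 1.2, i_ua * 1e-6)
    print(f"{i_ua:2d} uA -> {f / 1e6:6.1f} MHz")
```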

			Attila Kinali

[1] "CMOS Differential Ring Oscillators", by Jalil, Ibne Reaz, Mohd Ali, 2013
https://doi.org/10.1109/MMM.2013.2259401

Bob Camp
Sat, Dec 17, 2016 3:48 PM

Hi

On Dec 17, 2016, at 6:44 AM, Attila Kinali <attila@kinali.ch> wrote:

> Hi Dan,
>
> Answering on the mailing list, because I think it's of general interest.
>
> On Fri, 16 Dec 2016 08:41:37 -0500
> Dan Kemppainen <dan@irtelemetrics.com> wrote:
>
>> My question is in regard to the FPGA and how the logic was configured.
>> I have not seen examples of asynchronous logic, as a true ring
>> oscillator would require (maybe they're out there, and I've missed them!).
>> Are you playing some tricks in the FPGA, or running a true ring oscillator?
>
> The general way is to use some logic resource and configure it as
> a single combinatorial line that delays the signal. The idea of using
> FPGAs instead of ASICs has been around for a long time (see e.g. [1]).
> The early attempts used LUTs and connected them into a string.
> I don't know who came up with it, but at some point (early 2000s)
> people switched over to using the adder chain that most (all?) modern
> FPGAs have and got a boost in resolution, as the adder structure uses
> short-cuts through the logic fabric.
>
> Specifying such a structure is usually easy. The hard part is to
> convince the synthesizer not to "optimize" it away. You can see
> two ways to achieve that in [2]:
>
>   • at line 74 there are the "preserve" attributes, telling the
>     synthesizer that those signals are to be kept, no matter what.
>
>   • at line 95 the carry adder chain is instantiated explicitly
>     as a low-level primitive.
>
> The latter is there so I really get the adder chain I was looking
> for (it also tells the synthesizer that it should stick around).
> The former prevents any derived signals from being optimized away
> because they look "constant".
>
> To get good performance (i.e. uniform and small bin sizes) one needs to
> hand-place the adder chain (or at least restrict its placement).
> This is done in the synthesis tool directly (i.e. there are no VHDL
> directives for that).
>
> Depending on the software you are using, you might also need to
> tell the timing analysis to ignore all signals involved in the
> delay line up to the capture register.
>
>> As I understand, the GP2x chips have a true ring oscillator, and its
>> period is PLL'd to the main crystal. I'm not completely sure how control of
>> the period of the ring oscillator is achieved, but I believe it's by
>> drive voltage, and/or varactor diode type components in the ring.
>
> There is no direct control. The ring defines the oscillation frequency
> by its length. Yes, all changes in temperature and voltage affect the
> delay line and thus the TDC.
>
> The CERN TDC Core user guide [3] explains how an additional ring oscillator
> is used to compensate for the change in delay of the delay line at runtime.
>
>> In any case, if you are able to achieve the same results with an Altera
>> device, I'd be interested in learning how that's done.
>
> Our code works with a Cyclone4 (EP4CE22 to be exact). I'm pretty sure
> it should work the same on a Cyclone5. For the other families you would
> probably need to modify the delay line and the ring oscillator.
>
> BTW: In case I was not clear enough, the 22ps I mentioned is the "minimum"
> bin size of the delay line. Most of the bins are either 22ps or 44ps wide,
> but several are 0ps wide and several are ~100ps wide. The control logic of
> the TDC Core does an automatic calibration step at startup to measure the
> bin sizes and compensate for them. The final resolution is bounded by the
> largest bin size encountered, but most of the range will have a better
> resolution. We did not measure the performance of the TDC itself (lack of
> equipment, and the setup was more than just iffy), so I cannot give you any
> hard numbers on what can be achieved with a Cyclone4.
>
> Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
> they didn't employ any method for increasing the resolution by compensating
> for different bin sizes, as they got pretty close to the minimal bin size
> anyway. But for the Cyclone4 I guess using something like Wu's Wave Union [4]
> approach might be a good idea. Unfortunately, we didn't have the time to
> try it.

If you do go with Wave Union, it does work on Cyclone 5s and Spartan 6s. I suspect
it will work on the newer stuff as well. The big issue you run into is “decoding” the patterns
you get. The data is *not* a nice looking set of “all zeros before” and “all ones after” some
point. Figuring out which bin came first with a random calibration approach requires a
bit of a leap of faith (count the zeros / count the ones) to order them. Even with that you
still get bins that appear to be equivalent (same number of zeros), but with the zeros in a
different order.
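
A small illustration of the two decoding strategies mentioned here, using made-up 8-tap captures: looking for the first transition is thrown off by a single bubble, while counting the ones is not, but counting also maps different patterns onto the same code. The function names are hypothetical.

```python
from typing import List


def decode_first_zero(taps: List[int]) -> int:
    """Naive decoder: index of the first 0, assuming a clean 1..1 0..0 code."""
    for i, bit in enumerate(taps):
        if bit == 0:
            return i
    return len(taps)


def decode_ones_count(taps: List[int]) -> int:
    """'Count the ones' decoder: ignores where bubbles sit, only how many 1s."""
    return sum(taps)


clean = [1, 1, 1, 1, 1, 0, 0, 0]
bubbled = [1, 1, 1, 0, 1, 1, 0, 0]  # same number of ones, one bubble
print(decode_first_zero(clean), decode_ones_count(clean))      # 5 5
print(decode_first_zero(bubbled), decode_ones_count(bubbled))  # 3 5

# The flip side: distinct captured patterns collapse onto one code.
a = [1, 1, 0, 1, 1, 0, 0, 0]
b = [1, 1, 1, 1, 0, 0, 0, 0]
print(decode_ones_count(a) == decode_ones_count(b))  # True
```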

Bob

> 		Attila Kinali
>
> [1] "Field-Programmable-Gate-Array-Based Time-to-Digital Converter with
> 200-ps Resolution", by Kalisz, Szplet, Pasierbinski, Poniecki, 1997
>
> [2] http://git.kinali.ch/attila/nios2_clocksync/blob/master/fpga/cores/tdc/core/tdc_delayline.vhd
>
> [3] "Time to Digital Converter Core for Spartan-6 FPGAs",
> by Sébastien Bourdeauducq, 2011
> http://www.ohwr.org/documents/98
>
> [4] "The 10-ps Wavelet TDC: Improving FPGA TDC Resolution beyond
> Its Cell Delay", by Wu and Shi, 2008
> http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/WaveletTDC_abs08.pdf

time-nuts mailing list -- time-nuts@febo.com
To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
and follow the instructions there.

Attila Kinali
Sun, Dec 18, 2016 12:46 PM

On Sat, 17 Dec 2016 10:48:35 -0500
Bob Camp <kb8tq@n1k.org> wrote:

> > Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
> > they didn't employ any method for increasing the resolution by compensating
> > for different bin sizes, as they got pretty close to the minimal bin size
> > anyway. But for the Cyclone4 I guess using something like Wu's Wave Union [4]
> > approach might be a good idea. Unfortunately, we didn't have the time to
> > try it.
>
> If you do go with Wave Union, it does work on Cyclone 5s and Spartan 6s. I suspect
> it will work on the newer stuff as well. The big issue you run into is “decoding” the patterns
> you get. The data is *not* a nice looking set of “all zeros before” and “all ones after” some
> point. Figuring out which bin came first with a random calibration approach requires a
> bit of a leap of faith (count the zeros / count the ones) to order them. Even with that you
> still get bins that appear to be equivalent (same number of zeros), but with the zeros in a
> different order.

Yes, the decoding of wave union is anything but trivial. The main issue
is that the ordering of the thermometer-encoded bits is not the same as
the ordering of the delay line. This is due to the poorly controlled delays
within the delay line and the clock tree, which cause unequal capture times
for the registers. This leads to "bubbles" in the encoding. The original TDC
Core code reordered the bits according to the timing simulation results.
For the Cyclone4 we ran into the problem that the ordering is not stable,
i.e. the ordering changes over time and depends on the logic surrounding
the delay line. Thus we reverted to only counting the zeros/ones as an
approximation. This of course does not work with wave union, and another
scheme has to be devised. One way to do it would probably be to make the
distance between the edges long enough that the bubbles can be matched to
a single edge. Then one could count the zeros/ones in the area around the
presumed edge position. But as I have not tried this, I do not know how
well it would work in reality.
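
The windowed counting idea above is untested, and so is this sketch of it. It assumes the captured word holds a single pulse (zeros behind the trailing edge, ones between the edges, zeros ahead of the leading edge) and that the edges are far enough apart that bubbles around one edge never cross the point halfway between them. `decode_pulse` is a hypothetical name; the tap positions it returns would still need to go through the calibrated bin table.

```python
from typing import List, Tuple


def decode_pulse(taps: List[int]) -> Tuple[int, int]:
    """Hypothetical two-edge decoder for a captured 0..0 1..1 0..0 word.

    Assumes all bubbles of one edge stay on that edge's side of the split.
    Returns (trailing_edge_tap, leading_edge_tap).
    """
    ones = [i for i, bit in enumerate(taps) if bit]
    if not ones:
        raise ValueError("no pulse captured")
    # Coarse edge positions from the outermost ones, split halfway between.
    mid = (ones[0] + ones[-1] + 1) // 2
    # Each window holds exactly one edge: count instead of searching for
    # the exact transition, so a bubble only matters if it changes the count.
    trailing = sum(1 for bit in taps[:mid] if bit == 0)
    leading = mid + sum(taps[mid:])
    return trailing, leading


clean = [0] * 5 + [1] * 10 + [0] * 9
bubbled = list(clean)
bubbled[4], bubbled[5] = 1, 0    # bubble at the trailing edge
bubbled[14], bubbled[15] = 0, 1  # bubble at the leading edge
print(decode_pulse(clean))    # (5, 15)
print(decode_pulse(bubbled))  # (5, 15)
```

If the edges are too close, the coarse split can land inside a bubble region and the counts of the two windows get mixed, which is the failure mode the distance requirement is meant to avoid.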

			Attila Kinali

Bob Camp
Sun, Dec 18, 2016 2:26 PM

Hi

On Dec 18, 2016, at 7:46 AM, Attila Kinali <attila@kinali.ch> wrote:

> On Sat, 17 Dec 2016 10:48:35 -0500
> Bob Camp <kb8tq@n1k.org> wrote:
>
>>> Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
>>> they didn't employ any method for increasing the resolution by compensating
>>> for different bin sizes, as they got pretty close to the minimal bin size
>>> anyway. But for the Cyclone4 I guess using something like Wu's Wave Union [4]
>>> approach might be a good idea. Unfortunately, we didn't have the time to
>>> try it.
>>
>> If you do go with Wave Union, it does work on Cyclone 5s and Spartan 6s. I suspect
>> it will work on the newer stuff as well. The big issue you run into is “decoding” the patterns
>> you get. The data is *not* a nice looking set of “all zeros before” and “all ones after” some
>> point. Figuring out which bin came first with a random calibration approach requires a
>> bit of a leap of faith (count the zeros / count the ones) to order them. Even with that you
>> still get bins that appear to be equivalent (same number of zeros), but with the zeros in a
>> different order.
>
> Yes, the decoding of wave union is anything but trivial. The main issue
> is that the ordering of the thermometer-encoded bits is not the same as
> the ordering of the delay line. This is due to the poorly controlled delays
> within the delay line and the clock tree, which cause unequal capture times
> for the registers. This leads to "bubbles" in the encoding. The original TDC
> Core code reordered the bits according to the timing simulation results.
> For the Cyclone4 we ran into the problem that the ordering is not stable,
> i.e. the ordering changes over time and depends on the logic surrounding
> the delay line. Thus we reverted to only counting the zeros/ones as an
> approximation. This of course does not work with wave union, and another
> scheme has to be devised. One way to do it would probably be to make the
> distance between the edges long enough that the bubbles can be matched to
> a single edge. Then one could count the zeros/ones in the area around the
> presumed edge position. But as I have not tried this, I do not know how
> well it would work in reality.

It does work, but it is a bit messy. The number of “indeterminate” states starts to get
a bit alarming. Without a fancy system to actually figure out what is what (= a generator
that steps off at 1 ps resolution with jitter < 0.1 ps) there is a lot of “faith” involved.

You can also do the same thing with multiple delay lines. The assumption is that the
odd stuff will not happen at the same places. At least with my limited skills, that turned
out to be a poor assumption. The problems occurred too often and they pretty much
*always* lined up.
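
Why it matters whether the odd behavior lines up can be shown with a toy model: averaging several delay lines only helps if their bin edges fall in different places. The sketch assumes idealized uniform 40 ps bins (a made-up value) and models "lining up" as identical bin edges in every line; it is an illustration of the averaging principle, not a model of any real FPGA, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence


def quantize(t: float, width: float, phase: float) -> float:
    """One idealized delay line: uniform bins of `width` whose edges are
    shifted by `phase`; returns the center of the bin containing t."""
    k = math.floor((t - phase) / width)
    return phase + (k + 0.5) * width


def rms_error(phases: Sequence[float], width: float = 40.0,
              n: int = 20_000, seed: int = 2) -> float:
    """RMS error when one reading per line is averaged, over random hits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.uniform(0.0, 1000.0)
        est = sum(quantize(t, width, p) for p in phases) / len(phases)
        total += (est - t) ** 2
    return math.sqrt(total / n)


print(round(rms_error([0.0]), 2))                    # single line
print(round(rms_error([0.0, 10.0, 20.0, 30.0]), 2))  # staggered bin edges
print(round(rms_error([0.0, 0.0, 0.0, 0.0]), 2))     # edges that line up
```

With ideally staggered edges the RMS error drops to about a quarter of the single-line value (40/sqrt(12), roughly 11.5 ps); with aligned edges it does not improve at all, however many lines are averaged.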

The scary part is that you can come up with issues like crosstalk that can turn the
system into something a bit non-random. There have to be sensitivities like that
on the die. It’s just a question of whether they are large enough to matter in this case …

Back to shopping the auction sites for that generator. I’ve got $20 as my max bid :)

Bob

> 			Attila Kinali
