Empathy List Archives


Attila Kinali

Sat, Dec 17, 2016 11:44 AM

Hi Dan,

Replying on the mailing list, because I think this is of general interest.

On Fri, 16 Dec 2016 08:41:37 -0500
Dan Kemppainen <dan@irtelemetrics.com> wrote:

> My question is in regards to the FPGA and how the logic was configured.
> I have not seen examples of asynchronous logic, as a true ring
> oscillator would require (maybe they're out there, and I've missed them!).
> Are you playing some tricks in the FPGA, or running a true ring oscillator?

The general way is to use some logic resource and configure it as
a single combinatorial line that delays the signal. The idea of using
FPGAs instead of ASICs has been around for a long time (see e.g. [1]).
The early attempts used the LUTs and connected them into a string.
I don't know who came up with it, but at some point (early 2000s)
people switched over to using the adder chain that most (all?) modern
FPGAs have and got a boost in resolution, as the adder structure uses
short-cuts through the logic fabric.
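
As a rough illustration of how such a tapped delay line produces a
measurement, here is a minimal Python model. It is only a sketch: the
function names and the tap delays are made up for the example and are not
taken from the code in [2]. The edge travels down the chain, the clock
latches every tap at once, and the number of taps the edge has already
reached is the raw TDC code.

```python
from itertools import accumulate


def sample_delay_line(tap_delays_ps, lead_time_ps):
    """Return the tap states latched at the clock edge.

    tap_delays_ps: delay contributed by each element of the chain.
    lead_time_ps:  how long before the clock edge the input edge arrived.
    A tap reads 1 if the edge reached it before the clock sampled the chain.
    """
    arrival_at_tap = accumulate(tap_delays_ps)
    return [1 if t <= lead_time_ps else 0 for t in arrival_at_tap]


def thermometer_to_code(taps):
    """Count the taps that saw the edge.

    Counting ones (instead of searching for the first zero) tolerates
    out-of-order taps, often called bubbles, in the thermometer code.
    """
    return sum(taps)


# Hypothetical chain with uneven element delays, edge arrives 90 ps early.
taps = sample_delay_line([22, 44, 22, 0, 100, 22], 90)
code = thermometer_to_code(taps)
```

With these example numbers the edge has passed the first four taps when
the clock samples the chain, so the raw code is 4.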

Specifying such a structure is usually easy. The hard part is to
convince the synthesizer not to "optimize" it away. You can see
two ways to achieve that in [2]:

* at line 74 there are the "preserve" attributes, telling the
  synthesizer that those signals are to be kept, no matter what.
* at line 95 the carry adder chain is instantiated explicitly
  as a low level primitive.

The latter is there so I really get the adder chain I was looking
for (also telling the synthesizer that those should stick around).
The former prevents any derived signals from being optimized away
because they look "constant".

To get good performance (i.e. uniform and small bin size) one needs to
hand-place the adder chain (or at least restrict its placement).
This is done using the synthesizer directly (i.e. there are no VHDL
directives for that).

Depending on the software you are using, you might also need to
tell the timing analysis to ignore all signals involved in the
delay line up to the capture register.

> As I understand, the GP2x chips have a true ring oscillator, and its
> period is PLL'd to the main crystal. I'm not completely sure how control of
> the period of the ring oscillator is achieved, but I believe it's by
> drive voltage, and/or varactor diode type components in the ring.

There is no direct control. The ring defines the oscillation frequency
by its length. Yes, all changes in temperature and voltage affect the
delay line and thus the TDC.

The CERN TDC Core user guide[3] explains how an additional ring oscillator
is used to compensate for the change in delay of the delay line at runtime.
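
The idea behind that kind of compensation can be sketched in a few lines
of Python. This is a simplified model, not the implementation described
in [3]: it assumes every element of the delay line slows down or speeds up
by the same factor as the ring oscillator, and the function and parameter
names are invented for the example.

```python
def compensate(calibrated_ps, ro_freq_at_calibration, ro_freq_now):
    """Rescale calibrated timestamps for the current temperature/voltage.

    If the fabric got slower, the ring oscillator frequency drops and
    every delay in the line grows by the same ratio (the assumption
    made in this sketch).
    """
    scale = ro_freq_at_calibration / ro_freq_now
    return [t * scale for t in calibrated_ps]


# Hypothetical numbers: the ring oscillator ran at 100 MHz during
# calibration and has since slowed down to 80 MHz.
corrected = compensate([10.0, 20.0], 100.0, 80.0)
```

In this example the delays have grown by 25%, so the calibrated
timestamps are stretched by the same factor.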

> In any case, if you are able to achieve the same results with an Altera
> device, I'd be interested in learning how that's done.

Our code works with a Cyclone4 (EP4CE22 to be exact). I'm pretty sure
it should work the same on a Cyclone5. For the other families you would
probably need to modify the delay line and the ring oscillator.

BTW: In case I was not clear enough, the 22ps I mentioned is the "minimum"
bin size of the delay line. Most of the bins have a size of either 22ps or
44ps. There are several with 0ps size and with ~100ps size. The control logic
of the TDC Core does an automatic calibration step at startup to measure the
bin sizes and compensate for them. The final resolution is bounded by the
maximum bin size encountered, but most of the range will have a better
resolution.
We did not measure the performance of the TDC itself (lack of equipment and
the setup was more than just iffy), so I cannot give you any hard numbers on
what can be achieved with a Cyclone4.
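
One common way to measure bin sizes is a code density test: feed the TDC
edges that are uncorrelated with the sampling clock, histogram the raw
codes, and take each bin's width as its share of the clock period. The
Python below is a minimal sketch of that general technique and not
necessarily what the TDC Core does internally. It assumes the hits are
uniformly distributed over one clock period; the names and numbers are
invented for the example.

```python
from itertools import accumulate


def estimate_bin_widths(hit_codes, n_bins, clock_period_ps):
    """Estimate each bin's width from how often its code occurs."""
    counts = [0] * n_bins
    for code in hit_codes:
        counts[code] += 1
    total = len(hit_codes)
    return [clock_period_ps * c / total for c in counts]


def build_lookup(widths_ps):
    """Map each raw code to the time at the centre of its bin."""
    starts = [0.0] + list(accumulate(widths_ps))[:-1]
    return [s + w / 2 for s, w in zip(starts, widths_ps)]


# Hypothetical histogram: code 2 never occurs (a 0 ps bin), code 3 is wide.
hits = [0] * 22 + [1] * 44 + [3] * 100 + [4] * 22
widths = estimate_bin_widths(hits, n_bins=5, clock_period_ps=188)
lookup = build_lookup(widths)
worst_case_ps = max(widths)
```

Here `worst_case_ps` is the largest bin, which is what bounds the
resolution, while the lookup table gives better-than-worst-case results
wherever the bins are narrow.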

Sébastien told me that for the original TDC design (based on a Spartan3 IIRC)
they didn't employ any method for increasing the resolution by compensating
for different bin sizes, as they got pretty close to the minimal bin size
anyway. But for the Cyclone4 I guess using something like Wu's Wave Union[4]
approach might be a good idea. Unfortunately, we didn't have the time to
try it.

		Attila Kinali

[1] "Field-Programmable-Gate-Array-Based Time-to-Digital Converter with
200-ps Resolution", by Kalisz, Szplet, Pasierbinski, Poniecki, 1997

[2] http://git.kinali.ch/attila/nios2_clocksync/blob/master/fpga/cores/tdc/core/tdc_delayline.vhd

[3] "Time to Digital Converter Core for Spartan-6 FPGAs",
by Sébastien Bourdeauducq, 2011
http://www.ohwr.org/documents/98

[4] "The 10-ps Wavelet TDC: Improving FPGA TDC Resolution beyond
Its Cell Delay", by Wu and Shi, 2008
http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/WaveletTDC_abs08.pdf

--
Malek's Law:
Any simple idea will be worded in the most complicated way.



Attila Kinali

Sat, Dec 17, 2016 12:33 PM

On Sat, 17 Dec 2016 12:44:55 +0100
Attila Kinali <attila@kinali.ch> wrote:

> > As I understand, the GP2x chips have a true ring oscillator, and its
> > period is PLL'd to the main crystal. I'm not completely sure how control of
> > the period of the ring oscillator is achieved, but I believe it's by
> > drive voltage, and/or varactor diode type components in the ring.
>
> There is no direct control. The ring defines the oscillation frequency
> by its length. Yes, all changes in temperature and voltage affect the
> delay line and thus the TDC.

Addendum:

In ASICs, the inverter in the delay line is directly modified to
control the switching speed. Most methods work by changing or limiting
the current through one part of the inverter to slow it down.
A good overview of the many ways to achieve that with differential
inverters can be found in [1].

			Attila Kinali

[1] "CMOS Differential Ring Oscillators", by Jalil, Ibne Reaz, Mohd Ali, 2013
https://doi.org/10.1109/MMM.2013.2259401

--
Malek's Law:
Any simple idea will be worded in the most complicated way.



Bob Camp

Sat, Dec 17, 2016 3:48 PM

Hi

On Dec 17, 2016, at 6:44 AM, Attila Kinali <attila@kinali.ch> wrote:

> Hi Dan,
>
> Replying on the mailing list, because I think this is of general interest.
>
> On Fri, 16 Dec 2016 08:41:37 -0500
> Dan Kemppainen <dan@irtelemetrics.com> wrote:
>
> > My question is in regards to the FPGA and how the logic was configured.
> > I have not seen examples of asynchronous logic, as a true ring
> > oscillator would require (maybe they're out there, and I've missed them!).
> > Are you playing some tricks in the FPGA, or running a true ring oscillator?
>
> The general way is to use some logic resource and configure it as
> a single combinatorial line that delays the signal. The idea of using
> FPGAs instead of ASICs has been around for a long time (see e.g. [1]).
> The early attempts used the LUTs and connected them into a string.
> I don't know who came up with it, but at some point (early 2000s)
> people switched over to using the adder chain that most (all?) modern
> FPGAs have and got a boost in resolution, as the adder structure uses
> short-cuts through the logic fabric.
>
> Specifying such a structure is usually easy. The hard part is to
> convince the synthesizer not to "optimize" it away. You can see
> two ways to achieve that in [2]:
>
> * at line 74 there are the "preserve" attributes, telling the
>   synthesizer that those signals are to be kept, no matter what.
> * at line 95 the carry adder chain is instantiated explicitly
>   as a low level primitive.