
Digital Delay line and Multiplier.

Functions 1-3 in mode I are performed in the digital delay line with XOR multipliers, implemented as a linear array. The structure of three adjacent stages is shown in the figure below. Each stage of the delay line consists of two D flip-flops, one XOR gate, one SFQ pulse merger (confluence buffer), and six splitters, interconnected by JTLs. Our design is based on a self-timed approach with local clock generation. This eliminates the clock skew problems that are typical of designs with global timing and that limit performance at higher frequencies.

Figure: Three stages of the digital delay line.
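
The function of the delay line with XOR multipliers can be illustrated with a minimal behavioral sketch. It is not a model of the actual circuit: it assumes that every stage holds exactly one past sample, that all stages start at 0, and it does not represent the two D flip-flops, the merger, or the splitters individually. The function name `delay_line_xor` and its parameters are hypothetical.

```python
from collections import deque

def delay_line_xor(bits, n_stages):
    """For each incoming bit, return the XOR of the undelayed bit with
    the sample currently held in each stage of the delay line."""
    # Assumption: every stage holds one past sample and starts at 0.
    line = deque([0] * n_stages, maxlen=n_stages)
    products = []
    for a in bits:
        # Stage k (counting from 0) holds the sample delayed by k+1 steps.
        products.append([a ^ d for d in line])
        # Shift: the newest sample enters stage 1, the oldest drops out.
        line.appendleft(a)
    return products

result = delay_line_xor([1, 0, 1, 1], n_stages=2)
```

With one-bit samples, XOR returns 0 when the two samples agree and 1 when they differ, so it plays the role of a multiplier up to an inversion of the output.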

Let us take a closer look at stage 1 in the figure above: the dual-rail signal propagates asynchronously through the stage from left to right and forms a local clock pulse (clk) in the merger M1. The undelayed signal is thus transformed from the dual-rail representation into the conventional RSFQ representation (a, clk). The generated clock pulse triggers the following events:

A number of local timing constraints have to be satisfied for the correct operation of every stage of the delay line. For example, the same local clock signal triggers gates D1, D2 and XOR1 in stage 1, but the output of D1 should arrive at the XOR1 input before the clock and at the D2 input after the clock. Although the constraints are quite simple, satisfying all of them while optimizing the parameters of the circuit for higher frequencies (16 GHz and up) becomes very challenging. A straightforward approach, based on independently optimized elementary cells with ports loaded by standard JTLs [43, 51], has proven to be insufficient: it does not allow the highest possible operating frequency to be reached.

To solve this problem, we have developed a more general approach that allows simultaneous automated optimization of several adjacent stages. The cornerstones of this new technique are the Hierarchical Single Flux Quantum Hardware Description Language (hSFQHDL) (see the appendices and Ref. [11]) and the automated design centering Circuit Optimization Work Bench (COWBoy) [11]. hSFQHDL, which is built on top of the flat SFQHDL [10], allows automated verification of hierarchical RSFQ circuits of arbitrary complexity and introduces a new level of abstraction into RSFQ design: the behavioral (or functional) description of the circuit, in which internal states and timing interrelationships of the building blocks on any level of the hierarchy can be included in the script as easily as those of the individual Josephson junctions. Our experience shows that this latter feature of hSFQHDL is of vital importance for the correct description of large circuits, because the script of a hierarchical design is far more than a mere sum of the individual scripts of the cells constituting it.
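
The two example constraints quoted above for stage 1 (the D1 output must reach XOR1 before the clock, and must reach D2 after the clock) can be written as a simple slack check. This is only an illustrative sketch in Python, not hSFQHDL: the dictionary keys, the function names, and all arrival times (in picoseconds) are hypothetical and are not taken from the design.

```python
def stage_timing_slack(t):
    """Return the slack, in ps, of the two stage-1 constraints.
    A positive slack means the constraint is satisfied."""
    return {
        # D1 output must reach XOR1 before XOR1 is clocked.
        "d1_to_xor1_before_clk": t["clk_at_xor1"] - t["d1_out_at_xor1"],
        # D1 output must reach D2 only after D2 has been clocked.
        "d1_to_d2_after_clk": t["d1_out_at_d2"] - t["clk_at_d2"],
    }

def constraints_met(t, guard=0.0):
    """True if every slack exceeds the guard band (ps)."""
    return all(s > guard for s in stage_timing_slack(t).values())

# Hypothetical arrival times in ps, chosen only to exercise the check.
times = {
    "clk_at_xor1": 40.0,
    "d1_out_at_xor1": 25.0,
    "clk_at_d2": 30.0,
    "d1_out_at_d2": 38.0,
}
slack = stage_timing_slack(times)
ok = constraints_met(times, guard=5.0)
```

Tightening the guard band shows how quickly such constraints start to conflict: with these numbers the check passes for a 5 ps guard and fails for a 10 ps guard.
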
The COWBoy program, based on a heuristic gradient descent algorithm, solves the complex task of finding the optimum point in a multi-dimensional parameter space (up to several hundred dimensions), turning it into a simple background job (which can, however, take up to several days of CPU time on an HP-700 series workstation for a circuit comprising a hundred Josephson junctions). The combination of the hierarchical approach with automated parameter margin optimization greatly accelerates the design of large and complex RSFQ circuits and improves their quality, giving results that are very hard, if at all possible, to reproduce by traditional methods.
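
The idea of design centering can be illustrated with a toy example. The sketch below is not the COWBoy algorithm: it swaps in a much simpler center-of-interval coordinate search, and the pass/fail function `works` is a hypothetical stand-in for a circuit simulation, with an arbitrary rectangular working region in two normalized parameters.

```python
def works(p):
    # Hypothetical pass/fail test standing in for a circuit simulation:
    # the "circuit" works inside an arbitrary rectangle.
    return 0.8 <= p[0] <= 1.6 and 0.5 <= p[1] <= 1.1

def working_interval(p, i, step=0.01, limit=1000):
    """Scan parameter i down and up from p until the test fails and
    return the last working values on each side."""
    def scan(direction):
        q = list(p)
        for _ in range(limit):
            q[i] += direction * step
            if not works(q):
                return q[i] - direction * step
        return q[i]
    return scan(-1), scan(+1)

def center(p, sweeps=3):
    """Move each parameter in turn to the middle of its working interval."""
    p = list(p)
    for _ in range(sweeps):
        for i in range(len(p)):
            lo, hi = working_interval(p, i)
            p[i] = 0.5 * (lo + hi)
    return p

centered = center([1.0, 1.0])
```

Starting from the off-center point (1.0, 1.0), the search ends near the middle of the working rectangle, where the margins on both sides of each parameter are roughly equal. A real circuit has a working region of unknown shape in hundreds of dimensions, which is why a dedicated optimizer is needed.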

Using this approach, we have been able to achieve simulated margins of tex2html_wrap_inline2083 on the power supply in the final design of a single stage of the delay line. The stage comprises 98 Josephson junctions, and its layout for the standard tex2html_wrap_inline1503 - tex2html_wrap_inline1505 tex2html_wrap_inline1507 - tex2html_wrap_inline1509 HYPRES technology [7] occupies an area of tex2html_wrap_inline2467 , with an estimated power dissipation of tex2html_wrap_inline2469 .



Alexander Rylyakov
Fri May 23 18:57:25 EDT 1997