Functions 1-3 in mode I are performed in the digital delay line with
XOR multipliers implemented as a linear array. The structure of three
adjacent stages is shown in the figure below. Each stage of the
delay line consists of two D flip-flops, one XOR gate, one SFQ pulse
merger (confluence buffer), and six splitters, interconnected by JTLs.
Our design is based on a self-timed approach with local clock
generation. This eliminates the clock-skew problems that are typical of
globally clocked designs and that limit performance at higher frequencies.
Figure: Three stages of the digital delay line.
Let us take a closer look at stage 1 in the figure above: the dual-rail
signal propagates asynchronously through the stage from left
to right and forms a local clock pulse (clk) in the merger M1.
The undelayed signal is thus transformed from the dual-rail
representation into the conventional RSFQ representation (a, clk).
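The conversion described above can be illustrated with a minimal behavioral sketch in Python. It assumes that a dual-rail token is a single SFQ pulse on either a "true" or a "false" rail; the rail names, the merger delay value and all function names are hypothetical and are not taken from the actual design. The point it shows is that the merger fires for every token, so a clock pulse is generated regardless of the data value, while the data line a carries a pulse only for logical 1.

```python
from dataclasses import dataclass
from typing import Optional

MERGER_DELAY = 5.0  # ps; hypothetical delay of the merger M1


@dataclass
class RsfqSignal:
    a: Optional[float]  # arrival time of the data pulse, None if no pulse
    clk: float          # arrival time of the locally generated clock pulse


def merger(*pulses: Optional[float]) -> Optional[float]:
    """SFQ merger (confluence buffer): emits a pulse after any input pulse.

    Only the earliest pulse is modeled; in dual-rail coding just one rail fires.
    """
    times = [t for t in pulses if t is not None]
    return min(times) + MERGER_DELAY if times else None


def dual_rail_to_rsfq(t_true: Optional[float], t_false: Optional[float]) -> RsfqSignal:
    """Convert one dual-rail token into the (a, clk) representation.

    Exactly one rail must carry a pulse: the 'true' rail for logical 1,
    the 'false' rail for logical 0. Merging both rails yields clk, so a
    clock pulse is produced for every token, whatever its value.
    """
    if (t_true is None) == (t_false is None):
        raise ValueError("exactly one rail must carry a pulse")
    clk = merger(t_true, t_false)
    assert clk is not None  # guaranteed: exactly one rail fired
    return RsfqSignal(a=t_true, clk=clk)


def rsfq_value(sig: RsfqSignal) -> int:
    """RSFQ convention: a data pulse arriving before the clock encodes 1."""
    return 1 if sig.a is not None and sig.a < sig.clk else 0


if __name__ == "__main__":
    print(dual_rail_to_rsfq(t_true=10.0, t_false=None))   # RsfqSignal(a=10.0, clk=15.0)
    print(dual_rail_to_rsfq(t_true=None, t_false=20.0))   # RsfqSignal(a=None, clk=25.0)
```

Because the clock is derived from the data token itself, each stage needs no globally distributed clock, which is the property the self-timed approach relies on.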
The generated clock pulse triggers the following events:
A number of local timing constraints must be satisfied for the correct
operation of every stage of the delay line. For example, the same
local clock signal triggers gates D1, D2 and XOR1 in stage 1, but the
output of D1 must arrive at the XOR1 input before the
clock and at the D2 input after the clock. Although each
constraint is quite simple, satisfying all of them
while optimizing the circuit parameters for higher frequencies
(16 GHz and up) becomes very challenging. A straightforward approach,
based on independently optimized elementary cells with ports loaded with
standard JTLs [43, 51], has proven insufficient:
it does not allow the highest possible operating
frequency to be reached. To solve this problem, we have developed a more general
approach that allows simultaneous automated optimization of several
adjacent stages. The cornerstones of this new technique are the
Hierarchical Single Flux Quantum Hardware Description Language
(hSFQHDL) (see the appendices and Ref. [11]) and the automated
design-centering tool Circuit Optimization Work Bench (COWBoy) [11].
hSFQHDL, which is built on top of the flat SFQHDL [10], allows automated
verification of hierarchical RSFQ circuits of arbitrary complexity and
introduces a new level of abstraction into RSFQ design: the
behavioral (or functional) description of the circuit, in which the internal
states and timing interrelationships of building blocks at any
level of the hierarchy can be included in the script as easily as those
of individual Josephson junctions. Our experience shows that this
latter feature of hSFQHDL is vital for the correct
description of large circuits, because the script of a hierarchical
design is far more than a mere sum of the individual scripts of the cells
constituting it. The COWBoy program, based on a heuristic gradient-descent
algorithm, solves the complex task of finding the optimum
point in a multidimensional (up to several hundred dimensions)
parameter space, turning this task into a simple background job (which can,
however, take up to several days of CPU time on an HP-700 series
workstation for a circuit comprising 100 Josephson junctions). The
combination of the hierarchical approach with automated parameter
margin optimization greatly accelerates the design of large and
complex RSFQ circuits and vastly improves their quality, giving
results that are very hard, if at all possible, to reproduce by
traditional methods.
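The interplay between the local timing constraints of stage 1 and the power-supply margins can be illustrated with a toy Python model. It is a sketch under loudly stated assumptions: the delay values, the 1/b and 1/b² bias-scaling laws, the path-delay parameters and the greedy coordinate search are all hypothetical, and they stand in for, rather than reproduce, the hSFQHDL description and COWBoy's heuristic gradient-descent algorithm. Here b is the bias normalized to its nominal value 1, the margin is the largest symmetric deviation of b for which every constraint still holds, and the 62.5 ps period corresponds to 16 GHz operation.

```python
"""Toy model of stage-1 timing constraints and bias-margin centering.

Every number, scaling law and name below is a hypothetical illustration,
not a value or method taken from the actual delay-line design.
"""

T_CLK = 62.5  # ps, clock period for 16 GHz operation (1 / 16 GHz)
DQ = 8.0      # ps, nominal D1 clock-to-output delay; assumed to scale as 1/b**2
SETUP = 2.0   # ps, D1 output must reach XOR1 this long before the clock
HOLD = 2.0    # ps, D1 output must reach D2 this long after the clock


def slacks(x, b):
    """Timing slacks in ps at normalized bias b (nominal b = 1); all must be >= 0.

    x = (w_x, w_d, c_x, c_d) are nominal path delays, measured from the moment
    the local clock reaches D1: D1 -> XOR1 data, D1 -> D2 data, clock -> XOR1
    and clock -> D2. Interconnect delays are assumed to scale as 1/b.
    """
    w_x, w_d, c_x, c_d = x
    dq = DQ / b**2
    data_xor, clk_xor = dq + w_x / b, c_x / b
    data_d2, clk_d2 = dq + w_d / b, c_d / b
    return (
        clk_xor - data_xor - SETUP,                       # D1 output before XOR1 clock
        data_d2 - clk_d2 - HOLD,                          # D1 output after D2 clock
        T_CLK - max(data_xor, clk_xor, data_d2, clk_d2),  # stage settles within a period
    )


def feasible(x, b):
    return all(s >= 0 for s in slacks(x, b))


def margin(x, step=0.005, limit=0.5):
    """Largest symmetric bias deviation delta (grid-resolved) with all slacks >= 0."""
    if any(v < 0 for v in x) or not feasible(x, 1.0):
        return 0.0
    delta = 0.0
    while delta + step <= limit:
        d = delta + step
        if not (feasible(x, 1.0 - d) and feasible(x, 1.0 + d)):
            break
        delta = d
    return delta


def center(x0, step=4.0, min_step=0.05):
    """Greedy coordinate search that accepts only moves widening the margin.

    A simple stand-in for design centering, not COWBoy's algorithm.
    """
    x, best = list(x0), margin(x0)
    while step >= min_step:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                m = margin(trial)
                if m > best:
                    x, best, improved = trial, m, True
        if not improved:
            step /= 2
    return x, best


if __name__ == "__main__":
    x0 = (2.0, 2.0, 14.0, 4.0)  # hypothetical starting delays in ps
    print("initial margin:", margin(x0))
    x_opt, m_opt = center(x0)
    print("centered delays:", x_opt, "margin:", m_opt)
```

Because the search accepts only moves that widen the margin, the result is never worse than the starting point. The toy also shows why cells cannot be tuned in isolation: the clock-path and data-path delays of different gates enter the same constraints. In a real flow the constraint checks come from simulated circuit behavior rather than closed-form delay formulas.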
Using this approach, we have achieved simulated margins
of
on the power supply in the final design of a single stage
of the delay line. The stage comprises 98 Josephson
junctions, and its layout for the standard
-
-
HYPRES technology [7]
occupies an area of
, with an estimated power dissipation
of
.