VLSI ASIC & FPGA: SOURCE SYNCHRONOUS INTERFACES

Source synchronous designs

Traditional interfaces limit interconnect speed to less than 250 MHz and pc-board-interconnect length to approximately 5 in. Designers are increasingly turning to source-synchronous interconnects that demonstrate transfer rates of 1 billion transitions/sec at distances of 5m and greater.

Several examples of source-synchronous technology exist. Their implementations affect design complexity and overall performance. Within memory subsystems, major examples include double-data-rate (DDR) SRAM, DDR synchronous DRAM (SDRAM), synchronous-graphics RAM, and Direct Rambus DRAM.
For networking and I/O, examples include the scalable coherent interface (SCI), Silicon Graphics' (http://www.sgi.com/) CrayLink, and High Performance Parallel Interface (HIPPI)-6400-PH


TABLE 1—COMPARISON OF STANDARD AND SOURCE-SYNCHRONOUS INTERFACES


Synchronous interface                          Source-synchronous interface
=======================                        ================================

Limits the time of flight between two ICs      Has no time-of-flight limit between two ICs
to a clock period


Requires clock-skew control                    Requires no clock-skew control between ICs

Presents no interface-synchronization          Presents an interface synchronization challenge

challengefor multiple-RAM interface            for interface with two or more RAMs

Increases pin count for interface to           Increases frequency to increase total

increase total bandwidth for interface         bandwidth for interface

However,

However, source-synchronous interfaces create new design-analysis challenges. Interface latency is not necessarily predictable; if your design requires predictable latency, overall interface latency increases. Increases in I/O speeds require more robust IC-package electrical performance. Because the I/O frequency can be much higher than that of the core logic, I/O-interface-logic complexity must grow to handle the frequency multiplication. Data bit-to-bit timing skews and "eye patterns" define overall link-operation frequencies, whereas you may have previously ignored these effects.

Implementing interfaces:

DDR interfaces transmit data on both edges of the clock, or "strobe." These types of interfaces offer a straightforward way to increase the bandwidth to various memory subsystems, such as levels 2 and 3 cache, main memory, and frame-buffer memory, and build on the foundation of the previous-generation single-data-rate interface. The trade-off, however, is often a more complex interface-agent RAM port, and latency prediction becomes more difficult because of the asynchronous nature of the data reception.

The current standard DDR SDRAM includes both an address/control interface and a data interface (Figure 2). Data transfers for reads and writes on both edges of a DQS (data-I/O) bidirectional strobe. Address and control signals transmit at half the data frequency and latch on only the rising edge of the transmit clock. Several design issues complicate the analysis of this interface. Any timing skews or uncertainties, such as pulse-width distortion and jitter on CLK and DQS, cause data- and address-timing problems at the SDRAM input and at the memory-agent IC's synchronization flip-flop. DQS' bidirectional and random nature further worsens its jitter component. In contrast, the CLK signal is unidirectional and of constant frequency.
For this interface, data and DQS synchronously and inphase exit the SDRAM (Figure 3). You must delay DQS to create data setup-and-hold time at the synchronization flip-flop. Possible delay techniques include using a digital-delay-locked loop (DLL) or PLL within the interface agent or using a pc-board etch-delay line. All of these techniques work, but none is flexible; once you implement these techniques, they lock the interface into an operating-frequency range. In addition, the DLL or PLL may be board-space-prohibitive for designs requiring multiple SDRAMs. Each SDRAM would require two DLLs or PLLs on the interface-agent IC (Figure 4).
Target data rates for DDR SDRAMs are 250 Mbps and greater, translating to clock frequencies in excess of 125 MHz. At these speeds, poorly terminated or unterminated lines exhibit signal-integrity effects that increase settling time. Lines that approach the tuned resonant or quarter- and half-wavelengths of the clock frequencies are key factors in settling-time jitter for poorly terminated lines. For a 125-MHz DDR SDRAM, the tuned-resonant lengths in FR4 stripline etch for the 250-Mbps data line are 5.71 and 11.43 in., not accounting for package delays. At these lengths, driver and receiver reflections superimpose on rising and falling edges of the next data bits, changing the measured rising- and falling-edge settling times (Figure 5).

Another example of settling-time jitter is a signal that does not stabilize to VOH (output high voltage) or VOL (output low voltage) before the next transition occurs. Such effects are eye patterns, or "intersymbol interference" (Figure 6a). As line length and topology become more complex, network termination becomes crucial in limiting jitter and its effects. What's the "eye"? As an example, a 200-MHz data bus has a maximum data-toggle rate of 1 bit every 5 nsec. Take a look at the voltage in the time domain at the receiver input, and you can see rising and falling edges with highs and lows.
Now, take 10-nsec slices of the time domain, and take those 5-nsec partitions, and pile them up like a deck of cards. The edges cross, and the ends are the dc high and low voltages. The area where no signal trace exists between the rising and falling edges and the highest low and lowest high is the eye. If you place the clock edge so that it rises in the middle of the eye, you can latch settled data, assuming that the rising/falling edge before the clock meets setup time and that the following edge meets hold time. Terminated lines increase the eye size, thereby increasing setup-and-hold time, allowing your interface to run more reliably and enabling you to increase its speed (Figure 6b).

DDR-SDRAM-design analysis

Interface-design analysis consists of signal quality, interface timing, and interface synchronization. Signal-line topology, pc-board routing and construction, and IC-package electrical parasitics all influence signal quality. Using a pseudorandom pattern sequence, you can characterize overshoot, eye-pattern jitter, and eye-pattern closure for a given signal topology (Figure 7).
You can determine appropriate line termination by examining operational-frequency targets. The DDR-SDRAM interface does not lend itself to parallel data-bus termination because it is bidirectional. Series termination, ideally within the driver to eliminate separate passive components on the pc board, is a more appropriate scheme (Figure 8). However, the tolerance of the series output resistance limits the effectiveness of series termination within the driver. Typical process limitations are ±22%, a wider tolerance than the process variation on discrete resistors. As operational speed increases in the future to more than 500 Mbps per I/O buffer, series-resistor tolerance will become a strong definer of eye-pattern jitter and closure.
Three main paths require analysis for the interface, and each of these paths further breaks down into three sections. Each timing path contains transmitter-, interconnect-, and receiver-timing components. Transmitter timing consists of all possible components of timing jitter and skews within the transmitting IC that would subtract from either setup or hold at the synchronizing latch within the receiving IC. Interconnect timing comprises all jitter and skew components of the signal trace, and receiver timing comprehends these same components within the receiving IC itself.
The goal of timing analysis is to achieve non-negative setup-and-hold margins using a summation of all worst-case effects. If robust system-level error detection and correction allow for an occasional bit error, you can employ statistical timing analysis. For DDR-SDRAM timing, pay attention to the data-write, data-read, and address signal paths. Robust data timing is typically the hardest to achieve because of the dual-edge latching and high-speed nature of these signals. Good driver design and proper signal topology often solve challenging multiload-address-bus timing problems.
The transmitter-timing parameters for the memory-controller ASIC in the following design example come from a design that TriCN Associates did with Nvidia (http://www.nvidia.com/), modified to guardband results. DDR-SDRAM data comes from multiple DRAM vendors' specifications and Spice models; Table 2, Table 3, and Table 4 report the worst-case results. Interconnect-timing parameters are the results of worst-case analysis of all timing paths that use multiple SDRAM vendors and one memory-controller ASIC as a baseline.
The results consolidate into a worst-case analysis of timing for setup-and-hold data with both the ASIC and the SDRAM driving the interface. Using faster SDRAMs results in interfaces with improved timing margin, but this analysis demonstrates that any SDRAM vendor can provide a DDR interface that meets the operational-frequency target. All setup-and-hold-timing data in Table 2, Table 3, and Table 4 comes from extracted pc-board layouts that were then simulated in Spice using 3 sigma error margin.

Data-write timing

Write timing includes interface-agent output-drive timing, interconnect timing, and DDR-SDRAM input-receive timing (Table 2). The interface agent must minimize the overall skew and jitter between the data bits (DQ) and the strobe. Skew components come from CLK-to-data and tPD delay (propagation-delay) differences in the flip-flops, boundary-scan components, and output driver. Jitter can come from the PLL or oscillator, as well as from ac fluctuations on power supplies due to core and output switching events.
The interconnect-timing components originate with trace-length and dielectric-constant differences between data lines in the pc board and package. If you use a delay line to push out the strobe, strobe-centering errors occur due to dielectric-constant variations over all manufacturing-tolerance ranges. The final component of timing error for the interconnect is eye-pattern jitter on both the data and the strobe. This error arises from signal-integrity variations for random pattern sequences on either terminated or unterminated lines.
Receiver timing is DDR-SDRAM-vendor-specific. In this design example, the SDRAM places 800-psec-setup- and 400-psec-hold-time requirements on the data with respect to the strobe.

Data-read timing

Read timing breaks down into interface-agent receive timing, interconnect timing, and DDR-SDRAM output-drive timing (Table 3). The DDR-SDRAM data-output drive skews with respect to the data strobe, and you should replace the typical output skews in this example with more exact numbers from your DRAM vendor. Interconnect-timing components are identical in cause and resolution to the data-write timings.
The interface agent must minimize the overall skew and jitter between the DQ and the strobe in the receiving block. Skew components come from tPD differences in the boundary-scan components, input receiver, and strobe-routing skew. The setup-and-hold times of the latching flip-flop directly contribute to the timing budget, and you should also minimize them.

Address timing

Address timing, like data-write timing, includes interface-agent output-drive timing, interconnect timing, and DDR-SDRAM input-receive timing (Table 4). Receiver timing comes from the DDR-SDRAM vendor. This example places 2000-psec-setup- and 1000-psec-hold-time requirements on the data with respect to CLK.
All paths analyzed under three sigma conditions for silicon process, pc-board process, voltage, and temperature in this case study show that you can implement a DDR-SDRAM interface with no less than 7% of performance margin for all timing paths. As DDR-SDRAM vendors improve input and output timing specifications, this analysis shows that performance for these interfaces will rapidly approach 500-Mbps bandwidth.

VLSI ASIC & FPGA

Monday, January 28, 2008

SOURCE SYNCHRONOUS INTERFACES

1 comment:

Blog Archive

About Me