Tuesday, January 29, 2008

Flip Flops & Register timings

Flip Flop timings:

The three important timings of a flip flop are

- Setup time

- Hold time

- Negative Hold time
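
As a rough illustration of how these three timings relate, the sketch below checks whether a data signal is stable over the window a flip flop needs around its active clock edge. The function name, the picosecond units, and the example numbers are assumptions made for illustration; they do not come from any particular cell library.

```python
def meets_setup_hold(stable_from, stable_until, clk_edge, t_setup, t_hold):
    """Return True if data is stable over [clk_edge - t_setup, clk_edge + t_hold].

    All times are in picoseconds (an assumed unit). t_hold may be negative,
    in which case the required-stable window ends before the clock edge.
    """
    window_start = clk_edge - t_setup
    window_end = clk_edge + t_hold
    return stable_from <= window_start and stable_until >= window_end


# Hypothetical numbers: clock edge at 1000 ps, setup 100 ps.
ok_positive_hold = meets_setup_hold(850, 1060, 1000, t_setup=100, t_hold=50)
early_change_positive_hold = meets_setup_hold(850, 980, 1000, t_setup=100, t_hold=50)
early_change_negative_hold = meets_setup_hold(850, 980, 1000, t_setup=100, t_hold=-30)
```

With a positive hold time the data that changes 20 ps before the edge fails the check; with a hold time of -30 ps the same data passes, because the window already closed at 970 ps.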

Divide by 3.5 clock divider

- Divide by 3 first and add the negedge flop in series to make divide by 3.5

A simple divide by 3 counter can be written as follows

=== Divide by 3 ===
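always @ (posedge clk or negedge reset)
  if (!reset) Q[1:0] <= 2'b0;                      // active-low asynchronous reset
  else        Q[1:0] <= {Q[0], (!Q[1] & !Q[0])};   // next state of the 2-bit counter

The above code steps Q[1:0] through the sequence 00, 01, 10, 00.

// Copy of Q[0] delayed by half a clock, captured on the falling edge
always @(negedge clk)
  Q0_inv <= Q[0];

// OR-ing the two phases gives a divide by 3 output with 50% duty cycle
assign divide_by_3 = Q[0] | Q0_inv;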

====End of Divide by 3===

always @ (negedge clk)
divide_by_dot5 <= divide_by_3 ;

assign divide_by_3dot5 = divide_by_3 | divide_by_dot5;

===== End of Divide by 3.5====
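
The behaviour of the snippet above can be sanity-checked with a small half-cycle model. This is only a sketch in Python, not a replacement for an HDL simulator: it assumes Q0_inv and divide_by_dot5 start at 0 (the source gives them no reset), and the function names are made up for illustration. In this model divide_by_3 comes out as a 50% duty cycle waveform with a period of 3 clocks, while the final OR-ed output is high for 2.5 clocks and low for 0.5 clock, so its period is still 3 clocks. It is worth confirming the waveform in a real simulator before relying on the 3.5 ratio.

```python
def simulate(n_cycles):
    """Model the flops above; returns two waveforms sampled every half clock."""
    q1 = q0 = q0_inv = dot5 = 0  # assumed initial state after reset
    div3_wave, div3dot5_wave = [], []
    for _ in range(n_cycles):
        # posedge clk: the 2-bit counter advances 00 -> 01 -> 10 -> 00
        q1, q0 = q0, int(not q1 and not q0)
        div3 = q0 | q0_inv
        div3_wave.append(div3)
        div3dot5_wave.append(div3 | dot5)
        # negedge clk: both negedge flops sample the values seen before the edge
        q0_inv, dot5 = q0, (q0 | q0_inv)
        div3 = q0 | q0_inv
        div3_wave.append(div3)
        div3dot5_wave.append(div3 | dot5)
    return div3_wave, div3dot5_wave


def period_in_clocks(wave):
    """Distance between the first two rising edges, in input clock periods."""
    rises = [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]
    return (rises[1] - rises[0]) / 2


div3, div3dot5 = simulate(9)
div3_period = period_in_clocks(div3)          # 3.0 in this model
div3dot5_period = period_in_clocks(div3dot5)  # also 3.0 in this model
```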

Monday, January 28, 2008

SOURCE SYNCHRONOUS INTERFACES

Source synchronous designs

Traditional interfaces limit interconnect speed to less than 250 MHz and pc-board-interconnect length to approximately 5 in. Designers are increasingly turning to source-synchronous interconnects that demonstrate transfer rates of 1 billion transitions/sec at distances of 5m and greater.

Several examples of source-synchronous technology exist. Their implementations affect design complexity and overall performance. Within memory subsystems, major examples include double-data-rate (DDR) SRAM, DDR synchronous DRAM (SDRAM), synchronous-graphics RAM, and Direct Rambus DRAM.
For networking and I/O, examples include the scalable coherent interface (SCI), Silicon Graphics' (http://www.sgi.com/) CrayLink, and High Performance Parallel Interface (HIPPI)-6400-PH.




TABLE 1: COMPARISON OF STANDARD AND SOURCE-SYNCHRONOUS INTERFACES

Synchronous interface                        Source-synchronous interface
=====================                        ============================

Limits the time of flight between two ICs    Has no time-of-flight limit between two ICs
to a clock period

Requires clock-skew control                  Requires no clock-skew control between ICs

Presents no interface-synchronization        Presents an interface-synchronization challenge
challenge for multiple-RAM interface         for interface with two or more RAMs

Increases pin count for interface to         Increases frequency to increase total
increase total bandwidth for interface       bandwidth for interface


However, source-synchronous interfaces create new design-analysis challenges. Interface latency is not necessarily predictable; if your design requires predictable latency, overall interface latency increases. Increases in I/O speeds require more robust IC-package electrical performance. Because the I/O frequency can be much higher than that of the core logic, I/O-interface-logic complexity must grow to handle the frequency multiplication. Data bit-to-bit timing skews and "eye patterns" define overall link-operation frequencies, whereas you may have previously ignored these effects.

Implementing interfaces:

DDR interfaces transmit data on both edges of the clock, or "strobe." These types of interfaces offer a straightforward way to increase the bandwidth to various memory subsystems, such as levels 2 and 3 cache, main memory, and frame-buffer memory, and build on the foundation of the previous-generation single-data-rate interface. The trade-off, however, is often a more complex interface-agent RAM port, and latency prediction becomes more difficult because of the asynchronous nature of the data reception.

The current standard DDR SDRAM includes both an address/control interface and a data interface (Figure 2). Data transfers for reads and writes on both edges of a DQS (data-I/O) bidirectional strobe. Address and control signals transmit at half the data frequency and latch on only the rising edge of the transmit clock. Several design issues complicate the analysis of this interface. Any timing skews or uncertainties, such as pulse-width distortion and jitter on CLK and DQS, cause data- and address-timing problems at the SDRAM input and at the memory-agent IC's synchronization flip-flop. DQS' bidirectional and random nature further worsens its jitter component. In contrast, the CLK signal is unidirectional and of constant frequency.
For this interface, data and DQS exit the SDRAM synchronously and in phase (Figure 3). You must delay DQS to create data setup-and-hold time at the synchronization flip-flop. Possible delay techniques include using a digital-delay-locked loop (DLL) or PLL within the interface agent or using a pc-board etch-delay line. All of these techniques work, but none is flexible; once you implement these techniques, they lock the interface into an operating-frequency range. In addition, the DLL or PLL may be board-space-prohibitive for designs requiring multiple SDRAMs. Each SDRAM would require two DLLs or PLLs on the interface-agent IC (Figure 4).
Target data rates for DDR SDRAMs are 250 Mbps and greater, translating to clock frequencies in excess of 125 MHz. At these speeds, poorly terminated or unterminated lines exhibit signal-integrity effects that increase settling time. Lines that approach the tuned resonant or quarter- and half-wavelengths of the clock frequencies are key factors in settling-time jitter for poorly terminated lines. For a 125-MHz DDR SDRAM, the tuned-resonant lengths in FR4 stripline etch for the 250-Mbps data line are 5.71 and 11.43 in., not accounting for package delays. At these lengths, driver and receiver reflections superimpose on rising and falling edges of the next data bits, changing the measured rising- and falling-edge settling times (Figure 5).
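
The two lengths quoted above can be reproduced with a short calculation. The sketch below assumes a stripline propagation delay of 175 ps per inch, a value chosen here because it matches the quoted figures, and it reads the two lengths as one quarter and one half of the 4-nsec bit time of a 250-Mbps line. Both the delay value and that reading are assumptions, not statements from the article.

```python
def bit_time_ps(data_rate_mbps):
    """Duration of one bit in picoseconds for a given data rate in Mbps."""
    return 1e6 / data_rate_mbps


def line_length_in(delay_ps, prop_delay_ps_per_in=175.0):
    """Trace length in inches whose one-way delay equals delay_ps.

    prop_delay_ps_per_in is an assumed FR4 stripline figure.
    """
    return delay_ps / prop_delay_ps_per_in


bit_time = bit_time_ps(250)                            # 4000 ps
quarter_len = round(line_length_in(bit_time / 4), 2)   # 5.71 in.
half_len = round(line_length_in(bit_time / 2), 2)      # 11.43 in.
```

At the longer length the round trip on the line takes one full bit time, which is consistent with reflections landing on the edges of the next data bit.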

Another example of settling-time jitter is a signal that does not stabilize to VOH (output high voltage) or VOL (output low voltage) before the next transition occurs. Such effects are eye patterns, or "intersymbol interference" (Figure 6a). As line length and topology become more complex, network termination becomes crucial in limiting jitter and its effects.

What's the "eye"? As an example, a 200-MHz data bus has a maximum data-toggle rate of 1 bit every 5 nsec. Take a look at the voltage in the time domain at the receiver input, and you can see rising and falling edges with highs and lows.
Now, slice the time domain into 5-nsec partitions, and pile those partitions up like a deck of cards. The edges cross, and the ends are the dc high and low voltages. The area where no signal trace exists between the rising and falling edges and the highest low and lowest high is the eye. If you place the clock edge so that it rises in the middle of the eye, you can latch settled data, assuming that the rising/falling edge before the clock meets setup time and that the following edge meets hold time. Terminated lines increase the eye size, thereby increasing the time available for setup and hold, allowing your interface to run more reliably and enabling you to increase its speed (Figure 6b).
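
The clock-in-the-middle-of-the-eye argument can be put into numbers with a very simple model. In the sketch below the eye width is taken as the bit time minus the peak-to-peak edge jitter, and the clock edge sits at the centre of the eye. The 5-nsec bit time comes from the example above; the jitter, setup, and hold values are hypothetical.

```python
def eye_margins(bit_time_ps, edge_jitter_pp_ps, t_setup_ps, t_hold_ps):
    """Return (eye_width, setup_margin, hold_margin) in picoseconds.

    Simplified model: the eye is what is left of the bit time after the
    peak-to-peak edge jitter, and the clock edge is centred in the eye.
    """
    eye_width = bit_time_ps - edge_jitter_pp_ps
    half_eye = eye_width / 2
    return eye_width, half_eye - t_setup_ps, half_eye - t_hold_ps


# 5000-ps bit time from the 200-MHz example; other numbers are made up.
eye_width, setup_margin, hold_margin = eye_margins(5000, 1200, 800, 400)
# eye_width = 3800, setup_margin = 1100.0, hold_margin = 1500.0
```

Reducing the jitter term, for instance by terminating the line, widens the eye and raises both margins by the same amount.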

DDR-SDRAM-design analysis

Interface-design analysis consists of signal quality, interface timing, and interface synchronization. Signal-line topology, pc-board routing and construction, and IC-package electrical parasitics all influence signal quality. Using a pseudorandom pattern sequence, you can characterize overshoot, eye-pattern jitter, and eye-pattern closure for a given signal topology (Figure 7).
You can determine appropriate line termination by examining operational-frequency targets. The DDR-SDRAM interface does not lend itself to parallel data-bus termination because it is bidirectional. Series termination, ideally within the driver to eliminate separate passive components on the pc board, is a more appropriate scheme (Figure 8). However, the tolerance of the series output resistance limits the effectiveness of series termination within the driver. Typical process limitations are ±22%, a wider tolerance than the process variation on discrete resistors. As operational speed increases in the future to more than 500 Mbps per I/O buffer, series-resistor tolerance will become a strong definer of eye-pattern jitter and closure.
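
To get a feel for what a ±22% tolerance means for series termination, the sketch below uses the standard transmission-line relations for the launched voltage fraction and the source reflection coefficient. It assumes a 50-ohm line and a total source resistance that nominally equals the line impedance, and it applies the full tolerance to that total resistance. These are illustrative assumptions, not figures from the design example.

```python
def launch_and_reflection(r_source_ohm, z0_ohm=50.0):
    """Return (launched fraction of the driver swing, source reflection coefficient)."""
    launch = z0_ohm / (r_source_ohm + z0_ohm)
    gamma_source = (r_source_ohm - z0_ohm) / (r_source_ohm + z0_ohm)
    return launch, gamma_source


Z0 = 50.0  # assumed line impedance in ohms
results = {
    tol: launch_and_reflection(Z0 * (1 + tol), Z0)
    for tol in (-0.22, 0.0, 0.22)
}
# Nominal: half the swing is launched and nothing re-reflects at the source.
# At -22% roughly 0.56 of the swing is launched (gamma about -0.12);
# at +22% roughly 0.45 is launched (gamma about +0.10).
```

A nonzero source reflection coefficient means part of the wave returning from the far end bounces back onto the line, which is one way the resistor tolerance turns into eye-pattern jitter.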
Three main paths require analysis for the interface, and each of these paths further breaks down into three sections. Each timing path contains transmitter-, interconnect-, and receiver-timing components. Transmitter timing consists of all possible components of timing jitter and skews within the transmitting IC that would subtract from either setup or hold at the synchronizing latch within the receiving IC. Interconnect timing comprises all jitter and skew components of the signal trace, and receiver timing comprehends these same components within the receiving IC itself.
The goal of timing analysis is to achieve non-negative setup-and-hold margins using a summation of all worst-case effects. If robust system-level error detection and correction allow for an occasional bit error, you can employ statistical timing analysis. For DDR-SDRAM timing, pay attention to the data-write, data-read, and address signal paths. Robust data timing is typically the hardest to achieve because of the dual-edge latching and high-speed nature of these signals. Good driver design and proper signal topology often solve challenging multiload-address-bus timing problems.
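
The difference between a worst-case summation and a statistical analysis can be sketched as follows. The worst-case margin subtracts the plain sum of the transmitter, interconnect, and receiver components from the available window. The statistical version shown here uses a root-sum-square combination, which assumes the components are independent and roughly Gaussian; that choice, the function names, and all numbers are assumptions for illustration.

```python
import math


def worst_case_margin(window_ps, components_ps):
    """Margin when every timing component hits its worst case at once."""
    return window_ps - sum(components_ps)


def statistical_margin(window_ps, components_ps):
    """Margin with a root-sum-square combination (assumes independent terms)."""
    return window_ps - math.sqrt(sum(c * c for c in components_ps))


# Hypothetical budget: 2000-ps window, transmitter 300 ps,
# interconnect 400 ps, receiver 1200 ps.
components = [300, 400, 1200]
wc = worst_case_margin(2000, components)     # 100
stat = statistical_margin(2000, components)  # 700.0
```

The statistical margin is always at least as large as the worst-case one, which is why it is only appropriate when the system can tolerate an occasional bit error.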
The transmitter-timing parameters for the memory-controller ASIC in the following design example come from a design that TriCN Associates did with Nvidia (http://www.nvidia.com/), modified to guardband results. DDR-SDRAM data comes from multiple DRAM vendors' specifications and Spice models; Table 2, Table 3, and Table 4 report the worst-case results. Interconnect-timing parameters are the results of worst-case analysis of all timing paths that use multiple SDRAM vendors and one memory-controller ASIC as a baseline.
The results consolidate into a worst-case analysis of timing for setup-and-hold data with both the ASIC and the SDRAM driving the interface. Using faster SDRAMs results in interfaces with improved timing margin, but this analysis demonstrates that any SDRAM vendor can provide a DDR interface that meets the operational-frequency target. All setup-and-hold-timing data in Table 2, Table 3, and Table 4 comes from extracted pc-board layouts that were then simulated in Spice using 3 sigma error margin.

Data-write timing

Write timing includes interface-agent output-drive timing, interconnect timing, and DDR-SDRAM input-receive timing (Table 2). The interface agent must minimize the overall skew and jitter between the data bits (DQ) and the strobe. Skew components come from CLK-to-data and tPD delay (propagation-delay) differences in the flip-flops, boundary-scan components, and output driver. Jitter can come from the PLL or oscillator, as well as from ac fluctuations on power supplies due to core and output switching events.
The interconnect-timing components originate with trace-length and dielectric-constant differences between data lines in the pc board and package. If you use a delay line to push out the strobe, strobe-centering errors occur due to dielectric-constant variations over all manufacturing-tolerance ranges. The final component of timing error for the interconnect is eye-pattern jitter on both the data and the strobe. This error arises from signal-integrity variations for random pattern sequences on either terminated or unterminated lines.
Receiver timing is DDR-SDRAM-vendor-specific. In this design example, the SDRAM places 800-psec-setup- and 400-psec-hold-time requirements on the data with respect to the strobe.

Data-read timing

Read timing breaks down into interface-agent receive timing, interconnect timing, and DDR-SDRAM output-drive timing (Table 3). The DDR-SDRAM data-output drive skews with respect to the data strobe, and you should replace the typical output skews in this example with more exact numbers from your DRAM vendor. Interconnect-timing components are identical in cause and resolution to the data-write timings.
The interface agent must minimize the overall skew and jitter between the DQ and the strobe in the receiving block. Skew components come from tPD differences in the boundary-scan components, input receiver, and strobe-routing skew. The setup-and-hold times of the latching flip-flop directly contribute to the timing budget, and you should also minimize them.

Address timing

Address timing, like data-write timing, includes interface-agent output-drive timing, interconnect timing, and DDR-SDRAM input-receive timing (Table 4). Receiver timing comes from the DDR-SDRAM vendor. This example places 2000-psec-setup- and 1000-psec-hold-time requirements on the data with respect to CLK.
All paths analyzed under three sigma conditions for silicon process, pc-board process, voltage, and temperature in this case study show that you can implement a DDR-SDRAM interface with no less than 7% of performance margin for all timing paths. As DDR-SDRAM vendors improve input and output timing specifications, this analysis shows that performance for these interfaces will rapidly approach 500-Mbps bandwidth.
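
One way to compare the receiver requirements quoted above is to express setup plus hold as a fraction of the interval each signal has available. The sketch below assumes the 250-Mbps data rate and 125-MHz clock mentioned earlier as target figures, giving a 4000-ps data bit time and an 8000-ps address period; the excerpt does not state the exact operating point of the design example, so treat these intervals as assumptions.

```python
def receiver_window_fraction(t_setup_ps, t_hold_ps, interval_ps):
    """Fraction of the available interval consumed by receiver setup plus hold."""
    return (t_setup_ps + t_hold_ps) / interval_ps


# Data: 800-ps setup, 400-ps hold, assumed 4000-ps bit time.
data_fraction = receiver_window_fraction(800, 400, 4000)       # 0.3
# Address: 2000-ps setup, 1000-ps hold, assumed 8000-ps clock period.
address_fraction = receiver_window_fraction(2000, 1000, 8000)  # 0.375
```

Whatever remains after the receiver's share has to cover all transmitter and interconnect skew and jitter on that path.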

Latch fundamentals

Difference between latch & flip flop

 Latches are level sensitive, i.e. the output follows the input while the clock signal is high, so as long as the clock is logic 1, the output changes whenever the input changes.
 Flip-flops are edge sensitive, i.e. a flip flop stores the input only when there is a rising or falling edge of the clock.
 A positive level latch is transparent while the enable is at the positive level, and it latches the last input value present just before the level changes (i.e. before enable goes to '0' or before the clock goes to the negative level).
 A positive edge flop updates its output only when the clock input changes from '0' to '1' ('1' to '0' for a negative edge flop).
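
The difference described above can be seen in a small behavioural model. This is a sketch in Python rather than RTL: the waveforms are lists with one sample per time step, both elements start at 0, and the flop captures the D value present at the sample where the rising edge is seen. All of these modelling choices are assumptions for illustration.

```python
def d_latch(clk_wave, d_wave):
    """Positive level-sensitive latch: output follows D while clk is 1."""
    q, out = 0, []
    for clk, d in zip(clk_wave, d_wave):
        if clk:  # transparent while the enable is high
            q = d
        out.append(q)
    return out


def d_flop(clk_wave, d_wave):
    """Positive edge-triggered flop: output updates only on a 0 -> 1 clk change."""
    q, prev_clk, out = 0, 0, []
    for clk, d in zip(clk_wave, d_wave):
        if clk and not prev_clk:  # rising edge detected
            q = d
        prev_clk = clk
        out.append(q)
    return out


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 0, 0, 1]
latch_q = d_latch(clk, d)  # [0, 1, 0, 1, 1, 1, 0, 1]
flop_q = d_flop(clk, d)    # [0, 1, 1, 1, 1, 1, 0, 0]
```

While the clock is high, the latch output tracks every change on D, whereas the flop output only moves at the two rising edges.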

Advantages of latch design
 Latches are faster, flip flops are slower.
 Latches take fewer gates (less power) to implement than flip-flops.
 Latches facilitate time borrowing or cycle stealing, whereas flip flops allow synchronous logic.

Latch timings (Recovery and Removal)
Recovery Time
 Recovery specifies the minimum time that an asynchronous control input pin must be held stable after being de-asserted and before the next clock (active-edge) transition.
 Recovery time specifies the time the inactive edge of the asynchronous signal has to arrive before the closing edge of the clock.
 Recovery time is the minimum length of time an asynchronous control signal (e.g. preset) must be stable before the next active clock edge. The recovery slack time calculation is similar to the clock setup slack time calculation, but it applies to asynchronous control signals.

Removal Time
 Removal specifies the minimum time that an asynchronous control input pin must be held stable before being de-asserted and after the previous clock (active-edge) transition.
 Removal time specifies the length of time the active phase of the asynchronous signal has to be held after the closing edge of clock.
 Removal time is the minimum length of time an asynchronous control signal must be stable after the active clock edge. The calculation is similar to the clock hold slack calculation, but it applies to asynchronous control signals.
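
The two definitions above map onto slack calculations that mirror setup and hold. The sketch below is a minimal version with made-up function names and numbers: recovery slack is measured from the de-assertion of the asynchronous signal to the next active clock edge, and removal slack from the previous active clock edge to the de-assertion. A negative result means a violation.

```python
def recovery_slack(deassert_time_ps, next_clk_edge_ps, t_recovery_ps):
    """Setup-like check: de-assertion must precede the next edge by t_recovery."""
    return (next_clk_edge_ps - deassert_time_ps) - t_recovery_ps


def removal_slack(deassert_time_ps, prev_clk_edge_ps, t_removal_ps):
    """Hold-like check: de-assertion must follow the previous edge by t_removal."""
    return (deassert_time_ps - prev_clk_edge_ps) - t_removal_ps


# Hypothetical case: active edges at 0 ps and 10000 ps, reset released at 9700 ps.
rec = recovery_slack(9700, 10000, t_recovery_ps=400)  # -100, a recovery violation
rem = removal_slack(9700, 0, t_removal_ps=300)        # 9400, removal is met
```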

Time borrowing in latches
Time borrowing is a concept used in latch-based pipelines, in which you typically have 2 stages of combinational logic surrounded by latches. If the first piece of combinational logic has a much longer delay than the second one, the first part can borrow some of the time allotted to the second part.

Example :
R2 = R0 * R1

Assume the instruction requires two cycles to multiply and store the final result into R2. If the store to R2 can be done in half a cycle, then the multiplication can be allowed to extend to 1 1/2 cycles.

Hence, we can have

Pos latch - Mul combo1 - Neg Latch - Mul combo2 - Pos Latch - No logic - Neg Latch - Store combo - Pos Latch
with Mul combo1+Mul combo 2 < 1.5 clk cycles.
The multiply need be finished only before the second pos latch closes, so that the correct data passes onto the neg latch.

Instead of
Pos Reg - Mul combo1&2 - Pos Reg - Store Combo - Pos Reg

However, not many people use time borrowing in multipliers, for reasons evident from the following section of code, where R2 is required to be ready in 1 cycle.
R2 = R0 * R1
R4 = R2 * R3
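
The budget in the multiply-and-store example can be checked with a small idealized calculation. The sketch below ignores latch and flop setup times and propagation delays, which a real analysis would include, and the function names and delay values are hypothetical. It compares the flop-based pipeline, where each stage must fit in one cycle, with the latch-based arrangement, where the multiply may take up to 1.5 cycles as long as the store fits in the remaining half cycle.

```python
def fits_flop_pipeline(mul_delay, store_delay, period):
    """Edge-triggered registers: each stage must fit in its own clock period."""
    return mul_delay <= period and store_delay <= period


def fits_latch_pipeline(mul_delay, store_delay, period):
    """Latch-based version: multiply under 1.5 cycles, store within half a cycle."""
    return mul_delay < 1.5 * period and store_delay <= 0.5 * period


# Hypothetical delays in ns with a 10-ns clock.
flop_ok = fits_flop_pipeline(mul_delay=14, store_delay=4, period=10)    # False
latch_ok = fits_latch_pipeline(mul_delay=14, store_delay=4, period=10)  # True
```

Both arrangements still take two cycles in total; the latch version only moves the boundary between the two pieces of logic.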

Design and scan issues with latches
 Latches are not friendly with DFT tools. Minimize inferring latches if your design has to be testable, because the enable signal to a latch is not a regular clock that is fed to the rest of the logic. To ensure testability, OR the "enable" and "scan_enable" signals and feed the result to the enable port of the latch.
 Static timing analyzers typically make assumptions about latch transparency. If one assumes the latch is transparent (i.e. triggered by the active time of the clock, not just by the clock edge), then the tool may find a false timing path through the input data pin. If one assumes the latch is not transparent, then the tool may miss a critical path.
 If the target technology supports a latch cell, race-condition problems are minimized. If it does not, the synthesis tool builds the latch from basic gates, which is prone to race conditions. You then need to add redundant logic to overcome this problem, but the synthesis tool can remove that redundant logic during optimization, which creates endless problems for the design team.
 Latches should not be used unless absolutely necessary. In most cases, a flip-flop will work just as well. When synthesizing designs, be especially careful to avoid accidentally inferring a latch when one is not intended. The problem with latches centers around the transparency issue. In the circuit shown in Figure, if Gate A and Gate B both go low, we might have an oscillator.


Wednesday, January 23, 2008

VLSI Knowledge

Digital Basics
Latch fundamentals
Flip Flops
Different types of flip flops, RS, JK, D, T etc.
Conversion of one flip flop to other
Registers with Flip Flops (timing requirements)
Setup time requirement
Hold time requirement
Negative hold time : The value on the input can change *before* the clock edge, and the previous value is still captured and carried to the output. It means that the window in which the input signal has to be held stable ends before the arrival of the clock edge. This happens if the propagation delay on the data path is guaranteed to be longer than the propagation delay of the clock to the same flip-flop. It is far easier to guarantee relative delays on a monolithic chip than it is to guarantee minimum delays at the board level, where you have different chips. For that reason, it is desirable for the chip manufacturers to make the hold times zero or negative by careful use of delays.

Counters

Clock dividers

Divide by 3.5 clock divider

State machines
Sequence detectors
Hazards
Adders, subtractors, multipliers, dividers

Thursday, January 17, 2008

VLSI ASIC & FPGA

VLSI





  • ASIC
    ASIC Design Verification fundamentals.


  • FPGA
Overview of FPGAs



Xilinx FPGAs:




Virtex4 FPGA

Virtex5 FPGA

Advantage of Virtex5 over Virtex4 FPGAs

Individual Block RAM Size
Virtex-5 - 36 Kbits, configurable as one 36 Kbits Block RAM or two independent 18 Kbits Block RAMs
Virtex-4 - 18 Kbits Block RAM

Number of Block RAM
Virtex-5 - 288
Virtex-4 - 552

Performance of BRAM
Virtex-5 - 550 MHz (-3 fast) / 500 MHz (-2 medium)
Virtex-4 - 500 MHz (-12 fast)
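
Because the block count is lower on Virtex-5 while each block is larger, it helps to multiply the two figures listed above to compare total capacity. The short calculation below only uses the counts and sizes quoted in this post; which specific devices those counts belong to is not stated here, so the totals are illustrative.

```python
def total_bram_kbits(block_count, kbits_per_block):
    """Total block RAM capacity in Kbits."""
    return block_count * kbits_per_block


virtex5_total = total_bram_kbits(288, 36)  # 10368 Kbits
virtex4_total = total_bram_kbits(552, 18)  # 9936 Kbits
```

With these numbers the Virtex-5 total is slightly higher even though it has roughly half as many blocks.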






Page under construction. Stay tuned for more updates on ASIC and FPGA.