## An Improved Latch for SerDes Interface: Design and Analysis under PVT and AC Noise

Mithilesh KUMAR, Abir J. MONDAL

Dept. of Electronics & Communication Engineering, NIT Arunachal Pradesh, Jote 791113, India

mithilesh02kumar93@gmail.com, abir jm@hotmail.com

Submitted September 3, 2022 / Accepted March 15, 2023 / Online first April 13, 2023

Abstract. Digital subsystems favor CMOS, but each generation of power supply voltage $(V_{dd})$ scaling makes the trade-off between speed and average power $(P_{avg})$ harder to manage. Current mode logic (CML) has emerged as an alternative for designing the fundamental block of a SerDes, namely the latch. However, available CML circuits consume significant $P_{avg}$ and suffer from rapid input slewing: fast-switching inputs let current flow toward the effective supply voltage $V_P$ and overcharge the output. $V_P$ differs from the externally applied $V_{dd}$ and oscillates with time whenever an abrupt current is drawn, which affects the delay $t_d$ and introduces jitter. This paper presents a new latch for SerDes interfaces that uses a new current steering circuit and is coupled to a power delivery network (PDN). The key result is an almost constant $t_d$, compared with conventional designs, as $V_{dd}$ changes. Post-layout results at 0.09-µm CMOS and 1.1 V $V_{dd}$ indicate $P_{avg}$ and $t_d$ of 339.5 $\mu$W and 61.9 ps, respectively, at 27 °C. The $t_d$ variation is minimal, and the power supply noise induced jitter is around 1.5 ns when $V_P$ close to the circuit varies due to a sudden current.

#### Keywords

PDN, latch, figure of merit, Monte Carlo, output noise, jitter

#### 1. Introduction

The increasing demand for high-speed communication links such as SerDes has driven the development of high-performance logic circuits with low power and high data rates [1]. High noise immunity and high clock rates are also essential for the basic building blocks. The most important building block of a SerDes is the latch, for which CML is preferred to meet the power requirements [2]. Static CMOS cannot operate at tens of Gbps. Even though nanometer-scale CMOS offers transition frequencies of about 300 GHz for n-channel and 200 GHz for p-channel devices, several drawbacks remain. The most significant is the reduction of $V_P$ [3]: $V_{dd}$ has been scaled down significantly to maintain the electric field in the channel region. This low-voltage constraint restricts the number of gates that can be stacked and makes conventional circuits unsuitable [4]. Further, short-channel effects degrade small-signal parameters, which lowers the intrinsic gain of the gates, degrades the noise margin and limits the use of CMOS in the nanometer era [5]. Although novel CML circuits have been presented, the problem of stacking several levels of gates under a low-voltage constraint has not been discussed. Novel approaches add gates to operate at low $V_{dd}$ and high speed [6], but these gates require bias circuitry, which complicates the design and increases layout area. For a given process and $V_{dd}$, CML latches are evaluated in terms of $P_{avg}$, $t_{d}$, figure of merit (FoM), power-delay product (PDP) and output noise. Even under identical bias conditions, a latch exhibits delay variations because of process, voltage and temperature (PVT) variations. In a SerDes, other circuits share a common $V_P$ with the latch, and they do not all operate at the same time [7]. When some circuits are suddenly turned on, an abrupt current is drawn, and $V_{\rm P}$ droops and oscillates with time due to $L\,\mathrm{d}I/\mathrm{d}t$ noise [8]. It is therefore important to understand how power supply noise (PSN) affects delay and how the resulting jitter varies with PSN.

This paper reports a latch for SerDes interfaces operating at 1 GHz with rise $(t_R)$ and fall $(t_F)$ times of 100 ps. With the latch coupled to a PDN, its performance is evaluated under PVT variations and while $V_{\rm P}$ droops and fluctuates. Because the circuits sharing $V_{\rm P}$ with the latch do not all operate at the same time, the sudden turn-on of any of them pumps in a current $I(t)$, causing $V_{\rm P}$ to fluctuate due to $L\,\mathrm{d}I/\mathrm{d}t$ noise. It is therefore necessary to understand the effect of the AC first droop on performance. The rest of the paper is organized as follows. Section 2 describes the operation of conventional latches in the said technology, their problems and how the present design addresses them. Section 3 elaborates the operation of the proposed design and derives an analytical delay model that shows the dependencies of the delay. Section 4 presents post-layout simulation results in 0.09-µm CMOS at 1.1 V $V_{dd}$. Section 5 details the operation and performance of a frequency divider (FD) built using the present latch. Finally, Section 6 concludes the paper.

#### 2. Literature

A low-voltage CML latch was proposed to estimate speed and power efficiency [9], and an analytical model was derived to optimize speed and to understand the power-delay dependency. Reference [10] presented high-speed regenerative latches that are robust against environmental noise sources. A high-speed CML latch using an active inductor was proposed to operate in 180-nm CMOS at a V<sub>dd</sub> of 1.8 V [11]. Multiplexer latches (Mux-latches) and flip-flops with high throughput were presented for serial links [12]; measurement results in 90-nm CMOS indicate almost bit-error-free operation at 6 Gbps. A low-voltage CML latch employing a triple tail was presented to reduce the required $V_{dd}$ [13]. Reference [14] proposed low-voltage high-speed CML latches in 40-nm CMOS at 1.1 V V<sub>dd</sub>. An error detecting latch (EDL) for timing-error resilient systems was presented in 28-nm CMOS, with 26% and 33% reductions in layout area and leakage power, respectively [15]. A CML latch exploiting dynamic body bias threshold lowering was designed in 40-nm CMOS at 0.6 V V<sub>dd</sub> [16]. An SR NOR latch using a silicon micro-ring resonator was presented to operate at 15 Gbps [17]. A reliable, low-cost latch in 45-nm CMOS was described that tolerates radiation-induced single event upsets caused by high-energy particles [18]. Charge steering latches in 28-nm FD-SOI CMOS were presented with a 40% power saving at 28 Gbps for wireline transceivers [19]. Two low-cost double- and triple-node-upset tolerant latches were presented for nano-scale CMOS [20]. Reference [21] presented an improved CML latch in a 180-nm CMOS process at 1.2 V V<sub>dd</sub> for high-speed applications. A CML latch employing forward body bias threshold lowering was described in 28-nm FD-SOI and 14-nm CMOS to operate at 0.6 V [22]. Reference [23] proposed a ternary D latch using graphene nanoribbon field effect transistors (GNRFET) that shows better $t_d$ and $P_{avg}$ at 16 nm.

Designs using CML or modified CML have been presented to improve performance in the said process and $V_{dd}$, but at the cost of circuit complexity. The gates added to control leakage current require precise biasing, which affects the output swing, $P_{avg}$, $t_d$ and FoM. Moreover, these modifications minimize either $P_{avg}$ or $t_d$, but not both, and no detail is available on this trade-off or on the effect of rapidly switching inputs. Even after architectural adjustments, data remain available at the gates performing the write operation during the read cycle. Consequently, inputs switching at GHz frequencies and their simultaneous availability at the write gates during the read cycle cause either current flow toward $V_{\rm P}$ or overcharging of the outputs. In a SerDes, other circuits share a common $V_{\rm P}$, the voltage close to the circuit, with the latch. Since they may not all operate at the same time, a sudden turn-on causes $V_{\rm P}$ to fluctuate due to $L\,\mathrm{d}I/\mathrm{d}t$ noise. The effect of the AC first droop, or power supply noise (PSN), on the delay and jitter of conventional designs has also not been reported.

The following discussion highlights the behavior of conventional designs (Fig. 1) coupled to a typical PDN when inputs (In1 and In2) switching at 1 GHz with $t_{\rm R}$, $t_{\rm F}$ of 100 ps are applied. Gate sizes, bias current and resistance values are chosen to obtain roughly full voltage swing, comparable average power and optimum delay. At 0.09-µm CMOS and 1.1 V $V_{\rm dd}$, post-layout simulation shows that the outputs (Out1 and Out2) deteriorate, Fig. 2. This is mainly attributed to the fast-switching inputs and their availability at the write gates $M_{\rm N1}$ and $M_{\rm N3}$ of Fig. 1 during the read cycle. Both effects allow a current to flow toward $V_{\rm P}$, which increases $P_{avg}$ and makes $t_d$ estimation quite difficult. In addition, NMOS M<sub>N4</sub> in [13] remains on, as governed by clocks Clk1 and Clk2. Similarly, apart from controlling the write or read cycle, the pair of gates in the folded logic stage (M<sub>P1</sub> and M<sub>N1</sub>) of [14] is also turned on by the clock. This creates a direct path between $V_P$ and ground and increases $P_{avg}$. The $P_{avg}$, $t_d$ and FoM are tabulated in the comparison section, highlighting the performance difference relative to the present design. This work avoids these issues with a latch designed around a new current steering circuit [24]. It also includes structural adjustments that keep the outputs from deteriorating due to fast-slewing inputs and that maintain full voltage swing. This minimizes average power and results in an almost flat delay. Subsequently, performance is evaluated under PVT and under PSN caused by the injection of a large I(t).

Fig. 1. Schematic of the conventional latches [13], [14], [21].

Fig. 2. Post-layout output transients of Fig. 1 [13], [14], [21].

#### 3. Proposed Latch

The proposed latch coupled to a common PDN, together with the current I(t) drawn by other circuits, is shown in Fig. 3(a) [8]. The package pin and socket that join the PCB to a stable $V_{dd}$ are modeled by inductance $L_{mb}$, capacitance $C_{mb}$ and resistance $R_{mb}$. The path to the power supply also contains resistance $R_{\rm skt}$ and inductance $L_{\rm skt}$. The package inductance connecting the C4 silicon bumps to the package decoupling capacitance $C_{pkg}$ is $L_{pkg}$, whereas the effective resistance of the decoupling capacitance and package traces is $R_{pkg}$. The via resistance and inductance are $R_{via}$ and $L_{via}$, respectively, and the die resistance and decoupling capacitance are $R_{die}$ and $C_{die}$, respectively. When other circuits are suddenly turned on, they draw a current I(t). $C_{die}$ keeps $V_P$ close to $V_{dd}$, and $R_{die}$ damps the oscillation peak. Nevertheless, the sudden I(t) causes $V_P$ to droop and oscillate at the PDN resonance frequency, which affects t<sub>d</sub>. The PDN emulates a typical CPU, in which 3–7 clock cycles are required to generate an I(t) of 10–30 A. The operation of the latch coupled to this PDN is described below to assess its performance across process, voltage and temperature and while $V_{\rm P}$ is drooping.
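To get a feel for the first droop described above, the sketch below collapses the PDN of Fig. 3(a) to a single loop: an ideal $V_{dd}$ feeding the die through $R_{pkg} + R_{via}$ and $L_{pkg} + L_{via}$, with $C_{die}$ and its series resistance $R_{die}$ at the $V_P$ node, using the values of Tab. 1. Dropping the board-level elements and $C_{pkg}$, and assuming a 10 A load step ramped over 3 ns (three cycles at 1 GHz), are simplifications made for illustration only; this is not the simulation setup used in this work. The sketch reports the loop resonance $1/(2\pi\sqrt{L C_{die}})$ and the minimum $V_P$ obtained with a semi-implicit Euler integration.

```python
import math

# Assumed single-loop reduction of the PDN in Fig. 3(a): ideal V_dd -> R_S + L_S
# -> node V_P, with C_die (plus its series resistance R_die) from V_P to ground
# and a load current I(t) ramping up to i_step. Element values taken from Tab. 1.
V_DD = 1.1                # V
L_S = 30e-12 + 20e-12     # L_pkg + L_via, H
R_S = 1e-3 + 0.2e-3       # R_pkg + R_via, ohm
C_DIE = 100e-9            # F
R_DIE = 1e-3              # ohm


def first_droop(i_step=10.0, t_ramp=3e-9, t_end=200e-9, dt=1e-12):
    """Return (LC resonance frequency in Hz, minimum V_P in V) for a ramped load step."""
    f_res = 1.0 / (2.0 * math.pi * math.sqrt(L_S * C_DIE))
    i_l = 0.0      # current through L_S, A (no load before the step)
    v_c = V_DD     # voltage across C_die, V
    v_min = V_DD
    for k in range(int(t_end / dt)):
        i_load = i_step * min(k * dt / t_ramp, 1.0)   # linear ramp, then constant
        v_p = v_c + R_DIE * (i_l - i_load)           # V_P includes the drop across R_die
        v_min = min(v_min, v_p)
        # Semi-implicit Euler: update the inductor current, then the capacitor voltage
        i_l += dt * (V_DD - R_S * i_l - v_p) / L_S
        v_c += dt * (i_l - i_load) / C_DIE
    return f_res, v_min


if __name__ == "__main__":
    f_res, v_min = first_droop()
    print(f"first-droop resonance ~ {f_res / 1e6:.1f} MHz, min V_P = {v_min:.3f} V")
```

With these values the loop resonates at about 71 MHz, and raising `i_step` toward the 30 A end of the quoted range deepens the droop, which is the mechanism that perturbs $t_d$ in the latch.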

The latch, built from two identical current steering circuits [24] connected in a cross-coupled manner, is shown in Fig. 3(b). Each circuit uses one PMOS ($M_{P1}$) and two NMOS ($M_{N1}$ and $M_{N2}$) gates. A bias voltage $V_b$, applied as shown in Fig. 3(b), keeps $M_{N1}$ always on. The source of $M_{P1}$ and the drain of $M_{N2}$ are shorted and connected to the drain of PMOS $M_{P2}$. Clocks Clk2 and Clk1 at the gates of $M_{P2}$ either allow the differential data at inputs In1 and In2 to be written at outputs Out1 and Out2, respectively, or retain the previous state of the outputs. A logic low at Clk2 and a logic high at Clk1 enable the write operation; conversely, a logic high at Clk2 and a logic low at Clk1 enable the cross-coupled $M_{N2}$ to retain the corresponding data at the outputs.

During the read cycle, however, the set-up of Fig. 3(b) fails to retain the differential data at Out1 and Out2. To see this, suppose that Out1 and Out2 are at logic high and low, respectively. This turns one of the cross-coupled M<sub>N2</sub> on. Since Clk1 is also enabled during this cycle, Out2 is also charged to logic high through M<sub>N2A</sub>. Both outputs then end up at the same logic level, which is undesirable. A similar behavior occurs with logic low and high at Out1 and Out2, respectively. Therefore, Fig. 3(b) is modified to retain differential data at the outputs: the cross-coupled M<sub>N2</sub> is replaced with PMOS M<sub>P3</sub> to avoid a state change at the outputs, while the remaining connections are unaltered. This is indicated in Fig. 3(b) with bidirectional arrows. Although differential data are now retained at the outputs, fast-slewing inputs raise two more issues. Differential data with a 1 ns bit period and $t_{\rm R}$, $t_{\rm F}$ of about 0.1 ns are applied at In1 and In2, respectively. Even when the clock enables the read cycle, these data remain continuously available at In1 and In2 of the gates M<sub>P1</sub>. This results in a current flow toward V<sub>P</sub> and prevents Out1 and Out2 from attaining a complete voltage swing.

The following discussion details how to avoid the degradation of the outputs and the resulting loss of voltage swing during the read cycle. One approach is to break the path between $V_P$ and ground so that no current flows toward $V_P$: a gate controlled by the write clock is inserted to block current flow during the read cycle. However, this increases the stack height, and the additional voltage drop across each gate prevents full voltage swing at the outputs.

Fig. 3. (a) Block diagram of latch coupled to a typical PDN. (b) Schematic of the latch built using current steering circuit.

Figure 4 shows an alternative design derived from Fig. 3(b). Two additional NMOS gates M<sub>N2</sub> (M<sub>N2A</sub> and M<sub>N2B</sub>) and a transmission gate (TG) built from PMOS $M_{P4}$ and NMOS M<sub>N3</sub> are included, as shown in the figure. M<sub>N2</sub> is biased to create an additional discharging path, whereas the TG isolates M<sub>P1</sub> from the fast-switching inputs during the read cycle. The usefulness of M<sub>N2</sub> can be understood from the following case. Suppose Out1 and Out2 are at logic high and low, respectively, and that in the next write cycle a logic low and high are to be written at Out1 and Out2, respectively. Without $M_{N2}$, Out1 discharges through M<sub>N1</sub> while Out2 charges toward V<sub>P</sub>, and Out1 cannot be discharged completely. Including M<sub>N2</sub> provides an additional discharging path. Initially, M<sub>N2B</sub> is off because Out2 is at logic low, so Out1 discharges through M<sub>N1B</sub> only. Once Out2 charges to approximately the threshold voltage of M<sub>N2B</sub>, whose gate is connected to Out2, M<sub>N2B</sub> turns on and Out1 discharges through both paths.

The clock that enables the write and read cycles also controls the TG. During the read cycle, the TG prevents the differential data at In1 and In2 from reaching the write gates $M_{P1}$. The gate terminal of $M_{P1}$ is not floating; it holds the last data bit (logic high or low) of the write cycle for the entire read period. This inhibits switching, while the on-resistance of the TG and the capacitance of $M_{P1}$ form a low-pass filter. The resulting suppression of high-frequency components avoids current flow toward $V_P$ and keeps the outputs from degrading.
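The write/hold behavior described above can be illustrated with a purely logic-level sketch. It is hypothetical and abstracts away all analog effects (charge sharing, leakage, the TG RC filtering): when Clk2 is low the TG passes In1 to the gate of $M_{P1}$ and Out1 follows it; when Clk2 is high the TG is off, so both the gate node and Out1 keep the last written bit even though In1 keeps toggling. The function name and the `initial_bit` reset value are illustrative assumptions.

```python
def simulate_latch(in1, clk2, initial_bit=0):
    """Logic-level sketch: return (gate_mp1, out1) traces for sampled In1 and Clk2.

    clk2 == 0 -> write phase (TG on); clk2 == 1 -> read/hold phase (TG off).
    initial_bit is an assumed reset value for the stored state.
    """
    gate, out = initial_bit, initial_bit
    gate_trace, out_trace = [], []
    for d, c in zip(in1, clk2):
        if c == 0:
            # Write: TG conducts, the gate of M_P1 tracks In1 and Out1 follows
            gate = d
            out = d
        # Hold: TG blocks In1, so the gate keeps the last written bit (not floating)
        gate_trace.append(gate)
        out_trace.append(out)
    return gate_trace, out_trace


if __name__ == "__main__":
    in1 = [1, 0, 1, 0, 1, 0, 1, 0]   # input toggling every sample (1 ns bit period)
    clk2 = [0, 0, 0, 1, 1, 1, 1, 0]  # write for three samples, hold for four, write again
    print(simulate_latch(in1, clk2))
```

In the hold window the traces stay at the bit written just before Clk2 rose, which is the property the TG is meant to guarantee at the gate of $M_{P1}$.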

#### 3.1 Analytical Model for Delay

The circuit of Fig. 4 can be modelled analytically to understand the dependency of $t_d$. Writing the state equations over different time intervals of the write cycle allows $t_d$ to be expressed in terms of design variables and process parameters, which helps in setting $t_d$ without affecting $P_{avg}$. A section of Fig. 4 is redrawn in Fig. 5(a), where the gate of $M_{N2B}$ is connected to Out2, as in Fig. 4. Assume that Out1 and Out2 are initially at logic high and low, respectively, as shown in Fig. 5(b, c), where the high and low levels are denoted $V_{OH}$ and $V_{OL}$, respectively, and $V_{50\%}$ is half of $V_{dd}$. A high-to-low transition at Out1 and a low-to-high transition at Out2 correspond to the same transitions at In1 and In2, respectively. The voltage at Out2 is approximated as $V_{\rm OH} - V_{\rm Out1}$, where $V_{\rm Out1}$ is the voltage at Out1; this is a fair approximation. At node Out1,

$$I_{\rm L} = I_{\rm P1} - I_{\rm N1} - I_{\rm N2}. \tag{1}$$

A low-to-high transition at In2 turns $M_{P1B}$ off, so $I_{P1} \approx 0$, and $V_{OL}$ at Out2 keeps $M_{N2B}$ off, so $I_{N2} \approx 0$ at the onset of discharge. Therefore,

$$I_{\rm L} = C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = -I_{\rm N1}. \tag{2}$$

Since $M_{\rm N1}$ operates in saturation, $I_{\rm N1} = \frac{1}{2} \mu_{\rm n} C_{\rm OX} \left( \frac{W}{L} \right)_{\rm N1} \left( V_{\rm GSN1} - V_{\rm T0N1} \right)^2$, and with $V_{\rm GSN1} = V_{\rm b}$,

$$I_{\rm N1} = \frac{1}{2} \mu_{\rm n} C_{\rm OX} \left(\frac{W}{L}\right)_{\rm N1} \left(V_{\rm b} - V_{\rm T0N1}\right)^2 \tag{3}$$

where $V_{\text{GSN1}} = V_{\text{b}}$ is the gate-to-source voltage and $V_{\text{T0N1}}$, $W_{\text{N1}}$ and $L_{\text{N1}}$ are the threshold voltage, width and length of M<sub>N1</sub>, while $\mu_{\text{n}}$ and $C_{\text{OX}}$ are process parameters. Substituting (3) into (2) gives

$$C_{\rm L} \frac{{\rm d}V_{\rm Out1}}{{\rm d}t} = -\frac{1}{2}\mu_{\rm n}C_{\rm OX} \left(\frac{W}{L}\right)_{\rm N1} \left(V_{\rm b} - V_{\rm T0N1}\right)^2 \tag{4}$$

$$\Rightarrow \int_{t_3}^{t_4} \mathrm{d}t = \frac{-2C_{\rm L}}{\mu_{\rm n} C_{\rm OX} \left(\frac{W}{L}\right)_{\rm N1} \left(V_{\rm b} - V_{\rm T0N1}\right)^2} \int_{V_{\rm OH}}^{V_{\rm OH} - V_{\rm T0N1}} \mathrm{d}V_{\rm Out1}$$

$$\Rightarrow t_4 - t_3 = \frac{2C_{\rm L} V_{\rm T0N1}}{\mu_{\rm n} C_{\rm OX} \left(\frac{W}{L}\right)_{\rm N1} \left(V_{\rm b} - V_{\rm T0N1}\right)^2}. \tag{5}$$

Initially, Out1 discharges through the always-on $M_{N1}$ while Out2 charges through $M_{P1}$ toward $V_P$, Fig. 5(b). As Out2 reaches $V_{OL} + V_{TP1}$, Fig. 5(c), $M_{N2}$ also turns on and creates an additional discharging path for Out1. Therefore, $I_L = -I_{N1} - I_{N2}$ and

$$I_{\rm L} = C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = -I_{\rm N1} - I_{\rm N2} \tag{6}$$

$$\Rightarrow C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = -\frac{\mu_{\rm n} C_{\rm OX} W_{\rm N1}}{2L_{\rm N1}} (V_{\rm b} - V_{\rm T0N1})^2 - \frac{\mu_{\rm n} C_{\rm OX} W_{\rm N2}}{2L_{\rm N2}} (V_{\rm gN2} - V_{\rm T0N2})^2 \tag{7}$$

where $L_{N2}$ is the channel length of $M_{N2}$, with $L_{N1} = L_{N2} = L$, and $W_{N2}$, $V_{gN2}$ and $V_{T0N2}$ are the width, gate-to-source voltage and threshold voltage, respectively, of $M_{N2}$. Since the gate of $M_{N2B}$ is at Out2, substituting $V_{gN2} = V_{OH} - V_{Out1}$ and $V_{T0N1} = V_{T0N2} = V_{T0N}$ in (7) gives

$$C_{\rm L} \frac{{\rm d}V_{\rm Out1}}{{\rm d}t} = -\frac{\mu_{\rm n}C_{\rm OX}W_{\rm N1}}{2L} (V_{\rm b} - V_{\rm T0N})^2 - \frac{\mu_{\rm n}C_{\rm OX}W_{\rm N2}}{2L} (V_{\rm OH} - V_{\rm Out1} - V_{\rm T0N})^2 \tag{8}$$



Fig. 4. Schematic of latch derived from Fig. 3(b) to achieve undistorted full voltage swing at the outputs.





$$\implies C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = -\frac{\mu_{\rm n}C_{\rm OX}W_{\rm N2}}{2L} \Big[ (V_{\rm Out1} - X_1)^2 - k_1^2 \Big] \tag{9}$$

where $X_1 = V_{\text{OH}} (1 - V_{\text{T0N}})$ and $k_1^2 = \frac{W_{\text{N1}}}{W_{\text{N2}}} (V_{\text{b}} - V_{\text{T0N}})^2 + (V_{\text{OH}} - V_{\text{T0N}})^2 - V_{\text{OH}}^2 (1 - V_{\text{T0N}})^2$.

Now rearranging (9) and integrating gives

$$\int_{t_4}^{t_5} \mathrm{d}t = \frac{-2LC_{\rm L}}{\mu_{\rm n}C_{\rm OX}W_{\rm N2}} \int_{V_{\rm OH}-V_{\rm T0N}}^{V_{50\%}} \frac{\mathrm{d}V_{\rm Out1}}{\left(V_{\rm Out1}-X_1\right)^2 - k_1^2}$$

$$t_{5} - t_{4} = \frac{LC_{\rm L}}{\mu_{\rm n}C_{\rm OX}W_{\rm N2}k_{1}} \ln \frac{V_{\rm OH}^{2} - V_{\rm OH}(3X_{1} - k_{1}) - V_{\rm T0N}(V_{\rm OH} + 2k_{1} - 2X_{1}) + 2(X_{1}^{2} - k_{1}^{2})}{V_{\rm OH}^{2} - V_{\rm OH}(3X_{1} + k_{1}) - V_{\rm T0N}(V_{\rm OH} - 2k_{1} - 2X_{1}) + 2(X_{1}^{2} - k_{1}^{2})}. \tag{10}$$

Substituting $V_{\text{T0N1}} = V_{\text{T0N2}} = V_{\text{T0N}}$ in (5) and adding (10), the high-to-low delay ($t_{\text{pHL}}$) is

$$t_{5} - t_{3} = \frac{2LC_{\rm L}V_{\rm T0N}}{\mu_{\rm n}C_{\rm OX}W_{\rm N1}(V_{\rm b} - V_{\rm T0N})^{2}} + \frac{LC_{\rm L}}{\mu_{\rm n}C_{\rm OX}W_{\rm N2}k_{1}} \ln \frac{V_{\rm OH}^{2} - V_{\rm OH}(3X_{1} - k_{1}) - V_{\rm T0N}(V_{\rm OH} + 2k_{1} - 2X_{1}) + 2(X_{1}^{2} - k_{1}^{2})}{V_{\rm OH}^{2} - V_{\rm OH}(3X_{1} + k_{1}) - V_{\rm T0N}(V_{\rm OH} - 2k_{1} - 2X_{1}) + 2(X_{1}^{2} - k_{1}^{2})}. \tag{11}$$

Now consider the low-to-high transition at Out1. A high-to-low transition at In2 turns $M_{P1B}$ on, while the voltage $V_{OH} - V_{Out1}$ at Out2 sets the bias of $M_{N2B}$ at the onset of charging. The state equation at node Out1 is the same as (1). Therefore,

$$I_{\rm L} = C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = I_{\rm P1} - I_{\rm N1} - I_{\rm N2}. \tag{12}$$

During charging, $M_{\rm P1}$ is in saturation and $M_{\rm N1}$, $M_{\rm N2}$ operate in the linear region. Therefore,

$$C_{L} \frac{dV_{Out1}}{dt} = \frac{\mu_{p} C_{OX} W_{P1}}{2L_{P1}} (V_{SP1} - V_{OL} - V_{TP1})^{2} \left[ 1 + \lambda_{P1} (V_{SP1} - V_{Out1}) \right] - \left[ \frac{\mu_{n} C_{OX} W_{N1}}{L} \left\{ (V_{b} - V_{T0N1}) V_{Out1} - \frac{V_{Out1}^{2}}{2} \right\} \right] - \left[ \frac{\mu_{n} C_{OX} W_{N2}}{L} \left\{ (V_{OH} - V_{T0N2}) V_{Out1} - \frac{3V_{Out1}^{2}}{2} \right\} \right] \tag{13}$$

where $W_{P1}$, $L_{P1}$, $V_{SP1}$, $V_{TP1}$ and $\lambda_{P1}$ are the width, length, source voltage, threshold voltage and channel-length modulation coefficient, respectively, of $M_{P1}$; $\mu_p$ is a process parameter, and the remaining parameters are as defined for the high-to-low delay. Taking $L_{P1} = L$ and $V_{OL} \approx 0$ and rearranging (13) gives

$$C_{\rm L}\frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = \frac{\mu_{\rm n}C_{\rm OX}(W_{\rm N1}+3W_{\rm N2})}{2L}\left[V_{\rm Out1}^2 - 2V_{\rm Out1}\frac{W_{\rm N1}(V_{\rm b}-V_{\rm T0N1}) + W_{\rm N2}(V_{\rm OH}-V_{\rm T0N2}) + \frac{\mu_{\rm p}}{2\mu_{\rm n}}W_{\rm P1}(V_{\rm SP1}-V_{\rm TP1})^2\lambda_{\rm P1}}{W_{\rm N1}+3W_{\rm N2}} + \frac{\mu_{\rm p}W_{\rm P1}}{\mu_{\rm n}(W_{\rm N1}+3W_{\rm N2})}(V_{\rm SP1}-V_{\rm TP1})^2(1+\lambda_{\rm P1}V_{\rm SP1})\right] \tag{14}$$

$$\Rightarrow C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = \frac{\mu_{\rm n} C_{\rm OX} \left(W_{\rm N1} + 3W_{\rm N2}\right)}{2L} \left\{ \left(V_{\rm Out1} - X_2\right)^2 - k_2^2 \right\} \tag{15}$$

where

$$X_2 = \frac{W_{\rm N1}(V_{\rm b}-V_{\rm T0N1})+W_{\rm N2}(V_{\rm OH}-V_{\rm T0N2})+\frac{\mu_{\rm p}}{2\mu_{\rm n}}W_{\rm P1}(V_{\rm SP1}-V_{\rm TP1})^{2}\lambda_{\rm P1}}{W_{\rm N1}+3W_{\rm N2}}$$

and

$$k_{2}^{2} = \frac{\mu_{\rm p}W_{\rm P1}}{\mu_{\rm n}(W_{\rm N1}+3W_{\rm N2})}(V_{\rm SP1}-V_{\rm TP1})^{2}(1+\lambda_{\rm P1}V_{\rm SP1})-X_{2}^{2}.$$

Integrating (15) gives

$$\int_{t_0}^{t_1} \mathrm{d}t = \frac{2LC_{\rm L}}{\mu_{\rm n}C_{\rm OX}\left(W_{\rm N1} + 3W_{\rm N2}\right)} \int_{0}^{V_{\rm TP1}} \frac{\mathrm{d}V_{\rm Out1}}{\left(V_{\rm Out1} - X_2\right)^2 - k_2^2} \tag{16}$$

$$t_1 - t_0 = \frac{LC_{\rm L}}{\mu_{\rm n}C_{\rm OX}\left(W_{\rm N1} + 3W_{\rm N2}\right)k_2} \ln \frac{X_2^2 + V_{\rm TP1}\left(k_2 - X_2\right) - k_2^2}{X_2^2 - V_{\rm TP1}\left(k_2 - X_2\right) - k_2^2}. \tag{17}$$

As Out1 reaches $V_{OL} + V_{TP1}$ and Out2 reaches $V_{OH} - V_{Out1}$, the state equation at Out1 is the same as (12), but now $M_{P1}$ operates in the linear region and $M_{N1}$, $M_{N2}$ are in saturation. Therefore,

$$C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = -\frac{\mu_{\rm p}C_{\rm OX}W_{\rm P1}}{2L} \Big[ V_{\rm Out1}^2 - 2V_{\rm Out1}V_{\rm TP1} + 2V_{\rm SP1}V_{\rm TP1} - V_{\rm SP1}^2 \Big] - \frac{\mu_{\rm n}C_{\rm OX}}{2L} \Big[ W_{\rm N1} (V_{\rm b} - V_{\rm T0N1})^2 + W_{\rm N2} \big\{ (V_{\rm OH} - V_{\rm T0N2})^2 + V_{\rm Out1}^2 - 2(V_{\rm OH} - V_{\rm T0N2})V_{\rm Out1} \big\} \Big] \tag{18}$$

$$C_{\rm L} \frac{\mathrm{d}V_{\rm Out1}}{\mathrm{d}t} = \frac{-C_{\rm OX} \left(\mu_{\rm p} W_{\rm P1} + \mu_{\rm n} W_{\rm N2}\right)}{2L} \left[ \left(V_{\rm Out1} - X_3\right)^2 - K_3^2 \right] \tag{19}$$

where

$$X_{3} = \frac{\mu_{\rm p}W_{\rm P1}V_{\rm TP1} + \mu_{\rm n}W_{\rm N2}(V_{\rm OH} - V_{\rm T0N2})}{\mu_{\rm p}W_{\rm P1} + \mu_{\rm n}W_{\rm N2}}$$

and

$$K_{3}^{2} = \frac{\mu_{\rm p}W_{\rm P1}(2V_{\rm SP1}V_{\rm TP1} - V_{\rm SP1}^{2}) + \mu_{\rm n}W_{\rm N1}(V_{\rm b} - V_{\rm T0N1})^{2} + \mu_{\rm n}W_{\rm N2}(V_{\rm OH} - V_{\rm T0N2})^{2}}{\mu_{\rm p}W_{\rm P1} + \mu_{\rm n}W_{\rm N2}} - X_{3}^{2}.$$

Integrating (19),

$$\int_{t_{1}}^{t_{2}} \mathrm{d}t = \frac{2LC_{\rm L}}{C_{\rm OX} \left(\mu_{\rm p}W_{\rm P1} + \mu_{\rm n}W_{\rm N2}\right)} \int_{V_{\rm TP1}}^{V_{50\%}} \frac{\mathrm{d}V_{\rm Out1}}{K_{3}^{2} - (V_{\rm Out1} - X_{3})^{2}} \tag{20}$$

and, with $V_{50\%} = V_{\rm OH}/2$,

$$t_{2} - t_{1} = \frac{LC_{\rm L}}{C_{\rm OX} \left(\mu_{\rm p}W_{\rm P1} + \mu_{\rm n}W_{\rm N2}\right)K_{3}} \ln \frac{2K_{3}^{2} + K_{3}\left(V_{\rm OH} - 2V_{\rm TP1}\right) + X_{3}\left(V_{\rm OH} + 2V_{\rm TP1}\right) - 2X_{3}^{2} - V_{\rm TP1}V_{\rm OH}}{2K_{3}^{2} - K_{3}\left(V_{\rm OH} - 2V_{\rm TP1}\right) + X_{3}\left(V_{\rm OH} + 2V_{\rm TP1}\right) - 2X_{3}^{2} - V_{\rm TP1}V_{\rm OH}}. \tag{21}$$

The low-to-high delay  $(t_{pLH})$  is written as

$$t_{2}-t_{0} = \frac{LC_{\rm L}}{\mu_{\rm n}C_{\rm OX}(W_{\rm N1}+3W_{\rm N2})k_{2}} \ln \frac{X_{2}^{2}+V_{\rm TP1}(k_{2}-X_{2})-k_{2}^{2}}{X_{2}^{2}-V_{\rm TP1}(k_{2}-X_{2})-k_{2}^{2}} + \frac{LC_{\rm L}}{C_{\rm OX}(\mu_{\rm p}W_{\rm P1}+\mu_{\rm n}W_{\rm N2})K_{3}} \ln \frac{2K_{3}^{2}+K_{3}(V_{\rm OH}-2V_{\rm TP1})+X_{3}(V_{\rm OH}+2V_{\rm TP1})-2X_{3}^{2}-V_{\rm TP1}V_{\rm OH}}{2K_{3}^{2}-K_{3}(V_{\rm OH}-2V_{\rm TP1})+X_{3}(V_{\rm OH}+2V_{\rm TP1})-2X_{3}^{2}-V_{\rm TP1}V_{\rm OH}}. \tag{22}$$

The total delay $t_d = t_{\rm pHL} + t_{\rm pLH}$ is the sum of (11) and (22). For a given process and $V_{dd}$, $t_d$ is a function of the process parameters, design variables and bias voltage. Typically, the log term in (11) reduces to a constant value, so $t_{\text{pHL}}$ can be adjusted by varying W of M<sub>N1</sub> and M<sub>N2</sub>. Increasing W allows more current to flow and yields a full voltage swing; $t_{pHL}$ then decreases at the cost of $P_{\text{avg}}$, whereas reducing W has the opposite effect. Similarly, for a given $V_{dd}$ the log terms in (22) also reduce to constant values, so $t_{pLH}$ can be adjusted only by varying $C_{\rm L}$ or W of M<sub>N1</sub>, M<sub>N2</sub> and M<sub>P1</sub>, with the same trade-off as for $t_{\text{pHL}}$. W must therefore be chosen to maintain not only full voltage swing but also comparable $P_{\text{avg}}$ and optimum $t_{\text{d}}$.
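The width trade-off for $t_{\rm pHL}$ can be explored numerically. The sketch below is hypothetical: it evaluates phase 1 with the closed form (5) and integrates the state equation (8) directly with Simpson's rule for phase 2, instead of evaluating the log expression (10). $W_{\rm N1} = 1.2$ µm, $W_{\rm N2} = 1.0$ µm and $L = 100$ nm come from Tab. 2, while $C_{\rm L} = 10$ fF, $\mu_{\rm n}C_{\rm OX} = 200$ µA/V², $V_{\rm b} = 0.7$ V, $V_{\rm T0N} = 0.3$ V, $V_{\rm OH} = 1.1$ V and $V_{50\%} = 0.55$ V are placeholder assumptions, so the absolute numbers are not comparable with the post-layout $t_d$. Only the high-to-low half is modelled, and $P_{avg}$ is not computed.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n is forced to be even."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_phl(c_l, kn, w_n1, w_n2, length, v_b, v_t0n, v_oh, v_50):
    """Return (t4 - t3, t5 - t4) in seconds; kn = mu_n * C_OX in A/V^2 (assumed)."""
    # Saturation current of the always-on M_N1 with V_GS = V_b, eq. (3)
    i_n1 = 0.5 * kn * (w_n1 / length) * (v_b - v_t0n) ** 2
    # Phase 1: Out1 falls from V_OH to V_OH - V_T0N through M_N1 only, eq. (5)
    t_phase1 = c_l * v_t0n / i_n1

    def sink_current(v_out1):
        # Gate of M_N2B sits at Out2 ~ V_OH - V_Out1, as assumed in (8)
        overdrive = max(v_oh - v_out1 - v_t0n, 0.0)
        return i_n1 + 0.5 * kn * (w_n2 / length) * overdrive ** 2

    # Phase 2: Out1 falls from V_OH - V_T0N to V_50% through both paths, eq. (8);
    # dt = C_L dV / (I_N1 + I_N2), integrated over the falling voltage range
    t_phase2 = simpson(lambda v: c_l / sink_current(v), v_50, v_oh - v_t0n)
    return t_phase1, t_phase2


if __name__ == "__main__":
    base = dict(c_l=10e-15, kn=200e-6, w_n1=1.2e-6, w_n2=1.0e-6, length=100e-9,
                v_b=0.7, v_t0n=0.3, v_oh=1.1, v_50=0.55)
    for w in (0.6e-6, 1.2e-6, 2.4e-6):
        t1, t2 = t_phl(**{**base, "w_n1": w})
        print(f"W_N1 = {w * 1e6:.1f} um: t_pHL = {(t1 + t2) * 1e12:.1f} ps")
```

Doubling $W_{\rm N1}$ halves the phase-1 term of (5) and also shortens phase 2, which is the direction stated above; the accompanying increase in $P_{avg}$ is outside this sketch.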

#### 4. Results and Analysis

The PDN parasitic values are listed in Tab. 1 [8]. Table 2 gives the gate sizes used to design Fig. 4 for an almost full voltage swing at 0.09-$\mu$m CMOS and 1.1 V $V_{dd}$. The TG is sized so that fast-slewing data are kept off the gate of M<sub>P1</sub> during the read cycle, while a constant voltage is maintained there to prevent a floating node. The layout of Fig. 4 is shown in Fig. 6(a). Its area is $18.4 \times 16.1 \,\mu\text{m}^2$ excluding the bond pads and the added die decoupling capacitor, and it includes the input, output, clock, V<sub>P</sub>, V<sub>b</sub> and ground connections. The post-layout clock, input and output waveforms are shown in Fig. 6(b, c, d). As long as Clk2 is at logic low ($\sim 0$ V), the data at In1 and In2 are written at Out1 and Out2, respectively. The instant Clk2 becomes logic high (~1.1 V), Clk1 goes to logic low and the latch enters the hold state; this occurs at around 2.5 ns. The cases of writing and holding both logic low and logic high at the outputs are shown in Fig. 6(d). Note that during this period either 0 or 1.1 V is maintained at the gate of M<sub>P1</sub>, preventing floating nodes. Compared with the clock and inputs, the post-layout output transients drop slightly because of the layout parasitic capacitance and resistance (pcap and pres). However, this drop is not large enough to degrade the operation of the FD, as discussed later. The post-layout metrics, with the corresponding mean $(\mu)$ and standard deviation ($\sigma$), are listed in Tab. 3 for no skew and for a 5% skew in $V_{dd}$, rise and fall times. For a fixed V<sub>dd</sub>, $P_{avg}$ varies from 232.7 $\mu$W at SS to 463.4 $\mu$W at FF, while $t_d$ correspondingly ranges from 49.7 ps at FF to 83.3 ps at SS. By contrast, PDP and $V_n$ vary comparatively little across corners. $P_{avg}$ and PDP are highest at FF, where $3\sigma$ is 41.1 µW and 2.1 fJ, respectively. At SS, $t_d$ and $V_n$ are maximum, with $3\sigma$ of about 2.4 ps and 12.6 nV/$\sqrt{\text{Hz}}$, respectively.
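As a quick consistency check on Tab. 3, PDP is simply $P_{avg} \times t_d$, so the no-skew PDP column and the $3\sigma$ values for $P_{avg}$ and PDP at FF and $t_d$ at SS quoted above can be recomputed from the tabulated numbers. The sketch below does only that arithmetic; the dictionary and function names are illustrative.

```python
# No-skew post-layout values from Tab. 3: corner -> (P_avg in uW, t_d in ps, PDP in fJ)
TAB3_NO_SKEW = {
    "NN": (339.6, 61.8, 21.0),
    "SS": (232.7, 83.3, 19.4),
    "FF": (463.4, 49.7, 23.0),
    "SF": (250.0, 62.1, 15.5),
    "FS": (383.7, 52.8, 20.3),
}
# 5% skew sigmas from Tab. 3: corner -> (sigma P_avg in uW, sigma t_d in ps, sigma PDP in fJ)
TAB3_SIGMA = {"FF": (13.7, 0.2, 0.7), "SS": (7.5, 0.8, 0.4)}


def pdp_fj(p_avg_uw, t_d_ps):
    """Power-delay product in fJ: 1 uW x 1 ps = 1e-18 J = 1e-3 fJ."""
    return p_avg_uw * t_d_ps * 1e-3


if __name__ == "__main__":
    for corner, (p, t, pdp_tab) in TAB3_NO_SKEW.items():
        print(f"{corner}: P_avg * t_d = {pdp_fj(p, t):.2f} fJ (Tab. 3: {pdp_tab} fJ)")
    # 3-sigma figures quoted in the text
    print("FF 3sigma P_avg =", round(3 * TAB3_SIGMA["FF"][0], 1), "uW")
    print("FF 3sigma PDP   =", round(3 * TAB3_SIGMA["FF"][2], 1), "fJ")
    print("SS 3sigma t_d   =", round(3 * TAB3_SIGMA["SS"][1], 1), "ps")
```

Every corner's product rounds to the tabulated PDP, and the three $3\sigma$ values match those stated in the text.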

| Element (x) | vr | blk | mb | pin | pkg | via | die |
|-------------|----|-----|----|-----|-----|-----|-----|
| Inductance L<sub>x</sub> | 1 nH | 1 nH | 300 pH | 50 pH | 30 pH | 20 pH | - |
| Resistance R<sub>x</sub> | 1 mΩ | 2 mΩ | 0.5 mΩ | 0.2 mΩ | 1 mΩ | 0.2 mΩ | 1 mΩ |
| Capacitance C<sub>x</sub> | - | 1.5 nF | 0.1 nF | - | 0.02 nF | - | 100 nF |

Tab. 1. Parasitic values of PDN of Fig. 3(a) [8].

| Gate | M<sub>P1</sub> | M<sub>P2</sub> | M<sub>P3</sub> | M<sub>P4</sub> | M<sub>N1</sub> | M<sub>N2</sub> | M<sub>N3</sub> |
|------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| W/L | 6 µ/100 n | 9.0 µ/100 n | 3.6 µ/100 n | 3.0 µ/100 n | 1.2 µ/100 n | 1.0 µ/100 n | 1.5 µ/100 n |

Tab. 2. Gate sizes of Fig. 4.

| Skew | Corner | $P_{avg}$ (µW) | σ | $t_d$ (ps) | σ | PDP (fJ) | σ | $V_n$ (nV/√Hz) | σ |
|------|--------|----------------|---|------------|---|----------|---|----------------|---|
| No skew | NN | 339.6 | - | 61.8 | - | 21.0 | - | 4.0 | - |
| | SS | 232.7 | - | 83.3 | - | 19.4 | - | 4.2 | - |
| | FF | 463.4 | - | 49.7 | - | 23.0 | - | 3.8 | - |
| | SF | 250.0 | - | 62.1 | - | 15.5 | - | 3.6 | - |
| | FS | 383.7 | - | 52.8 | - | 20.3 | - | 5.3 | - |
| 5% skew (μ, σ) | NN | 339.6 | 10.5 | 61.8 | 0.3 | 21.0 | 0.6 | 4.0 | 0.04 |
| | SS | 232.7 | 7.5 | 83.3 | 0.8 | 19.4 | 0.4 | 4.2 | 0.05 |
| | FF | 463.4 | 13.7 | 49.7 | 0.2 | 23.0 | 0.7 | 3.8 | 0.03 |
| | SF | 250.0 | 7.8 | 62.1 | 1.1 | 15.5 | 0.2 | 3.6 | 0.04 |
| | FS | 383.7 | 12.6 | 52.8 | 0.7 | 20.3 | 0.3 | 5.3 | 0.03 |

Tab. 3. Performance metrics in post-layout.

For the nominal (NN) case, Fig. 7 shows the histograms of $P_{avg}$, $t_d$ and PDP obtained from a 500-run Monte Carlo analysis. These values are similar to the no-skew data in Tab. 3. Tab. 4 lists the $\mu$ and $\sigma$ of the noise margin low (NM<sub>L</sub>) and noise margin high (NM<sub>H</sub>) against temperature for all process corners. NM<sub>L</sub> stays at about 0.4 V, while NM<sub>H</sub> decreases slightly, from 0.3 V at -27 °C to 0.25 V at 90 °C. The $\sigma$ values are small compared with the averages: 0.1 V for NM<sub>L</sub> and 0.02–0.03 V for NM<sub>H</sub>. The confidence in the system tolerances is therefore high. The average of $\sigma$ for NM<sub>L</sub> and NM<sub>H</sub> over all temperatures can be compared with the average margins. This comparison shows that the circuit works well beyond the $3\sigma$ limits.
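The $3\sigma$ argument can be checked directly from Tab. 4. The sketch below computes $\mu - 3\sigma$ at each temperature. Following the averaging described above, it also forms the ratio of the average $\mu$ to the average $\sigma$ over all temperatures. Reading that ratio as a "sigma headroom" is an interpretive assumption, and the helper names are hypothetical.

```python
from statistics import mean

# Tab. 4: temperature (deg C) -> (NM_L mu, NM_L sigma, NM_H mu, NM_H sigma), all in V.
TAB4 = {
    -27: (0.4, 0.1, 0.30, 0.02),
    0: (0.4, 0.1, 0.28, 0.03),
    27: (0.4, 0.1, 0.27, 0.03),
    54: (0.4, 0.1, 0.26, 0.03),
    90: (0.4, 0.1, 0.25, 0.03),
}


def worst_case(mu: float, sigma: float, k: float = 3.0) -> float:
    """Noise margin remaining at the mu - k*sigma point."""
    return mu - k * sigma


def averaged_headroom(mu_idx: int, sigma_idx: int) -> float:
    """Average mu divided by average sigma over all temperatures."""
    rows = list(TAB4.values())
    return mean(r[mu_idx] for r in rows) / mean(r[sigma_idx] for r in rows)


for temp, (mu_l, s_l, mu_h, s_h) in TAB4.items():
    print(f"{temp:>4} C: NM_L(mu-3s) = {worst_case(mu_l, s_l):.2f} V, "
          f"NM_H(mu-3s) = {worst_case(mu_h, s_h):.2f} V")

print(f"NM_L headroom ~ {averaged_headroom(0, 1):.1f} sigma")
print(f"NM_H headroom ~ {averaged_headroom(2, 3):.1f} sigma")
```

Both margins stay positive at $\mu - 3\sigma$ for every tabulated temperature. NM<sub>L</sub> is the tighter of the two.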

Fig. 6. (a) Layout of Fig. 4, plot of (b) clock, (c) input and (d) output vs. time in post-layout.

**Fig. 7.** Monte Carlo analysis of (a) $P_{avg}$, (b) $t_d$ and (c) *PDP*.

| Temperature (°C) | NM<sub>L</sub> μ (V) | NM<sub>L</sub> σ (V) | NM<sub>H</sub> μ (V) | NM<sub>H</sub> σ (V) |
|------------------|----------------------|----------------------|----------------------|----------------------|
| -27 | 0.4 | 0.1 | 0.3 | 0.02 |
| 0 | 0.4 | 0.1 | 0.28 | 0.03 |
| 27 | 0.4 | 0.1 | 0.27 | 0.03 |
| 54 | 0.4 | 0.1 | 0.26 | 0.03 |
| 90 | 0.4 | 0.1 | 0.25 | 0.03 |

Tab. 4. Noise margin variation with temperature in post-layout.

Fig. 8 illustrates how the performance of the present latch varies with temperature at the different corners. The $P_{\text{avg}}$ drawn varies between 194.6 $\mu$W (SS) and 547.4 $\mu$W (FF), Fig. 8(a). This range corresponds to a temperature change of 117 °C, from -27 °C to 90 °C. The same change also increases $t_d$ between the input and output, Fig. 8(b). At the five corners, the temperature variation from -27 °C to 90 °C changes $t_d$ by 15.7 ps, 23.7 ps, 13.6 ps, 11.6 ps and 24.8 ps. This corresponds to roughly 0.1–0.2 ps per 1 °C. In addition, PDP decreases with increasing temperature by about 0.01 fJ per 1 °C, Fig. 8(c). A figure of merit (FoM) is defined in (23) in terms of the energy-delay product (EDP), area and NM to indicate circuit robustness. The objective is to minimize FoM for a given process, $V_{dd}$ and junction temperature. For the NN corner, FoM is 0.56 ns × fJ × $\mu$m<sup>2</sup> at 1.1 V $V_{dd}$ and 27 °C. It remains almost flat over the 117 °C temperature shift, Fig. 8(d).
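The per-degree figures above follow from dividing each quoted $t_d$ change by the 117 °C span of the sweep. The short sketch below does this arithmetic. The five values are not mapped to named corners in this section, so the code keeps them as an unlabeled list.

```python
# Temperature sweep of Fig. 8: -27 deg C to 90 deg C.
T_MIN_C, T_MAX_C = -27, 90
SPAN_C = T_MAX_C - T_MIN_C  # 117 deg C

# Change in t_d over the sweep at the five corners, as quoted in the text (ps).
DELTA_TD_PS = [15.7, 23.7, 13.6, 11.6, 24.8]

slopes = [d / SPAN_C for d in DELTA_TD_PS]  # ps per deg C
for d, s in zip(DELTA_TD_PS, slopes):
    print(f"delta t_d = {d:4.1f} ps -> {s:.3f} ps/degC")
print(f"range: {min(slopes):.2f} to {max(slopes):.2f} ps/degC")

# A PDP slope of about 0.01 fJ/degC implies roughly this total change over the sweep.
PDP_SLOPE_FJ_PER_C = 0.01
print(f"PDP change over sweep ~ {PDP_SLOPE_FJ_PER_C * SPAN_C:.2f} fJ")
```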

$$FoM = EDP \times Area \times \frac{\text{Voltage swing}}{\text{Noise margin}} \tag{23}$$
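Equation (23) can be written as a small function, which shows how each factor scales the FoM. This section does not specify which noise margin enters (NM<sub>L</sub>, NM<sub>H</sub> or a combination). It also does not specify which energy measure forms the EDP. The minimal sketch below therefore takes these as plain inputs and uses hypothetical numbers. It is not meant to reproduce the tabulated FoM values.

```python
def edp_fj_ns(energy_fj: float, delay_ns: float) -> float:
    """Energy-delay product in fJ x ns."""
    return energy_fj * delay_ns


def figure_of_merit(energy_fj: float, delay_ns: float, area_um2: float,
                    voltage_swing_v: float, noise_margin_v: float) -> float:
    """FoM = EDP x Area x (voltage swing / noise margin) in ns x fJ x um^2; lower is better."""
    return edp_fj_ns(energy_fj, delay_ns) * area_um2 * (voltage_swing_v / noise_margin_v)


# Hypothetical inputs, chosen only to show how each factor scales the FoM.
base = figure_of_merit(energy_fj=2.0, delay_ns=0.5, area_um2=4.0,
                       voltage_swing_v=1.0, noise_margin_v=0.5)
half_area = figure_of_merit(energy_fj=2.0, delay_ns=0.5, area_um2=2.0,
                            voltage_swing_v=1.0, noise_margin_v=0.5)
double_nm = figure_of_merit(energy_fj=2.0, delay_ns=0.5, area_um2=4.0,
                            voltage_swing_v=1.0, noise_margin_v=1.0)
print(base, half_area, double_nm)
```

The FoM is linear in area and EDP and inversely proportional to the noise margin. Halving the area therefore has the same effect as doubling the margin.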

Figure 9(a) shows that, for a $V_{dd}$ variation from 0.8 V to 1.1 V, the post-layout $P_{avg}$ varies between 122 $\mu$W (SS) and 463.3 $\mu$W (FF). On average, $P_{avg}$ drops by about 0.4–0.7 $\mu$W per 1 mV drop in $V_{dd}$. At the different corners, $t_d$ changes by about 0.03–0.06 ps per 1 mV drop in $V_{dd}$, Fig. 9(b). This variation of $t_d$ with $V_{dd}$ is smaller than that caused by temperature. Figure 9(c) shows a PDP drop of 0.02–0.04 fJ per 1 mV drop in $V_{dd}$. These trends suggest that, at 0.09-$\mu$m CMOS and 0.8 V, the total capacitance is minimum. Operating at this voltage therefore lowers the switched capacitance, which favors speed and reduces power. FoM also decreases as $V_{dd}$ is lowered, by about 0.007–0.008 ns × fJ × $\mu$m<sup>2</sup> per 1 mV reduction, Fig. 9(d).
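The per-mV sensitivities can be turned into a quick estimate for a given supply drop. The sketch below assumes that the quoted slopes stay constant over the 0.8–1.1 V sweep, which the text does not state. It therefore gives a first-order (linear) estimate only, and the names are hypothetical.

```python
# Per-mV sensitivities quoted for the 0.8-1.1 V sweep (lower, upper bound across corners).
SENSITIVITY_PER_MV = {
    "P_avg drop (uW)": (0.4, 0.7),
    "t_d change (ps)": (0.03, 0.06),
    "PDP drop (fJ)": (0.02, 0.04),
}


def linear_shift(delta_vdd_mv: float) -> dict[str, tuple[float, float]]:
    """Linear estimate of each metric's shift for a V_dd drop of delta_vdd_mv (mV)."""
    return {name: (lo * delta_vdd_mv, hi * delta_vdd_mv)
            for name, (lo, hi) in SENSITIVITY_PER_MV.items()}


for name, (lo, hi) in linear_shift(50.0).items():
    print(f"{name}: {lo:.2f} to {hi:.2f}")
```

For example, a 50 mV droop would, under this linear assumption, lower $P_{avg}$ by roughly 20–35 µW and shift $t_d$ by roughly 1.5–3 ps.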



Fig. 8. Performance variation: (a) $P_{avg}$, (b) $t_d$, (c) PDP and (d) FoM vs. temperature.



Fig. 9. Performance variation: (a) $P_{avg}$, (b) $t_d$, (c) PDP and (d) FoM vs. $V_{dd}$.

Figures 10(a) and (b) show the post-layout output noise $(V_n)$ versus frequency for different temperatures and $V_{dd}$ values, respectively. In both cases, the noise decreases at higher frequencies. The post-layout output eye is shown in Fig. 10(c), and its quality is characterized by the parameters in Tab. 5. The most significant parameters are the eye height and width, Fig. 10(c). The height reflects the margin against output voltage-swing variations, and the width reflects the margin against timing jitter. Here, the random jitter is at most 0.2 ps, while the eye height is 162.2 mV. In addition, the eye rise and fall times (8.9 ps and 14.9 ps) are much larger than the jitter. The jitter therefore perturbs the edges only slightly.



Fig. 10. Plot of output noise vs. frequency for different (a) temperature, (b)  $V_{dd}$  and (c) output eye of Fig. 4.

| Parameters                      | Values |
|---------------------------------|--------|
| Threshold crossing average (ps) | 20.2   |
| Threshold crossing stddev (ps)  | 11.2   |
| Level 0 mean (mV)               | 57.5   |
| Level 0 stddev (mV)             | 59.1   |
| Level 1 mean (mV)               | 819.0  |
| Level 1 stddev (mV)             | 140.6  |
| Eye amplitude (mV)              | 761.4  |
| Eye height (mV)                 | 162.2  |
| Eye width (ps)                  | 21.8   |
| Eye S/N                         | 3.8    |
| Eye rise time (ps)              | 8.9    |
| Eye fall time (ps)              | 14.9   |
| Random jitter (left) (ps)       | 0.2    |
| Random jitter (right) (ps)      | 0.04   |
| Deterministic jitter (ps)       | 0.06   |

Tab. 5. Eye details in post-layout.
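Some entries of Tab. 5 are derived from others. The sketch below assumes that the eye-diagram tool uses the common oscilloscope definitions, which the text does not state:

- eye amplitude = level-1 mean − level-0 mean;
- eye height = (level-1 mean − 3σ<sub>1</sub>) − (level-0 mean + 3σ<sub>0</sub>);
- eye S/N = amplitude / (σ<sub>1</sub> + σ<sub>0</sub>).

Under that assumption, the sketch recomputes the amplitude, height and S/N from the level statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EyeLevels:
    """Level statistics of an eye diagram, all in mV."""
    level1_mean: float
    level1_std: float
    level0_mean: float
    level0_std: float


def eye_amplitude(e: EyeLevels) -> float:
    # Distance between the mean logic-1 and mean logic-0 levels.
    return e.level1_mean - e.level0_mean


def eye_height(e: EyeLevels) -> float:
    # Vertical opening left after backing off 3 sigma from each level.
    return (e.level1_mean - 3 * e.level1_std) - (e.level0_mean + 3 * e.level0_std)


def eye_snr(e: EyeLevels) -> float:
    # Amplitude relative to the summed level noise.
    return eye_amplitude(e) / (e.level1_std + e.level0_std)


# Level statistics from Tab. 5 (post-layout).
tab5 = EyeLevels(level1_mean=819.0, level1_std=140.6, level0_mean=57.5, level0_std=59.1)

print(f"amplitude = {eye_amplitude(tab5):.1f} mV")
print(f"height    = {eye_height(tab5):.1f} mV")
print(f"S/N       = {eye_snr(tab5):.2f}")
```

With these definitions, the recomputed values agree with the eye amplitude, height and S/N of Tab. 5 to within the rounding of its entries. This suggests, but does not confirm, that these definitions were used.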

#### 4.1 Performance Study under PSN

After a sudden current I(t) is drawn, the voltage $V_P$ at node P is monitored to understand the $t_d$ variation. $t_d$ is expected to vary with the $V_P$ fluctuations shown in Fig. 11(a). Figures 11(b), (c) and (d) show In1 and Out1 for current ramps of 0–10 A, 0–20 A and 0–30 A in 10 ns.
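A first-order $L\,di/dt$ estimate gives a rough sense of how these ramps map to supply droop. The sketch below lumps only L<sub>pin</sub>, L<sub>pkg</sub> and L<sub>via</sub> of Tab. 1 into one inductor. This is a hypothetical simplification: it ignores the decoupling capacitors, the resistances and the other inductances of the PDN of Fig. 3(a). The result is therefore an order-of-magnitude check, not a model of the actual first droop.

```python
def ldidt_droop_v(inductance_h: float, delta_i_a: float, ramp_time_s: float) -> float:
    """First-order inductive droop V = L * dI/dt for a linear current ramp."""
    return inductance_h * delta_i_a / ramp_time_s


# Illustrative assumption: lump only L_pin + L_pkg + L_via from Tab. 1 (50 + 30 + 20 pH);
# decoupling capacitors, resistances and the remaining inductances are ignored.
L_LUMPED_H = 50e-12 + 30e-12 + 20e-12
RAMP_S = 10e-9  # 10 ns ramp

for amps in (10, 20, 30):
    print(f"0-{amps} A in 10 ns: L di/dt ~ {ldidt_droop_v(L_LUMPED_H, amps, RAMP_S):.2f} V")
```

For comparison, Tab. 6 reports $\Delta V_{\min}$ of 0.090 V, 0.187 V and 0.280 V for these three ramps.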

Figure 12 shows In1 and Out1 for the 0–30 A ramp in 10 ns on different time scales to illustrate the delay change. It shows the Out1 variation as $V_P$ falls from 1.1 V during the AC first droop and then returns to 1.1 V. The noise-free $t_d$, Fig. 12(a), is denoted by $\zeta$. As $V_P$ drops to 0.819 V, $t_d$ shifts by $\Delta$, Fig. 12(b). When $V_P$ recovers to 1.1 V, $t_d$ returns to the value of Fig. 12(a), Fig. 12(c). Table 6 lists $t_d$ for all I(t). For the 0–10 A and 0–20 A ramps, $t_d$ (61.7 ps and 60.0 ps) stays close to the noise-free value at 1.1 V (61.8 ps for 0 A). For the larger 0–30 A ramp, however, the deeper droop changes $t_d$ to 53.0 ps.



Fig. 11. Plot of (a)  $V_P$  at node P and (b), (c), (d) In1, Out1 of Fig. 4 because of I(t).



**Fig. 12.** Plot of (a), (b), (c) In1 and Out1 due to 0–30 A in 10 ns to understand $t_d$ variation.

| AC: $\Delta V_{\min}$ (V) | Current ramp | t<sub>d</sub> (ps) | Jitter (ns) | DC: $\Delta V_{\min}$ (V) | DC (V) | t<sub>d</sub> (ps) | Jitter (ns) |
|---------------------------|--------------|--------------------|-------------|---------------------------|--------|--------------------|-------------|
| 0.280 | 1.1 V, 0–30 A | 53.0 | 1.5 | 0 | 0.819 | 62.0 | 1.5 |
| 0.187 | 1.1 V, 0–20 A | 60.0 | 1.5 | 0 | 0.913 | 63.0 | 1.5 |
| 0.090 | 1.1 V, 0–10 A | 61.7 | 1.5 | 0 | 1.01 | 62.7 | 1.5 |
| 0 | 1.1 V, 0 A | 61.8 | 1.5 | 0 | 0 | - | 1.5 |

**Tab. 6.**  $t_d$  and jitter due to AC first droop and DC.

It is also necessary to understand how $t_d$ changes with DC, i.e., when the AC first droop in Fig. 3(b) is replaced by a DC $V_{dd}$. The DC levels in Tab. 6 (0.819 V, 0.913 V and 1.01 V) correspond approximately to 1.1 V − $\Delta V_{\min}$ of the three ramps. The resulting $t_d$ (62.0–63.0 ps) is almost identical to that at 1.1 V. Note that, for ramps up to 0–20 A, the $L\,di/dt$ noise has little effect on $t_d$.

The variation in period is commonly called jitter. Here it is evaluated from the shift in the falling edge as the supply noise moves from its minimum value back to zero. The jitter due to AC noise and DC is listed in Tab. 6 and remains constant at 1.5 ns. CPU power can be as high as 100 W at a $V_{dd}$ of about 1 V. The present work, by contrast, considers a current of only 30 A, i.e., about 30 W. This current reduces $V_P$ from 1.1 V to about 0.82 V (Tab. 6), and the circuit still operates. Dissipating 30 W nonetheless requires robust thermal management. The part works reliably only if the junction temperature stays below the maximum allowed value. For silicon, this limit is typically about 90 °C (Intel product lines). Provided this limit is met, the 30 A (~30 W) load is not expected to reduce reliability.

#### 4.2 Performance Comparison

For a fair comparison, the latches of [13], [14] and [21] are also designed in 0.09-$\mu$m CMOS at a $V_{dd}$ of 1.1 V, and the results are listed in Tab. 8. The gate sizes, bias currents and resistance values required to attain an almost full voltage swing are listed in Tab. 7. With these values, Out1 and Out2 are obtained for In1 and In2 switching at 1 GHz with $t_{\rm R}$ and $t_{\rm F}$ of 100 ps, as shown in Fig. 2. Fast-switching In1 and In2 that remain available at the write gates during the read cycle allow a current to flow towards V<sub>P</sub>. In addition, the NMOS $M_{N4}$ in [13] and the folded logic stage in [14] create a direct path between $V_{\rm P}$ and ground. The current through this path increases $P_{\text{avg}}$ and degrades the output swing, as the $P_{\text{avg}}$ row of Tab. 8 shows. The structural adjustment in Fig. 4 avoids this current flow and gives a much better output for inputs switching at 1 GHz. The present design achieves the lowest $P_{\text{avg}}$, 0.34 mW. Its $t_d$ between input and output, referred to as latency, is 61.0 ps. This is close to that of [13] (60.3 ps) and lower than those of [14] (84.2 ps) and [21] (69.3 ps). The PDP is also the lowest among all the designs. The FoM defined in (23) reflects circuit robustness. A low value indicates better performance in terms of EDP, layout area and noise margin (NM), and Fig. 4 has the lowest FoM in Tab. 8 (0.56 ns × fJ × µm²). At any given frequency, about 14 gates switch in the present latch, compared with about 10 in [14] and 8 in [21]. Reference [21] has the fewest gates switching at any time and therefore the lowest dynamic current. However, the total gate area of the present design is small, so a low leakage current is expected. Tab. 8 confirms this with the lowest static current among the designs. Furthermore, the proposed design uses no constant current source, whereas the others do. This makes a significant difference in leakage current.
The proposed design also has the highest NM<sub>L</sub> (0.406 V) but the lowest NM<sub>H</sub> (0.268 V) among the compared designs.
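The PDP row of Tab. 8 can be cross-checked against PDP = $P_{avg} \times t_d$. Because mW × ps = $10^{-3}$ W × $10^{-12}$ s = 1 fJ, the product in these units is already in fJ. The sketch below performs the check. The remaining differences of about 1% are presumably due to rounding of the tabulated $P_{avg}$ and $t_d$.

```python
# Tab. 8: design -> (P_avg in mW, t_d in ps, tabulated PDP in fJ).
TAB8 = {
    "Present": (0.34, 61.0, 21.0),
    "[13]": (5.5, 60.3, 334.0),
    "[14]": (3.2, 84.2, 271.0),
    "[21]": (0.66, 69.3, 46.0),
}


def pdp_fj(p_avg_mw: float, t_d_ps: float) -> float:
    """PDP = P_avg * t_d; 1 mW * 1 ps = 1e-15 J = 1 fJ."""
    return p_avg_mw * t_d_ps


for name, (p_mw, td_ps, pdp_tab) in TAB8.items():
    pdp = pdp_fj(p_mw, td_ps)
    rel = 100 * (pdp - pdp_tab) / pdp_tab
    print(f"{name:8s} PDP = {pdp:6.1f} fJ (Tab. 8: {pdp_tab:.1f} fJ, diff {rel:+.1f}%)")
```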

| Design | M<sub>P1</sub> | M<sub>N1</sub> | M<sub>N2</sub> | M<sub>N3</sub> | M<sub>N4</sub> | R | I<sub>SS</sub> | I<sub>o</sub> |
|--------|----------------|----------------|----------------|----------------|----------------|---|----------------|---------------|
| [13] | 1µ/100n | 5µ/100n | 5µ/100n | 7µ/100n | 21µ/100n | - | - | - |
| [14] | 13µ/100n | 13µ/100n | 15µ/100n | 13µ/100n | 13µ/100n | 1.2 kΩ | 2 mA | - |
| [21] | - | 11µ/100n | 11µ/100n | 11µ/100n | 15µ/100n | 2.0 kΩ | - | 10 µA |

Tab. 7. Gate sizes of Fig. 1(a), (b), (c).

| Performance | Present | [13] | [14] | [21] |
|-------------|---------|------|------|------|
| Technology (nm) | 90 | 90 | 90 | 90 |
| $V_{\rm dd}$ (V) | 1.1 | 1.1 | 1.1 | 1.1 |
| $P_{\rm avg}$ (mW) | 0.34 | 5.5 | 3.2 | 0.66 |
| $t_{\rm d}$ (ps) | 61.0 | 60.3 | 84.2 | 69.3 |
| PDP (fJ) | 21.0 | 334 | 271 | 46.0 |
| FoM (ns×fJ×µm²) | 0.56 | 10.7 | 37.8 | 2.3 |
| Dynamic current (µA) | 54.8 | - | 88.5 | 5.1 |
| Static current (µA) | 256.6 | - | 2855.8 | 542.0 |
| Area (µm²) | 296.2 | 367.6 | 1007.6 | 463.9 |
| NM<sub>L</sub> (V) | 0.406 | 0.166 | 0.065 | 0.076 |
| NM<sub>H</sub> (V) | 0.268 | 0.549 | 0.543 | 0.595 |
| $V_{\rm n}$ (nV/√Hz) | 4.0 | 2.3 | 2.1 | 2.3 |

Tab. 8. Performance comparison with conventional designs.

#### 5. Frequency Divider using the Present Latch

The block diagram of the FD built using Fig. 4 is shown in Fig. 13(a). Two identical latches of Fig. 4 are connected so that Out1 and Out2 of latch B are fed back to In2 and In1, respectively, of latch A. Both latches share one $V_{\rm P}$ and ground. The layout of Fig. 13(a) is shown in Fig. 13(b). The total area, excluding the bond pads and the die decoupling capacitor, is 39.7 × 16.1 $\mu$m<sup>2</sup>. The post-layout transients at Out1 and Out2 are shown in Fig. 14. Out1 and Out2 have twice the period of Clk1 and Clk2, respectively. For the NN case, the Monte Carlo histograms of $P_{\rm avg}$ and $t_{\rm d}$ in Fig. 15 give 643.4 $\mu$W and 90.8 ps, respectively.

Fig. 13. (a) Block diagram of the frequency divider using Fig. 4 and (b) layout.



Fig. 14. Post-layout output transient.



**Fig. 15.** Plot of Monte Carlo study (a)  $P_{\text{avg}}$  and (b)  $t_{\text{d}}$ .
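The divide-by-two action of Fig. 13(a) can be illustrated with a logic-level model of two level-sensitive latches in a loop. Routing Out1 and Out2 of latch B to In2 and In1 of latch A swaps the differential pair, which acts as a logical inversion. This is a behavioural sketch under assumed clocking, not a model of the circuit of Fig. 4. It assumes that latch A is transparent while the clock is high and latch B while it is low. The one-sample-per-step time base and all names are hypothetical.

```python
def divide_by_two(clk: list[int]) -> list[int]:
    """Behavioural sketch of two cascaded level-sensitive latches (A, B) in a loop.

    Assumed clocking: latch A is transparent while clk == 1 and latch B while
    clk == 0. Feeding Out1/Out2 of B to In2/In1 of A swaps the differential pair,
    i.e. D_A = NOT Q_B.
    """
    q_a = 0  # state of latch A (single-ended view of its Out1)
    q_b = 0  # state of latch B
    out = []
    for level in clk:
        if level:
            q_a = 1 - q_b  # A tracks the inverted output of B; B holds
        else:
            q_b = q_a      # B tracks A; A holds
        out.append(q_b)
    return out


clock = [1, 0] * 4  # one sample per clock phase, four clock periods
print(divide_by_two(clock))
```

The output completes one period for every two clock periods, which is the behaviour described for Fig. 14.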

#### 6. Conclusions

A new low-power latch coupled to a typical PDN is presented as an alternative to conventional designs built using CML. Simulation results at 90-nm CMOS and a V<sub>dd</sub> of 1.1 V show an undistorted output even when the inputs slew rapidly. In addition, the proposed latch consumes less power than [13], [14] and [21]. Its $t_d$ is comparable with that of [13], but the present design also keeps $t_d$ almost constant under $V_{dd}$ variations at 27 °C. The shift in $t_d$ with temperature is also small, the lowest among all the designs. A flat PDP is also obtained under both $V_{dd}$ and temperature shifts. However, an abrupt current drawn from the PDN causes the $V_{\rm P}$ near the die to fluctuate. The output does not deteriorate appreciably for I(t) ramps of 0–10 A or 0–20 A in 10 ns, and the shift in $t_d$ is minimal for these ramps. A larger shift is expected beyond 20 A. The jitter introduced by PSN is about 1.5 ns.

### References

- [1] GHILIONI, A., MAZZANTI, A., SVELTO, F. Analysis and design of mm-wave frequency dividers based on dynamic latches with load modulation. *IEEE Journal of Solid-State Circuits*, 2013, vol. 48, no. 8, p. 1842–1850. DOI: 10.1109/JSSC.2013.2258793
- [2] CHANDRAKASAN, A. P., SHENG, S., BRODERSEN, R. W. Low power CMOS digital design. *IEEE Journal of Solid State Circuits*, 1992, vol. 27, no. 4, p. 473–484. DOI: 10.1109/4.126534
- [3] RABAEY, J. M., CHANDRAKASAN, A., NIKOLIC, B. *Digital Integrated Circuits: A Design Perspective*. 2<sup>nd</sup> ed. Upper Saddle River, NJ: Prentice Hall, 2003. ISBN: 978-9332573925
- [4] NG, H. T., ALLSTOT, D. J. CMOS current steering logic for low-voltage mixed signal circuits. *IEEE Transactions on Very Large-Scale Integration (VLSI) Systems*, 1997, vol. 5, no. 3, p. 301–308. DOI: 10.1109/92.609873
- [5] HASSAN, H., ANIS, M., ELMASRY, M. MOS current mode circuits: analysis, design and variability. *IEEE Transactions on Very Large-Scale Integration (VLSI) Systems*, 2005, vol. 13, no. 8, p. 885–898. DOI: 10.1109/TVLSI.2005.853609
- [6] TAPARIA, A., BANERJEE, B., VISWANATHAN, T. R. CS-CMOS: A low noise logic family for mixed signal SoCs. *IEEE Transactions on Very Large-Scale Integration (VLSI) Systems*, 2011, vol. 19, no. 12, p. 2141–2148. DOI: 10.1109/TVLSI.2010.2089812
- [7] HOSSAIN, M. D. S., SAVIDIS, I. Dynamic differential signaling based logic families for robust ultra-power near threshold computing. *Microelectronics Journal*, 2020, vol. 102, p. 1–14. DOI: 10.1016/j.mejo.2020.104801
- [8] BHATTACHARYYA, B. K., LASKAR, N., DEBNATH, S., et al. Innovative scaling method to minimize cost of integrated circuit packages and devices. *IEEE Transactions on Components, Packaging and Manufacturing Technology*, 2014, vol. 4, no. 9, p. 1489–1494. DOI: 10.1109/TCPMT.2014.2339272
- [9] ALIOTO, M., MITA, R., PALUMBO, G. Performance evaluation of the low-voltage CML D-latch topology. *Integration*, 2003, vol. 36, no. 4, p. 191–209. DOI: 10.1016/j.vlsi.2003.09.001
- [10] HEYDARI, P., MOHANAVELU, R. Design of ultrahigh speed low-voltage CMOS CML buffers and latches. *IEEE Transactions on Very Large-Scale Integration (VLSI) Systems*, 2004, vol. 12, no. 10, p. 1081–1093. DOI: 10.1109/TVLSI.2004.833663
- [11] PAYANDEHNIA, P., MAGHAMI, H., SHEIKHAEI, S., et al. High speed CML latch using active inductor in 0.18μm CMOS technology. In *IEEE 19<sup>th</sup> Iranian Conference on Electrical Engineering*. Tehran (Iran), 2011, p. 1–4. ISSN: 2164-7054
- [12] TSAI, W. Y., CHIU, C. T., WU, J. M., et al. A novel low-gate count pipeline topology with multiplexer flip-flops for serial links. *IEEE Transactions on Circuits and Systems – I: Regular Papers*, 2012, vol. 59, no. 11, p. 2600–2610. DOI: 10.1109/TCSI.2012.2206494
- [13] GUPTA, K., PANDEY, N., GUPTA, M. MCML D latch using triple-tail cells: Analysis and design. *Active and Passive Electronic Components*, 2013, p. 1–9. DOI: 10.1155/2013/217674
- [14] SCOTTI, G., BELLIZIA, D., TRIFILETTI, A., et al. Design of low-voltage high-speed CML D latches in nanometer CMOS technologies. *IEEE Transactions on Very Large-Scale Integration (VLSI) Systems*, 2017, vol. 25, no. 12, p. 3509–3520. DOI: 10.1109/TVLSI.2017.2750207
- [15] YAN, A., HU, Y., CUI, J., et al. Information assurance through redundant design: A novel TNU error resilient latch for harsh radiation environment. *IEEE Transactions on Computers*, 2020, vol. 69, no. 6, p. 789–799. DOI: 10.1109/TC.2020.2966200
- [16] SCOTTI, G., TRIFILETTI, A., PALUMBO, G. A novel 0.6V MCML D-latch topology exploiting dynamic body bias threshold lowering. In *IEEE 25<sup>th</sup> International Conference on Electronics, Circuits and Systems*. Bordeaux (France), 2018, p. 233–236. DOI: 10.1109/ICECS.2018.8618015
- [17] KUI, L. F., UDDIN, M. R., MUSYIIRAH, N., et al. Design simulation and analysis of a digital electro-optic SR NOR latch. In *TENCON 2018 - 2018 IEEE Region 10 Conference*. Jeju (Korea), 2018, p. 2422–2425. DOI: 10.1109/TENCON.2018.8650276
- [18] AMIRANY, A., RAJAEI, R. Low power and highly reliable single event upset immune latch for nanoscale CMOS technologies. In *IEEE Iranian Conference on Electrical Engineering*. Mashhad (Iran), 2018, p. 103–107. DOI: 10.1109/ICEE.2018.8472552
- [19] PIKE, J., PARVIZI, M., BEN-HAMIDA, N., et al. New charge-steering latches in 28nm CMOS for use in high-speed wireline transceiver. In *IEEE International Symposium on Circuits and Systems.* Florence (Italy), 2018, p. 1–5. DOI: 10.1109/ISCAS.2018.8351013
- [20] YAN, A., LAI, C., ZHANG, Y., et al. Novel low cost double and triple node upset tolerant latch designs for nano-scale CMOS. *IEEE Transactions on Emerging Topics in Computing*, 2018, vol. 9, no. 1, p. 520–533. DOI: 10.1109/TETC.2018.2871861
- [21] KUMAWAT, M., UPADHYAY, A. K., SHARMA, S., et al. An improved current mode logic latch for high-speed applications. *International Journal of Communication Systems*, 2019, vol. 33, no. 13, p. 1–9. DOI: 10.1002/dac.4118
- [22] SCOTTI, G., TRIFILETTI, A., PALUMBO, G. A novel 0.5V MCML D flip-flop topology exploiting forward body bias threshold lowering. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2020, vol. 67, no. 3, p. 560–564. DOI: 10.1109/TCSII.2019.2919186
- [23] SANDHIE, Z. T., AHMED, F. U., CHOWDHURY, M. H. Design of ternary master-slave D flip-flop using MOS-GNRFET. In *IEEE International Midwest Symposium on Circuits and Systems*. Springfield (MA, USA), 2020, p. 554–557. DOI: 10.1109/MWSCAS48704.2020.9184618
- [24] KUMAR, M., MONDAL, A. J. A new low power current steering logic circuit for the design of digital subsystem. *International Journal of Electronics*, 2022, vol. 9, no. 3, p. 497–519. DOI: 10.1080/00207217.2021.1914188

### About the Authors

Mithilesh KUMAR was born in Arunachal Pradesh, India. He received his M.Tech. in VLSI & Embedded Systems from the National Institute of Technology Arunachal Pradesh in 2018. At present, he is a Ph.D. scholar in the Dept. of Electronics & Communication Engineering, NIT Arunachal Pradesh. His research interests include the design and analysis of SerDes for serial links.

Abir J. MONDAL (corresponding author) was born in West Bengal, India. He received his M.Tech. from the National Institute of Technology Durgapur in 2011 and his Ph.D. from the National Institute of Technology Arunachal Pradesh in 2018. His research interests include SerDes, low-voltage-swing signaling and temperature sensors.