# Design Techniques for 50GS/s ADC



Yida Duan

# Electrical Engineering and Computer Sciences University of California at Berkeley

Technical Report No. UCB/EECS-2016-173 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-173.html

December 1, 2016

Copyright © 2016, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

The authors would like to thank the sponsors, faculty, staff, and students of Berkeley Wireless Research Center for support, and the TSMC University Shuttle Program for chip fabrication.


# **Research Project**

Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of **Master of Science, Plan II**.

Approval for the Report and Comprehensive Examination:

**Committee:** 

Professor Elad Alon Research Advisor

Professor Vladimir Stojanović Second Reader


# **Design Techniques for 50GS/s ADCs**

# Abstract

In recent years, the explosive growth in data traffic has led to demand for extremely high sample-rate ADCs. For example, high-performance receivers for backplane channels and multi-mode fibers with DSP-based channel equalization or electronic dispersion compensation (EDC) rely on ADCs with sampling rates greater than 10GS/s [1], [2]. Similarly, emerging 100Gbps/400Gbps coherent fiber-optic receivers with high-order modulation require even higher sampling speeds (greater than 50GS/s) [3], [4]. In these applications, a moderate ENOB of 4 to 6 bits is required.

In this report, design techniques for building a 6b 50GS/s ADC are presented. To demonstrate the proposed circuits and design techniques, a 12.8GS/s 32-way hierarchically time-interleaved quarter ADC prototype was fabricated in 65nm CMOS. It achieves 4.6 ENOB and a 25GHz 3dB effective resolution bandwidth (ERBW). As described in Section VI, particular care was taken with the layout of the prototype so that it can be straightforwardly expanded to 51.2GS/s via additional interleaving without significantly impacting ERBW and FOM.



Fig. 1. Conventional time-interleaved ADC

## I. Introduction

These applications typically require relatively modest resolution (~4-6 ENOB). Most state-of-the-art solutions with a high degree of interleaving achieve either relatively degraded energy efficiency [3], [4] or lower-than-Nyquist 3dB ERBW [6]. These tradeoffs arise from several issues with conventional time-interleaved ADC architectures. First, in order to isolate the capacitive parasitics of the samplers and mitigate kickback, such ADCs (Fig. 1) often include a broadband input buffer that directly drives all of the parallel sampling switches [5]. Since all the parallel switches directly sample the continuously changing output voltage of the buffer (or the input voltage itself, if a buffer is not used), the jitter of the clock driving every one of these switches degrades SNDR at high input frequencies. Thus, to meet the stringent jitter requirements for extremely high ERBW, an excessive amount of power must be spent in the clock distribution network to keep all of these clock signals as clean as possible. Furthermore, because the input signal (buffered or not) must fan out to all of the sub-ADCs, this routing may add significant parasitic capacitance, especially if the number of sub-ADCs is large. This either limits the front-end bandwidth (and hence the bandwidth of the overall ADC) or causes excessive power consumption in the buffer. Finally, the input buffer must charge each sampling capacitor through the series resistance of the sampling switch during track mode, further limiting the bandwidth and the sampling rate of the converter.

To overcome these issues, we leverage a hierarchical sampling architecture [3], [7], [8] and propose a cascode sampling circuit. Section II reviews hierarchical sampling as an attractive architecture to alleviate the need for distributing a large number of low-jitter clocks. Section III then introduces a cascode sampling circuit that overcomes the speed limitations of traditional samplers based on series switches. In Section IV, a general power optimization method for a hierarchical sampling network built from cascode samplers is presented. Section V describes the circuit-level implementation of the building blocks used in the design, including the cascode sampler, clock generation circuits, and sub-ADCs. Finally, measurement results are presented in Section VI, and the report is concluded in Section VII.



## II. Hierarchically Time-Interleaved Sampling

As mentioned earlier and highlighted in Fig. 2 (ADC architecture) and Fig. 3 (timing diagram), this design adopts a hierarchical sampling approach with a 4-way interleaved front-end sample-and-hold circuit in order to reduce the number of low-jitter clocks that must be generated and distributed [3], [7]. Once the continuously changing input voltage is sampled and held by the front-end (Rank-1) sampler, the output of this sampler is a constant voltage during the entire hold time. Thus, any perturbation of the sampling instant at the Rank-2 sampler does not directly translate into voltage errors as long as it falls within this hold window, allowing the jitter requirements for the Rank-2 and subsequent ranks of samplers to be greatly relaxed. As a result, the only jitter-critical clocks in the entire sampler system are the four 12.8GHz clocks of the front-end (Rank-1) samplers. An additional benefit of hierarchical sampling is the greatly reduced signal routing at the output of the front-end buffer, whose bandwidth is critical because it can limit the input bandwidth of the entire ADC. As opposed to conventional time-interleaved ADCs, where the input buffer must fan out to all sub-ADCs (in this design, 128 sub-ADCs), the front-end sampler drives only the next rank of samplers/de-multiplexors (in this case, a single Rank-2 demux), thus substantially reducing the parasitic capacitance at the output of the front-end sampler.
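To make the sample bookkeeping concrete, the sketch below maps each input sample of the quarter ADC (branching factors m1=1, m2=4, m3=8, as given in Section IV) to one of its sub-ADCs. It assumes round-robin ordering at every rank, which is our simplification rather than a detail stated in the report; the function names are hypothetical.

```python
from math import prod

# Branching factors m1, m2, m3 of the quarter ADC (Section IV: m1=1, m2=4, m3=8).
QUARTER = (1, 4, 8)


def route(n, branching=QUARTER):
    """Branch taken at each rank by input sample n, assuming round-robin demuxing."""
    path = []
    stride = 1
    for m in branching:
        path.append((n // stride) % m)
        stride *= m
    return tuple(path)


def sub_adc_index(n, branching=QUARTER):
    """Flatten the per-rank branch indices into a single sub-ADC number."""
    idx = 0
    for b, m in zip(route(n, branching), branching):
        idx = idx * m + b
    return idx


print(prod(QUARTER))      # sub-ADCs in the quarter ADC: 32
print(4 * prod(QUARTER))  # sub-ADCs in the 51.2GS/s ADC: 128
print([sub_adc_index(n) for n in range(8)])
```

Each sub-ADC receives every 32nd sample, while only the Rank-1 sampler ever sees the continuous-time input.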

## III. Cascode Sampler

The hierarchical sampling architecture has significant benefits for high sample-rate applications, but its overall performance is still limited by the sampler circuits themselves. In particular, as we will describe next, conventional sampling circuits suffer from bandwidth limitations that can compromise either the overall bandwidth or the energy-efficiency of the entire ADC.

A conventional sampling circuit (Fig. 4a) consists of a source follower buffer combined with a series sampling switch. The final load capacitance $C_L$ is thus driven through the sum of the output resistance of the source follower and the switch resistance. This series combination of resistances makes the conventional sampling circuit very power-inefficient in high-speed designs. To make matters worse, when the sampling period approaches 4 FO4 (i.e., 4 times the fan-out-of-4 delay of an inverter), which corresponds to ~10GS/s and above, constant-V<sub>GS</sub> sampling techniques [9] do not perform well because of the long rise time of the switch control signal. The circuit must therefore settle within the available time even under the worst-case (signal-dependent) switch resistance, leading to substantially increased buffer power consumption.



Fig. 4. Schematic and small-signal model of (a) conventional sampler (b) cascode sampler

In order to mitigate the penalty caused by the series resistance of the sampling switch and hence improve the tradeoff between sampling speed and power consumption, we propose a cascode sampling circuit that merges the sampling operation into the buffer itself [10]. A single-ended version of the proposed cascode sampling circuit is shown in Fig. 4b. During the track phase when  $\Phi$  is high, M<sub>1,2,3</sub> form a cascoded common-source amplifier, with the PMOS M<sub>3</sub> acting as a triode load resistor. M<sub>1</sub> and M<sub>3</sub> are sized to provide a DC gain of ~1. During the hold phase when  $\Phi$  is low, both M<sub>2</sub> and M<sub>3</sub> are cut-off and the output voltage is held on *C<sub>L</sub>*. The key advantage of this design is that as long as the cascode device (M<sub>2</sub>) operates in saturation and has sufficiently high  $f_T$  relative to the operating rate, the dominant pole of the circuit is set only by the output node resistance and capacitance. In other words, in contrast to the traditional sampling circuit, the addition of the sampling switch does not directly affect the settling time.

In order to more rigorously highlight the benefits of a cascode sampling circuit over the conventional sampling circuit, we can use small signal models (Fig. 4) of both designs to analyze the trade-off between the input device  $g_m$  and the dominant pole location. In this technology, the dominant pole for the cascode sampling circuit is:

$$P_1 = -\frac{g_{ds3}}{C_L + C_{d2} + C_{d3} + C_{g3}/2} \qquad (1)$$

To provide some numerical comparisons, we will assume that the $f_T$ of the PMOS transistors is half that of the NMOS transistors, and that the ratio between $C_{d,s}$ and $C_g$ is 1 for all transistors.<sup>1</sup> We will further assume (as is the case in our particular technology) that the maximum triode $g_{ds}$ of a transistor is roughly twice its maximum saturation $g_m$. With all of these assumptions combined, if $f_T$ is the unity current-gain frequency of the NMOS transistors, then $g_{ds3}/C_{g3} = 2 \cdot 2\pi (f_T/2) = 2\pi f_T$. Finally, $g_{ds3}$ is equal to $g_{m1}$ for unity DC gain. With these assumptions, (1) can be rewritten as:

$$P_1 = \frac{2\pi f_T}{\frac{2\pi f_T C_L}{g_{m1}} + \frac{5}{2}} \qquad (2)$$
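As a sanity check on the step from (1) to (2), the sketch below evaluates (1) with capacitances built from the stated assumptions and compares it with the closed form (2). To reproduce the 5/2 term it additionally assumes that the cascode device M<sub>2</sub> is the same size as M<sub>1</sub> (so $C_{d2} = C_{g1}$); that sizing is our inference, not a statement from the report. The device values are arbitrary.

```python
import math


def cascode_pole_eq1(g_m1, c_l, f_t):
    """|P1| from Eq. (1), with capacitances set by the Section III assumptions."""
    c_g1 = g_m1 / (2 * math.pi * f_t)  # NMOS input device: g_m/C_g = 2*pi*f_T
    c_g3 = c_g1                        # footnote 1: C_g3 = C_g1
    c_d3 = c_g3                        # C_d/C_g = 1 for all devices
    c_d2 = c_g1                        # assumption: M2 sized like M1
    g_ds3 = g_m1                       # unity DC gain
    return g_ds3 / (c_l + c_d2 + c_d3 + c_g3 / 2)


def cascode_pole_eq2(g_m1, c_l, f_t):
    """|P1| from the closed form, Eq. (2)."""
    w_t = 2 * math.pi * f_t
    return w_t / (w_t * c_l / g_m1 + 5 / 2)


# Arbitrary example values: 10mA/V, 20fF, 150GHz.
print(cascode_pole_eq1(10e-3, 20e-15, 150e9), cascode_pole_eq2(10e-3, 20e-15, 150e9))
```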

We next examine the dominant pole of the traditional sampler, which can be approximated as:

$$P_{1} \cong \frac{1}{\frac{C_{g1} + C_{s1} + C_{d3} + C_{d2} + C_{s2} + C_{g2} + C_{L}}{g_{m1}} + \frac{C_{L} + C_{d2} + C_{g2}/2}{g_{ds2}}} \qquad (3)$$

<sup>1</sup> In order to simplify the derivation, it is assumed that $C_{g3}$ equals $C_{g1}$ for the conventional sampler circuit (Fig. 4a). Although there may be some slight speed advantages to making device M<sub>3</sub> smaller, headroom limitations (especially in advanced processes with low supply voltage) often restrict the degree to which M<sub>3</sub> can be downsized.

Utilizing the same assumptions as stated earlier for the cascode sampler, (3) becomes:

$$P_{1} = \frac{2\pi f_{T}}{\frac{15}{4} + 3\beta + \left(\frac{1}{2\beta} + 1\right)\frac{2\pi f_{T}C_{L}}{g_{m1}}} \qquad (4)$$

where $\beta = W_2/W_1$ is the ratio of the widths of M<sub>2</sub> and M<sub>1</sub>. The dominant pole achieves its maximum value when $\beta = \sqrt{\pi f_T C_L/(3 g_{m1})}$, and this optimal $P_1$ is:

$$P_{1} = \frac{2\pi f_{T}}{\frac{15}{4} + \frac{2\pi f_{T}C_{L}}{g_{m1}} + 2\sqrt{\frac{3}{2} \cdot \frac{2\pi f_{T}C_{L}}{g_{m1}}}} \qquad (5)$$
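The optimum in (5) follows from minimizing $3\beta + x/(2\beta)$ in the denominator of (4), with $x = 2\pi f_T C_L/g_{m1}$. A minimal numerical check, working with the pole normalized to $2\pi f_T$ (the helper names are hypothetical):

```python
import math


def conventional_pole(beta, x):
    """Eq. (4) normalized to 2*pi*f_T, with x = 2*pi*f_T*C_L/g_m1."""
    return 1 / (15 / 4 + 3 * beta + (1 / (2 * beta) + 1) * x)


def beta_opt(x):
    """Stationary point of 3*beta + x/(2*beta): beta = sqrt(x/6)."""
    return math.sqrt(x / 6)


def conventional_pole_opt(x):
    """Eq. (5) normalized to 2*pi*f_T."""
    return 1 / (15 / 4 + x + 2 * math.sqrt(1.5 * x))


x = 2.0  # arbitrary example value of 2*pi*f_T*C_L/g_m1
print(beta_opt(x), conventional_pole(beta_opt(x), x), conventional_pole_opt(x))
```

Note that $\sqrt{x/6} = \sqrt{\pi f_T C_L/(3 g_{m1})}$, the expression for the optimal $\beta$ given above.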

With these expressions (Eqs. (2) and (5)) in hand, Fig. 5 compares the trade-off between $g_m$ and the dominant pole for both designs. Notice that the advantage of the cascode sampler is most apparent when the circuit bandwidth approaches a significant fraction of $f_T$ (but remains well below $f_T$, so that the source node of the cascode is still relatively fast). Specifically, for $P_1 \approx \frac{1}{8} \cdot 2\pi f_T$, which is the target for the Rank-1 sampler in our ADC design, the conventional sampler requires roughly four times higher $g_m$ (and hence power) than the proposed cascode sampler.
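The factor of about four can be reproduced by inverting (2) and (5) for $x = 2\pi f_T C_L/g_{m1}$ at a given normalized pole $p = P_1/(2\pi f_T)$. For a fixed $C_L$, $g_{m1} \propto 1/x$, so the $g_m$ ratio is the ratio of the two $x$ values. This is a sketch under the simplified device assumptions above, not a replacement for the curves in Fig. 5.

```python
import math


def x_cascode(p):
    """Invert Eq. (2): p = 1/(x + 5/2)."""
    if 1 / p <= 5 / 2:
        raise ValueError("pole not reachable by the cascode sampler")
    return 1 / p - 5 / 2


def x_conventional(p):
    """Invert Eq. (5): solve x + 2*sqrt(1.5*x) = 1/p - 15/4 (quadratic in sqrt(x))."""
    r = 1 / p - 15 / 4
    if r <= 0:
        raise ValueError("pole not reachable by the conventional sampler")
    root = -math.sqrt(1.5) + math.sqrt(1.5 + r)
    return root ** 2


p = 1 / 8                                 # Rank-1 target: P1 = 2*pi*f_T/8
ratio = x_cascode(p) / x_conventional(p)  # = g_m,conventional / g_m,cascode
print(f"g_m ratio at P1 = 2*pi*f_T/8: {ratio:.2f}")
```

The ratio grows quickly for higher targets, since under these assumptions the conventional sampler cannot exceed $P_1 = \frac{4}{15} \cdot 2\pi f_T$.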



Fig. 5. Comparison between conventional and cascode sampler



Fig. 6. (a) Large-signal $G_m$ and (b) $HD_3$ caused by $G_m$ compression, versus input amplitude of a differential pair

The speed advantage of the cascode sampling structure does not come without expense. In particular, while in the conventional sampler the internal feedback of the source follower significantly improves linearity, the cascode sampler is an open-loop structure that directly suffers from distortion due to the nonlinear $g_m$ of the input transistor M<sub>1</sub>. Therefore, in order for an ADC utilizing the cascode sampler to achieve sufficient SFDR, the signal swing must be carefully chosen to remain within the linear range of the sampler.

Since $g_m$ is the dominant source of nonlinearity, one can simply examine the transfer characteristics of a differential pair to predict the distortion of the cascode sampler circuit. As shown in Fig. 6, the variation in large-signal $G_m$ with large differential input amplitude gives rise to HD<sub>3</sub>. With an overdrive voltage of 350mV for the input transistors and a moderate differential amplitude of ~200mV (peak-to-peak voltage of 400mV), the HD<sub>3</sub> is well below -48dB, and thus the nonlinearity of the cascode sampler does not significantly degrade the SNDR of the ADC for the 5-bit target ENOB.<sup>2</sup>

<sup>2</sup> The third-order distortion accumulates through the ranks. The final HD<sub>3</sub> at the end of the sampler chain is ~N times larger than that of an individual sampler, where N is the total number of ranks. This effect must be taken into account when determining the total number of ranks in the sampler hierarchy.

## IV. Power Optimization of Sampler Network



Fig. 7. A general hierarchical sampler with N ranks and a front-end track-and-hold

In this section, we develop a general method to optimize the sizes of the samplers and the sampling capacitors for a cascode sampler network with a fixed sampling hierarchy (i.e., a fixed number of ranks and a fixed branching factor at each rank), and then apply it to our three-rank sampler network design. Assuming the sampling network has N ranks with a front-end track-and-hold, and each sampler at Rank-i fans out to $m_i$ branches (Fig. 7), the available settling time at each rank, $T_i$, can be calculated as a function of the sampling frequency, $f_s$, and the $m_i$'s:<sup>3</sup>

$$T_{i} = \begin{cases} \frac{1}{2f_{s}} & (i \le 2) \\ \left(\prod_{k=2}^{i-1} m_{k} - \prod_{k=1}^{i-2} m_{k}\right) \frac{1}{2f_{s}} & (i > 2) \end{cases} \qquad (1)$$

<sup>3</sup> The worst-case available settling time for each rank is the time during which it is transparent and its preceding rank is opaque.

As shown in Section III, the dominant pole of the cascode sampler is its output pole, so if the DC gain is one and the settling error is set to $\varepsilon$, the settling constraint at Rank-i is:

$$\frac{T_i}{\ln(1/\varepsilon)} = \frac{C_{total,i}}{g_{m,i}} \qquad (2)$$

where $g_{m,i}$ and $C_{total,i}$ are the transconductance and total load capacitance of Rank-i, respectively. In addition to the settling-time constraint, the front-end sampler also has to meet the overall ADC bandwidth requirement:

$$\frac{1}{2\pi f_B} = \frac{C_{total,1}}{g_{m,1}} \qquad (3)$$

Combining (1), (2), and (3) results in:

$$\frac{C_{total,i}}{g_{m,i}} = \tau_i' \qquad (4)$$

where:

$$\tau_i' = \begin{cases} \min\left(\frac{1}{\ln(1/\varepsilon)} \cdot \frac{1}{2f_s}, \frac{1}{2\pi f_B}\right) & (i=1) \\ \frac{1}{\ln(1/\varepsilon)} \cdot \frac{1}{2f_s} & (i=2) \\ \left(\prod_{k=2}^{i-1} m_k - \prod_{k=1}^{i-2} m_k\right) \cdot \frac{1}{\ln(1/\varepsilon)} \cdot \frac{1}{2f_s} & (i>2) \end{cases} \qquad (5)$$
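Equations (1) and (5) of this section translate directly into code. The sketch below computes $T_i$ and $\tau_i'$ for a given branching sequence; it writes the settling factor as $\ln(1/\varepsilon)$ so that the time constant is positive for $\varepsilon < 1$. The values of $f_s$ and $\varepsilon$ in the usage lines are hypothetical placeholders; the 25GHz bandwidth target comes from the abstract.

```python
import math
from math import prod


def settle_time(i, m, f_s):
    """Eq. (1): worst-case settling window of Rank-i (1-indexed); m = [m1, m2, ...]."""
    if i <= 2:
        return 1 / (2 * f_s)
    # prod over k = 2..i-1 minus prod over k = 1..i-2 (list m is 0-indexed)
    return (prod(m[1:i - 1]) - prod(m[0:i - 2])) / (2 * f_s)


def tau_prime(i, m, f_s, eps, f_b):
    """Eq. (5): maximum allowed time constant C_total,i/g_m,i of Rank-i."""
    tau = settle_time(i, m, f_s) / math.log(1 / eps)
    if i == 1:
        tau = min(tau, 1 / (2 * math.pi * f_b))  # front end must also meet bandwidth
    return tau


m = [1, 4, 8]   # branching factors of this design
f_s = 12.8e9    # hypothetical choice of f_s for the quarter ADC
eps = 2 ** -7   # hypothetical settling-error target
f_b = 25e9      # bandwidth target
taus = [tau_prime(i, m, f_s, eps, f_b) for i in (1, 2, 3)]
print([f"{t * 1e12:.2f}ps" for t in taus])  # Rank-1 is bandwidth-limited for these inputs
```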

The total input referred noise must be less than the ADC noise budget to avoid SNDR degradation. The sampled noise power at the output of Rank-i is:

$$\overline{V_{n,i}}^2 = \frac{NF \cdot kT}{C_{total,i}} \qquad (6)$$

Substituting (4) into (6),

$$\overline{V_{n,i}}^2 = \frac{NF \cdot kT}{\tau_i'} \cdot \frac{1}{g_{m,i}} \qquad (7)$$

where *NF* is the effective noise factor of the cascode sampler. The total input referred noise of the sampler network can be written as a dot product, and the sampler noise constraint becomes:

$$\overline{V_{n,in}}^2 = NF \cdot kT \begin{bmatrix} \frac{1}{\tau_1'} & \dots & \frac{1}{\tau_N'} \end{bmatrix} \cdot \begin{bmatrix} \frac{1}{g_{m,1}} \\ \vdots \\ \frac{1}{g_{m,N}} \end{bmatrix} \le N_B \qquad (8)$$

where $N_B$ is the budget for thermal and flicker noise. Since sampling capacitors are added to reduce the thermal noise, an additional constraint must be imposed on these capacitors. $C_{total,i}$ is the sum of the output capacitance of Rank-i, the added sampling capacitance ($C_{L,i}$), and the input capacitance of Rank-(i+1):

$$C_{total,i} = \frac{\gamma \cdot g_{m,i}}{f_T} + C_{L,i} + \frac{g_{m,i+1}}{f_T} \qquad (9)$$

where $\gamma$ is the ratio between the output capacitance and the input capacitance. Substituting (9) into (4) results in:

$$C_{L,i} = \left(\tau_i' - \frac{\gamma}{f_T}\right) g_{m,i} - \frac{1}{f_T} g_{m,i+1} \qquad (10)$$

With the help of matrix formulation, the dependence of  $C_{L,i}$  on  $g_{m,i}$  for all the ranks in the sampler network described in (10) can be written in one equation:

$$\begin{bmatrix} C_{L,1} \\ C_{L,2} \\ \vdots \\ C_{L,N-1} \\ C_{L,N} \end{bmatrix} = \begin{bmatrix} \tau_1'' & -1/f_T & 0 & \cdots & 0 \\ 0 & \tau_2'' & -1/f_T & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \tau_{N-1}'' & -1/f_T \\ 0 & \cdots & 0 & 0 & \tau_N'' \end{bmatrix} \cdot \begin{bmatrix} g_{m,1} \\ g_{m,2} \\ \vdots \\ g_{m,N-1} \\ g_{m,N} \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ C_{SAR} \end{bmatrix} \qquad (11)$$

where $\tau_i'' = \tau_i' - \frac{\gamma}{f_T}$. The inequality in (11) describes the fact that the additional sampling capacitances must be non-negative, except for the sampling capacitor of the last rank, which needs to be larger than the input capacitance of the SAR ADC. Finally, the total power of the sampler network is proportional to the sum of the $g_m$'s of all the samplers in the network:
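The capacitor constraint can be checked for any candidate $g_m$ vector with a few lines of code. The sketch below applies (10) rank by rank (the last rank has no next-rank input term) and tests the inequality in (11). It follows the report's convention of writing device capacitance as $g_m/f_T$; the helper names and the numbers in any call are hypothetical.

```python
def tau_double_prime(tau_p, gamma, f_t):
    """tau''_i = tau'_i - gamma/f_T."""
    return tau_p - gamma / f_t


def added_caps(g_m, tau_pp, f_t):
    """Eq. (10)/(11): C_L,i = tau''_i*g_m,i - g_m,i+1/f_T; no next-rank term for rank N."""
    n = len(g_m)
    caps = []
    for i in range(n):
        c = tau_pp[i] * g_m[i]
        if i + 1 < n:
            c -= g_m[i + 1] / f_t  # input capacitance of the next rank
        caps.append(c)
    return caps


def caps_feasible(g_m, tau_pp, f_t, c_sar):
    """Inequality (11): C_L,i >= 0 for i < N and C_L,N >= C_SAR."""
    caps = added_caps(g_m, tau_pp, f_t)
    return all(c >= 0 for c in caps[:-1]) and caps[-1] >= c_sar
```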

$$\text{Power} \propto \begin{bmatrix} 1 & \cdots & \prod_{k=1}^{N-2} m_{k} & \prod_{k=1}^{N-1} m_{k} \end{bmatrix} \cdot \begin{bmatrix} g_{m,1} \\ \vdots \\ g_{m,N-1} \\ g_{m,N} \end{bmatrix} \qquad (12)$$

With equations (8), (11), and (12), the overall optimization can now be formulated as:

Minimize:<sup>4</sup>

$$\begin{bmatrix} 1 & \cdots & \prod_{k=1}^{N-2} m_k & \prod_{k=1}^{N-1} m_k \end{bmatrix} \cdot \overrightarrow{g_m}$$

Subject to:

1. $$NF \cdot kT \begin{bmatrix} \frac{1}{\tau_1'} & \dots & \frac{1}{\tau_N'} \end{bmatrix} \cdot \mathrm{inv\_pos}(\overrightarrow{g_m}) \le N_B$$
2. $$\begin{bmatrix} \tau_1'' & -1/f_T & 0 & \cdots & 0 \\ 0 & \tau_2'' & -1/f_T & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \tau_{N-1}'' & -1/f_T \\ 0 & \cdots & 0 & 0 & \tau_N'' \end{bmatrix} \cdot \overrightarrow{g_m} \ge \begin{bmatrix} 0 \\ \vdots \\ 0 \\ C_{SAR} \end{bmatrix}$$

where $\overrightarrow{g_m} = \begin{bmatrix} g_{m,1} & \cdots & g_{m,N} \end{bmatrix}^T > 0$ are the variables to be optimized, and $\mathrm{inv\_pos}(\cdot)$ denotes the elementwise reciprocal $1/g_{m,i}$. Since the cost function and the constraints are convex functions of $\overrightarrow{g_m}$, a convex optimization algorithm can be used. Table I summarizes the design parameters of this sampling network ($m_1=1$, $m_2=4$, $m_3=8$) obtained using the optimization method and the parameters of the TSMC 65nm technology.<sup>5</sup>

<sup>4</sup> Note that each element in the sequence $m_1, m_2 \dots m_N$ represents the branching factor of one sampler at each rank; their cumulative products are the total number of samplers at each rank.

| Rank | m | C<sub>L</sub> | g<sub>m</sub> | Num. of samplers | Sampler power |
|------|---|---------------|---------------|------------------|---------------|
| 1 | 1 | 24fF | 11mA/V | 1 | 8mW |
| 2 | 4 | 20fF | 7.5mA/V | 1 | 8mW |
| 3 | 8 | 63fF | 3.2mA/V | 4 | 6.8mW |
| Total | | | | | 43.5mW |

Table I. Results of convex optimization of sampler gm and sampling capacitors
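In the special case where the capacitor constraint (11) is inactive, the problem reduces to minimizing $\sum_i w_i g_{m,i}$ (with $w_i$ the number of samplers at Rank-i) subject only to the noise constraint (8). Writing $a_i = NF \cdot kT/\tau_i'$, the Lagrange conditions give $g_{m,i} = (S/N_B)\sqrt{a_i/w_i}$ with $S = \sum_i \sqrt{a_i w_i}$, and a minimum weighted power of $S^2/N_B$. The sketch below implements only this closed-form special case; it is not the general convex solver used in the report, its result must still be checked against (11), and all inputs are placeholders.

```python
import math


def noise_only_optimum(tau_p, weights, nf, kt, n_b):
    """Minimize sum(w_i*g_i) s.t. sum(a_i/g_i) <= n_b, a_i = NF*kT/tau'_i.

    Closed-form Lagrange solution; ignores the capacitor constraint (11).
    Returns (g_m vector, minimum weighted power).
    """
    a = [nf * kt / t for t in tau_p]
    s = sum(math.sqrt(ai * wi) for ai, wi in zip(a, weights))
    g_m = [s / n_b * math.sqrt(ai / wi) for ai, wi in zip(a, weights)]
    return g_m, s * s / n_b


def input_noise(g_m, tau_p, nf, kt):
    """Eq. (8): total input-referred noise, assuming unity gain per rank."""
    return nf * kt * sum(1 / (t * g) for t, g in zip(tau_p, g_m))


# Placeholder inputs in normalized units; weights = samplers per rank (1, 1, 4).
g_opt, p_min = noise_only_optimum([1.0, 1.0, 3.0], [1, 1, 4], 1.0, 1.0, 1.0)
print(g_opt, p_min)
```

At the optimum the noise constraint is met with equality, and the ranks with many samplers (large $w_i$) receive proportionally smaller $g_m$.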

## V. Circuit Implementation

For this prototype, a 12.8GS/s quarter ADC is designed and tested to demonstrate the proposed circuits and design techniques. In this section, we describe the implementation of the circuit building blocks used in the 12.8GS/s quarter ADC prototype, including the cascode samplers, the clock generation circuits, and the sub-ADCs.

<sup>5</sup> To obtain accurate optimization results, the $f_T$, noise factor, and DC gain of each rank, as well as layout parasitics and the folded structure of Rank-3, are taken into account in the actual optimization script.

### A. Sampler Circuit Design

As shown in Fig. 8a, the Rank-1 sampler is implemented using the cascode sampler structure described in Section III. Note that for a 12.8GS/s ADC driven by a terminated $50\Omega$ transmission line (i.e., a $25\Omega$ source), a single passive sampling switch (followed by the same sampling circuits used in this design) might have been sufficient to achieve 25GHz bandwidth. However, scaling such a passive sampler design to 51.2GS/s (i.e., 4X higher interleaving than the chip presented here) at this bandwidth would require full-swing, nearly square clock pulses less than 20ps wide. In contrast, a cascode-sampler-based front end operates properly with 50% duty-cycle (i.e., ~40ps wide pulse) quadrature 12.8GHz clocks. Finally, a cascode sampler provides the aforementioned benefits of buffering, i.e., reduced input capacitance and kickback mitigation.



Fig. 8. Schematic of (a) Rank-1 sampler (b) Rank-2 sampler and (c) Rank-3 sampler

Examining the cascode sampler circuit itself, M<sub>0-4</sub> simply form a differential version of the circuit, with device M<sub>0</sub> acting as the tail current source. In order to maintain a constant input capacitance and minimize glitches on the supply node, the tail current of M<sub>1,2</sub> is steered away to unused branches M<sub>7-10</sub> when $\Phi_1$ is low. Note that if the sampling rate were doubled to 25.6GS/s, M<sub>7-10</sub> would be utilized as a sampler operating on $\overline{\Phi_1}$. In order to keep M<sub>0-4</sub> in saturation, the clocks connected to the gates of M<sub>3,4</sub> and M<sub>7,8</sub> are level-shifted through AC-coupling capacitors to achieve peak voltages above V<sub>dd</sub>; note, however, that the |V<sub>gs</sub>| and |V<sub>ds</sub>| of those devices are kept below V<sub>dd</sub> at all times for reliable operation.

The Rank-2 and Rank-3 samplers are shown in Fig. 8b and c. The Rank-2 de-multiplexor is implemented similarly to Rank-1, where the four cascode branches are turned on successively in order to steer signal currents into one of the de-multiplexed branches corresponding to  $V_{op}<0:3>$  and  $V_{on}<0:3>$ . In order to avoid over-ranging the sub-ADCs due to process-induced gain variations, M<sub>0</sub> was added to enable coarse foreground DC gain adjustment for the entire sampler chain.<sup>6</sup> The sub-ADCs require a low input common-mode voltage, and hence unlike the telescopic Rank-1 and Rank-2 designs, the Rank-3 design utilizes the folded-cascode structure shown in Fig. 8c.

Just like the conventional sampler circuit, the "top-plate sampling" used by the cascode sampler circuit is prone to signal-dependent errors. We must therefore analyze effects such as charge injection and clock/signal feed-through to ensure that they do not excessively degrade the SNDR. In the conventional sampling circuit (without bootstrapping), inversion charge in the sampling switch can flow to the sampling node when the switch is opened, causing signal-dependent voltage errors [11]. Fortunately, to first order, the cascode sampler does not suffer from this issue. Since the cascode device M<sub>2</sub> is in saturation during the track phase (Fig. 9a), its channel is "pinched off" at the drain (output) nodes, and thus the majority of its inversion charge is injected into the source node [12].

<sup>6</sup> Although foreground adjustment is used in this design for simplicity, a background calibration scheme could readily be implemented by continuously adjusting the gain so that the outputs of all the sub-ADCs stay within a threshold range (e.g., between 1 and $2^{B}-1$).



Fig. 9. Half circuit illustration of (a) charge injection, (b) clock feed-through, and (c) signal feed-through issues in cascode sampler



Fig. 10. Layout of cascode device (M<sub>2</sub>) with (a) large drain-source capacitance and (b) with minimal drain-source capacitance

The only remaining potential source of charge injection error in the cascode sampler circuit is from the triode PMOS loads. Although some of the signal-dependent inversion charge in these devices will transfer to the output when they are turned off, this effect does not necessarily degrade SNDR. Specifically, the linearly dependent inversion charge merely causes a gain error; it is only the non-linearly dependent inversion charge that gives rise to distortion. Fortunately, as verified by SPICE simulations, this non-linearly dependent portion of the inversion charge in the PMOS loads is not significant enough to be a concern for the target 5-ENOB design.

Another potential source of error in the cascode sampler circuit is clock feed-through (Fig. 9b). Due to the coupling from the sampling clocks to the outputs through  $C_{gd}$  right after the transistors (M<sub>2,3</sub>) stop conducting, a small change in the sampled output voltage occurs:  $\Delta V_o = (-V_{ov2}C_{gd2} + V_{t,p}C_{gd3})/C_L$ .<sup>7</sup> Similarly to charge injection, the non-linear dependence of  $\Delta V_o$  on the input voltage (mostly from  $V_{ov2}$ ) does not lead to significant distortion at the 5-bit level that is the target for our design. This result was once again verified with SPICE simulations.

A final potential limitation in the cascode sampler circuit is signal feed-through in the hold mode due to capacitive coupling through  $C_{ds2}$  (Fig. 9c), where the magnitude of this error is proportional to  $\frac{C_{ds2}}{C_L}$ . This is a well-known issue for time-interleaved ADCs with top-plate sampling structures. A common solution is to cancel the feed-through by adding dummy transistors that cross-couple the source nodes of the cascode devices and the output nodes [8]. However, the parasitic capacitance added by the dummy transistors in this approach can significantly reduce the speed and/or power-efficiency of the sampler. To mitigate the effect of signal feed-through without sacrificing speed, we can instead reduce  $C_{ds}$  by appropriately laying out the devices. In particular, instead of the typical layout shown in Fig. 10a, one can minimize the overlap between the source/drain contacting regions as shown in Fig. 10b. This layout strategy does increase the contact resistance to the source/drain, but in this design/process the resulting effect on the bandwidth of the buffer was negligible. Post-layout simulations indicate that this layout technique achieves a more than 10X reduction of  $C_{ds}$ .

<sup>7</sup> V<sub>t,p</sub> is the threshold voltage of the PMOS transistors; V<sub>ov2</sub> is the overdrive voltage of M<sub>2</sub>.

### B. Clock Generation

As shown in Fig. 2, in this design the ADC input signal is first sampled by the Rank-1 sampler, and the remaining Rank-2 and Rank-3 samplers simply function as one-to-four and one-to-eight analog de-multiplexors that bring the sample rate down to that of the sub-ADCs. At the outputs of the hierarchical sampling network, the thirty-two time-interleaved analog samples are digitized by parallel sub-ADCs.

Fig. 3 illustrates a timing diagram of the sampling clocks associated with this system. Using two external 12.8GHz clocks in quadrature phase, all of the clocks used for the samplers and sub-ADCs are generated on-chip by six digital frequency dividers (FD) and five phase interpolators (PI). The sampling clock for the Rank-1 sampler ($\Phi_1$) has a frequency of 12.8GHz and a 50% duty cycle. Since $\Phi_1$ is jitter-critical, it is tapped directly from the external clock with only a few inverters in between as buffers. The outputs of the Rank-2 sampler are time-interleaved at 3.2GS/s, so $\Phi_2$(0:3) are 3.2GHz clocks in quadrature phase. As shown in Fig. 11, a standard ÷4 divider, FD<sub>1A</sub>, is used to generate these clocks. Note that $\Phi_2$(0:3) are non-overlapping (25% duty cycle) so that the Rank-2 sampler drives only one load capacitor at any time. Because the succeeding PI<sub>2</sub>(0:3) require their four inputs to have sufficiently overlapping pulses [13], FD<sub>1B</sub> is added to generate these 3.2GHz clocks with a 50% duty cycle. Similarly, the four ÷8 dividers (FD<sub>2</sub>(0:3)) generate the sampling clocks for the Rank-3 samplers; each divider produces eight 400MHz phases with a 12.5% duty cycle.
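The duty-cycle and overlap relationships can be checked with a behavioral model in which each list entry is one period of the divider's input clock. This is an idealized counter model with hypothetical function names; it does not model the TSPC circuit of Fig. 11.

```python
def divide_nonoverlap(n, cycles):
    """Phase k is high only in input cycles c with c % n == k (duty cycle 1/n)."""
    return [[int(c % n == k) for c in range(cycles)] for k in range(n)]


def divide_50pct(n, cycles):
    """Phase k is high for n/2 consecutive input cycles starting at cycle k."""
    return [[int((c - k) % n < n // 2) for c in range(cycles)] for k in range(n)]


F_IN = 12.8e9
phi2 = divide_nonoverlap(4, 32)   # Phi_2(0:3): 3.2GHz, 25% duty, one phase high at a time
pi2_in = divide_50pct(4, 32)      # 50%-duty quadrature 3.2GHz clocks (FD_1B's role)
phi3 = divide_nonoverlap(8, 64)   # one Rank-3 set, driven at 3.2GHz: 400MHz, 12.5% duty
print(F_IN / 4, F_IN / 32)        # Rank-2 and Rank-3 clock frequencies
```

In the non-overlapping set exactly one phase is high in every input cycle, whereas adjacent phases of the 50% set overlap for one input cycle out of four, which is the overlap the phase interpolators rely on.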



Fig. 11. Schematic of (a) frequency divider  $FD_{1A}$  (b) TSPC Flip-flop used in the divider

Note that in order to maximize the tracking time of a sampler, the falling edges of its sampling clock should occur right before the rising edges of the sampling clock at its preceding rank (i.e., the falling edge of $\Phi_2(0)$ should be aligned to the rising edge of $\Phi_1$). The alignment of sampling clocks across different ranks is accomplished using phase interpolators. Specifically, PI<sub>1</sub> aligns $\Phi_2(0:3)$ to $\Phi_1$, and PI<sub>2</sub>(0:3) align $\Phi_{3A-D}(0:7)$ to $\Phi_2(0:3)$. The requirements on the PIs' resolution and jitter are relatively relaxed because they control only the "re-sampling" clocks for Rank-2 and Rank-3. Finally, the outputs of PI<sub>2</sub>(0:3) also serve as the 3.2GHz bit-cycling clocks for the SAR sub-ADCs, $\Phi_{BC,A-D}$.



Fig. 12. Schematic of phase interpolator  $PI_1$ 

The schematic of PI<sub>1</sub> is shown in Fig. 12. Similarly to the designs presented in [13] and [14], it uses resistor loads, and the four current branches have separate 5-bit current DACs to implement $360^{\circ}$ phase tuning with 7-bit resolution. 4-bit tunable capacitor loads are added to ensure that the transitions remain smooth enough across process variations for proper phase interpolation. The differential outputs of PI<sub>1</sub> are converted to a single-ended signal by a single-stage differential-to-single-ended amplifier. The swing of the final output ($\Phi_{out}$) is restored to digital levels by an AC-coupled inverter. The PI<sub>2</sub>'s are implemented in a similar fashion.
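For scale, 7-bit resolution over $360^{\circ}$ is a $2.8125^{\circ}$ step, or about 0.61ps at 12.8GHz. The sketch below pairs that arithmetic with an idealized phasor-sum interpolator that linearly weights the two adjacent quadrature inputs within each quadrant. The weighting (per-quadrant code j maps to weights 32-j and j) and the sinusoidal-input assumption are illustrative choices, not the report's DAC mapping.

```python
import math

F_PI1 = 12.8e9      # PI1 interpolates the 12.8GHz clock
CODE_BITS = 7       # 360-degree range with 7-bit resolution
N_CODES = 2 ** CODE_BITS
STEP_DEG = 360 / N_CODES
STEP_PS = 1e12 / F_PI1 / N_CODES


def pi_phase_deg(code, per_quadrant=N_CODES // 4):
    """Output phase of an idealized phasor-sum interpolator (hypothetical weighting).

    Within quadrant q, the two adjacent quadrature inputs get weights
    (per_quadrant - j) and j; with sinusoidal inputs the output phase is the
    angle of the weighted phasor sum.
    """
    q, j = divmod(code, per_quadrant)
    w_a, w_b = per_quadrant - j, j
    return 90 * q + math.degrees(math.atan2(w_b, w_a))


max_err = max(abs(pi_phase_deg(c) - c * STEP_DEG) for c in range(N_CODES))
print(STEP_DEG, round(STEP_PS, 3), round(max_err, 2))
```

Even in this ideal model, linear weights give a few degrees of integral nonlinearity mid-quadrant, which is tolerable here because the PIs only set the relaxed "re-sampling" clocks.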

### C. Sub-ADC

In ADC designs with a high degree of interleaving, both the power and the area consumed by each sub-ADC must be carefully considered. The importance of sub-ADC power is perhaps self-evident, but sub-ADC area can be equally important since a large sub-ADC implies longer wiring to route the inputs and clocks. These long wires can lead to significant parasitic loading and hence substantially increased sampler/clock distribution power. Thus, due to its energy-efficiency and relative compactness, for this design we have chosen a SAR-based sub-ADC.





Fig. 13. (a) Sub-ADC schematic (b) Schematic of comparator (c) timing diagram of sub-ADC

Fig. 13a highlights the 7-bit 400MS/s synchronous SAR sub-ADC design. Extra bits beyond the target ENOB were included to enable digital calibration of cross-channel gain and offset. As mentioned earlier, the 3.2GHz output clocks from PI<sub>2</sub>(0:3) are used directly as the bit-cycling clock ( $\Phi_{BC}$ ). Each sample conversion takes eight cycles of the 3.2GHz clock. The sub-ADC uses seven comparison cycles, leaving one cycle for the Rank-3 sampler to settle the sub-ADC input. Since our target resolution is moderate, the capacitor matching constraints are relatively relaxed, enabling small unit capacitors (1fF) and reduced loading (63fF) for the Rank-3 samplers.
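The sub-ADC timing and DAC loading quoted above are consistent with simple arithmetic: eight 3.2GHz cycles per sample give 400MS/s per slice, 32 slices give the 12.8GS/s aggregate rate, and a binary-weighted 6-bit array of 1fF units totals 63fF. The plain binary-weighted structure (1, 2, 4, ..., 32 units, with no extra termination unit) is an assumption that happens to reproduce the 63fF figure.

```python
F_BC = 3.2e9              # bit-cycling clock from PI2(0:3)
CYCLES_PER_SAMPLE = 8     # 7 comparisons + 1 cycle for Rank-3 settling
F_SUB = F_BC / CYCLES_PER_SAMPLE
N_SUB = 32                # sub-ADCs in the quarter ADC
UNIT_CAP_FF = 1.0
DAC_BITS = 6

# Assumption: plain binary-weighted array, no extra termination unit.
dac_caps_ff = [UNIT_CAP_FF * 2 ** k for k in range(DAC_BITS)]
print(F_SUB, N_SUB * F_SUB, sum(dac_caps_ff))
```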

To further save power and area, we also employed a single-ended DAC switching technique. During the tracking phase, the positive input ($V_{ip}$) is sampled onto a 6-bit capacitor DAC, while the negative input ($V_{in}$) is sampled onto a matched dummy capacitor array. As shown in Fig. 13c, during bit cycling, $V_{ip}$ is forced to converge to $V_{in}$ in a binary fashion, while $V_{in}$ stays constant. Compared to a typical differential SAR ADC, this technique saves half of the DAC switches and greatly reduces the routing and fan-out of the SAR logic.
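A behavioral sketch of this single-ended conversion is given below, assuming ideal binary DAC steps (no parasitic attenuation) and that, as in [15], the first comparison is made before any DAC switching, so that 7 comparisons need only 6 steps. The function name and the full-scale parameter `fs` are illustrative. Because only $V_{ip}$ moves, the returned trace also shows the comparator input common mode $(V_{ip}+V_{in})/2$ drifting during bit cycling, which motivates the comparator design discussed next.

```python
def single_ended_sar(v_ip: float, v_in: float, fs: float, n_bits: int = 7):
    """Ideal single-ended SAR: V_ip is stepped toward a fixed V_in.

    fs is the full-scale differential range, i.e. v_ip - v_in in [-fs/2, fs/2).
    The first comparison needs no DAC switching, so n_bits comparisons use
    n_bits - 1 binary-weighted steps (fs/4, fs/8, ..., fs/2**n_bits).
    Returns (code, trace), where trace holds V_ip at each comparison.
    """
    vp = v_ip
    code = 0
    trace = []
    for k in range(n_bits):
        trace.append(vp)
        bit = 1 if vp > v_in else 0
        code = (code << 1) | bit          # MSB first
        if k < n_bits - 1:
            step = fs / 2 ** (k + 2)
            # move V_ip toward V_in; V_in never changes
            vp += -step if bit else step
    return code, trace
```

For an input in the middle of code $i$, the function returns $i$, and the final $V_{ip}$ sits within one LSB ($f_s/128$) of $V_{in}$.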

One drawback of single-ended switching is that the input common mode of the comparator does not remain constant during bit cycling. The comparator must therefore be designed so that its offset has minimal dependence on the input common mode [15]. This is achieved using the circuit shown in Fig. 13b; as in [15], $M_0$ is added as a current source to make the drain currents of $M_{1,2}$ relatively independent of the input common mode. This comparator does not provide full-swing outputs, so skewed inverters are added to restore full digital levels. In total, each sub-ADC slice consumes 1.14mW at 400MS/s and occupies $22\mu m \times 81\mu m$ of die area.

#### **VI. Measurement Results**

The prototype quarter ADC was implemented in a 65nm GP CMOS technology, and a die photo of the test chip is shown in Fig. 14. Note that the Rank-1 sampler and clock input are placed at the corner of the quarter-ADC core so that the design can be extended to a 51.2GS/s full-rate ADC without modifying the layout. In particular, the entire core layout can simply be copied and flipped three times to generate the 51.2GS/s full ADC layout. Owing to the proposed cascode sampler design, the input capacitance of the ADC is <25fF. The 12.8GHz clock inputs are provided by an external signal generator, and the ADC inputs are applied through a 0-40GHz differential RF probe. For all of the following measurements, an Agilent E8257D PSG with ~20fs rms jitter was used as the clock source. Note, however, that the jitter contributed by the four on-chip inverters that buffer the clock is ~100fs (based on worst-case post-layout simulations), so clock jitter contributes a little less than 50% of the noise budget for 5 ENOB at 25GHz.
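The jitter statement can be checked with the standard aperture-jitter limit for a full-scale sine, $\mathrm{SNR} = -20\log_{10}(2\pi f_{in}\sigma_t)$. The sketch below treats the ~100fs on-chip and ~20fs source jitter as uncorrelated, so they add in RSS; that is an assumption, not a statement from the measurements. It puts the jitter-limited SNR near 36dB at 25GHz. Against the 31.86dB SNDR that corresponds to 5 ENOB, this is on the order of 40% of the allowed noise power.

```python
import math


def jitter_snr_db(f_in: float, sigma_t: float) -> float:
    """SNR limit (dB) of a full-scale sine sampled with rms jitter sigma_t."""
    return -20 * math.log10(2 * math.pi * f_in * sigma_t)


def sndr_for_enob(enob: float) -> float:
    """Ideal SNDR (dB) corresponding to a given ENOB."""
    return 6.02 * enob + 1.76


sigma = math.hypot(100e-15, 20e-15)          # on-chip buffers + PSG, assumed uncorrelated
snr_j = jitter_snr_db(25e9, sigma)           # jitter-limited SNR at 25GHz
target = sndr_for_enob(5)                    # total noise allowed at 5 ENOB
fraction = 10 ** ((target - snr_j) / 10)     # jitter noise power / total allowed noise power
```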



Fig. 14. Die-photo

As a first step, DC characterization of the ADC was performed. Fig. 15a shows the DC transfer characteristics of all thirty-two sub-ADCs before calibration, while Figs. 15b and c plot their DNL/INL curves. The large DNL errors (-1 to 3 LSB) are measured with respect to the 7-bit raw codes, so they do not degrade SNDR for the 5-ENOB target. As indicated by the spread of the DC transfer curves, the peak cross-channel offset mismatch is 57mV and the peak gain mismatch is $\pm$8%. The offset is dominated by the comparator offsets of the sub-ADCs, while the gain mismatch is caused by mismatch of the load transistors in the Rank-3 samplers. Because of these mismatches, the ADC differential input swing is reduced to 335mV (vs. the 400mV nominal target) to avoid clipping the output of any sub-ADC. For subsequent single-tone tests, a slightly lower differential V<sub>pp</sub> of 300mV was used. Note that in future designs, this reduction in signal swing (and hence SNR) could be eliminated by using analog-domain per-channel offset and gain correction [21]; this approach would also obviate the need for (and overhead of) additional digital correction hardware.
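For reference, the sketch below shows one common way to obtain DNL/INL curves such as those in Fig. 15, using the endpoint definition applied to measured code-transition voltages. The report does not say whether its curves came from transition levels or from a code-density histogram, so the method and the function name are illustrative assumptions.

```python
import numpy as np


def dnl_inl_from_transitions(t):
    """Endpoint DNL/INL (in LSB) from measured code-transition voltages.

    t[k] is the input voltage at which the output changes from code k to k+1,
    so an n-bit converter has 2**n - 1 transitions (127 for 7 bits).
    """
    t = np.asarray(t, dtype=float)
    widths = np.diff(t)                          # widths of the inner codes
    lsb = (t[-1] - t[0]) / len(widths)           # endpoint-fit LSB
    dnl = widths / lsb - 1.0
    inl = (t - t[0]) / lsb - np.arange(len(t))   # deviation from the endpoint line
    return dnl, inl
```

Moving a single transition by half an LSB produces a +0.5/-0.5 LSB DNL pair on the adjacent codes and a 0.5 LSB INL bump at that transition.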



Fig. 15. DC test results of 32 sub-ADCs: (a) DC transfer characteristics (b) DNL (c) INL

To generate fully differential high-frequency input signals for single-tone tests, two signal generators were locked in both frequency and phase. Cable losses as well as cable-length mismatches were calibrated out at each input frequency. To produce a 4096-pt FFT and calculate SNDR, the input frequency was set to $N/4096 \cdot f_s$, where N is an odd number (an odd N is coprime with 4096, so every sample in the record falls at a distinct phase of the input). The ADC outputs were then subsampled at ~39kHz and read out to a PC through shift registers. Foreground digital calibration of cross-channel gain and offset mismatches was performed off-chip using a pilot input tone of ~3.1GHz; no nonlinear correction was used. Figs. 16a and b show the spectrum of the ADC output for an input frequency of ~3.1GHz before and after gain/offset calibration. Calibration eliminates most of the intermodulation tones caused by mismatches. With calibration, the SFDR increases to 32.4dBc, limited by third-order distortion, while the second-order distortion is -41.6dBc. The remaining intermodulation tones due to residual cross-channel gain mismatch are well below -40dBc.
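The sketch below illustrates both steps in this paragraph: choosing a coherent input frequency with odd $N$, and a foreground per-channel gain/offset estimate from a pilot tone. The report does not describe its calibration algorithm. The least-squares sine fit used here is a swapped-in, commonly used technique, and the function names are hypothetical. The sketch also assumes that sample $k$ of the record comes from channel $k \bmod 32$, which glosses over the ~39kHz subsampled readout.

```python
import numpy as np


def coherent_bin(f_target, fs, n_fft=4096):
    """Nearest odd bin N, and f_in = N * fs / n_fft, for coherent sampling."""
    n = 2 * round((f_target / fs * n_fft - 1) / 2) + 1
    return n, n * fs / n_fft


def fit_gain_offset(x, n_bin, n_fft, m_ch):
    """Per-channel least-squares sine fit; returns (relative gains, offsets).

    Assumes sample k came from channel k % m_ch and the record holds n_fft
    samples of a coherent tone in bin n_bin.
    """
    k = np.arange(len(x))
    w = 2 * np.pi * n_bin / n_fft
    gains, offsets = np.empty(m_ch), np.empty(m_ch)
    for ch in range(m_ch):
        kk = k[ch::m_ch]
        basis = np.column_stack([np.sin(w * kk), np.cos(w * kk), np.ones(kk.size)])
        (a, b, c), *_ = np.linalg.lstsq(basis, x[ch::m_ch], rcond=None)
        gains[ch], offsets[ch] = np.hypot(a, b), c   # tone amplitude and DC offset
    return gains / gains.mean(), offsets


def apply_correction(x, gains, offsets):
    """Remove each channel's offset and normalize its gain to the channel mean."""
    y = np.asarray(x, dtype=float).copy()
    m = len(gains)
    for ch in range(m):
        y[ch::m] = (y[ch::m] - offsets[ch]) / gains[ch]
    return y
```

With 32 channels and odd $N$, each channel still sees 128 distinct input phases, so every per-channel fit is well conditioned.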

The ADC spectrum for a 25GHz input tone after gain/offset calibration (performed at ~3.1GHz) is shown in Fig. 16c. In this case, SFDR is limited by second-order distortion; we suspect that this stronger second-order distortion is due to phase imbalance between the differential inputs. As shown in Fig. 17, the prototype achieves 29.5dB SNDR at low input frequencies and 26.4dB at 25GHz. The SNDR remains relatively flat, staying above 26dB over the entire 25GHz bandwidth.
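The SNDR figures above relate to ENOB through $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$. By that formula, 29.5dB corresponds to about 4.6 bits and 26.4dB to about 4.1 bits. The sketch below shows one way to compute SNDR from an unwindowed FFT of a coherently sampled record. It is an illustration of the standard method, not the authors' measurement script.

```python
import numpy as np


def sndr_db(x, n_bin):
    """SNDR (dB) of a coherently sampled tone in FFT bin n_bin (no window)."""
    x = np.asarray(x, dtype=float)
    p = np.abs(np.fft.fft(x - x.mean())) ** 2
    n = len(x)
    sig = p[n_bin] + p[n - n_bin]       # tone power (positive + negative frequency)
    noise = p[1:].sum() - sig           # everything except DC and the tone
    return 10 * np.log10(sig / noise)


def enob(sndr):
    """Effective number of bits for a given SNDR in dB."""
    return (sndr - 1.76) / 6.02
```

As a sanity check, an ideal 7-bit quantizer driven near full scale should give roughly $6.02 \cdot 7 + 1.76 \approx 43.9$dB with this routine.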

The total ADC power is 162mW, excluding digital I/O and the initial clock buffers needed only to interface with external instruments. As shown in Fig. 18, the entire sampling network consumes 43.5mW, the sub-ADCs consume a total of 36.4mW, and clock generation and distribution consume 81.2mW. Table II highlights the performance of this work in comparison with other state-of-the-art high-speed ADCs. The FOM is 0.74pJ/conv-step, which is comparable to other designs with similar speed and technology [16], [17], while this work achieves the highest -3dB ERBW published to date. Note that although [21] achieves a much lower FOM than this work, it uses a significantly more advanced process technology (and, more importantly in this application, one with higher $f_T$) at a noticeably lower sample rate and with substantially lower ERBW.
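The 0.74pJ/conv-step figure is consistent with the Walden FOM, $P / (2^{\mathrm{ENOB}} f_s)$, evaluated with $f_s = 12.8$GS/s and the 26.4dB SNDR at 25GHz. The same convention reproduces the 0.46pJ/conv-step listed for [16]. We infer this convention from the numbers; the text does not state it. The sketch also checks that the power breakdown adds up to the reported total within rounding.

```python
P_SAMPLING, P_SUBADC, P_CLOCK = 43.5e-3, 36.4e-3, 81.2e-3  # Fig. 18 breakdown (W)
P_TOTAL = 162e-3       # reported total (W)
FS = 12.8e9            # sample rate (S/s)
SNDR_ERBW = 26.4       # SNDR at 25GHz (dB)


def walden_fom(power, sndr_db, fs):
    """Walden figure of merit in J/conv-step."""
    enob = (sndr_db - 1.76) / 6.02
    return power / (2 ** enob * fs)
```

The 32 slices at 1.14mW each account for about 36.5mW, in line with the 36.4mW sub-ADC share.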



Fig. 16. ADC output spectrum for input frequency of (a) 3.12GHz before calibration (b) after calibration, and (c) 25GHz after calibration



Fig. 17. Input frequency vs. SNDR



Fig. 18. ADC power breakdown

Table I. Results comparison

|                    | [3]       | [16]      | [17]      | [5]       | [21]          | This work |
|--------------------|-----------|-----------|-----------|-----------|---------------|-----------|
| Technology         | 65nm CMOS | 65nm CMOS | 40nm CMOS | 40nm CMOS | 32nm SOI CMOS | 65nm CMOS |
| $f_s$ (GS/s)       | 40        | 12        | 10.3      | 25        | 8.8/10        | 12.8      |
| BW (GHz)           | 18        | 6         | 5         | 9         | 4.4/5         | 25        |
| SNDR @ BW (dB)     | 25.1      | 25.1      | 33        | 25.8      | 37            | 26.4      |
| Power (mW)         | 1500      | 81        | 240       | 500       | 35/49         | 162       |
| FOM (pJ/conv-step) | 2.5       | 0.46      | 0.7       | 1.25      | 0.058/0.071   | 0.74      |

#### **VII. Conclusion and Discussion**

In this work, we have developed design techniques for 6b >50GS/s ADCs. The prototype quarter ADC in 65nm CMOS has demonstrated the highest effective resolution bandwidth (25GHz) published to date while retaining competitive energy efficiency (FOM of 0.74pJ/conv-step). These results were enabled by the combination of a hierarchical sampling architecture, a power- and area-efficient sub-ADC design, and most importantly, a newly proposed cascode sampler circuit that overcomes the bandwidth limitations introduced by the series resistance of conventional switch-based sampler architectures. The prototype quarter ADC has been specifically designed so that it can be straightforwardly extended to 51.2GS/s with additional sub-ADCs and DEMUXs.

To further place these results in context, note that the Rank-1 sampler in this design already includes the requisite additional current branch; as mentioned previously, increasing the interleaving by a factor of two (to 25.6GS/s) would therefore have no effect on the analog bandwidth of the front-end. Increasing the interleaving by another factor of two (to the 51.2GS/s full rate), as shown in Fig. 19, would require an additional Rank-1 cascode sampler. This sampler should be clocked in quadrature relative to the original sampler but, as also highlighted earlier, can still use a 50% duty-cycle 12.8GHz clock (i.e., the width of the clock pulses does not need to be shrunk, unlike in a design with passive front-end sampling).



Fig. 19. Illustration of additional circuitry required to extend the design to 51.2GS/s

In comparison to the chip presented here, the addition of an extra Rank-1 sampler has two principal effects on the bandwidth and functionality of the design. First, the input capacitance of the ADC would double to 50fF. Fortunately, with a 25$\Omega$ equivalent source impedance (which, absent any other capacitive loading, would place the pole at the ADC input above 127GHz), the degradation in signal bandwidth would be minimal. In fact, since the simulated (post-layout) bandwidth mismatch between the two Rank-1 circuits has a $\sigma$ of ~1.2%, even if left uncalibrated, the variations in input amplitude induced by such mismatch would also have minimal effect on ERBW at the target SNDR of this design.<sup>8</sup>

<sup>8</sup> Additional sub-ADCs and DEMUX circuitry would of course introduce additional gain and offset mismatch between the channels as well. As mentioned earlier, it would therefore be highly desirable for such a design to cancel these mismatches directly in the analog domain [21].
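The first effect can be quantified with a few lines of arithmetic. The input pole is $1/(2\pi \cdot 25\Omega \cdot 50\mathrm{fF}) \approx 127$GHz. To bound the amplitude effect, we assume (our simplification) that each Rank-1 path behaves like a single pole at $f_p$. The fractional amplitude change per fractional pole shift is then $(f/f_p)^2 / (1 + (f/f_p)^2)$, which never exceeds 1. The amplitude mismatch therefore stays below the ~1.2% bandwidth mismatch. For a two-way interleaver, a relative gain mismatch $\delta$ produces an image spur of roughly $\delta/2$ relative to the signal. The helper names below are illustrative.

```python
import math


def rc_pole_hz(r_ohm, c_farad):
    """Pole frequency (Hz) of a single RC section."""
    return 1 / (2 * math.pi * r_ohm * c_farad)


def amplitude_sensitivity(f, f_pole):
    """d ln|H| / d ln f_pole for a single-pole response; always below 1."""
    u = (f / f_pole) ** 2
    return u / (1 + u)


def two_way_gain_spur_dbc(delta):
    """Image spur (dBc) of a 2-way interleaver with relative gain mismatch delta."""
    return 20 * math.log10(delta / 2)
```

Under these assumptions, a 1.2% gain mismatch maps to an image spur near -44dBc, well below the ~26dB SNDR target.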

A more important effect related to bandwidth mismatch is that the group delays of the two Rank-1 samplers may no longer match. Fortunately, this effect can be corrected by adjusting the timing of the "quadrature" clock fed to the additional Rank-1 sampler; such correction would likely be required in any case to deal with delay mismatches in the independent clock buffer chains. There are many possible methods to introduce the necessary timing correction; in our specific design, a convenient method would be to simply use an additional 12.8GHz phase interpolator. The 12.8GHz phase interpolator in this design has a measured power consumption of ~2mW. Even after accounting for the increase in power required to maintain the same total jitter (~100fs) in the clock buffer chain and to achieve higher tuning accuracy,<sup>9</sup> an additional interpolator would introduce minimal power overhead relative to the power of the entire 51.2GS/s ADC.
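To gauge how fine this timing correction must be, recall that a two-way interleaver with timing skew $\Delta t$ produces an image spur of roughly $\pi f_{in} \Delta t$ relative to the signal (small-skew approximation). The sketch below assumes, hypothetically, that the extra 12.8GHz interpolator spans one full 78.125ps period with the same 7-bit resolution described for PI<sub>1</sub>. Under that assumption, its step is about 610fs, and the worst-case residual skew of half a step gives a spur near -32dBc at 25GHz. Each extra current-DAC bit halves the step and lowers the spur by about 6dB.

```python
import math


def skew_spur_dbc(f_in, dt):
    """Image spur (dBc) of a 2-way interleaver with timing skew dt (small-skew approx.)."""
    return 20 * math.log10(math.pi * f_in * dt)


def pi_step_s(f_clk, bits):
    """Delay step of an ideal phase interpolator spanning one clock period."""
    return 1 / (f_clk * 2 ** bits)


def max_skew_for_spur(f_in, spur_dbc):
    """Largest skew (s) that keeps the image spur at or below spur_dbc."""
    return 10 ** (spur_dbc / 20) / (math.pi * f_in)
```

For example, keeping the skew spur below -40dBc at 25GHz would require residual skew of about 127fs or less.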

# Acknowledgement

The authors would like to thank the sponsors, faculty, staff, and students of Berkeley Wireless Research Center for support, and the TSMC University Shuttle Program for chip fabrication.

<sup>9</sup> Increasing the tuning accuracy in this phase-interpolator-based design requires only increasing the resolution of the (statically programmed) current DACs, which does not directly lead to higher power consumption.

## Reference

- [1] M. Harwood, et al., "A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 436-591, Feb. 2007.
- [2] B. Zhang, et al., "A 195mW / 55mW dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 33-35, Feb. 2013.
- [3] Y. Greshishchev, et al., "A 40GS/s 6b ADC in 65nm CMOS", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 390-391, Feb. 2010.
- [4] I. Dedic, "56GS/s ADC Enabling 100GbE", *Proc. Opt. Fiber Commun. Conf. (OFC)*, pp. 1-3, Mar. 2010.
- [5] K. Poulton, et al., "A 20 GS/s 8b ADC with a 1MB memory in 0.18 um CMOS", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 318-319, Feb. 2003.
- [6] C-C Huang, C-Y Wang, and J-T Wu, "A CMOS 6-Bit 16-GS/s time-interleaved ADC with digital background calibration", *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, pp. 159-160, Jun. 2010.
- [7] S. Gupta, et al., "A 1GS/s 11b Time-Interleaved ADC in 0.13um CMOS", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 2360-2369, Feb. 2006.
- [8] K. Doris, et al., "A 480 mW 2.6 GS/s 10b Time-Interleaved ADC With 48.5 dB SNDR up to Nyquist in 65 nm CMOS", *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2821-2833, Dec. 2011.
- [9] A.M. Abo and P.R. Gray, "A 1.5V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter", *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 599-606, May 1999.
- [10] Y. Duan and E. Alon, "A 12.8GS/s Time-Interleaved SAR ADC with 25GHz 3dB ERBW and 4.6b ENOB", *Proc. IEEE Custom Integrated Circuits Conf.*, pp. 1-4, Sept. 2013.
- [11] G. Wegmann, E.A. Vittoz, and F. Rahali, "Charge injection in analog MOS switches", *IEEE J. Solid-State Circuits*, vol. 22, no. 6, pp. 1091-1097, Dec. 1987.
- [12] L. Dai and R. Harjani, "CMOS switched-op-amp-based sample-and-hold circuit", *IEEE J. Solid-State Circuits*, vol. 35, no. 1, pp. 109-113, Jan. 2000.
- [13] S. Sidiropoulos and M.A. Horowitz, "A semidigital dual delay-locked loop", *IEEE J. Solid-State Circuits*, vol. 32, no. 11, pp. 1683-1692, Nov. 1997.
- [14] C. Thakkar, et al., "A 10 Gb/s 45 mW Adaptive 60 GHz Baseband in 65 nm CMOS", *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 952-968, Mar. 2009.
- [15] C-C. Liu, et al., "A 10-bit 50-MS/s SAR ADC With a Monotonic Capacitor Switching Procedure", *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 731-740, Apr. 2010.
- [16] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-Bit time-interleaved flash ADC with background timing skew calibration", *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, pp. 157-158, Jun. 2010.
- [17] S. Verma, et al., "A 10.3GS/s 6b flash ADC for 10G Ethernet applications", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 462-463, Feb. 2013.
- [18] W. Cheng, et al., "A 3b 20GS/s ADC-DAC in 0.12um SiGe", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, vol. 1, pp. 262-263, Feb. 2004.
- [19] S. Boyd and L. Vandenberghe, "Convex Optimization, 7<sup>th</sup> edition", Cambridge, UK, C.U.P., 2009, ch. 1, sec. 3, pp. 7-8.
- [20] D. Crivelli, et al., "A 40nm CMOS single-chip 50Gb/s DP-QPSK/BPSK transceiver with electronic dispersion compensation for coherent optical channels", *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, pp. 328-330, Feb. 2012.
- [21] L. Kull, et al., "A 32mW 8 b 8.8 GS/s SAR ADC with low-power capacitive reference buffers in 32nm Digital SOI CMOS", *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, pp. C260-C261, Jun. 2013.