# Spectral Purification Techniques for Clock Generation Circuits



Yi-An Li

Electrical Engineering and Computer Sciences University of California, Berkeley

Technical Report No. UCB/EECS-2022-238 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-238.html

December 1, 2022

Copyright © 2022, by the author(s).

All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

#### Spectral Purification Techniques for Clock Generation Circuits

by

#### Yi-An Li

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Engineering - Electrical Engineering and Computer Sciences

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Ali M. Niknejad, Chair
Professor Elad Alon
Professor Martin White

Summer 2020

The dissertation of Yi-An Li, titled Spectral Purification Techniques for Clock Generation Circuits, is approved:

Chair ________________ Date ________

________________ Date ________

________________ Date ________

University of California, Berkeley

## Spectral Purification Techniques for Clock Generation Circuits

Copyright 2020 by Yi-An Li

#### Abstract

Spectral Purification Techniques for Clock Generation Circuits

by

#### Yi-An Li

Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Ali M. Niknejad, Chair

Clock generation circuits are essential building blocks that provide precision timing and frequency references for the whole system. Spurs and phase noise are two of the most critical impairments of clock spectral purity; they ultimately limit communication-system performance metrics such as spectral emission and error vector magnitude (EVM) for transmitters and blocker tolerance for receivers. It is therefore worthwhile to develop a post-processing module that is cascaded after the clock source and performs spectral purification to recover, and even boost, the performance.

Two techniques for spur and phase noise cancellation are proposed. Using analog signal processing with delay lines, we can synthesize the desired shape of the transfer function for spectral filtering, such as notches to reject far-out spurs and high-pass filtering to suppress close-in phase noise. A fully integrated design achieves a measured spur cancellation of 15 dB at 250-MHz and 750-MHz offsets, as well as phase noise cancellation from 4-MHz to 200-MHz offset with a maximum cancellation depth of 25 dB for a 1-GHz clock. The proposed ideas have been verified with a fabricated 65-nm CMOS prototype that consumes 11 mW from a 1.2-V supply.

Furthermore, we demonstrate a novel clock multiplier architecture that achieves low jitter and is insensitive to frequency drift without a continuous frequency tracking loop (FTL). With the proposed digital spur calibration techniques, the spurs are suppressed to -50.9 dBc. Fabricated in 28-nm CMOS technology, this prototype achieves an integrated jitter of $138\,\mathrm{fs_{rms}}$ while consuming only 6.5 mW from 1-V/0.8-V supplies, corresponding to a figure of merit (FoM) of -249 dB. A detailed study of how frequency drift affects jitter performance is included, which provides a theoretical justification for the approach. Time-domain and frequency-domain analyses of the digital spur calibration are also discussed. Finally, an improved version with lower power consumption and a generalized multiplication ratio is realized in a test chip in 28-nm CMOS technology.

# Contents

- List of Figures
- List of Tables
- 1 INTRODUCTION
  - 1.1 Spectral Purity
  - 1.2 Exploration Topics and Thesis Organization
- 2 On-chip Phase Noise Cancellation Techniques
  - 2.1 Motivations
  - 2.2 Proposed Idea
  - 2.3 Non-ideality Considerations
  - 2.4 Circuit Implementation
  - 2.5 Experimental Results
- 3 PVT-Insensitive and Low-Jitter Clock Multiplier with Digital Spur Calibration
  - 3.1 Introduction
- 4 Imp…
  - 4.1 Architecture Rethinking
- 5 Conclusions
  - 5.1 Summary of Contributions
  - 5.2 Future Work
- Bibliography

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Clock generation circuits in (a) wireline, and (b) wireless transceivers [1][2] |
| 1.2 | The clock spectrum of an ideal and real one [3] |
| 1.3 | (a) Phase response by a perturbation impulse, and (b) typical phase noise profile [4] |
| 1.4 | Modulation constellations without (left) and with (right) phase noise [5] |
| 1.5 | The issues caused by spurs on (a) transmitter, and (b) receiver [6] |
| 2.1 | Spur and phase noise cancellation goals |
| 2.2 | Delay-and-interpolate spur cancellation technique |
| 2.3 | Evolutions of phase noise cancellation architecture: (a) basic design (b) with DLL to avoid VC from saturating, and (c) using series $C_1$ and $C_2$ to break the trade-off on loop capacitance value selection |
| 2.4 | Phase noise transfer function with non-ideal effects taken into account |
| 2.5 | Overall architecture and testing setup |
| 2.6 | Test chip micrograph |
| 2.7 | Phase noise is cancelled from 4MHz to 200MHz offset with maximum cancellation of 25dB on a 1-GHz clock |
| 2.8 | Far-out spurs at 250MHz and 750MHz offset frequencies can be cancelled up to 15dB by the notches (top figure). And far-out noise can also be attenuated by notches (bottom figure). |
| 3.1 | (a) The simple architecture of an ILCM, and (b)(c) jitter and phase noise degradation due to $\omega_j$ shrink when the natural frequency drifts. [17] |
| 3.2 | Time, phasor, and PDR diagram of natural frequency aligned ($\Delta f = 0$) and off-tune ($\Delta f \neq 0$) cases |
| 3.3 | Jitter transfer of $S_{out}/S_{ref}$ (solid line) and $S_{out}/S_{vco}$ (dashed line) with different $\beta = 0.01$, $0.5$, and $1$. |
| 3.4 | (a) A comparison between ILCM and LCCM. (b) PDRs of them and how phase noise is affected by PDR slope. |
| 3.5 | The schematic of the LC-tank based clock multiplier |
| 3.6 | Top-level idea of spur calibration |
| 3.7 | (a) The schematic of LCCM, (b) spur calibration flow, and (c) the period distribution before/after calibration |
| 3.8 | Proposed architecture of LCCM with digital spur calibration circuitry |
| 3.9 | Modified falling edge generator with tunable duty cycle |
| 3.10 | Instantaneous period over time with calibration loop running |
| 3.11 | (a) The settling of delay coefficients ($d_0 \sim d_7$), (b) the coefficient profile when calibrating the clock from LCCM, and (c) the coefficient profile when calibrating the clock that is sinusoidally modulated |
| 3.12 | (a) The transfer functions of the blocks around the calibration loop, and (b) the overall loop transfer function |
| 3.13 | Digital N-path filter [29] |
| 3.14 | (a) Pole/zero map, (b) root-locus plot, and (c) the frequency response of $\beta(z)$ and $G(z)$ |
| 3.15 | |
| 3.16 | |
| 3.17 | Measured phase noise plot that achieves an integrated jitter of $138\,\mathrm{fs_{rms}}$ from 1k … |
| 3.18 | Measured jitter and spur (without/with spur re-cal.) over supply voltage and … |
| 3.19 | |
| 3.20 | Test chip die photo |
| 4.1 | Four major improvements in version 2: (1) power reduction; (2) multiplication ratio ($N$) extension; (3) LMS settling ripple reduction; and (4) extension to fractional-$N$. |
| 4.2 | Power reduction (from 6.5mW to 2.9mW) by replacing the DTC with direct … |
| 4.3 | The settling of LMS coefficients without (left) and with (right) the second differentiator. |
| 4.4 | An example showing $N=4$ with $T_0T_1T_2T_3 = 3\,4\,5\,4$, which should ideally be equalized to 4 4 4 4. The old algorithm (left) wiggles around the optimum with repeated … |
| 4.5 | Supporting higher multiplication ratio ($>Q$) by selected pulse feedback on every … |
| 4.6 | |
| 4.7 | A toy example of $N=2.75$ for "off-grid" direct injection (top) and "on-grid" … |
| 4.8 | The details of the delay modulation block composed of a q-noise generator, an … |
| 4.9 | The timing diagram of three cases: under-compensated (top), optimum compensated (middle), and over-compensated (bottom). Note the relation of the $\Delta\phi[k]$ … |
| 4.10 | |
| 4.11 | Instantaneous frequencies during gain settling with initial $g_0 > g_{opt}$ and $g_0 < g_{opt}$ |
| 4.12 | The spectrum of the 3-GHz output with $f_{ref} = 210.4109589$ MHz and $N = 1/4 + 1/128$. With the initial gain value, the large quantization noise can be clearly seen (shown in the gray curve). After the gain settles to $g_{opt}$, the quantization noise is cancelled out by the DTC effectively (shown in the blue curve) |


# List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | Performance summary |
| 3.1 | Comparison to the state-of-the-art |

#### Acknowledgments

First, I would like to express my deepest gratitude to my advisor, Prof. Ali Niknejad, for his support and mentoring throughout my six-year research journey. I am really lucky to have had the chance to work with Ali, learning not only state-of-the-art RF techniques but also from his humble and kind personality. I really appreciate Ali's advising style, which gave me great flexibility to explore a variety of crazy ideas. Although some of them were too crazy to work in the end, they were not totally wasted. Instead, some of those failed trials gave me a lot of experience and served as foundation stones that even turned into new ideas later on. This would not have happened without the patience and encouragement from Ali. Also, I would like to thank Prof. Elad Alon, Prof. Vladimir Stojanovic, and Prof. Martin White for serving on my qualifying exam and dissertation committees. Their insightful feedback inspired me a lot in my research.

Secondly, it is really my honor to be part of the Berkeley Wireless Research Center (BWRC), where I was nourished by many talented graduate students and scholars from around the globe. In particular, I would like to thank Jun-Chau Chien, Ping-Chen Hunag, and Nai-Chung Kuo for familiarizing me with the research environment and lab logistics from my first day at BWRC; Pi-Feng Chiu for teaching me the whole digital synthesis flow, which is the most essential infrastructure for realizing my digital calibration idea; Lorenzo Iotti for generously sharing his experience and his handy layout tiles in 28-nm, which saved us a lot of time and without which I might have missed many tape-out opportunities; Bo Zhao, Sameet Ramakrishnan, Lucas Calderin, Nima Baniasadi, Ali Ameri, and Bonjern Yang for serving as the tape-out shuttle captains and burning the midnight oil with me against the nasty density errors; and Luya Zhang, Angie Wang, Zongkai Wang, and Hao-Yen Tang for the numerous technical discussions and funny gossip together. Furthermore, I would like to thank the BWRC staff, Brian Richards, James Dunn, and Candy Corpus, for setting up the wonderful research environment and holding a series of interesting events in the center.

Last but not least, I would like to thank my parents in Taiwan for their encouragement in pursuing the PhD degree. Countless phone calls with them were my mental support on this long journey abroad and helped me get through all the setbacks I encountered. Also, I would like to thank all my friends in Berkeley, who enriched my life with weekend bubble-tea hangouts, restaurant gatherings, and road trips all around the US. All of these have become cherished memories of my student life in Berkeley.

# Chapter 1

# INTRODUCTION

Clock generation circuits are essential building blocks that provide precision timing and frequency references for both wireline and wireless transceivers, as shown in Fig. 1.1. A wireline transceiver needs to sample the data stream at the correct time; such precise time stamps require a low variance on each clock edge so that the data is not corrupted by pre-cursor or post-cursor data. A wireless transceiver, on the other hand, basically down- or up-converts the data stream from or to the right carrier frequency; such precise carriers require a low frequency variance so that the data is not corrupted by data from adjacent channels.

### 1.1 Spectral Purity

To quantify how good a clock is, we typically analyze its spectrum. Fig. 1.2 [3] shows the spectrum of an ideal clock and of a real one. Ideally, the clock spectrum consists of a pure single tone (possibly accompanied by its harmonics), which looks like a delta function in the spectrum. In reality, however, the clock is corrupted both by flicker and thermal noise, which produce a skirt-like phase noise profile, and by interference, which causes spurious tones around the carrier. Fig. 1.3(a) [4] shows the phase change due to a perturbation impulse: the phase is shifted permanently by the perturbation, which corresponds to an impulse response of $\phi_{out}(t) = u(t) * \delta(t)$. From the Laplace transform [7], we know that $u(t)$ corresponds to $1/s$ in the frequency domain. Therefore, when white noise (coming either from the circuit itself or from electric/magnetic coupling) is injected into an oscillator, it results in a phase noise spectrum with a $1/\omega^2$ shape (i.e., a "skirt" shape), as shown in Fig. 1.3(b). By the same token, when flicker noise (i.e., noise with a power spectral density proportional to $1/f$) is injected, the spectrum shows a steeper slope of $1/f^3$ in the close-in region (Fig. 1.3(b)).
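As a brief worked step (a sketch only; $H(s)$ and $S_n$ are symbols introduced here for illustration and denote the integrating phase response and the power spectral density of the injected noise), the $1/s$ response shapes the injected noise as

$$S_{\phi}(\omega) = |H(j\omega)|^2 \, S_n(\omega) = \frac{S_n(\omega)}{\omega^2}, \qquad H(s) = \frac{1}{s}.$$

For white noise, $S_n(\omega)$ is constant, so $S_{\phi} \propto 1/\omega^2$, a slope of -20 dB/decade. For flicker noise, $S_n \propto 1/f$, so $S_{\phi} \propto 1/f^3$, a slope of -30 dB/decade.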

Figure 1.1: Clock generation circuits in (a) wireline, and (b) wireless transceivers [1][2].

Figure 1.2: The clock spectrum of an ideal and real one [3].

Figure 1.3: (a) Phase response by a perturbation impulse, and (b) typical phase noise profile [4].

Here, we take a closer look at how phase noise and spurs affect the performance of a wireless transceiver. Fig. 1.4 depicts how phase noise impacts the signal quality. Comparing the modulation constellations without and with phase noise, we can see that phase noise rotates the constellation points, since it causes phase uncertainty. As a result, the points deviate from their ideal positions, giving a much worse error vector magnitude (EVM) and therefore much worse signal quality. Spurs can also be problematic. On the transmit side, as shown in Fig. 1.5(a), the spurs act as parasitic carriers that cause unwanted signal leakage into other channels and violate the emission mask. On the receive side (Fig. 1.5(b)), these parasitic carriers also down-convert adjacent blockers in-band, overwhelming the relatively faint wanted signal. Note that the phenomena shown in Fig. 1.5 also apply to phase noise, since phase noise can be treated as a continuum of spurs.
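The link between phase uncertainty and EVM described in this section can be illustrated with a small numerical sketch. It is not taken from the thesis: the function name `evm_with_phase_noise`, the choice of 16-QAM, and the model of phase noise as an independent Gaussian phase error per symbol are all assumptions made for illustration.

```python
import numpy as np

def evm_with_phase_noise(sigma_phi_rad, n_symbols=100_000, seed=0):
    """Return the RMS EVM (as a fraction) of 16-QAM symbols rotated by random phase errors."""
    rng = np.random.default_rng(seed)
    # Ideal 16-QAM constellation points on a square grid.
    levels = np.array([-3, -1, 1, 3])
    ideal = rng.choice(levels, n_symbols) + 1j * rng.choice(levels, n_symbols)
    # Assumption: phase noise acts as an independent Gaussian phase error on each symbol.
    phi = rng.normal(0.0, sigma_phi_rad, n_symbols)
    received = ideal * np.exp(1j * phi)
    error_power = np.mean(np.abs(received - ideal) ** 2)
    signal_power = np.mean(np.abs(ideal) ** 2)
    return np.sqrt(error_power / signal_power)

for sigma_deg in (0.5, 1.0, 2.0):
    evm = evm_with_phase_noise(np.deg2rad(sigma_deg))
    print(f"phase error {sigma_deg:.1f} deg rms -> EVM {100 * evm:.2f} %")
```

Under this simplified model each symbol is displaced by $|e^{j\phi} - 1| \approx |\phi|$ times its own magnitude, so for small errors the EVM is approximately equal to the rms phase error in radians.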

The cases above are just two of the most notorious examples of how poor spectral purity hurts a system; there are more that are not covered here. Nevertheless, they show that the spectral purity of the clock source is crucial to overall performance and that spectral purification techniques are worth investigating.



Figure 1.4: Modulation constellations without (left) and with (right) phase noise [5].

### 1.2 Exploration Topics and Thesis Organization

In this thesis, we will go through three major exploration topics:

1. The first topic is a study of delay-line-based spur and phase noise cancellation techniques. The goal is to build a post-processing module that can be cascaded after the clock source to perform spectral purification. In this study, we use delay lines with a feed-forward path to synthesize the desired shape of the phase transfer function, such as notches and high-pass filtering. We also verify these ideas with measurement results from a test chip. The results are published in [8].
2. The second topic is a novel clock multiplier that achieves excellent and robust jitter but, by itself, has a substantial spur level. In this chapter, we first briefly review state-of-the-art clock multiplier architectures and study the mechanism by which frequency drift degrades phase noise. We then propose a novel digital spur calibration technique that effectively suppresses the spurs. The new clock multiplier and the spur calibration, which together achieve low jitter and low spurs, are both silicon verified and published in [9].
3. Finally, an improved version of the clock multiplier is presented. Here, we further improve its power efficiency with smarter architecture planning, refine the LMS algorithm to lower the residual spur level, and generalize its multiplication ratio (N) to higher N and fractional-N. These ideas are also implemented in a test chip. However, due to COVID-19 and the limited access to the lab, only simulation results are available at the time of writing. We still plan to finish the measurements and hopefully publish the material in the future.

Figure 1.5: The issues caused by spurs on (a) transmitter, and (b) receiver [6].

# Chapter 2

# On-chip Phase Noise Cancellation Techniques

#### 2.1 Motivations

Spurs and phase noise are two of the most critical specifications that can ultimately limit the performance of a communication system. For example, spurs resulting from device mismatches in the LO generation circuit could cause a transmitter to fail its spectral mask requirement. On the receiver side, blocker-induced reciprocal mixing of phase noise sets the ultimate sensitivity limit, even for a perfectly linear receive chain. As a result, we tend to spend more of the power budget on clock sources to meet worst-case corners and scenarios, which might in fact rarely occur. It would therefore be beneficial to have a post-processing module that is cascaded after the clock source and activated only when needed. Our goals are to generate notches against far-out spurs and to apply high-pass filtering to the phase noise of the clock to suppress close-in phase noise, as conceptually shown in Fig. 2.1. In this way, we can relax the specifications of the clock source and achieve a lower-power design, potentially extending battery life considerably.

Figure 2.1: Spur and phase noise cancellation goals.

Figure 2.2: Delay-and-interpolate spur cancellation technique.

### 2.2 Proposed Idea

#### **Spur Cancellation**

To simplify the illustration, we first consider the special case of a clock whose spurs lie at an offset frequency of half the carrier frequency, as shown in Fig. 2.2. The spurs affect the clock by periodically shifting its edges. Thus, if we delay this clock by half of the jitter period (i.e., one clock period in this case) and interpolate it with the original clock, the periodic jitter is cancelled. In the time domain, this operation averages the past and present phase, which can be described in the frequency domain as

$$|\Phi_{out}(s)| = \left|\Phi_{in}(s)\,\frac{1 + e^{-sT}}{2}\right| = |\Phi_{in}(s) \cdot \cos(\pi f T)|, \tag{2.1}$$

where $\Phi_{in}(s)$ is the input phase noise and $T$ is the delay time. This transfer function creates notches at the offset frequencies $1/(2T)$, $3/(2T)$, and so on. By programming the time delay, we can line up the notch frequencies with the spurs to reject them. Note that this rejection applies not only to spurs but also to phase noise.
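The notch placement implied by (2.1) can be checked numerically. The following is a minimal sketch, not the thesis implementation: the helper names `spur_rejection_db` and `notch_frequencies` are hypothetical, and the delay-error values are illustrative only.

```python
import numpy as np

def spur_rejection_db(f_offset_hz, delay_s):
    """Gain |cos(pi f T)| of eq. (2.1) in dB (more negative means more rejection)."""
    gain = np.abs(np.cos(np.pi * np.asarray(f_offset_hz, dtype=float) * delay_s))
    # Floor the gain so an exact notch gives a large finite number instead of -inf.
    return 20.0 * np.log10(np.maximum(gain, 1e-12))

def notch_frequencies(delay_s, count=3):
    """First `count` notch offsets, located at (2k + 1) / (2T)."""
    k = np.arange(count)
    return (2 * k + 1) / (2.0 * delay_s)

# Assumed example: place the first notch on a spur at 250 MHz offset.
f_spur = 250e6
T = 1.0 / (2.0 * f_spur)  # required delay, 2 ns
print("delay T =", T * 1e9, "ns")
print("notches at (MHz):", notch_frequencies(T) / 1e6)

# Sensitivity of the rejection to a delay error (illustrative values only).
for err in (0.0, 0.01, 0.05):
    rej = float(spur_rejection_db(f_spur, T * (1.0 + err)))
    print(f"delay error {100 * err:.0f} % -> gain at spur {rej:.1f} dB")
```

With a 2-ns delay the first two notches fall at 250 MHz and 750 MHz. The sweep also shows that the achievable rejection depends on how accurately the delay is programmed.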



Figure 2.3: Evolutions of phase noise cancellation architecture: (a) basic design (b) with DLL to avoid VC from saturating, and (c) using series  $C_1$  and  $C_2$  to break the trade-off on loop capacitance value selection.

Although a large delay can push the notches to low offset frequencies for close-in phase noise cancellation, the notch bandwidth (which is proportional to $1/T$) shrinks and therefore has little impact on the integrated phase noise. In addition, the noise accumulated by the long delay line limits the notch depth. The technique nevertheless remains effective for far-out spurs, which require only a short delay. The issues above can be circumvented if we change $1 + e^{-sT}$ into $1 - e^{-sT}$. The transfer function then becomes $|\sin \pi f T|$, which has its first notch at dc and is more suitable for close-in phase noise cancellation. To realize such a transfer function, we introduce a delay line discriminator method in the next section.

#### Phase Noise Cancellation

To perform phase noise cancellation, the first step is to extract the phase noise information from the clock source and then apply it back to the original clock with opposite polarity [10]. The delay line discriminator is a good candidate for this function and is widely used in spectrum analyzers for high-sensitivity phase noise measurement [11]. When a clock, $\cos(\omega t + \phi_{in}(t))$, passes through a delay line of delay $T_1$, a frequency-dependent phase shift is applied to its spectrum. Comparing the phases at the two ends of the delay line, the signal at the output of the phase detector (PD) is given by

$$\Delta \Phi_{PD}(s) = \Phi_{in}(s)(1 - e^{-sT_1}) \approx \Phi_{in}(s)sT_1 \tag{2.2}$$

This signal is a differentiated version of the input phase information, down-converted to baseband. We can recover the phase with an integrator and then feed it forward to modulate a voltage-controlled delay line (VCDL) through which the same clock passes. In this way, the baseband phase noise information is up-converted back to the clock frequency to cancel the original phase noise. These operations can be realized in the simplified circuit shown in Fig. 2.3(a), which incorporates a VCDL and a PD for phase comparison, a charge pump (CP), a capacitor as an integrator, and another VCDL at the output for phase noise cancellation. In this circuit, the output phase noise can be expressed as

$$|\Phi_{out}(s)| \approx |-\Phi_{in}(s)sT_1\frac{K_pK_d}{sC} + \Phi_{in}(s)e^{-sT_2}|, \tag{2.3}$$

where  $K_p$ ,  $T_2$ , and  $K_d$  are the gain of PD/CP, the delay of VCDL2, and the gain of VCDL2, respectively. When the cancellation condition

$$T_1 K_p K_d / C = 1 \tag{2.4}$$

is met, it can be further reduced to

$$|\Phi_{out}(s)| \approx |\Phi_{in}(s)(1 - e^{-sT_2})| = |\Phi_{in}(s) \cdot 2\sin(\pi f T_2)| \tag{2.5}$$

In this way, the close-in phase noise is largely cancelled by the high-pass filtering applied to the original phase noise. Note that, although (2.2) and (2.5) have the same form, they describe different signals: the signal in (2.2) is at baseband, whereas the signal in (2.5) is at the clock frequency.
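The reduction from (2.3) to (2.5) can be checked numerically. The following is a minimal sketch, assuming a hypothetical VCDL2 delay of 1 ns; `loop_ratio` is an illustrative name for the product $T_1K_pK_d/C$, and none of the values are taken from the test chip.

```python
import cmath
import math

def residual_gain(f_hz, t2_s, loop_ratio=1.0):
    """|Phi_out/Phi_in| from (2.3); loop_ratio stands for T1*Kp*Kd/C."""
    s = 2j * math.pi * f_hz
    return abs(-loop_ratio + cmath.exp(-s * t2_s))

T2 = 1e-9  # hypothetical VCDL2 delay
for f in (1e5, 1e6, 1e7, 1e8):
    exact = residual_gain(f, T2)
    closed_form = 2 * abs(math.sin(math.pi * f * T2))
    print(f"{f:11.0f} Hz: {20 * math.log10(exact):7.1f} dB "
          f"(2|sin| form: {20 * math.log10(closed_form):7.1f} dB)")
```

When the cancellation condition holds exactly, the two columns agree and the residual falls by 20 dB per decade toward dc, which is the high-pass behaviour described above.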

### 2.3 Non-ideality Considerations

Although the result in (2.5) looks promising theoretically, there are some possible issues in circuit realization. Therefore, it is worthwhile to analyse how they impact the performance and how to work around them.

#### Mismatch

First, since the cancellation is performed in the analog domain, it depends on device matching and is limited by mismatch and by process, voltage, and temperature (PVT) variations. The resulting variations in $T_1$, $K_p$, $K_d$, and $C$ make (2.4) become



Figure 2.4: Phase noise transfer function with non-ideal effects taken into account.

$$T_1 K_p K_d / C = 1 + \epsilon \tag{2.6}$$

and (2.5) becomes

$$|\Phi_{out}(s)| \approx |\Phi_{in}(s)| \cdot |\epsilon + 2\sin(\pi f T_2)|, \tag{2.7}$$

where $\epsilon$ is the mismatch (between $T_1K_pK_d/C$ and 1) due to device variations, which limits the cancellation depth to $20\log_{10}(|\epsilon|)$ dB. Fortunately, for >20 dB cancellation depth (Fig. 2.4), $\epsilon$ only needs to be controlled within 10%, which is not difficult to achieve with careful design and layout. Furthermore, calibration knobs can be added, if needed, to track PVT variations.
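To make the mismatch floor concrete, the sketch below evaluates the residual of (2.3) with $T_1K_pK_d/C = 1 + \epsilon$ at a close-in offset. The 1-ns value for $T_2$ and the chosen $\epsilon$ values are hypothetical and serve only as an illustration.

```python
import cmath
import math

def residual_gain_db(f_hz, t2_s, eps):
    """Residual |Phi_out/Phi_in| in dB when T1*Kp*Kd/C = 1 + eps."""
    s = 2j * math.pi * f_hz
    return 20 * math.log10(abs(-(1 + eps) + cmath.exp(-s * t2_s)))

T2 = 1e-9  # hypothetical VCDL2 delay
for eps in (0.01, 0.05, 0.10):
    floor_db = 20 * math.log10(abs(eps))
    close_in = residual_gain_db(1e3, T2, eps)
    print(f"eps = {eps:4.2f}: floor {floor_db:6.1f} dB, at 1 kHz {close_in:6.1f} dB")
```

A 10% mismatch leaves a floor of about -20 dB at close-in offsets, consistent with the requirement stated above.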

#### DC Balancing Loop

Second, because of the integrator, any small constant phase offset will eventually pump the control voltage $(V_C)$ into saturation. Since we already have a PD, a CP, and a VCDL, we can mitigate this issue by closing them into a delay-locked loop (DLL), as shown in Fig. 2.3(b), such that the feedback settles $V_C$ to a proper voltage level and makes $T_1$ well defined by the loop. Compared with introducing another CP and a variable-gain amplifier (VGA) [12] for the dc balancing loop, this sharing method incurs no power penalty and minimal area penalty.

It is worth taking a closer look at the transfer function of a DLL:

$$\left| \frac{\Phi_{DLL,out}(s)}{\Phi_{in}(s)} \right| = \frac{1 + \frac{s}{\omega_{DLL}} \cdot e^{-sT_1}}{1 + \frac{s}{\omega_{DLL}}} \approx \begin{cases} 1, & \omega \ll \omega_{DLL} \\ e^{-sT_1}, & \omega \gg \omega_{DLL} \end{cases} \tag{2.8}$$

where $\Phi_{DLL,out}(s)$ is the phase at the output of the delay line (VCDL1 in Fig. 2.3), and $\omega_{DLL}$ is the loop bandwidth of the DLL. At frequencies much lower than $\omega_{DLL}$, (2.8) approaches 1, since the DLL locks the output phase to the input. At frequencies much higher than $\omega_{DLL}$, (2.8) becomes $e^{-sT_1}$, i.e., the circuit degenerates to a plain delay line that is unaffected by the loop. For our circuit, (2.8) implies that the DLL wipes out low-frequency phase information within its bandwidth $\omega_{DLL}$ (since $\Delta \phi = \phi_{DLL,out} - \phi_{in} = 0$), so the phase noise cancellation fails at low offset frequencies, as shown by the green curve in Fig. 2.4. This impact may be less critical within a PLL, where the close-in phase noise is set by the external reference clock. Nevertheless, we still prefer to keep the DLL bandwidth as small as possible in this design.
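The effect of the DLL on the discriminator can be illustrated with a short sketch that evaluates (2.8) and the phase difference still visible to the PD. The 100-kHz bandwidth and 2-ns delay are the values quoted in Section 2.5; the function name and the chosen test frequencies are assumptions for illustration.

```python
import cmath
import math

def dll_transfer(f_hz, f_dll_hz, t1_s):
    """Phase transfer of (2.8) from the DLL input to the delay-line output."""
    s = 2j * math.pi * f_hz
    x = s / (2 * math.pi * f_dll_hz)      # s / omega_DLL
    return (1 + x * cmath.exp(-s * t1_s)) / (1 + x)

F_DLL = 100e3   # DLL bandwidth quoted in the Section 2.5 footnote
T1 = 2e-9       # delay-line setting used in the measurements
for f in (1e3, 100e3, 10e6):
    h = dll_transfer(f, F_DLL, T1)
    pd_gain = abs(1 - h)    # phase difference still observed by the PD
    ideal = abs(1 - cmath.exp(-2j * math.pi * f * T1))
    print(f"{f:10.0f} Hz: PD sees {pd_gain:.3e}, open-loop line gives {ideal:.3e}")
```

Well below the loop bandwidth the PD output is attenuated relative to the open-loop delay line, while well above it the two agree, which matches the two limits of (2.8).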

### 2.4 Circuit Implementation

According to the analysis above, we have to meet the cancellation condition and keep the DLL bandwidth low simultaneously. However, this leads to a trade-off on the capacitance value. To break this trade-off, we propose a new loop filter composed of series capacitors $C_1$ and $C_2$, shown in Fig. 2.3(c). In this way, the DLL loop sees a large $C_2$ to achieve a small bandwidth, while the feed-forward path sees a capacitance of $C_1C_2/(C_1+C_2)$, and this extra degree of freedom allows the cancellation condition to still be met. Also, in order to define the dc



Figure 2.5: Overall architecture and testing setup.



Figure 2.6: Test chip micrograph.

voltage and compensate for any possible leakage with minimum effect on the transfer function, a tunable large resistor $R_1$ (from 100 k$\Omega$ to 1 M$\Omega$) is connected in shunt with $C_1$.
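The extra degree of freedom offered by the series capacitors can be sketched as follows. The capacitor values are hypothetical placeholders, not the values used on the chip; `C_TARGET` stands for the capacitance that (2.4) would require in the feed-forward path.

```python
def series_capacitance(c1, c2):
    """Capacitance seen by the feed-forward path: C1 in series with C2."""
    return c1 * c2 / (c1 + c2)

def c1_for_target(c_target, c2):
    """C1 that makes the series combination equal c_target (needs c_target < c2)."""
    if c_target >= c2:
        raise ValueError("series combination is always smaller than C2")
    return c_target * c2 / (c2 - c_target)

C2 = 20e-12        # hypothetical large capacitor that sets the DLL bandwidth
C_TARGET = 1e-12   # hypothetical capacitance required by condition (2.4)
C1 = c1_for_target(C_TARGET, C2)
print(f"C1 = {C1 * 1e12:.3f} pF, "
      f"series value = {series_capacitance(C1, C2) * 1e12:.3f} pF")
```

In this sketch $C_2$ can be chosen freely for the loop bandwidth, and $C_1$ is then solved so that the series value still satisfies the cancellation condition.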

The final architecture is shown in Fig. 2.5, which combines both the spur and phase noise cancellation techniques with a shared delay line. The phase interpolator (PI) for spur cancellation can be easily implemented by directly shorting the outputs of the two inverters that are driven by the clocks to be interpolated, since the DLL has nominally aligned them. Note that in this test chip the outputs of the two techniques are separated for testing purposes; they can be cascaded by feeding Vout1 into the input of VCDL2. The delay line of VCDL1 covers a delay range from 1.25 ns to 2.75 ns through 6-bit capacitor bank switching, plus $\pm$ 70 ps through varactor tuning. To minimize possible leakage on the internal node between $C_1$ and $C_2$, we use thick-oxide varactors in VCDL1. The resulting low tuning sensitivity is not an issue, since a small varactor gain is desired to minimize the DLL bandwidth. The output VCDL2 is also an inverter chain loaded by normal varactors, but with opposite varactor polarity and a much larger tuning gain to achieve cancellation. In addition to the DLL bandwidth, the output resistance of the CP also limits the cancellation at low offset frequencies. Therefore, long channels are selected for $M_1/M_2$ in the CP to keep the output resistance high, and $M_3/M_4$ are added to help drive the extra capacitive loading (Fig. 2.5).
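A quick arithmetic check of the VCDL1 tuning range is given below. It assumes, purely for illustration, that the 6-bit capacitor bank produces uniform delay steps; the actual step distribution is not stated here.

```python
T_MIN, T_MAX = 1.25e-9, 2.75e-9   # coarse delay range of VCDL1
BITS = 6
VARACTOR_SPAN = 2 * 70e-12        # +/-70 ps of fine tuning

coarse_step = (T_MAX - T_MIN) / (2 ** BITS - 1)   # assumes uniform steps
first_notch_low = 1 / (2 * T_MAX)    # lowest first-notch offset, 1/(2T)
first_notch_high = 1 / (2 * T_MIN)   # highest first-notch offset, 1/(2T)

print(f"coarse step = {coarse_step * 1e12:.1f} ps")
print(f"fine range exceeds coarse step: {VARACTOR_SPAN > coarse_step}")
print(f"first notch tunable from {first_notch_low / 1e6:.0f} "
      f"to {first_notch_high / 1e6:.0f} MHz")
```

Under this assumption the varactor span is several times larger than one coarse step, so the coarse and fine controls overlap without gaps.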



Figure 2.7: Phase noise is cancelled from 4MHz to 200MHz offset with maximum cancellation of 25dB on a 1-GHz clock.

### 2.5 Experimental Results

This chip has been fabricated in 65-nm CMOS technology and occupies a core area of $0.3 \times 0.25\,mm^2$ (Fig. 2.6). The circuit consumes a total power of 11 mW, of which 5 mW is dissipated in the delay line, 2 mW in the PD/CP, and 4 mW in the output VCDL. Fig. 2.5 shows the measurement setup. The spur and white noise waveforms are generated externally by a 33600A waveform generator and modulate the control line of an on-chip 1-GHz test voltage-controlled oscillator (VCO) to mimic a low-power VCO with relaxed performance. The phase noise cancellation with the delay-line-discriminator method is demonstrated in Fig. 2.7. The cancellation applies from 4 MHz to 200 MHz offset frequency<sup>1</sup> and achieves a maximum of 25 dB cancellation at 40 MHz offset frequency. Moreover, the delay-and-interpolate method for spur cancellation is also verified. Fig. 2.8 shows the results of spur cancellation. In this design, the delay line is set to 2 ns, which generates notches at 250 MHz ($1/(2T_1)$) and 750 MHz ($3/(2T_1)$) offset frequencies. The spurs at those two offset frequencies are rejected by 15 dB (from -56 to -71 dBc). Such rejection also applies to phase noise. To facilitate the

<sup>&</sup>lt;sup>1</sup>Note that the lower bound frequency should have achieved to  $ω_{DLL} ≈ 100 \text{kHz}$  as mentioned, but the charge-pump output resistance ( $R_{CP}$  of 2 k Ω) also limits it. Since  $R_{CP}$  makes the integrator to be lossy, and degrades to lower frequency cancellation to 4MHz. On the other hand, the upper bound frequency (200MHz) is limited by  $1/2T_1$ , since the approximation in (2.2) is longer hold beyond 200MHz.





Figure 2.8: Far-out spurs at 250 MHz and 750 MHz offset frequencies are cancelled by up to 15 dB by the notches (top figure), and far-out noise is also attenuated by the notches (bottom figure).

|                                    | TMTT 2015[3]            | RFIC 2012[4]              | IMS 2016[6]                  | This work                |
|------------------------------------|-------------------------|---------------------------|------------------------------|--------------------------|
| Frequency                          | 1.5GHz                  | 5GHz                      | 10GHz                        | 1GHz                     |
| Delay Line<br>Type                 | Off-chip<br>FBAR        | On-chip inverter          | On-chip LC +<br>Off-chip SAW | On-chip inverter         |
| Phase Noise<br>Cancellation BW     | 1k ~ 2MHz               | 100k ~ 20MHz              | 100k ~ 10MHz                 | 4M ~ 200MHz              |
| Max Phase Noise Cancellation Depth | 40dB                    | 12.5dB                    | 15.5dB                       | 25dB                     |
| Far-out Spur<br>Cancellation       | N/A                     | N/A                       | N/A                          | 15dB                     |
| Power Consumption (excluding VCO)  | 340mW                   | 20.9mW                    | 102mW                        | 11mW                     |
| Core Area                          | 1.8×1.2 mm <sup>2</sup> | 0.38×0.32 mm <sup>2</sup> | 1.68×1.5 mm <sup>2</sup>     | 0.3×0.25 mm <sup>2</sup> |
| Technology                         | 130nm CMOS              | 90nm CMOS                 | 65nm CMOS                    | 65nm CMOS                |

Table 2.1: Performance summary.

observation at high offset frequencies, we use an external clock with broadband noise as the test clock instead of the on-chip VCO. As shown in Fig. 2.8, two notches at 250 MHz and 750 MHz are clearly visible on the phase noise plot after applying the technique. Table 2.1 summarizes and compares our work with recent publications [12–15].

# Chapter 3

# A PVT-Insensitive and Low-Jitter Clock Multiplier with Digital Spur Calibration

#### 3.1 Introduction

A clock multiplication circuit is an essential part of high-speed communication systems, providing the desired clock frequencies for the whole system from an external low-frequency reference clock. A phase-locked loop (PLL) is one of the most commonly used architectures for this task. Typically, a higher jitter tracking bandwidth $(\omega_j)$ of a PLL is desired, since it helps high-pass filter the phase noise of the local oscillator, which is typically noisy due to the limited on-chip quality factor (Q) and power budget. A higher $\omega_j$ also admits more phase noise from the reference clock, but this is typically not a problem, since a clean clock source (such as a crystal oscillator) is normally used as the reference clock. However, due to loop stability limitations, $\omega_j$ has an upper limit of about 1/10 of the reference clock frequency, which is well known as Gardner's limit [16].

As a result, designers have turned to other architectures that are free from this bandwidth limitation. Among them, the most popular is the injection-locked clock multiplier (ILCM) [17–24], which achieves a much wider $\omega_j$ (without stability concerns about an explicit loop) and demonstrates the potential for low jitter with a simpler circuit architecture than traditional PLLs, as shown in Fig. 3.1(a) [17]. Furthermore, the wide $\omega_j$ enables ring oscillators in low-jitter applications at much lower silicon cost. However, one major issue of the ILCM is robustness, since the excellent jitter performance is achieved only when the oscillator's natural frequency matches the N-th harmonic of the injection frequency (i.e. the reference clock frequency), and the natural frequency of an oscillator is very sensitive to process, supply voltage, and temperature (PVT) variations. As shown in Fig. 3.1(b)(c), the jitter and phase noise rise considerably due to $\omega_j$ shrinking when the natural frequency drifts. To sustain a steady jitter performance over PVT variations, it is mandatory to apply continuous frequency tracking loops (FTLs) [19–23] to pull the drift back, as discussed below.

Figure 3.1: (a) The simple architecture of an ILCM, and (b)(c) jitter and phase noise degradation due to $\omega_j$ shrinking when the natural frequency drifts [17].

#### Frequency Tracking Loops

The simplest way to do frequency tracking is to resort back to PLLs [19, 22, 23]. The issue with this approach is that two loops (one explicit loop from the PLL for frequency tracking, and one implicit loop from injection locking itself) operate concurrently and may race with each other [19] in some scenarios, so a certain phase relationship must be maintained between the injection and reference phases. This issue can be worked around by introducing a replica VCO [25], but the mismatch between the two VCOs is yet another issue. Another method introduces a high-resolution TDC [20] to measure the period difference between the injected cycle and the free-running cycles for frequency tracking, but the TDC is typically a "luxury" option due to its area and power. To sum up, these FTL innovations did manage to align the natural frequencies and keep the jitter low over PVT, but they also come with side effects and extra noise, power, and design complexity penalties. Therefore, it would be attractive to find a new architecture that is immune to frequency drift in the first place.



Figure 3.2: Time, phasor, and PDR diagram of natural frequency aligned ( $\Delta f = 0$ ) and off-tune ( $\Delta f \neq 0$ ) cases.

### 3.2 Phase Domain Response

Before diving into the circuit, it is instructive to go through an analysis technique called phase-domain response (PDR) [26, 27], which serves as a simple yet effective tool to understand and analyze pulsed injection locking. Here, we take the ILCM as an example of how PDR analysis works. Fig. 3.2 shows the time-domain waveform, phasor diagram, and PDR plot of the two cases with the natural frequency aligned ( $\Delta f = 0$ ) and off-tune ( $\Delta f \neq 0$ ).

First, consider the phasor diagram in the middle, which shows the output clock before and after the injection pulse is applied (i.e. $CK_{out}(t_{inj}^-)$ with the black arrow and $CK_{out}(t_{inj}^+)$ with the blue arrow, respectively). By drawing the triangles, we can observe that the amount of phase change $(\Delta\theta)$ is a function of the initial phase error $(\theta_e$, between $CK_{out}$ and $CK_{inj}$). This relation of $\Delta\theta$ with respect to $\theta_e$ can be clearly represented in a PDR plot, as shown on the right-hand side of Fig. 3.2.

In Fig. 3.2, we can see that if the natural frequency of the oscillator matches the N-th harmonic of the injection frequency, the phase of the output clock will eventually align to the injection phase in steady state ( $\theta_{e,ss} = 0$ ). On the other hand, when the frequency drifts ( $\Delta f \neq 0$ ), the phase error accumulates over each cycle up to $\theta_{e,max}$ until the next injection occurs, which yields a non-zero $\theta_{e,ss}$. These two cases correspond to the two points marked on the PDR shown in Fig. 3.2 (black dot for $\Delta f = 0$, and orange dot for $\Delta f \neq 0$ ).

Here, we can decompose the total output phase as

$$\theta_{out}(n) = \theta_{vco}(n) + \hat{\theta}_{out}(n-1), \tag{3.1}$$

where $\theta_{vco}(n)$ is the instantaneous VCO phase and $\hat{\theta}_{out}(n)$ is the extra phase from the last injection. According to the PDR plot in Fig. 3.2, the phase correction is $\Delta\theta(n) = \beta \cdot \theta_e(n)$, or



Figure 3.3: Jitter transfer of  $S_{out}/S_{ref}$  (solid line) and  $S_{out}/S_{vco}$  (dashed line) with different  $\beta = 0.01, 0.5, \text{ and } 1.$ 

$$\Delta\theta(n) = \hat{\theta}_{out}(n) - \hat{\theta}_{out}(n-1) = \beta \cdot \left\{ N \cdot \theta_{ref}(n) - \left[ \theta_{vco}(n) + \hat{\theta}_{out}(n-1) \right] \right\}. \tag{3.2}$$

By applying z-transform to (3.2), we can get

$$\hat{\Theta}_{out}(z) = \frac{-\beta}{1 + (\beta - 1)z^{-1}} \,\Theta_{vco}(z) + \frac{N\beta}{1 + (\beta - 1)z^{-1}} \,\Theta_{ref}(z). \tag{3.3}$$
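The recursion in (3.2) can also be iterated directly in the time domain. The sketch below holds the drive term $N\theta_{ref} - \theta_{vco}$ constant at an arbitrary value of 1 to show how $\beta$ sets the settling speed; the function name and the constant-drive scenario are assumptions for illustration.

```python
def injected_phase(beta, drive, n_cycles):
    """Iterate (3.2): hat(n) = (1 - beta) * hat(n-1) + beta * drive,
    where drive = N*theta_ref - theta_vco is held constant."""
    hat = 0.0
    history = []
    for _ in range(n_cycles):
        hat = (1 - beta) * hat + beta * drive
        history.append(hat)
    return history

for beta in (0.01, 0.5, 1.0):
    trace = injected_phase(beta, drive=1.0, n_cycles=20)
    print(f"beta = {beta:4.2f}: after 1 cycle {trace[0]:.3f}, "
          f"after 20 cycles {trace[-1]:.3f}")
```

The iteration settles to the drive value for any $0 < \beta \leq 1$, in agreement with the unity dc gain of $\beta/(1 + (\beta - 1)z^{-1})$ in (3.3), and it settles in a single cycle when $\beta = 1$.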

Therefore, by (3.1) and (3.3), and then applying discrete-time to continuous-time conversion<sup>1</sup>, the phase noise of the pulse injection locking can be expressed as [18]

<sup>&</sup>lt;sup>1</sup>By setting  $z=e^{-j\omega T}$ , and convolved by zero-order-hold which corresponding to multiplying  $\frac{\sin(\omega T/2)}{\omega T/2}e^{-j\omega T/2}$  in frequency domain.

$$\begin{aligned} S_{out}(j\omega) ={}& \left| 1 - \frac{\beta}{1 + (\beta - 1)e^{-j\omega T}} \frac{\sin(\omega T/2)}{\omega T/2} e^{-j\omega T/2} \right|^2 S_{vco}(j\omega) \\ &+ \left| \frac{N\beta}{1 + (\beta - 1)e^{-j\omega T}} \frac{\sin(\omega T/2)}{\omega T/2} e^{-j\omega T/2} \right|^2 S_{ref}(j\omega) \end{aligned} \tag{3.4}$$

where $\beta$ is the slope of the PDR, N is the multiplication ratio, T is the reference period, $S_{out} = |\Theta_{out}|^2$, $S_{ref} = |\Theta_{ref}|^2$ and $S_{vco} = |\Theta_{vco}|^2$. Eq. (3.4) shows how the "implicit loop" offers a low-pass jitter transfer for reference noise and a high-pass jitter transfer for VCO noise, similar to a PLL. To better visualize (3.4), Fig. 3.3 illustrates the jitter transfers $S_{out}/S_{ref}$ and $S_{out}/S_{vco}$ for different $\beta$. As we can see from it, $\omega_j$ is closely related to the slope $\beta$, which can also be interpreted as the "injection strength". For the maximum slope ( $\beta = 1$ ), each injection pulse fully corrects the phase error (i.e. $\Delta \theta = \theta_e$ ) and achieves the widest $\omega_j$ (green curves in Fig. 3.3). On the other hand, when the slope is close to zero, the injection pulses provide almost no correction, which results in a low $\omega_j$ (red curves in Fig. 3.3).
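The two jitter transfers in (3.4) are straightforward to evaluate numerically. The following sketch uses the 375-MHz reference and N = 8 of the prototype in Section 3.6 as example values; the function name and the 1-MHz evaluation offset are assumptions for illustration.

```python
import cmath
import math

def jitter_transfers(f_hz, beta, n_mult, t_ref_s):
    """Return (|H_vco|^2, |H_ref|^2) from (3.4) at offset frequency f_hz."""
    w = 2 * math.pi * f_hz
    half = w * t_ref_s / 2
    # Zero-order-hold factor sin(wT/2)/(wT/2) * exp(-j*wT/2)
    zoh = (math.sin(half) / half if half else 1.0) * cmath.exp(-1j * half)
    loop = beta / (1 + (beta - 1) * cmath.exp(-1j * w * t_ref_s))
    return abs(1 - loop * zoh) ** 2, abs(n_mult * loop * zoh) ** 2

T_REF = 1 / 375e6   # reference period of the prototype in Section 3.6
N = 8
for beta in (0.01, 0.5, 1.0):
    h_vco, h_ref = jitter_transfers(1e6, beta, N, T_REF)
    print(f"beta = {beta:4.2f}: VCO {10 * math.log10(h_vco):6.1f} dB, "
          f"reference {10 * math.log10(h_ref):6.1f} dB at 1 MHz offset")
```

At a fixed offset, a larger $\beta$ gives stronger rejection of the VCO noise, while the reference noise is passed with a gain approaching $N^2$ at low offsets.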

With this understanding of how the slope affects $\omega_j$, we can now return to the PDR plot in Fig. 3.2. The PDR of the ILCM is an S-shaped curve with the steepest slope around the origin and a flatter slope as the phase error increases. The black dot (i.e. $\Delta f = 0$ ) is located at the origin, with the steepest slope and widest $\omega_j$, while the orange dot (i.e. $\Delta f \neq 0$ ) deviates from the origin, with a smaller slope and smaller $\omega_j$. This explains the phase noise degradation shown in Fig. 3.1: a natural frequency drift leads to a non-zero $\theta_{e,ss}$ with less high-pass rejection of the VCO phase noise.

### 3.3 Proposed LC-Tank-Based Clock Multiplier

In this work, we propose an LC-tank-based clock multiplier (LCCM) that is inherently insensitive to frequency drift. Fig. 3.4(a) shows a brief comparison between the ILCM and the LCCM. Both architectures utilize a pull-low switch for injection and an LC tank centered around the desired N-th harmonic of the injection frequency; the only difference is the negative $g_m$ cell. As the waveforms illustrate, the former sustains its oscillation with the negative $g_m$ cell like an ordinary oscillator, whereas the latter relies purely on the injected energy and oscillates with a damped envelope until the next injection pulse arrives. The damping makes the LCCM track the injected phase more faithfully. We can see this from Fig. 3.4(a). In the ILCM, since there is no damping, the magnitude of the injection is less than $||CK_{out}(t_{inj}^-)||$. As $\theta_e$ rotates from 0 to $2\pi$ (covering all possible angles), there is a limit on the range of achievable phase correction (i.e. $|\Delta \theta| \leq |\sin^{-1}(\frac{\|\text{Inj}\|}{\|CK_{out}(t_{inj}^-)\|})|$ ). On the other hand, in



Figure 3.4: (a) A comparison between ILCM and LCCM. (b) PDRs of them and how phase noise is affected by PDR slope.

the LCCM counterpart, since the amplitude is damped, the injection magnitude is larger than the residual swing (i.e. $||CK_{out}(t_{inj}^-)||$ ) before each injection instant. Again, as $\theta_e$ rotates from 0 to $2\pi$, the achievable phase correction not only fully covers 0 to $2\pi$ but also tracks $\theta_e$ well (i.e. $\Delta\theta \approx \theta_e$ ), which explains why the LCCM has an almost linear PDR with a slope of about 1. As discussed in the previous section, the dynamics of both circuits can be characterized by their PDRs. As shown in Fig. 3.4(b), the PDR of the ILCM yields an S-shaped curve whose slope depends on the steady-state phase error, while the PDR of the LCCM is close to a straight line and achieves a constant slope (and therefore a constant jitter tracking bandwidth) regardless of the point at which it settles on the PDR due to frequency drift. As a result, we only need to ensure that the LC tank falls within a coarse frequency band (that is, $(N-1)f_{inj} < f_{LC} < (N+1)f_{inj}$ ) rather than at a precise frequency. In this way, we can eliminate the continuous FTLs, since a one-time initial frequency calibration



Figure 3.5: The schematic of the LC-tank based clock multiplier.

is sufficient.
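The difference between the two PDR shapes can be reproduced with a simple phasor-addition model, in which the injection is a phasor at angle 0 added to the pre-injection output phasor. This model and the magnitudes used below are assumptions made for illustration, not extracted circuit values.

```python
import cmath
import math

def phase_correction(theta_e, inj_mag, residual_mag):
    """Phase shift produced by adding an injection phasor (angle 0) to the
    pre-injection output phasor residual_mag * exp(j*theta_e)."""
    before = residual_mag * cmath.exp(1j * theta_e)
    after = before + inj_mag
    return theta_e - cmath.phase(after)

angles = [math.radians(d) for d in range(-170, 171, 10)]
for label, inj, res in (("ILCM-like", 0.3, 1.0), ("LCCM-like", 1.0, 0.1)):
    corrections = [phase_correction(a, inj, res) for a in angles]
    worst = max(abs(c) for c in corrections)
    slope = phase_correction(0.01, inj, res) / 0.01   # PDR slope near the origin
    print(f"{label}: max |correction| = {math.degrees(worst):6.1f} deg, "
          f"slope at origin = {slope:.2f}")
```

When the injection is weaker than the residual swing, the correction saturates at the arcsine bound given above; when it is stronger, the correction follows $\theta_e$ almost linearly with a slope close to 1.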

Fig. 3.5 depicts the schematic of the LCCM, which consists of a pulse generator that converts the input clock into a series of narrow pulses, an LC tank, a pull-low transistor, and a limiting amplifier<sup>2</sup> to restore the swing to rail-to-rail. Due to the modest Q of an on-chip tank, the envelope decay is significant and causes not only AM but also PM spurs, through AM-PM conversion in the non-linear capacitance. In addition, since the injection only aligns the edges once every N cycles, the edges in between are unconstrained. When the LC tank is off-tuned, the period of the last cycle must absorb the accumulated phase error, which also results in PM spurs (the same issue occurs with the ILCM, as shown in Fig. 3.2). Although the limiting amplifier helps to reject AM spurs, PM spurs still remain. This motivates us to develop a spur calibration method in this work. Note that, although a similar LCCM architecture has been reported in [28], no solution to the PM-spur issue is provided there.

### 3.4 Digital Spur Calibration

The PM spur is caused by the unequal periods of the free-running cycles. To eliminate these spurs, we can equalize the periods by adjusting each cycle through a controlled delay line. Fig. 3.6 illustrates the high-level idea of digital spur calibration. Here, we cascade a digital-to-time converter (DTC), which provides the delay adjustments, with a rising-edge spur calibration block that monitors the clock edges and determines the delay control code for the DTC.

<sup>&</sup>lt;sup>2</sup>The limiting amplifier is composed by an 5-stage single-ended inverter chain.



Figure 3.6: Top-level idea of spur calibration.



Figure 3.7: (a) The schematic of LCCM, (b) spur calibration flow, and (c) the period distribution before/after calibration.



Figure 3.8: Proposed architecture of LCCM with digital spur calibration circuitry.

The detailed algorithm inside the rising edge spur calibration block can be demonstrated as follows (Fig. 3.7(b)). First, we delay the clock  $(CK_{LA})$  by one period nominally, such that we can weigh adjacent periods by only comparing the edges between  $CK_{LA}(t)$  and  $CK_{LA}(t-T)$ . In the example shown here, the period of the previous cycle  $(T_0)$  is larger than the present one  $(T_1)$ . Thus, after the comparison, we accelerate the upcoming "Edge-1", which in turn decreases  $T_0$  and increases  $T_1$  (i.e. equalizes  $T_0$  and  $T_1$ ). The same process above applies to each cycle sequentially such that each cycle gets compared and equalized to its neighbors. In steady-state, all periods are redistributed to  $T_{inj}/N$ , and the PM spurs are removed.
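The equalization procedure described above can be mimicked with a small behavioural model. The sketch below is only an illustration under several assumptions: the edge timing errors repeat every N = 8 edges, the comparison yields a sign-only (bang-bang) decision, the DTC step is a uniform 0.1 ps, and the decaying phase profile is hypothetical. The coefficients are allowed to go negative, whereas a real DTC would operate around a mid-range code.

```python
import math

def calibrate(offsets_ps, step_ps=0.1, sweeps=5000):
    """Bang-bang equalization of adjacent periods on a ring of N edges.
    offsets_ps[k] is the timing error of edge k relative to an ideal grid."""
    n = len(offsets_ps)
    d = [0.0] * n                                  # per-edge delay coefficients
    for _ in range(sweeps):
        for k in range(n):
            prev_edge = offsets_ps[k - 1] + d[k - 1]
            this_edge = offsets_ps[k] + d[k]
            next_edge = offsets_ps[(k + 1) % n] + d[(k + 1) % n]
            # Previous period minus present period (nominal period cancels)
            e_bb = (this_edge - prev_edge) - (next_edge - this_edge)
            if e_bb > 0:        # previous period longer: advance this edge
                d[k] -= step_ps
            elif e_bb < 0:      # previous period shorter: delay this edge
                d[k] += step_ps
    return d

def worst_period_error(offsets_ps, d):
    """Largest deviation of any period from the nominal period, in ps."""
    n = len(offsets_ps)
    edges = [o + c for o, c in zip(offsets_ps, d)]
    return max(abs(edges[(k + 1) % n] - edges[k]) for k in range(n))

# Hypothetical decaying phase profile (ps) across N = 8 edges
profile = [30.0 * (1 - math.exp(-k / 3)) for k in range(8)]
coeffs = calibrate(profile)
print(f"before: {worst_period_error(profile, [0.0] * 8):.2f} ps")
print(f"after:  {worst_period_error(profile, coeffs):.2f} ps")
```

In this model the residual period error after calibration is bounded by a small multiple of the DTC step, which corresponds to the noise-like toggling expected from a bang-bang loop in steady state.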

Fig. 3.8 shows the overall circuit of the proposed calibration for N=8 realized in this work<sup>3</sup>, which is slightly backed off from the achievable on-chip Q of 12. First, the one-period

<sup>&</sup>lt;sup>3</sup>The circuit is implemented as single-ended in this test chip, but it can be implemented as differential



Figure 3.9: Modified falling edge generator with tunable duty cycle.

delay is ensured by a digital delay-locked loop (DLL). The adjacent period comparison is done by a bang-bang phase detector reused from the DLL, and the compared result  $(e_{bb})$  is then assigned sequentially to the accumulator bank which stores the delay coefficients  $(d_0 \sim d_7)$  for each edge. Finally, the output MUX picks up and forwards the delay coefficients to a digital-to-time converter (DTC) to adjust the corresponding edges. Both the input and output MUXes are controlled by a counter with an offset on their selection indices to match the timing. The DTC-aligned edges flow back to the calibration block again and again, and eventually settle to an equilibrium for each period. After calibration, the delay coefficients are frozen and replayed to the DTC by the output MUX, while the DLL and input MUX (gray-shaded blocks) are turned off to save power. Therefore, the quantization noise will appear only as residual spurs without impacting the phase noise.

After the rising edges are corrected (i.e. the green edges of $CK_{cal}$ ), we can simply replicate them into falling edges with a falling edge generator, as shown in Fig. 3.8. The falling edge generator ANDs the input clock with its inverted and delayed version. In this way, we obtain a clock with equally spaced rising and falling edges. The DTC is realized by two inverters, each loaded by a 9-bit capacitor bank (with 5 thermometer-encoded MSBs and 4 binary LSBs), followed by buffers for rise time recovery (Fig. 3.8). Note that the DTC's linearity is non-critical here as long as it is monotonic, since the non-linearity is absorbed into the calibration coefficients. One remark is that the duty cycle of $CK_{out}$ is not 50%. Although not implemented in this test chip, this can be easily fixed by the modified version shown in Fig. 3.9. Just as in the phase-frequency detector (PFD) of a PLL, the pulse width of $CK_{out}$ is the delay between $CK_R$ and $CK_S$. In this way, we can adjust the duty cycle of $CK_{out}$ at will or even introduce a calibration loop, if necessary.



Figure 3.10: Instantaneous period over time with calibration loop running.

The calibration can be verified by monitoring the instantaneous period of each cycle. As shown in Fig. 3.10, there are initially large periodic jumps on the plot due to the highly unequal period distribution, which causes the spurs. As the calibration comes into play, the deviation shrinks and eventually settles to small noise-like toggles. The calibration coefficients are also monitored. Fig. 3.11(a) shows the settling of each coefficient $(d_0 \sim d_7)$. Since the coefficients absorb the phase modulation profile of the un-calibrated clock, it is informative to plot the coefficient values over time. As shown in Fig. 3.11(b), the coefficients present a decaying shape, which further confirms the AM-PM modulation from the decaying envelope of the LCCM. As a sanity check, when we feed in an external clock that is phase modulated by a sinusoidal wave, the coefficients do reflect the sinusoidal shape to counteract this modulation and cancel the PM spurs, as shown in Fig. 3.11(c).

## 3.5 Frequency Domain Interpretation and Stability Analysis

In the previous section, we explained how the calibration works from the time-domain perspective, which can be analysed in frequency domain as well. Fig. 3.12 (a) recapitulates the calibration related circuits which actually form a closed loop and mandate the frequency domain analysis to ensure the stability.

We can start from $CK_{cal}$, whose phase is denoted $\phi_{cal}$. The combined transfer function of the DCDL and BBPD can be expressed as $1-z^{-1}$, since together they act as a differentiator. The transfer function of the accumulator bank is less straightforward, but we can show



Figure 3.11: (a) The settling of delay coefficients  $(d_0 \sim d_7)$ , (b) the coefficient profile when calibrating the clock from LCCM, and (c) the coefficient profile when calibrating the clock that is sinusoidal modulated.



Figure 3.12: (a) The transfer functions of the blocks around the calibration loop, and (b) the overall loop transfer function.



Figure 3.13: Digital N-path filter [29].

that it is actually a digital N-path filter. Fig. 3.13 shows exactly the same case as ours, which is analysed thoroughly in [29]. The input x(n) is demultiplexed into $x_1(n), x_2(n), ..., x_N(n)$, which are fed sequentially into the sub filters, each with the same transfer function H(z). After the filtering, the output y(n) collects the filtered results by multiplexing. Therefore, we can express the z-transform of x(n) as

$$\begin{aligned} X(z) &= \sum_{n=0}^{\infty} x(n)z^{-n} \\ &= x(0) + x(N)z^{-N} + \dots \\ &\quad + x(1)z^{-1} + x(N+1)z^{-(N+1)} + \dots \\ &\quad + \dots \\ &\quad + x(N-1)z^{-(N-1)} + x(2N-1)z^{-(2N-1)} + \dots \\ &= \sum_{i=1}^{N} z^{-(i-1)} X_i(z^N), \end{aligned} \tag{3.5}$$

where

$$X_i(z) = \sum_{n=0}^{\infty} x(nN + i - 1)z^{-n}. \tag{3.6}$$

By the same token, we can write y(n) as

$$Y(z) = \sum_{i=1}^{N} z^{-(i-1)} Y_i(z^N), \tag{3.7}$$

where

$$Y_i(z) = \sum_{n=0}^{\infty} y(nN + i - 1)z^{-n}. \tag{3.8}$$

Therefore, the relation between input and output in each sub filter is given by

$$Y_i(z) = H(z)X_i(z) \tag{3.9}$$

$$Y_i(z^N) = H(z^N)X_i(z^N). \tag{3.10}$$

From (3.5), (3.7), and (3.10), we can express the relation between the overall input and output as

$$\begin{aligned} Y(z) &= \sum_{i=1}^{N} z^{-(i-1)} H(z^{N}) X_{i}(z^{N}) \\ &= H(z^{N}) X(z). \end{aligned} \tag{3.11}$$

Equation (3.11) shows that the MUX and de-MUX operations convert the sub filter H(z) into $H(z^N)$.
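The conversion of H(z) into $H(z^N)$ can be confirmed in the time domain for the case where each sub filter is an accumulator. The sketch below compares a round-robin bank of N accumulators with the direct recursion $y(n) = x(n) + y(n-N)$; the test sequence is arbitrary and the function names are illustrative.

```python
def n_path_accumulate(x, n_paths):
    """Route samples round-robin into n_paths accumulators and read the
    updated accumulator back out, as in the MUX/de-MUX structure."""
    acc = [0.0] * n_paths
    y = []
    for n, sample in enumerate(x):
        i = n % n_paths
        acc[i] += sample          # sub filter H(z) = 1 / (1 - z^-1)
        y.append(acc[i])
    return y

def comb_accumulate(x, n_paths):
    """Direct form of 1 / (1 - z^-N): y(n) = x(n) + y(n - N)."""
    y = []
    for n, sample in enumerate(x):
        y.append(sample + (y[n - n_paths] if n >= n_paths else 0.0))
    return y

x = [float((3 * n) % 7 - 3) for n in range(40)]   # arbitrary test sequence
assert n_path_accumulate(x, 8) == comb_accumulate(x, 8)
print("N-path accumulator bank matches 1/(1 - z^-8)")
```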

Returning to our accumulator bank, we can now apply this result. The sub filter is simply an accumulator with a transfer function of $\frac{1}{1-z^{-1}}$; therefore, the transfer function of the accumulator bank is $\frac{1}{1-z^{-8}}$ for the N=8 case here. We can then write the overall transfer function of the feedback path (Fig. 3.12(b)) as

$$\beta(z) = K_{DTC} \cdot \frac{1 - z^{-1}}{1 - z^{-8}} \tag{3.12}$$

or

$$\beta(z) = \frac{K_{DTC}}{1 + z^{-1} + z^{-2} + z^{-3} + z^{-4} + z^{-5} + z^{-6} + z^{-7}}. \tag{3.13}$$

Fig. 3.14(a) shows the pole-zero plot of the calibration loop, and the stability can be readily verified by the root-locus plot shown in Fig. 3.14(b), with all the loci inside the unit circle. Fig. 3.14(c) plots the frequency responses of $\beta(z)$ and the overall closed-loop response G(z) (from $CK_{LA}$ to $CK_{cal}$ in Fig. 3.12(b)) of the calibration circuits. We can see that $\beta(z)$ is a comb filter with peaks at the harmonics of the reference clock frequency (i.e. 375 MHz) because of the poles on the unit circle. The overall closed-loop response G(z) can be expressed as



Figure 3.14: (a) Pole/zero map, (b) root-locus plot, and (c) the frequency response of $\beta(z)$ and G(z).



Figure 3.15: Testing setup

$$G(z) = \frac{\Phi_{cal}}{\Phi_{LA}} = \frac{1}{1 + \beta(z)}, \tag{3.14}$$

which can be approximated as $1/\beta(z)$ when the loop gain is high. Therefore, G(z) has a shape opposite to that of $\beta(z)$: a notch filter with notches at the harmonics of the reference clock frequency. This frequency response further justifies the reference spur rejection capability of the calibration loop.
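The notch behaviour of G(z) can be checked by evaluating (3.13) and (3.14) on the unit circle. The sketch below assumes one loop sample per edge of the 3-GHz output clock and a hypothetical loop gain $K_{DTC} = 0.05$; it rewrites $1/(1+\beta)$ as $D/(D + K_{DTC})$, with $D = 1 + z^{-1} + \dots + z^{-7}$, to avoid dividing by zero at the poles of $\beta(z)$.

```python
import cmath
import math

F_CLK = 3e9      # output clock rate, assuming one loop sample per edge
N = 8
K_DTC = 0.05     # hypothetical loop gain

def ladder(f_hz):
    """D(z) = 1 + z^-1 + ... + z^-(N-1) evaluated on the unit circle."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / F_CLK)
    return sum(z_inv ** k for k in range(N))

def closed_loop_gain(f_hz):
    """|G| = |1 / (1 + beta)| with beta = K_DTC / D, per (3.13) and (3.14)."""
    d = ladder(f_hz)
    return abs(d / (d + K_DTC))

for f in (0.0, 187.5e6, 375e6, 750e6):
    print(f"{f / 1e6:6.1f} MHz: |G| = {closed_loop_gain(f):.3e}")
```

The closed-loop gain collapses at 375 MHz and its multiples, where D(z) vanishes, and stays close to unity in between, which is the notch-filter shape described above.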

#### 3.6 Experimental Results

This chip has been fabricated in 28-nm CMOS technology. Fig. 3.15 shows the measurement setup. When multiplying a 375-MHz clock from an E8257D by 8 to 3GHz, the chip consumes a total power of 6.5mW after calibration, of which 1.7mW is dissipated in the LCCM, 3.6mW in the DTC and falling-edge generator from a 1-V supply, and 1.2mW in the digital core from a 0.8-V supply. Note that during calibration, an extra 2.8mW is consumed by the DLL and the more active digital switching, which are only needed once and contribute zero power after calibration. Fig. 3.16 shows the output clock spectrum measured with an N9030A and an E5052A.

The intrinsic 3-GHz output clock contains substantial spurs at offsets equal to the harmonics of 375MHz, with the highest spur level at -30.3dBc. After calibration, the spurs are suppressed considerably, down to -50.9dBc. Phase noise is also measured in Fig. 3.17. With the DTC modulated by the calibrated coefficients, the 3-GHz output presents a phase noise of -127.7 dBc/Hz at 1-MHz offset and an integrated jitter of  $138\,\mathrm{fs_{rms}}$  (from 1kHz to 40MHz), which corresponds to a FoM of -249.1 dB.

Figure 3.16: Measured spectra before/after calibration (> 20dB improvement).

Figure 3.17: Measured phase noise plot that achieves an integrated jitter of  $138\,\mathrm{fs_{rms}}$  from 1kHz to 40MHz.

Figure 3.18: Measured jitter and spur (without/with spur re-cal.) over supply voltage and output frequency variation.

Furthermore, we verify the jitter immunity to frequency deviations by varying either the supply (0.95 to 1.05V) or the injection frequency (372.5 to 377.5MHz). As shown in Fig. 3.18, the jitter performance remains steady over the whole range of variation without extra frequency tuning. Note that even though the spurs regrow under PVT variation, they can be suppressed again by re-applying the spur calibration, as verified through measurement. Table 3.1 summarizes state-of-the-art integer-N multipliers, most of which stabilize their jitter performance with a variety of FTL architectures. Here we benchmark the robustness against the frequency drift induced by a 5% supply voltage change, since this is the only common test item across these works that can be collected and compared. Our work shows steady jitter without an FTL thanks to its inherent insensitivity to frequency drift. The figure of merit (FoM) on jitter and power is also compared in Fig. 3.19. Our work lies in the region close to -250 dB at the frontier. Fig. 3.20 shows the die micrograph; the chip occupies  $0.26\,\mathrm{mm}^2$  of active area.



Figure 3.19: FoM comparison with state-of-the-art integer-N clock multipliers.



Figure 3.20: Test-chip die photo

|                                       | JSSC'09 [3]                         | ISSCC'13 [4]                       | ISSCC'15 [5]                         | ISSCC'18 [6]                        | This work                          |
|---------------------------------------|-------------------------------------|------------------------------------|--------------------------------------|-------------------------------------|------------------------------------|
| Architecture                          | LC ILCM                             | LC ILCM                            | LC ILCM                              | Ring MDLL                           | LCCM                               |
| FTL                                   | GRO-TDC                             | SSPD                               | Pulse Gating                         | Block Sharing                       | FTL-less                           |
| Technology                            | 130nm                               | 65nm                               | 65nm                                 | 28nm                                | 28nm                               |
| Supply                                | 1.2V                                | 1.2V                               | 0.9V                                 | 0.8V                                | 1.0V/0.8V                          |
| Output Freq.                          | 3.2GHz                              | 2.4GHz                             | 6.75~8.25GHz                         | 3GHz                                | 3GHz                               |
| Ref. Freq.                            | 50MHz                               | 150MHz                             | 105~129MHz                           | 200MHz                              | 375MHz                             |
| Spur                                  | –63.9dBc                            | –49dBc                             | –40dBc                               | –44dBc                              | −50.9dBc                           |
| Int. Jitter                           | 130fs <sub>rms</sub><br>(100~40MHz) | 188fs <sub>rms</sub><br>(1k~40MHz) | 190fs <sub>rms</sub><br>(10k~100MHz) | 292fs <sub>rms</sub><br>(10k~40MHz) | 138fs <sub>rms</sub><br>(1k~40MHz) |
| ∆Jitter under<br>±5% ∆V <sub>DD</sub> | N/A                                 | 136%* / 14%<br>(FTL off / on)      | >78%* / 13%<br>(FTL off / on)        | >100% / 9%<br>(FTL off / on)        | 7%<br>(no FTL)                     |
| Power                                 | 28.6mW                              | 5.2mW                              | 2.25mW                               | 1.45mW                              | 6.5mW                              |
| FoM**                                 | −243.2dB                            | -247.0dB                           | –250.9dB                             | –249.1dB                            | –249.1dB                           |
| Area                                  | 0.4mm <sup>2</sup>                  | 0.25mm <sup>2</sup>                | 0.25mm <sup>2</sup>                  | 0.0056mm <sup>2</sup>               | 0.26mm <sup>2</sup>                |

\*Estimated from figures. \*\*FoM =  $10\log\left[\left(\frac{\sigma_{rms}}{1\,\text{s}}\right)^2 \cdot \frac{P}{1\,\text{mW}}\right]$ 

Table 3.1: Comparison to the state-of-the-art.
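The FoM entries in the table can be reproduced from the jitter and power columns. The sketch below assumes the usual jitter-power definition, with the rms jitter normalized to 1 s and the power to 1 mW; the function name is hypothetical.

```python
import math

def jitter_power_fom_db(jitter_rms_s, power_w):
    """FoM = 10*log10[(sigma_rms / 1 s)^2 * (P / 1 mW)], in dB."""
    return 10 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_w / 1e-3))

# "This work" column: 138 fs rms integrated jitter at 6.5 mW.
fom_this_work = jitter_power_fom_db(138e-15, 6.5e-3)
# ISSCC'18 column: 292 fs rms integrated jitter at 1.45 mW.
fom_isscc18 = jitter_power_fom_db(292e-15, 1.45e-3)
print(round(fom_this_work, 1), round(fom_isscc18, 1))
```

Both columns round to the -249.1 dB listed in the table.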

# Chapter 4

# Improved Version

### 4.1 Architecture Rethinking

In the last chapter, we introduced a new clock multiplier architecture (i.e. the LC-tank-based clock multiplier) and verified in silicon its PVT insensitivity and low jitter. Furthermore, the most undesirable issue, substantial intrinsic spur generation, is effectively mitigated by a novel digital spur calibration technique. However, the LCCM still has some limitations and plenty of room for improvement. As illustrated in Fig. 4.1, the version-2 test chip incorporates four major improvements: (1) power reduction, (2) LMS settling ripple reduction, (3) multiplication ratio (N) generalization (i.e. supporting larger N), and (4) fractional-N operation. In this chapter, we will go through the details of each improvement.

#### 4.2 Power Reduction

Based on the measured power breakdown of the version-1 test chip, the DTC clearly accounts for about half of the total power, as shown in Fig. 4.2. The DTC is therefore our primary target for cutting power consumption. Instead of diving into circuit-level optimization of the DTC, we can plan more smartly at the architecture level.

The major task of the DTC is simply to provide phase modulation on each clock edge, aligning the edges back to their ideal positions so that the PM spurs are eliminated. In fact, we can achieve exactly the same function without the DTC. As in modern digital polar transmitter architectures, the phase modulation can be realized by the direct frequency modulation (DFM) technique ([30, 31]). As its name implies, the phase modulation is performed by directly modulating the oscillator's frequency, rather than by cascading a DTC or a phase interpolator after the oscillator. The power and cost benefits are apparent. Since we already invest power and area in the oscillator, the DFM technique



Figure 4.1: Four major improvements in version 2 : (1) power reduction; (2) multiplication ratio (N) extension; (3) LMS settling ripple reduction; and (4) extension to fractional-N.



Figure 4.2: Power reduction (from 6.5mW to 2.9mW) by replacing the DTC with direct frequency-modulation.



Figure 4.3: The settling of LMS coefficients without (left) and with (right) the second differentiator.

can reuse the existing tuning knobs with negligible overhead in the digital control circuit, thanks to scaling in advanced technology.

Fig. 4.2 shows the modification with the DFM technique. The DTC is eliminated, and the control bits (previously controlling the DTC) are instead fed into the tuning cap-banks in the LC-tank for frequency modulation. The calibration core is modified accordingly by simply inserting a differentiator (i.e.  $1-z^{-1}$ ) in the signal path to convert the phase information into frequency. Although the differentiator can be placed anywhere in the signal path, placing it right after the BBPD minimizes the hardware cost since the signal there is only one bit wide. In this way, the power can be cut from 6.5mW down to 2.9mW and the FoM can be improved from  $-249 \, \mathrm{dB}$  to  $-252 \, \mathrm{dB}$  at the negligible cost of a one-bit differentiator.
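As a rough cross-check of the quoted FoM improvement, and assuming (this is not stated above) that the integrated jitter is unchanged when the DTC is replaced by DFM, the FoM definition of Table 3.1 makes the change depend on the power ratio alone:

$$\Delta \mathrm{FoM} = 10\log_{10}\left(\frac{2.9\,\mathrm{mW}}{6.5\,\mathrm{mW}}\right) \approx -3.5\,\mathrm{dB}.$$

Starting from the measured -249.1 dB of version 1, this gives roughly -252.6 dB, consistent with the figure of about -252 dB quoted above.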

### 4.3 LMS Settling Ripple Reduction

The digital spur calibration introduced in the last chapter has proven effective at suppressing the spur level to below -50 dBc. Although this is good enough for most applications, some still demand an even lower spur level, such as RF transceivers with tough co-existence specifications. Theoretically, the minimum achievable spur level is limited by the number of tuning bits on the DTC (or on the cap-bank in the DFM approach), since the LSB quantization error shows up as residual spurs. Ideally, with 9-bit resolution, we should be able to achieve about -54 dBc (=  $6 \times 9$ ), which means that there is a 3-dB performance gap to the measurement (i.e. -51 dBc). This performance gap can be explained by the settling behavior of the LMS loops. Fig. 4.3 depicts the settling of each coefficient



Figure 4.4: An example showing N=4 with  $T_0T_1T_2T_3=3454$ , which should ideally be equalized to 4 4 4 4. The old algorithm (left) wiggles around the optimum with a repeated pattern, whereas the new algorithm (right) settles to the optimum successfully.

stored in the accumulator bank. We can see that the coefficients do settle to stable values, but there are also residual ripples (i.e. small random toggling) of a couple of LSBs. Therefore, at the moment we shut down the calibration and freeze the coefficients, these "random" ripples are also sampled and further increase the residual spur level.

Although it was discovered by accident, adding another differentiator in the signal path helps to reduce the ripples (Fig. 4.3)<sup>1</sup>. We can understand why it helps as follows.

#### Interpretation

As described in Chapter 3, the calibration algorithm works by *comparing adjacent periods* with a BBPD, which is the root cause of the ripples. Due to the binary nature,

<sup>1</sup>The "outlier" among the coefficients in Fig. 4.3 corresponds to the last cycle. As explained in Fig. 3.2, when the natural frequency is off, the phase error accumulated from the first to the (N-1)-th cycle needs to be absorbed in the last cycle, which makes the last cycle period much different from the others.

the output of a BBPD is either "+1" (increase) or "-1" (decrease) without a "neutral" state, that is

$$e_{bb}[k] = \begin{cases} +1, & \text{if } T_{k-1} \le T_k \\ -1, & \text{if } T_{k-1} > T_k. \end{cases} \tag{4.1}$$

Fig. 4.4 shows a case in which some coefficients are already close to their optimal values; ideally they should just stand still and wait for the other coefficients to reach their optimal values. However, because the BBPD output has no "equal" state, the already-optimal coefficients are forced to change. The coefficients then have to re-settle, and the process repeats, which results in a limit cycle. On the other hand, the extra differentiator turns the original algorithm into finding "a peak" or "a valley" in three consecutive cycles, that is

$$e_{bb}[k](1-z^{-1}) = \begin{cases} +2, & \text{if } T_{k-2} > T_{k-1} \le T_k \\ -2, & \text{if } T_{k-2} \le T_{k-1} > T_k \\ 0, & \text{otherwise.} \end{cases} \tag{4.2}$$

As we can see from (4.2), there is an extra "neutral" state generated that helps to hold the optimal coefficients without interruption.
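The two detector outputs in (4.1) and (4.2) can be compared on the period pattern of Fig. 4.4 with a short sketch. The cyclic indexing (the N periods repeat every reference cycle) and the function names are assumptions made for illustration.

```python
def bbpd(periods):
    """(4.1): e_bb[k] = +1 if T[k-1] <= T[k], else -1 (indices taken cyclically)."""
    n = len(periods)
    return [1 if periods[(k - 1) % n] <= periods[k] else -1 for k in range(n)]

def differentiated_bbpd(periods):
    """(4.2): e_bb[k] - e_bb[k-1], which takes values in {+2, 0, -2}."""
    e = bbpd(periods)
    n = len(e)
    return [e[k] - e[(k - 1) % n] for k in range(n)]

unequal = [3, 4, 5, 4]     # pattern from Fig. 4.4
equalized = [4, 4, 4, 4]   # ideal, fully calibrated pattern

print(bbpd(unequal), differentiated_bbpd(unequal))
print(bbpd(equalized), differentiated_bbpd(equalized))
```

For the equalized pattern, the plain BBPD still outputs a nonzero decision on every cycle, while the differentiated output is zero everywhere, which is the "neutral" state that lets settled coefficients stay put.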

### 4.4 Multiplication Ratio Generalization

A general-purpose clock multiplier should place no limitation on its multiplication ratio, so that it can support a variety of crystal frequencies. In this section, we discuss how we extend the ratio and even make it fractional.

#### N Extension

Apart from the spur issue (which has been solved effectively by the proposed spur calibration), the weakness questioned most often is the limited multiplication ratio (N). With an LC-tank of quality factor Q, the LCCM can sustain at most about Q cycles before the next injection pulse is needed. Due to the limited quality factor achievable in CMOS technology (typically about  $11 \sim 15$ ), N can hardly exceed 10, which keeps the LCCM out of applications that need a high multiplication ratio.

To mitigate this issue, we can allow some feedback, which makes the circuit somewhat close to (but not) an oscillator. Fig. 4.5 demonstrates this idea. Here, with N=16, we can allow



Figure 4.5: Supporting a higher multiplication ratio ( $>Q$ ) by selected pulse feedback on every 4th cycle.

feedback pulses every four cycles<sup>2</sup> and insert them between the injection pulses. In this way, regardless of how long it takes for the next "true" injection pulse to arrive, the LC-tank is always replenished every fourth cycle. The start-up is guaranteed, since a Q of  $10\sim15$  is more than sufficient to ensure that the first "artificial" pulse is generated by the 16-phase divider and fed back to the tank to sustain oscillation.

As shown in Fig. 4.5, the extra circuitry is just an N-phase divider (for N-phase generation), a MUX (selecting the desired phase every four cycles), and an OR-gate that combines the feedback pulses with the "true" injection pulses. With this technique, N can be extended arbitrarily to fit the desired application scenario.
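The pulse pattern seen by the tank over one reference period can be sketched as follows. This is only a behavioral illustration: the function name and the choice of cycle 0 for the "true" injection are assumptions, and the OR-gate is modeled as a Boolean `or`.

```python
def tank_pulse_schedule(n, m):
    """Per-cycle pulse flags over one reference period of n output cycles.

    n: multiplication ratio N; m: local feedback interval M (n must equal k*m).
    """
    if n % m != 0:
        raise ValueError("N must be a multiple of M")
    injection = [k == 0 for k in range(n)]                 # "true" reference pulse
    feedback = [k % m == 0 and k != 0 for k in range(n)]   # divider phases picked by the MUX
    combined = [inj or fb for inj, fb in zip(injection, feedback)]  # OR-gate output
    return injection, feedback, combined

injection, feedback, combined = tank_pulse_schedule(16, 4)
pulse_cycles = [k for k, on in enumerate(combined) if on]
print(pulse_cycles)
```

With N = 16 and M = 4, the tank is replenished at cycles 0, 4, 8 and 12, so it never has to ring for more than four cycles unaided.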

### 4.5 Fractional-N

In a real application, it is desirable to support a variety of crystal frequencies with the same output frequency, which mandates fractional-N operation. Because it uses a similar phase correction mechanism based on injection pulses, our LCCM architecture shares the ILCM's limitation of supporting only integer N. Fortunately, several recently published works ([32–34]) demonstrate "add-on" circuits that can turn an integer-N clock multiplier into a fractional-N one.

As shown in Fig. 4.6, a DTC controlled by a delay-modulation block is inserted between the reference signal and the input of the multiplier core. We will go through the operating

<sup>2</sup>The number of local feedback cycles (M) of 4 is picked arbitrarily here; it can be  $1 \sim 8$  as long as N is a multiple of M (i.e. N = kM). Here, we just demonstrate an example with N = 16 and M = 4.



Figure 4.6: The "add-on" circuit to turn an integer-N multiplier into fractional-N



Figure 4.7: A toy example of N = 2.75 for "off-grid" direct injection (top) and "on-grid" injection with proper delay inserted (bottom).

principle and the details inside the delay-modulation block in this section.

#### Operating Principle

To demonstrate how the circuit works, consider the toy example shown in Fig. 4.7 with N=2.75, where the reference period  $T_{ref}=T_{out}\times 2.75$  (or  $f_{out}=f_{ref}\times 2.75$ ). Without the "add-on" circuits, we directly inject the reference clock into the multiplier. As the waveform shows, the first injection aligns the output edge with the reference, and the output oscillates for the next two cycles as usual. But just before the third cycle completes, another injection arrives, which squeezes the third cycle. Such an "off-grid"



Figure 4.8: The details of the delay modulation block composed of a q-noise generator, an LMS estimator, and a set of phase monitoring circuits.

injection causes phase quantization noise and fractional spurs. Therefore, we need to insert varying delays (shown in gray in the waveform) to fill the gap so that we can inject "on-grid". Fig. 4.8 shows the details of the delay modulation block, which is composed of a q-noise generator, an LMS estimator, and a set of phase monitoring circuits. The desired varying delay is then provided by the DTC, which is controlled by the delay modulation block, and can be expressed as

$$d[k] = Q_p[k] \times g[k] \tag{4.3}$$

where  $Q_p[k]$  is the phase quantization noise generated by the q-noise generator and g[k] is the gain estimated by an LMS correlator.

#### Phase Quantization Noise Generator

In fact, the quantization noise we want to compensate is deterministic and can be generated by the q-noise generator (Fig. 4.8). The fractional part (0.F) is fed into a  $\Sigma$-$\Delta$  modulator. With the example mentioned earlier (i.e. N=2+0.75), the  $\Sigma$-$\Delta$  modulator generates a repeating stream of 1, 1, 1, 0 with an average equal to 0.75. Subtracting this stream from the input (0.75) gives the frequency quantization noise sequence ( $Q_f[k]$ ) of -0.25, -0.25, -0.25, 0.75, and accumulating  $Q_f[k]$  gives the phase quantization noise ( $Q_p[k]$ ) of -0.25, -0.5, -0.75, 0.
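The sequences for the N = 2 + 0.75 example can be generated with a minimal sketch of the q-noise generator. It assumes a first-order accumulator-with-carry modulator (the modulator order is not specified here) and an initial accumulator state chosen so that the bit stream starts as 1, 1, 1, 0; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def q_noise_generator(frac, n_samples, acc_init):
    """First-order sigma-delta modulator plus error accumulator (assumed structure).

    Returns the bit stream, the frequency quantization noise Q_f[k] = frac - bit[k],
    and the phase quantization noise Q_p[k], the running sum of Q_f.
    """
    acc = acc_init
    q_p = Fraction(0)
    bits, q_f_seq, q_p_seq = [], [], []
    for _ in range(n_samples):
        acc += frac
        bit = 1 if acc >= 1 else 0   # carry-out of the accumulator
        acc -= bit
        q_f = frac - bit
        q_p += q_f
        bits.append(bit)
        q_f_seq.append(q_f)
        q_p_seq.append(q_p)
    return bits, q_f_seq, q_p_seq

# Fractional part 0.75; the initial state 3/4 is an assumption that fixes the phase
# of the repeating pattern.
bits, q_f, q_p = q_noise_generator(Fraction(3, 4), 8, acc_init=Fraction(3, 4))
print(bits[:4], [float(v) for v in q_f[:4]], [float(v) for v in q_p[:4]])
```

Because $Q_f[k]$ sums to zero over each four-sample period, $Q_p[k]$ stays bounded and repeats with the same period.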

#### LMS Gain Estimator

Now that we have the phase quantization noise, the remaining task is to find the proper gain with which to modulate the DTC. Consider the timing diagrams of three cases: under-, optimum, and over-compensation (Fig. 4.9). The plots show that for under-compensation, the phase error is negatively correlated with  $Q_f[k]$  (i.e.  $\Delta \phi[k] \times Q_f[k] < 0$ ); for optimal compensation, the phase error is uncorrelated with  $Q_f[k]$  (i.e.  $\Delta \phi[k] \times Q_f[k] = 0$ ); for over-compensation, the phase error is positively correlated with  $Q_f[k]$  (i.e.  $\Delta \phi[k] \times Q_f[k] > 0$ ). With this property, we can implement the LMS algorithm as

$$g[k+1] = g[k] - \mu_{\text{LMS}} \cdot (\Delta \phi[k] \times Q_f[k]) \tag{4.4}$$

which updates the gain by accumulating the real-time correlations with a step size of  $\mu_{LMS}$ . Whenever the gain is under-estimated, the negative correlation causes the gain to increase, and vice versa. As a result, the gain settles to its optimum value in steady state. In this design, since we use a BBPD for phase detection, (4.4) becomes

$$g[k+1] = g[k] - \mu_{\text{LMS}} \cdot (e[k] \times Q_f[k]) \tag{4.5}$$

where  $e[k] = \mathrm{sgn}(\Delta\phi[k])$ , as shown in Fig. 4.8.
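The convergence of the sign-based update (4.5) can be illustrated with a toy simulation. The phase-error model `delta_phi = (g - g_opt) * q_f` is an assumption chosen only because it reproduces the sign relations described above (negative correlation when under-compensated, positive when over-compensated); it is not the circuit's actual phase-error expression, and all numerical values are placeholders.

```python
def sign_lms_gain(g0, g_opt, mu, q_f_pattern, n_steps):
    """Iterate g[k+1] = g[k] - mu * e[k] * Q_f[k] with e[k] = sgn(delta_phi[k])."""
    g = g0
    history = [g]
    for k in range(n_steps):
        q_f = q_f_pattern[k % len(q_f_pattern)]
        delta_phi = (g - g_opt) * q_f        # toy model (assumption, see lead-in)
        e = 1 if delta_phi >= 0 else -1      # BBPD: binary output, no neutral state
        g -= mu * e * q_f
        history.append(g)
    return history

Q_F = [-0.25, -0.25, -0.25, 0.75]            # Q_f pattern of the 0.75 example
G_OPT, MU = 1.0, 0.01                        # placeholder values

from_below = sign_lms_gain(0.2, G_OPT, MU, Q_F, 1000)
from_above = sign_lms_gain(1.8, G_OPT, MU, Q_F, 1000)
print(from_below[-1], from_above[-1])
```

In this toy model the gain approaches `g_opt` from either side and then dithers within one update step of it, since the binary detector never outputs zero.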

#### Simulation Results

Fig. 4.10 shows the gain settling by the LMS loop. Here, we set two initial gain values ( $g_0 > g_{opt}$  and  $g_0 < g_{opt}$ ) to verify convergence from both directions. As shown in Fig. 4.10, the LMS loop settles to  $g_{opt}$  as expected. Meanwhile, we also monitor the instantaneous frequency during the whole settling process, as shown in Fig. 4.11. Both cases show a large variance at the beginning due to the non-optimal delay compensation; the variance then narrows as the gain coefficient approaches  $g_{opt}$ . The final spectrum of the output clock is shown in Fig. 4.12, where we set  $f_{ref} = 210.4109589 \text{MHz}$  with the fractional part of N equal to 1/4 + 1/128 (i.e. N = 14 + 1/4 + 1/128). Since we want to observe the spectrum by FFT, we set an "integer"  $f_{out}$  (=3-GHz) and a "fractional"  $f_{ref}$  (=210.4109589MHz) rather than the other way around, so that  $f_{out}$  falls right on an FFT bin without introducing a skirt that would overwhelm the close-in information.



Figure 4.9: The timing diagram of three cases: under-compensated (top), optimum compensated (middle), and over-compensated (bottom). Note the relation of the  $\Delta \phi[k]$  (=  $\phi_{\rm inj} - \phi_{\rm out\_d}$ ) with  $Q_f[k]$ .



Figure 4.10: Gain coefficient settling with initial  $g_0 > g_{opt}$  and  $g_0 < g_{opt}$ .



Figure 4.11: Instantaneous frequencies during gain settling with initial  $g_0 > g_{opt}$  and  $g_0 < g_{opt}$ 



Figure 4.12: The spectrum of the 3-GHz output with  $f_{ref} = 210.4109589 \text{MHz}$  and a fractional part of N equal to 1/4 + 1/128. With the initial gain value, the large quantization noise can be clearly seen (gray curve). After the gain settles to  $g_{opt}$ , the quantization noise is effectively cancelled by the DTC (blue curve).

With the initial gain value, the large quantization noise can be clearly seen (shown in the gray curve). After the gain settles to  $g_{opt}$ , the quantization noise gets cancelled out by the DTC effectively (shown in the blue curve). The residual fractional spurs are around -43 dBc, and the integer spur is -40.4 dBc.
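The simulation settings can be cross-checked with exact rational arithmetic. The integer part of 14 used below is inferred from the ratio of the two quoted frequencies; the variable names are hypothetical.

```python
from fractions import Fraction

F_OUT_MHZ = Fraction(3000)                       # 3-GHz output clock
N_TOTAL = 14 + Fraction(1, 4) + Fraction(1, 128)  # integer part 14 is inferred

f_ref_mhz = F_OUT_MHZ / N_TOTAL                  # reference frequency in MHz
print(float(N_TOTAL), float(f_ref_mhz))
```

The result agrees with the quoted 210.4109589MHz to the number of digits given.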

#### **Experimental Results**

The test chip was taped out last December in 28-nm CMOS technology, and we plan to use the same test setup as in Fig. 3.15. However, the chip came back at the end of April, after the lab had been locked down due to COVID-19. As a result, the measurement has not been completed yet.

## Chapter 5

## Conclusions

In this dissertation, we have explored a variety of spur and phase noise cancellation techniques that tackle the problems with both analog and digital approaches, from both frequency-domain and time-domain perspectives, supported by detailed theoretical analysis and lab measurement results. We also explored a novel clock multiplier architecture with higher performance and better robustness than the state of the art. Finally, a further refined version improves on power and settling performance and also generalizes the multiplication ratio.

### 5.1 Summary of Contributions

- Proved the concept of analog signal processing using a delay line and a feed-forward method to generate a transfer function of  $|\cos(\pi fT)|$ , which provides notches to reject the spurs, and a transfer function of  $|\sin(\pi fT)|$ , which provides high-pass filtering to suppress close-in phase noise.
- Demonstrated a novel clock multiplier with the same low jitter virtue as ILCM yet with much better immunity to frequency drift. Such immunity is further justified by a complete theoretical analysis.
- Proposed an effective digital spur calibration technique. This spur calibration method can be applied generally to any clock that suffers from integer-N spurs. For example, in a wireless transceiver, to avoid pulling, we typically introduce an offset LO scheme (such as  $f_{LO} = 5/4f_{VCO}$  by mixing  $f_{VCO}$  and  $f_{VCO}/4$ ). We can then adopt our spur calibration loop to suppress the unwanted  $f_{VCO}/4$  harmonic spurs by setting N=5 in the calibration loop. The working principle was also interpreted from both time-domain and frequency-domain perspectives.
- Presented a further improved version of the clock multiplier with a generalized multiplication ratio, lower power, and better settling performance.

#### 5.2 Future Work

For the phase noise cancellation part, we found some limitations of the idea. The biggest one is that the achievable phase noise cancellation depth is limited by the delay line noise floor: while cancelling the phase noise, we also add new noise, contributed mainly by the delay line. Therefore, to pursue higher performance, we need a low-noise delay line such as a slow-wave transmission line [35], rather than an active-circuit-intensive inverter-chain delay line. Moreover, we did attempt to implement an exotic delay line with high group delay using a high-Q tank realized by N-path filtering [36–39], but the high delay can be achieved only with a low-phase-noise LO. This is circular: if we had such a clean LO, we would not need phase noise cancellation by the delay line in the first place. Nevertheless, the concept itself (i.e. delay by a high-Q N-path filter) is sound, and the open problem is how to make it work with a noisy LO.

There are also some interesting directions for further refining the spur calibration. One is to design a spur monitoring scheme that checks the spur level by itself and re-calibrates when needed. Another stems from the fact that the digital spur calibration runs at the full rate of the input clock, which limits its application frequency to less than 10GHz. To mitigate this issue, we can explore other correction schemes that do not require per-edge correction but still achieve similar performance, or we can use a custom-designed output MUX instead of standard cells to push the clock frequency higher.

# **Bibliography**

- [1] J. Lee, P. Chiang, P. Peng, L. Chen, and C. Weng, "Design of 56 Gb/s NRZ and PAM4 SerDes Transceivers in CMOS Technologies," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 9, pp. 2061–2073, 2015.
- [2] S.-K. Wong, "Performance Analysis of Direct Conversion and Heterodyne RF Transceiver for W-CDMA User Equipment," Oct. 2003.
- [3] B. Razavi, RF Microelectronics. Upper Saddle River, NJ, 2011.
- [4] A. Hajimiri and T. H. Lee, "A General Theory of Phase Noise in Electrical Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 179–194, 1998.
- [5] R. W. Lowdermilk and F. J. Harris, "Vector Signal Analyzer Implemented as a Synthetic Instrument," *IEEE Transactions on Instrumentation and Measurement*, vol. 58, no. 2, pp. 281–290, 2009.
- [6] M. S.-W. Chen, "Tutorial T6: Digital Fractional-N Phase Locked Loop Design," in 2020 IEEE International Solid-State Circuits Conference - (ISSCC), 2020, pp. 519–521.
- [7] D. V. Widder, Laplace transform (PMS-6). Princeton university press, 2015.
- [8] Y. Li, M. Mar, B. Nikolić, and A. M. Niknejad, "On-chip spur and phase noise cancellation techniques," in 2017 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2017, pp. 109–112.
- [9] Y. Li and A. M. Niknejad, "A 138fsrms-Integrated-Jitter and 249dB-FoM Clock Multiplier with -51dBc Spur Using A Digital Spur Calibration Technique in 28-nm CMOS," in 2019 Symposium on VLSI Circuits, 2019, pp. C42-C43.
- [10] F. Aflatouni, M. Bagheri, and H. Hashemi, "Design Methodology and Architectures to Reduce the Semiconductor Laser Phase Noise Using Electrical Feedforward Schemes," *IEEE Transactions on Microwave Theory and Techniques*, vol. 58, no. 11, pp. 3290–3303, 2010.
- [11] D. Scherer, "The art of phase noise measurement," RF Microwave Measurement Symposium and Exhibition, Hewlett-Packard, 1985.

- [12] A. Imani and H. Hashemi, "An FBAR/CMOS Frequency/Phase Discriminator and Phase Noise Reduction System," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 5, pp. 1658–1665, 2015.

- [13] S. Min, T. Copani, S. Kiaei, and B. Bakkaloglu, "A 90-nm CMOS 5-GHz Ring-Oscillator PLL With Delay-Discriminator-Based Active Phase-Noise Cancellation," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 5, pp. 1151–1160, 2013.
- [14] S. Hao, T. Hu, and Q. J. Gu, "A 10 GHz delay line frequency discriminator and PD/CP based CMOS phase noise measurement circuit with 138.6 dBc/Hz sensitivity at 1 MHz offset," in 2015 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2015, pp. 63–66.
- [15] S. Hao and Qun Jane Gu, "A 10 GHz phase noise filter with 10.6 dB phase noise suppression and 116 dBc/Hz sensitivity at 1 MHz offset," in 2016 IEEE MTT-S International Microwave Symposium (IMS), 2016, pp. 1–4.
- [16] F. Gardner, "Phaselock Techniques," New York: Wiley & Sons, 1970.
- [17] A. Elkholy, M. Talegaonkar, T. Anand, and P. Kumar Hanumolu, "design and analysis of low-power high-frequency robust sub-harmonic injection-locked clock multipliers," *IEEE Journal of Solid-State Circuits*,
- [18] Sheng Ye, L. Jansson, and I. Galton, "A multiple-crystal interface PLL with VCO realignment to reduce phase noise," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 12, pp. 1795–1803, 2002.
- [19] J. Lee and H. Wang, "Study of Subharmonically Injection-Locked PLLs," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 5, pp. 1539–1553, 2009.
- [20] B. M. Helal, C. Hsu, K. Johnson, and M. H. Perrott, "A Low Jitter Programmable Clock Multiplier Based on a Pulse Injection-Locked Oscillator With a Highly-Digital Tuning Loop," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 5, pp. 1391–1400, 2009.
- [21] C. Liang and K. Hsiao, "An injection-locked ring PLL with self-aligned injection window," in 2011 IEEE International Solid-State Circuits Conference, 2011, pp. 90–92.
- [22] Y. Huang and S. Liu, "A 2.4-GHz Subharmonically Injection-Locked PLL With Self-Calibrated Injection Timing," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 2, pp. 417–428, 2013.
- [23] I. Lee, Y. Chen, S. Liu, C. Jou, F. Hsueh, and H. Hsieh, "A divider-less sub-harmonically injection-locked PLL with self-adjusted injection timing," in 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, 2013, pp. 414–415.

- [24] J. Chien, P. Upadhyaya, H. Jung, S. Chen, W. Fang, A. M. Niknejad, J. Savoj, and K. Chang, "2.8 A pulse-position-modulation phase-noise-reduction technique for a 2-to-16GHz injection-locked ring oscillator in 20nm CMOS," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 52–53.

- [25] H. Chang, Y. Yeh, Y. Liu, M. Li, and K. Chen, "A Low-Jitter Low-Phase-Noise 10-GHz Sub-Harmonically Injection-Locked PLL With Self-Aligned DLL in 65-nm CMOS Technology," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 3, pp. 543–555, 2014.
- [26] D. Dunwell and A. C. Carusone, "Modeling Oscillator Injection Locking Using the Phase Domain Response," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 11, pp. 2823–2833, 2013.
- [27] P. Maffezzoni and S. Levantino, "Phase Noise of Pulse Injection-Locked Oscillators," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 10, pp. 2912–2919, 2014.
- [28] S. Yoo, S. Choi, T. Seong, and J. Choi, "An Ultra-Low Power and Compact *LC*-Tank-Based Frequency Tripler Using Pulsed Input Signals," *IEEE Microwave and Wireless Components Letters*, vol. 26, no. 2, pp. 140–142, 2016.
- [29] K. Sugahara, K. Hayashi, K. Hirano, and S. Mitra, "N-path digital filters," in ICASSP '84. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 9, 1984, pp. 460–463.
- [30] C. Wang, P. Wang, L. Ke, D. Yu, B. Ong, C. Sun, H. Chen, Y. Chen, C. Kuo, J. Lin, T. Wang, and Y. Chen, "A direct digital frequency modulation PLL with all digital on-line self-calibration for quad-band GSM/GPRS transmitter," in 2009 Symposium on VLSI Circuits, 2009, pp. 190–191.
- [31] G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, "A 20 Mb/s Phase Modulator Based on a 3.6 GHz Digital PLL With 36 dB EVM at 5 mW Power," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 12, pp. 2974–2988, 2012.
- [32] A. Elkholy, A. Elmallah, M. G. Ahmed, and P. K. Hanumolu, "A 6.75–8.25-GHz 250-dB FoM Rapid ON/OFF Fractional-N Injection-Locked Clock Multiplier," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 6, pp. 1818–1829, 2018.
- [33] W. Chang, P. Huang, and T. Lee, "A Fractional-N Divider-Less Phase-Locked Loop With a Subsampling Phase Detector," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 12, pp. 2964–2975, 2014.
- [34] K. Raczkowski, N. Markulic, B. Hershberg, and J. Craninckx, "A 9.2–12.7 GHz Wideband Fractional-N Subsampling PLL in 28 nm CMOS With 280 fs RMS Jitter," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 5, pp. 1203–1213, 2015.
- [35] H. Cho, T. Yeh, S. Liu, and C. Wu, "High-performance slow-wave transmission lines with optimized slot-type floating shields," *IEEE Transactions on Electron Devices*, vol. 56, no. 8, pp. 1705–1711, 2009.

- [36] A. Mirzaei and H. Darabi, "Analysis of imperfections on performance of 4-phase passive-mixer-based high-q bandpass filters in saw-less receivers," *IEEE Transactions on Circuits and Systems I: Regular Papers*,

- [37] A. Mirzaei, H. Darabi, and D. Murphy, "architectural evolution of integrated m-phase high-q bandpass filters," *IEEE Transactions on Circuits and Systems I: Regular Papers*,
- [38] E. A. M. Klumperink, H. J. Westerveld, and B. Nauta, "N-path filters and mixer-first receivers: A review," in 2017 IEEE Custom Integrated Circuits Conference (CICC), 2017, pp. 1–8.
- [39] A. Ghaffari, E. A. M. Klumperink, M. C. M. Soer, and B. Nauta, "Tunable High-Q N-Path Band-Pass Filters: Modeling and Verification," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 5, pp. 998–1010, 2011.