Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems

Lingkai Kong

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2014-191
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-191.html

December 1, 2014
Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems

by

Lingkai Kong

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Electrical Engineering and Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Elad Alon, Chair
Professor Ali M. Niknejad
Professor Paul K. Wright

Fall 2012
Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems

Copyright 2012
by
Lingkai Kong
Abstract

Energy-Efficient 60GHz Phased-Array Design for Multi-Gb/s Communication Systems

by

Lingkai Kong

Doctor of Philosophy in Electrical Engineering and Computer Science

University of California, Berkeley

Professor Elad Alon, Chair

Recent advances in wireless technologies have enabled rapid growth of mobile devices. Consequently, emerging applications for mobile devices have begun demanding data rates of up to multiple Gb/s. Although advanced WiFi systems are approaching such data rates, the narrow bandwidth available in the ISM bands fundamentally limits the achievable data rate. The 7GHz of unlicensed bandwidth in the 60GHz band therefore provides an opportunity to implement these communication systems efficiently, with the potential to achieve >10Gb/s throughput. Besides the wider bandwidth, operating at a higher frequency theoretically offers a higher achievable signal-to-noise ratio in area-limited applications. This is because the maximum achievable antenna gain within a limited aperture increases with frequency, and this gain can be realized using phased-array techniques. This thesis therefore focuses on the design of 60GHz phased-array transceivers to support energy-efficient high-data-rate communication systems.

Despite the advantages of 60GHz, mobile applications often require low power consumption as well as low-cost implementation, making the design of 60GHz phased-array systems challenging. Taking into account the limited power budget, this research investigates the choice of the number of elements in phased-array transceivers and identifies the overhead power as the bottleneck of energy efficiency. In order to reduce the overhead power in the transmitter, a new architecture using a fast start-up oscillator is proposed, which eliminates the need for an explicit modulator and 60GHz LO delivery. Measurements show that the transmitter efficiency is boosted by more than 2X. More importantly, the overhead power is reduced to 2mW, making this architecture a good candidate for phased-arrays with a large number of elements. The receiver suffers from a similar overhead problem but unfortunately cannot share the same architecture. A different architecture that stacks the mixer on top of the LO generation is thus proposed to reduce the power consumption in the receiver. This approach demonstrates a 2X power reduction in receiver overhead, and the resulting optimum number of receiver elements is close to 4.

Besides the use of CMOS technology, on-chip antennas are also studied in order to further reduce the system cost. The slot-loop antenna is identified as a good candidate because its intrinsic ground plane eases integration with the rest of the circuitry. Although simulation shows an efficiency as high as 30%, the planar nature of the on-chip antenna limits its coverage in end-fire directions. Antenna diversity is thus proposed to overcome this limitation by utilizing multiple drive points on the same antenna. Because the antenna is fully integrated on-chip, antenna diversity can be implemented without extra high-frequency I/Os, eliminating the loss that would otherwise be introduced.

Using the proposed transceiver architectures, a 4-element phased-array with on-chip antennas was fabricated in TSMC’s 65nm CMOS technology as a test vehicle. Consuming 50mW in the transmitter and 65mW in the receiver, this 10.4Gb/s phased-array covers a range larger than 45cm in all directions. This achieves a state-of-the-art energy efficiency of 11pJ/bit. The 29mW/element power consumption is also the lowest power demonstrated for a single phased-array element.
To My Parents
# Contents

List of Figures

List of Tables

1 Introduction
   1.1 The 60GHz Band
   1.2 Link Budget with Limited Area
   1.3 Design Challenges
   1.4 Structure of Thesis

2 Phased-Array Architectures for Energy-Efficient Communication
   2.1 Introduction to Phased-Array
   2.2 Choice of Number of Elements in a Phased-Array
      2.2.1 Transmitter
      2.2.2 Receiver
   2.3 Phased-Array Architectures
      2.3.1 RF Phase Shifting
      2.3.2 LO Phase Shifting
      2.3.3 Baseband Phase Shifting
      2.3.4 Transmitter Phased-Array Architecture
   2.4 Phase Resolution
   2.5 Modulation Scheme
   2.6 Summary

3 Implementation of Baseband Phase Shifting
   3.1 Implementation of Phase Shifter Resolution
   3.2 Effect of I/Q Mismatch
   3.3 Implementations of VGAs
      3.3.1 Variable Current Source
      3.3.2 Segmentation

4 Energy-Efficient Phased-Array Transmitter
   4.1 Low Overhead Transmitter Architecture
   4.2 Proposed Transmitter Architecture
   4.3 Circuit Implementation
      4.3.1 Oscillator Implementation
         4.3.1.1 Oscillator Startup Time
         4.3.1.2 Faster Startup
         4.3.1.3 Shut-down of Oscillation
      4.3.2 Power Amplifier Design
   4.4 Timing Generation
   4.5 Transmitter Demonstration
      4.5.1 Waveform Measurement
      4.5.2 Phased-Array Functionality Validation
   4.6 Conclusion

5 Energy-Efficient Phased-Array Receiver
   5.1 Stacked Mixer with LO Buffer
   5.2 Hybrid Design
   5.3 Incorporating the Mixer with the 60GHz LO Generation
      5.3.1 Stacked 30GHz VCO with Mixer
      5.3.2 Stacked Push-Push with Mixer
         5.3.2.1 Optimization of Push-Push Inductance
         5.3.2.2 Sharing LO between Elements
         5.3.2.3 Sizing of Push-Push
      5.3.3 30GHz Generation
   5.4 Receiver Demonstration
      5.4.1 Bandwidth Measurement
      5.4.2 LO Performance
   5.5 Summary

6 On-Chip mm-Wave Antennas
   6.1 On-Chip Antenna Efficiency
      6.1.1 Slot-Loop Antenna Design
         6.1.1.1 Diameter
         6.1.1.2 Gap Width
         6.1.1.3 Substrate Thickness
   6.2 Multiple-Access On-Chip Antenna
      6.2.1 Antenna Multiplexing
      6.2.2 Reducing Loading Effect
   6.3 Measurement Results
      6.3.1 Measured Transmitter Power and Receiver Gain
      6.3.2 Antenna Pattern Measurement
   6.4 Summary

7 Fully Integrated 4-Element Phased-Array
   7.1 Phased-Array Measurement
      7.1.1 Transmitter Phased-Array Measurement
      7.1.2 Receiver Phased-Array Measurement
   7.2 Link Measurement
   7.3 Summary

8 Conclusions
   8.1 Thesis Summary
   8.2 Future Directions

Bibliography
List of Figures

1.1 Emerging wireless applications at mm-wave band. (Source: Wireless Gigabit Alliance (WiGig).)
1.2 Data rate of 802.11 standards. (Source: IEEE 802.11.)
2.1 Operating principle of a phased array transmitter.
2.2 Principle of phased array II: peak and null.
2.3 SNR improvement versus number of receiver elements.
2.4 Phased-array architectures: RF, LO and baseband phase shifting.
2.5 Directivity at different directions with different phase shifter resolution.
2.6 Peak to side lobe ratio at different directions with different phase shifter resolution.
2.7 Bit error rate (BER) versus peak SNR.
3.1 Baseband phase shifter utilizing variable gain amplifier.
3.2 Variable gain amplifier controlled by variable bias current.
3.3 Variable gain amplifier controlled by segmentation.
3.4 Phase shifter constellations using (a) VGA with variable current source (b) segmentation.
3.5 Phase shifter constellations using (a) straightforward I/Q summation and (b) proposed I/Q partial sharing scheme.
3.6 Phase shifter implementation of (a) conventional (b) proposed architectures.
3.7 Quadrature implementation using (a) butterfly switches (b) 4-to-1 MUX.
3.8 Measurement result of a phase shifter with 5-bit per quadrant resolution.
4.1 Transmitter architectures for (a) conventional and (b) proposed approaches.
4.2 Oscillator phase modulation by baseband delay.
4.3 Oscillator phase modulation by baseband delay to support phased-array functionality.
4.4 Transmitter spectrum of (a) proposed architecture and (b) conventional architecture.
4.5 Transmitter block diagram.
4.6 Cross coupled oscillator and its small signal model.
4.7 Oscillator startup with (a) $Q_0 = 5$ and (b) $Q_0 = 0.25$.
6.3 Simulated efficiency versus antenna diameter.
6.4 Simulation setup of antenna gap, feed line, and port.
6.5 Simulated S11 bandwidth versus gap width.
6.6 Simulated radiation efficiency versus silicon thickness.
6.7 Slot loop antenna and its radiation pattern.
6.8 Antenna diversity using two driving ports.
6.9 Antenna switching as well as T/R switch.
6.10 Input multiplexer embedded in low noise amplifier.
6.11 (a) Schematic of source degenerated LNA and its impedance with bias (b) on and (c) off.
6.12 $\lambda/4$ transmission line inserted for impedance transformation while also serving the purpose of routing the antenna ports to a single location.
6.13 Antenna measurement setup.
6.14 Transmitter radiated power versus distance.
6.15 Receiver gain versus distance.
6.16 Transmitter radiation pattern with different configurations; receiver antenna pattern with different configurations.
7.1 Die photo of complete transceiver.
7.2 Transmitter antenna array pattern inside PCB plane.
7.3 Receiver antenna array pattern inside PCB plane.
7.4 Link measurement setup of (a) end-fire direction and (b) broadside direction.
7.5 Receiver eye diagram of I-channel data at 5.2GS/s.
7.6 BER measurement versus distance, different directions.
7.7 Link budget calculation with 4-TX/4-RX phased-array at 0.5m distance.
List of Tables

4.1 Comparison table of transmitter design.
7.1 Comparison table of mm-wave transceiver designs.
Acknowledgments

My life at UC Berkeley has been a wonderful journey, and I was fortunate to have many great companions throughout these five and a half years. This work would not have been possible without their help and support, so I would like to take this opportunity to express my gratitude to each and every one of them.

First I would like to thank my advisor Prof. Elad Alon. Coming from a communication background, I have learned most of my circuit knowledge from him, either during his wonderful lectures or from discussions with him. During our discussions, I was always amazed that he could decompose my research problems into the most fundamental pieces and provide the most effective solutions. Without all these inspiring conversations, this work would not have been possible.

I would like to thank Prof. Ali M. Niknejad for his extensive guidance and feedback over numerous projects. His suggestions and support on my job search have been invaluable. I would also like to thank Prof. Ahmad Bahai and Prof. Paul K. Wright for being a part of my qualification exam committee and providing valuable feedback.

The multiple chip designs wouldn’t be possible without my colleagues’ support and collaboration. It was truly my honor and pleasure to work with Bagher Afshar, Jiashu Chen, Debopriyo Chowdhury, Antoine Frappe, Kwangmo Jung, Shinwon Kang, Yue Lu, Cristian Marcu, Jungdong Park, Dongjin Seo, Maryam Tabesh, Chintan Thakkar, and Yanjie Wang.

I have also learnt more than I could have imagined during my research in BWRC, and most of it is because of my classmates and other senior colleagues: Ehsan Adabi, Louis Alarcon, Amin Arbabian, Omar Bakr, David Chen, Stanley Chen, John Crossley, Zhiming Deng, Yida Duan, Simone Gambini, Tsung-Te Liu, Michael Mark, Rikky Muller, Ji-Hoon Park, Jesse Richmond, Dusan Stepanovic and Zhengya Zhang.

BWRC has been a wonderful working place and it would not be this way without the kind support from all the staff at the center. I would like to express my gratitude to all of them but especially to Tom Boot, Gary Kelson, Susan Mellers and Brian Richards for their support on my project.

Finally I would like to acknowledge the love and support from my parents Yiqin and Xiangsheng. Without their love, encouragement and support, I could have never been where I am now, and for that I dedicate this thesis to them.
Chapter 1

Introduction

In the past decade, wireless technology has had a significant impact on our daily lives. Ubiquitous wireless networks provide us with connectivity anywhere, at any time. Moving forward, emerging applications including wireless docking, wireless HD streaming, and near-field file sharing/exchange will demand data rates of up to multiple Gb/s. In order to enable these applications on mobile platforms such as smartphones and tablets, both the cost and the energy consumption of these systems have to be minimized while achieving higher performance. This research therefore investigates design trade-offs in such systems and develops techniques to achieve these goals.

Figure 1.1: Emerging wireless applications at mm-wave band. (Source: Wireless Gigabit Alliance (WiGig).)
1.1 The 60GHz Band

The wireless industry has been pushing the envelope of data rate for years. Based on the IEEE 802.11 standard operating in the 2.4GHz ISM band, existing WLAN solutions provide service at data rates as high as 150Mb/s. The upcoming 802.11ac utilizes 160MHz of bandwidth in the 5GHz ISM band and improves the data rate to up to 877Mb/s for mobile applications at ranges of tens of meters (Fig. 1.2). However, the achievable data rate is limited by the available bandwidth in these bands.

Figure 1.2: Data rate of 802.11 standards. (Source: IEEE 802.11.)

The 7GHz of unlicensed bandwidth at 60GHz therefore provides a good opportunity to achieve a much higher data rate. Despite this wide absolute bandwidth, the fractional bandwidth is only ~10%, which allows a conventional implementation using tuned circuitry. Note that the 60GHz band was previously passed over for long-haul applications because of its additional loss in air. However, for the short-range applications we are interested in, the additional air loss (0.002dB/m) [1, 2] is negligible.
1.2 Link Budget with Limited Area

One concern with using mm-wave frequencies is the high free-space path loss (FSPL). In area-limited applications, however, moving up in spectrum may actually improve the efficiency despite the higher path loss. This is because the reduced wavelength at high frequency increases the achievable gain of an antenna within a limited area. As shown in [3], for commonly used antennas such as dipoles and loops, the maximum gain is related to the physical size by

\[ G_{\text{max}} = \frac{4\pi A}{\lambda^2} \quad (1.1) \]

where \( A \) is the physical size of the antenna or the aperture and \( \lambda \) is the wavelength. Plugging the gain of the antenna into the Friis transmission equation [4], the signal-to-noise ratio (SNR) can be derived as:

\[ \text{SNR} = \frac{P_{TX}G_{TX}G_{RX}}{N_o} \left( \frac{\lambda}{4\pi d} \right)^2 = \frac{P_{TX}A_{TX}A_{RX}}{\lambda^2 d^2 N_o} \quad (1.2) \]

where \( P_{TX} \) is the transmitted power; \( N_o \) is the effective noise power at the receiver input and \( d \) is the communication distance.

Equation (1.2) clearly shows that, with a constrained antenna area, the SNR improves quadratically with frequency. Thus, besides the wide bandwidth, the 60GHz band also offers the potential for high SNR, making it an even better candidate for energy-efficient high-data-rate wireless systems.
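
As a rough numerical illustration of this scaling, the following sketch evaluates the Friis equation with aperture-limited antenna gains at two carrier frequencies. The 1 cm² aperture, the 1m distance, and the function names are assumptions chosen only for illustration; the noise power is assumed to be the same in both cases, and the aperture formula is applied in its idealized form.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def max_aperture_gain(area_m2, freq_hz):
    """Maximum gain of an aperture-limited antenna, G = 4*pi*A / lambda^2."""
    wavelength = C / freq_hz
    return 4.0 * math.pi * area_m2 / wavelength ** 2

def received_power_ratio(area_tx_m2, area_rx_m2, freq_hz, distance_m):
    """P_RX / P_TX from the Friis equation with aperture-limited gains."""
    wavelength = C / freq_hz
    g_tx = max_aperture_gain(area_tx_m2, freq_hz)
    g_rx = max_aperture_gain(area_rx_m2, freq_hz)
    return g_tx * g_rx * (wavelength / (4.0 * math.pi * distance_m)) ** 2

area = 1e-4  # hypothetical 1 cm^2 aperture on each side (assumption)
low = received_power_ratio(area, area, 5e9, 1.0)
high = received_power_ratio(area, area, 60e9, 1.0)
improvement_db = 10.0 * math.log10(high / low)
print(f"60 GHz vs 5 GHz, same aperture: {improvement_db:.1f} dB")
```

With identical apertures and noise, the received power ratio scales as the square of the frequency ratio, so moving from 5GHz to 60GHz gives a factor of 144 in this idealized model.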

1.3 Design Challenges

The benefit of higher SNR predicted by Equation (1.2) could be lost if the system is not well designed. First, the maximum antenna gain can only be achieved when the entire area is efficiently utilized. To make full use of the aperture, phased-array techniques can be employed, where multiple transmitters and receivers are implemented with independent antennas. Originally proposed for military applications, most phased-array systems simply replicate the transceivers with extra phase-shifting functionality, resulting in rather high power consumption [5, 6]. Therefore, this research focuses on optimizing the performance of phased-array systems within a limited power budget.

Secondly, operating at mm-wave frequencies challenges the performance of existing devices. SiGe technologies have successfully demonstrated transceivers at these frequencies [7, 8], but suffer from high power consumption as well as high cost when integrated with existing digital signal processing components, which are usually implemented in CMOS. On the other hand, typical CMOS technologies today provide a maximum oscillation frequency $f_{\text{max}}$ of around 250GHz, which is only about 4 times the operating frequency of the RF front-end. When operating at frequencies this close to $f_{\text{max}}$, transistors tend to be less efficient in delivering power and noisier. As Equation (1.2) shows, these two effects significantly reduce the SNR. Although technology advances have provided transistors with ever higher $f_{\text{max}}$, the pace of improvement has slowed recently. With this limited device performance, circuit-level techniques at mm-wave frequencies are necessary to meet our requirements with high energy efficiency. Significant efforts to reduce power consumption have been made recently [9, 10, 11]. Employing techniques such as high-impedance matching, these designs have successfully reduced the power consumption of a single-element transceiver to 300mW~400mW. This thesis builds on these existing techniques and describes additional circuit-level techniques dedicated to phased-array design in order to further improve the energy efficiency.

1.4 Structure of Thesis

This thesis begins with a discussion of phased-array architectures in Chapter 2, including a study of the number of elements as well as phase-shifting schemes and phase resolution requirements. Overhead power from the phase shifters, the modulator in the transmitter, and the downconverter in the receiver is identified as the bottleneck of system efficiency. Chapter 3 therefore investigates circuit implementations that minimize the power consumption of the phase shifters. A transmitter architecture that eliminates the modulator is proposed in Chapter 4, and a receiver architecture that reduces the power consumption of the downconverters is described in Chapter 5. Design procedures as well as experimental results are provided. In Chapter 6, on-chip antennas are discussed in order to further reduce the cost of the system. Integrating the antenna on die also provides opportunities to achieve antenna diversity for better coverage. To verify these ideas, a complete transceiver is designed and fabricated in TSMC’s 65nm CMOS technology. Measurement results are then shown in Chapter 7, followed by the conclusion for the entire thesis in Chapter 8.
Chapter 2

Phased-Array Architectures for Energy-Efficient Communication

Phased-array techniques essentially provide an electronically steerable antenna to focus energy in certain directions selectively. The implementation of phased-arrays has been well studied in the past for military or scientific usages, where performance is the major driving force and high power consumption is tolerable. In order to enable the applications mentioned in Chapter 1, the energy efficiency of phased-arrays has to be significantly improved. This chapter therefore studies the optimization of the phased-array architecture within a constrained power budget. To enable this analysis, we first revisit the basics of phased-array operation.

2.1 Introduction to Phased-Array

A phased-array system consists of multiple transceiver elements. Each of them has its own antenna and phase shifter that can control the phase of input/output signals independently. The antennas are placed in various array patterns, but 1D or 2D arrays are commonly used for their simplicity. The spacing between antennas is typically $\lambda/2$ to perform Nyquist sampling in space [12].

Fig. 2.1 illustrates the operating principle of a phased-array transmitter. For simplicity, the left two transmitters are considered and each transmitter transmits its own signal through an omnidirectional antenna. In an arbitrary direction $\theta$, due to the physical distance between these two antennas, a phase difference between signals from these two elements occurs. Far away from the antennas, the phase difference can be approximated as:

$$\phi = \frac{2\pi d}{\lambda} \cos \theta. \quad (2.1)$$

Here $d$ and $\lambda$ denote the distance between elements and the wavelength, respectively.

Due to this phase difference, if the antennas are driven with the same phase, the E-fields generated by these two antennas do not add in phase in this direction, resulting in less than the maximum amplitude. However, this phase difference can easily be compensated by phase shifters within the transmitter. Specifically, either phase shifting the signal from the left element by $\phi$ or phase shifting the signal from the right one by $-\phi$ can realign the signals in space. Assuming replicated transmitters, this results in an E-field amplitude twice as large as that with a single transmitter. Because energy is proportional to the square of the E-field amplitude, the energy delivered in this direction is boosted by a factor of 4 compared to the single-antenna case. In a generalized system with $N_{TX}$ elements, the energy delivered in the desired direction is increased by a factor of $N_{TX}^2$.

It is worth noting that the total energy radiated into space is only increased by the number of transmitters. The square law relation is due to the fact that energy is now focused (beam-formed) in this direction. As a result, energy radiated in other directions is reduced. For example, in the direction $\theta'$ that satisfies

$$2 \pi \frac{d}{\lambda} \cos \theta' = \pi - 2 \pi \frac{d}{\lambda} \cos \theta, \quad (2.2)$$

signals from these two elements are out of phase (Fig. 2.2). Therefore the energy delivered in this direction is zero. Because of this feature, the phased-array technique also provides nulling capabilities to reduce the interference caused to other receivers.
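
The peak and null behavior can be checked numerically with a minimal sketch of an ideal uniform linear array. It assumes isotropic, lossless elements with equal amplitudes and angles measured from the array axis as in Equation (2.1); the function name `array_power` and the chosen steering angles are illustrative assumptions, not part of the measured system.

```python
import cmath
import math

def array_power(n_elements, spacing_wavelengths, steer_deg, look_deg):
    """Radiated power density toward look_deg, relative to one element.

    Elements sit on a line with uniform spacing; each one is phase shifted
    so that the fields add in phase toward steer_deg (angles are measured
    from the array axis).
    """
    kd = 2.0 * math.pi * spacing_wavelengths
    psi = kd * (math.cos(math.radians(look_deg)) - math.cos(math.radians(steer_deg)))
    field = sum(cmath.exp(1j * n * psi) for n in range(n_elements))
    return abs(field) ** 2

# Two elements at lambda/2 spacing, steered to broadside (90 degrees).
peak_2 = array_power(2, 0.5, steer_deg=90.0, look_deg=90.0)
null_2 = array_power(2, 0.5, steer_deg=90.0, look_deg=0.0)
# Four elements steered to 60 degrees.
peak_4 = array_power(4, 0.5, steer_deg=60.0, look_deg=60.0)
print(peak_2, null_2, peak_4)
```

In the steered direction the relative power equals the square of the number of elements, while for the two-element broadside case the end-fire direction receives essentially no power.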

Similar behavior occurs on the receiver side. In the case of two elements, the received signal power in the desired direction is enhanced by a factor of 4 compared with the single-element case. However, unlike the transmitter, where signal power is the only metric, the receiver is also significantly affected by noise power. Assuming the noise contributions from the two antennas are uncorrelated, they add in power (rather than in voltage). The output noise power is therefore double that of the single-element case. The net effect is that the SNR is improved by a factor of 2 rather than 4. In the general case where $N_{RX}$ elements are employed in the receiver, the SNR is enhanced by a factor of $N_{RX}$.
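
The difference between voltage addition of the signal and power addition of the noise can be illustrated with a small Monte Carlo sketch. It assumes unit-amplitude, perfectly phase-aligned signals and independent Gaussian noise of equal variance in every element; the sample count, seed, and function name are arbitrary illustrative choices.

```python
import random

def combined_snr_gain(n_elements, n_samples=20000, noise_sigma=1.0, seed=0):
    """Estimate the SNR gain of summing n_elements phase-aligned receivers."""
    rng = random.Random(seed)
    single_noise_power = 0.0
    summed_noise_power = 0.0
    for _ in range(n_samples):
        noise = [rng.gauss(0.0, noise_sigma) for _ in range(n_elements)]
        single_noise_power += noise[0] ** 2
        summed_noise_power += sum(noise) ** 2
    single_noise_power /= n_samples
    summed_noise_power /= n_samples
    # Unit-amplitude signals add in voltage: signal power is n_elements^2.
    snr_single = 1.0 / single_noise_power
    snr_array = n_elements ** 2 / summed_noise_power
    return snr_array / snr_single

gain_2 = combined_snr_gain(2)
gain_4 = combined_snr_gain(4)
print(f"2 elements: {gain_2:.2f}x, 4 elements: {gain_4:.2f}x")
```

The estimated gains should land close to 2 and 4, respectively, up to the statistical error of the finite sample.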

## 2.2 Choice of Number of Elements in a Phased-Array

The previous section highlighted that the number of elements in transmitter and receiver have different impact on the system SNR. This section continues the discussion by optimizing the number of elements on each side given a constraint on total power consumption.

### 2.2.1 Transmitter

One widely used metric for transmitters is effective isotropic radiated power (EIRP), which measures the radiated power in a certain direction compared to an ideal isotropic antenna. In the case of a phased-array, EIRP can be calculated in terms of a single transmitter element’s output power $P_{TX}$, the number of elements $N_{TX}$, and the antenna gain $G_{TX}$ as:

$$EIRP = P_{TX} N_{TX}^2 G_{TX}. \quad (2.3)$$

The DC power consumption of a transmitter usually consists of two components. One component represents the power amplifier stages, which is in general proportional to the output power; the other component has a relatively fixed DC power consumption, representing power from LO generation, LO buffers, baseband, etc. Therefore, the overall power consumption of the transmitter can be written in the form of

$$P_{DC,TX} = \left( \frac{P_{TX}}{\eta_{PA}} + P_{TX,OH} \right) N_{TX}, \quad (2.4)$$

where $\eta_{PA}$ denotes the efficiency of the power amplifier and $P_{TX,OH}$ is the overhead power consumption. Solving Equation 2.4 for $P_{TX}$, substituting the result into Equation 2.3, and completing the square in $N_{TX}$, the EIRP can be expressed as:

$$EIRP = \left( \frac{P_{DC,TX}}{N_{TX}} - P_{TX,OH} \right) \eta_{PA} N_{TX}^2 G_{TX}
= \frac{P_{DC,TX}^2}{4P_{TX,OH}} \eta_{PA} G_{TX} - \left( N_{TX} - \frac{P_{DC,TX}}{2P_{TX,OH}} \right)^2 P_{TX,OH} \eta_{PA} G_{TX}. \quad (2.5)$$

The above equation shows that, with constrained total DC power consumption $P_{DC,TX}$, the maximum EIRP happens when

$$N_{TX} = \frac{P_{DC,TX}}{2P_{TX,OH}}. \quad (2.6)$$

This optimum condition states that half of the DC power should be consumed by the overhead circuitry, and the other half by the power amplifiers. The maximum achievable EIRP is

$$EIRP_{\text{max}} = \frac{P_{DC,TX}^2}{4P_{TX,OH}} \eta_{PA} G_{TX}. \quad (2.7)$$

Clearly, the maximum achievable EIRP is proportional to the efficiency of the PA and the antenna gain. More importantly, the overhead power plays a significant role in this equation: because $EIRP_{\text{max}}$ is inversely proportional to $P_{TX,OH}$, reducing the overhead power by a factor of 2 doubles the achievable EIRP.
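
The optimum of Equation 2.6 and the effect of halving the overhead can be verified with a short brute-force search over integer element counts. The 120mW budget, the overhead values, and unity antenna gain are hypothetical numbers chosen so that the optimum falls on an integer; only the 30% PA efficiency echoes a value used elsewhere in this chapter.

```python
def eirp_mw(n_tx, p_dc_mw, p_oh_mw, pa_eff, g_tx):
    """EIRP of Eq. (2.5) for a fixed total DC power budget (all powers in mW)."""
    p_tx = (p_dc_mw / n_tx - p_oh_mw) * pa_eff  # RF output power per element
    if p_tx <= 0:
        return 0.0  # the budget is spent entirely on overhead
    return p_tx * n_tx ** 2 * g_tx

def best_element_count(p_dc_mw, p_oh_mw, pa_eff, g_tx, n_max=64):
    """Integer element count that maximizes EIRP, found by direct search."""
    return max(range(1, n_max + 1),
               key=lambda n: eirp_mw(n, p_dc_mw, p_oh_mw, pa_eff, g_tx))

# Hypothetical budget: 120 mW total, 30% PA efficiency, unity antenna gain.
n_a = best_element_count(120.0, 20.0, 0.3, 1.0)  # Eq. (2.6) predicts 120/40 = 3
n_b = best_element_count(120.0, 10.0, 0.3, 1.0)  # Eq. (2.6) predicts 120/20 = 6
eirp_a = eirp_mw(n_a, 120.0, 20.0, 0.3, 1.0)
eirp_b = eirp_mw(n_b, 120.0, 10.0, 0.3, 1.0)
print(n_a, eirp_a, n_b, eirp_b)
```

Under these assumptions the search agrees with Equation 2.6, and halving the overhead power doubles both the optimum element count and the maximum EIRP.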

One common design practice is to optimize the number of elements with a desired EIRP to achieve the lowest overall DC power consumption. Equation 2.5 can be rewritten to solve this optimization problem easily:

$$P_{DC,TX} = \frac{EIRP}{\eta_{PA} G_{TX}} \frac{1}{N_{TX}} + P_{TX,OH} N_{TX} \quad (2.8)$$

As before, the minimum DC power consumption is achieved when the overhead power equals the power amplifier power consumption. The minimum DC power and the optimum number of elements are thus:

\[
P_{DC,TX,min} = 2\sqrt{\frac{EIRP \times P_{TX,OH}}{\eta_{PA} G_{TX}}}
\]

\[
N_{TX,opt} = \sqrt{\frac{EIRP}{\eta_{PA} G_{TX} \times P_{TX,OH}}}.
\] (2.9)

With a given EIRP, this minimum DC power consumption is limited by the efficiency of the power amplifier and antenna gain, but more importantly, by the overhead power as well. Significant efforts have gone into improving PA efficiency at different power levels [13, 14, 15, 16]; unfortunately the overhead power consumption in phased-arrays is equally important but often overlooked in system designs.

It is worthwhile to check the actual power levels implied by this analysis, to make sure the orders of magnitude are in a reasonable range. In CMOS designs, due to the lossy passive components at high frequencies, the impedance transformation ratio is usually limited to lower than \( \sim 3 \) before losses rise substantially. With a 50Ω antenna impedance and a 1V supply voltage, this limits the output power of a single transmitter to somewhere between 3dBm and 13dBm. Techniques like power combining [17] can raise the power level by another 6dB, and lowering the supply can extend the low end to \( \sim 0dBm \). On the other hand, a typical 60GHz transmitter design includes roughly 10~20mW of overhead power. Assuming an EIRP of 12dBm to achieve a range of 1m, a per-element overhead power of 16mW, 30% efficiency in the PA, and 30% efficiency in the antenna, the minimum DC power consumption evaluates to roughly 100mW with a 3-element array. Each of the transmitters uses a 16mW power amplifier delivering about 5mW to the antenna and about 1.6mW into space. This is indeed within the range of output power discussed above.
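
The numbers in this example can be reproduced approximately by minimizing Equation 2.8 over $N_{TX}$. The sketch below assumes that the 30% antenna efficiency plays the role of $G_{TX}$ (an interpretation, since no separate directivity is stated) and treats $N_{TX}$ as continuous before rounding to an integer; the function names are illustrative.

```python
import math

def dbm_to_mw(dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (dbm / 10.0)

def min_dc_power_mw(eirp_mw, p_oh_mw, pa_eff, g_tx):
    """Minimum of Eq. (2.8) over a continuous N_TX, and the N_TX achieving it."""
    scale = eirp_mw / (pa_eff * g_tx)
    n_opt = math.sqrt(scale / p_oh_mw)
    p_min = 2.0 * math.sqrt(scale * p_oh_mw)
    return p_min, n_opt

def dc_power_mw(n_tx, eirp_mw, p_oh_mw, pa_eff, g_tx):
    """Eq. (2.8) evaluated for an integer element count."""
    return eirp_mw / (pa_eff * g_tx * n_tx) + p_oh_mw * n_tx

eirp = dbm_to_mw(12.0)  # target EIRP of 12 dBm
# Assumption: antenna efficiency of 0.3 is used as G_TX.
p_min, n_opt = min_dc_power_mw(eirp, 16.0, 0.3, 0.3)
p_three = dc_power_mw(3, eirp, 16.0, 0.3, 0.3)
print(f"N_opt = {n_opt:.2f}, P_min = {p_min:.0f} mW, P(N=3) = {p_three:.0f} mW")
```

Under these assumptions the continuous optimum lies slightly above 3 elements and the DC power comes out a little over 100mW, in line with the rounded figures quoted in the text.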

### 2.2.2 Receiver

On the receiver side, because the SNR is proportional to \( N_{RX} \) rather than \( N_{RX}^2 \), the benefit of increasing the number of elements is limited. Furthermore, doubling the SNR would require doubling the number of receiver elements, therefore increasing the receiver power consumption. Assuming we have a system where the receiver noise power is inversely proportional to the DC power consumption, i.e.,

\[
N_o = \frac{N_{o,Device}}{P_{RX,Single}} = \frac{N_{o,Device}N_{RX}}{P_{RX,DC}},
\] (2.10)

the receiver part of the SNR equation can be rewritten as

\[
SNR \propto \frac{G_{RX}P_{RX,DC}N_{RX}^2}{N_{o,Device}N_{RX}^2} = \frac{G_{RX}P_{RX,DC}}{N_{o,Device}}.
\] (2.11)

Here $N_{o,Device}$ is the noise power normalized to DC power, with units of $W^2$. It follows that, when a fixed power budget is given for the receiver, increasing the number of elements does not improve the SNR.

This surprising result may not hold in practice, however, because the assumption that noise is inversely proportional to the DC power consumption is usually not entirely accurate. A non-negligible portion of the noise is contributed by the antenna impedance, which does not scale with the receiver power consumption. Furthermore, similar to the transmitter, there is an overhead power consumption associated with each receiver that needs to be taken into account. Thus, a more precise model is

$$N_o = N_{o,Antenna} + \frac{N_{o,Device}}{P_{RX,Single} - P_{RX,OH}} = N_{o,Antenna} + \frac{N_{o,Device}N_{RX}}{P_{RX,DC} - N_{RX}P_{RX,OH}}, \quad (2.12)$$

where $P_{RX,OH}$ denotes the overhead power in the receiver chain. This modifies equation 2.11 to be

$$\text{SNR} \propto \frac{N_{RX}(P_{RX,DC} - N_{RX}P_{RX,OH})}{N_{o,Antenna}(P_{RX,DC} - N_{RX}P_{RX,OH}) + N_{o,Device}N_{RX}}. \tag{2.13}$$

Unlike the case of the transmitter, the optimization of a receiver phased-array is rather complicated and depends heavily on the exact values of the overhead power and noise performance. Based on a sample design in [18], a $\sim 30$mW receiver achieves a 7dB noise figure with about 10mW of overhead power. Therefore $N_{o,Device}$ can be extracted to be roughly $N_{o,Antenna} \times 80$mW. Shown below is a sample calculation based on a total power consumption of 100mW:

$$\text{SNR} \propto \frac{N_{RX}(100mW - N_{RX}P_{RX,OH})}{[100mW + N_{RX}(80mW - P_{RX,OH})]N_{o,Antenna}}. \tag{2.14}$$

SNR improvement over a single element receiver with zero overhead power is plotted versus number of elements in Fig. 2.3 with different overhead power values. This plot shows that the effect of overhead power consumption is also critical in the receiver. When overhead power is small, the SNR improvement can be as high as 3dB compared to a single element design. When overhead power is 15mW or higher, the maximum improvement is limited to be lower than 1dB. In other words, there is not much motivation to implement a phased-array.
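The trend described for Fig. 2.3 can be reproduced directly from equation 2.14. The sketch below normalizes $N_{o,Antenna}$ to 1 and uses the 100mW budget and the 80mW device-noise term from the sample calculation; the function names and the 16-element sweep limit are illustrative choices, not values from the design.

```python
import math

def relative_snr(n_rx, overhead_mw, total_mw=100.0, device_mw=80.0):
    """Right-hand side of Eq. 2.14 with N_o,Antenna normalized to 1."""
    usable = total_mw - n_rx * overhead_mw
    if usable <= 0:
        return 0.0  # the whole budget is consumed by overhead
    return n_rx * usable / (total_mw + n_rx * (device_mw - overhead_mw))

def improvement_db(n_rx, overhead_mw):
    """SNR gain over a single element with zero overhead, in dB."""
    ref = relative_snr(1, 0.0)
    snr = relative_snr(n_rx, overhead_mw)
    return 10 * math.log10(snr / ref) if snr > 0 else float("-inf")

# For each overhead value, report the element count with the best SNR.
for oh in (0.0, 5.0, 15.0):
    best = max(range(1, 17), key=lambda n: improvement_db(n, oh))
    print(oh, best, round(improvement_db(best, oh), 2))
```

Under these assumptions the zero-overhead case exceeds 3dB of improvement at 16 elements, while a 15mW overhead never reaches 1dB, consistent with the discussion above.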

As discussed above, compared to a single-element system, phased-array techniques can indeed improve the SNR within a limited power budget. Besides typical metrics such as PA efficiency and receiver noise figure, the analysis shows that the overhead power is critical in phased-array designs. Therefore, in Chapters 4 and 5, transceiver architectures targeting low overhead power consumption are proposed and studied.

It should be noted that even though a receiver phased-array does not improve SNR in certain conditions, it can still help to achieve spatial filtering, which can potentially allow multiple users to share the same medium and improve the system capacity. This emphasizes even further the need to reduce per-receiver overhead power so that systems that require a certain degree of directivity do not suffer from low efficiency.

### 2.3 Phased-Array Architectures

The above analysis uses a simple model for the power consumption of a phased-array transceiver without differentiating it from a single-element transceiver. However, the key difference between them is the functionality of phase shifting and signal combining, which leads to extra power consumption. This extra power consumption can potentially eliminate the benefits of the phased-array. Several architectures implement this functionality, and they are classified by where the phase shifting is performed. As illustrated in Fig. 2.4, this section discusses the impact on power consumption of the three most common implementations. We start with the receiver architecture and then discuss the transmitter briefly.

### 2.3.1 RF Phase Shifting

Figure 2.4: Phased-array architectures: RF, LO and baseband phase shifting.

- **RF Phase Shifting**
  - Single LO, Spatial filtering before mixer
  - RF distribution
  - High loss, Large area, Sensitive to gain variation

- **LO Phase Shifting**
  - Insensitive to gain variation
  - LO distribution
  - High loss, Large area, Spatial filtering after mixer

- **BB Phase Shifting**
  - Low loss/power, Small area
  - LO distribution, Spatial filtering after mixer

The RF Phase Shifting approach [19, 20, 21, 22] (Fig. 2.4(a)) modifies the phase on the RF path directly. In the receiver elements, the phase-shifted signals are then combined before arriving at the downconverter. Because there is signal loss associated with the phase shifter, it is desired to have the phase shifting done in later stages to avoid noise penalties. In CMOS, varactors are commonly used to achieve such functionality by changing the capacitance. The range and loss of the phase shifting are therefore determined by the range of the varactor \( C_{on}/C_{off} \) as well as the quality factor of the varactor. Assuming the phase shifter works in an impedance environment of \( R_o \) and varactor on-off ratio is denoted as \( F \), a simplified model for phase shifter loss is

\[
G = \frac{R_{cap}}{R_o + R_{cap}}, \tag{2.15}
\]

where \( R_{cap} \) is the effective parallel resistance of the varactor. On the other hand, the range of the phase variation \( \phi \) is related to the capacitance change:

\[
\tan\frac{\phi}{2} = \omega R_o (C_{on} - C_{off})/2. \tag{2.16}
\]

Combining the above equations to eliminate \( R_o \), the range and loss can be related as

\[
\frac{1 - G}{G} \times \omega R_{cap} C_{avg} \times \frac{F - 1}{F + 1} = \tan\frac{\phi}{2}. \tag{2.17}
\]

This can be further simplified using the quality factor \( Q = \omega R_{cap} C_{avg} \),

\[
\frac{1}{G} = 1 + \frac{1}{Q} \frac{F + 1}{F - 1}\tan\frac{\phi}{2}. \tag{2.18}
\]

In a typical 65nm CMOS technology where \( F \approx 3 \) and \( Q \approx 3 \) at 60GHz, the equation can be simplified to

\[
\frac{1}{G} = 1 + \frac{2}{3} \tan\frac{\phi}{2}. \tag{2.19}
\]

In a phased-array system, full coverage of 360° is usually required. One can take advantage of differential signaling to cut the range by half. However, the model here assumes a single resonance and therefore cannot achieve 180° coverage. Cascading two identical 90° phase-shifter stages can fulfill the requirement, with a loss of

\[
G^2 = \left(1 + \frac{2}{3} \tan 45°\right)^{-2} = 0.36 \approx -8.9dB. \tag{2.20}
\]

An on-chip inductor will typically lead to an extra 1dB loss, making the total loss roughly 10dB. Prior publications [23, 24] have shown performance close to this prediction with different implementations. On the other hand, a one-stage power-matched cascode amplifier provides about 10dB gain at 60GHz while consuming \( \sim 3-4\)mW. This gain cancels out the loss of the phase shifter. Therefore, a varactor-based RF phase shifter effectively requires approximately 3mW of power.
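As a quick numerical check of equations 2.18 to 2.20, the sketch below evaluates the loss of cascaded varactor stages. It treats $G$ as a voltage gain (so the loss in dB is $20\log_{10}(1/G)$ per stage), which is the convention that reproduces the 8.9dB figure; the function name is illustrative.

```python
import math

def phase_shifter_loss_db(q, f, phi_deg, stages=1):
    """Loss from Eq. 2.18, treating G as a voltage gain (20*log10)."""
    inv_g = 1 + (1 / q) * (f + 1) / (f - 1) * math.tan(math.radians(phi_deg) / 2)
    return stages * 20 * math.log10(inv_g)

# Two cascaded 90-degree stages with Q = 3 and F = 3, as in Eq. 2.20.
print(round(phase_shifter_loss_db(q=3, f=3, phi_deg=90, stages=2), 1))  # prints 8.9
```

The same function shows how quickly the loss grows with range: a single stage asked to cover a range approaching 180° makes $\tan(\phi/2)$ diverge, which is why two 90° stages are cascaded.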

Besides loss, the bandwidth of the RF phase shifter is also important. Because it is inserted into the signal path, the phase shifter needs to handle the full bandwidth of the RF signal, which can also be challenging and demand extra power consumption.

Despite all the challenges of RF phase shifting, the advantage of this scheme is that it requires only a single mixer, and therefore the LO signal can be generated relatively close to the receiver and does not need to be split or routed.

### 2.3.2 LO Phase Shifting

In heterodyne systems, phase shifting can also be achieved on the LO path [5, 6, 25, 26]. This is because the baseband signal is a multiplication of the LO signal and the RF. The received baseband signal with a phase shift of $\phi_{\text{shift,RF}}$ in the RF path is

$$\text{LPF}\{\cos(\omega_{RF}t + \phi_{\text{shift,RF}}) \times \cos(\omega_{LO}t)\} = \cos(\omega_{BB}t + \phi_{\text{shift,RF}}), \quad (2.21)$$

while the baseband signal obtained by a phase shifted LO signal is:

$$\text{LPF}\{\cos(\omega_{RF}t) \times \cos(\omega_{LO}t + \phi_{\text{shift,LO}})\} = \cos(\omega_{BB}t - \phi_{\text{shift,LO}}). \quad (2.22)$$

In the case where $\phi_{\text{shift,LO}} = -\phi_{\text{shift,RF}}$, the outputs of these two schemes are identical. The requirements for the phase shifter, however, are quite different. Compared with the RF phase shifting architecture, the LO path is less sensitive to bandwidth because a single-tone LO is usually delivered. The bandwidth only translates to a conversion gain difference in the mixer when different LO frequencies are used, and this can be compensated easily by boosting the gain in the LO driver or at the baseband.

The penalty of LO phase shifting, on the other hand, is that it typically consumes significantly more power. Although the same phase shifter used in RF phase shifting can be used here, the power needed to compensate for its loss is much higher. This is because the LO path usually handles the highest RF power in the receiver chain in order to hard-switch the mixer. The amplifier compensating for the loss therefore needs to provide real power rather than just gain as in the previous case. As an example, assume the same phase shifter with 10dB insertion loss is used in both schemes. In the LO phase-shifting scheme, if the point where the phase shifter is inserted handles -5dBm of power, extra power is needed to deliver +5dBm, which is about 10mW assuming 30% efficiency. This can be improved by reducing the phase shifting range to 90° and handling the other quadrants in the baseband, assuming a complex modulation scheme. Nevertheless, the LO shifting scheme using varactor-based phase shifters still consumes more power than its RF counterpart.

An alternative approach to implement LO phase shifting is to use a Cartesian structure where two LO signals with 90° phase are summed together with different weights [29, 21]. Arbitrary phase of the LO signal can be generated by varying these two weights as:

$$\cos(\omega_{LO}t + \phi_{\text{shift,LO}}) = \cos(\phi_{\text{shift,LO}})\cos(\omega_{LO}t) - \sin(\phi_{\text{shift,LO}})\sin(\omega_{LO}t). \quad (2.23)$$

Compared to a normal LO buffer, the phase shifter consumes $\sqrt{2}$ times the power when both LO signals have the same weight. This results in a $\sim$40% power penalty per phase shifter. In a complex modulation system, the total power penalty is thus 80% of the power consumption of a normal LO buffer. Because the LO amplifiers handle a maximum impedance of $\sim 300\Omega$, they typically consume $\sim3$mW, and therefore $3mW \times 80\% = 2.4mW$ of extra power is consumed in the phase shifters.
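The Cartesian combination in equation 2.23 and the $\sqrt{2}$ worst case can be checked numerically. In this sketch the sum of the two weight magnitudes is used as a stand-in for the relative buffer power, which is an assumption made for illustration; the function names are likewise hypothetical.

```python
import math

def cartesian_weights(phi):
    """Weights applied to the cos and sin LO phases, per Eq. 2.23."""
    return math.cos(phi), -math.sin(phi)

def relative_drive(phi):
    """Sum of weight magnitudes, used here as a proxy for buffer power."""
    w_i, w_q = cartesian_weights(phi)
    return abs(w_i) + abs(w_q)

phi, wt = math.radians(30), 1.234  # arbitrary phase shift and time sample
w_i, w_q = cartesian_weights(phi)
lhs = math.cos(wt + phi)                          # directly shifted LO
rhs = w_i * math.cos(wt) + w_q * math.sin(wt)     # weighted quadrature sum
print(abs(lhs - rhs) < 1e-12, round(relative_drive(math.radians(45)), 3))
```

The proxy peaks at $\sqrt{2} \approx 1.414$ when both weights are equal, which corresponds to the roughly 40% penalty per phase shifter quoted above.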

### 2.3.3 Baseband Phase Shifting

The other option in heterodyne systems is to implement phase shifting at the baseband [27, 28, 18]. This can be shown again by comparing the original phase shifting result with the alternative approach. Here we shall look at a complex demodulation making both I/Q channel outputs available.

\[ I: \text{LPF}\{ (I(t)\cos(\omega_{RF}t + \phi_{shift}) + Q(t)\sin(\omega_{RF}t + \phi_{shift})) \times \cos(\omega_{LO}t) \} = I(t)\cos(\phi_{shift}) + Q(t)\sin(\phi_{shift}). \]

\[ Q: \text{LPF}\{ (I(t)\cos(\omega_{RF}t + \phi_{shift}) + Q(t)\sin(\omega_{RF}t + \phi_{shift})) \times \sin(\omega_{LO}t) \} = Q(t)\cos(\phi_{shift}) - I(t)\sin(\phi_{shift}). \] (2.24)

The downconverted signal without any phase shifting is

\[ I: \text{LPF}\{ (I(t)\cos(\omega_{RF}t) + Q(t)\sin(\omega_{RF}t)) \times \cos(\omega_{LO}t) \} = I(t). \]

\[ Q: \text{LPF}\{ (I(t)\cos(\omega_{RF}t) + Q(t)\sin(\omega_{RF}t)) \times \sin(\omega_{LO}t) \} = Q(t). \] (2.25)

In order to obtain the same baseband signal as the phase-shifted case, a simple vector rotation can be achieved by multiplying the received signals with a rotation matrix:

\[
\begin{bmatrix}
\cos(\phi_{shift}) & \sin(\phi_{shift}) \\
-\sin(\phi_{shift}) & \cos(\phi_{shift})
\end{bmatrix}
\begin{bmatrix}
I(t) \\
Q(t)
\end{bmatrix}
= \begin{bmatrix}
I(t)\cos(\phi_{shift}) + Q(t)\sin(\phi_{shift}) \\
Q(t)\cos(\phi_{shift}) - I(t)\sin(\phi_{shift})
\end{bmatrix}. \] (2.26)
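The equivalence between an RF-side phase shift (equation 2.24) and a baseband rotation (equation 2.26) can be verified numerically. The sketch below assumes a zero-IF case with constant I and Q values, models the low-pass filter as an average over one carrier period, and scales by 2 to remove the factor of 1/2 that the equations above leave implicit. The function names are illustrative.

```python
import math

def downconvert(i_val, q_val, phi, n=1000):
    """Mix a phase-shifted RF signal with cos/sin LOs and average (ideal LPF)."""
    acc_i = acc_q = 0.0
    for k in range(n):
        wt = 2 * math.pi * k / n  # one full carrier period
        rf = i_val * math.cos(wt + phi) + q_val * math.sin(wt + phi)
        acc_i += rf * math.cos(wt)
        acc_q += rf * math.sin(wt)
    return 2 * acc_i / n, 2 * acc_q / n

def rotate(i_val, q_val, phi):
    """Apply the rotation matrix of Eq. 2.26 to an unshifted (I, Q) pair."""
    c, s = math.cos(phi), math.sin(phi)
    return c * i_val + s * q_val, c * q_val - s * i_val

print(downconvert(0.3, -0.7, 0.5))
print(rotate(0.3, -0.7, 0.5))
```

Both calls return the same pair to within floating-point error, which is the property that lets the phase shift be moved from the RF path to the baseband.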

Unlike RF and LO phase shifting, which rely on varactors, baseband phase shifting can be implemented using variable gain amplifiers. Setting aside the extra parasitics due to the gain control, the worst-case power consumption occurs when the I/Q channel signals have the same gain, i.e. at the 45° phase setting. To achieve the same gain \(G\) as other settings, each channel has a gain of \(G/\sqrt{2}\), resulting in a power consumption of \(\sqrt{2}\) times that of a normal baseband amplifier. The extra power consumption is therefore roughly 40% of a baseband amplifier per I/Q channel and 80% in total. Depending on the loading at baseband, the required power consumption of a differential amplifier is known to be

\[ I_{DC} = \frac{GBW \times V^* C_L}{1 - GBW \gamma/\omega_T}, \] (2.27)

where GBW is the required gain bandwidth of the amplifier and \(\gamma\) is the ratio between drain capacitance and gate capacitance. Therefore, a 10GHz gain bandwidth with a 100GHz \(f_T\) and a \(V^*\) of 200mV results in about 13mW/pF power consumption. This implies that even when driving a 100fF load, the extra power consumption due to the phase shifting is only \(0.8 \times 1.3mW \approx 1mW\), significantly smaller than for the other candidates.
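The 13mW/pF figure can be approximately reproduced from equation 2.27. The text does not state the supply voltage or the value of $\gamma$, so the sketch below assumes a 1V supply and $\gamma = 0.5$, and takes GBW and $\omega_T$ in rad/s; all three are assumptions made only to illustrate the calculation.

```python
import math

def amp_current_ma_per_pf(gbw_hz, ft_hz, v_star, gamma):
    """Eq. 2.27 evaluated for a 1 pF load; returns bias current in mA."""
    gbw = 2 * math.pi * gbw_hz      # assumed: GBW enters in rad/s
    w_t = 2 * math.pi * ft_hz
    c_load = 1e-12                  # 1 pF reference load
    i_dc = gbw * v_star * c_load / (1 - gbw * gamma / w_t)
    return i_dc * 1e3

VDD = 1.0     # assumed supply voltage (V), so mA and mW are numerically equal
GAMMA = 0.5   # assumed drain-to-gate capacitance ratio
p_mw_per_pf = VDD * amp_current_ma_per_pf(10e9, 100e9, 0.2, GAMMA)
print(round(p_mw_per_pf, 1))
```

With these assumptions the result is close to 13mW/pF; other reasonable values of $\gamma$ move it by a few percent without changing the conclusion.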

One benefit of the RF phase shifting scheme is that interference from undesired directions is canceled before the mixer; in contrast, the interference proceeds to the mixer in the LO and BB phase shifting schemes. Because the mixer is normally a highly non-linear block, it can potentially create many undesired mixing products between the interference and the desired input signal. This is a concern in many radar systems, which must withstand intentional jammers. However, in a collaborative scenario where all systems use beam-forming techniques, interference should be minimal.

Therefore, the rest of this thesis focuses on the design of the baseband phase shifting scheme due to its promising power consumption, and a detailed design study is presented in the next chapter.

### 2.3.4 Transmitter Phased-Array Architecture

The comparison in the transmitter is even simpler than in the receiver. This is because the RF path in the transmitter always handles high power levels, and thus there is no difference between LO and RF phase shifting. Any loss on the RF path therefore turns into a large power penalty. Since the baseband signal is in general digital, baseband phase shifting is straightforward to implement. The interference issue mentioned for the receiver does not exist in the transmitter, which makes baseband phase shifting definitively the best candidate.

### 2.4 Phase Resolution

In the discussion above, it was assumed that the phase can be adjusted with infinite resolution. However, all implementations have limited phase shifting resolution. In particular, finite phase-shifter resolution may degrade the phase alignment in the desired direction, causing less than maximum radiation and directly affecting the SNR. It is therefore important to quantify what resolution is needed in the phase shifters.

There are multiple criteria to select the number of bits in phase. One of the most commonly used ones is based on the peak directivity [12]. This is defined as the ratio between radiation intensity in the desired direction and the total radiated power. Fig. 2.5 shows the directivity when pointing at different directions with different resolutions in the phase shifters. With an 8-element phased-array as an example, when the resolution is low, instead of getting the desired 9dB directivity, the directivity is reduced by the misalignment in phase. When the resolution is 5-bit, however, there is a negligible difference between the achievable directivity and the ideal case.

Another common measure of phased-array performance is to measure the ratio between peak directivity and the first side lobe. This provides information about how well the phased-array rejects interfering signals. Similar to before, an 8-element array is used here as a demonstration and 5-bit resolution also shows sufficient resolution for this purpose.

The resolution requirement is actually dependent on the number of elements. A detailed analysis can be found in [30]. Intuitively, this is because the quantization noise is averaged and thus smeared out when the signals of multiple elements are added together. However, based on the simulated system performance, for a reasonably sized array (4-16 elements), 5-bit resolution is usually sufficient, which translates to a $\sim 10^\circ$ phase step.
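The effect of phase quantization can be illustrated with a small calculation. The sketch below is not the directivity simulation behind Fig. 2.5; it uses a simpler proxy, the loss of gain in the intended direction, and assumes a uniform linear array of isotropic elements at half-wavelength spacing with each phase rounded to the nearest available step. Function names and the ±60° sweep are illustrative.

```python
import cmath
import math

def steering_loss_db(n_elem, bits, theta_deg):
    """Main-beam loss (dB) caused by rounding each phase to 2**bits levels."""
    step = 2 * math.pi / (2 ** bits)
    total = 0j
    for n in range(n_elem):
        ideal = n * math.pi * math.sin(math.radians(theta_deg))  # lambda/2 spacing
        quantized = round(ideal / step) * step
        total += cmath.exp(1j * (quantized - ideal))  # residual phase error
    return -20 * math.log10(abs(total) / n_elem)

def worst_loss_db(n_elem, bits):
    """Largest main-beam loss over steering angles from -60 to +60 degrees."""
    return max(steering_loss_db(n_elem, bits, t) for t in range(-60, 61))

for b in (2, 3, 4, 5):
    print(b, round(worst_loss_db(8, b), 2))
```

For an 8-element array the worst-case loss shrinks rapidly with the number of bits and is well below 0.1dB at 5 bits under these assumptions, in agreement with the observation that 5-bit resolution is close to ideal.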
Figure 2.5: Directivity at different directions with different phase shifter resolution.

Figure 2.6: Peak to side lobe ratio at different directions with different phase shifter resolution.

Figure 2.7: Bit error rate (BER) versus peak SNR.

### 2.5 Modulation Scheme

Another important design parameter at the system level is the modulation scheme. Conventional approaches use high-order modulation schemes to achieve high data rates in a limited bandwidth. In the case of 60GHz, we propose to utilize the entire 7GHz bandwidth to achieve 10Gb/s with QPSK modulation. It is desirable to compare different modulation schemes at the same power consumption. The relation between BER and SNR for different modulation schemes has been well studied [31]. Fig. 2.7 modifies the conventional plot by using the peak SNR rather than the average SNR of the symbols, because the power consumption of the transmitter is often proportional to the peak amplitude rather than the average. As shown in Fig. 2.7, at the same BER of $10^{-10}$, QPSK requires about 8dB lower SNR than 16-QAM. Notice that when targeting the same data rate, the noise power of QPSK is 3dB higher than that of 16-QAM and 5dB higher than that of 64-QAM due to its wider bandwidth. QPSK modulation therefore has a net 5dB SNR advantage over 16-QAM, which makes it appealing for our system.
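The bandwidth penalties quoted above follow from the number of bits carried per symbol. The sketch below assumes that, at a fixed bit rate, the symbol rate and therefore the noise bandwidth scale inversely with the bits per symbol; the 8dB peak-SNR gap is taken from the text rather than computed.

```python
import math

def bandwidth_penalty_db(bits_per_symbol_a, bits_per_symbol_b):
    """Extra noise power of scheme A over scheme B at equal bit rate."""
    return 10 * math.log10(bits_per_symbol_b / bits_per_symbol_a)

peak_snr_gap_db = 8.0  # QPSK vs 16-QAM at BER 1e-10, as quoted from Fig. 2.7
qpsk_vs_16qam = bandwidth_penalty_db(2, 4)   # QPSK: 2 bits, 16-QAM: 4 bits
qpsk_vs_64qam = bandwidth_penalty_db(2, 6)   # 64-QAM: 6 bits
net_db = peak_snr_gap_db - qpsk_vs_16qam
print(round(qpsk_vs_16qam, 1), round(qpsk_vs_64qam, 1), round(net_db, 1))  # 3.0 4.8 5.0
```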

### 2.6 Summary

This chapter discussed the system-level design considerations for phased-array architectures. Besides phase resolution and modulation scheme, our analysis concluded that the key to achieving a high-efficiency phased-array is to reduce the overhead power consumption in the transceivers. Compared to a single-element transceiver, the phase shifter is one common overhead building block in both the transmitter and the receiver of a phased-array system. Since our proposed baseband phase shifter is compelling compared to the other candidates, the next chapter discusses in detail the challenges as well as the design procedures to implement such a scheme. Techniques for the rest of the transmitter and the receiver are then discussed in the following chapters.
Chapter 3

Implementation of Baseband Phase Shifting

As discussed in Chapter 2, the baseband phase shifting scheme is promising for energy-efficient phased-array systems. This chapter therefore discusses the implementation details at the circuit level. Much like normal amplifiers, a baseband phase shifter should meet certain gain, bandwidth, and noise specifications. The key difference is that the phase shifter needs to provide a certain resolution to adjust the phase of the input signal. Throughout this chapter, this requirement will be elaborated and then mapped to various implementation schemes. Understanding the trade-off helps to select the appropriate scheme in different scenarios. Methods to reduce the parasitics are also proposed and implemented for validation.

### 3.1 Implementation of Phase Shifter Resolution

Based on equation (2.26), it is straightforward to implement the phase shifter using four variable gain amplifiers (VGAs) and then summing two of the outputs respectively (Fig. 3.1). The gain settings of the variable gain amplifiers are determined by the phase shifting settings, denoted in Fig. 3.1. The resolution of the phase shifting therefore dictates the resolution of the gain control of each VGA. The relation between them is analyzed below. Assuming that only the I-to-I VGA changes its gain with one single step, the phase change is

\[
\begin{aligned}
\Delta \phi &= \arctan \left( \frac{A_I + \Delta A_I}{A_Q} \right) - \arctan \left( \frac{A_I}{A_Q} \right) \\
&\approx \frac{\Delta A_I}{A_Q} \times \arctan' \left( \frac{A_I}{A_Q} \right) \\
&= \frac{\Delta A_I / A_Q}{1 + \frac{A_I^2}{A_Q^2}} \\
&= \frac{\Delta A_I \times A_Q}{A_I^2 + A_Q^2},
\end{aligned} \tag{3.1}
\]
where $A_I$ and $A_Q$ are the gain setting of the two amplifiers respectively and $\Delta A_I$ is the single step gain variation.

It is common to keep the amplitude $A_I^2 + A_Q^2$ relatively constant ($= A^2$) so that the gain variation is small. Therefore, given the form of equation 3.1, the phase change is proportional to the gain of the Q-to-I VGA and is maximized when that gain reaches its peak. This indicates that the largest step in the phase resolution occurs at quadrant switching, i.e.,

$$\Delta \phi_{\text{max}} = \frac{\Delta A_I A_{Q,\text{max}}}{A^2} = \frac{\Delta A_I}{A}. \tag{3.2}$$

Assuming a constant step size in VGA gain control, in order to achieve a resolution of $\delta \phi$ with a gain coverage from $-A$ to $A$, the number of steps in the gain control needs to be

$$N_{\text{step}} = \frac{2A_{I,\text{max}}}{\Delta A_I} = \frac{2}{\delta \phi(rad)}. \tag{3.3}$$

In the case of $\delta \phi = 10^\circ = 0.17 rad$, this translates into $N_{\text{step}} = 12$, or equivalently less than 4-bit resolution.
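Equation 3.3 and the worst-case phase step of equation 3.2 can be checked with a few lines. The sketch below normalizes the peak gain $A$ to 1 and evaluates the exact arctangent step at quadrant switching rather than its small-angle approximation; the function names are illustrative.

```python
import math

def gain_steps(delta_phi_deg):
    """Number of uniform VGA gain steps from -A to +A, per Eq. 3.3."""
    return math.ceil(2 / math.radians(delta_phi_deg))

def worst_phase_step_deg(n_steps):
    """Largest phase jump: one gain step on one VGA while the other is at peak."""
    delta_a = 2.0 / n_steps          # step size for a gain range of -1..+1
    return math.degrees(math.atan(delta_a / 1.0))

n = gain_steps(10)
print(n, math.ceil(math.log2(n)), round(worst_phase_step_deg(n), 2))
```

For a 10° target this gives 12 steps, which fit in 4 bits, and a worst-case phase step slightly below 10°.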

### 3.2 Effect of I/Q Mismatch

The calculation above assumes that the I/Q signals are correctly down-converted to baseband. However, impairments in the RF system can easily violate these assumptions. Fortunately, these errors can also be corrected using the proposed phase shifter by adjusting the gain in the VGAs accordingly. From a design perspective, this will increase the necessary resolution in gain control of the VGA. Quadrature phase error is examined here first.

Assuming there is a phase error in the LO signal such that without any phase shifting, the output of the down-converting mixer produces

\[
\begin{aligned}
\text{LPF}\{ (I(t)\cos(\omega t) + Q(t)\sin(\omega t)) \times \cos(\omega t) \} &= I(t) \\
\text{LPF}\{ (I(t)\cos(\omega t) + Q(t)\sin(\omega t)) \times \sin(\omega t + \phi_{err}) \} &= Q(t)\cos\phi_{err} + I(t)\sin\phi_{err}.
\end{aligned} \tag{3.4}
\]

Intuitively, this means that the I-channel remains correct but that there is a portion of the I-signal leaking into the Q-channel. In a matrix format, this can be written as

\[
\begin{bmatrix}
I'(t) \\
Q'(t)
\end{bmatrix} =
\begin{bmatrix}
1 & 0 \\
\sin\phi_{err} & \cos\phi_{err}
\end{bmatrix}
\begin{bmatrix}
I(t) \\
Q(t)
\end{bmatrix}. \tag{3.5}
\]

Therefore the original signal \(I(t), Q(t)\) can be obtained by inverting the error matrix:

\[
\begin{bmatrix}
I(t) \\
Q(t)
\end{bmatrix} =
\begin{bmatrix}
1 & 0 \\
-\frac{\sin\phi_{err}}{\cos\phi_{err}} & \frac{1}{\cos\phi_{err}}
\end{bmatrix}
\begin{bmatrix}
I'(t) \\
Q'(t)
\end{bmatrix}. \tag{3.6}
\]

This inverted matrix indicates that the gain range of the VGA needs to be extended by \(1/\cos\phi_{err}\) to handle the inversion and the original phase shifting functionality. In other words, with the same phase resolution requirement, the number of steps needs to be extended.

Gain error correction can also be supported simply by adjusting the gain of the VGAs. Assuming the gain difference is \(A_{err}(< 1)\), the complete transfer function of the phase shifter is the product of three matrices:

\[
\begin{bmatrix}
I(t) \\
Q(t)
\end{bmatrix} =
\begin{bmatrix}
1 & 0 \\
0 & \frac{1}{A_{err}}
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
-\frac{\sin\phi_{err}}{\cos\phi_{err}} & \frac{1}{\cos\phi_{err}}
\end{bmatrix}
\begin{bmatrix}
\cos(\phi_{shift}) & \sin(\phi_{shift}) \\
-\sin(\phi_{shift}) & \cos(\phi_{shift})
\end{bmatrix}
\begin{bmatrix}
I'(t) \\
Q'(t)
\end{bmatrix}. \tag{3.7}
\]

The number of steps required in each VGA is thus determined by

\[
N_{step} \geq \frac{2}{A_{err} \times \cos\phi_{err} \times \delta\phi}. \tag{3.8}
\]

As a sample design, a phase shifter targeting 3dB gain mismatch, 20° phase error, and 10° phase resolution requires a total of 18 steps. This is just slightly larger than the case where no gain/phase error is present, and a 4-bit gain control is sufficient.
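The sample numbers above can be verified as follows. The sketch checks that the correction matrix used in equation 3.7 inverts the error matrix of equation 3.5 and evaluates equation 3.8. It assumes that the 3dB gain mismatch is a voltage ratio ($A_{err} = 10^{-3/20}$), which is the interpretation that reproduces the 18 steps quoted; the function names are illustrative.

```python
import math

def error_matrix(phi_err):
    """Quadrature phase error model of Eq. 3.5."""
    return [[1.0, 0.0], [math.sin(phi_err), math.cos(phi_err)]]

def correction_matrix(phi_err):
    """Inverse of the error matrix, as used in Eq. 3.7."""
    return [[1.0, 0.0], [-math.tan(phi_err), 1.0 / math.cos(phi_err)]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def gain_steps_with_mismatch(gain_mismatch_db, phi_err_deg, delta_phi_deg):
    """Lower bound on VGA steps from Eq. 3.8, rounded up."""
    a_err = 10 ** (-abs(gain_mismatch_db) / 20)  # assumed voltage ratio
    bound = 2 / (a_err * math.cos(math.radians(phi_err_deg))
                 * math.radians(delta_phi_deg))
    return math.ceil(bound)

phi = math.radians(20)
print(matmul2(correction_matrix(phi), error_matrix(phi)))  # close to identity
print(gain_steps_with_mismatch(3, 20, 10))                 # prints 18
```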

### 3.3 Implementations of VGAs

### 3.3.1 Variable Current Source

Variable gain amplifiers can be implemented in a variety of ways [32, 33, 34]. One popular topology implements variable gain by adjusting the tail current of a differential amplifier, as shown in Fig. 3.2. The basic concept here is to change the transconductance of the differential amplifier in order to modify the gain. In a square-law device, the transconductance can be related to the bias current as

\[ g_m = \sqrt{I_{bias} \mu C_{ox} W/L}. \]  

Although the transconductance does indeed vary with the current, the transfer curve from current to transconductance is not linear. Fig. 3.2 plots the simulated transconductance of a 1\(\mu\)m differential pair versus tail current in 65nm CMOS technology, which is close to that predicted for a square-law device. Importantly, this transfer curve is technology dependent, and therefore the gain control and phase shifting need to be calibrated before use. When a uniform step size is used in the bias current control, this non-linearity increases the transconductance step size when the current is low. As mentioned earlier, the worst phase step is at quadrant switching; this additional error further increases the step size. Using the sample device shown in Fig. 3.2, the resulting constellation is shown in Fig. 3.4(a). Clearly, the phase jump is much bigger close to quadrant switching than at 45\(^\circ\). As a result, if uniform resolution in the current bias control is used, extra resolution has to be added to achieve the same phase resolution. Pre-distortion techniques can potentially be used to correct these effects, but this correction is heavily dependent on the linearity of the transistors. Furthermore, the overdrive voltage of the input devices varies with the bias current. This results in a code-dependent signal non-linearity, which can potentially degrade the performance significantly.

### 3.3.2 Segmentation

In order to address these non-linearity issues, the VGA can be implemented by segmenting the amplifier into smaller pieces. The gain of the amplifier is controlled by turning the segments on and off independently (Fig. 3.3), as first proposed in [34]. Because each \(g_m\)-cell is turned on or off separately, the transconductance is linear with code. This eliminates the calibration for gain control. The resulting constellation also has much better resolution at quadrant switching, as shown in Fig. 3.4(b).
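The difference between the two approaches can be made concrete with an idealized comparison. The sketch below assumes a perfect square-law device for the current-steered VGA and perfectly matched unit cells for the segmented VGA, both with 16 uniform codes; real devices deviate from both assumptions, and the function names are illustrative.

```python
import math

def current_steered_gm(code, n_codes=16):
    """Normalized g_m of a square-law pair whose tail current is code/n_codes."""
    return math.sqrt(code / n_codes)

def segmented_gm(code, n_codes=16):
    """Normalized g_m when `code` identical unit cells are switched on."""
    return code / n_codes

def step_sizes(gm_fn, n_codes=16):
    """g_m increments between consecutive codes, from code 0 up to n_codes."""
    return [gm_fn(k + 1, n_codes) - gm_fn(k, n_codes) for k in range(n_codes)]

cs, seg = step_sizes(current_steered_gm), step_sizes(segmented_gm)
print(round(cs[0], 3), round(cs[-1], 3), round(seg[0], 4))
```

In this idealized model the first step of the current-steered VGA is a quarter of full scale, several times larger than its last step, whereas every segmented step equals 1/16 of full scale. The large steps at low gain are the ones that coincide with quadrant switching.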

### 3.3.3 Comparison

Compared to the variable current source approach, the segmentation approach clearly has the advantage of higher resolution. Because the overdrive voltage of each transconductance cell is kept constant, the linearity of the signal in a segmented approach remains the same regardless of the gain setting. Therefore, the second approach is more suitable for the receiver phase shifter, which must handle both the signal and inter-symbol interference and thus demands higher dynamic range. On the transmitter side, however, the first approach is sufficient with pre-distortion [18].

The drawback of the segmented approach, however, is that segmenting the amplifier into multiple pieces often introduces significant parasitic capacitance due to the physical separation between them. In today’s technologies, the parasitics can easily dominate over the intrinsic capacitance from the devices. Furthermore, with a large number of segments, each device can easily hit the minimum device size. These effects will substantially increase the power consumption required to achieve the same gain bandwidth. The next section therefore dives into the details of the implementation, attempting to reduce these parasitics.

### 3.4 Power Reduction in Segmented Approach

A straightforward implementation of a phase shifter requires one variable transconductance for the I-channel signal and another one for the Q-channel signal. The highest power consumption occurs when both the I and Q signals are amplified by $g_m R_L / \sqrt{2}$ and the effective gain is only $g_m R_L$. The power consumption is therefore $\sqrt{2}$ larger than a normal amplifier with the same $g_m$. Furthermore, the extra $g_m$ presents capacitive loading on the output, resulting in double the self-loading. On a high-speed signal path, this will significantly reduce signal bandwidth, or increase the power consumption with the same gain bandwidth.

Notice that, as shown in Fig. 3.5, the $g_m$ cells are never all used at the same time in the phase shifter. In this particular example, 16 unit $g_m$ cells are implemented on each of the I and Q channels. The black dots on this constellation are the operating points selected for the phase shifter to achieve <1dB gain variation. The grey dots in the top-right corner represent the achievable constellation points that are not used due to their excess gain. In particular, the top-right corner point, which represents all 32 cells being on, is never used. The maximum number of $g_m$ cells used at the same time in this scheme is only 24. This indicates that the total number of cells can be reduced by 25%, which would lower the power consumption.

A partial sharing scheme is therefore proposed to partition and utilize the $g_m$ cells more effectively, as illustrated in Fig. 3.6 [9, 33]. The conventional approach (a) uses 16 cells on each side, while the proposed implementation splits the $g_m$ cells into three groups. In the first group, the input signal can be selected between I and common mode, which forms a variable gain amplifier by itself. Similarly, the third group selects between Q and common mode. The second group, however, implements a selection between the I signal and the Q signal, effectively reusing the same $g_m$ cells when the other side is not at its peak gain. Each group consists of 8 unit $g_m$ cells, and the achievable constellations of the two structures are shown in Fig. 3.5(b) for comparison. The proposed architecture eliminates the top-right corner points, which were not used in the conventional approach, while keeping all of the desired constellation points intact.
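The claim that sharing removes only unused constellation points can be checked by enumeration. The sketch below models the three groups described above under two assumptions that are not spelled out in the text: every cell of the shared group is always on and steered to either I or Q, and the operating points are the first-quadrant codes whose amplitude lies within 1dB below the full-scale value of 16. All names are illustrative.

```python
import math

UNIT = 8  # cells per group in the shared scheme (three groups, 24 cells)

def conventional_points(cells_per_channel=16):
    """All (I, Q) gain codes with independent I and Q banks."""
    rng = range(cells_per_channel + 1)
    return {(i, q) for i in rng for q in rng}

def shared_points(unit=UNIT):
    """(I, Q) codes when the middle group is steered to either I or Q."""
    pts = set()
    for a in range(unit + 1):          # group 1 cells switched to I
        for c in range(unit + 1):      # group 3 cells switched to Q
            for b in range(unit + 1):  # group 2 cells steered to I, rest to Q
                pts.add((a + b, c + unit - b))
    return pts

def ring_points(radius=16, ripple_db=1.0):
    """Conventional codes whose amplitude lies within ripple_db below radius."""
    lo = radius * 10 ** (-ripple_db / 20)
    return {p for p in conventional_points(radius) if lo <= math.hypot(*p) <= radius}

shared = shared_points()
print(ring_points() <= shared, (16, 16) in shared, max(i + q for i, q in shared))
```

Under these assumptions every operating point on the constant-gain ring remains reachable, the all-on corner point is gone, and at most 24 cell-equivalents of transconductance are ever active.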

The previous discussions all focused on operation within the first quadrant. However, as seen in equation 2.26, the VGA needs to provide negative gain as well. Given that differential signals are usually available at baseband, this can be easily achieved by adding multiplexers to select the polarity. One common approach is to use a butterfly switch at the output of the $g_m$ cell to flip the sign of the output signal (Fig. 3.7). Often this can be incorporated into the cascode transistors. This would double the self-loading due to the extra transistors, and therefore reduce the output bandwidth. More importantly, the butterfly switches flip the offset from the input devices, creating a quadrant-dependent offset at the output. Alternatively, one can extend the 2-to-1 MUX at the input to a 4-to-1 MUX to achieve the same functionality (Fig. 3.7) without modulating the input devices’ offset.

### 3.5 Phase Shifter Demonstration and Measurement Results

The proposed phase shifter has been implemented in several designs with various resolutions. In a complete transceiver shown in later chapters, a 4-bit/quadrant phase shifter was implemented using TSMC 65nm CMOS technology. The phase shifter drives roughly 50fF of capacitive loading, and therefore, according to equation 2.27, it should consume approximately $\sqrt{2} \times 13mW/pF \times 0.05pF \approx 0.9mW$. With a gain of 3dB and a bandwidth of 5GHz, this design employs the input MUX approach and consumes 1mW of DC power, close to our expectation.

The design was characterized using a single-tone test with a frequency offset between the RF and LO signals. The amplitude of the output I/Q signals was then recorded as the constellation. The constellation measurement shown in Fig. 3.8 demonstrates the control of the gain/phase settings of the phase shifter, including all the impairments from the RF signal path. It is shown here that the entire 360° range is covered with better than 5° resolution.

Figure 3.7: Quadrature implementation using (a) butterfly switches (b) 4-to-1 MUX.

### 3.6 Summary

This chapter discusses the implementation of the baseband phase shifter using variable gain amplifiers. The required phase resolution is mapped to the gain control resolution, taking into account the I/Q gain and phase mismatches due to RF impairments. Two implementation approaches of VGAs are presented and the segmented approach is chosen because of its better linearity in signal path as well as easier gain control.

To solve the issue of extra parasitic capacitance in the segmented approach, a partial sharing scheme is proposed to reduce the required segmentation to the minimum amount. The phase shifter can therefore be implemented with a power consumption that is $\sqrt{2}$ times that of a normal baseband amplifier with the same gain and bandwidth. An implementation in CMOS demonstrates this approach, and the measurement results show better than $5^\circ$ resolution.

Figure 3.8: Measurement result of a phase shifter with 5-bit per quadrant resolution.
Chapter 4

Energy-Efficient Phased-Array Transmitter

Modern wireless transmitters have been extensively studied for numerous applications. In many applications the power amplifier consumes the majority of the DC power and hence the efficiency of the transmitter is limited by the PA efficiency. Significant effort has therefore been spent on optimizing the PA itself. However, as discussed in Chapter 2, the minimum DC power consumption of a phased-array is

\[ P_{DC,TX,\text{min}} = 2 \sqrt{\frac{EIRP \times P_{TX,OH}}{\eta_{PA} \times G_{TX}}}, \] (4.1)

indicating that overhead power is just as critical in a phased-array design. This chapter therefore studies the transmitter architecture, targeting a lower overhead power.

4.1 Low Overhead Transmitter Architecture

Fig. 4.1(a) shows a state-of-the-art 60GHz phased-array transmitter design [18] utilizing a conventional architecture, including a complex modulator followed by a power amplifier. In each element, the mixer and the LO buffers consume 18mW, while the power amplifier consumes about 4mW and delivers roughly 0.7mW to the antenna. The entire 4-element phased array therefore delivers +10dBm EIRP with a power consumption of roughly 88mW. As discussed in Chapter 2, such a system actually prefers fewer elements because of its high overhead power. For example, if a 2-element array is implemented where each element has 18mW overhead power and a 16mW power amplifier, the total EIRP remains the same as in the 4-element system, but the DC power decreases to 68mW. Thus, reducing the modulator and LO power consumption is critical to exploiting the benefit of a phased array for better performance.
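The element-count trade-off in this example can be reproduced with a few lines of arithmetic. The sketch below assumes ideal coherent combining with unity antenna-element gain (EIRP equals N squared times the per-element output power) and a PA efficiency that stays at 0.7mW/4mW when the PA is scaled up. The function name `array_budget` is illustrative and does not come from the original design.

```python
import math


def array_budget(n_elements, p_overhead_mw, p_pa_dc_mw, pa_efficiency):
    """Return (total DC power in mW, EIRP in dBm) for an N-element array.

    Assumes unity antenna-element gain and ideal coherent combining,
    so EIRP = N^2 * (output power per element).
    """
    p_out_mw = pa_efficiency * p_pa_dc_mw
    p_dc_mw = n_elements * (p_overhead_mw + p_pa_dc_mw)
    eirp_mw = n_elements ** 2 * p_out_mw
    return p_dc_mw, 10.0 * math.log10(eirp_mw)


# Assumed constant PA efficiency: 0.7 mW delivered from a 4 mW PA.
PA_EFFICIENCY = 0.7 / 4.0

four = array_budget(4, 18.0, 4.0, PA_EFFICIENCY)   # conventional 4-element case
two = array_budget(2, 18.0, 16.0, PA_EFFICIENCY)   # 2-element alternative

print(f"4 elements: {four[0]:.0f} mW DC, {four[1]:.1f} dBm EIRP")
print(f"2 elements: {two[0]:.0f} mW DC, {two[1]:.1f} dBm EIRP")
```

Under these assumptions both configurations reach the same EIRP, while the 2-element array spends less on overhead, which is the point made in the text.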

One approach to reduce the overhead, as sketched in Fig. 4.1(b), modulates the oscillator phase directly using baseband data and then delivers power into the antenna. Since the oscillator power is directly proportional to the output power, the system efficiency is almost
identical to the oscillator’s efficiency, eliminating almost all of the overhead power. Although conceptually simple, this architecture has only been implemented for simple modulation schemes like OOK, by turning on and off the oscillator [35, 36], resulting in a relatively low data rate. Furthermore, simply turning on/off the oscillator has no control over the phase, making this architecture hard to integrate into a phased-array system.

4.2 Proposed Transmitter Architecture

In order to support QPSK modulation as well as phased-array functionality, the phase of the oscillator needs to be controlled. Notice that the phase of an oscillator can be determined by its startup conditions, and hence modulating the startup signal can potentially achieve the desired phase modulation of the oscillator. This concept is further illustrated in Fig. 4.2, where the oscillator is turned on and off during each baseband symbol. In the first cycle, the start-up signal starts at time 0 and therefore the phase of the first symbol is 0°. After turning off the oscillator at the end of the first symbol, the oscillator is turned back on again for the second symbol with a slight delay in the startup signal. Assuming the delay of the startup signal is controlled to be a quarter of the RF waveform period ($T_p/4$), the second symbol has a phase of 90° compared to the first symbol. Similarly, 180° and 270° phase can
be achieved by increasing the delay. Alternatively, if the oscillator is differential, 180° and 270° phase can also be achieved by simply inverting the 0° and 90° symbols.

Because of this capability of modifying the phase of an oscillation by adjusting the startup signal, phased-array functionality is also naturally available. Instead of modifying the phase between different symbols, modifying the relative phase of the startup signals for each element provides the desired phase shifting to form a coherent signal in space. As shown in Fig. 4.3, the time delay between elements is determined by the spacing of the array and the desired angle. As before, the range of delay can be reduced to 180° if a differential oscillator is used.
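As a rough illustration of the two uses of the startup delay described above, the sketch below converts a desired symbol phase and a desired steering angle into startup delays. The carrier frequency, the half-wavelength element pitch, and the function names are assumptions made for illustration only. The element delay is wrapped into one RF period, which treats the delay purely as a carrier phase shift.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in free space, m/s


def symbol_delay(phase_deg, f_carrier_hz):
    """Startup delay (s) that gives a symbol the requested carrier phase."""
    return (phase_deg % 360.0) / 360.0 / f_carrier_hz


def element_delay(index, spacing_m, steer_deg, f_carrier_hz):
    """Startup delay (s) for array element `index`, wrapped into one RF period."""
    period = 1.0 / f_carrier_hz
    tau = index * spacing_m * math.sin(math.radians(steer_deg)) / C_LIGHT
    return tau % period


F_CARRIER = 60e9                      # assumed carrier frequency
SPACING = C_LIGHT / F_CARRIER / 2.0   # assumed half-wavelength element pitch

for phase in (0, 90, 180, 270):
    print(f"{phase:3d} deg symbol: {symbol_delay(phase, F_CARRIER) * 1e12:.2f} ps")

for n in range(4):
    d = element_delay(n, SPACING, 30.0, F_CARRIER)
    print(f"element {n}, 30 deg steering: {d * 1e12:.2f} ps")
```

With a differential oscillator, delays above half a period could instead be realized by inverting the polarity and subtracting half a period, as noted in the text.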
Although the resulting modulated waveforms are slightly different than those produced by conventional transmitters, the spectrum is very similar. In fact, when the carrier frequency is an integer multiple of the baseband frequency, the proposed architecture is identical to a normal QPSK modulated signal with return-to-zero (RZ) pulse shaping. When the ratio between them is not an integer, the harmonics of the baseband signal are higher compared to a conventional approach. However, because of the RZ shape of the waveform, baseband harmonics are filtered, resulting in a similar spectrum. Fig. 4.4(a) shows a sample spectrum of the proposed transmitter output without any external filtering using a 62GHz carrier at a symbol rate of 5GS/s. Compared to the output of a conventional modulator shown in Fig. 4.4(b), the spectrum is not distinguishable close to the carrier and the difference in harmonics is only 2-3dB.

![Transmitter spectrum](image)

Figure 4.4: Transmitter spectrum of (a) proposed architecture and (b) conventional architecture.

### 4.3 Circuit Implementation

Fig. 4.5 shows a block diagram of a prototype implementation of the proposed transmitter architecture [37]. The core of this transmitter consists of an oscillator that is enabled by a modulated baseband clock. The modulation is performed by selecting between two different input clocks. One of the clocks is delayed by the amount required for the phased-array, and the other one is further delayed by a quarter RF period to achieve $90^\circ$ modulation. Although not shown in this block diagram, the oscillator is differential and hence the $180^\circ$ modulation is performed by selecting the polarity of the oscillator. Instead of driving the antenna directly using the oscillator, a power amplifier is added, mainly to isolate the oscillator from any impedance variation of the antenna and therefore reduce the uncertainty of the LO frequency.

![Transmitter block diagram](image)

Figure 4.5: Transmitter block diagram.

4.3.1 Oscillator Implementation

Although Fig. 4.2 shows the oscillation starting immediately after the startup signal, in reality it takes time to start and stop the oscillation. The symbol period, and hence the data rate, is therefore bounded by

$$T_{\text{symbol}} \geq T_{\text{start}} + T_{\text{stop}},$$ (4.2)

and the waveform has an envelope of triangular shape at its highest data rate. Insufficient startup time will reduce the amplitude of the output signal, thus decreasing the system efficiency. On the other hand, insufficient time for shutting off the oscillator will leave a residual signal in the next symbol, creating IIR inter-symbol interference (ISI). Therefore, to achieve a high data rate, both a fast startup and a fast shutdown are desired.

4.3.1.1 Oscillator Startup Time

The startup process of an oscillator has been studied previously for many topologies [38, 39]. In this section, an oscillator utilizing cross-coupled transistors (Fig. 4.6) is discussed because it naturally provides support for the $180^\circ$ shift. The small-signal model of the oscillator includes an inductor, a capacitor, a resistance $R_L$ that accounts for both the load and the loss of the LC network, and the negative impedance $-1/g_m$ from the cross-coupled transistors. Although the $g_m$ of the transistor is amplitude dependent, a constant $g_m$ is assumed in this calculation for simplicity. This simplification is reasonable since the oscillation signal is small at the beginning of the oscillation.

![Small Signal Model](image)

**Figure 4.6**: Cross coupled oscillator and its small signal model.

Combining the load resistor $R_L$ and negative impedance $-1/g_m$, the oscillator dynamics are set by:

$$V - \frac{L}{R} \frac{dV}{dt} + LC \frac{d^2V}{dt^2} = 0,$$ (4.3)

where $R = \frac{R_L}{g_m R_L - 1}$.

Assuming the voltage across the tank has the form $V_0 e^{kt}$, the above equation simplifies to

$$1 - \frac{L}{R} k + LC k^2 = 0.$$ (4.4)

Replacing $L$, $C$, and $R$ by $\omega_0 = \frac{1}{\sqrt{LC}}$ as well as $Q_0 = \omega_0 RC = \frac{Q_{tank}}{g_m R_L - 1}$ to further simplify the derivation, the above equation can be converted to the familiar second order system, but with the sign of the second term flipped:

$$\frac{k^2}{\omega_0^2} - \frac{k}{\omega_0 Q_0} + 1 = 0.$$ (4.5)

The roots of the above equation are therefore

\[ k_{1,2} = \omega_0 \left( \frac{1}{2Q_0} \pm \sqrt{\frac{1}{4Q_0^2} - 1} \right). \] (4.6)

Depending on the sign of \( \frac{1}{4Q_0^2} - 1 \), this network has different behavior. In the case where \( Q_0 > 1/2 \), \( k_{1,2} \) are complex conjugates and therefore the voltage waveform is determined by

\[ V(t) = \frac{I_0}{\omega_0 C \sqrt{1 - \frac{1}{4Q_0^2}}} \, e^{\frac{\omega_0 t}{2Q_0}} \sin\left( \sqrt{1 - \frac{1}{4Q_0^2}} \, \omega_0 t \right). \] (4.7)

Here we assume the initial current is \( I_0 \) and 0 initial voltage is applied to the tank.

The waveform, as demonstrated in Fig. 4.7(a), is essentially a sinusoid waveform with an exponentially increasing amplitude. When the amplitude of the oscillation increases, the effective \( g_m \) of the transistor starts to drop, resulting in a decreasing \( g_m R_L \). When \( g_m R_L \) decreases to 1, \( Q_0 \) becomes infinity and therefore the amplitude stops increasing and the frequency stabilizes to \( \omega_0 \).

On the other hand, if the initial \( Q_0 < 1/2 \), \( k_{1,2} \) are both real and thus the voltage needs to be rewritten as

\[ V(t) = \frac{I_0}{2\omega_0 C \sqrt{\frac{1}{4Q_0^2} - 1}} \left[ e^{\omega_0 t \left( \frac{1}{2Q_0} + \sqrt{\frac{1}{4Q_0^2} - 1} \right)} - e^{\omega_0 t \left( \frac{1}{2Q_0} - \sqrt{\frac{1}{4Q_0^2} - 1} \right)} \right]. \] (4.8)

Ignoring the second term, which grows more slowly and quickly becomes negligible compared with the first, the resulting waveform is initially an exponentially increasing signal without any sinusoidal component. As in the previous case, \( Q_0 \) increases with amplitude, and the tank starts to output a sinusoidal waveform once \( Q_0 \) reaches 1/2. Simulated waveforms are shown in Fig. 4.7(b), verifying our derivations.
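The two startup regimes can be checked numerically by solving the characteristic equation 4.5 for a given \( Q_0 \). The following is a minimal sketch with the frequency normalized to \( \omega_0 = 1 \). The function name `startup_roots` and the two sample values of \( Q_0 \) are chosen for illustration.

```python
import cmath


def startup_roots(q0, omega0=1.0):
    """Roots k1, k2 of k^2/w0^2 - k/(w0*Q0) + 1 = 0 (equation 4.5)."""
    disc = cmath.sqrt(1.0 / (4.0 * q0 ** 2) - 1.0)
    half = 1.0 / (2.0 * q0)
    return omega0 * (half + disc), omega0 * (half - disc)


for q0 in (1.5, 0.375):
    k1, k2 = startup_roots(q0)
    if abs(k1.imag) > 1e-12:
        kind = "complex pair: growing sinusoid"
    else:
        kind = "two real roots: growth without oscillation"
    print(f"Q0 = {q0}: k1/w0 = {k1:.3f}, k2/w0 = {k2:.3f} ({kind})")
```

In both regimes the real parts of the roots are positive, so the envelope grows until the amplitude-dependent \( g_m \) limits it. For \( Q_0 < 1/2 \) the larger real root dominates the response after a short time.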

As seen here, in order to speed up the oscillation process, a low \( Q_0 \) is desired. However, it cannot be lowered arbitrarily. In the case of the oscillator driving a power amplifier, the tank \( Q \) is almost identical to the quality factor of the transistor gate, which is close to 3 at 60GHz. The overall \( Q_0 \) is therefore related to the gate \( Q_g \) as

\[ Q_0 = \frac{Q_g}{g_m (R_L \parallel R_{osc}) - 1} = \frac{Q_g}{\frac{g_m R_{osc}}{1 + f} - 1} = \frac{Q_g}{\frac{\omega_T}{\omega_0} \frac{Q_g}{1 + f} - 1}, \] (4.9)

where \( f \) is defined as the device size ratio between the power amplifier and the oscillator.

Obviously, the minimum \( Q_0 \) is achieved when no loading is presented by the following stages. In a 65nm CMOS technology at 60GHz, assuming \( \omega_T = 3\omega_0 \) and \( Q_g = 3 \), the resulting $Q_{0,min} = 0.375 < 1/2$. However, with a load twice as big as the oscillator itself (i.e. $f=2$), $Q_0$ increases to 1.5. Therefore we will focus on the case where $Q_0 > 1/2$. In this case, the time it takes to reach the peak amplitude can be derived from equation 4.7, assuming the maximum amplitude is $V_{max}$:

$$T_{start} = \frac{2Q_0}{\omega_0} \ln \frac{V_{max} \omega_0 C \sqrt{1 - 1/4Q_0^2}}{I_0}. \quad (4.10)$$

Therefore, the initial current $I_0$ can be increased in order to reduce the startup time, as previously demonstrated in [39, 40]. However, increasing $I_0$ also increases the size of the startup transistors that inject the initial current into the tank (Fig. 4.8). These transistors add parasitic capacitance to the tank and therefore increase $Q_0$. At frequencies close to the device $f_T$, this significantly increases the startup time. At the same time, in order not to load the tank during oscillation, the startup transistor needs to be shut off, which requires a narrow pulse on its gate.
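The dependence of $Q_0$ on the PA-to-oscillator size ratio $f$, and its effect on the startup time of equation 4.10, can be tabulated with the sketch below. It uses the values quoted in the text ($\omega_T = 3\omega_0$, $Q_g = 3$, 60GHz). The argument of the logarithm in equation 4.10 is passed in as a single number, and the value 100 used here is purely a placeholder, since $V_{max}$, $C$, and $I_0$ are not given. The function names are illustrative.

```python
import math


def overall_q0(q_gate, ft_over_f0, size_ratio):
    """Q0 from equation 4.9 for a PA that is `size_ratio` times the oscillator."""
    loop = ft_over_f0 * q_gate / (1.0 + size_ratio)  # g_m * (R_L || R_osc)
    if loop <= 1.0:
        raise ValueError("g_m * R must exceed 1 for the oscillation to grow")
    return q_gate / (loop - 1.0)


def startup_time(q0, f0_hz, amplitude_ratio):
    """Equation 4.10, with the argument of the logarithm supplied directly."""
    return 2.0 * q0 / (2.0 * math.pi * f0_hz) * math.log(amplitude_ratio)


F0 = 60e9              # operating frequency from the text
LOG_ARGUMENT = 100.0   # placeholder for V_max*w0*C*sqrt(1 - 1/(4*Q0^2))/I0

for f in (0, 1, 2):
    q0 = overall_q0(3.0, 3.0, f)
    line = f"f = {f}: Q0 = {q0:.3f}"
    if q0 > 0.5:  # equation 4.10 only applies to the oscillatory case
        line += f", T_start = {startup_time(q0, F0, LOG_ARGUMENT) * 1e12:.1f} ps"
    print(line)
```

Because the startup time scales linearly with $Q_0$ but only logarithmically with $I_0$, any parasitic loading that raises $Q_0$ is costly, which matches the argument in the text.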
4.3.1.2 Faster Startup

In order to mitigate these issues, the start-up transistor providing the initial current can be reused to provide negative $g_m$ and therefore $Q_0$ can be kept relatively constant. One way to achieve this is shown in Fig. 4.9(a). The source nodes of the cross-coupled pair are not connected directly but AC-coupled through a capacitor. When oscillation needs to be started, one side of the bottom switch is turned on first, pulling the node down. Once oscillation starts, the other side of switch is also turned on to obtain the full negative impedance. Large coupling capacitance reduces the impedance between the source nodes, thus improving the negative impedance. However, a large coupling capacitor also couples the initial discharging current to the undesired side, reducing the initial differential current. Therefore, the capacitor needs to be sized properly to balance between the initial current and the regenerative impedance. To provide a fast startup in the GS/s range, the capacitor needed is in general large compared to the size of the transistors. The bottom plate parasitic capacitor can be as high as the transistor intrinsic capacitance, slowing down the initial current draw and thus the overall startup time.

Alternatively, Fig. 4.9(b) shows a solution where no capacitor is used. In this implementation, a normal cross-coupled pair is still present, but with modified start-up circuitry. Instead of using transistors directly pulling to ground, two switches are stacked and the top one is configured into positive feedback. When the bottom transistor is on, the effective $g_m$ of the top transistor is

$$g_{m,\text{eff}} = \frac{g_m}{g_m R_{on} + 1} \approx \frac{g_m}{2}. \quad (4.11)$$

Similarly to the original design, the startup transistor pulls one side of the tank and starts the oscillation. The main cross-coupled transistors as well as the startup transistor on the other side are then turned on to increase the strength of the negative impedance. Because the startup transistor also contributes to the oscillation, this approach achieves a much faster
startup speed. Furthermore, this approach allows the start-up transistor to remain on during the oscillation, avoiding the need for short pulses.

![Initial Discharge](image)

Figure 4.9: Modified fast start-up oscillator using (a) capacitive coupling (b) pseudo cross-couple.

### 4.3.1.3 Shut-down of Oscillation

As shown in equation 4.2, the shutdown time of the oscillator is also important. Without any modification of the circuit, if the oscillation is stopped by turning off all the bias current, the tank voltage decays exponentially with a time constant of $\tau = 2Q_{\text{tank}}/\omega_0$. In our design, this time constant is approximately 21ps, so $4\tau$ settling takes roughly 80ps. In order to speed up the shut-down process, an explicit 'kill' switch can be added (Fig. 4.10). Although the $f_T$ of the PMOS is much lower than that of the NMOS, a PMOS switch allows the use of the same clock phase instead of generating an inverted clock. The disadvantage, however, is that the switch introduces extra parasitics into the tank, which further increases $Q_0$ and therefore slows down the startup process. The size of the PMOS transistor must therefore be chosen carefully in order to balance the startup and shutdown times. As shown in Fig. 4.11, simulation suggests that the fastest data rate that can be achieved is about 10GS/s, assuming $4\tau$ settling.
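The natural decay numbers above follow directly from the tank time constant. The sketch below evaluates $2Q_{\text{tank}}/\omega_0$ and the residual amplitude after a given number of time constants. The values $Q_{\text{tank}} = 4$ and 60GHz are assumptions chosen only because they land close to the quoted 21ps; the actual tank parameters are not stated in the text.

```python
import math


def decay_time_constant(q_tank, f0_hz):
    """Envelope decay time constant 2*Q_tank/w0 of an undriven tank, in seconds."""
    return 2.0 * q_tank / (2.0 * math.pi * f0_hz)


def residual_fraction(n_tau):
    """Fraction of the initial amplitude left after n_tau time constants."""
    return math.exp(-n_tau)


tau = decay_time_constant(4.0, 60e9)  # assumed Q_tank and frequency
print(f"tau = {tau * 1e12:.1f} ps")
print(f"4*tau = {4 * tau * 1e12:.0f} ps, "
      f"residual = {100 * residual_fraction(4):.1f}% of the initial amplitude")
```

A kill switch shortens this settling by lowering the effective tank Q during shutdown, at the cost of extra parasitic capacitance during startup.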

### 4.3.2 Power Amplifier Design

Since 60GHz power amplifiers have been studied extensively, their design was not the major focus of this research. However, because the 180° phase shifting relies on the same start-up behavior but on the opposite side, it is desirable for the power amplifier to present a symmetric load to the oscillator. A balun using transformers can provide single-ended to differential conversion, but at 60GHz the transformer is usually close to its self-resonant frequency, resulting in imbalance in the impedance [41, 42]. Therefore, a differential power amplifier is necessary to present a balanced load to the oscillator. Two designs have been implemented for different output power levels. To deliver slightly higher output power, the first one (Fig. 4.12(a)) uses a two-stage common-source design with the second stage single-ended. Fig. 4.12(b) shows a single-stage power amplifier design delivering 3dB lower output power. In order to ease the matching and therefore increase the efficiency, the second design also utilizes a 0.7V supply.

### 4.4 Timing Generation

The above architecture could not succeed without precise timing generation. Much like the phase shifters discussed in Chapter 3, a common circuit to generate a high-precision delay from a clock signal consists of two variable gain amplifiers and a summation (Fig. 4.13) [43, 44]. This phase interpolator takes two clocks with different phases (usually 90° apart) and interpolates between them to achieve a much higher phase resolution. Because the clock inputs carry information only in their phase rather than their amplitude, the top transistors behave as current switches, and the output current is proportional to the bias current. Therefore, the code-to-phase transfer curve can be quite linear.
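To make the interpolation idea concrete, the sketch below models the simplest possible case: two ideal sinusoidal clocks 90° apart, weighted by $1-\alpha$ and $\alpha$ and then summed. This sinusoidal model, the 16-code resolution, and the function names are all assumptions for illustration. The actual circuit switches currents on clock edges, so its transfer curve will differ from this idealization.

```python
import math


def interpolated_phase_deg(code, n_codes):
    """Output phase of (1 - a)*cos(wt) + a*sin(wt) with a = code / n_codes."""
    alpha = code / n_codes
    return math.degrees(math.atan2(alpha, 1.0 - alpha))


def worst_case_deviation_deg(n_codes):
    """Largest deviation from an ideal straight line across one quadrant."""
    return max(
        abs(interpolated_phase_deg(c, n_codes) - 90.0 * c / n_codes)
        for c in range(n_codes + 1)
    )


N_CODES = 16  # assumed number of codes per quadrant
for code in (0, 4, 8, 12, 16):
    print(f"code {code:2d}: {interpolated_phase_deg(code, N_CODES):6.2f} deg")
print(f"worst-case deviation: {worst_case_deviation_deg(N_CODES):.2f} deg")
```

In this idealized model, linear weights give exact phases at the quadrant edges and midpoint and a systematic deviation of roughly 4° in between, which is the kind of nonlinearity a code-to-phase sweep would reveal.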

4.5 Transmitter Demonstration

Combining all of the building blocks discussed above, the transmitter was first demonstrated in TSMC’s 65nm CMOS G+ technology (Fig. 4.14). In this first design, the oscillator consumes 8mW, the power amplifier consumes 11mW, and the baseband signaling including phase interpolator consumes 2.5mW, resulting in a total power consumption of 21.5mW. Although the inductors for the oscillator and power amplifier are large, the current DAC used in the phase interpolator also occupies significant area due to its matching requirements.

4.5.1 Waveform Measurement

The output of the transmitter was probed directly and fed into an Agilent 86100C 70GHz sampling scope. Fig. 4.15 shows the measured waveform while sending a constant signal at (a) 5GS/s and (b) 7GS/s. At 5GS/s, the signal amplitude reaches its maximum after ∼60ps and the time required to shut down the oscillation is also ∼60ps. The highest data rate is therefore around 8GS/s. In measurement, signals up to 7GS/s were shown.
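The estimate of the highest data rate follows from the bound of equation 4.2, read as a symbol period of at least $T_{start} + T_{stop}$. A minimal sketch using the measured start and stop times of about 60ps each is shown below. The function name is illustrative.

```python
def max_symbol_rate(t_start_s, t_stop_s):
    """Upper bound on symbol rate when every symbol needs a full start and stop."""
    return 1.0 / (t_start_s + t_stop_s)


BITS_PER_SYMBOL = 2  # QPSK

rate = max_symbol_rate(60e-12, 60e-12)
print(f"max symbol rate: {rate / 1e9:.1f} GS/s")
print(f"max QPSK bit rate: {BITS_PER_SYMBOL * rate / 1e9:.1f} Gb/s")
```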

To demonstrate QPSK modulation, PRBS sequences on both the I and Q channels were fed into the transmitter, and the output signal was recorded on the sampling scope as before. The RF signal was then down-converted in Matlab. The resulting constellation is shown in Fig. 4.16. At 5GS/s, the measured constellation shows a clean modulated output. However, at 7GS/s, the constellation is spread out, indicating inter-symbol interference (ISI). This spreading can be removed with equalization techniques. In our experiment, with equalization performed in Matlab, a two-tap FIR filter was enough to eliminate the ISI. Because insufficient shut-down time would generate an IIR response, this spreading is suspected to be caused by the limited bandwidth of the power amplifier as well as the cable, connector, etc.

**Figure 4.15:** Measured waveform at (a) 5GS/s and (b) 7GS/s.

**Figure 4.16:** Measured transmitter constellation at (a) 5GS/s and (b) 7GS/s.

### 4.5.2 Phased-Array Functionality Validation

To validate the phased-array functionality, the phase interpolator's performance was first measured. The transmitter output phase was recorded on the sampling scope, referenced to the input baseband clock, while sweeping the phase-interpolator code. As shown in Fig. 4.17, the PI achieves more than a full RF period of delay with a resolution of 0.33ps and 0.14ps RMS time error, which is equivalent to 6° resolution with 2.5° RMS phase error at the carrier frequency.

Figure 4.17: Measurement of DNL and INL of the TX phase interpolator.

To demonstrate phased-array functionality, two independent TX chips sharing a single baseband clock reference were measured with their outputs combined externally. The beam pattern shown in Fig. 4.18 was obtained by sweeping the delay of one element relative to the other, resulting in a measured peak-to-null ratio of 24dB.
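One way to interpret a finite peak-to-null ratio is through the amplitude mismatch between the two combined signals. The sketch below assumes that amplitude mismatch is the only impairment, which is a simplification: phase error, delay quantization, and the measurement setup can also limit the null depth. The function names and the 60GHz carrier are illustrative assumptions.

```python
import math


def peak_null_ratio_db(amp_ratio):
    """Peak-to-null ratio (dB) of two summed tones with amplitude ratio a < 1."""
    return 20.0 * math.log10((1.0 + amp_ratio) / (1.0 - amp_ratio))


def amp_ratio_for_peak_null(ratio_db):
    """Amplitude ratio that would by itself produce the given peak-to-null ratio."""
    r = 10.0 ** (ratio_db / 20.0)
    return (r - 1.0) / (r + 1.0)


def combined_level_db(delay_s, f_carrier_hz, amp_ratio):
    """Level of 1 + a*exp(j*2*pi*f*delay) in dB, swept by the relative delay."""
    phase = 2.0 * math.pi * f_carrier_hz * delay_s
    re = 1.0 + amp_ratio * math.cos(phase)
    im = amp_ratio * math.sin(phase)
    return 20.0 * math.log10(math.hypot(re, im))


a = amp_ratio_for_peak_null(24.0)
print(f"amplitude ratio: {a:.3f} ({20.0 * math.log10(a):.2f} dB mismatch)")

F_CARRIER = 60e9  # assumed carrier frequency
peak = combined_level_db(0.0, F_CARRIER, a)
null = combined_level_db(0.5 / F_CARRIER, F_CARRIER, a)
print(f"peak-to-null from delay sweep: {peak - null:.1f} dB")
```

Under this assumption, a 24dB peak-to-null ratio corresponds to roughly 1dB of amplitude mismatch between the two elements.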

The phase noise performance was measured by down-converting the output signal to ~ 1GHz and feeding it into an E4440A spectrum analyzer. Fig. 4.19 shows the measured output phase noise as well as the input clock phase noise as a comparison. Much like an injection locked oscillator [45], the phase noise tracks the input at low offset frequency, and oscillator noise dominates at high offset frequency.

The output power of the transmitter was measured by an Agilent E4418B power meter. Measurement shows an output power of 0dBm at 5GS/s and -1dBm at 7GS/s while consuming 21.5mW DC power. Since the power meter only measures average output power, the reduced output power at higher sampling rate is due to the finite rise/fall time of the output waveform.

Figure 4.18: Measurement of the 2-element phased array.

Figure 4.19: Measured output phase noise of the transmitter; in comparison with input phase noise referred at output frequency.

## 4.6 Conclusion

This chapter presents a 65nm mm-wave transmitter efficiently supporting QPSK modulation and phased array functionality with a proposed oscillator phase modulation technique. The
design delivers an average output power of 1mW at 10Gb/s and 0.8mW at 14Gb/s while consuming 21.5mA DC current from a 1V supply. At 10Gb/s, an overall transmitter efficiency of 4.65% was achieved. As summarized in Table 4.1, this represents 1.8X improvement over prior art.

Figure 4.20: Measured output average power versus sampling rate.

<table>
<thead>
<tr>
<th>Technology</th>
<th>[46]</th>
<th>[47]</th>
<th>[18]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Modulation</td>
<td>PPM</td>
<td>OOK</td>
<td>QPSK</td>
<td>QPSK</td>
</tr>
<tr>
<td>Phased Array</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Data Rate</td>
<td>10Mb/s</td>
<td>1-3Gb/s</td>
<td>10Gb/s</td>
<td>10-14Gb/s</td>
</tr>
<tr>
<td>Output Power</td>
<td>-</td>
<td>-20dBm</td>
<td>-1.5dBm</td>
<td>0dBm(10Gb/s)</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>-1dBm(14Gb/s)</td>
</tr>
<tr>
<td>DC Power</td>
<td>0.47mW</td>
<td>0.54mW</td>
<td>27mW</td>
<td>21.5mW</td>
</tr>
<tr>
<td>Energy/Bit</td>
<td>47pJ/b</td>
<td>0.54pJ/b(1Gb/s)</td>
<td>2.7pJ/b</td>
<td>2.15pJ/b(10Gb/s)</td>
</tr>
<tr>
<td>$P_{out}/P_{DC}$</td>
<td>-</td>
<td>1.85%</td>
<td>2.62%</td>
<td>4.65%</td>
</tr>
</tbody>
</table>

Table 4.1: Comparison table of transmitter design.
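The derived rows of Table 4.1 (energy per bit and $P_{out}/P_{DC}$) can be recomputed from the output power, DC power, and data rate entries. The sketch below does this for the three designs that report an output power. The helper names are illustrative.

```python
def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)


def efficiency_percent(p_out_dbm, p_dc_mw):
    """Transmitter efficiency P_out / P_DC in percent."""
    return 100.0 * dbm_to_mw(p_out_dbm) / p_dc_mw


def energy_per_bit_pj(p_dc_mw, rate_gbps):
    """Energy per bit in pJ/b; mW divided by Gb/s is numerically pJ/b."""
    return p_dc_mw / rate_gbps


# (label, output power in dBm, DC power in mW, data rate in Gb/s) from Table 4.1
designs = [
    ("[47]", -20.0, 0.54, 1.0),
    ("[18]", -1.5, 27.0, 10.0),
    ("This work", 0.0, 21.5, 10.0),
]

for label, p_out_dbm, p_dc_mw, rate_gbps in designs:
    eff = efficiency_percent(p_out_dbm, p_dc_mw)
    epb = energy_per_bit_pj(p_dc_mw, rate_gbps)
    print(f"{label:>9}: {eff:.2f}% efficiency, {epb:.2f} pJ/b")

improvement = efficiency_percent(0.0, 21.5) / efficiency_percent(-1.5, 27.0)
print(f"efficiency improvement over [18]: {improvement:.2f}x")
```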

More importantly, the proposed architecture significantly reduces the overhead power in a phased-array transmitter, as shown in Fig. 4.21. The oscillator and the power amplifier consume about 88% of the overall power, while the overhead, including the baseband phase interpolator and drivers, consumes only 12%. According to the optimization in Chapter 2, this implies there is still considerable room to reduce the power amplifier and oscillator power to reach the optimum condition. Fig. 4.22 thus shows the power breakdown of another sample design with half of the output power (0dBm). The DC power consumption of the oscillator and the PA is reduced almost by half, indicating the scalability of the oscillator and PA. Although it will be challenging to implement the oscillator and the PA within 2mW power consumption, an oscillator-only design can potentially be employed to achieve the optimum condition. With the same 12dBm EIRP as a 4-element array using the second design, this predicts a total DC power consumption of

$$P_{DC} = 2\sqrt{\frac{16mW \times 2mW}{0.1}} = 36mW.$$ (4.12)

Compared to the second design, which uses a total of 60mW, this implies another 40% reduction by doubling the number of elements.
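The numbers in equation 4.12 can be cross-checked by sweeping the element count directly. The sketch below assumes unity antenna-element gain and uses the values from the text: 16mW (12dBm) EIRP, 2mW overhead per element, and an efficiency of 0.1 for the power-delivering stage. The function names are illustrative.

```python
import math


def dc_power_mw(n, eirp_mw, p_overhead_mw, efficiency):
    """Total DC power of an n-element array that meets the EIRP target."""
    p_out_each = eirp_mw / n ** 2  # coherent combining, unity antenna gain assumed
    return n * (p_overhead_mw + p_out_each / efficiency)


def optimum_elements(eirp_mw, p_overhead_mw, efficiency):
    """Element count (not rounded) that minimizes the total DC power."""
    return math.sqrt(eirp_mw / (efficiency * p_overhead_mw))


def min_dc_power_mw(eirp_mw, p_overhead_mw, efficiency):
    """Closed-form minimum, 2*sqrt(EIRP * P_OH / efficiency), as in equation 4.12."""
    return 2.0 * math.sqrt(eirp_mw * p_overhead_mw / efficiency)


EIRP_MW, P_OH_MW, ETA = 16.0, 2.0, 0.1

n_opt = optimum_elements(EIRP_MW, P_OH_MW, ETA)
print(f"optimum element count: {n_opt:.1f}")
print(f"closed-form minimum: {min_dc_power_mw(EIRP_MW, P_OH_MW, ETA):.1f} mW")
print(f"8-element array: {dc_power_mw(8, EIRP_MW, P_OH_MW, ETA):.1f} mW")
print(f"saving versus 60 mW: {100 * (1 - dc_power_mw(8, EIRP_MW, P_OH_MW, ETA) / 60):.0f}%")
```

The unrounded optimum falls between 8 and 9 elements, so an 8-element array, double the 4 elements considered above, sits very close to the closed-form minimum.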
Figure 4.22: Power breakdown of the second demonstrated transmitter with 0dBm peak output power.
Chapter 5

Energy-Efficient Phased-Array Receiver

This chapter discusses power reduction techniques in the phased-array receiver design. Similar to the transmitter, the overhead power consumption is the bottleneck for efficient receiver design.

In a receiver chain, the noise of the front-end low noise amplifiers (LNA) is in general inversely proportional to their current consumption. Therefore, these amplifiers are not categorized as overhead, and can be adjusted based on the number of elements in the optimization.

Using the baseband phase shifting scheme, the only inefficient building block left is the down-converter. In 65nm technologies, the maximum inductance that can be implemented with a reasonable self-resonant frequency (SRF) is $\sim 300\text{pH}$. Without introducing too much extra loss, the resulting maximum impedance sets the minimum device size to $\sim 10\text{µm}$, resulting in a minimum power consumption of $3 \sim 4\text{mW}$ for each stage of a 60GHz amplifier or mixer. Therefore, as shown in Fig. 5.1, the mixer consumes $3\text{mW}$ for each I/Q channel, while each LO buffer consumes $3\text{mW}$ to drive the LO switches hard, resulting in a minimum of $12\text{mW}$ for the down-conversion. Because the effective input-referred noise from the mixer depends on the gain provided by the preceding stages, it is not inversely proportional to the mixer's DC power, so approximately this entire $12\text{mW}$ is overhead power.

Furthermore, in a baseband phased-array scheme, the LO signal needs to be delivered to each mixer, leading to extra power consumption. In a conventional design, shown in Fig. 5.2, the oscillator output is fed into an LO buffer, whose output is then delivered to each mixer with a transmission line. In order to achieve low power consumption, the device size in the LO buffer is usually small ($\sim 10\text{µm}$), resulting in a relatively high impedance ($\sim 1\text{k}\Omega$) compared to the transmission line impedance of typically $50 \sim 80\Omega$. A matching network is therefore necessary to convert between these two impedances. Such a high impedance transformation ratio usually leads to a relatively lossy matching network. After the matching network, the LO signal is split by low-impedance power dividers and delivered to the $90^\circ$ hybrid to generate I/Q signals. A local LO buffer is often inserted to boost the swing at the LO devices of the mixer to achieve high conversion gain. The gate impedance of this buffer is once again high compared to that of the transmission line, so a matching network is again inserted. Although both the LO buffer output and the mixer LO port are potentially small and therefore high impedance, a high-impedance transformer or inductor is still needed to resonate with the capacitance. In summary, there will be a total of 2 lossy matching networks in the LO path, increasing the power consumption of the LO path.

Our proposed receiver is therefore designed to reduce the power consumption by reusing LO buffer power in the mixer and reducing the number of matching networks in the LO distribution network.

5.1 Stacked Mixer with LO Buffer

One common method to reduce power consumption in circuits is to lower the supply voltage. A conventional Gilbert mixer places the LO switches on top, requiring 2\( V_{ds} \) across the transistors and limiting the supply voltage. Swapping the input device to the top while using a transformer coupled LO port allows a much lower supply. As sketched in Fig. 5.3, the supply voltage of the mixer only needs to maintain one \( V_{ds} \) and therefore a much lower supply can be used. The LO buffer that supplies the LO current swing can also use a low supply. On the other hand, if the same supply needs to be used throughout the receiver chain, a stacked version can be implemented by stacking the mixer on top of the LO buffer (Fig. 5.4) [48].

Figure 5.3: Low supply mixer using transformer coupled LO port.

The conversion gain of the mixer is dependent on the current swing of the LO signal. In the case of \( I_{ac} = I_{dc} \), i.e. the current can swing fully between on and off, the conversion gain is

\[
G_{conv} = \frac{1}{2} \times \frac{4}{\pi} \times \frac{I_{dc}}{V_s} \times R_L = \frac{4V_{RL}}{\pi V_s},
\]

where \( V_{RL} \) represents the voltage drop on the load resistor. This conversion gain is identical to a conventional double balanced mixer. The proposed mixer has only RF feedthrough to the output and no LO leakage to the output differential signal. However, there is significant
LO leakage to the output common mode, requiring good common mode rejection from the following stages.

While providing similar gain to a conventional mixer, the main advantage of the proposed mixer is that it eases integration with the LO distribution network. As shown in Fig. 5.5, the transmission lines used for LO distribution can now be inserted at the source of the top devices, which is a much lower impedance node than the gate. The low-impedance hybrid can also be inserted at this point of the circuit. As illustrated in Fig. 5.5, the total number of matching networks needed is reduced to one: a simple matching network transforms the high output impedance of the LO device to the $\sim 50\Omega$ transmission line. Removing the other matching network significantly reduces the loss along the LO path, and therefore eliminates the need for a local buffer.

5.2 Hybrid Design

The above mixer architecture requires a hybrid design that passes the DC current. A branch-line coupler [49] is thus suitable for our implementation. In a typical design, the branch-line coupler (Fig. 5.6) consists of four transmission lines, of which two have a characteristic impedance of $Z_0$ and the other two have a characteristic impedance of $Z_0/\sqrt{2}$. However, all four lines need to have a length of $\lambda/4$. At 60GHz the on-chip wavelength is 2.5mm, resulting in 625µm per leg for the coupler, which occupies significant area.

Figure 5.5: Proposed LO distribution scheme built inside mixer.

Figure 5.6: Branch-line coupler (a), its miniaturization (b) and physical implementation.

In order to minimize the area penalty, the hybrid can be redesigned using slow-wave techniques [50]. Capacitors are added at the ends of the transmission line to reduce the effective propagation velocity in the line, therefore reducing the wavelength. In order to replace a transmission line with impedance $Z_0$ and length $\lambda/4$, the slow-wave line must present the same input impedance for an arbitrary load impedance $Z_L$. The original line has an input impedance of

\[ Z_{in,\lambda/4} = \frac{Z_0^2}{Z_L}. \] (5.2)

On the other hand, the slow-wave transmission line with characteristic impedance \( Z'_0 \), length $l$, and loading capacitor $C$ has an input impedance of

$$Z'_{in,l} = \frac{1}{sC} \parallel \left[ Z'_0 \, \frac{\left( Z_L \parallel \frac{1}{sC} \right) + jZ'_0\tan(\beta l)}{Z'_0 + j\left( Z_L \parallel \frac{1}{sC} \right)\tan(\beta l)} \right],$$ (5.3)

where $\beta = \frac{2\pi}{\lambda}$. Because this impedances should be identical to $\frac{Z_0^2}{Z_L}$ with any load impedance, two load conditions $Z_L = 0$ and $Z_L = -\frac{1}{sC}$ are selected to obtain the capacitance and transmission line impedance $Z'_0$. When $Z_L = 0$, the above equation can be simplified to

$$Z'_{in,l} = \frac{1}{sC/Z'_0} jZ'_0\tan(\beta l) = \infty.$$  \hspace{1cm} (5.4)

This leads to the capacitance value expressed in terms of $Z'_0$ and transmission line length $l$ as:

$$C = \frac{1}{\omega Z'_0 \tan(\beta l)}.$$  \hspace{1cm} (5.5)

Now if $Z_L = -\frac{1}{sC}$ is chosen instead, the input impedance is:

$$Z'_{in,l} = \frac{1}{sC/Z'_0} \frac{Z'_0}{j\tan(\beta l)} = -sCZ_0^2.$$  \hspace{1cm} (5.6)

Solving the above equation gives the solution of the desired transmission line impedance expressed in terms of $Z_0$, $\beta$, and $l$ as:

$$Z'_0 = Z_0 \frac{\sqrt{1 + \tan^2(\beta l)}}{\tan(\beta l)}.$$  \hspace{1cm} (5.7)

The capacitance can thus be rewritten in the form of the original T-line impedance as

$$C = \frac{1}{\omega Z_0 \sqrt{1 + \tan^2(\beta l)}}.$$  \hspace{1cm} (5.8)

The above derivation provides a guide for selecting the transmission line impedance as well as the capacitance for a fixed length. Notice that the hybrid can be part of the routing network, and therefore the length of the T-line can be determined by the physical distance between elements. In our sample design, a length of $\lambda/8$ is chosen for the low-impedance T-line. The impedance of this part of the hybrid is chosen to be $Z_0$, and the required capacitance is $1/(\sqrt{2}\omega Z_0)$. The high-impedance line, however, does not serve the purpose of routing because it sits between the I/Q outputs. Since the impedance of this line is also higher, it is easier to implement this part of the hybrid with a lumped inductor instead. Similar to the analysis for the slow-wave T-line, the inductor and capacitor that implement the quarter-wavelength transmission line are [49]:

\[ L = \frac{Z_0}{\omega}, \quad (5.9) \]
\[ C = \frac{1}{\omega Z_0}. \quad (5.10) \]
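The slow-wave design equations can be checked numerically. The sketch below is an illustration only: it assumes equation 5.3 describes a line with a shunt capacitor at each end, and the function names and the 50Ω example value are hypothetical choices, not taken from the design. It computes $Z'_0$ and $\omega C$ from equations 5.7 and 5.8 and compares the resulting input impedance against $Z_0^2/Z_L$ for a few arbitrary loads.

```python
import math


def slow_wave_design(z0, theta):
    """Equations 5.7 and 5.8: return (Z0', omega*C) for electrical length theta = beta*l."""
    t = math.tan(theta)
    z0_prime = z0 * math.sqrt(1 + t * t) / t
    omega_c = 1.0 / (z0 * math.sqrt(1 + t * t))  # capacitor susceptance, in siemens
    return z0_prime, omega_c


def parallel(a, b):
    """Parallel combination of two impedances."""
    return a * b / (a + b)


def loaded_line_zin(z_load, z0_prime, omega_c, theta):
    """Input impedance of the line with a shunt capacitor at each end (equation 5.3)."""
    z_cap = 1 / (1j * omega_c)
    z_end = parallel(z_load, z_cap)
    t = math.tan(theta)
    z_line = z0_prime * (z_end + 1j * z0_prime * t) / (z0_prime + 1j * z_end * t)
    return parallel(z_cap, z_line)


if __name__ == "__main__":
    z0 = 50.0            # illustrative impedance of the line being replaced
    theta = math.pi / 4  # a lambda/8 slow-wave line
    z0_prime, omega_c = slow_wave_design(z0, theta)
    cap = omega_c / (2 * math.pi * 60e9)
    print(f"Z0' = {z0_prime:.1f} ohm, C = {cap * 1e15:.1f} fF")
    for z_load in (25.0, 50 + 20j, 100 - 30j):
        # The two printed impedances should agree for every load.
        print(z_load, loaded_line_zin(z_load, z0_prime, omega_c, theta), z0 ** 2 / z_load)
```

Because the match holds for every load rather than only for the two loads used in the derivation, the loaded line behaves as a drop-in replacement for the quarter-wave section at the design frequency.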

The layout of the proposed hybrid is sketched in Fig. 5.6(c). Instead of using a normal coplanar waveguide, one side of the ground plane is removed for an even more compact layout. EM simulation with HFSS shows a balanced response at 60GHz with \( \sim 1.2\, \text{dB} \) insertion loss and a return loss better than 20dB (Fig. 5.7).

**Figure 5.7:** Simulation result of proposed hybrid.

5.3 Incorporating the Mixer with the 60GHz LO Generation

The previous section proposed a mixer design stacked on top of a common source amplifier. The amplifier can potentially be replaced by any LO generation circuitry to save the extra stage. Two different approaches are proposed here to combine the mixer with LO generation.

5.3.1 Stacked 30GHz VCO with Mixer

One way to generate the 60GHz LO is to use a 60GHz oscillator. However, an oscillator prefers a high-impedance load, so it is hard to stack the mixer directly on top of it because of the mixer’s low input impedance. A differential 30GHz oscillator, on the other hand, produces a 60GHz common-mode signal that can be used as the 60GHz LO [51]. Fig. 5.8 shows an example where a cross-coupled 30GHz oscillator is implemented and its supply node is tied to the transmission line, supplying both DC current and AC switching. In this configuration, the 30GHz nodes of the oscillator are not loaded by the low impedance at the source of the mixer, so the oscillation criteria are unchanged.

Figure 5.8: Stacked mixer on 30GHz oscillator.

The effective source impedance from the oscillator is essentially the common mode impedance of the oscillator ($2/g_m$). This low impedance further eases the impedance transformation requirement.

Although this approach is energy-efficient for LO power generation and delivery, the frequency of the VCO still needs to be locked to a reference. Therefore, varactors are necessary in order to tune the frequency. Although the quality factor of the varactor is much higher at 30GHz than at 60GHz, the varactor still reduces the quality factor of the tank, and therefore reduces the power delivered to the LO port of the mixers.

5.3.2 Stacked Push-Push with Mixer

In order to break this tradeoff, it is desirable to decouple the varactors used for tuning from the circuitry generating LO power. The proposed mixer is therefore stacked on a 30GHz push-push doubler [52] that does not require a varactor. Unlike the subharmonic mixer proposed in [53], two inductors are added at the devices’ drains, as drawn in Fig. 5.9. These two inductors, together with the parasitic capacitance of the devices, act as an impedance transformation network that lowers the output impedance of the transistors, thereby increasing the AC current flowing to the mixer. The common approach [54] uses a single inductor after the push-push pair; because the drains of the transistors are then connected together, the transistors have to operate in the saturation region as transconductors. The split version separates the drain connection, allowing the transistors to operate in the linear region for higher efficiency.

5.3.2.1 Optimization of Push-Push Inductance

To set this series inductance value correctly, the transistors are modeled as ideal switches with parasitic capacitance, and the load is simply modeled as a load resistor to the supply (Fig. 5.10).

![Figure 5.9: Stacked mixer on 30GHz push-push doubler.](image)
The voltage waveform on each node can be expressed in a Fourier series as:

\[
V_1(t) = \sum_{n=-2}^{2} A_n e^{jn\omega t}, \tag{5.11}
\]

\[
V_2(t) = \sum_{n=-2}^{2} (-1)^n A_n e^{jn\omega t}. \tag{5.12}
\]

where \( \omega = 2\pi \times 30\text{GHz} \) in our case. The current through the capacitor \( C_1 \) is therefore the derivative of the voltage,

\[
I_{C,1} = C \frac{dV_1}{dt} = C \sum_{n=-2}^{2} (jn\omega) A_n e^{jn\omega t}. \tag{5.14}
\]

Notice that this circuit is symmetric, but the currents in \( C_1 \) and \( C_2 \) are offset by half a period in time. Thus the current through capacitor \( C_2 \) has a similar form:

\[
I_{C,2} = C \frac{dV_2}{dt} = C \sum_{n=-2}^{2} (-1)^n (jn\omega) A_n e^{jn\omega t}. \tag{5.15}
\]
On the other hand, the currents flowing through the inductors are:

\[ I_{L,1} = \sum_{n=-2}^{2} B_n e^{jn\omega t}, \tag{5.16} \]

\[ I_{L,2} = \sum_{n=-2}^{2} (-1)^n B_n e^{jn\omega t}. \tag{5.17} \]

The DC current is therefore the total DC current of \( L_1 \) and \( L_2 \).

\[ I_{DC} = 2B_0. \]  
(5.18)

The amplitude of the second harmonic current is

\[ I_{2\omega} = |2(B_{-2} e^{-2j\omega t} + B_{2} e^{2j\omega t})|. \]  
(5.19)

Our goal is therefore to solve for the ratio between the coefficients \( B_{\pm 2} \) and \( B_0 \), because it represents how much AC current can be obtained. Two equations govern the operation of the push-push. First, when the switch is off, the inductor current must equal the capacitor current, or mathematically,

\[ I_{C,1} \times S(t) = I_{L,1} \times S(t), \tag{5.20} \]

where \( S(t) \) is a square wave that can be expanded as

\[ S(t) = \frac{1}{2} - \frac{2}{\pi} \sum_{n=1,3,5,...}^{\infty} \frac{1}{n} \sin(n\omega t) = \frac{1}{2} + \frac{j}{\pi} \sum_{n=1,3,5,...}^{\infty} \frac{1}{n} (e^{jn\omega t} - e^{-jn\omega t}). \tag{5.21} \]

The second equation relates the output voltage to the internal voltage \( V_1, V_2 \) as

\[ L \frac{dI_{L,1}}{dt} = V_{out} - V_1 = V_{dd} - (I_{L,1} + I_{L,2})R_L - V_1. \]  
(5.22)

Similar to [55], the approach we use here to solve these two equations is to plug in the Fourier series coefficients and then match the coefficients of each harmonic on the two sides. Focusing on the first three terms, equation 5.20 can be rewritten as a matrix multiplication.

\[
\begin{bmatrix}
-\frac{j}{\pi} & \frac{1}{2} & \frac{j}{\pi} & 0 & \frac{j}{3\pi} \\
0 & -\frac{j}{\pi} & \frac{1}{2} & \frac{j}{\pi} & 0 \\
-\frac{j}{3\pi} & 0 & -\frac{j}{\pi} & \frac{1}{2} & \frac{j}{\pi}
\end{bmatrix}
\begin{bmatrix}
B_{-2} + 2j\omega CA_{-2} \\
B_{-1} + j\omega CA_{-1} \\
B_{0} \\
B_{1} - j\omega CA_{1} \\
B_{2} - 2j\omega CA_{2}
\end{bmatrix}
= 0. \tag{5.23}
\]

This can be easily solved to relate coefficients \( A_n \) and \( B_n \).

\[
\begin{bmatrix}
B_{-2} + 2j\omega CA_{-2} \\
B_{-1} + j\omega CA_{-1} \\
B_{0} \\
B_{1} - j\omega CA_{1} \\
B_{2} - 2j\omega CA_{2} \\
\end{bmatrix} = B_0
\begin{bmatrix}
(24 - 3\pi^2)/16 \\
-j\pi/4 \\
1 \\
j\pi/4 \\
(24 - 3\pi^2)/16 \\
\end{bmatrix}.
\]  
(5.24)
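As a sanity check on this step, the short sketch below substitutes the solution vector of equation 5.24 into the harmonic-balance matrix. It assumes the matrix has the Toeplitz form implied by the square-wave harmonics, with entries $1/2$, $\pm j/\pi$ and $\pm j/(3\pi)$ and no even harmonics; the variable and function names are illustrative.

```python
import math

PI = math.pi

# Rows correspond to the three harmonics kept in the balance; columns correspond
# to the five entries of the unknown vector (n = -2 ... 2).
M = [
    [-1j / PI, 0.5, 1j / PI, 0.0, 1j / (3 * PI)],
    [0.0, -1j / PI, 0.5, 1j / PI, 0.0],
    [-1j / (3 * PI), 0.0, -1j / PI, 0.5, 1j / PI],
]

# Solution vector from equation 5.24, normalized to B_0 = 1.
A2 = (24 - 3 * PI ** 2) / 16
X = [A2, -1j * PI / 4, 1.0, 1j * PI / 4, A2]


def residual(matrix, vec):
    """Return the matrix-vector product, which should vanish for a valid solution."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


if __name__ == "__main__":
    for r in residual(M, X):
        print(abs(r))
```

All three residuals are zero to within floating-point rounding, so the vector in equation 5.24 lies in the null space of the assumed matrix.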
Similarly, equation 5.22 can be written in a matrix form as well:

\[
L \begin{bmatrix}
-2j\omega B_{-2} \\
-j\omega B_{-1} \\
0 \\
j\omega B_1 \\
2j\omega B_2
\end{bmatrix} = \begin{bmatrix}
-2B_{-2}R_L - A_{-2} \\
-A_{-1} \\
V_{dd} - 2B_0R_L \\
-A_1 \\
-2B_2R_L - A_2
\end{bmatrix}.
\]

(5.25)

Coefficients \(B_{\pm2}\) can therefore be obtained by combining the above two equations,

\[
B_{-2} = B_2^* = B_0 \frac{24 - 3\pi^2}{16(1 - 4\omega^2LC - 4j\omega R_L C)}.
\]

(5.26)

Notice the total DC current is \(2B_0\), and therefore the amplitude of the double-frequency current flowing to the load is

\[
I_{2\omega} = |2(B_{-2}e^{-2j\omega t} + B_2e^{2j\omega t})| = I_{DC} \frac{3\pi^2 - 24}{8} \frac{1}{\sqrt{(1 - 4\omega^2LC)^2 + 16\omega^2R_L^2C^2}}.
\]

(5.27)

In order to maximize the second harmonic AC current, the inductor is sized to resonate with the capacitor at a frequency of \(2\omega\), achieving a maximum AC current of:

\[
I_{2\omega,max} = I_{DC}Q \frac{3\pi^2 - 24}{16} \approx 0.35QI_{DC}.
\]

(5.28)

Notice here we replaced \(\frac{1}{2\omega R_L C}\) with \(Q\).

In order to achieve the maximum conversion gain of the mixer, the current should be switched fully on and off, therefore it is desirable to have AC current as large as the DC current; meanwhile, a much higher AC current does not improve the conversion gain much because it clips the transconductance. Therefore, as shown below, \(Q \approx 3\) will be sufficient to generate a full swing.

\[
I_{2\omega,max} \approx 0.35QI_{DC} = I_{DC}.
\]

(5.29)
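The dependence of the second-harmonic current on the series inductance can be explored with a few lines of code. The sketch below simply evaluates equation 5.27 under the same ideal-switch model; the 100fF capacitance and 12.5Ω load in the example are illustrative values, and the function names are hypothetical.

```python
import math

NUM = 3 * math.pi ** 2 - 24  # constant appearing in equations 5.26 to 5.28


def second_harmonic_ratio(omega, ind, cap, r_load):
    """I_2w / I_DC according to equation 5.27."""
    reactive = 1 - 4 * omega ** 2 * ind * cap
    resistive = 4 * omega * r_load * cap
    return (NUM / 8) / math.hypot(reactive, resistive)


def resonant_inductance(omega, cap):
    """Inductance that resonates with cap at 2*omega."""
    return 1 / (4 * omega ** 2 * cap)


def quality_factor(omega, cap, r_load):
    """Q as defined after equation 5.28."""
    return 1 / (2 * omega * r_load * cap)


if __name__ == "__main__":
    omega = 2 * math.pi * 30e9
    cap, r_load = 100e-15, 12.5  # illustrative values
    l_res = resonant_inductance(omega, cap)
    for scale in (0.6, 0.8, 1.0, 1.2, 1.4):
        ratio = second_harmonic_ratio(omega, scale * l_res, cap, r_load)
        print(f"L = {scale * l_res * 1e12:5.1f} pH -> I_2w / I_DC = {ratio:.3f}")
    print(f"Q needed for I_2w = I_DC: {16 / NUM:.2f}")
```

The ratio peaks at the resonant inductance, where it reduces to equation 5.28, and the $Q$ needed for $I_{2\omega} = I_{DC}$ comes out slightly below 3.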

5.3.2.2 Sharing LO between Elements

All the elements in a phased array should be synchronized, and hence it is desirable for them to share the same LO signal. This can be achieved simply by sharing the push-push structure, using a larger transistor size and a smaller inductor. The LO signals are routed to the different mixers using transmission lines. Fig. 5.11 shows the configuration of our complete receiver mixer with LO delivery. The output of the push-push doubler connects directly to the 4 transmission lines to avoid Wilkinson power dividers, and therefore the interface impedance is \(Z_0/4\), i.e., 12.5Ω in our design.
5.3.2.3 Sizing of Push-Push

This 12.5Ω impedance then determines the tolerable parasitic capacitance from the switches, as shown in equation 5.28. Due to the hybrid, the AC current flowing through each I/Q mixer is $1/\sqrt{2}$ of the total AC current generated by the push-push, and the DC current of each mixer is half of the total DC current. The full swing requirement therefore becomes

$$\frac{1}{\sqrt{2}} I_{2\omega,\text{max}} = \frac{3\pi^2 - 24}{32\sqrt{2}\omega R_L C} I_{DC} = \frac{1}{2} I_{DC}. \quad (5.30)$$

This determines the tolerable capacitance as

$$C = \frac{\sqrt{2}(3\pi^2 - 24)}{32\omega R_L} = 105fF. \quad (5.31)$$

This limits the total switch size to $\approx 80\mu m$. The on impedance of the device is roughly 4Ω. At the frequency of $2\omega$, this impedance is transformed into a lower impedance, $4\Omega/Q^2$, at the load, which is sufficiently small. The inductor size is then selected to resonate out the capacitor as indicated in equation 5.27. Calculation shows that the optimum occurs with a $\sim 75$pH inductor, which is verified by simulation as shown in Fig. 5.12.
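The sizing numbers above can be reproduced from equations 5.30 and 5.31. The following sketch is only a worked evaluation of those formulas with $\omega = 2\pi \times 30$GHz and $R_L = 12.5\Omega$; the function names are hypothetical. Under this idealized switch model the resonance condition gives an inductance of about 67pH, the same order as the $\sim 75$pH quoted; the exact optimum depends on the actual device capacitance.

```python
import math

OMEGA = 2 * math.pi * 30e9   # push-push input frequency
R_LOAD = 12.5                # Z0/4 interface impedance, in ohms
NUM = 3 * math.pi ** 2 - 24


def tolerable_capacitance(omega, r_load):
    """Equation 5.31: capacitance that just meets the full-swing requirement."""
    return math.sqrt(2) * NUM / (32 * omega * r_load)


def per_mixer_swing(omega, r_load, cap):
    """Left-hand side of equation 5.30, normalized to I_DC."""
    return NUM / (32 * math.sqrt(2) * omega * r_load * cap)


def resonant_inductance(omega, cap):
    """Inductance that resonates with cap at 2*omega (peak of equation 5.27)."""
    return 1 / (4 * omega ** 2 * cap)


if __name__ == "__main__":
    cap = tolerable_capacitance(OMEGA, R_LOAD)
    print(f"C = {cap * 1e15:.0f} fF")
    print(f"per-mixer AC current / I_DC = {per_mixer_swing(OMEGA, R_LOAD, cap):.2f}")
    print(f"L at resonance = {resonant_inductance(OMEGA, cap) * 1e12:.0f} pH")
```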

5.3.3 30GHz Generation

Many different schemes can be employed to generate the 30GHz signal, which then drives the push-push structure. In our implementation, for simplicity, an injection-locked tripler (similar to [56]) is used to multiply a 10GHz input clock up to a 30GHz clock. As shown in Fig. 5.13, the tripler consists of a 30GHz LC oscillator and a differential pair that injects current directly into the tank. In order to maximize the injection locking range, it is desirable to have more 3rd order harmonic content generated by the differential pair [45].

Using a pseudo-differential pair, the design in [56] relies on two components to inject third-order harmonic current. The first is the sharp edge of the input clock, which contains rich third-order harmonic content that is amplified and passed through the tank. Depending on the sharpness of the input clock, this component is usually small for high-frequency clocks. The second is the device non-linearity that converts the fundamental frequency to the third-order harmonic, which is still limited in modern CMOS devices. To enhance the 3rd order harmonic, a tail current source is proposed. Notice that the tail node presents a signal at $2f_0$ when an $f_0$ signal is driven at the gate; this signal is mixed with the input signal, generating a strong 3rd order harmonic at the output. Because this mechanism utilizes only the 2nd order non-linearity of the transistor, it produces more of the desired 3rd order harmonic.
To numerically evaluate this, let’s assume a perfect square law device is used with the familiar I-V equation:

\[ I = K(V_{gs} - V_{th})^2. \]  

(5.32)

Clearly, without the current source, a pseudo-differential pair will not generate a 3rd harmonic by itself at the output. With an ideal current source, however, the tail node \( V_x \) is free to move, and thus:

\[ I_{\text{tail}} = K(V_+ - V_x - V_{th})^2 + K(V_- - V_x - V_{th})^2 \]

\[ = 2K(V_{cm} - V_x - V_{th})^2 + 2KV_{\text{sig}}^2/4. \]  

(5.33)

The output differential current is

\[ I_{\text{out}} = K(V_+ - V_x - V_{th})^2 - K(V_- - V_x - V_{th})^2 \]

\[ = 2K(V_{cm} - V_x - V_{th})V_{\text{sig}}. \]  

(5.34)

\( V_{cm} - V_x - V_{th} \) can be replaced using equation 5.33, leading to

\[ I_{\text{out}} = 2KV_{\text{sig}}\sqrt{\frac{I_{\text{tail}}}{2K} - \frac{V_{\text{sig}}^2}{4}} \]

\[ \approx \sqrt{2KI_{\text{tail}}}\,V_{\text{sig}}\left(1 - \frac{K}{4I_{\text{tail}}}V_{\text{sig}}^2\right) \]

\[ = g_mV_{\text{sig}}\left(1 - \frac{1}{2}\left(\frac{V_{\text{sig}}}{2V_{ov}}\right)^2\right), \tag{5.35} \]

where \( V_{ov} \) denotes the overdrive voltage at static bias, so that \( I_{\text{tail}} = 2KV_{ov}^2 \) and \( g_m = \sqrt{2KI_{\text{tail}}} = 2KV_{ov} \).
As shown here, the third order generation can be significant compared to relying on transistor non-linearity. When a full swing signal is driven onto the gate, the signal can be as high as $2V_{ov}$ and therefore the effective $g_m$ of the third order harmonic is almost the same as the fundamental. In reality, the finite impedance of the current source at the second harmonic will degrade the performance. So, it is critical to optimize the channel length of the current source device to obtain the highest impedance at $2f_0$. Nevertheless, this proposed injection locked tripler still generates a stronger 3rd order harmonic tone at the output compared to a pseudo differential approach, resulting in widened injection locking range compared to a pseudo differential pair. In order to verify this idea, sample designs using different approaches are implemented, while the injection current is constrained to be the same as the oscillator current. The result in Fig. 5.14 shows the improvement in locking range is close to 50%.
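The square-law argument can be illustrated numerically. The sketch below is a simplified model, not the implemented circuit: it assumes ideal square-law devices and an ideal tail source, uses $g_m = 2KV_{ov}$ and $I_{tail} = 2KV_{ov}^2$, and extracts the third harmonic of the output current for a sinusoidal input by direct Fourier summation. All function names and numeric values are illustrative.

```python
import math


def tail_biased_current(v_sig, g_m, v_ov):
    """Exact square-law output current with an ideal tail source (equation 5.35, first line)."""
    return g_m * v_sig * math.sqrt(1 - (v_sig / (2 * v_ov)) ** 2)


def pseudo_diff_current(v_sig, g_m):
    """Square-law pseudo-differential pair: the squared terms cancel, leaving a linear response."""
    return g_m * v_sig


def harmonic_amplitude(waveform, n, samples=4096):
    """Amplitude of the n-th cosine harmonic of waveform(theta) over one period."""
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += waveform(theta) * math.cos(n * theta)
    return 2 * total / samples


def third_harmonic_taylor(amp, g_m, v_ov):
    """Third harmonic predicted by the cubic term of the Taylor expansion."""
    return -g_m * amp ** 3 / (32 * v_ov ** 2)


if __name__ == "__main__":
    g_m, v_ov, amp = 1.0, 0.2, 0.1  # normalized, illustrative values
    h3_tail = harmonic_amplitude(
        lambda th: tail_biased_current(amp * math.cos(th), g_m, v_ov), 3)
    h3_pseudo = harmonic_amplitude(
        lambda th: pseudo_diff_current(amp * math.cos(th), g_m), 3)
    print("tail-biased 3rd harmonic:", h3_tail)
    print("Taylor estimate         :", third_harmonic_taylor(amp, g_m, v_ov))
    print("pseudo-diff 3rd harmonic:", h3_pseudo)
```

In this idealized model the pseudo-differential pair produces no third harmonic at all, while the tail-biased pair produces one that grows with the cube of the input amplitude.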

![Figure 5.14: Locking range with and without current source.](image)

Although much wider compared to other candidates, this injection locking range is still nowhere near sufficient to cover 10% of the carrier frequency. A varactor bank is therefore added at the tank to tune the natural oscillation frequency to be close to the desired carrier frequency. Because the tripler output resonates at 30GHz instead of 60GHz, the quality factor of the varactors is doubled, improving the efficiency as well as the phase noise significantly.
5.4 Receiver Demonstration

The complete receiver is demonstrated in TSMC 65nm CMOS G+ technology. The receiver uses two stages of low noise amplification in front of the proposed mixer. The first stage is designed using an inductively source degenerated architecture [57, 58] followed by a cascode amplifier. The output of the second stage is then converted to differential to interface with the mixer. On-chip antennas are also integrated and the interface between the antennas and the LNAs will be discussed in the next chapter. The receiver consumes a total of 65mW for 4 elements, including 24mW from the LNAs, 20mW from the mixers, 4mW from the tripler, 8mW from the baseband phase shifters and 8mW for the output buffer. The 65mW total power consumption results in a 16mW/element power consumption, demonstrating by far the most efficient phased-array receiver to date. It is interesting to see that the power spent on mixer and LO delivery is now comparable with the front-end LNA, which verifies our proposal at the beginning of this chapter.

5.4.1 Bandwidth Measurement

Due to the integration of on-chip antennas, the gain of the receiver front-end cannot be measured directly. The whole system including the antenna will be presented in Chapter 7. Here, only the bandwidth of the receiver chain is presented. With a fixed IF at 1GHz, sweeping both LO and RF together resulted in a measured RF bandwidth of $\sim 7\text{GHz}$. With the LO fixed at 63GHz, sweeping the RF frequency resulted in a measured conversion bandwidth of about 3GHz, indicating that most of the bandwidth limitation is at baseband. This is mainly due to the wire bonding to the PCB and the limited bandwidth on the PCB. In a complete system where the baseband is integrated onto the same die, this bandwidth can be significantly improved.

5.4.2 LO Performance

The locking range is measured by sweeping the input clock as well as the varactor bank while monitoring an LO monitor port. As shown in Fig. 5.16, the locking range is above 7GHz for a 60GHz carrier, exceeding our 10% target.

The phase noise is also measured using an external down-converter. With the relatively high Q at 30GHz, the phase noise at 1MHz offset is lower than -110dBc/Hz. As shown in Fig. 5.17, the output phase noise is limited by the input reference below $\sim 200kHz$.

5.5 Summary

This chapter discusses the implementation of the phased-array receiver and focuses on the down-converter design coupled with LO delivery. A mixer stacked on top of the push-push doubler is proposed. The mixer’s intrinsic low-impedance LO interface eases the routing by eliminating matching networks that were previously required. Reusing the DC current from the push-push doubler also eliminates extra power consumption. As shown in Fig. 5.18, with this new approach, the overhead circuitry including the mixer, the tripler, and the baseband phase shifters consumes a total of 8mW per element. Compared to a conventional architecture (Fig. 5.19), the proposed architecture achieves a 47% reduction in overhead power. As shown in Fig. 2.3, with 8mW overhead power, the optimal number of receiver elements is close to 4, which is what we will demonstrate in the next chapter.

Figure 5.17: Phase noise of the receiver LO at monitor port.

Figure 5.18: Power breakdown of the proposed receiver.

Figure 5.19: Predicted power breakdown of a receiver using conventional architecture.
Chapter 6

On-Chip mm-Wave Antennas

One of the advantages of using mm-Wave frequencies is that the short wavelength makes efficient antennas much smaller. The 5mm wavelength at 60GHz potentially allows the antenna to be integrated onto the silicon die, further reducing the cost. However, the lossy nature of the CMOS substrate as well as its thin metal limits the performance of on-chip antennas at mm-Wave frequencies; the planar nature of an on-chip antenna also significantly limits its radiation angle within the chip plane. In this chapter, we demonstrate a slot-loop antenna design with $\sim 30\%$ efficiency. With the same antenna structure, antenna diversity is achieved by the multi-access approach, overcoming the coverage limit of a conventional antenna.

6.1 On-Chip Antenna Efficiency

Many different on-chip antenna structures at 60GHz have been studied in the past [59, 60, 61, 62], demonstrating the feasibility of integration. Although most of these antennas are fairly efficient, they are in general sensitive to nearby ground planes. In other words, a large keep-out area needs to be maintained in order to achieve high efficiency. When the antenna is integrated with the rest of the on-chip circuitry, this leads to a large footprint. The large keep-out area also increases the loss of the interconnection between the circuitry and the antenna.

To overcome this issue, aperture-based designs can be used because of their intrinsic ground plane. In principle, almost any antenna can be converted into an aperture-based antenna by replacing the conductor with dielectric and vice versa (Fig. 6.1) [3].

While presenting performance similar to that of the slot-loop antenna, the slot dipole antenna has a length close to $\lambda/2$, making it hard to integrate within a 2D array. The slot-loop antenna occupies a larger area but has a smaller maximum dimension, making it appealing for our implementation. Therefore, the rest of this chapter will focus on the design of slot-loop antennas.
6.1.1 Slot-Loop Antenna Design

The most important design metric for an on-chip antenna is its efficiency. Although it is possible to calculate the radiation efficiency as well as the loss mathematically, the presence of the lossy substrate makes this calculation tedious and difficult. The approach we take here is to analyze the antenna radiation resistance without considering the substrate. After obtaining a close-to-optimum sizing, the EM tool HFSS is used to further optimize the efficiency of the antenna.

6.1.1.1 Diameter

The circumference of the loop or slot-loop antenna is the dominant parameter to achieve high efficiency because it directly determines the radiation resistance. It is desirable to maximize radiation resistance in order to achieve the highest radiation efficiency. The radiation resistance of a loop antenna has been studied in the past [63] and Fig. 6.2(a) shows the...
resistance versus the circumference with the same calculation. The peak resistance happens around half wavelength for a loop antenna. However, the slot loop antenna has a different resistance. Based on Babinet’s principle [3], the impedances of a normal antenna \(Z_N\) and the slot antenna \(Z_S\) are related by:

\[
Z_N Z_S = \frac{\eta^2}{4}. \tag{6.1}
\]

where \(\eta\) is the intrinsic impedance of the medium, which is \(120\pi\,\Omega\) in air.

The radiation resistance can therefore be calculated as shown in Fig. 6.2(b). A circumference of 1.05\(\lambda\) provides the first peak of the impedance, resulting in a peak efficiency. At 60GHz with a dielectric constant of \(\sim 4\), the wavelength is 2.5mm. The diameter is therefore close to \(2.5\text{mm}/\pi \approx 800\mu\text{m}\). In reality, due to the loading effect of the high dielectric constant (11.7) substrate, the effective wavelength is shorter, resulting in a smaller antenna. As shown in the simulation in Fig. 6.3, the efficiency peaks at a diameter of 670\(\mu\text{m}\). The 30\% efficiency of the slot-loop antenna is lower than the maximum efficiency of a loop antenna (\(\sim 35\%\)) due to its lower radiation resistance. However, when a ground plane is within half a wavelength, the efficiency of the loop antenna is significantly reduced to \(\sim 20\%\).
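The first-order sizing above amounts to a few lines of arithmetic, sketched here for illustration. The helper names are hypothetical; the only inputs taken from the text are the 60GHz operating frequency, the dielectric constant of about 4, the 1.05\(\lambda\) circumference, and Babinet's relation in equation 6.1.

```python
import math

C0 = 299_792_458.0    # speed of light in vacuum, m/s
ETA0 = 120 * math.pi  # intrinsic impedance of air, in ohms


def guided_wavelength(freq_hz, eps_r):
    """Wavelength in a medium with relative dielectric constant eps_r."""
    return C0 / (freq_hz * math.sqrt(eps_r))


def loop_diameter(circumference_in_wavelengths, wavelength):
    """Diameter of a circular loop whose circumference is given in wavelengths."""
    return circumference_in_wavelengths * wavelength / math.pi


def slot_impedance(z_normal, eta=ETA0):
    """Babinet's principle (equation 6.1): Z_S = eta^2 / (4 * Z_N)."""
    return eta ** 2 / (4 * z_normal)


if __name__ == "__main__":
    lam = guided_wavelength(60e9, 4.0)
    print(f"wavelength = {lam * 1e3:.2f} mm")
    print(f"diameter for 1.00 lambda = {loop_diameter(1.00, lam) * 1e6:.0f} um")
    print(f"diameter for 1.05 lambda = {loop_diameter(1.05, lam) * 1e6:.0f} um")
    # Illustrative complementary-antenna impedance, not a value from the text.
    print(f"slot impedance for Z_N = 100 ohm: {slot_impedance(100.0):.0f} ohm")
```

This estimate ignores the silicon substrate, which is why the simulated optimum of 670µm is smaller than the value computed here.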

![Figure 6.2: Radiation resistance of (a) loop antenna and (b) slot loop antenna.](image)

6.1.1.2 Gap Width

Having a minor effect on the efficiency, the gap width (Fig. 6.4) primarily impacts the bandwidth of the antenna. One measure that can be used to quantify this effect is the \(S_{11}\) bandwidth. As illustrated in Fig. 6.5, \(S_{11} < -15dB\) is chosen as the criterion, and the bandwidth initially increases with the gap width. When the gap width exceeds 40\(\mu\text{m}\), the inductance of the feed line starts to decrease the bandwidth. For large gap widths, the feed structure also starts to affect the radiation pattern. In this design, the gap width was chosen to be 20µm, providing a sufficient bandwidth of roughly 10GHz.
6.1.1.3 Substrate Thickness

The doped silicon substrate is one major source of loss for on-chip antennas. Not only does the substrate present a resistive load to the antenna, surface waves in the substrate also lead to significant loss in the antenna [64]. Thinning the die therefore mitigates this issue. Simulation (Fig. 6.6) shows a significant improvement in efficiency from 25% to over 30% by thinning the die from a typical 250µm to a 100µm thickness.

6.2 Multiple-Access On-Chip Antenna

Although it does support reasonable in-plane radiation compared to other structures, a slot-loop antenna still displays a null in the feed direction due to its planar nature (Fig. 6.7).

Fortunately, the rotationally symmetric shape of the slot-loop antenna provides an opportunity to drive it at different points, achieving antenna diversity using the same radiating structure. The fact that the antenna is integrated on the same chip with all the transistors eases the connection between the circuits and the antenna, eliminating extra high frequency I/O’s that would be required with off-chip antennas. As shown in Fig. 6.8, the radiation pattern when driving at P2 covers the null direction of the pattern when driven at P1. The overall coverage of the entire system is therefore boosted. Simulation shows that the worst case gain normalized to the peak is -3dB at the cross point of these two radiation patterns.
The same principle can also be utilized for the purpose of T/R switching. Similar to [65], connecting the transmitter and receiver at different locations removes the explicit T/R switch, reducing the loss of the front-end. In the example shown in Fig. 6.9, the antenna has four different ports: two of them are connected to the transmitters (TX1, TX2) and the other two are connected to the receivers (RX1, RX2). The two receivers are orthogonal to each other, and therefore provide full coverage. The transmitters are also orthogonal to each other, but they are rotated 45° from the receivers to ease the integration with other building blocks.
6.2.1 Antenna Multiplexing

To switch between different drive points, the entire transmitter is duplicated and each copy can be turned on/off independently. When a transmitter is off, its input clock is gated through control signals and the biases of the oscillator as well as the power amplifier are grounded. However, the receiver has too large a footprint to duplicate, mainly due to its multiple stages and the bulky hybrid. The multiplexing is therefore done in the first-stage LNA, realized by two identical cascode LNAs whose drains are shorted and whose gate biases are independently controlled (Fig. 6.10). Depending on the cascode bias control as well as the input device control, signals from different driving points are passed to the following stages.

![Figure 6.10: Input multiplexer embedded in low noise amplifier.](image)

6.2.2 Reducing Loading Effect

The major concern when adding multiple drivers/receivers to the antenna is that their additional loading can degrade the efficiency. For a slot antenna, it is important that each driver or receiver maintains a high impedance when it is not used. This is much easier to achieve than in a loop structure, where a low impedance must be presented between the two terminals of the unused drive point and a large device size is therefore required.

On the transmitter side, because the off power amplifier presents only the drain capacitance of the transistor in parallel with the transformer, it intrinsically presents a high impedance. The off impedance is limited by the parallel impedance of the transformer, which is close to 200Ω in our design, sufficiently high for our purposes. Therefore, the antenna switching in the transmitter can simply be achieved by turning off the power amplifier bias.

However, the proposed LNA/MUX scheme has a significant issue: the input impedance of the LNA in the off state is low. In an inductively source degenerated LNA (Fig. 6.11), the input impedance is

\[
Z_{in} = sL_g + sL_s + \frac{1}{sC_{gs}} + \frac{g_m}{C_{gs}}L_s. \tag{6.2}
\]
The LNA is usually designed to have $L_g + L_s$ resonating with $C_{gs}$, and the real part of the impedance is designed to be 50Ω when the transistor is on. However, when the transistor is off, the real part of the impedance goes away, leaving the input impedance as a short (Fig. 6.11(c)). This will significantly load the antenna and is thus undesirable. To solve this issue, a $\lambda/4$ transmission line is inserted in front of the receiver, transforming the impedance up when the LNA is off and maintaining the match when the LNA is on. This transmission line also serves the purpose of routing, bringing the input signals to a single location on the die, as shown in Fig. 6.12. Because the gates of the two LNAs are DC shorted through the transmission line and the antenna, it is desirable to add an AC coupling capacitor (Fig. 6.12) to isolate the biases so that they can be adjusted independently.
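To make the on/off behaviour concrete, the sketch below evaluates equation 6.2 and the quarter-wave transformation $Z_0^2/Z_L$ for one set of element values. Every number in it is an illustrative assumption rather than a value from the design: $C_{gs}$ = 30fF, $g_m$ = 30mS, a 2Ω residual loss resistance (added so that the off-state impedance is small but finite), and a 50Ω line. For simplicity it also assumes $C_{gs}$ does not change when the device turns off.

```python
import math


def lna_input_impedance(omega, l_g, l_s, c_gs, g_m, r_loss=0.0):
    """Equation 6.2 plus an optional series loss resistance (an added assumption)."""
    s = 1j * omega
    return s * l_g + s * l_s + 1 / (s * c_gs) + g_m * l_s / c_gs + r_loss


def quarter_wave_transform(z_load, z0=50.0):
    """Impedance seen through a lambda/4 line of characteristic impedance z0."""
    return z0 ** 2 / z_load


def example_values():
    """Hypothetical element values chosen for resonance at 60GHz and a 50 ohm match."""
    omega = 2 * math.pi * 60e9
    c_gs, g_m = 30e-15, 30e-3
    l_s = 50.0 * c_gs / g_m              # sets g_m * L_s / C_gs = 50 ohm
    l_g = 1 / (omega ** 2 * c_gs) - l_s  # L_g + L_s resonates with C_gs
    return omega, l_g, l_s, c_gs, g_m


if __name__ == "__main__":
    omega, l_g, l_s, c_gs, g_m = example_values()
    z_on = lna_input_impedance(omega, l_g, l_s, c_gs, g_m)
    z_off = lna_input_impedance(omega, l_g, l_s, c_gs, 0.0, r_loss=2.0)
    print("LNA on :", z_on, "->", quarter_wave_transform(z_on))
    print("LNA off:", z_off, "->", quarter_wave_transform(z_off))
```

With these assumed values the line leaves the 50Ω match untouched when the LNA is on and raises the near-short off-state impedance to more than a kilohm.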

![Figure 6.11: (a) Schematic of source degenerated LNA and its impedance with bias (b) on and (c) off.](image)

6.3 Measurement Results

The antenna was fabricated in a 65nm G+ CMOS technology, and its pattern has been measured using an external horn antenna (QuinStar QWH-VPRS00) with 20dB gain at 60GHz. The measurement setup is shown in Fig. 6.13. Because we do not have separate antenna test structures, the antenna cannot be characterized independently. Therefore, the antenna was measured together with the transceiver instead.
Figure 6.12: $\lambda/4$ transmission line inserted for impedance transformation while also serving the purpose of routing the antenna ports to a single location.

Figure 6.13: Antenna measurement setup.

6.3.1 Measured Transmitter Power and Receiver Gain

Using a single transmitter, the transmitter’s radiation pattern was measured using a 70GHz sampling scope. Although a close-range measurement provides a large signal to overcome the noise in the oscilloscope, near-field coupling of the horn antenna with the test board makes such measurement results inaccurate. Fig. 6.14 therefore shows the measured signal amplitude versus distance in a single direction. The red curve sketches the ideal free-space path loss equation normalized to the power measurement at 25cm distance. As shown here, the measurements taken beyond 15cm match the theory well, showing negligible near-field coupling. The radiated power in this direction can be calculated using the data at 25cm as $-60 + 56 = -4dBm$. As mentioned before, the simulated output power from the transmitter is 0dBm. The antenna simulation shows a 30% efficiency with 2dBi gain in the direction used in this measurement. Therefore the simulation predicts a radiated power of -3.2dBm, which is 0.8dB higher than our measurement.

![Figure 6.14: Transmitter radiated power versus distance.](image)

Similarly, the receiver antenna pattern is also characterized using a single element. Although it should show similar near-field effects as the transmitter, it is desirable to re-characterize the receiver itself. As shown in Fig. 6.15, the receiver also shows negligible coupling beyond 10cm range. The total gain of the receiver again can be calculated using the data at 25cm as $-38.2 + 56 = 17.8dB$. The simulated gain of the receiver chain is 22dB and therefore the expected gain in this direction is $22 - 5.2 + 2 = 18.8dB$, which is roughly 1dB higher than our measurement. These measurement results from transmitter and receiver suggest that the antenna has an efficiency higher than 25% if all the circuits work as simulated.
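The link-budget arithmetic in this section can be reproduced with a short script. The sketch below takes the measured readings quoted in the text (-60dBm and -38.2dB at 25cm) as given and computes only the free-space path loss and the simulation-based predictions; the function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def fspl_db(distance_m, freq_hz):
    """Free-space path loss between isotropic antennas, in dB."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C0)


def to_db(ratio):
    """Convert a power ratio to dB."""
    return 10 * math.log10(ratio)


def predicted_level_db(circuit_db, efficiency, directivity_dbi):
    """Circuit output power or gain, reduced by antenna efficiency, plus directional gain."""
    return circuit_db + to_db(efficiency) + directivity_dbi


if __name__ == "__main__":
    loss = fspl_db(0.25, 60e9)
    print(f"path loss at 25cm: {loss:.1f} dB")
    print(f"measured TX radiated power: {-60 + loss:.1f} dBm")
    print(f"predicted TX radiated power: {predicted_level_db(0.0, 0.30, 2.0):.1f} dBm")
    print(f"measured RX gain: {-38.2 + loss:.1f} dB")
    print(f"predicted RX gain: {predicted_level_db(22.0, 0.30, 2.0):.1f} dB")
```

The 30% efficiency corresponds to the 5.2dB loss used in the text, and the path loss at 25cm evaluates to the quoted 56dB.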
6.3.2 Antenna Pattern Measurement

To measure the radiation pattern, the horn antenna was positioned 15cm away from the chip and the amplitude at each angle was recorded while rotating the horn antenna. Depending on the plane in which the horn antenna is swept, samples of the entire 3D pattern can be captured (Fig. 6.16). The two transmitters connected to the same antenna show orthogonal end-fire radiation patterns, as expected. The nulls are not as deep as in the simulation results because the signal level in these directions reaches the noise floor of the sampling scope. On the other hand, when sweeping orthogonal to the PCB, the radiation pattern shows a null or full coverage depending on whether it is in the feed direction or not, matching our simulated antenna pattern. The receiver antenna pattern (Fig. 6.16) also demonstrates the desired antenna characteristics. Notice that the in-plane radiation patterns of the transmitter and receiver are quite different; this is because the transmitters are connected to the antenna with a 45° rotation from the receivers.

6.4 Summary

In this chapter, the design of efficient on-chip antennas is discussed. A slot-loop antenna was chosen in this design because its intrinsic ground plane eases the integration with other circuitry. Design procedures for the antenna were shown and an antenna with 30% simulated efficiency was implemented on a CMOS die. To overcome the limited radiation coverage in the end-fire direction, a multi-access antenna structure is proposed to implement antenna diversity using a single antenna. The T/R switch is also eliminated using the same idea. The measurement results confirmed that the antenna efficiency is above 25% while demonstrating the expected full coverage in the end-fire direction.
Chapter 7

Fully Integrated 4-Element Phased-Array

The 4-element transceiver array was fabricated in TSMC’s 65nm CMOS G+ technology to demonstrate our proposed techniques (Fig. 7.1). The antennas are placed in a 2X2 fashion with 1.4mm spacing between them. This spacing is less than $\lambda/2$ and was limited by the total silicon die area.

Figure 7.1: Die photo of complete transceiver.
7.1 Phased-Array Measurement

7.1.1 Transmitter Phased-Array Measurement

Similar to the transmitter antenna pattern measurement mentioned before, the array measurement of the transmitter was performed using an external horn antenna and the 70GHz sampling scope.

To achieve the optimum phase settings, the horn antenna was placed at the desired direction and only one transmitter was turned on. The second transmitter was then turned on and its phase setting was swept to maximize the signal amplitude acquired on the scope. Similar procedures were taken for the third and then the last transmitter, resulting in maximum EIRP in the desired direction.
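The sequential alignment procedure described above can be sketched as a greedy search. This is an illustration only: the per-element phase offsets and the 16-step phase resolution are hypothetical, and the `combined_amplitude` function stands in for the amplitude read from the sampling scope.

```python
import cmath
import math


def combined_amplitude(offsets, settings, active, steps):
    """Amplitude seen at the horn from the active, unit-amplitude elements."""
    total = 0j
    for k in active:
        phase = offsets[k] + 2 * math.pi * settings[k] / steps
        total += cmath.exp(1j * phase)
    return abs(total)


def calibrate(offsets, steps=16):
    """Turn elements on one at a time and sweep each new element's phase."""
    settings = [0] * len(offsets)
    active = [0]                      # first transmitter on, setting unchanged
    for k in range(1, len(offsets)):
        active.append(k)
        best_setting, best_amp = 0, -1.0
        for s in range(steps):        # sweep the newly enabled element only
            settings[k] = s
            amp = combined_amplitude(offsets, settings, active, steps)
            if amp > best_amp:
                best_setting, best_amp = s, amp
        settings[k] = best_setting
    return settings


# Hypothetical unknown path-phase offsets (radians) for four elements.
offsets = [0.3, 2.1, -1.0, 4.4]
settings = calibrate(offsets)
final_amp = combined_amplitude(offsets, settings, range(4), 16)
print(settings, round(final_amp, 2))
```

Because each element is aligned to the running sum to within half a phase step, the final amplitude in this model stays close to the ideal value of 4 for four equal-amplitude elements.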

Once the per-element phases were set, the horn antenna was then swept across different planes to sample the radiation pattern in space. Fig. 7.2 shows an example where the phased-array is pointing at 45° inside the chip plane and the array pattern is shown in black. Compared to the single antenna pattern (shown in red), a 12dB improvement is achieved as expected. The peak-null ratio is more than 21dB.
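The expected 12dB improvement follows from coherent field addition of four elements, each radiating the same power as the single reference element: $20\log_{10}4 \approx 12$dB. The sketch below evaluates an idealized in-plane array factor for a 2×2 array with 1.4mm spacing steered to 45°. It assumes isotropic, uncoupled elements at 60GHz, which the real antennas are not, so only the peak value should be compared with the measurement.

```python
import cmath
import math

C = 299_792_458.0        # speed of light in m/s
FREQ_HZ = 60e9           # assumed carrier frequency
SPACING_M = 1.4e-3       # element spacing from the text
K = 2 * math.pi * FREQ_HZ / C   # free-space wavenumber

# Element centers of the 2x2 array, in the chip plane.
POSITIONS = [(x, y)
             for x in (-SPACING_M / 2, SPACING_M / 2)
             for y in (-SPACING_M / 2, SPACING_M / 2)]


def path_phase(x, y, angle_deg):
    """Phase advance of an element at (x, y) toward an in-plane direction."""
    a = math.radians(angle_deg)
    return K * (x * math.cos(a) + y * math.sin(a))


def steering_weights(angle_deg):
    """Per-element phase weights that align all elements at angle_deg."""
    return [cmath.exp(-1j * path_phase(x, y, angle_deg)) for x, y in POSITIONS]


def array_gain_db(weights, angle_deg):
    """Field gain relative to one element, in dB, at angle_deg."""
    field = sum(w * cmath.exp(1j * path_phase(x, y, angle_deg))
                for w, (x, y) in zip(weights, POSITIONS))
    return 20 * math.log10(max(abs(field), 1e-12))


weights = steering_weights(45)
peak_db = array_gain_db(weights, 45)
peak_angle = max(range(360), key=lambda a: array_gain_db(weights, a))
print(f"gain at 45 deg: {peak_db:.2f} dB, peak found at {peak_angle} deg")
```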

![TX Radiation Pattern, Measured@10cm](image)

Figure 7.2: Transmitter antenna array pattern inside PCB plane.
7.1.2 Receiver Phased-Array Measurement

Similar procedures were taken for the receiver phased-array measurement. Instead of using a high frequency sampling scope, a 6GHz oscilloscope was used to monitor the baseband output. With the first receiver phase shifter setting unchanged, the phase settings of the rest of the phase shifters were swept. Because of the way phase shifting was implemented in the receivers, only a subset of coefficients whose amplitude is within the 1dB boundary of the ideal curve were used. Once the per-element phases were set, the radiation pattern was measured to verify the functionality of the array. Fig. 7.3 shows the measurement of the array (black) compared to the single receiver (red), demonstrating a functioning phased-array receiver.

![RX Antenna Pattern, Measured@10cm](image)

Figure 7.3: Receiver antenna array pattern inside PCB plane.

7.2 Link Measurement

To demonstrate a working wireless link, two transceiver chips are used together. Fig. 7.4 shows two different measurement configurations. In the first case, the two boards are placed in the same plane and the communication relies on the in-plane radiation. The second case puts the two boards in different planes and thus the utilized radiation is mainly broadside. As seen in the antenna measurement section, the end-fire direction tends to have slightly less gain and so it is expected to have a shorter range.
Using a PRBS-15 sequence at 5.2GS/s (10.4Gb/s), the eye diagram can be captured on the oscilloscope as shown in Fig. 7.5. Notice that the eye diagram is different from normal NRZ eye diagrams because the waveform shape of the transmitter is closer to RZ modulation.

Although the eye diagram indicates the quality of the received signal, the bit-error rate (BER) is more straightforward to measure and quantify. Fig. 7.6 shows the BER versus distance in these two cases. In the broadside case, there is no error within $10^7$ bits up to a distance of 55cm. The end-fire direction has a slightly higher BER for the same communication distance, as expected. The first error within $10^7$ bits shows up at a distance of 45cm.

![Figure 7.6: BER measurement versus distance, different directions.](image)

It is desirable to compare our measurement results with the theoretical link budget, which follows from the Friis transmission equation:

$$SNR = \frac{P_{TX} N_{TX}^2 \eta_{ant}^2 N_{RX}}{N_o} \left( \frac{\lambda}{4\pi d} \right)^2. \tag{7.1}$$

Here both $N_{TX}$ and $N_{RX}$ are equal to 4 and the noise figure of each receiver is 7dB based on simulation results. Using the measured 0dBm transmitter output power as well as an antenna efficiency of 25%, the SNR is estimated to be 14dB (Fig. 7.7). On the other hand, the SNR can be back-calculated as 7dB from the measured $10^{-7}$ BER. This suggests that our system has a 7dB SNR loss compared to its theoretical value. Several impairments can potentially cause this degradation. For example, although quite small based on the eye diagram, inter-symbol interference (ISI) will degrade the BER. Imperfect alignment of the boards can also misalign the polarization, introducing further loss to the system.

### 7.3 Summary

As shown in this chapter, the transceiver demonstrates a working 10.4Gb/s link over the air. Because of our antenna switching technique, the transceiver covers at least 45cm overall in all directions with a BER less than $10^{-7}$. Together with the overall 115mW power consumption, this demonstrates a fully integrated 60GHz phased-array transceiver while achieving excellent overall energy efficiency (Table 7.1).
Table 7.1: Comparison table of mm-wave transceiver designs.

<table>
<thead>
<tr>
<th>Technology</th>
<th>[66]</th>
<th>[18]</th>
<th>[67]</th>
<th>[68]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Modulation</td>
<td>QPSK</td>
<td>QPSK</td>
<td>ASK</td>
<td>OOK</td>
<td>QPSK</td>
</tr>
<tr>
<td>Antenna</td>
<td>Bond Wire</td>
<td>N/A</td>
<td>Bond Wire</td>
<td>On-board Patch array</td>
<td>On-chip Slot loop</td>
</tr>
<tr>
<td>Phased Array</td>
<td>No</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Data Rate</td>
<td>2.62Gb/s</td>
<td>10Gb/s</td>
<td>11Gb/s</td>
<td>3.3Gb/s</td>
<td>10.4Gb/s</td>
</tr>
<tr>
<td>Communication Range</td>
<td>5cm</td>
<td>N/A</td>
<td>1.4cm</td>
<td>60cm</td>
<td>&gt;40cm</td>
</tr>
<tr>
<td>TX Power</td>
<td>160mW</td>
<td>137mW</td>
<td>29mW</td>
<td>183mW</td>
<td>50mW</td>
</tr>
<tr>
<td>RX Power</td>
<td>233mW</td>
<td>137mW</td>
<td>41mW</td>
<td>103mW</td>
<td>65mW</td>
</tr>
<tr>
<td>Power/Element (TRX)</td>
<td>393mW</td>
<td>68mW</td>
<td>80mW</td>
<td>286mW</td>
<td>29mW</td>
</tr>
</tbody>
</table>
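The "This work" column of Table 7.1 can be checked with simple arithmetic. The sketch below uses only the 50mW transmitter power, 65mW receiver power, four elements, and 10.4Gb/s data rate quoted in this chapter; the energy-per-bit figure is a derived quantity that the table does not list.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: mW divided by Gb/s equals pJ/bit."""
    return power_mw / data_rate_gbps


TX_MW = 50.0          # transmitter power, this work
RX_MW = 65.0          # receiver power, this work
ELEMENTS = 4          # number of array elements
RATE_GBPS = 10.4      # demonstrated data rate

total_mw = TX_MW + RX_MW                      # total transceiver power
per_element_mw = total_mw / ELEMENTS          # Power/Element (TRX) row
pj_per_bit = energy_per_bit_pj(total_mw, RATE_GBPS)

print(f"total power      : {total_mw:.0f} mW")
print(f"power per element: {per_element_mw:.2f} mW")
print(f"energy per bit   : {pj_per_bit:.1f} pJ/bit")
```

The per-element value of 28.75mW rounds to the 29mW entry in the table, and the total matches the 115mW quoted in the summary.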
Chapter 8

Conclusions

8.1 Thesis Summary

Advances in wireless technologies have enabled many applications for handset devices, and applications requiring higher data rates have emerged recently. Although recent advances in WiFi technologies have promised data rates up to 1.7Gb/s in a 2X2 MIMO system, the limited bandwidth at 2.4GHz limits the ultimate data rate of the system. To overcome this issue, this thesis focuses on the unlicensed 7GHz of bandwidth in the 60GHz band to build wireless communication systems supporting 10+Gb/s.

The short wavelength at mm-wave frequencies promises higher SNR in aperture-limited applications. To fully utilize this advantage, phased-array techniques are often employed to increase the effective antenna gain. However, classic phased-array implementations are often power hungry, due to the limited performance of devices when operating close to the device $f_T$. This thesis analyzed the power-performance tradeoffs in phased-array systems, concluding that the overhead power consumption in each element is ultimately the limiting factor of the system. This conclusion guides the rest of the study on how to implement phased-arrays in CMOS technology in an energy-efficient manner.

One of the key differences between a phased-array transceiver and non-array transceivers is the phase shifting and combining. Different implementation schemes, including RF, LO, and baseband phase shifting, were discussed to compare their power consumption. Due to high varactor loss at 60GHz, varactor-based phase shifters are often more power hungry than baseband phase shifting schemes, where the baseband amplifier can be modified to accommodate the phase shifting functionality. It was shown that moving this functionality to the lower-frequency baseband achieves much lower power consumption while providing more flexibility and robustness.

A new transmitter architecture utilizing a fast start-up oscillator is proposed to further reduce the overhead power in the transmitter. Conventional approaches usually require a complex modulator as well as two LO buffers to provide enough LO swing, consuming at least $\sim$12mW, limited by the maximum on-chip impedance at this frequency. The proposed architecture merges the phase modulation functionality into the fast start-up oscillator by modulating the phase of the start-up signal. The phase shifting functionality is thus moved to the baseband domain where sub-ps resolution delay elements can be implemented efficiently. The phased-array functionality is then a natural extension of the phase modulation capability. With the proposed fast start-up oscillator, the transmitter achieves roughly 5% average efficiency and 10% efficiency if the peak amplitude is considered, which is more than a 2X improvement over the prior state of the art. The proposed architecture also reduces the overhead power to \( \sim 2\text{mW} \), making it even more attractive for use in a phased-array.
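To see why sub-ps delay resolution matters, the sketch below converts a baseband delay into the equivalent carrier phase. It assumes that a delay $\Delta t$ on the start-up signal maps directly to a carrier phase of $360^\circ \cdot f \cdot \Delta t$ at 60GHz; the 5° phase-step target is a hypothetical value chosen only for illustration.

```python
def delay_to_phase_deg(delay_s, carrier_hz=60e9):
    """Carrier phase shift in degrees produced by a given time delay."""
    return 360.0 * carrier_hz * delay_s


def phase_to_delay_s(phase_deg, carrier_hz=60e9):
    """Time delay needed for a given carrier phase shift."""
    return phase_deg / (360.0 * carrier_hz)


phase_per_ps = delay_to_phase_deg(1e-12)       # phase shift of a 1ps delay
delay_for_5_deg = phase_to_delay_s(5.0)        # hypothetical 5 degree step

print(f"1 ps of delay  = {phase_per_ps:.1f} degrees at 60GHz")
print(f"5 degree step  = {delay_for_5_deg * 1e12:.2f} ps")
```

Under this assumption a 1ps delay already corresponds to 21.6° of carrier phase, so fine phase steps require delay steps well below 1ps.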

Although the receiver suffers from a very similar problem of demodulator and LO buffer power consumption, the same architecture cannot be applied at the receiver since the incoming signal is small. In order to eliminate the LO buffer as well as multiple lossy matching networks, an architecture using a mixer stacked on top of the LO buffer is proposed. This allows low impedance routing at the natural low impedance node, resulting in boosted efficiency. A push-push structure with embedded impedance transformation is also proposed to replace the normal buffer and further reduce the power consumption. This requires LO generation at 30GHz rather than 60GHz, which is achieved efficiently by a 30GHz resonant injection-locked tripler. This lower frequency operation boosts the quality factor of the tank and therefore improves the efficiency as well as the phase noise performance. The overall system provides 22dB gain in simulation with just 65mW of power consumption, showing the best published efficiency to date.

The short wavelength at high frequencies also allows integrating antennas on silicon to further reduce the system cost. The intrinsic ground plane of aperture based antennas allows easy integration with the other circuitry. Therefore the slot-loop antenna is chosen in our design. Although achieving efficiency as high as 30% in simulation, the antenna has limited coverage in end-fire directions due to its planar nature. To solve this issue, multiple driving points on the same slot-loop antenna are utilized to implement antenna diversity and T/R switches. The challenge however is to prevent loading effects between different operating modes. Although the transmitter provides a relatively high impedance to the antenna when the bias is turned off, the receiver front-end naturally exhibits low input impedance. Quarter wavelength transmission lines were therefore inserted to achieve impedance transformation as well as signal routing. Measurement results verify the proposed antenna diversity, showing an efficiency close to 30% and full spatial coverage with maximum 3dB loss compared to the peak angle.

Utilizing the proposed transmitter architecture with a fast start-up oscillator, the stacked mixer structure in the receiver, and the on-chip antenna multiplexing scheme, a fully integrated 4-element phased-array transceiver was fabricated and tested over the air. The measurement results demonstrated a coverage of \( \sim 0.5\text{m} \) with a bit-error rate lower than \( 10^{-7} \). The total power consumption of the system is 115mW, representing the state of the art in efficiency. The proposed architectures also significantly reduce overhead power, making them attractive for phased-array designs.
8.2 Future Directions

This thesis focused on applications requiring relatively short range within a room. There is an opportunity to further investigate how to achieve high efficiency for long communication range, especially in cellular systems where asymmetric links can be implemented. Due to their asymmetric nature, large arrays of transmitters at the base stations can be employed using the techniques proposed here for better efficiency. One of the challenges with high-data-rate, large-aperture systems is that the time of flight across the array might be longer than the bit period, and therefore it is necessary to implement timed-arrays rather than phased-arrays. Our proposed transmitter architecture can be easily modified to support timed-arrays, and thus it is interesting to study its adoption in such systems.
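The timed-array condition can be illustrated by comparing the propagation delay across the aperture with the symbol period. The sketch below assumes a linear aperture with the steering angle measured from broadside, uses the 5.2GS/s symbol rate from Chapter 7, and treats the 10cm aperture as a hypothetical large array.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def delay_across_array_s(aperture_m, steer_deg):
    """Wavefront delay between the two ends of the aperture."""
    return aperture_m * math.sin(math.radians(steer_deg)) / C


def delay_to_symbol_ratio(aperture_m, steer_deg, symbol_rate_hz):
    """Delay across the array as a fraction of one symbol period."""
    return delay_across_array_s(aperture_m, steer_deg) * symbol_rate_hz


SYMBOL_RATE_HZ = 5.2e9

# Array in this work: 1.4mm between elements, worst case at end-fire.
small = delay_to_symbol_ratio(1.4e-3, 90, SYMBOL_RATE_HZ)

# Hypothetical 10cm aperture at the same symbol rate.
large = delay_to_symbol_ratio(0.10, 90, SYMBOL_RATE_HZ)

print(f"1.4mm aperture: {small:.3f} of a symbol period")
print(f"10cm aperture : {large:.2f} symbol periods")
```

For the 1.4mm spacing used here the delay is a few percent of a symbol period, so phase shifting alone suffices, whereas the hypothetical 10cm aperture exceeds a full symbol period and would call for true time delay.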

At short distances, higher data rate is one of the most desired features for the near future. Compared to their wired counterparts, wireless links have so far been limited to $\sim 10\text{Gb/s}$. Multiple avenues can be explored for higher data rates. MIMO systems utilizing multiple paths in space are one approach. Although fundamentally requiring multi-path, MIMO can potentially boost the channel capacity significantly [69]. The associated power challenge is due to the increased dynamic range on the signal path. Moving up in spectrum for a wider bandwidth is another good candidate. Many recent efforts have pushed the operating frequency of CMOS beyond the 200GHz regime [70, 65]. Utilizing these frequencies can potentially achieve $\sim 50\text{Gb/s}$ data rates. However, with current technologies, the energy required to generate these high-frequency carriers is rather high, preventing a low-cost implementation. Investigating improvements to this efficiency is an interesting direction.

Besides the future communication systems described above, mm-wave is also known as a good candidate for imaging systems due to its short wavelength [71]. Given the very different system requirements in these two applications, research can be conducted to re-optimize the performance for an imaging system. Nevertheless, many circuit-level techniques described in this thesis, including efficient RF pulse generation and precise timing control, can potentially be extended to these systems, enabling low-power and low-cost designs.
Bibliography


