# Design of the Aggregator Module ASIC for the Octopus-Mimetic Neural Implant (OMNI) 



Electrical Engineering and Computer Sciences University of California at Berkeley

Technical Report No. UCB/EECS-2016-77
http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-77.html
May 13, 2016

Copyright © 2016, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

by Nathaniel Anthony Mailoa

## Research Project

Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II.

Approval for the Report and Comprehensive Examination:


## Abstract

Design of the Aggregator Module ASIC for the Octopus-Mimetic Neural Implant (OMNI)

by Nathaniel Anthony Mailoa

Master of Science in Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Elad Alon, Chair

The Aggregator Module (AM) is one of the 3 modules in the Octopus-Mimetic Neural Implant (OMNI) Brain-Machine Interface system. The AM is responsible for communication between the Control Module (CM) and the Neuromodulator Modules (NMs) as well as power distribution to the NMs. The current prototype of the AM is implemented on a PCB and consumes 1.6 mW. This project aims to move the design to an ASIC to reduce power consumption and area.

The AM system consists of a Voltage Rectification block and a Voltage Regulation block that produce a constant 1 V supply from the 3.3 Vpp AC power driven by the CM. The AM also recovers a 20 MHz clock from the power signals, which is used for the operation of the Digital Logic block as well as the charge pumps. Lastly, the AM contains power switches driven by signals from the Digital Logic block that are level-shifted to the voltages generated by the charge pumps. These signals are used to drive analog power switches for each NM.

The ASIC implementation, excluding I/O cells, consumes $87.82 \mu \mathrm{~W}$, more than an order of magnitude lower than the current PCB prototype.

## Contents

- List of Figures
- List of Tables
- 1 Introduction
  - 1.1 The Octopus-Mimetic Neural Implant (OMNI)
  - 1.2 The Aggregator Module (AM)
- 2 Digital Logic
  - 2.1 Digital Logic Block Operation
  - 2.2 Digital Logic Block Synthesis
- 3 Voltage Rectification
  - 3.1 Rectifier Bridge
  - 3.2 Active Diode
  - 3.3 Active Diode Comparator
- 4 Voltage Regulation
  - 4.1 Voltage Reference Generator
  - 4.2 Low-Dropout Regulator (LDO)
- 5 Clock Recovery
- 6 Power Switches
  - 6.1 Positive Charge Pump
  - 6.2 Negative Charge Pump
  - 6.3 Logic Level Shifter
  - 6.4 Power Switch
- 7 Conclusion
- Bibliography

## List of Figures

- 1.1 OMNI System Illustration
- 1.2 OMNI System-Level Block Diagram
- 1.3 AM System Block Diagram
- 2.1 Digital Logic Block Layout and Layout Without Routing
- 3.1 Voltage Rectification Block Diagram
- 3.2 Common Rectifier Schematic
- 3.3 Cross-coupled Rectifier Bridge Schematic
- 3.4 Cross-coupled Bridge Rectifier Transient
- 3.5 Active Diode Schematic
- 3.6 Active Diode Transient
- 3.7 Comparator B
- 3.9 Active Diode Comparator Stage 1 Transfer Function
- 3.10 Active Diode Comparator Transient
- 3.11 Voltage Rectifier Block with and without Active Diode
- 4.1 Voltage Regulation Block Diagram
- 4.2 Subthreshold Voltage Reference Schematic
- 4.3 Subthreshold Voltage Reference PSRR
- 4.4 Subthreshold Voltage Reference Variations to Supply Voltage and Temperature
- 4.5 Subthreshold Voltage Reference Startup Transient
- 4.6 Bandgap Voltage Reference Schematic
- 4.7 Bandgap Voltage Reference Startup Transient
- 4.8 Bandgap Voltage Reference PSRR
- 4.9 Bandgap Voltage Reference Variations to Supply Voltage and Temperature
- 4.10 Bandgap Voltage Reference Op-amp Schematic
- 4.11 Bandgap Op-amp Transfer Function
- 4.12 Bandgap Op-amp Supply Transfer Function
- 4.13 Low-Dropout Regulator Schematic
- 4.14 Low-Dropout Regulator Op-amp Schematic
- 4.15 LDO Op-amp Transfer Function
- 4.16 LDO Op-amp Supply Transfer Function
- 4.17 Subthreshold Reference LDO Startup Transient
- 4.18 Bandgap Reference LDO Startup Transient
- 5.1 Clock Recovery Schematic
- 5.2 Clock Recovery Stage 1 Transfer Function
- 5.3 Clock Recovery Transient
- 6.1 Power Switches Block Diagram
- 6.2 Positive Charge Pump Schematic
- 6.3 Positive Charge Startup Transient
- 6.4 Charge Pump Output Transient with Level Shifter Switching
- 6.5 Negative Charge Pump Schematic
- 6.6 Negative Charge Pump Transient
- 6.7 Switch Logic Level Shifter Block Diagram
- 6.8 Switch Logic Level Shifter Block A Schematic
- 6.9 Switch Logic Level Shifter Block B Schematic
- 6.10 Switch Logic Level Shifter Block C Schematic
- 6.11 Switch Logic Level Shifter Block D Schematic
- 6.12 Switch Logic Level Shifter Block E Schematic
- 6.13 Level Shifter Transient
- 6.14 Zoomed Level Shifter Transient
- 6.15 Power Switch Schematic
- 6.16 Power Switch Transient
- 7.1 Transient response of the AM system

## List of Tables

- 1.1 OMNI CM-AM Frame Structure
- 2.1 AM Digital Logic Block Inputs
- 2.2 AM Digital Logic Block Outputs
- 2.3 Synthesized AM Digital Logic Block Power Consumption
- 3.1 Cross-coupled Rectifier Bridge Devices
- 3.2 Active Diode Devices
- 3.3 Active Diode Comparator Devices
- 4.1 Subthreshold Voltage Reference Devices
- 4.2 Bandgap Voltage Reference Devices
- 4.3 Bandgap Op-amp Devices
- 4.4 LDO Op-amp Devices
- 4.5 LDO Devices
- 5.1 Clock Recovery Devices
- 6.1 Positive Charge Pump Devices
- 6.2 Negative Charge Pump Devices
- 6.3 Switch Logic Level Shifter Block A Devices
- 6.4 Switch Logic Level Shifter Block B Devices
- 6.5 Switch Logic Level Shifter Block C Devices
- 6.6 Switch Logic Level Shifter Block D Devices
- 6.7 Switch Logic Level Shifter Block E Devices
- 6.8 Power Switch Devices
- 7.1 Power Breakdown of AM system

## Acknowledgments

I would like to thank my advisor, Prof. Elad Alon, as well as the OMNI project's co-PI, Prof. Jan Rabaey, for giving me the opportunity to work on this project. I would also like to thank Nathan Narevsky, Ali Moin and George Alexandrov, who have continually supported and helped me in this project. Last but not least, I would like to thank my parents and peers for always pushing me forward.

## Chapter 1

## Introduction

### 1.1 The Octopus-Mimetic Neural Implant (OMNI)

As part of the White House Brain Research through Advancing Innovative Neurotechnologies (BRAIN) initiative, a team of researchers from UC Berkeley, UC San Francisco, Lawrence Livermore National Laboratory, and Cortera Neurotechnologies is building a systems-based closed-loop therapy for neuropsychiatric disorders, funded by the Defense Advanced Research Projects Agency (DARPA). These disorders include depression, anxiety, posttraumatic stress disorder (PTSD), and traumatic brain injury (TBI). The goal of the system is to measure how the disorders manifest in the brain and modulate precise intervention based on real-time neurophysiological feedback. The closed-loop treatment is designed to force the brain to unlearn dysfunctions.

Current closed-loop neuromodulation systems such as the Neuropace RNS and the Medtronic Activa PC+S are limited in spatial resolution, on-board data processing, and data streaming bandwidth to the external host. The Octopus-Mimetic Neural Implant (OMNI) is being developed as a neuromodulation system that is modular, distributed, intelligent, and efficient. The system is easily reconfigurable with up to 8 implants in the brain, each containing up to 64 channels. With a modular system, the implants can reach physically separated regions of the brain. The system also has a reconfigurable processing platform that can be used for autonomous closed-loop operation as well as data storage. The improved and added features are traded off against battery life, which is targeted to be on the order of 1 month of operation between recharges. This is significantly shorter than that of the current systems mentioned above, which can operate on the order of years without recharging.

The OMNI consists of 3 modules: the Control Module (CM), Aggregator Module (AM), and the Neuromodulator Modules (NMs). The CM contains an FPGA with an ARM core, a radio and a wirelessly-rechargeable battery that provides the on-board computation power, communication capability and power supply for the whole system. The AM is the main data hub, controlling the data flow between the CM and the NM, while the NMs record brain activity and stimulate precise locations in the brain.

Figure 1.1: OMNI System Illustration

To facilitate communication and power distribution in the modules, the OMNI contains the following links:

- Downstream: c2a_valid, c2a_data, a2n_valid[7:0], a2n_data[7:0]
- Upstream: n2a_data[7:0], a2c_data
- Power: c2a_power+, c2a_power-, a2n_power+[7:0], a2n_power-[7:0]

The final system facilitates up to 8 NMs, but the current prototype supports up to 4 implants. At the moment, the CM and AM are implemented on PCBs with off-the-shelf components while the NM is an ASIC. This research project aims to build an ASIC version of the AM, which will greatly reduce power consumption and area.


Figure 1.2: OMNI System-Level Block Diagram

### 1.2 The Aggregator Module (AM)

## Functionality

The Aggregator Module (AM) controls which NM is turned on and facilitates communication between the CM and active NMs through time-interleaving the data and valid signals. The AM runs on the 20 MHz AC power delivered by the c2a_power lines, which are driven at 3.3 Vpp. A 20 MHz clock is recovered from the c2a_power signals for its operation.

## Downstream Communication

The AM receives frames sent from the CM on the c2a_valid and c2a_data lines at a baud rate of 20 MHz. Each frame consists of a pilot bit and a reset bit as well as a bit for each of the 8 NMs, totaling 10 bits. The pilot and reset bits determine the type of frame, which triggers actions such as resetting the AM or the NMs, switching the NMs on or off, and sending data to the NMs.

| Bit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Slot | Pilot | Reset Type | NM1 | NM2 | NM3 | NM4 | NM5 | NM6 | NM7 | NM8 |

Table 1.1: OMNI CM-AM Frame Structure

The AM sends data to the NMs on the a2n_valid and a2n_data lines at a baud rate of 2 MHz. If the frame received on the CM-AM link is a data frame, the AM demultiplexes the slot designated to each NM in the frame and sends the corresponding bits to each NM.
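As a rough illustration of the frame layout in Table 1.1, the sketch below splits one 10-bit downstream frame into its pilot bit, reset-type bit, and per-NM slots. The names `Frame` and `split_frame` and the bit-list representation are assumptions made for illustration. The report does not give the mapping from pilot and reset values to frame types, so no attempt is made to decode them here.

```python
from typing import List, NamedTuple

FRAME_BITS = 10  # pilot + reset type + 8 NM slots (Table 1.1)


class Frame(NamedTuple):
    pilot: int
    reset_type: int
    nm_bits: List[int]  # nm_bits[0] is the NM1 slot, nm_bits[7] the NM8 slot


def split_frame(bits: List[int]) -> Frame:
    """Split one 10-bit CM-AM frame into header bits and per-NM slots."""
    if len(bits) != FRAME_BITS:
        raise ValueError(f"expected {FRAME_BITS} bits, got {len(bits)}")
    return Frame(pilot=bits[0], reset_type=bits[1], nm_bits=list(bits[2:]))


frame = split_frame([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
print(frame.nm_bits)

# Each NM receives one bit per frame, so a 20 MHz link carries
# 20 MHz / 10 = 2 MHz of data per NM.
per_nm_rate_hz = 20e6 / FRAME_BITS
print(per_nm_rate_hz)
```

The last two lines reproduce the rate relationship stated in the text: one slot per NM per 10-bit frame turns the 20 MHz CM-AM link into a 2 MHz stream for each NM.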

## Upstream Communication

The upstream communication from each NM to the CM is similar to the downstream communication, except there is no valid line. The AM receives data from the NM at a baud rate of 2 MHz and time-multiplexes the data into 10-bit frames with a pilot and a valid bit. These frames are sent to the CM at a baud rate of 20 MHz.
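The time-multiplexing can be pictured with the minimal sketch below, which picks one output bit per 20 MHz clock cycle based on the slot number within the frame. The function names, the slot ordering (assumed here to mirror Table 1.1), and the `header_bits` argument are all hypothetical; the report does not specify the values of the two non-data bits.

```python
from typing import List, Sequence


def upstream_bit(slot: int, header_bits: Sequence[int], n2a_data: Sequence[int]) -> int:
    """Return the a2c_data bit for one slot (0..9) of an upstream frame.

    Assumption: slots 0 and 1 carry the two header bits and slots 2..9
    carry one bit from each of the 8 NMs, mirroring Table 1.1.
    """
    if not 0 <= slot < 10:
        raise ValueError("slot must be in 0..9")
    if slot < 2:
        return header_bits[slot]
    return n2a_data[slot - 2]


def build_frame(header_bits: Sequence[int], n2a_data: Sequence[int]) -> List[int]:
    """Assemble one 10-bit upstream frame by stepping through all slots."""
    return [upstream_bit(s, header_bits, n2a_data) for s in range(10)]


print(build_frame([1, 0], [0, 1, 1, 0, 1, 0, 0, 1]))
```

Because each NM bit is held for a full 2 MHz period, sampling it once per 10-slot frame loses no data.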

## AM System

Fig. 1.3 shows the system-level block diagram of the AM. The c2a_power lines are rectified through a Voltage Rectification circuit (Chapter 3), which is fed into the Voltage Regulation circuit (Chapter 4) to produce a 1 V supply. They are also used by the Clock Recovery circuit (Chapter 5) which produces a 20 MHz clock. This clock, along with the regulated voltage, is used by the Digital Logic block (Chapter 2). Lastly, the clock, supply as well as the digital signal from the Digital Logic block are used to control the connection between c2a_power and a2n_power.

Figure 1.3: AM System Block Diagram

The OMNI runs on a wirelessly-rechargeable battery, so power consumption is a priority. The current PCB prototype of the AM consumes 1.6 mW. The goal of this ASIC is to reduce the power consumption by an order of magnitude. Since power is the main design objective, area is usually sacrificed whenever there is a tradeoff between power and area. This is tolerable since the ASIC is by nature much smaller than the PCB prototype. The ASIC is designed in the TSMC 65nm LP technology.

## Chapter 2

## Digital Logic

The Digital Logic block of the AM facilitates the communication between the CM and the NMs. It keeps track of and synchronizes to the CM's frames using an internal counter and demultiplexes the data from the CM to each NM for the downstream communication. It also time-multiplexes the upstream communication from the NMs and sends it to the CM using the frame structure.

### 2.1 Digital Logic Block Operation

## I/O

The Digital Logic block has I/Os that are directly connected to the ASIC pinouts. The inputs are received by a cross-coupled inverter with a weak feedback inverter forming a weakened latch. Inverters drive the output of the logic block down cables to the other modules.

The Digital Logic block has the following inputs:

| Signal | Description |
| ---: | :--- |
| clk_am | 20 MHz clock recovered from the power lines |
| c2a_data | CM-AM data line containing the CM frames |
| c2a_valid | CM-AM valid line containing the CM frames |
| n2a_data[3:0] | NM-AM data line containing bits streamed from the NM |
| reset | Hardware reset for debugging |

Table 2.1: AM Digital Logic Block Inputs

It has the following outputs:

| Signal | Description |
| ---: | :--- |
| a2n_data[3:0] | AM-NM data line containing bits demultiplexed from c2a_data |
| a2n_valid[3:0] | AM-NM valid line containing bits demultiplexed from c2a_valid |
| nm_switch[3:0] | NM power switch control |
| a2c_data | AM-CM data line containing frames reconstructed from time-multiplexing n2a_data |

Table 2.2: AM Digital Logic Block Outputs

## Synchronization

The AM has to synchronize with the CM to keep track of the start of each frame. To do this, it has a shift register that detects a specific bit pattern on the c2a_valid signal that corresponds to an empty frame. When this sequence is found, the counter that keeps track of the current slot in the frame is reset.
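A behavioral sketch of this synchronization scheme is shown below. It models the shift register as a fixed-length window over the c2a_valid bits and resets a modulo-10 slot counter when the window matches a sync pattern. The function name `track_slots` and the example pattern are hypothetical; the report does not state the actual bit pattern of an empty frame.

```python
from collections import deque
from typing import Iterable, List, Sequence


def track_slots(valid_bits: Iterable[int], sync_pattern: Sequence[int],
                frame_len: int = 10) -> List[int]:
    """Return the slot counter value seen at each clock cycle."""
    window = deque(maxlen=len(sync_pattern))  # models the shift register
    slot = 0
    slots = []
    for bit in valid_bits:
        window.append(bit)
        slots.append(slot)
        if list(window) == list(sync_pattern):
            slot = 0  # pattern found: the next bit starts a new frame
        else:
            slot = (slot + 1) % frame_len
    return slots


# Hypothetical empty-frame pattern, used only for illustration.
pattern = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
stream = [0, 0, 0] + pattern + [1, 1, 1]
slots = track_slots(stream, pattern)
print(slots[-3:])
```

In this example the counter starts out misaligned by three bits and is realigned as soon as the pattern has been shifted in completely.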

## Downstream Logic

The AM samples its inputs from the CM at the negative edge of the clock and saves them in a set of registers. It also prepares for a synchronized reset, NM switching, or NM reset while the clock is low. On the positive edge of the clock, the captured data is sent to the NMs. Consequently, there is a 1 clock cycle delay between the c2a inputs and the a2n outputs. Since the NMs expect data at 2 MHz, one-tenth the baud rate of the CM-AM link, this is not a problem. The AM is also responsible for sending a bit sequence that triggers the NMs to reset if it receives an NM reset frame from the CM.

## Upstream Logic

The upstream data is not registered. The AM outputs either a pilot bit, a reset type bit, or one of the NMs' data bits according to the internal counter. These data bits make up a packetized transaction between the CM and NMs, which is encoded in the NM and decoded in the CM. The AM is not involved in the higher-level packet structure.

### 2.2 Digital Logic Block Synthesis

The Digital Logic block was synthesized using the TSMC 65nm LP library running at a 1 V supply. Running it at 1 V reduces the power consumption, although some extra circuitry is needed to interface between the Digital Logic block and the chip I/Os, which require a 1.2 V swing.

Fig. 2.1 shows the layout of the Digital Logic block, which was synthesized using Synopsys Design Compiler and placed and routed using Synopsys IC Compiler. The design is $37 \mu \mathrm{~m}$ by $38 \mu \mathrm{~m}$ in size, including the keepout margin. The ICC estimates for power are shown in table 2.3.

The critical path is 11 logic gates long, but since the clock frequency is very low, all of the cells used are HVT cells. In fact, registers make up almost half of the standard cells in the design.


Figure 2.1: Digital Logic Block Layout and Layout Without Routing

| Switching Power (mW) | Leakage Power $(\mathrm{nW})$ | Total Power $(\mathrm{mW})$ |
| :---: | :---: | :---: |
| $3.08 \times 10^{-3}$ | $9.24 \times 10^{-3}$ | $1.23 \times 10^{-2}$ |

Table 2.3: Synthesized AM Digital Logic Block Power Consumption

## Chapter 3

## Voltage Rectification

The AM receives power through the c2a_power+ and c2a_power- lines. The lines carry 3.3 Vpp AC power at 20 MHz. To run the Digital Logic block and the Power Switches, the AM first has to rectify the power lines. Fig. 3.1 shows the block diagram of the Voltage Rectification block. The power lines are first rectified to positive voltages through a bridge rectifier, whose output then passes through an active diode to produce $V_{\text{rec}}$, the supply voltage to be regulated by the Voltage Regulation block discussed in Chapter 4.


Figure 3.1: Voltage Rectification Block Diagram

### 3.1 Rectifier Bridge



Figure 3.2: Common Rectifier Schematic

The first step in rectifying the AC power is to convert the negative voltages to positive voltages. This is commonly done using the diode bridge circuit formed by Q1-Q4 in fig. 3.2. In principle, the circuit allows current to flow only from ground to the inputs and from the inputs to the output node, therefore forcing positive voltages at the output. The diodes can be realized using diode-connected PMOS and NMOS devices in a CMOS process. However, a diode-connected transistor requires $V_{TH}$ across it to fully conduct current. As a result, the output can never go above $V_{in}-V_{TH}$.

Figure 3.3: Cross-coupled Rectifier Bridge Schematic

To counter this effect, the gates of the transistors can be driven by the opposite input as shown in fig. 3.3. When $V_{p}$ is high and $V_{n}$ is low, M3 has a negative $V_{GS}$, shutting it off completely and reducing leakage current from $V_{p}$ to ground. Similarly, M2 has a positive $V_{GS}$, shutting it off completely. M4, in turn, is driven by a very large negative $V_{GS}$, causing it to drive more current. M1 is also driven with a very high $V_{GS}$, allowing it to drive more current. This cross-coupled bridge rectifier results in an output voltage that is almost equal to the absolute value of the input voltages.

The devices used are shown in table 3.1.

| Device | W / L |
| :---: | :---: |
| M1, M2, M3, M4 (3.3 V) | $16 \mu \mathrm{~m} / 500 \mathrm{~nm}$ |

Table 3.1: Cross-coupled Rectifier Bridge Devices

The transient response of the cross-coupled bridge rectifier is shown in fig. 3.4. As expected, the cross-coupled rectifier is able to drive the output to voltages much closer to the inputs than the regular rectifier circuit. This allows the output voltage to reach a higher average voltage than the regular bridge rectifier circuit, which is helpful for the Voltage Regulation block. It also provides some headroom such that if the inputs are not exactly 3.3 Vpp the regulation circuit still works fine.

Figure 3.4: Cross-coupled Bridge Rectifier Transient (plot title: Output Voltage of NVC with Resistive Load)

### 3.2 Active Diode

Before a constant power supply can be generated from the rectified voltage, the rectified voltage needs to be stored on a capacitor so that its ripple is small enough to be tolerated by the voltage regulation circuit. This can be done by the diode Q5 in fig. 3.2. However, as we have seen in the bridge rectifier circuit, the diode has a $V_{TH}$ drop across it, so the output voltage can only go as high as $V_{\text{bridge}}-V_{TH}$.

To solve this issue, a technique similar to the cross-coupled bridge rectifier is used. The diode Q5 is implemented using a PMOS since a PMOS does not have a $V_{ds,sat}$ requirement at high voltages. The gate of the PMOS, however, is driven by a signal that depends on the input of the diode. If the input is higher than the output, the gate is driven low (0 V), causing the PMOS to be fully turned on. If the input is lower than the output, the gate is driven to the highest voltage ($V_{\text{out}}$). This effectively turns the PMOS off. This mechanism is implemented using a comparator between the input and output voltages, powered by the output voltage. The circuit is shown in fig. 3.5.

There are a couple of issues that need to be considered in this circuit.

Firstly, the body voltage of M1 has to be regulated. If it is tied to either the input or the output, there is always a time when its source is at a higher voltage, causing a change in the threshold voltage of M1. To deal with this issue, M3 and M4 are used to regulate the body of M1. M4 is turned on if the output is higher than the input, causing the body of M1 to be biased at the output voltage. If the output is lower than the input, M4 is turned off and M3 is turned on, causing the body of M1 to be biased at the input voltage. Thus, M3 and M4 force the body of M1 to be at the higher of the two voltages.

Secondly, there might be startup issues since the op-amp does not work as desired until it has a certain supply voltage, sourced from the output node. This issue is resolved by adding a diode-connected M3 in parallel to the active diode. This device is minimum-sized and is used only to ensure the output node reaches a voltage at which the op-amp behaves as expected.

Figure 3.5: Active Diode Schematic

The devices used in the active diode besides the op-amp are shown in table 3.2.

| Device | W / L |
| :---: | :---: |
| M1 | $5 \mu \mathrm{~m} / 100 \mathrm{~nm}$ |
| M2,M3,M4 | $200 \mathrm{~nm} / 60 \mathrm{~nm}$ |

Table 3.2: Active Diode Devices

In addition, a 3.34 pF capacitor, built from $16 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ units, is added at the output to store the rectified voltage.
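As a quick consistency check, the value can be reproduced from the unit capacitors listed in table 4.2, where four $10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m}$ units give 834.4 fF. This is only a sketch and assumes that the same unit capacitor is used in both places:

$$
C_{\text{unit}} \approx \frac{834.4 \mathrm{~fF}}{4}=208.6 \mathrm{~fF}, \qquad C_{\text{out}} \approx 16 \times 208.6 \mathrm{~fF} \approx 3.34 \mathrm{~pF}
$$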

The transient response of the active diode is shown in fig. 3.6. The cyan line is the comparator output, which goes low only when the anode (input node) is higher than the cathode (output node). The plot also compares the output voltage of the active diode to that of a regular diode-connected PMOS of the same size, which suffers from the $V_{TH}$ drop. The transient simulation is done with the comparator discussed in the next section along with a 3.3 pF load capacitor and a $100 \mathrm{k} \Omega$ load resistor.


Figure 3.6: Active Diode Transient

### 3.3 Active Diode Comparator

## Design Considerations

The Active Diode requires a comparator that has a relatively high bandwidth. For the active diode to work properly, the comparator has to have a sufficiently small phase shift at the 20 MHz input frequency. If the phase shift is too large, the diode turns on and off too late, causing faulty operation. The first stage of the comparator has to be a differential pair, which drives a series of inverters of increasing drive strength.

A DC sweep of the inverter transfer function with a 1.2 V supply reveals that the output transitions when the input is between 500 mV and 650 mV. Since the input voltage varies from 0 V to 1.6 V, the gain of the first stage does not have to be large at all. A small-signal design methodology is not really suitable since the input range is very large; however, rough estimates of the current $I_{D}$ needed for a sufficiently small phase shift can be calculated.

Assuming a gain of 10 with a 3 dB bandwidth of 20 MHz for a phase shift of $45^{\circ}$, the gain-bandwidth product is 200 MHz. With a load capacitance of 3 fF from the input capacitance of the smallest inverter, this requires a transconductance $g_{m}$ of $3.77 \mu \mathrm{~S}$. Using the current efficiency $\left(V^{*}\right)$ design methodology, the required drain current $I_{D}$ is approximately 285 nA.
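The arithmetic behind these numbers can be sketched as follows, using $g_m = 2\pi \cdot \mathrm{GBW} \cdot C_L$ for a single-pole stage and the usual definition $V^{*} = 2 I_D / g_m$. The report does not state the $V^{*}$ value that was used; the 151 mV below is an assumption chosen because it reproduces the quoted 285 nA.

```python
import math

gain = 10.0       # assumed first-stage gain (from the text)
f_3db = 20e6      # 3 dB bandwidth in Hz, giving 45 degrees of phase shift at 20 MHz
c_load = 3e-15    # load capacitance in F (input of the smallest inverter)
v_star = 0.151    # V* in volts; ASSUMED, not stated in the report

gbw = gain * f_3db                 # gain-bandwidth product in Hz
gm = 2 * math.pi * gbw * c_load    # required transconductance in S
i_d = gm * v_star / 2              # from V* = 2 * I_D / g_m

print(f"GBW = {gbw / 1e6:.0f} MHz, gm = {gm * 1e6:.2f} uS, I_D = {i_d * 1e9:.0f} nA")
```

A different $V^{*}$ scales the current estimate linearly, so the result should be read as a starting point for simulation rather than a final bias value.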

If the phase shift resulting from this design method is too large, the current can be increased later, trading power consumption for speed.

## Implementation

We consider two architectures of differential pair: Comparator A and Comparator B.


Figure 3.7: Comparator B
(a) Active Diode Comparator Schematic

Comparator A uses a common NMOS input differential pair with current mirror loading. Since the gates of M1 and M2 are biased at very high voltages, the NMOSes are very long devices, resulting in a high output resistance that might be too high for the comparator to achieve the desired phase shift. This architecture also requires a tail transistor bias, which adds more power consumption in a bias network.

Comparator B uses a common gate input stage in M1. It is biased by M3-M5, which can be thought of as the negative input of the comparator, from the rectified output of the Active Diode. It has a higher output swing, although the output swing is not an important specification for the comparator. Since the gates of M1 and M3 can be arbitrarily biased depending on the sizing of M3-M5, the devices can be sized accordingly for the desired output resistance.

Comparator B is implemented with a bias current of 285 nA on both branches when running on a 1.2 V supply. The devices are shown in table 3.3.

| Device | W / L or cell |
| :---: | :---: |
| M1 | $850 \mathrm{~nm} / 100 \mathrm{~nm}$ |
| M2 | $200 \mathrm{~nm} / 150 \mathrm{~nm}$ |
| M3 | $1.14 \mu \mathrm{~m} / 100 \mathrm{~nm}$ |
| M4,M5 | $315 \mathrm{~nm} / 150 \mathrm{~nm}$ |
| I1 | INVD0_HVT |
| I2 | INVD2_HVT |
| I3 | INVD4_HVT |
Table 3.3: Active Diode Comparator Devices

The transfer function of the stage 1 amplifier is shown in fig. 3.9. It has an open loop gain of 23.5 dB and a phase shift of $-36.5^{\circ}$ at 20 MHz input frequency. The phase shift could have been made smaller by reducing the output resistance, thus shifting the pole out to a higher frequency. However, this requires making M2 shorter, and for variability reasons this was avoided.
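To put these figures in context, the sketch below converts the gain to a linear ratio and estimates where the dominant pole would sit if the stage behaved as an ideal single-pole amplifier, for which the phase is $-\arctan(f / f_{p})$. The single-pole model is an assumption made here for illustration; the simulated response in fig. 3.9 remains the reference.

```python
import math

gain_db = 23.5       # open-loop gain of stage 1 (from the text)
phase_deg = -36.5    # phase shift at 20 MHz (from the text)
f_in = 20e6          # input frequency in Hz

gain_lin = 10 ** (gain_db / 20)                      # voltage gain as a ratio
f_pole = f_in / math.tan(math.radians(-phase_deg))   # single-pole estimate

print(f"gain = {gain_lin:.1f} V/V, estimated pole = {f_pole / 1e6:.1f} MHz")
```

Under this model the pole lies somewhat above 20 MHz, which is consistent with a phase shift smaller than the $45^{\circ}$ used in the initial estimate.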


Figure 3.9: Active Diode Comparator Stage 1 Transfer Function

The operation of the comparator is verified with an input sinusoid of amplitude 400 mV biased at 1.2 V in fig. 3.10.

Figure 3.10: Active Diode Comparator Transient (plot title: AD Comparator Transient)

The most important step is to verify that the phase shift is small enough for the Active Diode to work properly. The outputs of the Voltage Rectifier block with and without the active diode are compared in fig. 3.11. The Voltage Rectification block is loaded with a 3.3 pF capacitor and a $100 \mathrm{k} \Omega$ resistor. From the transient waveform, the phase shift in the designed comparator is acceptable.
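For a feel of the load used in this test, the sketch below computes the RC time constant of the load and a pessimistic bound on the droop of the stored voltage. It assumes the capacitor discharges into the resistor alone for one full 20 MHz period, which overstates the droop because the active diode recharges the capacitor during part of every cycle.

```python
import math

r_load = 100e3     # load resistor in ohms (from the text)
c_load = 3.3e-12   # load capacitor in farads (from the text)
f_in = 20e6        # power signal frequency in Hz

tau = r_load * c_load                  # RC time constant in seconds
period = 1 / f_in                      # one input period in seconds
droop = 1 - math.exp(-period / tau)    # fractional droop over a full period

print(f"tau = {tau * 1e9:.0f} ns, period = {period * 1e9:.0f} ns, droop bound = {droop:.1%}")
```

The time constant is several times longer than the input period, so the capacitor holds most of its charge between conduction intervals.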


Figure 3.11: Voltage Rectifier Block with and without Active Diode

## Chapter 4

## Voltage Regulation

The rectified voltage $V_{\text{rec}}$ has to be regulated to a constant supply voltage for the Digital Logic block and the power switches. To regulate the voltage, the rectified voltage is first used to generate a reference voltage $V_{\text{ref}}$, which is used in a Low-Dropout Regulator (LDO) to get an output of 1 V.

Figure 4.1: Voltage Regulation Block Diagram

Since the rectified voltage $V_{\text{rec}}$ has ripple and a constant 1 V output voltage is needed, the voltage regulation circuit has to have a very high Power Supply Rejection Ratio (PSRR). Although the output voltage is stored on a 7.5 pF capacitor to source the Digital Logic block and the Power Switches, the intrinsic PSRR of the reference generator and the LDO has to be acceptable, especially at 20 MHz, the frequency of $V_{\text{rec}}$.

### 4.1 Voltage Reference Generator

## Design Considerations

Since the LDO uses an op-amp to regulate the output, if the op-amp has a low gain at 20 MHz, the voltage reference does not need to have a significantly high PSRR. Instead, the more crucial specification for the reference generator is the sensitivity of the output voltage to DC supply voltage and temperature variations. Of the two, sensitivity to supply variation is more important since the ASIC is designed to be implanted and thus will operate at an approximately constant temperature.

A survey of different methods of generating voltage references was done. Two designs were chosen for their strengths and implemented.

## Subthreshold Voltage Reference

The Subthreshold Voltage Reference presented in [1] is shown in fig. 4.2.


Figure 4.2: Subthreshold Voltage Reference Schematic

M1 is an LVT device while the other NMOSes are regular VT (RVT) devices. All four NMOSes in the circuit are biased in the subthreshold region. By considering the subthreshold exponential currents of the devices, the voltages at node A and the output node can be shown to depend on the difference between the LVT and RVT threshold voltages, the thermal voltage $V_{T}$ of the subthreshold current, and the sizes of the devices. The difference in the threshold voltages is exploited to generate the reference voltage. Sizing M2 and M3 identically results in zero-temperature-coefficient voltages at node A and the output node. The device sizes are shown in table 4.1. The reference voltage is on the order of 100 mV.

| Device | W / L |
| :---: | :---: |
| M1 (LVT) | $10 \mu \mathrm{~m} / 10 \mu \mathrm{~m}$ |
| M2,M3,M4 | $1 \mu \mathrm{~m} / 10 \mu \mathrm{~m}$ |

Table 4.1: Subthreshold Voltage Reference Devices

The main advantage of this circuit is its very low power consumption. Since the transistors are all in the subthreshold region, they draw only 5.5 pA at a 1.2 V supply. This voltage reference also has a very high PSRR at high frequencies. Fig. 4.3 shows the PSRR, which reaches 84.7 dB at 20 MHz.
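To make these figures more tangible, they can be converted to a power and a linear attenuation. This is a rough sketch, assuming the PSRR is quoted as $20 \log_{10}$ of the ratio between supply ripple and output ripple:

$$
P \approx 5.5 \mathrm{~pA} \times 1.2 \mathrm{~V}=6.6 \mathrm{~pW}, \qquad \frac{v_{\text{out}}}{v_{\text{supply}}} \approx 10^{-84.7 / 20} \approx 5.8 \times 10^{-5}
$$

Under that assumption, 1 V of supply ripple at 20 MHz would appear as roughly $58 \mu \mathrm{~V}$ at the reference output.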


Figure 4.3: Subthreshold Voltage Reference PSRR

Since the output voltage is engineered to have a zero temperature coefficient, the temperature variation is minimal. Between $25^{\circ} \mathrm{C}$ and $50^{\circ} \mathrm{C}$, the reference voltage varies by $0.9 \%$. However, it is quite susceptible to supply voltage variations. Between 1 V and 1.5 V supply voltages, the reference voltage varies by $2.3 \%$. As mentioned before, the supply sensitivity is more important than the temperature sensitivity because of the application of the ASIC. Thus, the $2.3 \%$ variation might cause some concern.
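These end-to-end variations can be expressed as average sensitivities, as in the sketch below. It assumes the variation is spread evenly across each range, which the curves in fig. 4.4 may not follow exactly, so the results are averages rather than worst-case slopes.

```python
dv_temp_pct = 0.9      # % change in V_ref between 25 C and 50 C (from the text)
dv_supply_pct = 2.3    # % change in V_ref between 1 V and 1.5 V supply (from the text)

temp_span_c = 50 - 25          # temperature range in degrees C
supply_span_v = 1.5 - 1.0      # supply range in volts

# Average temperature coefficient in ppm per degree C
temp_coeff_ppm = dv_temp_pct / 100 / temp_span_c * 1e6
# Average line sensitivity in percent per volt
line_sens_pct_per_v = dv_supply_pct / supply_span_v

print(f"{temp_coeff_ppm:.0f} ppm/C, {line_sens_pct_per_v:.1f} %/V")
```

Expressed this way, the supply dependence is the larger concern for an implant that sits at a nearly constant temperature but is powered from a rippling rectified supply.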


Figure 4.4: Subthreshold Voltage Reference Variations to Supply Voltage and Temperature

Another advantage of this circuit is its simplicity. It is made up of only 4 transistors with no capacitors or resistors required, consuming little area even when the devices are sizeable. It also does not require startup circuitry. However, as seen in fig. 4.5, it takes around $100 \mu \mathrm{~s}$ to reach a stable output. On top of that, the circuit relies heavily on the difference between the threshold voltages of the RVT and LVT NMOSes. Random variations in the threshold voltages can be harmful to the operation of the circuit.


Figure 4.5: Subthreshold Voltage Reference Startup Transient

## Bandgap Voltage Reference

A Bandgap voltage reference is commonly used for generating a temperature-insensitive reference voltage or current. A low-voltage variant of the voltage reference is shown in fig. 4.6.

The current flowing down the core branches (the branches containing M1 and M2) is the sum of a Proportional to Absolute Temperature (PTAT) current generated by the difference in $V_{BE}$ of the two diode-connected PNP transistors, converted into current by $R_{1}$, and a Complementary to Absolute Temperature (CTAT) current imposed by the $V_{BE}$ of $Q_{1}$, converted into current by $R_{2}$ in both branches. An op-amp in negative feedback is used to force the $\Delta V_{BE}$ across $R_{1}$. The output of the op-amp controls the current flowing through M3, which generates a bias for the current mirrors M4/M1 and M4/M2. M3 and M4 provide additional loop gain and improve the PSRR, as shown in [6].
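In equation form, the core branch current described above can be sketched as follows. The expression assumes an ideal op-amp and equal currents in the two core branches, and takes $N=16$ as the emitter-area ratio of the two PNP transistors from the device table (Q2 uses 16 units, Q1 uses 1):

$$
I_{\text{core}} \approx \underbrace{\frac{\Delta V_{BE}}{R_{1}}}_{\text{PTAT}}+\underbrace{\frac{V_{BE1}}{R_{2}}}_{\text{CTAT}}, \qquad \Delta V_{BE}=V_{T} \ln N, \qquad V_{T}=\frac{k T}{q}
$$

As an illustrative number only, taking $T \approx 310 \mathrm{~K}$ as an assumed body temperature gives $V_{T} \approx 26.7 \mathrm{~mV}$ and $\Delta V_{BE} \approx 74 \mathrm{~mV}$, so with $R_{1}=35.82 \mathrm{k} \Omega$ the PTAT component is roughly $2.1 \mu \mathrm{~A}$.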

It is important to note, however, that for M3 and M4 to help with PSRR, the op-amp has to have a bandwidth that is higher than the signal frequency $(20 \mathrm{MHz})$. As seen in the active diode comparator design, achieving a high bandwidth requires significant power. Instead of building a high-bandwidth op-amp, the PSRR is improved by adding capacitors C1 and C2. These capacitors consume more area but the overall power consumption is significantly reduced. C2 is added between the supply line and the gates of M1, M2 and M4, coupling the supply ripple to the bias voltage. This creates a similar effect to having a high-bandwidth amplifier. C1 is added at the output node, effectively adding a pole at the output node.


Figure 4.6: Bandgap Voltage Reference Schematic

Note that this circuit is optimized for power consumption, and the design produces a current with zero temperature coefficient. The actual reference voltage still varies with temperature because the resistance of R3 is temperature dependent. Bandgap voltage references usually use another branch to create a temperature-independent voltage. However, since temperature is not a big concern in this application, more attention was given to supply voltage variations in the design.

The devices are shown in table 4.2.

| Device | Size or Value |
| :---: | :---: |
| Q1 | $1 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| Q2 | $16 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| M1,M2 | $1 \mu \mathrm{~m} / 120 \mathrm{~nm}$ |
| M3 | $200 \mathrm{~nm} / 500 \mathrm{~nm}$ |
| M4 | $650 \mathrm{~nm} / 250 \mathrm{~nm}$ |
| MS1 | $200 \mathrm{~nm} / 15 \mu \mathrm{~m}$ |
| MS2 | $1 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| MS3 | $200 \mathrm{~nm} / 5 \mu \mathrm{~m}$ |
| R1 | $35.82 \mathrm{k} \Omega: 1 \times(2 \mu \mathrm{~m} \times 50 \mu \mathrm{~m})$ |
| R2,R3 | $236.4 \mathrm{k} \Omega: 6.6 \times(2 \mu \mathrm{~m} \times 50 \mu \mathrm{~m})$ |
| C1,C2 | $834.4 \mathrm{fF}: 4 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |

Table 4.2: Bandgap Voltage Reference Devices
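The PTAT term described above can be illustrated numerically. The following is a minimal sketch, assuming the ideal diode law, equal currents in the two core branches, and a temperature-independent $R_{1}$ (the report notes that the real resistors are slightly PTAT). The function names are hypothetical; only the 16:1 area ratio and the value of $R_{1}$ are taken from table 4.2.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def delta_vbe(area_ratio: float, temp_kelvin: float) -> float:
    """V_BE difference of two junctions carrying equal currents (ideal diode law)."""
    thermal_voltage = K_BOLTZMANN * temp_kelvin / Q_ELECTRON
    return thermal_voltage * math.log(area_ratio)


def ptat_current(area_ratio: float, r1_ohms: float, temp_kelvin: float) -> float:
    """PTAT current obtained by dropping delta V_BE across R1."""
    return delta_vbe(area_ratio, temp_kelvin) / r1_ohms


AREA_RATIO = 16.0   # Q2 : Q1 emitter area ratio from table 4.2
R1_OHMS = 35.82e3   # R1 from table 4.2, assumed temperature independent here

for temp_c in (25.0, 37.0, 50.0):
    temp_k = temp_c + 273.15
    dv = delta_vbe(AREA_RATIO, temp_k)
    i_ptat = ptat_current(AREA_RATIO, R1_OHMS, temp_k)
    print(f"{temp_c:4.0f} C: dV_BE = {dv * 1e3:5.1f} mV, I_PTAT = {i_ptat * 1e6:4.2f} uA")
```

Under these assumptions the PTAT component is roughly 2 µA per branch and rises linearly with absolute temperature; the CTAT component through $R_{2}$ is what offsets this slope in the actual circuit.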

The Bandgap core needs to be started by pushing current through the PNP transistor Q1 before it reaches its diode voltage. To do this, MS2 is biased such that it is in saturation when the core has not started up, but has a negative $V_{G S}$ when the core operates. The operation is shown in fig. 4.7. The green line is the supply, the blue line is the reference voltage and the red line is the current flowing through MS2. MS1 and MS3 are used to bias MS2. These devices are long to minimize the current, but do not have to be exactly matched as long as the gate of MS2 is lower than the core voltage in operation.


Figure 4.7: Bandgap Voltage Reference Startup Transient

The Bandgap voltage reference produces a voltage around 650 mV . Excluding the op-amp, it consumes $12.34 \mu \mathrm{~A}$ at 1.2 V . It has a PSRR of 33.33 dB at 20 MHz .

Due to the design choices described above, the reference voltage is sensitive to temperature. At 1.2 V supply, it varies by $7.25 \%$ between $25^{\circ} \mathrm{C}$ and $50^{\circ} \mathrm{C}$. However, it is very insensitive to supply variations; between 1 V and 1.5 V the reference changes by only $0.0234 \%$. The green line in fig. 4.9 shows the current flowing through the core at different temperatures, showing an almost constant behavior. The reference voltage slope is caused by the slight PTAT behavior of the resistors.


Figure 4.8: Bandgap Voltage Reference PSRR



Figure 4.9: Bandgap Voltage Reference Variations to Supply Voltage and Temperature

Although the Bandgap reference consumes more power than the Subthreshold reference, it has a much faster startup time (by 2 orders of magnitude) and a much better supply insensitivity, which is important since the supply is unregulated at this point.
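To make the comparison between the two references easier to read, the quoted variations can be normalized by the span over which they were measured. The sketch below assumes the variation is roughly linear over each span ($25^{\circ} \mathrm{C}$ to $50^{\circ} \mathrm{C}$ and 1 V to 1.5 V), so the results are average sensitivities rather than simulated slopes. The dictionary and function names are illustrative.

```python
TEMP_SPAN_C = 50.0 - 25.0    # temperature range quoted in the text
SUPPLY_SPAN_V = 1.5 - 1.0    # supply range quoted in the text

# Percent variation of the reference voltage quoted for each design.
REFERENCES = {
    "Subthreshold": {"temp_pct": 0.9, "supply_pct": 2.3},
    "Bandgap": {"temp_pct": 7.25, "supply_pct": 0.0234},
}


def temp_coefficient_ppm(temp_pct: float) -> float:
    """Average temperature coefficient in ppm per degree C."""
    return temp_pct / TEMP_SPAN_C * 1e4


def line_sensitivity_pct_per_volt(supply_pct: float) -> float:
    """Average supply (line) sensitivity in percent per volt."""
    return supply_pct / SUPPLY_SPAN_V


for name, ref in REFERENCES.items():
    tc = temp_coefficient_ppm(ref["temp_pct"])
    ls = line_sensitivity_pct_per_volt(ref["supply_pct"])
    print(f"{name:12s}: {tc:7.1f} ppm/C, {ls:7.4f} %/V")

supply_ratio = REFERENCES["Subthreshold"]["supply_pct"] / REFERENCES["Bandgap"]["supply_pct"]
print(f"Bandgap supply sensitivity is about {supply_ratio:.0f}x lower")
```

With these numbers the Bandgap reference is about two orders of magnitude less sensitive to the supply, at the cost of a temperature coefficient roughly eight times larger.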

## Bandgap Reference Op-amp

The Bandgap reference requires an op-amp to fix the $\Delta V_{B E}$ drop across $R_{1}$. This op-amp needs to have a high open-loop gain so the loop gain is large enough not to cause static error. As discussed before, ideally the op-amp has a high bandwidth. However, assuming a gain of 1000 and a 3 dB bandwidth of 20 MHz, the unity-gain bandwidth is 20 GHz. Even though the amplifier only drives the gate capacitance of M3 in fig. 4.6, a quick calculation using the $V^{*}$ current efficiency method reveals that the amplifier requires currents on the order of $40 \mu \mathrm{~A}$.

$$
\frac{g_{m}}{C_{L}}=\omega_{u}=2 \pi \cdot 20 \times 10^{9}\ \mathrm{rad/s} \quad g_{m}=2.5 \times 10^{-4}\ \mathrm{S} \quad I_{D}=\frac{g_{m} V^{*}}{2}=18.75\ \mu \mathrm{A}
$$
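The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch, assuming the 2 fF load capacitance quoted later for the op-amp and a $V^{*}$ of 150 mV; the latter is not stated in the text but is the value implied by the numbers in the equation. The function names are hypothetical.

```python
import math


def required_gm(unity_gain_hz: float, load_farads: float) -> float:
    """Transconductance needed so that g_m / C_L equals the unity-gain frequency."""
    return 2.0 * math.pi * unity_gain_hz * load_farads


def drain_current(gm_siemens: float, v_star: float) -> float:
    """Bias current per device from the V* definition g_m = 2 * I_D / V*."""
    return gm_siemens * v_star / 2.0


DC_GAIN = 1000.0        # assumed open-loop gain from the text
BANDWIDTH_HZ = 20e6     # desired 3 dB bandwidth
C_LOAD = 2e-15          # 2 fF load (gate of M3), as quoted for the op-amp
V_STAR = 0.15           # assumed; implied by g_m = 2.5e-4 S and I_D = 18.75 uA

unity_gain_hz = DC_GAIN * BANDWIDTH_HZ
gm = required_gm(unity_gain_hz, C_LOAD)
i_device = drain_current(gm, V_STAR)
i_pair = 2.0 * i_device   # both sides of the differential pair

print(f"unity-gain bandwidth = {unity_gain_hz / 1e9:.0f} GHz")
print(f"g_m = {gm * 1e6:.0f} uS, I_D = {i_device * 1e6:.2f} uA per device")
print(f"differential pair total = {i_pair * 1e6:.1f} uA")
```

The per-device result differs slightly from the 18.75 µA in the equation only because the equation rounds $g_{m}$ to $2.5 \times 10^{-4}$ S; doubling it for a differential pair gives the roughly 40 µA figure.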

Since power consumption is a priority, the op-amp is designed to have a very low bandwidth. The low bandwidth is compensated for by the coupling capacitor C2 in fig. 4.6.


Figure 4.10: Bandgap Voltage Reference Op-amp Schematic

The schematic of the Bandgap op-amp is shown in fig. 4.10. It is composed of an NMOS differential pair with NMOS cascodes and a current-mirror load, followed by a common-source second stage. The cascode is used to create a high-impedance node for a large gain. A compensation capacitor is also used in the internally-compensated configuration proposed in [3] to create a lower-frequency pole. As a side effect, the compensation capacitor creates poles and zeros at high frequencies, which do not interfere with operation since the amplifier has a low bandwidth and the new poles and zeros are beyond the unity-gain bandwidth. The op-amp shares a 600 mV bias that is generated in the LDO op-amp discussed later in this chapter.

The op-amp is designed with a high gain and low bandwidth; the devices in the circuit are listed in table 4.3.

| Device | Size or Value |
| :---: | :---: |
| M0 | $200 \mathrm{~nm} / 1.57 \mu \mathrm{~m}$ |
| M1,M2 | $200 \mathrm{~nm} / 525 \mathrm{~nm}$ |
| M3,M4 | $3.38 \mu \mathrm{~m} / 120 \mathrm{~nm}$ |
| M5,M6 | $200 \mathrm{~nm} / 455 \mathrm{~nm}$ |
| M7 | $200 \mathrm{~nm} / 455 \mathrm{~nm}$ |
| M8 | $200 \mathrm{~nm} / 3.96 \mu \mathrm{~m}$ |
| MB1 | $200 \mathrm{~nm} / 1 \mu \mathrm{~m}$ |
| MB2 | $630 \mathrm{~nm} / 120 \mathrm{~nm}$ |
| Cc | $834.4 \mathrm{fF}: 4 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |

Table 4.3: Bandgap Op-amp Devices

The op-amp consumes 900 nA in the first stage and 450 nA in the second stage. An additional 190 nA is required for biasing. With a load capacitance of 2 fF, the amplifier has an open-loop DC gain of 70.4 dB. It has a 3 dB bandwidth of 370 Hz and a unity-gain bandwidth of 1.25 MHz. It has a supply gain of -1 dB at 20 MHz, although this is not harmful for the bandgap core since a scaled version of the same signal would have appeared at the output if the amplifier had a high bandwidth.
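As a sanity check, the reported DC gain and 3 dB bandwidth can be combined into a gain-bandwidth product and compared with the reported unity-gain bandwidth. The sketch below assumes a single dominant pole up to the unity-gain frequency, which is only an approximation of the simulated response; the function names are illustrative.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)


def single_pole_unity_gain_hz(gain_db: float, bandwidth_hz: float) -> float:
    """Unity-gain frequency of an ideal single-pole amplifier."""
    return db_to_ratio(gain_db) * bandwidth_hz


DC_GAIN_DB = 70.4      # reported open-loop DC gain
BANDWIDTH_HZ = 370.0   # reported 3 dB bandwidth

dc_gain = db_to_ratio(DC_GAIN_DB)
f_unity = single_pole_unity_gain_hz(DC_GAIN_DB, BANDWIDTH_HZ)

print(f"DC gain = {dc_gain:.0f} V/V")
print(f"single-pole unity-gain estimate = {f_unity / 1e6:.3f} MHz")
```

The estimate lands within a few percent of the reported 1.25 MHz, which is consistent with a response dominated by one low-frequency pole.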


Figure 4.11: Bandgap Op-amp Transfer Function


Figure 4.12: Bandgap Op-amp Supply Transfer Function

The Bandgap voltage reference is superior to the Subthreshold reference in terms of supply voltage insensitivity and startup time. The first advantage is very important in this situation since the supply might not be constant over time. Its drawbacks include a much larger area and power consumption as well as a lower PSRR. However, supply ripple on this reference will be attenuated further by the LDO. Overall, the Bandgap reference is the better choice unless it cannot meet the power or area budget.

### 4.2 Low-Dropout Regulator (LDO)

After a reference voltage is acquired, a Low-Dropout Regulator (LDO) is used to output a constant supply voltage of 1V. Fig. 4.13 shows the basic LDO circuit. The LDO uses an op-amp to bias R1 at the reference voltage. By choosing the ratio of R1 and R2, the output node can be set to any arbitrary voltage above the reference and below the rectified supply. A PMOS is used to provide high impedance at the output node; the PMOS behaves like a current source at the output.


Figure 4.13: Low-Dropout Regulator Schematic

## Design Considerations

The LDO op-amp also has to have a high open loop DC gain such that the closed loop configuration does not leave static errors. The feedback factor of the LDO is around 0.17 for the Subthreshold reference and 0.9 for the Bandgap reference. The Bandgap reference is also superior in this aspect. The op-amp has to have a very low bandwidth to reject any signal besides the DC level. It also needs to have a good PSRR such that the supply is rejected at the output, although this requirement is not as important as the output of the LDO will be loaded with a large capacitor C1.

A larger M1 would result in a larger closed loop gain, and create a smaller static error, but would also result in a worse PSRR since the supply is amplified through a common gate configuration.

## Low-Dropout Regulator Op-amp

To achieve high open loop gain and low bandwidth, a cascoded differential amplifier with current mirror load is used, with a common source second stage. The compensation capacitor Cc is put in the configuration suggested in [3], as in the Bandgap op-amp. The input devices are PMOSes because the design can be used for either the subthreshold reference or the Bandgap reference. The subthreshold reference is at 100 mV , so an NMOS input stage would not work.


Figure 4.14: Low-Dropout Regulator Op-amp Schematic

MB1-MB5 form the bias network, generating 200 mV and 600 mV. Since this structure has two stable points, one at its desired operating point and one at 0 V, a startup circuit formed by MS1-MS3 is needed to inject some current. In operation, MS2 will have a negative $V_{G S}$. To improve the PSRR, C2 is added in the bias network, keeping the bias Vb1 at a constant DC voltage. C1 is also added in the current mirror load to keep the current more constant.

The devices in the circuit are shown in table 4.4.

The amplifier consumes $1.398 \mu \mathrm{~A}$ at 1.2 V supply while the biasing network consumes 255.9 nA. It has an open-loop gain of 57.4 dB with a 3 dB corner at 418.5 Hz. Its unity-gain bandwidth is 406.7 kHz and it has a supply gain of -15.65 dB at 20 MHz.
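The link between open-loop gain, feedback factor and static error can be illustrated with the figures quoted in this section. This is a first-order sketch only: it treats the 57.4 dB op-amp gain as the entire forward gain and ignores any gain contributed by the pass device, so the actual loop gain in the LDO may differ. The feedback factors are the approximate values stated under Design Considerations; the function names are hypothetical.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)


def static_error_fraction(open_loop_gain: float, feedback_factor: float) -> float:
    """Fractional closed-loop error 1 / (1 + A * beta) of a negative feedback loop."""
    return 1.0 / (1.0 + open_loop_gain * feedback_factor)


OPAMP_GAIN_DB = 57.4   # reported LDO op-amp open-loop gain
FEEDBACK_FACTORS = {"Subthreshold": 0.17, "Bandgap": 0.9}  # approximate values from the text

gain = db_to_ratio(OPAMP_GAIN_DB)
errors = {
    name: static_error_fraction(gain, beta)
    for name, beta in FEEDBACK_FACTORS.items()
}

for name, err in errors.items():
    print(f"{name:12s}: loop gain = {gain * FEEDBACK_FACTORS[name]:6.1f}, "
          f"static error = {err * 100:.2f} %")
```

Under these assumptions the larger feedback factor of the Bandgap configuration reduces the static error by roughly a factor of five for the same op-amp.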

| Device | Size or Value |
| :---: | :---: |
| M0 | $200 \mathrm{~nm} / 700 \mathrm{~nm}$ |
| M1,M2 | $300 \mathrm{~nm} / 3.3 \mu \mathrm{~m}$ |
| M3,M4 | $200 \mathrm{~nm} / 1.15 \mu \mathrm{~m}$ |
| M5,M6 | $200 \mathrm{~nm} / 1.05 \mu \mathrm{~m}$ |
| M7 | $200 \mathrm{~nm} / 950 \mathrm{~nm}$ |
| M8 | $200 \mathrm{~nm} / 1.4 \mu \mathrm{~m}$ |
| MB1,MB2 | $200 \mathrm{~nm} / 7.2 \mu \mathrm{~m}$ |
| MB3 | $500 \mathrm{~nm} / 510 \mathrm{~nm}$ |
| MB4 | $8.6 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| MB5 | $23.4 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| MS1,MS3 | $200 \mathrm{~nm} / 20 \mu \mathrm{~m}$ |
| MS2 | $200 \mathrm{~nm} / 60 \mathrm{~nm}$ |
| C1 | $208.6 \mathrm{fF}: 1 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| C2 | $208.6 \mathrm{fF}: 1 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| Cc | $834.4 \mathrm{fF}: 4 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |

Table 4.4: LDO Op-amp Devices


Figure 4.15: LDO Op-amp Transfer Function


Figure 4.16: LDO Op-amp Supply Transfer Function

Since the Subthreshold and Bandgap references produce references at different voltages, the resistor ratios in the LDO are different for the two references. The values of the resistances also determine the power consumed by the LDO. The target power consumption in the resistors is $10 \mu \mathrm{~W}$. The smaller the power consumption, however, the larger the resistors have to be, and hence the bigger the area. This results in the resistor values in table 4.5.

| Device | Subthreshold Ref | Bandgap Ref |
| :---: | :---: | :---: |
| R1 | $14 \mathrm{k} \Omega: 4 \times(2 \mu \mathrm{~m} \times 50 \mu \mathrm{~m})$ | $66.8 \mathrm{k} \Omega: 19 \times(2 \mu \mathrm{~m} \times 50 \mu \mathrm{~m})$ |
| R2 | $91.4 \mathrm{k} \Omega: 26 \times(2 \mu \mathrm{~m} \times 50 \mu \mathrm{~m})$ | $38.671 \mathrm{k} \Omega: 11 \times(2 \mu \mathrm{~m} \times 50 \mu \mathrm{~m})$ |
| M1 | $200 \mathrm{~nm} / 60 \mathrm{~nm}$ |  |
| C1 | $7.5 \mathrm{pF:} 36 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |  |
| Power | $9.49 \mu \mathrm{~W}$ | $9.48 \mu \mathrm{~W}$ |

Table 4.5: LDO Devices
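The power row of table 4.5 follows directly from the divider resistance at the 1 V output. The sketch below assumes an ideal resistive divider carrying the full divider current from the 1 V node, and takes the Bandgap R2 as 38.671 kΩ (11 unit resistors of about 3.5 kΩ each). The names are illustrative.

```python
V_OUT = 1.0  # regulated LDO output voltage in volts

# (R1, R2) in ohms for each reference option, from table 4.5.
DIVIDERS = {
    "Subthreshold": (14e3, 91.4e3),
    "Bandgap": (66.8e3, 38.671e3),   # R2 taken as 38.671 kOhm
}


def divider_power(v_out: float, r1: float, r2: float) -> float:
    """Static power burned in the feedback divider, P = V^2 / (R1 + R2)."""
    return v_out ** 2 / (r1 + r2)


def total_resistance_for(v_out: float, power_watts: float) -> float:
    """Total divider resistance needed to hit a given power target."""
    return v_out ** 2 / power_watts


powers = {name: divider_power(V_OUT, r1, r2) for name, (r1, r2) in DIVIDERS.items()}
for name, p in powers.items():
    print(f"{name:12s}: {p * 1e6:.2f} uW")

r_target = total_resistance_for(V_OUT, 10e-6)
print(f"10 uW target needs about {r_target / 1e3:.0f} kOhm in total")
```

Both options use about 105 kΩ in total, slightly above the 100 kΩ needed for exactly 10 µW, which is why both land just under the target.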

Figs. 4.17 and 4.18 show the transient responses of the LDOs with the two voltage references. Note that the LDO with the Subthreshold reference takes much longer to settle than the one with the Bandgap reference, as shown before. Also note that the Subthreshold reference has a substantial overshoot before settling to its operating point, which corresponds to the output rising well beyond 1 V before settling back to 1 V. This might be harmful for the circuits that use the 1 V supply, which include the Digital Logic block and the charge pumps for the switches. If voltages above 2 V and below -2 V are produced by the charge pumps for an extended amount of time, some transistors will be subjected to excessive gate voltage, potentially damaging them. Thus, the Bandgap reference is the better choice.


Figure 4.17: Subthreshold Reference LDO Startup Transient


Figure 4.18: Bandgap Reference LDO Startup Transient

## Chapter 5

## Clock Recovery

The Digital Logic block requires a 20 MHz clock that is recovered from the c2a_power lines. The power lines deliver AC at 20 MHz with a 180 degree phase shift, so the simplest clock recovery circuit is a comparator that compares the two power lines. The clock recovery circuit is powered by the 1 V regulated supply to generate a clean clock.

## Design Considerations

The most important specification in the clock recovery circuit is the drive strength of the output inverter. It has to drive the charge pumps in the power switches as well as provide the clock to the Digital Logic block. To do this, the comparator output is driven through inverters with increasing drive strength. HVT inverter cells are used to minimize crowbar (short-circuit) and leakage power.

The output clock does not have to be precisely $50 \%$ duty cycle because it is at a very low frequency. The Digital Logic block uses both edges of the clock - the positive edge for sequential logic and the negative edge for sampling - but some jitter and duty cycle variations are negligible since the critical path has a lot of slack.

## Implementation

Fig. 5.1 shows the clock recovery circuit, which is similar to the active diode comparator circuit in Chapter 3, but inverted with a PMOS input since the input common mode is at 0 V. It consists of a differential pair with 2.5 V PMOS input devices that can have any gain larger than 1 at 20 MHz, since the input swings at 3.3 Vpp. Adding gain to the differential pair only yields a marginal improvement, especially since the inverters are HVT devices. The difference between the input logic low and the input logic high of these HVT inverters is so small that, with a 3.3 Vpp input signal, the time during which both PMOS and NMOS are on is very short. The comparator's first stage is designed to have a gain of 2 at 20 MHz.

MB1-MB5 make up the biasing network. There is already a 200 mV and a 600 mV bias from the LDO biasing network, but that biasing network is derived from the unregulated supply. Using that bias would not result in a clean clock, and the clock might be skewed if the bias is higher than expected, since the unregulated supply might be higher than the designed 1.2 V. A new bias network is therefore derived from the 1 V regulated supply, with a startup branch made up of MS1-MS3.


Figure 5.1: Clock Recovery Schematic

The device sizes are shown in table 5.1.

| Device | Size or Cell |
| :---: | :---: |
| M0 | $1.8 \mu \mathrm{~m} / 1 \mu \mathrm{~m}$ |
| M1,M2 $(2.5 \mathrm{~V})$ | $400 \mathrm{~nm} / 3.25 \mu \mathrm{~m}$ |
| M3,M4 | $20 \mathrm{~nm} / 10.8 \mu \mathrm{~m}$ |
| M5 | $8.61 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| M6 | $23.4 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| M7 | $500 \mathrm{~nm} / 510 \mathrm{~nm}$ |
| MB1,MB2 | $200 \mathrm{~nm} / 600 \mathrm{~nm}$ |
| MB3 | $200 \mathrm{~nm} / 950 \mathrm{~nm}$ |
| MB4 | $4.94 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| MB5 | $13.7 \mu \mathrm{~m} / 60 \mathrm{~nm}$ |
| MS1 | $200 \mathrm{~nm} / 20 \mu \mathrm{~m}$ |
| MS2 | $200 \mathrm{~nm} / 60 \mathrm{~nm}$ |
| MS3 | $200 \mathrm{~nm} / 20 \mu \mathrm{~m}$ |
| I1 | INVD0HVT |
| I2 | INVD4HVT |
| I3 | INVD16HVT |

Table 5.1: Clock Recovery Devices

The differential pair consumes 143.1 nA while the bias network consumes 153.3 nA at 1 V supply. The open loop transfer function of the differential pair is shown in fig. 5.2. It has a gain of 6.83 dB at 20 MHz .


Figure 5.2: Clock Recovery Stage 1 Transfer Function


Figure 5.3: Clock Recovery Transient

Fig. 5.3 shows the transient response of the clock recovery circuit. The output is a sharp clock signal at a duty cycle of close to $50 \%$.

## Chapter 6

## Power Switches

The Power Switch block is controlled by the Digital Logic block and determines whether an NM receives power through the a2n_power lines. The main challenge of this block is the fact that the regulated supply is 1 V while the AC power has a 3.3 Vpp signal. In order to control the power switches properly, a voltage that is at least one $V_{T H}$ higher than 1.65 V and a voltage that is at least one $V_{T H}$ lower than -1.65 V are needed.


Figure 6.1: Power Switches Block Diagram

Fig. 6.1 shows the block diagram of the Power Switch unit. The 1 V supply is first converted into a 2 V supply and a -2 V supply through charge pumps utilizing the 20 MHz clock. These supplies are used to shift the Digital Logic nm_switch signals from $0 / 1 \mathrm{~V}$ to $-2 / 2 \mathrm{~V}$. These signals then control analog switches for each NM power line. Note that a level shifter and a power switch block are needed for each of the four NMs.

### 6.1 Positive Charge Pump

## Design Considerations

Several charge pump architectures are considered, including an improved version of the Dickson charge pump with charge transfer switches proposed in [5] and the cross-coupled charge pump proposed in [4]. The Dickson charge pump is advantageous for large voltage gains since it is smaller and easier to chain. However, the charge pump in the Power Switches only requires a gain of 2. On top of that, the Dickson pump requires an output stage to keep the output at a constant voltage, for which [5] proposed to use the cross-coupled charge pump. The cross-coupled charge pump is a little less efficient but does not require an output stage and has a simple structure.

Note that the output of the charge pump is used only by the level shifters. The level shifters, as shown later, do not consume significant static current, so the charge pump does not have to supply a lot of current. Also note that the circuit does not need exactly 2 V. As long as the output voltage is a threshold voltage above 1.65 V, the power switches work correctly.

Lastly, the input clock drives the large capacitance in the charge pump, so this should be taken into account when picking the drive strength of the clock driver. If the drive strength is not sufficient, an internal chain of inverters can be introduced so that the charge pump does not excessively load the clock signal.

## Implementation

The cross-coupled charge pump schematic is shown in fig. 6.2. The input clock is inverted to create two phase-shifted clocks. When the clock is low, M1 is turned on and C1 is charged up to 1 V . When the clock goes high, this charge is transferred to the output node through M3 which is turned on by node $B$ being at 1 V . The pumping mechanism is essentially from the 1 V line through M1 to C1, then through M3 to the output capacitor. The main inefficiencies come from the power dissipated as the charge flows through the transistors. Minimum-sized 2.5 V devices are used to minimize the power dissipation.

Since in the application the power switches do not switch often - the NMs are turned on and off in the order of minutes - the output capacitance C3 does not have to be significantly large. When an NM is switched, the output node would droop for a few clock cycles until the charge pump restores it back to 2 V . A smaller C3 would result in a larger droop, but the only requirement is that the output voltage is high enough to drive the analog switches.

To calculate the optimal capacitor sizes, the energy consumption of a switching activity is considered. One level shifter switching activity consumes 1 pJ from the 2 V supply and 1 pJ from the -2 V supply. Since the power switches need to drive up to 1.65 V from the power lines, the output capacitor is designed such that the output voltage does not droop below 1.7 V in a switching operation assuming the charge pump is turned off that cycle. Equating the energy before and after the switching,

$$
\frac{1}{2} C(2\ \mathrm{V})^{2}=\frac{1}{2} C(1.7\ \mathrm{V})^{2}+1\ \mathrm{pJ} \quad \Rightarrow \quad C=1.8\ \mathrm{pF}
$$
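The energy balance above can be solved for the capacitance and then checked against the capacitor that is actually used. The sketch below assumes, as in the text, that the charge pump delivers no charge during the switching event, and uses the 1.877 pF value of C3 from table 6.1; the function names are illustrative.

```python
import math


def min_output_cap(energy_joules: float, v_start: float, v_min: float) -> float:
    """Smallest capacitance that keeps the output above v_min after losing energy_joules."""
    return 2.0 * energy_joules / (v_start ** 2 - v_min ** 2)


def droop_voltage(cap_farads: float, v_start: float, energy_joules: float) -> float:
    """Output voltage after energy_joules is drawn from a capacitor charged to v_start."""
    return math.sqrt(v_start ** 2 - 2.0 * energy_joules / cap_farads)


E_SWITCH = 1e-12   # 1 pJ drawn from the 2 V supply per level shifter switching event
V_START = 2.0      # nominal charge pump output
V_MIN = 1.7        # lowest acceptable output during a switching event
C3 = 1.877e-12     # output capacitor from table 6.1

c_min = min_output_cap(E_SWITCH, V_START, V_MIN)
v_after = droop_voltage(C3, V_START, E_SWITCH)

print(f"minimum C = {c_min * 1e12:.2f} pF")
print(f"droop with C3 = {C3 * 1e12:.3f} pF: output falls to {v_after:.3f} V")
```

With the chosen C3 the worst-case output stays just above the 1.7 V limit even with the pump idle, so any charge delivered during the event only adds margin.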



Figure 6.2: Positive Charge Pump Schematic

The pumping capacitors C 1 and C 2 determine how fast the 2 V supply recovers after a switching operation. Assuming roughly that at most every other CM packet - containing 10 bits - switches one NM on or off, the supply needs to recover in 20 clock cycles, or 400 ns . The pumping capacitors are chosen such that this specification is met.

The devices in the circuit are shown in table 6.1.

| Device | Size or Value |
| :---: | :---: |
| M1,M2,M3,M4 (2.5V) | 400nm/280nm |
| C1,C2 | $834.4 \mathrm{fF:} 4 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| C 3 | $1.877 \mathrm{pF:} 9 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| I1 | INVD0HVT |

Table 6.1: Positive Charge Pump Devices

The transient response of the charge pump when it starts up is shown in fig. 6.3. The output node stabilizes at 2 V within 700 ns of startup.


Figure 6.3: Positive Charge Pump Startup Transient

Fig. 6.4 shows the output 2 V transient when a switching occurs in the level shifter. The output voltage droops to 1.8 V and fully recovers to 2 V in 400 ns.


Figure 6.4: Charge Pump Output Transient with Level Shifter Switching

### 6.2 Negative Charge Pump

## Implementation

The negative charge pump outputs a -2 V supply. To achieve this, the positive charge pump architecture is inverted. One stage would only pump the output to -1 V, so two stages of negative charge pump are cascaded to produce the -2 V output. The resulting schematic is shown in fig. 6.5. The sizing of the devices is the same as in the positive charge pump, with the intermediate -1 V voltage stored in a 1.877 pF capacitor C3, just like the output node.


Figure 6.5: Negative Charge Pump Schematic

| Device | Size or Value |
| :---: | :---: |
| M1,M2,M3,M4,M5,M6,M7,M8 (2.5V) | 400nm/280nm |
| C1,C2,C4,C5 | $834.4 \mathrm{fF}: 4 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| C3,C6 | $1.877 \mathrm{pF:} 9 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| I1 | INVD0HVT |

Table 6.2: Negative Charge Pump Devices

The transient response of the output node charging to -2 V is shown in fig. 6.6. The output settles at -2 V in around $4 \mu \mathrm{~s}$ during startup, which is tolerable for the application.


Figure 6.6: Negative Charge Pump Transient

### 6.3 Logic Level Shifter

## Design Considerations

A logic level shifter is required to shift the nm_switch signals from $0 / 1 \mathrm{~V}$ to $-2 / 2 \mathrm{~V}$ to control the analog switches. However, the circuit has to be designed carefully such that each device is not stressed with a high $V_{G S}$ that might harm the device. To perform the level shift, the problem is divided into 5 smaller blocks shown in fig. 6.7. The numbers on the sides of the block correspond to the logic level of the signal. Each block can be built with transistors safely in a comfortable region.


Figure 6.7: Switch Logic Level Shifter Block Diagram

## Implementation

Most of the logic level shifter blocks are built from half-latches that are disturbed by the input devices. They are, in essence, digital circuits which do not have any bias current. They have minimal leakage power but consume power when switching. In the application, the level shifters do not switch often, since they switch only when an NM is turned on or off, so the power consumption of this block is not a major concern.

## Block A



Figure 6.8: Switch Logic Level Shifter Block A Schematic

Block A shifts $0 / 1 \mathrm{~V}$ to $0 / 2 \mathrm{~V}$. Since it needs to shift up, the input devices are NMOS. M3 and M4 make up a half-latch that is easily disturbed by M1 and M2. The $V_{G S}$ of M1 and M2 never gets larger than 1 V, so regular transistors can be used. M3 and M4, however, are subject to $V_{G S}$ up to 2 V, so they need to be implemented using 2.5 V devices.

| Device | Size |
| :---: | :---: |
| M1,M2 | $400 \mathrm{~nm} / 60 \mathrm{~nm}$ |
| M3,M4 (2.5V) | $400 \mathrm{~nm} / 280 \mathrm{~nm}$ |

Table 6.3: Switch Logic Level Shifter Block A Devices

## Block B



Figure 6.9: Switch Logic Level Shifter Block B Schematic

Block B shifts $0 / 1 \mathrm{~V}$ to $-1 / 1 \mathrm{~V}$. This stage is needed to generate signals that could drive Block C effectively. Without this stage, the input to Block $C$ would be between 0 and 1 V and the input devices of Block $C$ are not turned on enough to easily disturb the half-latch.

Block B uses an architecture proposed in [2] that is similar to the charge pump. There are two copies of the circuit above, one for in and one for inb. The two copies can share I1. The circuit stores some charge across C1 which boosts node inboostb to -1 V when in is high. A small disadvantage of this circuit is that the inboostb node starts at 0 V when the circuit starts up, so the first switching event does not work as well as steady-state operation. Once the switch has been turned on and off once, inboostb will be at -1 V or 1 V. Fig. 6.13 shows this abnormality in the red waveform.

Only M 2 is subjected to $V_{G S}$ higher than 1.2 V , so only M 2 has to be a 2.5 V device.

| Device | Size or Value |
| :---: | :---: |
| M1,M3,M4 | $200 \mathrm{~nm} / 60 \mathrm{~nm}$ |
| M2 (2.5V) | $400 \mathrm{~nm} / 280 \mathrm{~nm}$ |
| C 1 | $208.6 \mathrm{fF}: 1 \times(10 \mu \mathrm{~m} \times 10 \mu \mathrm{~m})$ |
| I1 | INVD0HVT |

Table 6.4: Switch Logic Level Shifter Block B Devices

## Block C



Figure 6.10: Switch Logic Level Shifter Block C Schematic

Block C shifts the boosted signal from Block B from $-1 / 1 \mathrm{~V}$ to $-2 / 1 \mathrm{~V}$. It has the same structure as Block A, except the structure is inverted and a cascode is introduced in the NMOS biased at ground. The cascode is needed to weaken the half-latch. This is needed on top of the boosted inputs from Block B to securely disturb the half-latch state. The input devices M1 and M2 also need to be sized properly for strength.

| Device | Size |
| :---: | :---: |
| M1,M2 (2.5V) | $1 \mu \mathrm{~m} / 280 \mathrm{~nm}$ |
| M3,M4 (2.5V) | $1 \mu \mathrm{~m} / 280 \mathrm{~nm}$ |
| M5,M6 (2.5V) | $400 \mathrm{~nm} / 280 \mathrm{~nm}$ |

Table 6.5: Switch Logic Level Shifter Block C Devices

## Block D



Figure 6.11: Switch Logic Level Shifter Block D Schematic

Block D level shifts the output of Block C from $-2 / 1 \mathrm{~V}$ to $-2 / 0 \mathrm{~V}$. This is required so that the input devices of Block E do not get stressed with a $V_{G S}$ of more than 2 V. In Block D, the worst-case $V_{G S}$ of M1 is -2 V when in2 is low. The addition of M3 adds a level shift such that the $V_{G S}$ of M2 does not reach 3 V when clk2 is 1 V. Two copies of the circuit above are needed, one for in2 and one for in2b.

| Device | Size |
| :---: | :---: |
| M1,M2,M3 (2.5V) | 400nm/280nm |

Table 6.6: Switch Logic Level Shifter Block D Devices

## Block E



Figure 6.12: Switch Logic Level Shifter Block E Schematic

Lastly, Block E uses the outputs of Blocks A and D and provides an output stage. M1 is driven by $-2 / 0 \mathrm{~V}$ so it does not experience a $V_{G S}$ greater than 2 V, while M2 is driven by $0 / 2 \mathrm{~V}$ and also does not get stressed.

There is a potential issue in this block, namely when in1 switches low before in3 switches low. Since in1 goes through 1 inversion block while in3 goes through 3 inversion blocks before this stage, in1 always switches before in3. When in1 and in3 switch from high to low, there is some time during which both the PMOS and the NMOS are turned on, resulting in a crowbar (short-circuit) current. This effect can be seen in fig. 6.14. The green signal changes before the yellow signal, causing both M1 and M2 to turn on for some time. However, this issue only occurs when the level shifter switches, which is infrequent in steady-state usage. The power consumption from the brief short-circuit current is negligible.

A potential solution to this issue is to add 2 inverters in series as a buffer between Block A and Block E to delay in1, the PMOS signal at the input of Block E. However, this results in higher dynamic and leakage power since more capacitance needs to be charged, and it ends up consuming more energy than the current solution.

| Device | Size |
| :--- | :--- |
| M1,M2 (2.5V) | 400nm/280nm |

Table 6.7: Switch Logic Level Shifter Block E Devices


Figure 6.13: Level Shifter Transient

The delay in the switching is 7.5 ns with a 120 fF load calculated from the gate capacitance of the power switches and the energy consumed per switching is 2.1 pJ . It has a leakage power of 39.21 nW .
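The claim that switching energy is not a concern can be checked by comparing the average switching power with the leakage power. The sketch below assumes one switching event per minute, which is an illustrative rate chosen to match the earlier statement that NMs are turned on and off on the order of minutes; the function names are hypothetical.

```python
def average_switching_power(energy_joules: float, interval_seconds: float) -> float:
    """Average power when one event of energy_joules occurs every interval_seconds."""
    return energy_joules / interval_seconds


def break_even_rate_hz(leakage_watts: float, energy_joules: float) -> float:
    """Switching rate at which dynamic power equals leakage power."""
    return leakage_watts / energy_joules


E_SWITCH = 2.1e-12     # energy per switching event, from the text
P_LEAK = 39.21e-9      # level shifter leakage power, from the text
SWITCH_INTERVAL = 60.0 # assumed: one NM toggle per minute

p_dynamic = average_switching_power(E_SWITCH, SWITCH_INTERVAL)
rate_equal = break_even_rate_hz(P_LEAK, E_SWITCH)

print(f"average switching power = {p_dynamic * 1e15:.0f} fW")
print(f"leakage / switching power = {P_LEAK / p_dynamic:.2e}")
print(f"dynamic power matches leakage at about {rate_equal / 1e3:.1f} k events per second")
```

At this assumed rate the dynamic contribution is about six orders of magnitude below leakage, so the level shifter power is set almost entirely by leakage.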


Figure 6.14: Zoomed Level Shifter Transient

### 6.4 Power Switch

Now that the control signals of the switches are at $-2 / 2 \mathrm{~V}$, they can be used to control the analog switches. For the lowest resistance, large 3.3 V devices are used in the passgate configuration shown in fig. 6.15. The devices are chosen to be large enough relative to the output cable impedance that the output swing is not degraded significantly. The transient response of the power switches is shown in fig. 6.16.


Figure 6.15: Power Switch Schematic

| Device | Size |
| :--- | :--- |
| M1,M3 (3.3V) | $20 \mu \mathrm{~m} / 500 \mathrm{~nm}$ |
| M2,M4 (3.3V) | $25 \mu \mathrm{~m} / 500 \mathrm{~nm}$ |

Table 6.8: Power Switch Devices


Figure 6.16: Power Switch Transient

## Chapter 7

## Conclusion

The Aggregator Module (AM) of the Octopus-Mimetic Neural Implant (OMNI) system is realized in the TSMC 65nm LP process. The c2a_power lines are rectified and regulated to provide a stable 1 V supply for a Digital Logic block as well as the charge pumps. A 20 MHz clock is recovered from the input power lines. The charge pumps are used to produce higher voltages needed to control the a2n_power line switches.

The whole system is simulated at $37^{\circ} \mathrm{C}$ with the following transient responses.


Figure 7.1: Transient response of the AM system

The analog circuitry consumes a total power of $72.52 \mu \mathrm{~W}$ with the Bandgap Voltage Reference and $59.63 \mu \mathrm{~W}$ with the Subthreshold Voltage Reference. Although the Bandgap consumes more power, based on the transient above the Bandgap circuit should be adopted due to its fast startup and insensitivity to input voltage.

Adding the Digital Logic block power consumption estimates from Synopsys IC Compiler, the total chip power excluding I/O cells is $87.82 \mu \mathrm{~W}$, which is within the target range. The current AM PCB consumes 1.6 mW , so an ASIC implementation would reduce the power consumption by more than an order of magnitude.

The power breakdown of the system in steady-state operation is shown in table 7.1.

| Component | Power Consumption |
| :--- | :---: |
| Voltage Rectification | $\mathbf{27.55}\ \boldsymbol{\mu}\mathbf{W}$ |
| Bridge Rectifier | $18 \mu \mathrm{~W}$ |
| Active Diode | $9.55 \mu \mathrm{~W}$ |
| Voltage Regulation | $\mathbf{43.05}\ \boldsymbol{\mu}\mathbf{W}$ |
| Bandgap Reference | $21.73 \mu \mathrm{~W}$ |
| Low-Dropout Op-amp | $6.109 \mu \mathrm{~W}$ |
| Low-Dropout Regulator | $15.21 \mu \mathrm{~W}$ |
| Clock Recovery | $\mathbf{1.456}\ \boldsymbol{\mu}\mathbf{W}$ |
| Power Switches | $\mathbf{464}\ \mathbf{nW}$ |
| Positive Charge Pump | 86.51 nW |
| Negative Charge Pump | 170.4 nW |
| Level Shifter and Switches (x4) | 360.5 nW |
| Digital Logic | $\mathbf{12.3}\ \boldsymbol{\mu}\mathbf{W}$ |
| Total | $\mathbf{87.82}\ \boldsymbol{\mu}\mathbf{W}$ |

Table 7.1: Power Breakdown of AM system
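The analog subtotal and the improvement over the PCB prototype can be recomputed from the figures above. The sketch below simply sums the bold analog block totals from table 7.1 and compares the stated chip total with the 1.6 mW PCB figure; the variable names are illustrative.

```python
# Block totals from table 7.1, in microwatts.
ANALOG_BLOCKS_UW = {
    "Voltage Rectification": 27.55,
    "Voltage Regulation": 43.05,
    "Clock Recovery": 1.456,
    "Power Switches": 0.464,
}

# Sub-block figures for the two largest analog blocks, in microwatts.
RECTIFICATION_PARTS_UW = [18.0, 9.55]           # bridge rectifier, active diode
REGULATION_PARTS_UW = [21.73, 6.109, 15.21]     # bandgap, LDO op-amp, LDO

CHIP_TOTAL_UW = 87.82    # total chip power stated in the text
PCB_POWER_UW = 1600.0    # current PCB prototype, 1.6 mW

analog_total = sum(ANALOG_BLOCKS_UW.values())
rectification_total = sum(RECTIFICATION_PARTS_UW)
regulation_total = sum(REGULATION_PARTS_UW)
reduction = PCB_POWER_UW / CHIP_TOTAL_UW

print(f"analog total = {analog_total:.2f} uW")
print(f"rectification = {rectification_total:.2f} uW, regulation = {regulation_total:.2f} uW")
print(f"PCB to ASIC power reduction = {reduction:.1f}x")
```

The rectification and regulation blocks together account for most of the analog power, and the stated chip total corresponds to a reduction of about 18 times relative to the PCB.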

## Bibliography

[1] Xiaocheng Jing, Philip K.T. Mok, Cheng Huang and Fan Yang. "A 0.5V nanoWatt CMOS Voltage Reference with Two High PSRR Outputs". In: IEEE International Symposium on Circuits and Systems (2012), pp. 2837-2840.
[2] Peijun Liu et al. "A Novel High-Speed and Low-Power Negative Voltage Level Shifter for Low Voltage Applications". In: Proceedings of 2010 IEEE International Symposium on Circuits and Systems (2010), pp. 601-604.
[3] David B. Ribner and Miles A. Copeland. "Design Techniques for Cascoded CMOS Op Amps with Improved PSRR and Common-Mode Input Range". In: IEEE Journal of Solid-State Circuits SC-19.6 (1984), pp. 919-925.
[4] Chun Yu Cheng, Ka Nang Leung, Yi Ki Sun and Pui Ying Or. "Design of a Low-Voltage CMOS Charge Pump". In: 4th IEEE International Symposium on Electronic Design, Test and Applications (2008), pp. 342-345.
[5] Jieh-Tsorng Wu and Kuen-Long Chang. "MOS Charge Pumps for Low-Voltage Operation". In: IEEE Journal of Solid-State Circuits 33.4 (1998), pp. 592-597.
[6] Wenguan Li, Ruohe Yao and Lifang Guo. "A Low Power CMOS Bandgap Voltage Reference with Enhanced Power Supply Rejection". In: IEEE 8th International Conference on ASIC (2009), pp. 30-304.

