## Modeling and Design of High-Speed CMOS Receivers for Short-Reach Photonic Links

Diaaeldin Mahmoud Ibrahim Abdelrahman

A Thesis in the Department of Electrical and Computer Engineering

Presented in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (Electrical and Computer Engineering) at Concordia University, Montreal, Quebec, Canada

March 2021

© Diaaeldin Abdelrahman, 2021

#### **CONCORDIA UNIVERSITY**

#### SCHOOL OF GRADUATE STUDIES

This is to certify that the thesis prepared

By: Diaaeldin Abdelrahman

Entitled: Modeling and Design of High-Speed CMOS Receivers for Short-Reach Photonic Links

and submitted in partial fulfillment of the requirements for the degree of

**DOCTOR OF PHILOSOPHY** (Electrical & Computer Engineering)

complies with the regulations of the University and meets the accepted standards with respect to originality and quality.

Signed by the final examining committee:

| Name | Role |
|------|------|
| Dr. Weiyi Shang | Chair |
| Dr. Sudip Shekhar | External Examiner |
| Dr. Christophe Grova | External to Program |
| Dr. Rabin Raut | Examiner |
| Dr. Chunyan Wang | Examiner |
| Dr. Odile Liboiron-Ladouceur | Thesis Co-Supervisor |
| Dr. Glenn Cowan | Thesis Supervisor |

Approved by

Dr. Wei-Ping Zhu, Graduate Program Director

March 22, 2021

Dr. Mourad Debbabi, Dean, Gina Cody School of Engineering and Computer Science

## ABSTRACT

Modeling and Design of High-Speed CMOS Receivers for Short-Reach Photonic Links

#### Diaaeldin Abdelrahman, Ph.D., Concordia University, 2021

This dissertation presents several research outcomes towards designing high-speed CMOS optical receivers for energy-efficient short-reach optical links. First, it provides a broad survey of recently published equalizer-based receivers and presents a novel methodology for accurately calculating their noise. This methodology is then used to identify the receiver that achieves the best sensitivity.

Second, the trade-off between sensitivity and power dissipation of the receiver is optimized to reduce the energy consumption per bit of the overall link. Design trade-offs for the receiver, the transmitter, and the overall link are presented, and comparisons are made to determine how much receiver sensitivity can be sacrificed to reduce receiver power dissipation before this saving is outweighed by the resulting increase in transmitter power. Contrary to conventional wisdom, our results show that energy-efficient links require low-power receivers with an input capacitance much smaller than that required for noise-optimum performance.

Third, the thesis presents a novel equalization technique for optical receivers. A linear equalizer (LE) is realized by adding a pole in the feedback paths of an active feedback-based wideband amplifier. By embedding the peaking in the main amplifier (MA), the front-end matches the sensitivity and gain of conventional LE-based receivers with better energy efficiency, since the standalone equalizer stage(s) are eliminated. Electrical measurements are presented to demonstrate the ability of the proposed technique to restore the bandwidth and to improve performance over the conventional design.

## Acknowledgments

I would like to thank my advisor, Dr. Glenn Cowan, for his patient guidance throughout this Ph.D. I would also like to thank my co-supervisor, Dr. Odile Liboiron-Ladouceur (McGill University, Montreal, QC, Canada), for generously sharing her time during group meetings. I would also like to thank my committee members, Dr. Rabin Raut, Dr. Chunyan Wang, and Dr. Christophe Grova, for taking the time to provide valuable feedback throughout this project. Special thanks are due to Dr. Sudip Shekhar (University of British Columbia, Vancouver, BC, Canada) for reading my thesis and providing thorough feedback on my work.

I would like to acknowledge the financial support from the Gina Cody School of Engineering and Computer Science at Concordia University and the Natural Sciences and Engineering Research Council (NSERC) of Canada through the Strategic Project Grant Program. I am sincerely grateful for the financial support from ReSMiQ (Montreal, QC, Canada). I also acknowledge the contributions and technical support of the Canadian Microelectronics Corporation (CMC).

I am extremely grateful to Ted Obuchowicz for providing technical and CAD support. I would like to thank my colleagues Abdullah Ibn Abbas, Christopher Williams, and Rubana Priti for their invaluable advice, useful contributions, and insightful discussions. I would like to thank Sheryl Tablan for her administrative support throughout my academic time at Concordia University.

I am deeply indebted to my mom, dad, sisters, and brothers for this success. I owe a large portion of this success to my wife and daughter. Thank you all for your patience through these years. Your presence and encouragement made it much easier to go through the hard times.

I would like to extend my deepest gratitude to my mentor Dr. Mohamed Atef and my friend Dr. Mahmoud Elsaadany. This project would not have been completed without your advice and encouragement. Thank you!

## Contents

- List of Figures
- List of Tables
- List of Acronyms
- Chapter 1: Introduction
  - 1.1 Motivation
  - 1.2 Thesis Objectives
  - 1.3 Claim of Originality
  - 1.4 Publications and Contributions of the Author
  - 1.5 Thesis Organization
- Chapter 2: Background and Fundamentals
  - 2.1 Introduction
  - 2.2 Conventional Optical Receiver Front-End
    - 2.2.1 Transimpedance Amplifier
    - 2.2.2 Power Penalty due to the Swing Requirements of the CDR
    - 2.2.3 Main Amplifier
    - 2.2.4 The Transimpedance Limit
    - 2.2.5 Noise-Power Trade-off
  - 2.3 Limited-Bandwidth Front-End
    - 2.3.1 What if the Bandwidth is Reduced?
  - 2.4 Summary
- Chapter 3: Noise Analysis and Design Considerations for Equalizer-Based Optical Receivers
  - 3.1 Introduction
  - 3.2 Inverter-Based TIA
    - 3.2.1 Frequency Response
    - 3.2.2 Time Response
    - 3.2.3 Input-Referred Noise Current
  - 3.3 Noise Optimization Procedure
  - 3.4 Noise Calculation of Equalizer-Based Receivers
    - 3.4.1 DFE-Based Receivers
    - 3.4.2 CTLE-Based Receivers
    - 3.4.3 FFE-Based Receivers
  - 3.5 Comparison and Discussion
    - 3.5.1 Noise Bandwidths
    - 3.5.2 Simulation at Higher $f_{bit}$ and $C_D$
  - 3.6 Conclusions
- Chapter 4: Optimization of the Power-Sensitivity Trade-off in CMOS Receivers for Energy-Efficient Short-Reach Optical Links
  - 4.1 Introduction
  - 4.2 Optical Receiver Modelling
    - 4.2.1 Transimpedance Amplifier
    - 4.2.2 Small-Signal Model
    - 4.2.3 Bandwidth and Transimpedance Gain
    - 4.2.4 Input-Referred Noise Current
  - 4.3 Receiver Sensitivity-Power Trade-Off
    - 4.3.1 Power Penalty due to the Swing Requirements of the CDR
    - 4.3.2 Main Amplifier
    - 4.3.3 Receiver Power Dissipation
  - 4.4 Optical Transmitter and Link Budget
    - 4.4.1 Laser Diode
    - 4.4.2 Laser Diode Driver
    - 4.4.3 Transmitter Power Consumption
    - 4.4.4 VCSEL and Driver Modeling
    - 4.4.5 Link Budget
  - 4.5 Optimization Procedure and Link Evaluation
    - 4.5.1 Link Evaluation for Moderate Data Rate and Swing Requirement
    - 4.5.2 Link Evaluation for High Data Rate and Swing Requirements
    - 4.5.3 Validation of Model Accuracy
  - 4.6 Discussion
    - 4.6.1 Advances on Photonic and Interconnect Technologies
    - 4.6.2 Advances in CMOS Technology
    - 4.6.3 Other Implementations of Transmitter and Receiver Subblocks
  - 4.7 Conclusion
- Chapter 5: An Inductorless Power-Efficient Design Technique for Linear Equalization in CMOS Optical Receivers
  - 5.1 Introduction
  - 5.2 Low-Bandwidth TIA
    - 5.2.1 Small-Signal Model and Frequency Response
    - 5.2.2 Effective Gain
  - 5.3 Equalizing Main Amplifier
    - 5.3.1 …
  - 5.4 From…
    - 5.4.1 …
    - 5.4.2 …
    - 5.4.3 …
    - 5.4.4 …
  - 5.5 Circ…
    - 5.5.1 …
    - 5.5.2 Sensitivity to Process and Temperature Variations
    - 5.5.3 Stability
  - 5.6 Experimental Validation
    - 5.6.1 Transient Measurement
    - 5.6.2 Noise Measurement
    - 5.6.3 Discussion and Comparison to Prior Work
    - 5.6.4 Operation at Higher Data Rate
    - 5.6.5 Operation with Large Input Signal
  - 5.7 Conclusions
- Chapter 6: Conclusions and Future Work
  - 6.1 Thesis Highlights
  - 6.2 Potential Areas for Future Work
    - 6.2.1 Extension of the Proposed Equalization Technique
    - 6.2.2 Design of Receiver Circuits of Higher Modulation Schemes
    - 6.2.3 Design of Adaptive Receiver Circuits for Optimized Link Performance
- References

## List of Figures

- Fig. 1.1. (a) Continuous growth in internet traffic; (b) breakdown of traffic in 2020 [1].
- Fig. 1.2. Block diagram of a typical short-reach photonic link.
- Fig. 2.1. Commonly used TIA topologies: (a) resistor TIA, (b) shunt-feedback TIA and its CMOS inverter-based implementation, and (c) common-gate TIA.
- Fig. 2.2. (a) A receiver front-end that consists of an Inv-TIA and an n-stage MA. (b) Representative eye diagrams illustrating the power penalty incurred by the limited sensitivity of the decision circuit. The grayed area represents the output voltage when the input is set to the noise-based sensitivity limit. The height of the bottom eye is increased by $V_S^{PP}$ to satisfy the voltage amplitude requirements of a practical CDR. (c) The incurred power penalty as a function of the gain of the MA.
- Fig. 2.3. The required per-stage gain-bandwidth product as a function of the number of stages for $A_{MA} = 40$ dB, $f_{MA} = 10$ GHz, and various values of $m$.
- Fig. 2.4. Schematic of the (a) common-source amplifier, (b) common-source-based Cherry-Hooper amplifier, and (c) inverter-based Cherry-Hooper amplifier.
- Fig. 2.5. Active feedback-based CH amplifier: (a) circuitry [9] and (b) block diagram.
- Fig. 2.6. Higher-order implementations of the active feedback-based MA: (a) a third-order gain stage, (b) a cascade of two third-order stages with interleaving feedback, and (c) a fifth-order MA.
- Fig. 2.7. The transimpedance limit as a function of the number of MA stages for $C_T = 300$ fF, $f_{FE} = 17.5$ GHz, and various values of $A_s f_s$.
- Fig. 2.8. The energy efficiency of the FE in Fig. 2.2(a) as a function of the ratio of the circuit's input capacitance to the total parasitic capacitance, for $f_{bit}$ of 25 Gb/s, $C_D$ of 150 fF, and various numbers of stages. The SF-TIA and each MA stage are assumed to be implemented by the Inv-TIA and the Inv-CH, respectively.

| Fig. 2<br>b<br>r           | 2.9. (a) Resistor TIA (b) Output signal-to-noise ratio of the R-TIA as a function of the 3 dB bandwidth to data rate ratio for $f_{bit}$ , $I_{in}$ and $C_T$ fixed at 10 Gb/s, 100 µApp, and 200 fF, respectively                                                                                                                                                                                                                                                                                           |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Fig. | 2.10. Simulation results for the eye diagram at the output of the R-TIA for $f_{bit}$, $I_{in}$, and $C_T$ fixed at 10 Gb/s, 100 µA$_{PP}$, and 200 fF, respectively. The resistor value and $f_{TIA}/f_{bit}$ are indicated in the title of each eye diagram |
| Fig. | 3.1. Representative block and eye diagrams of (a) a conventional optical receiver, where the TIA and the MA respectively provide midband gains of $Z_{TIA,0}$ and $A_{MA,0}$ and the front-end has a sufficiently wide bandwidth to introduce no ISI, and (b) an equalizer-based optical receiver, where the effective opening of the equalized eye $V_{pp}$ is less than the peak-to-peak opening of the eye right after the TIA, $Z_{TIA,0}i_{pp}$ (offset compensation details are not shown) |
| Fig. | 3.2. The small-signal model of the inverter-based TIA |
| Fig. | 3.3. TIA's output pulse response for $f_{3dB}$ ranging from $0.1f_{bit}$ to $f_{bit}$ with $Q$, $C_T$, $C_L$, and $g_m$ fixed at 0.707, 136.8 fF, 113.6 fF, and 53.5 m$\Omega^{-1}$, respectively. The input is an ideal current pulse with unity amplitude ($i_{pp} = 1$ A) and a width of $T_b = 1/f_{bit} = 100$ ps |
| Fig. | 3.4. Normalized transimpedance gain calculated from the midband gain, pulse response height, and vertical eye-opening (VEO). The pulse height and VEO are calculated based on Fig. 3.3 |
| Fig. | 3.5. Input-referred noise current as a function of $f_{3dB}/f_{bit}$ calculated using the midband gain, pulse response height, and VEO. The TIA pole $Q$ is set to 0.707 and $f_{3dB}$ is swept according to the procedure and values in Table 3.2 |
| Fig. | 3.6. DFE-based receivers (a) FIR feedback [24] (b) IIR feedback, modified based on [26] |
| Fig. | 3.7. Corrected and uncorrected noise for DFE-based receivers. The $f_{3dB}$ is changed according to the procedure and values in Table 3.2 while $Q$ is kept constant at 0.707 |
| Fig. | 3.8. Impact of the placement of the TIA's poles on the input-referred noise of a 2-tap FIR-DFE-based receiver. The $f_{3dB}$ is changed according to the procedure and values in Table 3.2 while the values of the TIA's pole $Q$ are given in the legend |
| Fig. | 3.9. General block diagram of a CTLE-based front-end |
| Fig. | 3.10. Input-referred noise of the CTLE-based receiver. The horizontal axis represents the ratio of the overall front-end bandwidth to the data rate. The TIA pole $Q$ is set to 0.707 and its $f_{3dB}$ is swept according to the procedure and values in Table 3.2. The different CTLE designs are summarized in Table 3.3 |
| Fig. | 3.11. The input-referred noise power as a function of the bandwidth-to-data-rate ratio for both the full-bandwidth TIA and the CTLE-equalized front-end. For the latter, the horizontal axis represents the ratio of the overall front-end bandwidth to the data rate. In this simulation, both the full- and limited-bandwidth TIAs have a pole $Q$ of 0.707, and the equalizer is assumed to be noiseless and designed as shown in the first row of Table 3.3 with $\omega_z = \omega_n$ and $\omega_{n,e} = 2\omega_n$ |
| Fig. | 3.12. Integrating double-sampling receiver and its waveform [17] |
| Fig. | 3.13. RC double-sampling receiver and its waveform [18] |
| Fig. | 3.14. Double-sampling receiver employing a low-BW TIA to avoid the charge-sharing problem [19], [20] |
| Fig. | 3.15. Block diagram representation of the FFE-based receivers |
| Fig. | 3.16. Pulse response at different points in the front-end shown in Fig. 3.15. The low-BW TIA has a pole $Q$ of 0.5, $R_F = 9.7$ k$\Omega$, and $f_{3dB} = 0.2 f_{bit}$ |
| Fig. | 3.17. Input-referred noise current of the FFE-based front-end shown in Fig. 3.15. The horizontal axis represents the 3dB bandwidth of the overall front-end. The TIA's bandwidth is changed according to the procedure and values in Table 3.2 while its pole $Q$ is kept constant at 0.5 |
| Fig. | 3.18. The noise performance of the FFE-based receiver in the case where the TIA has two real distinct poles. The horizontal axis represents the 3dB bandwidth of the overall front-end. The TIA's bandwidth is changed according to the procedure and values in Table 3.2 while its pole $Q$ is shown in the legend |
| Fig. | 3.19. The best-case noise performance for each receiver architecture. The horizontal axis represents the bandwidth of the overall front-end. The TIA's pole $Q$ and equalizer design for each curve are listed in Table 3.4 |
| Fig. | 3.20. The best-case noise performance for each receiver architecture at $f_{bit} = 30$ Gb/s. The horizontal axis represents the ratio of the overall front-end bandwidth to the data rate. The optimum value of the TIA's pole $Q$ is found to be 0.707 for the IIR-DFE and CTLE-based front-ends (the CTLE is designed as in the second row of Table 3.3), while the optimum values of the TIA's pole $Q$ for the 2-tap FIR-DFE and FFE-based front-ends are found to be 0.577 and 0.3 (with $\alpha = 0.7$), respectively |
| Fig. | 3.21. The best-case noise performance for each receiver architecture at $C_D = 200$ fF. The horizontal axis represents the ratio of the overall front-end bandwidth to the data rate. The optimum value of the TIA's pole $Q$ is found to be 0.707 for the IIR-DFE and CTLE-based front-ends (the CTLE is designed as in the second row of Table 3.3), while the optimum values of the TIA's pole $Q$ for the 2-tap FIR-DFE and FFE-based front-ends are found to be 0.577 and 0.3 (with $\alpha = 0.7$), respectively |
| Fig. | 4.1. Inv-TIA (a) circuitry, (b) small-signal model with noise sources |
| Fig. | 4.2. (a) Inv-TIA bandwidth as a function of $R_F$ for a given total transistor width $W$ (b) the required $R_F$ and the resulting gain and pole $Q$ as a function of $C_I/C_D$ for a targeted bandwidth of 8 GHz |
| Fig. | 4.3. (a) TIA's input-referred noise current as a function of $C_I/C_D$ for a fixed 3dB bandwidth of 8 GHz. (b) Receiver sensitivity as a function of $C_I/C_D$ for a FE that includes only a TIA. $f_{TIA}$ and $V_s^{PP}$ are fixed at 8 GHz and 50 mV$_{PP}$, respectively. The bold markers indicate the locations of maximum gain (MG), minimum noise (MN), and best overall sensitivity (BS) |
| Fig. | 4.4. Inv-based Cherry-Hooper MA |
| Fig. | 4.5. Receiver sensitivity for $f_{bit} = 16$ Gb/s, $V_s^{PP} = 50$ mV$_{PP}$, and various receiver architectures (a) $n = 1$ and (b) $n = 3$ |
| Fig. | 4.6. VCSEL characteristics (a) P-I curve (b) V-I curve. Curves are not plotted to scale |
| Fig. | 4.7. Circuit and operation of the VCSEL driver (a) circuit, (b) current-switch model to transmit a binary "0" and (c) to transmit a binary "1" |
| Fig. | 4.8. The complete model of the driver, package, and VCSEL |
| Fig. | 4.9. Modeled VCSEL performance excluding driver and package (a) P-I curve and (b) modulation response at various values of VCSEL current |
| Fig. | 4.10. Model-generated eye diagrams at the output of the transmitter considering the driver, package, and VCSEL for $I_{bias} = 4$ mA, $I_{mod} = 1$ mA, and (a) $f_{bit} = 16$ Gb/s and (b) $f_{bit} = 25$ Gb/s |
| Fig. | 4.11. Energy efficiency as a function of $C_I/C_D$ for $f_{bit} = 16$ Gb/s, $V_s^{PP} = 50$ mV$_{PP}$, and (a) $n = 1$ and (b) $n = 3$ |
| Fig. | 4.12. Energy efficiency as a function of $C_I/C_D$ for $f_{bit} = 25$ Gb/s, $V_s^{PP} = 100$ mV$_{PP}$, and (a) $n = 1$ ($V_{DD_D}$ is increased to 1.2 V) and (b) $n = 3$ |
| Fig. | 4.13. Simulation results for the eye diagrams at the receiver output for various data rates and receiver architectures. The circuit parameters and the required peak-to-peak output voltage are also listed for each eye |
| Fig. | 4.14. Link performance at various data rates and swing requirements (a) using 65 nm CMOS technology and advanced photonic and interconnect technologies (b) using advanced CMOS technology and typical photonic and interconnect technologies. A receiver with a single-stage MA is used for both simulations |
| Fig. | 5.1. The proposed and the conventional receivers are represented by the same block diagram (top). The bottom graph illustrates the operation of the proposed receiver (black) in contrast to that of the conventional receiver (gray) |
| Fig. | 5.2. (a) TIA's 3dB bandwidth and pole $Q_0$ as a function of the feedback resistor. (b) The exact and the approximate calculations of $\rho$ as a function of the feedback resistor |
| Fig. | 5.3. (a) Output pulse response for various values of $f_{TIA}/f_{bit}$. The input current pulse has a peak-to-peak value of 10 µA$_{PP}$ and a width of 100 ps. (b) Different gains as a function of $f_{TIA}/f_{bit}$. $f_{bit}$ is fixed at 10 Gb/s while $f_{TIA}$ is swept by varying $R_F$. The labeled points in (b) illustrate that linear equalization is favorable for applications that require high gain in the receiver FE |
| Fig. | 5.4. Block diagram of (a) the third-order gain stage in [15] (b) the proposed EMA with an LPF inserted in each feedback path |
| Fig. | 5.5. (a) Pole-zero locations of the proposed EMA for various values of $\omega_Z$ in comparison to the conventional third-order gain stage where $\omega_Z = \infty$. The dashed arrows indicate the direction of pole-zero movements as $\omega_Z$ increases. (b) Amplitude response of the proposed EMA for various ratios of $\omega_Z/\omega_1$. $\beta_1$, $A_1$, and $\omega_1$ are fixed at 0.25, 2.5, and $2\pi \times 30$ GHz, respectively |
| Fig. | 5.6. Block diagram of the proposed front-end. The two-stage EMA is modified based on the two-stage MA in [14]. The grayed feedback cells indicate the locations of the inserted poles |
| Fig. | 5.7. (a) Amplitude response (b) output response to an input current pulse with a peak-to-peak value of 15 µA$_{PP}$ and a width of 100 ps. The EMA parameters are $\omega_Z/\omega_1 = 0.075$ and $\beta_1 = 0.25$ |
| Fig. | 5.8. MATLAB-generated 10 Gb/s output eye diagrams when the limited-bandwidth TIA is followed by (a) an EMA and (b) a wideband MA. The peak-to-peak value of the input current is fixed at 15 µA$_{PP}$ |
| Fig. | 5.9. (a) Circuit model used for noise analysis (b) MATLAB-simulated noise reduction in the proposed FE compared to its conventional counterparts. The arrows indicate the amount of change for each noise component |
| Fig. | 5.10. Block diagram and circuitry of the implemented front-end. Parameter values for 10 Gb/s operation are tabulated |
| Fig. | 5.11. (a) Simulated amplitude response. (b) Simulated group delay |
| Fig. | 5.12. Simulation results for the 10 Gb/s output eye diagrams when the limited-bandwidth TIA is followed by (a) a wideband MA and (b) the proposed EMA. In (c), the TIA's bandwidth is widened and a wideband MA is employed. The input current is fixed at 15 µA$_{PP}$ for all simulations |
| Fig. | 5.13. Simulated performance under process and temperature variations (a) EMA's peaking at Nyquist frequency (b) gain and bandwidth of the overall FE |
| Fig. | 5.14. (a) Chip micrograph (b) Test setup for electrical characterization |
| Fig. | 5.15. Electrically measured BER as a function of input voltage amplitude for a PRBS pattern length of 31. The inset shows the measured 10 Gb/s single-ended eye diagrams for both the conventional (black) and the proposed (white) FEs. The eye diagrams are measured for an input voltage set to the receiver's sensitivity limit and a PRBS31 pattern |
| Fig. | 5.16. (a) Bathtub curves measured at 10 Gb/s and PRBS pattern length of 31 (b) receiver sensitivity as a function of the input PRBS length |
| Fig. | 5.17. Simulation results for the 20 Gb/s output eye diagrams when the limited-bandwidth TIA is followed by (a) a wideband MA and (b) the proposed EMA. In (c), the TIA's bandwidth is widened and a wideband MA is employed. The input current is fixed at 25 µA$_{PP}$ for all simulations |
| Fig. | 5.18. Simulation results for the output eye diagram when the input current is set to 1 mA$_{PP}$ at (a) 10 Gb/s (b) 20 Gb/s |
| Fig. | 6.1. NRZ versus PAM-4 signaling schemes (a) NRZ amplitude levels, (b) NRZ eye diagram, (c) PAM-4 amplitude levels, and (d) PAM-4 eye diagram |
| Fig. | 6.2. Waveforms and power distributions of (a) noiseless transmitted signal (b) noisy received signal |

## List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | Performance summary of the three commonly used TIAs in Fig. 2.1 |
| 3.1 | Numerical examples for integral coefficients $I_{n0}$ and $I_{n2}$ for a second-order TIA |
| 3.2 | Noise optimization procedure |
| 3.3 | Integral coefficients, bandwidth extension factor ($\chi$), and amplitude peaking for a CTLE-equalized receiver with TIA $Q = 0.707$ |
| 3.4 | Optimum design point for different receivers |
| 4.1 | Extracted parameters of a replica-loaded CMOS inverter with $W_p = W_n = 1\ \mu\text{m} \times NF$, simulated in 1-V 65-nm CMOS technology |
| 4.2 | Model-predicted performance for various receiver architectures |
| 4.3 | VCSEL and driver model parameters |
| 4.4 | Optimization procedure and bounds |
| 4.5 | Performance comparison between the receiver's best-sensitivity and the link's best-energy-efficiency design points |
| 4.6 | Link performance across a broad range of technologies and data rates |
| 5.1 | Design parameters and performance summary of the proposed front-end in comparison to its conventional counterpart |
| 5.2 | Performance comparison of the three measured FEs |
| 5.3 | Performance comparison with published 10 Gb/s receivers |

# List of Acronyms

| Acronym | Definition |
|---------|------------|
| AWGN | additive white Gaussian noise           |
| BER  | bit error rate                          |
| BPG  | bit pattern generator                   |
| BS   | best sensitivity                        |
| CDR  | clock and data recovery                 |
| CG   | common gate                             |
| CH   | Cherry–Hooper                           |
| CMOS | complementary metal-oxide-semiconductor |
| CMU  | clock multiplication unit               |
| CS   | common source                           |
| CT   | continuous time                         |
| CTLE | continuous-time linear equalizer        |
| DFE  | decision feedback equalizer             |
| DMUX | demultiplexer                           |
| DOM  | dynamic offset modulation               |
| DT   | discrete time                           |
| EMA  | equalizing main amplifier               |
| FE   | front-end                               |
| FFE  | feedforward equalizer                   |
| FIR  | finite impulse response                 |
| IIR  | infinite impulse response               |
| I/O  | input-output                            |
| ISI  | inter-symbol interference               |
| LDD    | laser diode driver                                |
| MA     | main amplifier                                    |
| MG     | maximum gain                                      |
| MMF    | multimode fiber                                   |
| MN     | minimum noise                                     |
| MOSFET | metal-oxide-semiconductor field-effect transistor |
| MUX    | multiplexer                                       |
| NRZ    | non-return-to-zero                                |
| OMA    | optical modulation amplitude                      |
| PAM    | pulse amplitude modulation                        |
| PD     | photodetector                                     |
| PP     | power penalty                                     |
| PRBS   | pseudo-random bit sequence                        |
| PSD    | power spectral density                            |
| RGC    | regulated cascode                                 |
| RX     | receiver                                          |
| SF     | shunt feedback                                    |
| SMF    | single-mode fiber                                 |
| SNR    | signal-to-noise ratio                             |
| TIA    | transimpedance amplifier                          |
| TX     | transmitter                                       |
| UI     | unit interval                                     |
| VCSEL  | vertical-cavity surface-emitting laser            |
| VEO    | vertical eye-opening                              |

### Chapter 1

## Introduction

In recent years, the increasing demand for bandwidth-intensive services such as social networks, online high-definition video streaming, video conferencing, online gaming, mobile internet, and cloud-based storage has caused exponential growth in internet traffic. According to the Cisco Global Cloud Index [1], more than 15 zettabytes of data were transferred in 2020, as shown in Fig. 1.1(a). Moreover, traffic has nearly tripled over the last five years [2]. This growth is expected to continue, necessitating a corresponding increase in the number of hyperscale data centers, each of which includes thousands of high-speed interconnects. Notably, Fig. 1.1(b) shows that the total traffic is dominated by data communication that takes place within the data center. This in turn drives the development of robust, high-speed, and energy-efficient interconnects to move data around the data center. Electrical links are usually deployed for short distances of up to 10 m. To extend their reach, sophisticated equalization techniques can be used to compensate for their high-frequency losses, but this solution considerably increases design complexity and consumes more power and silicon area. Alternatively, optical links offer lower high-frequency losses, better immunity to interference, and higher capacity than their electrical counterparts. Therefore, optical links are widely used to communicate data within and between data centers, using multimode fiber (MMF) for distances of up to 300 m and single-mode fiber (SMF) when the distance exceeds 300 m.



Fig. 1.1. (a) Continuous growth in internet traffic (b) breakdown of traffic in 2020 [1].

Hyperscale data centers include thousands of high-speed interconnect links. Therefore, to maintain reasonable power dissipation, recent research suggests that optical interconnects must achieve an efficiency better than 1 pJ/bit at 25 Gb/s [3]. Further, most of the services provided by data centers are free of charge for end users. Therefore, in addition to being energy-efficient, optical links must be low-cost, with costs below tens of cents per Gb/s [3], [4]. Most short-reach optical links in data centers are based on vertical-cavity surface-emitting lasers (VCSELs) operating at 850 nm over MMF [5]. MMF provides a cost-efficient solution for short-reach optical links up to 300 m. Compared to SMF, MMF has a larger core diameter, which enables the use of optical connectors with relaxed tolerances and inexpensive optical components. However, MMF suffers from modal dispersion that limits the reach, especially as data rates increase. Therefore, SMF-based links are usually used to extend the reach beyond 300 m.

#### 1.1 Motivation

The required metrics for short-reach optical links motivate research into high-speed, dense, and low-power optical transceivers. Fig. 1.2 illustrates a simplified block diagram of a VCSEL-based short-reach optical link. On the transmitter (TX) side, a multiplexer (MUX) merges several parallel low-speed data streams into a single high-speed serial data stream. To control the MUX, a clock multiplication unit (CMU) generates a bit-rate clock from the parallel data clock. The high-speed serial data is then fed to a laser diode driver (LDD), which modulates the current of the VCSEL. In some applications, the driver retimes the data and thus requires a bit-rate clock from the CMU. The modulated light emitted by the laser is then transmitted to a photodiode (PD) through an MMF channel.

The transmitted data are in non-return-to-zero (NRZ) format: the signal is on for the entire bit period to transmit a binary "1" and off for the entire bit period to transmit a binary "0". The inverse of the bit period is the data rate. For example, transmitting the periodic sequence '010101...' at a data rate of 10 Gb/s in NRZ format creates a 5 GHz square wave with a 50 % duty cycle. NRZ is also known as two-level pulse-amplitude modulation (PAM2). Although higher-order modulation schemes such as PAM4 and PAM8 are emerging, this thesis focuses on PAM2.
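The 10 Gb/s example can be checked numerically. The following is a minimal sketch, assuming an ideal 0/1 NRZ waveform with instantaneous transitions; the helper names `nrz_waveform` and `square_wave_stats` and the oversampling factor `samples_per_bit` are hypothetical choices made only for illustration.

```python
def nrz_waveform(bits, samples_per_bit):
    """Ideal NRZ (PAM2) waveform: each bit is held for one full bit period."""
    return [b for b in bits for _ in range(samples_per_bit)]


def square_wave_stats(wave, sample_rate):
    """Return (frequency, duty cycle) of a periodic 0/1 waveform from its rising edges."""
    rising = [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]
    period_samples = rising[1] - rising[0]
    return sample_rate / period_samples, sum(wave) / len(wave)


bit_rate = 10e9          # 10 Gb/s, as in the text
samples_per_bit = 16     # hypothetical oversampling factor
wave = nrz_waveform([0, 1] * 50, samples_per_bit)
freq, duty = square_wave_stats(wave, bit_rate * samples_per_bit)
print(f"{freq / 1e9:.1f} GHz, duty cycle {duty:.0%}")  # 5.0 GHz, duty cycle 50%
```

Two consecutive bits form one period of the alternating pattern, so the fundamental frequency is half the bit rate.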

On the receiver (RX) side, a photodiode converts the optical signal into a small electrical current. A transimpedance amplifier (TIA) converts this current into a voltage with some amplification. The TIA is followed by a main amplifier (MA), which provides further amplification to produce a signal large enough to drive a clock and data recovery (CDR) unit. The CDR synchronizes an internal clock to the incoming data and uses it to capture and regenerate the data. Finally, a demultiplexer (DMUX) converts the high-speed serial data back into *n* parallel lower-speed data streams. The combination of the TIA and the MA is called the receiver front-end (FE), and it is the main focus of this work.



Fig. 1.2. Block diagram of a typical short-reach photonic link.

Historically, two different approaches have been adopted to design the receiver FE. The first approach is to design the FE with a wide bandwidth of at least 70 % of the data rate ($f_{bit}$) to maintain signal integrity. Despite its simplicity, this approach has a major drawback at high speeds: the FE becomes power-hungry and occupies a larger chip area, mainly due to the passive inductors required for bandwidth extension. A more recent technique uses an FE with a bandwidth far below the data rate (20 %-30 % of $f_{bit}$). The FE is then followed by an equalizer to compensate for the inter-symbol interference (ISI) introduced by the intentionally reduced bandwidth. In contrast to electrical links, the equalizer here compensates for the limited receiver bandwidth rather than for the ISI of a copper channel. Therefore, simple equalization circuits are sufficient to cancel the ISI without introducing significant hardware or power overhead.
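To give a rough sense of what the narrower bandwidth buys, the sketch below applies the single-TIA scaling $R_T \propto 1/f_{TIA}^2$ (discussed in Section 2.2.1) to the two design points. It assumes that the amplifier gain-bandwidth product and $C_T$ are held fixed. The 25 Gb/s rate and the 0.25 fraction (a mid-range value within 20 %-30 %) are illustrative choices, and `gain_ratio` is a hypothetical helper.

```python
import math

F_BIT = 25e9  # example data rate (the same value is used for Fig. 2.7)


def gain_ratio(frac_wide, frac_narrow):
    """Ratio of single-TIA transimpedance limits when R_T scales as 1/f_TIA^2
    (amplifier gain-bandwidth product and C_T assumed fixed)."""
    return (frac_wide / frac_narrow) ** 2


wide, narrow = 0.7, 0.25   # fractions of f_bit; 0.25 is a hypothetical mid-range choice
ratio = gain_ratio(wide, narrow)
print(f"{wide * F_BIT / 1e9:.2f} GHz vs {narrow * F_BIT / 1e9:.2f} GHz -> "
      f"{ratio:.2f}x transimpedance ({20 * math.log10(ratio):.1f} dB)")
```

Under these assumptions, the limited-bandwidth FE can offer nearly an order of magnitude more TIA gain. The equalizer then has to restore the lost bandwidth.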

This thesis presents several research directions toward the design of high-speed and energy-efficient receiver circuits for short-reach optical interconnects. It presents the design, optimization, and test results for the receiver front-end (dashed box in Fig. 1.2). A methodology for accurately analyzing equalizer-based receivers is presented. The power-sensitivity trade-off in the receiver is optimized to minimize the link's overall power dissipation. The design, implementation, and measurement results of a new equalization technique for optical receivers are also presented. The presented technique improves the front-end's area and power efficiency compared to the conventional wideband design approach.

#### **1.2 Thesis Objectives**

The main objective of this thesis is to design high-speed area- and power-efficient receiver circuits for short-reach optical links for modern data centers.

The objectives of this thesis are summarized as follows:

- Study equalizer-based optical receivers to provide general guidelines for noise optimization in these receivers. The objective is to develop an optimization model that allows designers to compare the noise performance of different receiver architectures for a given technology, photodiode capacitance, and data rate. The model also revisits the analysis of these receivers in comparison to their conventional full-bandwidth counterparts and provides key modifications to correctly calculate the sensitivity.
- Explore the sensitivity-power trade-off in optical receivers to minimize the link's overall power dissipation. The sensitivity is calculated as a function of the receiver's input capacitance relative to the detector capacitance for various receiver architectures, data rates, and swing requirements. The goal is to study how small (less sensitive) the receiver can become before its power reduction is outpaced by the transmitter's increase in power.
- Present new receiver architectures that employ novel equalization techniques. The goal is to build high-speed and low-power optical receiver circuits in CMOS technology for the next generation of high-speed short-reach optical interconnects.

#### **1.3 Claim of Originality**

The contributions of this thesis can be summarized as follows:

- The thesis presents a novel methodology for evaluating the noise performance of equalizer-based optical receivers. A new concept of effective gain is presented and used as an input-referral gain. The proposed methodology is used to compare the noise performance of different receiver architectures. Further, the proposed method is used to study the optimal allocation of the TIA's pole based on the type of equalizer used.
- The thesis presents a complete study and optimization of the power-sensitivity trade-off in optical receivers. Conventionally, the receiver is designed for minimum noise. In this thesis, we design the receiver to minimize the link's overall power dissipation. For that purpose, design trade-offs for the receiver, transmitter, and the overall link are presented to study how small, or noisy, the receiver can become to minimize the link's total power dissipation. Contrary to conventional wisdom, our simulation results show that energy-efficient links require low-power receivers with input capacitance much smaller than that required for noise-optimum performance.
- The thesis presents the design and measurement results of a novel inductor-less equalization technique for optical receivers. The equalizer is realized by adding a pole in the feedback paths of an active feedback-based wideband amplifier. By embedding the peaking in the main amplifier (MA), the front-end meets the sensitivity and gain of conventional equalizer-based receivers with better energy efficiency by eliminating the equalizer stages. Measurement results demonstrate the capability of the proposed equalization technique in restoring the required bandwidth and improving the performance compared to the conventional design approach.

#### **1.4 Publications and Contributions of the Author**

The research in this dissertation is presented in several published, submitted or under revision journal articles, conference proceedings, and tutorials. The publications and contributions of the author are listed below:

#### **Journal Articles:**

J1) D. Abdelrahman and G. E. R. Cowan, "Noise Analysis and Design Considerations for Equalizer-Based Optical Receivers," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 66, no. 8, pp. 3201-3212, Aug. 2019.

D. Abdelrahman: contributed to the idea, performed all analysis and simulations, and drafted the manuscript.

G. Cowan: contributed to the idea, supervised the work, and edited and reviewed the manuscript.

J2) D. Abdelrahman, O. Liboiron-Ladouceur, and G. E. R. Cowan, "Optimization of the Power-Sensitivity Trade-off in CMOS Receivers for Energy-Efficient Short-Reach Optical Links," Submitted to IEEE Transactions on Circuits and Systems I: Regular Papers.

D. Abdelrahman: proposed the idea, performed all analysis and simulations, and drafted the manuscript.

O. Liboiron-Ladouceur: reviewed the manuscript.

G. Cowan: supervised the work, edited, and reviewed the manuscript.

J3) D. Abdelrahman, O. Liboiron-Ladouceur, and G. E. R. Cowan, "An Inductorless Low-Power Design Technique for Linear Equalizations in Optical Receivers,"

D. Abdelrahman: proposed the idea, designed, and drew the layout of the receiver, performed electrical measurements, and wrote the manuscript.

O. Liboiron-Ladouceur: reviewed the manuscript.

G. Cowan: supervised the work, edited, and reviewed the manuscript.

Access to an optical testbed was limited, and this situation was further aggravated by the campus shutdown during the pandemic. The manuscript will be submitted to a journal once the optical measurements are completed.

J4) C. Williams, D. Abdelrahman, X. Jia, A. I. Abbas, O. Liboiron-Ladouceur and G. E. R. Cowan, "Reconfiguration in Source-Synchronous Receivers for Short-Reach Parallel Optical Links," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, no. 7, pp. 1548-1560, July 2019.

C. Williams: proposed the idea of reconfiguration, organized the teamwork, designed and drew the layout of the RF path and assembly of top-level chip, led the measurements, and wrote most of the manuscript.

D. Abdelrahman: decided the implementation of the analog part, designed and drew the layout of the receiver analog front-end, participated in measurements, wrote a section in the manuscript, and revised the manuscript.

O. Liboiron-Ladouceur: co-supervised the work, edited and reviewed the manuscript.

G. Cowan: supervised the work, edited, and reviewed the manuscript.

#### **Conference Papers:**

C1) D. Abdelrahman, O. Liboiron-Ladouceur, and G. Cowan "Low-noise optical receiver frontend using narrow-bandwidth TIA and cascaded linear equalizer," 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, 2017,

D. Abdelrahman: proposed the idea, designed the circuit, performed all analysis and simulations, and drafted the manuscript.

O. Liboiron-Ladouceur: reviewed the manuscript.

G. Cowan: supervised the work, edited, and reviewed the manuscript.

Although not included in the thesis, this work was our first effort to understand the performance of equalizer-based optical receivers and laid a foundation for noise analysis work.

C2) D. Abdelrahman and G. E. R. Cowan, "Noise Analysis and Design Considerations for Equalizer-Based Optical Receivers," IEEE Int. Symp. Circuits and Systems (ISCAS), Sevilla, 2020.

A reduced version of the noise analysis work published in J1 is also presented as a conference paper C2.

#### **Tutorial:**

T1) **D. Abdelrahman**, B. Radi, O. Liboiron-Ladouceur, and G. Cowan, "Silicon-Photonic/CMOS Receiver Design for Energy-Efficient Short-Reach Optical Links with High Bandwidth Density" IEEE Int. Symp. Circuits and Systems (ISCAS), Sevilla, 2020.

All authors equally contributed to the preparation of the material.

#### **1.5 Thesis Organization**

This thesis is divided into three main topics related to the design of high-speed and energy-efficient short-reach optical links. The thesis is organized into six chapters as follows: **Chapter 2** further discusses the different approaches to optical receiver design. The chapter motivates limited-bandwidth receivers but also emphasizes the challenges of noise analysis in this design methodology.

**Chapter 3** presents a thorough analysis of equalizer-based optical receivers. The chapter proposes a method for accurately calculating the sensitivity of the receiver, considering the gain reduction due to the TIA's limited bandwidth. The proposed analysis is applied to example receiver architectures, including the decision feedback equalizer (DFE), continuous-time linear equalizer (CTLE), and feedforward equalizer (FFE). Several simulation scenarios are considered to compare the receiver architectures with one another and identify the architecture that achieves the best sensitivity.

**Chapter 4** investigates the power-sensitivity trade-off in optical receivers to minimize the link's total power dissipation. Traditionally, optical receivers with FET front ends are designed for optimized noise-based sensitivity by matching the circuit's input capacitance to the photodiode capacitance which leads to excessive power dissipation in the receiver. In this Chapter, design trade-offs for the receiver, transmitter, and the overall link are presented, and comparisons are made to study how small (noisy) the receiver can become before its power reduction is outpaced by the transmitter's increase in power. Simulation results show that energy-efficient links require low-power receivers with input capacitance much smaller than that required for noise-optimum performance.

**Chapter 5** presents a design methodology to mitigate the trade-off between gain and bandwidth in CMOS multistage amplifiers. A receiver front-end (FE) that employs a high-gain narrowband transimpedance amplifier (TIA) followed by an equalizing main amplifier (EMA) is proposed. The EMA provides high-frequency peaking to extend the FE's bandwidth from 25 % to 60 % of the targeted data rate. The peaking is realized by adding a pole in the feedback paths of an active feedback-based wideband amplifier. By embedding the peaking in the main amplifier (MA), the front-end meets the sensitivity and gain of conventional equalizer-based receivers with better energy efficiency by eliminating the equalizer stages. The proposed FE has been implemented in TSMC 65 nm CMOS technology and measured electrically at 10 Gb/s. Measurement results demonstrate the improved performance of the proposed FE compared to its conventional counterpart.

Finally, Chapter 6 concludes the work and presents potential areas for future work.

### Chapter 2

### Background and Fundamentals

#### 2.1 Introduction

This chapter begins with a discussion of the required metrics for optical receivers. Section 2.2 discusses conventional optical receiver front-ends. The section provides a brief analysis of commonly used transimpedance amplifier (TIA) topologies in terms of their gain, bandwidth, and noise. Design trade-offs in the main amplifier (MA) are also discussed, considering the impact of cascading more stages on bandwidth, noise, and power. Practical implementation examples of TIAs and MAs are presented. Section 2.3 explains the effect of reducing the bandwidth of the front-end. This section motivates the design of optical receivers whose bandwidth is intentionally reduced far below the targeted data rate to achieve higher gain and better sensitivity. This observation leads into the next chapter, which provides a broad survey of equalization techniques for these limited-bandwidth front-ends and a methodology for accurately calculating their noise.

#### 2.2 Conventional Optical Receiver Front-End

A conventional optical receiver front-end is highlighted by the dashed box in Fig. 1.2. It consists of a transimpedance amplifier (TIA) and a main amplifier (MA). The performance of both amplifiers is described below.

#### 2.2.1 Transimpedance Amplifier

The primary function of a TIA is to convert the small photocurrent ($I_{in}$) generated by the photodiode (PD) into a large output voltage ($V_{out}$). The performance of the TIA is usually characterized by its transimpedance gain ($R_T$), bandwidth ($f_{3dB}$), and noise. The transimpedance gain is measured in ohms and, at this point in the thesis, is defined as the midband value of the frequency-dependent transfer function $Z_{TIA}(f)$. The gain should be as large as possible to create an output signal with sufficient amplitude to drive the MA and to suppress the noise of the downstream circuits.

The bandwidth is the frequency at which the amplitude response $|Z_{TIA}(f)|$ drops 3 dB below its midband value. To receive data at a rate of $f_{bit}$, the bandwidth must be wide enough to introduce negligible inter-symbol interference (ISI). On the other hand, a large TIA bandwidth increases the noise bandwidth and consequently degrades the sensitivity. A wide bandwidth also trades off with gain, which necessitates cascading more MA stages to satisfy the voltage amplitude requirements of the CDR driven by the receiver front-end. With more MA stages, power dissipation and noise increase. Traditionally, the trade-off between ISI and sensitivity is mitigated by setting the TIA's bandwidth to 50 %-70 % of the targeted data rate. This choice is investigated further later in this chapter and in Chapter 3.

The input-referred noise current ($i_{n,rms}$) is used to quantify the noise performance of the TIA. It is a fictitious current source that cannot be observed in an actual circuit; it is defined as the current source that, when added at the input of an ideal noiseless TIA, reproduces the same output noise as the original noisy TIA [6]. The main noise contributors in the TIA are the thermal noise of its transistors and resistors. The power spectral densities (in A<sup>2</sup>/Hz) of these two sources are given by $I_{n,M}^2 = 4kT\gamma g_m$ and $I_{n,R}^2 = 4kT/R$, respectively, where $k$ is the Boltzmann constant, $T$ is the absolute temperature in kelvin, $\gamma$ is the excess noise factor, $g_m$ is the transconductance of the transistor, and $R$ is the resistance.
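The two PSD expressions can be evaluated directly. This sketch assumes room temperature (300 K) and a long-channel value $\gamma = 2/3$ (short-channel devices often show larger values). The 1 kΩ and 10 mS operating points are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant (J/K)


def resistor_noise_psd(r, temp=300.0):
    """Thermal noise current PSD of a resistor, 4kT/R (A^2/Hz)."""
    return 4 * K_B * temp / r


def mosfet_noise_psd(gm, gamma=2 / 3, temp=300.0):
    """MOSFET channel thermal noise current PSD, 4kT*gamma*gm (A^2/Hz).
    gamma = 2/3 is an assumed long-channel value."""
    return 4 * K_B * temp * gamma * gm


# Hypothetical example values: a 1 kOhm resistor and a 10 mS transistor at 300 K
for label, psd in (("R = 1 kOhm", resistor_noise_psd(1e3)),
                   ("g_m = 10 mS", mosfet_noise_psd(10e-3))):
    print(f"{label}: {psd:.3e} A^2/Hz, {math.sqrt(psd) * 1e12:.2f} pA/sqrt(Hz)")
```

Both sources are white, so a lower noise bandwidth reduces the integrated noise $i_{n,rms}$ in direct proportion.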

#### **Practical Implementations**

Fig. 2.1 shows the three most commonly used TIA topologies, and Table 2.1 compares their performance. A simple passive resistor TIA (R-TIA), shown in Fig. 2.1 (a), performs current-to-voltage conversion with a transimpedance gain of $R_L$. The bandwidth of this passive R-TIA is determined by the time constant at the input node, $C_T R_L$, where $C_T$ is the total input capacitance, which includes the photodiode, pad, and wiring capacitances in addition to the circuit's input capacitance. The presence of $C_T$ leads to a strong trade-off between gain and bandwidth, as both depend on $R_L$.

Because of this trade-off in the passive TIA, active topologies are usually used. For example, the shunt-feedback (SF)-TIA is shown in Fig. 2.1 (b). One of the most common implementations of the SF-TIA is the CMOS inverter-based TIA (Inv-TIA), also shown in Fig. 2.1 (b). In this TIA, PMOS and NMOS transistors are connected in a push-pull structure to form the core voltage amplifier. A resistor brackets this amplifier to provide shunt-shunt feedback. The high input impedance of the MOS transistors forces the input current to flow through the feedback resistor $R_F$; therefore, $R_F$ determines the gain of the Inv-TIA. The shunt feedback lowers the input impedance by the loop-gain factor $(1 + A_0)$, where $A_0$ is the voltage gain of the CMOS inverter. This in turn extends the bandwidth by the same factor compared to an R-TIA with the same gain (*i.e.*, $R_F = R_L$).

Another active TIA topology, shown in Fig. 2.1 (c), is the common-gate (CG)-TIA. The CG-TIA exhibits a very low input impedance of $1/g_{m1}$, where $g_{m1}$ is the transconductance of the input transistor. The transimpedance gain of the CG-TIA is determined by the load resistor $R_D$, while its bandwidth is determined by the time constant at the input node, $C_T/g_{m1}$. This means that the bandwidth and the transimpedance gain are decoupled from each other.
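The bandwidth expressions of Table 2.1 can be compared for equal gain elements ($R_L = R_F = R_D$). In this sketch, $C_T = 300$ fF matches the value used later for Fig. 2.7. The 1 kΩ gain element, $A_0 = 5$, and $g_{m1} = 20$ mS are hypothetical values, and second-order effects (for example, the inverter's own pole) are ignored.

```python
import math

C_T = 300e-15   # total input capacitance (value used for Fig. 2.7)
R_GAIN = 1e3    # hypothetical gain element: R_L = R_F = R_D = 1 kOhm
A0 = 5          # hypothetical inverter DC gain
GM1 = 20e-3     # hypothetical CG input transconductance (S)


def r_tia_bw(c_t, r_l):
    return 1 / (2 * math.pi * c_t * r_l)


def inv_tia_bw(c_t, r_f, a0):
    return (1 + a0) / (2 * math.pi * c_t * r_f)


def cg_tia_bw(c_t, gm1):
    return gm1 / (2 * math.pi * c_t)   # independent of the gain element R_D


for name, bw in (("R-TIA", r_tia_bw(C_T, R_GAIN)),
                 ("Inv-TIA", inv_tia_bw(C_T, R_GAIN, A0)),
                 ("CG-TIA", cg_tia_bw(C_T, GM1))):
    print(f"{name:8s} R_T = {R_GAIN:.0f} ohm, f_3dB = {bw / 1e9:.2f} GHz")
```

At the same gain, the Inv-TIA gains bandwidth by exactly $(1 + A_0)$ over the R-TIA. The CG-TIA bandwidth is set only by $g_{m1}$ and $C_T$.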



Fig. 2.1. Commonly used TIA topologies (a) Resistor TIA, (b) Shunt-feedback TIA and its CMOS inverter-based implementation, and (c) Common-gate TIA.

The input-referred noise power spectral density (PSD) is listed in Table 2.1 for the three TIA topologies. The larger the gain element (*i.e.*, $R_L$, $R_F$, or $R_D$), the smaller the input-referred noise. However, this improvement in sensitivity trades off with the bandwidth. While the Inv-TIA and the CG-TIA achieve comparable gain and power dissipation for a targeted bandwidth, the Inv-TIA is known for its superior noise performance. This can be explained as follows: although the noise current of the gain element in each circuit ($R_D$ in the CG-TIA and $R_F$ in the Inv-TIA) refers directly to the input, the CG-TIA has an additional noise source, the bias current source, which also refers directly to the input. This results in more noise in the CG-TIA even if the two circuits are designed for equal gain (*i.e.*, the gain elements contribute the same amount of noise).

When the second pole of the Inv-TIA is considered, the amplifier exhibits a second-order amplitude response. As a result, the maximum achievable transimpedance gain drops with the square of the TIA bandwidth (*i.e.*, $R_T \propto 1/f_{TIA}^2$) [7]. For the CG-TIA, the gain depends on both $f_{TIA}^2$ and the pole spacing, and it is maximized when the two poles have the same frequency. Even then, the gain is only 41 % of what the Inv-TIA can attain [7]. At high data rates, this trade-off results in impractically low gain for both TIAs. Consequently, additional gain stages must be inserted after the TIA to achieve the minimum required gain of the front-end.

|                                                        | R-TIA<br>Fig. 2.1 (a)                                                                                     | Inv-TIA<br>Fig. 2.1 (b)                                     | CG-TIA<br>Fig. 2.1 (c)                            |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|---------------------------------------------------|
| Transimpedance Gain $(R_T)$                            | $R_L$                                                                                                     | $R_F$                                                       | $R_D$                                             |
| Bandwidth $(f_{3dB})$                                  | $\frac{1}{2\pi C_T R_L}$                                                                                  | $\frac{1+A_0}{2\pi C_T R_F}$                                | $\frac{g_{m1}}{2\pi C_T}$                         |
| Input-Referred Noise PSD $\overline{i_{n,in}^2}$ (A<sup>2</sup>/Hz) | $\frac{4kT}{R_L}$ (integrated over the noise bandwidth: $\frac{kT}{C_T R_L^2}$, in A<sup>2</sup>) | $\overline{i_{n,RF}^2} + \frac{\overline{V_{n,A}^2}}{R_F^2}$ | $\overline{i_{n,RD}^2} + \overline{i_{n,bias}^2}$ |
| DC Power Dissipation                                   | Very low                                                                                                  | Moderate                                                    | Relatively low                                    |

Table 2.1: Performance summary of the three commonly used TIAs in Fig. 2.1.

$A_0$ is the DC voltage gain of the CMOS inverter.

$C_T$ is the total input capacitance, including the photodiode, pad, wiring, and circuit input capacitances.

$C_L$ is the load capacitance.

$g_{m1}$ is the transconductance of the input transistor in the CG-TIA.

$\overline{i_{n,x}^2}$ and $\overline{V_{n,x}^2}$ are the current and voltage noise power spectral densities, respectively.

#### 2.2.2 Power Penalty due to the Swing Requirements of the CDR

Fig. 2.2 (a) shows a receiver front-end that consists of a shunt-feedback (SF)-TIA followed by an n-stage MA. An input signal at the noise-limited sensitivity produces a peak-to-peak voltage $V_0^{PP}$ at the output of this front-end, given by

$$V_0^{PP} = SNR \ i_{n,rms} R_T A_{MA} \tag{2.1}$$

where *SNR* is the required signal-to-noise ratio (peak-to-peak signal over rms noise), which equals 14.07 for a BER of $10^{-12}$, and $A_{MA}$ is the total gain of the MA. $V_0^{PP}$ is sufficiently large to drive an ideal clock and data recovery (CDR) circuit at the desired BER. However, the decision circuit in a realistic CDR has finite sensitivity and requires a minimum peak-to-peak input voltage swing ($V_S^{PP}$). Therefore, the FE's output voltage needs to be increased by $V_S^{PP}$, as shown in Fig. 2.2 (b), to attain the same BER as with an ideal CDR. The finite sensitivity of the CDR incurs a power penalty (PP) of [8]

$$PP = \frac{V_0^{PP} + V_S^{PP}}{V_0^{PP}} = 1 + \frac{V_S^{PP}}{SNR \ i_{n,rms} \ R_T \ A_{MA}} \tag{2.2}$$
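The value SNR = 14.07 can be recovered from the Gaussian BER relation. This sketch assumes Gaussian noise of equal rms on both levels and a mid-swing decision threshold. Under these assumptions, BER = ½·erfc(Q/√2), and the peak-to-peak SNR used in (2.1) is 2Q. The function names are hypothetical, and the bisection bounds are arbitrary.

```python
import math


def ber_from_q(q):
    """Gaussian-noise BER for a binary signal, Q = (half of peak-to-peak swing) / (rms noise)."""
    return 0.5 * math.erfc(q / math.sqrt(2))


def q_for_ber(target_ber, lo=0.0, hi=20.0, iterations=100):
    """Bisection on the monotonically decreasing BER(Q) curve."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target_ber:
            lo = mid      # BER still too high: a larger Q is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


q = q_for_ber(1e-12)
snr = 2 * q   # peak-to-peak signal over rms noise, as in Eq. (2.1)
print(f"Q = {q:.3f}, SNR = {snr:.2f}")
```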

The incurred PP is plotted in Fig. 2.2 (c) as a function of  $A_{MA}$  for SNR,  $i_{n,rms}$ , and  $V_S^{PP}$  fixed at 14, 1  $\mu$ A<sub>rms</sub>, and 50 mV<sub>PP</sub>, respectively. The figure shows that the MA needs to provide a very high gain to reduce the PP to a negligible value.



Fig. 2.2. (a) A receiver front-end that consists of an Inv-TIA and an n-stage MA. (b) Representative eye diagrams illustrating the power penalty incurred by the limited sensitivity of the decision circuit. The grayed area represents the output voltage when the input is set to the noise-based sensitivity limit. The height of the bottom eye is increased by $V_S^{PP}$ to satisfy the voltage amplitude requirements of a practical CDR. (c) The incurred power penalty as a function of the gain of the MA.

For example, a gain of 100 (40 dB) is required to reduce the incurred PP to 0.15 dB (1.0174). To achieve such a high gain, several stages must be cascaded in the MA.
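Equation (2.2) can be swept over $A_{MA}$ in a few lines. The text fixes SNR = 14, $i_{n,rms}$ = 1 µA<sub>rms</sub>, and $V_S^{PP}$ = 50 mV<sub>PP</sub> but does not state $R_T$. The sketch therefore uses a hypothetical $R_T$ = 2 kΩ, and its numbers are not meant to reproduce Fig. 2.2 (c) exactly.

```python
def power_penalty(a_ma, r_t, snr=14.0, i_n_rms=1e-6, v_s_pp=50e-3):
    """Eq. (2.2): penalty from a CDR that needs v_s_pp on top of the noise-limited swing."""
    v_o_pp = snr * i_n_rms * r_t * a_ma   # noise-limited output swing, Eq. (2.1)
    return (v_o_pp + v_s_pp) / v_o_pp


R_T = 2e3  # hypothetical TIA gain in ohms (not stated in the text)
for a_ma in (1, 10, 100, 1000):
    print(f"A_MA = {a_ma:5d}: PP = {power_penalty(a_ma, R_T):.4f}")
```

The penalty falls roughly as $1/A_{MA}$, which is why the MA must provide tens of dB of gain before the penalty becomes negligible.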

#### 2.2.3 Main Amplifier

The main amplifier is usually constructed by cascading n identical gain stages to simultaneously achieve high gain and wide bandwidth. If each gain stage has an $m^{th}$-order Butterworth amplitude response, the overall bandwidth ($f_{MA}$) and the total gain ($A_{MA}$) of the cascaded chain are [9]

$$A_{MA} = A_s^n, \qquad f_{MA} = f_s \sqrt[2m]{\sqrt[n]{2} - 1} \tag{2.3}$$

where  $A_s$  and  $f_s$  are the per-stage gain and bandwidth, respectively. This requires each gain stage to have a gain-bandwidth product of [9]

$$GBW_s = A_s f_s = \frac{f_{MA}}{\sqrt[2m]{\sqrt[n]{2} - 1}} \sqrt[n]{A_{MA}} \tag{2.4}$$



Fig. 2.3. The required per-stage gain-bandwidth product as a function of the number of stages for  $A_{MA} = 40 \ dB$ ,  $f_{MA} = 10 \ GHz$ , and various values of m.

The required per-stage gain-bandwidth product is plotted in Fig. 2.3 as a function of the number of stages for $A_{MA} = 40$ dB, $f_{MA} = 10$ GHz, and m = 1, 2, and 3. The figure shows that increasing the number of cascaded stages, as well as the order of each stage, reduces the required per-stage gain-bandwidth product. However, increasing *n* considerably increases the power consumption. It also reduces the per-stage gain, which causes a rapid accumulation of noise and consequently degrades sensitivity. Fig. 2.3 also shows that $A_s f_s$ tends to saturate for higher values of *n*. As a result, *n* is typically limited to three to five stages [9]; higher values of *n* increase the power dissipation and degrade the sensitivity for only a marginal improvement in the per-stage gain-bandwidth product.
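The trend in Fig. 2.3 follows from (2.3) and (2.4). The sketch below evaluates them with the same parameters ($A_{MA}$ = 40 dB, $f_{MA}$ = 10 GHz, m = 1, 2, 3). The function name and the range n = 1 to 8 are arbitrary illustrative choices.

```python
def per_stage_gbw(n, m, a_ma_db=40.0, f_ma=10e9):
    """Required A_s * f_s (Hz) for n identical m-th-order Butterworth stages."""
    a_ma = 10 ** (a_ma_db / 20)
    shrink = (2 ** (1 / n) - 1) ** (1 / (2 * m))   # f_MA / f_s, from Eq. (2.3)
    return (f_ma / shrink) * a_ma ** (1 / n)        # Eq. (2.4)


for m in (1, 2, 3):
    row = [per_stage_gbw(n, m) / 1e9 for n in range(1, 9)]
    print(f"m = {m}: " + ", ".join(f"{g:.0f}" for g in row) + "  (GHz, n = 1..8)")
```

For m = 1, the requirement drops from 1000 GHz (n = 1) to about 155 GHz (n = 2). It then flattens near 60 GHz, consistent with the saturation described above.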

#### **Practical Implementations**

A straightforward implementation of a first-order gain stage is the common-source (CS) amplifier depicted in Fig. 2.4 (a). In the CS amplifier, the load resistor $(R_{D,CS})$ converts the small-signal drain current into an output voltage $(V_{out})$. The CS amplifier provides a low-frequency voltage gain of $A_{V,CS} = g_{ma}R_{D,CS}$, where $g_{ma}$ is the transconductance of the NMOS transistor. The bandwidth of this topology is determined by the output pole $\omega_{p,CS} = (R_{D,CS}C_L)^{-1}$, where $C_L$ is the total load capacitance. This leads to a strong trade-off between gain and bandwidth, as both depend on the load resistor.
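In the CS stage, changing $R_{D,CS}$ alone cannot improve the gain-bandwidth product, because it equals $g_{ma}/(2\pi C_L)$. This sketch shows that relationship for a few load resistors. The values of $g_{ma}$ and $C_L$ are hypothetical.

```python
import math

GM = 20e-3      # hypothetical transconductance g_ma (S)
C_L = 50e-15    # hypothetical total load capacitance (F)


def cs_stage(gm, r_d, c_l):
    """First-order CS stage: gain gm*R_D and pole 1/(2*pi*R_D*C_L) in Hz."""
    return gm * r_d, 1 / (2 * math.pi * r_d * c_l)


for r_d in (250, 500, 1000):
    gain, f_p = cs_stage(GM, r_d, C_L)
    print(f"R_D = {r_d:4d} ohm: A = {gain:4.1f}, f_p = {f_p / 1e9:5.2f} GHz, "
          f"A*f_p = {gain * f_p / 1e9:.1f} GHz")
```

Doubling $R_{D,CS}$ doubles the gain but halves the bandwidth. Only a larger $g_{ma}$ or a smaller $C_L$ raises the product, and both cost power.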



Fig. 2.4. Schematic of the (a) Common-source amplifier, (b) Common-source-based Cherry-Hooper amplifier, and (c) Inverter-based Cherry-Hooper amplifier.

An alternative to first-order stages is the Cherry-Hooper (CH) amplifier in Fig. 2.4 (b). It consists of a cascade of two NMOS transistors with resistive feedback ($R_{F,CH}$) around the second transistor. Due to this feedback, the drain of each NMOS sees a small-signal resistance of approximately $1/g_{m2}$. This relatively low resistance results in high-frequency poles at $\omega_{p1,CH} \approx g_{m2}/C_X$ and $\omega_{p2,CH} \approx g_{m2}/C_Y$, where $g_{mi}$ is the transconductance of transistor $M_i$, and $C_X$ and $C_Y$ are the total capacitances at nodes X and Y, respectively. The low-frequency voltage gain of this topology is $A_{V,CH} = g_{m1}R_{F,CH} - g_{m1}/g_{m2}$. Assuming that $R_{F,CH} \gg 1/g_{m2}$, the CH amplifier achieves the same voltage gain as a CS amplifier with $R_{D,CS} = R_{F,CH}$, but with a wider bandwidth [6].
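The sketch below compares a CH stage with a CS stage that uses $R_{D,CS} = R_{F,CH}$. It takes the approximately $1/g_{m2}$ resistance at nodes X and Y as stated above and treats each node as a simple RC pole. All device values are hypothetical, and the two node capacitances are set equal so that the poles coincide.

```python
import math

GM1 = GM2 = 10e-3           # hypothetical transconductances (S)
R_F = 1e3                   # feedback resistor, also used as R_D of the CS reference
C_X = C_Y = C_L = 50e-15    # hypothetical node capacitances (F)


def cs_stage(gm, r_d, c_l):
    return gm * r_d, 1 / (2 * math.pi * r_d * c_l)


def cherry_hooper(gm1, gm2, r_f, c_x, c_y):
    """Gain and the two pole frequencies (Hz), with about 1/gm2 seen at nodes X and Y."""
    gain = gm1 * r_f - gm1 / gm2
    return gain, gm2 / (2 * math.pi * c_x), gm2 / (2 * math.pi * c_y)


a_cs, f_cs = cs_stage(GM1, R_F, C_L)
a_ch, f_x, f_y = cherry_hooper(GM1, GM2, R_F, C_X, C_Y)
# Valid here because f_x == f_y: two coincident real poles give f_3dB = f_x*sqrt(sqrt(2)-1)
f_ch = f_x * math.sqrt(math.sqrt(2) - 1)
print(f"CS: A = {a_cs:.1f}, f_3dB = {f_cs / 1e9:.2f} GHz")
print(f"CH: A = {a_ch:.1f}, poles at {f_x / 1e9:.2f} GHz, f_3dB = {f_ch / 1e9:.2f} GHz")
```

With these numbers, the CH stage gives up about 10 % of the gain but moves its poles up by the factor $g_{m2}R_{F,CH}$ relative to the CS pole.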

Fig. 2.4 (c) shows another implementation of the CH amplifier. It consists of a cascade of two CMOS inverters, Inv1 and Inv2, with resistive feedback $R_{F,CH}$ around Inv2 to boost the bandwidth. Inv1 acts as a transconductor (voltage-to-current converter), while Inv2 together with $R_{F,CH}$ implements a transimpedance stage. The inverter-based CH (Inv-CH) is widely adopted across various data rates and technologies [10]-[12].

#### Active feedback for higher-order implementations

Another implementation of the second-order gain stage is shown in Fig. 2.5 (a). This circuit consists of two first-order CS stages  $(M_{1-4})$  with active feedback formed by the differential pair  $(M_{f1\&2})$  around the second CS amplifier [9]. The feedback converts the cascade of two first-order stages into a single second-order stage. Therefore, instead of having two real and identical poles, the implementation in Fig. 2.5 (a) can give complex poles. Unlike the conventional CH amplifier, active feedback does not resistively load the transimpedance stage [9] and allows for easier control of the pole quality factor. The circuit can be modeled by the block diagram in Fig. 2.5 (b) where each CS amplifier is modeled by a first-order transfer function A(s) and the active feedback is modeled by  $\beta(s)$ .

Higher-order gain stages can be constructed by manipulating the number of cascaded first-order amplifiers in the forward path and the connection of the active feedback. For example, the third-order gain stage in Fig. 2.6 (a) consists of three identical first-order gain cells A(s) and an active feedback cell $\beta(s)$ that brackets the last two cells [13]. Without active feedback, the overall transfer function of the three-stage amplifier is $A^3(s)$, which has three identical real poles. Adding the active feedback $\beta(s)$ results in an overall transfer function of $A^3(s)/(1 + A^2(s)\beta(s))$ with a non-dominant real pole and two complex poles [13]. In other words, active feedback rearranges the pole locations of a uniform three-stage amplifier. Increasing the feedback gain extends the bandwidth at the cost of lower gain and increased amplitude peaking. Cascading third-order gain stages leads to a fast accumulation of amplitude peaking. To overcome this limitation, the active interleaving feedback in Fig. 2.6 (b) was presented in [13]. The sixth-order amplifier in Fig. 2.6 (b) can be divided into two non-identical third-order stages with over- and under-damped amplitude responses. By carefully choosing the feedback gain, the overall sixth-order amplifier can achieve a flat amplitude response with much less peaking than when no interleaving feedback is used.

The peaking performance can be further improved by using the nested feedback in Fig. 2.6 (c) [14]. The nested feedback introduces a feedforward zero in the loop gain expression that results in improved stability margin compared to the third-order and the third-order interleaved architectures [15].



Fig. 2.5. Active feedback-based CH amplifier: (a) circuitry [9] and (b) block diagram.






Fig. 2.6. Higher-order implementations of active feedback-based MA: (a) a third-order gain stage [13], (b) a cascade of two third-order stages with interleaving feedback [13], and (c) a fifth-order MA using nested active feedback [14].

#### 2.2.4 The Transimpedance Limit

The transimpedance limit is defined as the maximum achievable gain for a given bandwidth and technology [7]. Referring to the front-end in Fig. 2.2 (a), the SF-TIA and each MA stage are assumed to have a second-order Butterworth amplitude response. Further, each MA stage is assumed to have a bandwidth equal to the TIA's bandwidth. The transimpedance limit of this front-end is bounded by [7]

$$R_{T,FE} = \frac{(A_s f_s)^{n+1}}{2\pi C_T f_{FE}^{n+2}} \left(\sqrt[n+1]{2} - 1\right)^{\frac{n+2}{4}}$$
(2.5)

where $R_{T,FE}$ and $f_{FE}$ are the gain and bandwidth of the overall front-end, and $C_T$ is the total input capacitance, which includes the detector, pad, and ESD capacitances in addition to the circuit's input capacitance. In (2.5), n = 0 corresponds to the case where no MA is employed (i.e., the FE consists only of the TIA). In this situation, the TIA's gain drops with the square of the bandwidth. As the number of stages increases, the limit becomes more sensitive to the ratio $A_s f_s / f_{FE}$. This ratio is called the bandwidth headroom, and it measures how close the FE's targeted bandwidth is to the capability of the technology [7].

The transimpedance limit is plotted in Fig. 2.7 as a function of the number of MA stages for a targeted data rate of $f_{bit} = 25$ Gb/s. The FE is assumed to have a bandwidth of 70 % of the targeted data rate (*i.e.*, $f_{FE} = 0.7f_{bit} = 17.5$ GHz) so that it introduces negligible ISI. In this simulation, $C_T$ is fixed at 300 fF (this assumption is justified in the next section). The transimpedance limit is plotted for various values of the per-stage gain-bandwidth product, as indicated in the legend of Fig. 2.7. The desired $R_{T,FE}$ determines the minimum required number of gain stages. For example, to achieve a total gain of 70 dB$\Omega$, at least three stages are required for $A_s f_s = 100$ GHz. When $A_s f_s$ is reduced to 65 GHz, the required number of stages increases to six. The desired gain becomes unrealizable by any number of stages when $A_s f_s$ is further reduced to 50 GHz. The per-stage gain-bandwidth product is limited by the transit frequency ($f_T$) of the technology node. Therefore, Fig. 2.7 indicates that as the targeted data rate approaches the capability of the technology, it becomes harder to design the FE with sufficient gain within a realistic power budget.
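Equation (2.5) can be seen to follow from cascading n + 1 second-order Butterworth sections of equal bandwidth, each of which must then have a bandwidth of $f_{FE}/(\sqrt[n+1]{2}-1)^{1/4}$. The sketch below is a plain transcription of (2.5) under the same assumptions, together with a search for the minimum number of MA stages; it is not the script used to produce Fig. 2.7, and the function names are hypothetical. The ratio $A_s f_s/f_{FE}$ is formed first so that large powers of the gain-bandwidth product do not overflow a double-precision float.

```python
import math


def transimpedance_limit(n, gbw_stage, c_t, f_fe):
    """Evaluate (2.5): maximum R_T,FE (ohm) for a TIA plus n MA stages.

    gbw_stage is the per-stage gain-bandwidth product A_s*f_s (Hz), c_t the
    total input capacitance (F), and f_fe the front-end bandwidth (Hz).
    """
    headroom = gbw_stage / f_fe                          # bandwidth headroom
    shrink = (2.0 ** (1.0 / (n + 1)) - 1.0) ** ((n + 2) / 4.0)
    return headroom ** (n + 1) / (2 * math.pi * c_t * f_fe) * shrink


def min_stages(target_dbohm, gbw_stage, c_t, f_fe, n_max=30):
    """Smallest number of MA stages whose limit reaches target_dbohm, else None."""
    for n in range(n_max + 1):
        r_t = transimpedance_limit(n, gbw_stage, c_t, f_fe)
        if 20 * math.log10(r_t) >= target_dbohm:
            return n
    return None


C_T = 300e-15          # total input capacitance assumed for Fig. 2.7
F_FE = 0.7 * 25e9      # 17.5 GHz front-end bandwidth
for gbw in (100e9, 65e9, 50e9):
    print(f"A_s*f_s = {gbw / 1e9:.0f} GHz -> minimum stages: "
          f"{min_stages(70.0, gbw, C_T, F_FE)}")
```

With these inputs the search returns three stages for 100 GHz and six stages for 65 GHz, and no stage count up to 30 reaches 70 dB$\Omega$ for 50 GHz, in line with the reading of Fig. 2.7 above.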



Fig. 2.7. The transimpedance limit as a function of the number of MA stages for $C_T = 300$ fF, $f_{FE} = 17.5$ GHz, and various values of $A_s f_s$.

#### 2.2.5 Noise-Power Trade-off

The input-referred noise of optical receivers with FET front ends is minimized by matching the circuit's input capacitance ($C_I$) to the total input parasitic capacitance ($C_D$) [16]. For example, if the photodiode, pad, and wiring capacitances are assumed to be 80 fF [12], 50 fF, and 20 fF, respectively, the total input parasitic capacitance is $C_D = 150$ fF. To minimize the input-referred noise, the TIA must then be designed to have an input capacitance of $C_I = C_D = 150$ fF, which leads to a total input capacitance of $C_T = C_I + C_D = 300$ fF. The circuit's input capacitance is a measure of the transistor width and hence of the power dissipation. Therefore, maintaining the capacitive matching rule can lead to excessive power dissipation in the receiver.

The DC power dissipation depends on the actual implementation of the circuit. Therefore, the SF-TIA and each MA stage in Fig. 2.2 (a) are assumed to be implemented by the Inv-TIA and the Inv-CH, respectively. The power consumption of a CMOS inverter is linearly proportional to its transconductance. For a constant drain current density, the total transconductance can be expressed as $g_m = 2\pi f_{T,Inv}C_I$, where $f_{T,Inv}$ is the transit frequency of the CMOS inverter at the chosen bias point. Defining the current-efficiency factor of the input devices as $V^* = I_D/g_m$, the inverter's power consumption is $P_{DC,Inv} = I_D V_{DD} = 2\pi f_{T,Inv}C_I V^* V_{DD}$. The receiver's front-end in Fig. 2.2 (a) employs one inverter for the TIA and two inverters for each MA stage. Assuming that all inverters have identical device dimensions, the receiver power consumption is



$$P_{DC,RX} = 2\pi (2n+1) f_{T,Inv} C_I V^* V_{DD}$$
(2.6)

Fig. 2.8. The energy-efficiency of the FE in Fig. 2.2 (a) as a function of the ratio of the circuit's input capacitance to the total parasitic capacitance for $f_{bit}$ of 25 Gb/s, $C_D$ of 150 fF, and various numbers of stages. All design points have a bandwidth of 70 % of the data rate. The SF-TIA and each MA stage are assumed to be implemented by the Inv-TIA and the Inv-CH, respectively.

For a CMOS inverter simulated in TSMC 65 nm CMOS technology with $V_{DD} = 1$ V, $f_{T,Inv}$ and $V^*$ are found to be 57 GHz and 56 mV, respectively. The energy-efficiency of the receiver is calculated as $P_{DC,RX}/f_{bit}$ and is expressed in mW/(Gb/s) or, equivalently, pJ/bit. The energy-efficiency is plotted in Fig. 2.8 as a function of the circuit's input capacitance relative to the total parasitic capacitance for $f_{bit}$ of 25 Gb/s, $C_D$ of 150 fF, and various numbers of stages. The figure shows that increasing the number of stages to achieve the desired gain while maintaining the capacitive matching rule leads to very poor receiver efficiency. For example, if the $A_s f_s = 65$ GHz curve in Fig. 2.7 is taken to represent the 65 nm CMOS technology used in Fig. 2.8, then six gain stages are required to achieve a gain of 70 dB$\Omega$. The energy-efficiency of the receiver for n = 6 and $C_I/C_D = 1$ is approximately 3 pJ/bit, which falls well short of standards that target an efficiency better than 1 pJ/bit at 25 Gb/s [3].

### 2.3 Limited-Bandwidth Front-End

The above discussion shows that, as data rates increase, wideband FEs designed under the capacitive matching rule become power-hungry and fail to meet standards that target an efficiency better than 1 pJ/bit at 25 Gb/s. The capacitive matching rule is revisited in Chapter 4. This section studies the effect of pushing the bandwidth far below the data rate. The simple resistor TIA (R-TIA) is used here to explain the general concept; however, the analysis and conclusions also apply to the inverter-based and CG TIAs.

#### 2.3.1 What if the Bandwidth is Reduced?

Fig. 2.9 (a) shows the small-signal model of the R-TIA. The integrated output noise of this TIA is $v_{n,out}^2 = kT/C_T$, where k is the Boltzmann constant, T is the temperature in Kelvin, and $C_T$ is the total input capacitance. The rms output noise voltage ($V_{rms}$) is the square root of $v_{n,out}^2$. Because this output noise is independent of the gain ($R_L$), the output signal-to-noise ratio ($SNR_{OUT}$) must be considered instead:

$$SNR_{OUT} = \frac{V_{OUT,PP}}{V_{rms}} = \frac{R_L I_{in}}{\sqrt{\frac{kT}{C_T}}}$$
(2.7)

where $V_{OUT,PP}$ is the peak-to-peak output voltage. Substituting $R_L = 1/(2\pi C_T f_{TIA})$ into the above equation leads to

$$SNR_{OUT} = \frac{I_{in}}{2\pi f_{TIA} \sqrt{kTC_T}}$$
(2.8)

The $SNR_{OUT}$ obtained from the above equation is plotted in Fig. 2.9 (b) with circle markers as a function of the ratio of the TIA's bandwidth to the data rate. In this simulation, $f_{bit}$, $I_{in}$, and $C_T$ are fixed at 10 Gb/s, 100 $\mu$A<sub>pp</sub>, and 200 fF, respectively. The curve suggests that $SNR_{OUT}$ continues to improve as the TIA's bandwidth is further reduced below the data rate. However, this conclusion is erroneous because (2.8) does not account for the inter-symbol interference (ISI) introduced at low bandwidths. To account for the ISI, the peak-to-peak output voltage must not be calculated as $R_L I_{in}$. Instead, $V_{OUT,PP}$ must be taken from the inner opening of the simulated eye diagram, as shown in Fig. 2.10.



Fig. 2.9. (a) Resistor TIA (b) Output signal-to-noise ratio of the R-TIA as a function of the 3 dB bandwidth to data rate ratio for  $f_{bit}$ ,  $I_{in}$  and  $C_T$  fixed at 10 Gb/s, 100  $\mu A_{pp}$ , and 200 fF, respectively.

An accurate expression of the signal-to-noise ratio can be written as

$$SNR_{OUT} = \frac{VEO}{V_{rms}}$$
(2.9)

where VEO is the vertical eye-opening indicated in Fig. 2.10. The $SNR_{OUT}$ obtained from (2.9) is plotted in Fig. 2.9 (b) with diamond markers as a function of the ratio of the TIA's bandwidth to the data rate. This curve suggests that the TIA's bandwidth can be reduced below the conventional design point of 70 % of the data rate, which improves the signal-to-noise ratio up to a certain point. Beyond this point, the introduced ISI closes the eye diagram and severely degrades the signal-to-noise ratio. The figure shows that the signal-to-noise ratio reaches a maximum at a bandwidth of 25 % of the data rate.

In the presence of ISI, the gain can be calculated as $VEO/I_{in}$. The gain of the TIA at the low-bandwidth point (25 % of the data rate) is 185.2 $\Omega$ compared to 110.4 $\Omega$ at the high-bandwidth point (70 % of the data rate). A higher gain in the preamplifier suppresses the noise contributions from the downstream circuits and reduces the required number of MA stages. This motivates research in equalizer-based optical receivers, where the TIA's bandwidth is intentionally reduced to approximately 20 % to 30 % of the targeted data rate and the VEO is restored by equalizers.
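The trend of Fig. 2.9 (b) can be reproduced approximately with a single-pole model of the R-TIA. For an ideal 1-UI input pulse, the output peaks at the end of the bit with a main cursor $h_0 = R_L I_{in}(1 - e^{-a})$, where $a = 2\pi f_{TIA}/f_{bit}$, and the post-cursors are $h_0 e^{-ka}$ for $k \ge 1$. Summing all post-cursors as worst-case ISI gives $VEO = R_L I_{in}(1 - 2e^{-a})$. The sketch below assumes this worst-case sum and T = 300 K; it is not the eye-diagram simulation behind Fig. 2.9 (b) and Fig. 2.10, and `r_tia_snr` is a hypothetical name.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def r_tia_snr(ratio, f_bit=10e9, i_in=100e-6, c_t=200e-15, temp=300.0):
    """Single-pole R-TIA with f_TIA = ratio * f_bit (temp = 300 K is assumed).

    Returns (SNR from (2.8), SNR from (2.9), effective gain VEO/I_in in ohm).
    """
    f_tia = ratio * f_bit
    r_l = 1.0 / (2 * math.pi * c_t * f_tia)
    v_rms = math.sqrt(K_B * temp / c_t)       # kT/C output noise, independent of R_L
    snr_naive = r_l * i_in / v_rms            # (2.7)-(2.8): ignores ISI
    a = 2 * math.pi * f_tia / f_bit           # T_b / tau of the single pole
    # Main cursor R_L*I*(1 - e^-a) minus worst-case post-cursor sum R_L*I*e^-a;
    # a negative result means the eye is closed, so it is clipped to zero.
    veo = max(0.0, r_l * i_in * (1.0 - 2.0 * math.exp(-a)))
    return snr_naive, veo / v_rms, veo / i_in


ratios = [round(0.05 * k, 2) for k in range(1, 21)]
best = max(ratios, key=lambda r: r_tia_snr(r)[1])
print("VEO-based SNR peaks at f_TIA/f_bit =", best)
for r in (0.25, 0.70):
    print(f"f_TIA/f_bit = {r}: effective gain = {r_tia_snr(r)[2]:.1f} ohm")
```

On a 0.05 grid, the VEO-based SNR of (2.9) peaks at $f_{TIA}/f_{bit} = 0.25$, and the effective gains come out at roughly 186 $\Omega$ and 111 $\Omega$ at 25 % and 70 % of the data rate, close to the simulated 185.2 $\Omega$ and 110.4 $\Omega$, whereas the naive SNR of (2.8) keeps rising as the bandwidth drops.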



Fig. 2.10. Simulation results for the eye-diagram at the output of the R-TIA for  $f_{bit}$ ,  $I_{in}$  and  $C_T$  fixed at 10 Gb/s, 100  $\mu A_{pp}$ , and 200 fF, respectively. The resistor value and  $f_{TIA}/f_{bit}$  are indicated in the title of each eye-diagram.

### 2.4 Summary

This chapter provided the background information for three research directions that are discussed in the following chapters. First, it has been shown that the traditional approach to receiver design becomes inadequate to meet the energy-efficiency requirements at high data rates. This motivates the design of equalizer-based optical receivers, which are the focus of Chapter 3, where a wide survey of recently published work is presented together with a methodology to accurately evaluate the performance of these receivers. This chapter also showed that optical receivers with FET front-ends are usually designed under the so-called capacitive-matching rule, which leads to excessive power dissipation in the receiver. This rule is revisited in Chapter 4 to study how small the receiver can be made to minimize the link's overall power dissipation. Finally, a conventional block diagram of an optical receiver front-end was discussed, and different implementations of the TIA and MA were presented. The active feedback-based MA is exploited in Chapter 5 to present a new design technique for linear equalization in optical receivers.

# Chapter 3

# Noise Analysis and Design Considerations for Equalizer-Based Optical Receivers

### 3.1 Introduction

Optical receiver front-ends that are intentionally designed with a bandwidth low enough to introduce significant inter-symbol interference (ISI) are becoming commonplace. Although the resulting ISI must be removed using an equalizer, the lower bandwidth allows for higher gain in the front-end's first stage, lower input-referred noise, and fewer gain stages. With fewer main-amplifier stages, power dissipation is reduced. The noise analysis of these front-ends presents several challenges. This chapter derives the integrated input-referred noise for inverter-based shunt-feedback transimpedance amplifiers from first principles and highlights the importance of correctly estimating the gain and noise bandwidth of the receiver. The notion of the receiver's effective gain, which is lower than the midband gain typically used in noise calculations, is introduced. The analysis of the inverter-based TIA is used to discuss important design trade-offs depending on the type of equalizer used. Integrated input-referred noise is derived and compared for front-ends using decision-feedback equalizers (DFEs), continuous-time linear equalizers (CTLEs), and feed-forward equalizers (FFEs). Simulation results show that a DFE-based receiver achieves the lowest input-referred noise.



Fig. 3.1. Representative block and eye diagrams of (a) a conventional optical receiver, where the TIA and the MA provide midband gains of $Z_{TIA,0}$ and $A_{MA,0}$, respectively, and the front-end has a bandwidth wide enough to introduce no ISI, and (b) an equalizer-based optical receiver, where the effective opening of the equalized eye ($V_{pp}$) is less than the peak-to-peak opening of the eye right after the TIA ($Z_{TIA,0}i_{pp}$) (offset compensation details are not shown).

The block diagram of a conventional optical receiver is shown in Fig. 3.1 (a). It consists of a transimpedance amplifier (TIA) followed by additional main-amplifier (MA) stages and ultimately a clock-and-data recovery circuit whose input is one or more high-speed latches. The optical receiver front-end (TIA/MA) must provide enough gain for a noise-limited input signal to drive the latch with sufficient voltage swing while adding as little noise as possible. As discussed in Chapter 2, as data rates ($f_{\text{bit}}$) increase, traditional approaches to receiver design dictate that the bandwidth of the front-end also increase. This requires a relatively low gain per stage. If fewer, higher-gain stages were used, power dissipation could be reduced.

When the TIA's bandwidth is pushed far below $f_{\rm bit}$, the noise analysis shows that the noise performance improves; however, severe ISI may be introduced, to the extent that the eye may be fully closed, as shown in the previous chapter. Several different approaches have been used to remove ISI, including discrete-time feed-forward equalizers (DT-FFEs) [17] [18] [19] [20], continuous-time FFEs [21], continuous-time linear equalizers (CTLEs) [22] [23], and decision-feedback equalizers (DFEs) using both finite-impulse-response (FIR) [24] [25] and infinite-impulse-response (IIR) [26] [27] feedback. In this work, the input-referred noise of each of these approaches is derived and compared.

As discussed in the previous chapter, the most useful measure of signal integrity is the signal-to-noise ratio at the input of the latch

$$SNR_L = v_{pp} / v_{n,rms} \tag{3.1}$$

where  $v_{pp}$  and  $v_{n,rms}$  are the peak-to-peak eye-opening and the root-mean-squared noise voltages at the input of the latch, respectively. However, in optical receiver design, the input-referred noise current  $(i_{n,rms})$  is a more common performance measure where the receiver's current sensitivity  $(i_{pp}^{sens})$  is calculated as

$$i_{pp}^{sens} = SNR_I \, i_{n,rms} \tag{3.2}$$

where $SNR_I$ is the required signal-to-noise ratio calculated using input quantities. The sensitivity calculation in (3.2) is accurate only if the SNRs in (3.1) and (3.2) are equal:

$$SNR_{L} = \frac{i_{pp}^{sens} \cdot \mathrm{gain}}{v_{n,rms}} = \frac{i_{pp}^{sens}}{v_{n,rms}/\mathrm{gain}} = \frac{i_{pp}^{sens}}{i_{n,rms}} = SNR_{I}$$
(3.3)

where "*gain*" is the transimpedance gain of the overall front-end. This means that the two SNRs are equal only if the output noise is referred to the input by the same gain seen by the signal.

It is a common misconception in the literature to use the front-end's midband gain as the input-referral gain regardless of the front-end's architecture. The midband gain can be used only if the front-end has a bandwidth wide enough that the output eye diagram is free of ISI, as shown in Fig. 3.1 (a). However, in the equalizer-based front-end in Fig. 3.1 (b), the effective gain from the TIA's input to its output (and hence the effective gain of the overall front-end) is less than the midband gain. Therefore, using the midband gain to refer the output noise to the input leads to an underestimation of the input-referred noise and hence an optimistic estimate of sensitivity.

The rest of this chapter is organized as follows: Section 3.2 presents a detailed analysis of the inverter-based TIA, drawing attention to the difference between the midband gain, the pulse-response height, and the vertical eye-opening. The effective gain is then employed in Section 3.3 to calculate the input-referred noise of equalizer-based front-ends for each type of equalizer. The noise calculations aim to provide recommendations for the optimum 3-dB bandwidth-to-data-rate ratio as well as the optimum TIA pole locations that achieve the best sensitivity for each receiver architecture. Section 3.4 compares the performance of the low-bandwidth front-ends with one another and discusses the effect of changing the photodiode capacitance and data rate on the optimum design points found in Section 3.3. Finally, Section 3.5 concludes the chapter.

### 3.2 Inverter-Based TIA

#### 3.2.1 Frequency Response

Fig. 3.2 shows the small-signal model of the inverter-based TIA. The photodiode capacitance ($C_D$) and the circuit's input capacitance ($C_I$) are combined into a total input capacitance $C_T$. The two FETs are represented by a voltage-controlled current source with a transconductance of $g_m$ in parallel with an output resistance $R_A$. Therefore, the core amplifier has an open-loop transfer function of $A(s) = A_0/(1 + s/(2\pi f_A))$, where $A_0 = g_m R_A$ is the DC voltage gain and $f_A$ is the open-loop pole formed by the output resistance ($R_A$) and output capacitance ($C_L$). Using this model, the TIA exhibits a second-order transfer function given by

$$Z_{TIA}(s) = \frac{V(s)}{I(s)} = \frac{Z_{TIA,0}}{\left(\frac{s}{\omega_n}\right)^2 + \frac{s}{\omega_n Q} + 1}$$
(3.4)

where $Z_{TIA,0}$, $\omega_n$, and Q are the midband transimpedance gain, natural angular frequency, and pole quality factor, respectively, given by

$$Z_{TIA,0} = \frac{A_0 R_F - R_A}{A_0 + 1} \tag{3.5.a}$$

$$\omega_n = \sqrt{\frac{A_0 + 1}{R_F R_A C_T C_L}}, \qquad Q = \frac{\sqrt{(A_0 + 1)R_F R_A C_T C_L}}{(R_F + R_A)C_T + R_A C_L}.$$
(3.5.b)

The relation between  $\omega_n$ , Q and the TIA's 3-dB bandwidth ( $\omega_{3dB} = 2\pi f_{3dB}$ ) is governed by

$$\rho(Q) = \frac{\omega_{3dB}}{\omega_n} = \sqrt{\left(1 - \frac{1}{2Q^2}\right) + \frac{1}{2Q^2}\sqrt{8Q^4 - 4Q^2 + 1}}.$$
(3.6)

Higher Q results in wider 3-dB bandwidth at the expense of more peaking in the frequency domain and ringing in the time domain. The percent overshoot in the step-response is given by

$$P.O. = 100\, e^{-\pi/\sqrt{4Q^2 - 1}}.$$
(3.7)

Wideband TIAs are usually designed to have a Butterworth transfer function with a maximally flat amplitude response by setting Q to $1/\sqrt{2}$ and selecting the largest $R_F$ for which the target $f_{3dB}$ is achieved. This leads to $\omega_{3dB} = \omega_n$, and the *P.O.* is only about 4 %. However, in this work Q is chosen to optimize the noise performance depending on the receiver architecture.
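The design equations (3.4) to (3.7) can be checked numerically with the sketch below. It assumes the small-signal model of Fig. 3.2 exactly as captured by (3.5). `rf_for_q` is a hypothetical helper that squares the Q expression in (3.5.b), which for a fixed $R_A$ gives a quadratic in $R_F$; of its two roots, the larger one gives the higher gain and the lower bandwidth. The capacitances and $g_m$ are those of Fig. 3.3, while $A_0 = 10$ is an arbitrary illustrative choice.

```python
import math


def tia_params(gm, r_a, r_f, c_t, c_l):
    """Midband gain (3.5.a), natural frequency and pole Q (3.5.b)."""
    a0 = gm * r_a
    z0 = (a0 * r_f - r_a) / (a0 + 1)
    wn = math.sqrt((a0 + 1) / (r_f * r_a * c_t * c_l))
    q = math.sqrt((a0 + 1) * r_f * r_a * c_t * c_l) / ((r_f + r_a) * c_t + r_a * c_l)
    return z0, wn, q


def rho(q):
    """Ratio omega_3dB / omega_n from (3.6)."""
    return math.sqrt(1 - 1 / (2 * q**2) + math.sqrt(8 * q**4 - 4 * q**2 + 1) / (2 * q**2))


def overshoot_percent(q):
    """Step-response overshoot (3.7); real poles (q <= 0.5) do not overshoot."""
    return 0.0 if q <= 0.5 else 100 * math.exp(-math.pi / math.sqrt(4 * q**2 - 1))


def z_tia(f, z0, wn, q):
    """Second-order transimpedance (3.4) evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f
    return z0 / ((s / wn) ** 2 + s / (wn * q) + 1)


def rf_for_q(gm, r_a, c_t, c_l, q):
    """Hypothetical helper: R_F values giving the requested Q for a fixed R_A.

    Squaring (3.5.b) gives q^2 (R_F C_T + D)^2 = K R_F with D = R_A (C_T + C_L)
    and K = (A0 + 1) R_A C_T C_L, i.e. a quadratic in R_F.
    """
    d = r_a * (c_t + c_l)
    k = (gm * r_a + 1) * r_a * c_t * c_l
    a, b, c = q**2 * c_t**2, 2 * q**2 * c_t * d - k, q**2 * d**2
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    return sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])


GM, C_T, C_L = 53.5e-3, 136.8e-15, 113.6e-15   # values used in Fig. 3.3
R_A = 10 / GM                                   # illustrative choice: A0 = 10
R_F = rf_for_q(GM, R_A, C_T, C_L, 1 / math.sqrt(2))[-1]   # larger root
z0, wn, q = tia_params(GM, R_A, R_F, C_T, C_L)
f3db = rho(q) * wn / (2 * math.pi)
print(f"R_F = {R_F:.0f} ohm, Z0 = {z0:.0f} ohm, Q = {q:.3f}, f3dB = {f3db / 1e9:.2f} GHz")
print(f"|Z(f3dB)|/Z0 = {abs(z_tia(f3db, z0, wn, q)) / z0:.4f}, "
      f"P.O. = {overshoot_percent(q):.1f} %")
```

For the Butterworth case, the last line confirms that $|Z_{TIA}|$ has dropped to $Z_{TIA,0}/\sqrt{2}$ at $\rho(Q)\,\omega_n$ and that the overshoot evaluates to about 4.3 %.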

The inverter-based TIA is chosen for this work because it exhibits two unique features compared to its common-gate (CG) TIA counterpart. First, unlike in the CG-TIA, changing the gain element ($R_F$ in the inverter-based TIA or the load resistor in the CG-TIA) does not alter the DC bias point. Second, as shown by (3.5.b), the inverter-based TIA can be designed to have complex or real poles. The first feature allows us to optimize the values of $R_F$ and $g_m$ separately without being limited by DC biasing constraints. The second feature allows us to investigate the optimum pole locations that achieve the best sensitivity for each type of equalizer.



Fig. 3.2. The small-signal model of the inverter-based TIA.

#### 3.2.2 Time Response

When the bandwidth is limited, the midband value of $Z_{TIA}(s)$ is a misleading measure of the transimpedance gain. The effective gain $Z_{TIA,e}$ must be calculated from the transient response, more precisely, from the pulse response. The TIA's pulse response is the response to an isolated one transmitted in a sea of zeros [25] [26] [27]. It therefore captures the basic trade-off between gain, settling time, and ISI. The pulse response of the TIA under discussion is plotted in Fig. 3.3 for $f_{3dB}$ ranging from $0.1f_{bit}$ to $f_{bit}$ and a constant Q of 0.707. To do so, the open-loop pole $f_A$ is swept by sweeping $R_A$ while fixing $C_L$. Then, for each value of $R_A$, the value of $R_F$ that satisfies the Q constraint and the corresponding $f_{3dB}$ are calculated from (3.5.b) and (3.6), assuming constant $C_T$ and $g_m$. More discussion about these assumptions is provided in Section 3.2.4.

The output pulse response in Fig. 3.3 is calculated for an ideal input current pulse with unity amplitude ($i_{pp} = 1$ A) and a width of $T_b = 1/f_{bit}$ (also referred to as the unit interval (UI)). To calculate ISI, the pulse response is sampled at the baud rate relative to its peak, resulting in a discrete-time sequence $V_{h,n}$

$$V_{h,n} = v_{pulse}(nT_b), \qquad -\infty < n < \infty \tag{3.8}$$

where $V_{h,0}$ is the main-cursor sample, and $V_{h,n}$ (n < 0) and $V_{h,n}$ (n > 0) are the pre- and post-cursor samples, respectively. Therefore, $V_{h,0}$ can be interpreted as an "effective gain" of the circuit: it gives the maximum achievable eye-opening assuming all introduced ISI is canceled.



Fig. 3.3. TIA's output pulse response for  $f_{3dB}$  ranging from  $0.1f_{bit}$  to  $f_{bit}$  with Q,  $C_T$ ,  $C_L$  and  $g_m$  fixed at 0.707, 136.8 fF, 113.6 fF and 53.5 m $\Omega^{-1}$ , respectively. The input is an ideal current pulse with unity amplitude ( $i_{pp} = 1$  A) and width of  $T_b = 1/f_{bit} = 100$  ps.

If no equalization is assumed, the introduced ISI can add destructively, closing the vertical eye-opening (VEO) to

$$VEO = V_{h,0} - \sum_{n \neq 0} |V_{h,n}|.$$
(3.9)

The VEO can also be interpreted as an effective gain for the case in which no ISI is removed. The effective gain calculated from a single-bit response is a conservative measure of the system gain, since it is based on the worst-case ISI and settling time. However, it is still the most useful approach to quantify signal degradation due to insufficient bandwidth.
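The cursor extraction of (3.8) and the VEO of (3.9) can be sketched as follows for the second-order response (3.4) with Q = 0.707. The sketch assumes an ideal rectangular 1-UI current pulse, normalizes time to the UI and the gain to $Z_{TIA,0}$, and uses the closed-form step response of a second-order low-pass, so that the pulse response is the difference of two step responses shifted by one UI. The peak is located on a fine time grid; the function names and grid settings are assumptions, not the simulation setup of Fig. 3.3.

```python
import cmath
import math


def rho(q):
    """omega_3dB / omega_n from (3.6)."""
    return math.sqrt(1 - 1 / (2 * q**2) + math.sqrt(8 * q**4 - 4 * q**2 + 1) / (2 * q**2))


def step_response(t, wn, q):
    """Unit-step response of (3.4) with Z_TIA,0 = 1 (t in UI, wn in rad/UI)."""
    if t <= 0.0:
        return 0.0
    if abs(q - 0.5) < 1e-12:                    # critically damped: double pole at -wn
        return 1.0 - math.exp(-wn * t) * (1.0 + wn * t)
    root = cmath.sqrt(1.0 / (4.0 * q * q) - 1.0)
    p1 = wn * (-1.0 / (2.0 * q) + root)
    p2 = wn * (-1.0 / (2.0 * q) - root)
    return (1.0 + (p2 * cmath.exp(p1 * t) - p1 * cmath.exp(p2 * t)) / (p1 - p2)).real


def pulse_cursors(f3db_over_fbit, q=1 / math.sqrt(2), n_post=40,
                  steps_per_ui=2000, search_ui=10):
    """Baud-rate samples V_h,n of (3.8), taken relative to the pulse peak.

    The input is an ideal 1-UI current pulse of unit amplitude, so the
    samples are normalized to Z_TIA,0.
    """
    wn = 2 * math.pi * f3db_over_fbit / rho(q)
    pulse = lambda t: step_response(t, wn, q) - step_response(t - 1.0, wn, q)
    grid = [k / steps_per_ui for k in range(search_ui * steps_per_ui)]
    t_peak = max(grid, key=pulse)               # fine-grid search for the main cursor
    n_pre = int(t_peak) + 1                     # earlier samples fall at t < 0 and are zero
    return {n: pulse(t_peak + n) for n in range(-n_pre, n_post + 1)}


def effective_gains(f3db_over_fbit, q=1 / math.sqrt(2)):
    """(V_h,0, VEO) of (3.8)-(3.9), both relative to Z_TIA,0."""
    h = pulse_cursors(f3db_over_fbit, q)
    isi = sum(abs(v) for n, v in h.items() if n != 0)
    return h[0], h[0] - isi


for ratio in (1.0, 0.5, 0.4, 0.2):
    h0, veo = effective_gains(ratio)
    print(f"f3dB/fbit = {ratio}: V_h0/Z0 = {h0:.3f}, VEO/Z0 = {veo:.3f}")
```

Because the results are normalized to $Z_{TIA,0}$, they isolate the settling and ISI penalties: at $f_{3dB} = f_{bit}$ the main cursor is about 1.04 (the step overshoot) and the VEO stays within about 5 % of it, whereas at $f_{3dB} = 0.2f_{bit}$ the main cursor drops to roughly 0.54 and the VEO is essentially zero, consistent with the closed eye reported for Fig. 3.4. Multiplying by $Z_{TIA,0}$ at each bandwidth gives absolute gains of the kind plotted in Fig. 3.4.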

Fig. 3.4 shows the gains of the TIA in Fig. 3.2 calculated from the midband value (3.5.a), the pulse height ($V_{h,0}$), and the VEO (3.9) as a function of the bandwidth-to-data-rate ratio ($f_{3dB}/f_{bit}$). In the simulations that follow, $f_{bit}$ is assumed to be 10 Gb/s. For full-bandwidth designs where $f_{3dB} \ge 0.5f_{bit}$, all gains are equal since the pulse settles to $V_{h,0} = i_{pp}Z_{TIA,0}$ and ISI is negligible. The red curve plots the midband gain, which for the DC-coupled circuit in Fig. 3.2 is $Z_{TIA,0}$. When the bandwidth is limited to less than $0.5f_{bit}$, the midband gain continues to grow with $f_{3dB}^{-2}$, as predicted by the transimpedance limit [7]. However, the gain calculated from the pulse height grows more slowly than the midband gain; that is, for $f_{3dB} < 0.5f_{bit}$, the pulse does not have enough time to reach the value $i_{pp}Z_{TIA,0}$. Further, the gain calculated from the VEO reaches its maximum at $f_{3dB} = 0.4f_{bit}$ and then starts to decrease. The eye becomes fully closed at $f_{3dB} = 0.2f_{bit}$. The VEO is reduced not only by the introduced ISI but also by the slowly growing pulse height caused by the reduced settling time (see (3.9)). For example, if the bandwidth is reduced from $0.5f_{bit}$ to $0.2f_{bit}$ and the ISI is properly removed, the effective gain is determined by the pulse height at the low-bandwidth point, which is 3.6x larger than the midband gain at the full-bandwidth point. However, the midband gain at the low-bandwidth point is 1.88x larger than the gain achievable through ideal equalization as calculated from the pulse-response height. This is why accurate calculation of the gain is so important: referring the output noise to the input by too large a gain underestimates the integrated input-referred noise and overestimates the input SNR.

#### 3.2.3 Input-Referred Noise Current

The noise of the TIA is a crucial performance parameter that usually dominates other noise sources in the receiver and therefore determines the receiver's sensitivity. The input-referred noise current is used to compare the noise performance of different TIA designs. The noise sources of the inverter-based TIA are shown in Fig. 3.2. To calculate the input-referred noise current, the contribution of each noise source (resistors and transistors) to the output noise power spectral density (PSD) is first calculated. Because the noise sources are uncorrelated, the total output noise PSD ($V_n^2(f)$) is obtained by adding up the individual output noise power spectra. The total output noise PSD can be integrated up to the CDR bandwidth to find the output rms noise, which can then be referred to the input by the right gain to find the input-referred noise. Alternatively, $V_n^2(f)$ can be referred to the input to obtain the input-referred noise PSD ($I_n^2(f)$)

$$I_n^2(f) = \frac{V_n^2(f)}{|Z_{TIA}(f)|^2} = \beta_1 + \beta_2 f^2$$
(3.10)

where $\beta_1$ and $\beta_2$ represent the white and colored noise coefficients, respectively. For the inverter-based TIA under discussion, the noise coefficients can be found as

$$\beta_{1} = \frac{4kT}{R_{F}} + \frac{4kT\gamma}{g_{m}R_{F}^{2}}, \qquad \beta_{2} = 4kT\gamma \frac{(2\pi C_{T})^{2}}{g_{m}}$$
(3.11)

where k is the Boltzmann constant, T is the temperature in Kelvin and  $\gamma$  is the noise factor of the input transistor. In the simulations that follow,  $\gamma$  is assumed to be 2.



Fig. 3.4. Normalized transimpedance gain calculated from the midband, pulse response height, and vertical eye-opening. The pulse height and VEO are calculated based on Fig. 3.3.

Equations (3.10) and (3.11) can be linked to Fig. 3.2 as follows: the thermal noise PSD of the feedback resistor ($I_{n,R_F}^2$) contributes directly to the input. When the noise PSD of the FET ($I_{n,ch}^2$) is referred to the input, the referral transfer function has a high-pass characteristic. As a result, the FET contribution to the input noise consists of two parts: a white part that appears in $\beta_1$ and a more significant part that increases with frequency and is represented by $\beta_2$. The resistor $R_A$ in Fig. 3.2 represents the transistor's output resistance, which models channel-length modulation for a FET operating in saturation. This means that $R_A$ is not a physical resistor in the circuit; therefore, it contributes noise neither in (3.11) nor in the following simulations. However, if $R_A$ is a physical load resistor, as in the common-source TIA, its noise contribution can be incorporated in (3.11) by replacing $\gamma$ with $\gamma + 1/(g_m R_A)$. This follows from the fact that the two noise sources $I_{n,ch}^2$ and $I_{n,R_A}^2$ are uncorrelated and appear at the same node, so they can be combined into one source given by $4kT(\gamma + 1/(g_m R_A))g_m$.

The total integrated input-referred noise power ($\overline{i_n^2}$) is determined by dividing the integrated output-referred noise power by an appropriate gain. In traditional TIA design, where $f_{3dB} \ge 0.5 f_{bit}$, the low-frequency gain is used [16] [8], giving

$$\overline{i_n^2} = \frac{1}{|Z_{TIA}(0)|^2} \int_0^\infty V_n^2(f) \, df.$$
(3.12)

Rearranging (3.10) and substituting into (3.12) gives

$$\overline{i_n^2} = \frac{1}{|Z_{TIA}(0)|^2} \int_0^\infty |Z_{TIA}(f)|^2 (\beta_1 + \beta_2 f^2) \, df.$$
(3.13)

This expands to

$$\overline{i_n^2} = \frac{\beta_1}{|Z_{TIA}(0)|^2} \int_0^\infty |Z_{TIA}(f)|^2 df + \frac{\beta_2}{|Z_{TIA}(0)|^2} \int_0^\infty |Z_{TIA}(f)|^2 f^2 df$$
(3.14)

which can be written in the form

$$\overline{i_n^2} = \beta_1 BW_{n0} + \frac{\beta_2}{3} BW_{n2}^3$$
(3.15)

where $BW_{n0}$ and $BW_{n2}$ are the noise bandwidths for white and colored noise, respectively, and are given by [16] [8]

$$BW_{n0} = \frac{1}{\left|Z_{TIA,0}\right|^2} \int_0^\infty |Z_{TIA}(f)|^2 df = I_{n0} f_{3dB}$$
(3.16.a)

$$BW_{n2}^{3} = \frac{3}{\left|Z_{TIA,0}\right|^{2}} \int_{0}^{\infty} |Z_{TIA}(f)|^{2} f^{2} df = I_{n2}^{3} f_{3dB}^{3}$$
(3.16.b)

where $I_{n0}$ and $I_{n2}$ are the integral coefficients that convert the 3-dB bandwidth into the corresponding noise bandwidths for a given shape of the TIA's amplitude response. Table 3.1 lists numerical values of $I_{n0}$ and $I_{n2}$ for a second-order TIA with different pole Q [16].
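For the second-order response (3.4), the integrals in (3.16) have closed forms. With $x = f/f_n$ and $f_n = \omega_n/2\pi$, the standard second-order results $\int_0^\infty |H|^2 dx = \int_0^\infty x^2|H|^2 dx = \pi Q/2$ give $I_{n0} = \pi Q/(2\rho(Q))$ and $I_{n2} = (3\pi Q/2)^{1/3}/\rho(Q)$, with $\rho(Q)$ from (3.6). The sketch below evaluates these expressions and cross-checks them against a brute-force midpoint integration; it is an illustration, not necessarily how the values in [16] were obtained, and the function names are hypothetical.

```python
import math


def rho(q):
    """omega_3dB / omega_n from (3.6)."""
    return math.sqrt(1 - 1 / (2 * q**2) + math.sqrt(8 * q**4 - 4 * q**2 + 1) / (2 * q**2))


def integral_coefficients(q):
    """Closed-form I_n0 and I_n2 of (3.16) for the second-order response (3.4)."""
    half = math.pi * q / 2      # = int_0^inf |H|^2 dx = int_0^inf x^2 |H|^2 dx, x = f/f_n
    return half / rho(q), (3 * half) ** (1 / 3) / rho(q)


def integral_coefficients_numeric(q, x_max=1000.0, n=200000):
    """Midpoint-rule cross-check of the same integrals (tail beyond x_max ignored)."""
    dx = x_max / n
    s0 = s2 = 0.0
    for k in range(n):
        x = (k + 0.5) * dx
        h2 = 1.0 / ((1 - x * x) ** 2 + (x / q) ** 2)   # |H(jx)|^2 normalized to 1 at DC
        s0 += h2 * dx
        s2 += x * x * h2 * dx
    return s0 / rho(q), (3 * s2) ** (1 / 3) / rho(q)


TABLE_3_1 = {0.5: (1.22, 2.07), 0.577: (1.15, 1.78), 0.707: (1.11, 1.49),
             0.9: (1.17, 1.34), 1.0: (1.23, 1.32)}
for q, (i_n0, i_n2) in TABLE_3_1.items():
    c0, c2 = integral_coefficients(q)
    print(f"Q = {q}: I_n0 = {c0:.3f} (table {i_n0}), I_n2 = {c2:.3f} (table {i_n2})")
```

Evaluated at the Q values of Table 3.1, the closed forms agree with the tabulated coefficients to within about 0.005.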

The above noise calculation uses the midband gain to refer the output noise to the input. As previously shown in Fig. 3.1 and Fig. 3.4, this is correct only when the bandwidth is wide enough ($f_{3dB} \ge 0.5 f_{bit}$). In this scenario, the calculated SNRs at the input and output are equal. However, when the bandwidth is limited to less than $0.5 f_{bit}$, (3.12) must be corrected, because the signal then sees an effective gain lower than $Z_{TIA}(0)$. Dividing the integrated output-referred noise power by too large a gain underestimates the integrated input-referred noise and overestimates the input SNR. Equation (3.12) can be corrected by considering the effective gain

| Q     | $I_{n0}$        | $I_{n2}$ |
|-------|-----------------|----------|
| 0.5   | 1.22            | 2.07     |
| 0.577 | 1.15            | 1.78     |
| 0.707 | 1.11            | 1.49     |
| 0.9   | 1.17            | 1.34     |
| 1     | 1.23            | 1.32     |

Table 3.1: Numerical examples for integral coefficients  $I_{n0}$  and  $I_{n2}$  for a second-order TIA

$$\overline{i_n^2} = \frac{1}{|Z_{TIA,e}|^2} \int_0^\infty V_n^2(f) \, df$$
(3.17)

where $Z_{TIA,e}$ is the effective gain. This allows the integrated input-referred noise to be calculated correctly: the integrated noise in (3.17) predicts the same SNR at the input as an SNR calculation at the output. Notice that even when the TIA bandwidth is reduced, (3.10) correctly calculates the input-referred noise PSD, since the frequency-dependent gain is used. Following the same noise analysis for the bandwidth-limited case while considering the reduced effective gain, (3.15) is modified to (3.18), where the noise bandwidths are given by (3.16)

$$\overline{i_n^2} = \beta_1 BW_{n0} \left| \frac{Z_{TIA,0}}{Z_{TIA,e}} \right|^2 + \frac{\beta_2}{3} BW_{n2}^3 \left| \frac{Z_{TIA,0}}{Z_{TIA,e}} \right|^2.$$
(3.18)
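A minimal sketch of (3.11) and (3.18) follows. It assumes T = 300 K and $\gamma = 2$ as in the text, takes $I_{n0}$ and $I_{n2}$ from Table 3.1, and expresses the effective-gain correction as the ratio $Z_{TIA,0}/Z_{TIA,e}$, so that setting the ratio to 1 recovers (3.15). The feedback resistor, bandwidth, and gain ratio in the example are hypothetical placeholders, not design values from this work.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def noise_coefficients(r_f, gm, c_t, gamma=2.0, temp=300.0):
    """beta_1 (A^2/Hz) and beta_2 (A^2/Hz^3) of (3.11)."""
    beta1 = 4 * K_B * temp / r_f + 4 * K_B * temp * gamma / (gm * r_f ** 2)
    beta2 = 4 * K_B * temp * gamma * (2 * math.pi * c_t) ** 2 / gm
    return beta1, beta2


def input_noise_rms(beta1, beta2, f3db, i_n0, i_n2, gain_ratio=1.0):
    """Integrated input-referred rms noise current (A) from (3.18).

    gain_ratio is Z_TIA,0 / Z_TIA,e; gain_ratio = 1 recovers (3.15).
    """
    bw_n0 = i_n0 * f3db                      # (3.16.a)
    bw_n2 = i_n2 * f3db                      # (3.16.b)
    power = (beta1 * bw_n0 + beta2 / 3 * bw_n2 ** 3) * gain_ratio ** 2
    return math.sqrt(power)


# Hypothetical operating point: R_F = 1 kOhm, f3dB = 7 GHz, Butterworth (Table 3.1)
b1, b2 = noise_coefficients(r_f=1e3, gm=53.5e-3, c_t=136.8e-15)
midband = input_noise_rms(b1, b2, 7e9, 1.11, 1.49)
corrected = input_noise_rms(b1, b2, 7e9, 1.11, 1.49, gain_ratio=1.88)
print(f"midband referral: {midband * 1e6:.3f} uA rms, "
      f"effective-gain referral: {corrected * 1e6:.3f} uA rms")
```

Because the correction multiplies the whole integrated noise power by $(Z_{TIA,0}/Z_{TIA,e})^2$, the rms input-referred noise scales linearly with the gain ratio: for a front-end whose midband gain is 1.88 times its pulse-response height (the ratio found at $0.2f_{bit}$ in Section 3.2.2), using the midband gain would underestimate the rms input-referred noise by the same factor.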

### 3.3 Noise Optimization Procedure

In this work, the TIA's noise performance is optimized under the assumption that the core amplifier has a constant gain-bandwidth product ($GBW_A = A_0f_A$ = constant). This optimization scenario is recommended by [16] because it leads to better noise performance, lower power consumption, and higher transimpedance gain than scenarios that assume a constant feedback resistor or a constant load capacitance. Under this constraint, the optimum FET size depends neither on the TIA's bandwidth nor on the pole quality factor; it depends only on the selected technology (transit frequency ($f_T$) and noise factor ($\gamma$)). This means that the optimum transistor size relative to the photodiode capacitance ($C_I = 0.71C_D$) is the same for all TIAs designed in the same technology, regardless of their bandwidth [16]. The fixed FET size translates to a constant $g_m$ and constant power dissipation. The optimization procedure and variables are summarized in Table 3.2.
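Step 2 of Table 3.2 is plain arithmetic on the step-1 specifications, and the sketch below reproduces it. Note that $C_L = g_m/(2\pi GBW_A)$ reduces to $2C_I$ when $GBW_A = f_T/2$. The function name is hypothetical, and 0.71 is the optimum $C_I/C_D$ ratio quoted from [16].

```python
import math


def table_3_2_step2(f_t=150e9, c_d=80e-15):
    """Hypothetical helper reproducing the derived quantities of step 2 in Table 3.2."""
    gbw_a = f_t / 2                        # GBW_A = f_T / 2
    c_i = 0.71 * c_d                       # optimum C_I / C_D ratio from [16]
    c_t = c_i + c_d
    gm = 2 * math.pi * f_t * c_i
    c_l = gm / (2 * math.pi * gbw_a)       # reduces to 2 * C_I
    return {"GBW_A": gbw_a, "C_I": c_i, "C_T": c_t, "g_m": gm, "C_L": c_l}


for name, value in table_3_2_step2().items():
    print(f"{name} = {value:.4g}")
```

With the defaults (150 GHz and 80 fF), it returns $C_I$ = 56.8 fF, $C_T$ = 136.8 fF, $g_m \approx$ 53.5 m$\Omega^{-1}$, and $C_L$ = 113.6 fF, matching the values listed in the table.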

| Step | Description | Variable | Values and Bounds <sup>(1)</sup> |
|------|-------------|----------|----------------------------------|
| 1 | Give the design specifications | $f_T$ | 150 GHz <sup>(2)</sup> |
| | | $f_{bit}$ | 10 Gb/s |
| | | $C_D$ | 80 fF <sup>(3)</sup> |
| | | Receiver architecture | |
| 2 | Calculate | $GBW_A = f_T/2$ | 75 GHz |
| | | $C_I = 0.71 C_D$ <sup>(4)</sup> | 56.8 fF |
| | | $C_T = C_I + C_D$ | 136.8 fF |
| | | $g_m = 2\pi f_T C_I$ <sup>(5)</sup> | 53.5 mS |
| | | $C_L = g_m / (2\pi GBW_A)$ | 113.6 fF |
| 3 | Select the TIA's pole $Q$ | $Q$ | 0.3 to 1 |
| 4 | Sweep the open-loop pole $f_A$ with $GBW_A$ from step 2 held constant ($A_0 = GBW_A/f_A$). Then find the $R_F$ that satisfies the selected $Q$ | $f_A$ | Swept from $0.1 f_{bit}$ to $1.2 f_{bit}$ |
| 5 | For each value of $R_F$, calculate | $f_{3dB}$ | From (3.5.b) |
| | | Effective gain | Depends on RX architecture |
| | | Noise bandwidths | Depends on RX architecture |
| 6 | Calculate and plot the input-referred noise | | From (3.18) |
| 7 | Repeat steps 3 to 6 for different values of $Q$ and the corresponding $f_{3dB}/f_{bit}$ | | |

Table 3.2: Noise optimization procedure

<sup>(1)</sup> These values are used in all simulations unless mentioned otherwise. The effect of changing these initial values is studied in Section 3.5.2.

<sup>(2)</sup> Assuming 65 nm CMOS technology.

<sup>(3)</sup> Value found in [12].

<sup>(4)</sup> Under the constant $A_0 f_A$ constraint, the optimum transistor size relative to the photodiode capacitance ($C_I = 0.71C_D$) is the same for all TIAs designed in the same technology, regardless of their bandwidth [16].

<sup>(5)</sup> Once  $C_I$  is fixed, the  $g_m$  is calculated based on the  $f_T$  of the technology.
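Step 2 of Table 3.2 is plain arithmetic on $f_T$ and $C_D$. The sketch below (function and key names are illustrative) reproduces the tabulated values; note that with $GBW_A = f_T/2$ the load capacitance reduces to $C_L = 2C_I$.

```python
import math

def step2_parameters(f_t, c_d):
    """Step 2 of Table 3.2: derived quantities from f_T (Hz) and C_D (F)."""
    gbw_a = f_t / 2.0                       # GBW_A = f_T / 2
    c_i = 0.71 * c_d                        # C_I = 0.71 C_D (optimum under constant A0*fA)
    c_t = c_i + c_d                         # C_T = C_I + C_D
    g_m = 2.0 * math.pi * f_t * c_i         # g_m = 2*pi*f_T*C_I
    c_l = g_m / (2.0 * math.pi * gbw_a)     # C_L = g_m / (2*pi*GBW_A)
    return {"GBW_A": gbw_a, "C_I": c_i, "C_T": c_t, "g_m": g_m, "C_L": c_l}

p = step2_parameters(f_t=150e9, c_d=80e-15)
print(f"GBW_A = {p['GBW_A'] / 1e9:.0f} GHz, C_I = {p['C_I'] * 1e15:.1f} fF, "
      f"C_T = {p['C_T'] * 1e15:.1f} fF, g_m = {p['g_m'] * 1e3:.1f} mS, "
      f"C_L = {p['C_L'] * 1e15:.1f} fF")
```

Calling `step2_parameters(150e9, 200e-15)` scales $C_I$, $C_T$, $g_m$, and $C_L$ by the same factor of 2.5, which is the scaling used in the increased-$C_D$ scenario of Section 3.5.2.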

Assuming no equalization and that the TIA's output is directly connected to the latch's input, the conventional and the proposed noise calculations are compared in Fig. 3.5. The corresponding noise currents are calculated by taking the square root of (3.15) and (3.18) and are plotted by the red and black curves, respectively. For the latter, the effective gain is calculated from the VEO in (3.9). The blue line in Fig. 3.5 is the corrected noise when an ideal equalizer is employed to perfectly (noiselessly) remove the ISI; the effective gain is then determined by the pulse height. In this simulation, the TIA's pole Q is set to 0.707 and its $f_{3dB}$ is swept according to the procedure and values in Table 3.2.



Fig. 3.5. Input-referred noise current as a function of  $f_{3dB}/f_{bit}$  calculated using midband gain, pulse response height, and VEO. The TIA pole Q is set to 0.707 and the  $f_{3dB}$  is swept according to the procedure and values in Table 3.2.

The conventional noise calculation using the midband gain suggests that $i_{n,rms}$ continues to decrease as $f_{3dB}$ shrinks, which is an erroneous conclusion when $f_{bit}$ is unchanged. The corrected noise calculation using the VEO shows that the equivalent input-referred noise reaches its minimum value at $f_{3dB} = 0.4 f_{bit}$ and then starts to increase again due to the reduced effective gain. This coincides with the conclusion drawn from Fig. 3.4 and [27]. The three labeled points in Fig. 3.5 summarize the motivation behind this work. When the bandwidth is reduced from $0.5 f_{bit}$ (point a) to $0.2 f_{bit}$ and proper equalization is used (point b), the sensitivity improves by a factor of 2.15. Erroneously using the midband gain leads to an optimistic estimate of the input-referred noise (point c), where the sensitivity appears to improve by a factor of 4.17 relative to point a. This leads to a design-time error of $10\log_{10}(4.17/2.15) = 2.88$ dB in calculating the OMA sensitivity.
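Because the OMA sensitivity scales linearly with the rms input-referred noise current, a noise misestimate by a ratio $r$ translates into an optical-power error of $10\log_{10} r$. A minimal sketch (the function name is illustrative):

```python
import math

def oma_error_db(noise_ratio: float) -> float:
    """OMA sensitivity error (dB) for an rms noise current misestimated by noise_ratio."""
    return 10.0 * math.log10(noise_ratio)

# Point c suggests a 4.17x sensitivity improvement over point a; point b achieves 2.15x.
print(f"{oma_error_db(4.17 / 2.15):.2f} dB")
```

The same function applied to the DFE noise ratios of 1.99 and 2.45 reported in Section 3.4.1 gives about 2.99 and 3.89 dB.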

Assuming the effective gain is equal to the pulse-response gain requires an equalizer that can noiselessly remove both pre- and post-cursor ISI. Since practical equalizers cannot do this, the anticipated noise behavior using practical equalizers is considered in the next section.

## 3.4 Noise Calculation of Equalizer-Based Receivers

### 3.4.1 DFE-Based Receivers

Fig. 3.6 (a) and (b) respectively show the block diagrams of FIR- and IIR-DFE-based receivers. An ideal IIR-DFE or infinite-length FIR-DFE cancels all the post-cursor ISI but not the pre-cursor ISI. Assuming a unit pulse input current, the effective gain of the DFE-equalized receiver is calculated as

$$Z_{e,DFE} = V_{h,0} - \sum_{n<0} |V_{h,n}| - \sum_{n>N} |V_{h,n}|$$
(3.19)

where *N* is the number of taps in the FIR-DFE and equal to  $\infty$  in the IIR-DFE. This means that DFEs remove post-cursor ISI but have no bearing on the pulse height as seen at the output of the TIA. The effective gain from (3.19) is inserted in (3.18) to calculate the input-referred noise current of the DFE-based receivers using the noise bandwidths from (3.16). Fig. 3.7 shows the corrected input noise of 1-tap, 2-tap FIR- and IIR-DFEs in contrast with the uncorrected noise (using (3.5.a) and (3.15)). In this simulation, the TIA pole *Q* is set to 0.707.
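Equation (3.19) only needs the sampled cursors of the TIA pulse response. The sketch below assumes a normalized underdamped second-order TIA with Q = 1/√2 and $f_{3dB} = 0.25 f_{bit}$ (illustrative values), samples the main cursor at the pulse peak (an assumption about the sampling phase), and evaluates (3.19) for N = 0, 1, 2 and for the IIR case. With N = 0 the expression subtracts all pre- and post-cursor ISI, i.e., the worst-case (peak-distortion) eye opening. All function names are hypothetical.

```python
import numpy as np

def step_response(t, f_n, q):
    """Unit step response of an underdamped (Q > 0.5) second-order low-pass."""
    t = np.asarray(t, dtype=float)
    wn = 2 * np.pi * f_n
    zeta = 1 / (2 * q)
    wd = wn * np.sqrt(1 - zeta**2)
    tt = np.clip(t, 0.0, None)              # the response is zero for t <= 0
    y = 1 - np.exp(-zeta * wn * tt) * (np.cos(wd * tt)
                                       + zeta / np.sqrt(1 - zeta**2) * np.sin(wd * tt))
    return np.where(t > 0, y, 0.0)

def pulse(t, f_n, q, t_b=1.0):
    """Response to a one-UI unit current pulse (TIA gain normalized to 1)."""
    return step_response(t, f_n, q) - step_response(t - t_b, f_n, q)

def cursors(f_n, q, t_b=1.0, n_pre=3, n_post=60):
    """Sample the pulse at its peak (main cursor, n = 0) and every UI around it."""
    t = np.linspace(0.0, 10 * t_b, 100_001)
    t0 = t[np.argmax(pulse(t, f_n, q, t_b))]
    n = np.arange(-n_pre, n_post + 1)
    return n, pulse(t0 + n * t_b, f_n, q, t_b)

def z_eff_dfe(n, h, n_taps):
    """Effective gain per (3.19); n_taps=None models an ideal IIR-DFE (N = infinity)."""
    main = h[n == 0][0]
    pre = np.abs(h[n < 0]).sum()
    post = 0.0 if n_taps is None else np.abs(h[n > n_taps]).sum()
    return main - pre - post

# Q = 1/sqrt(2), so f_n equals f_3dB; frequencies in units of f_bit (T_b = 1)
n, h = cursors(f_n=0.25, q=1 / np.sqrt(2))
for taps in (0, 1, 2, None):
    label = "IIR" if taps is None else f"{taps}-tap"
    print(f"{label:>6}: Z_e / Z_TIA(0) = {z_eff_dfe(n, h, taps):.3f}")
```

Even the IIR case stays below the pulse height because the pre-cursor term remains, which is the limitation discussed next.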

In the corrected noise calculation, the input-referred noise reaches a minimum point then starts to increase again due to the growth of the pre- and the residual post-cursor ISI (in the case of FIR-DFE) outpacing the slowly growing  $V_{h,0}$ . The noise reaches its minimum at a bandwidth of 26 %, 25 %, and 21 % of the data rate for the 1-tap, 2-tap, and IIR-DFE, respectively. These points are 1.79×, 1.99×, and 2.45× larger than the uncorrected noise at the same bandwidth. This leads to an OMA sensitivity error of 2.5, 2.99, and 3.89 dB, respectively, if the incorrect approach to noise estimation is used.

The red curve with x-markers in Fig. 3.7 shows the corrected noise calculated for the case when an ideal equalizer that removes all pre- and post-cursor ISI is used. The effective gain in this scenario is determined by the pulse height, as shown by the blue curve in Fig. 3.7. Although this is not achievable in practical DFE implementations, this curve is plotted here to emphasize two points: 1) even with the ideal equalizer, the midband gain should not be used for noise calculation; and 2) the pre-cursor ISI introduced by the second-order TIA limits the sensitivity improvement in DFE-based receivers.



Fig. 3.6. DFE-based receivers (a) FIR feedback [24] (b) IIR feedback, modified based on [26].

The fact that DFEs are not able to cancel the pre-cursor ISI motivates an investigation of how the placement of the TIA's poles affects the input-referred noise. To do so, the pole Q is varied from 0.5 to 0.9. In each case, the corresponding noise bandwidths are taken from Table 3.1. These noise bandwidths are then inserted in (3.18) to calculate the input-referred noise using the effective gain from (3.19), considering the case of the 2-tap FIR-DFE. The simulation in Fig. 3.8 shows that increasing Q from 0.5 to 0.577 slightly improves the noise minimum: for a given bandwidth, a higher Q allows a higher $R_F$, which in turn allows the pulse to reach a higher peak and reduces the noise contribution from the feedback resistor. On the other hand, a higher Q results in more pre-cursor ISI, which reduces the effective gain and degrades the noise performance. Considering this trade-off, the optimum input noise occurs for Q = 0.577 at $f_{3dB} = 0.18 f_{bit}$.

When considering the case of an IIR-DFE, the difference between the noise minima at different Qs becomes negligible, with a slightly lower minimum noise occurring for Q = 0.707 at $f_{3dB} = 0.2f_{bit}$. In contrast to the 2-tap FIR-DFE, removing the post-cursor ISI for n > 2 allows the receiver to tolerate more pre-cursor ISI and shifts the optimum Q from 0.577 to 0.707. The conclusion about the optimum Q and the optimum $f_{3dB}/f_{bit}$ ratio for the IIR-DFE coincides with the analysis presented in past work [25] [27]. However, those works did not study the optimum Q for the FIR-DFE.



Fig. 3.7. Corrected and uncorrected noise for DFE-based receivers. The  $f_{3dB}$  is changed according to the procedure and values in Table 3.2 while Q is kept constant at 0.707.



Fig. 3.8. Impact of the placement of the TIA's poles on the input-referred noise of 2-tap FIR-DFE based receiver. The  $f_{3dB}$  is changed according to the procedure and values in Table 3.2 while the values of the TIA's pole Q are given in the legend.

### 3.4.2 CTLE-Based Receivers

A general block diagram of a CTLE-equalized front-end is shown in Fig. 3.9. An ideal, noiseless CTLE with unity low-frequency gain flattens the front-end's response over the frequency range of interest. In such a case, there is no need to correct the noise calculation because the equalized signal sees an effective gain equal to $Z_{TIA}(0)$. However, the relevant 3dB bandwidth to use in noise bandwidth calculations is now the bandwidth of the combination of TIA and CTLE, which will be on the order of $f_{bit}/2$. The analysis that follows takes into account the limited capability of a single-stage CTLE in restoring the bandwidth and investigates the notion that a CTLE amplifies the high-frequency noise.



Fig. 3.9. General block diagram of a CTLE-based front-end.

To do so, the noise bandwidths in (3.16 a & b) must be calculated considering the transfer function of the overall front-end $Z_{FE}(s) = Z_{TIA}(s)H_{CTLE}(s)$. Further, the effective gain needs to be calculated at the output of the overall front-end using (3.9). This will correctly account for the residual ISI in the signal presented to the receiver's latch(es). The CTLE transfer function is assumed to be

$$H_{CTLE}(s) = \frac{Num(s)}{\left(\frac{s}{\omega_{n,e}}\right)^2 + \frac{s}{\omega_{n,e}Q_e} + 1}.$$
(3.20)

That is, the equalizer has two complex poles determined by  $\omega_{n,e}$  and  $Q_e$ . Both cases of real and complex zeros are considered in the numerator Num(s) which is written as

$$Num(s) = \begin{cases} \left(\frac{s}{\omega_z}\right)^2 + \frac{s}{\omega_z Q_z} + 1 & for \ two \ complex \ zeros \\ or \\ \frac{s}{\omega_z} + 1 & for \ a \ single \ real \ zero \end{cases}$$
(3.21)

where $\omega_z$ and $Q_z$ are the zero frequency and the quality factor, respectively. For perfect pole-zero cancellation, $\omega_z$ is chosen to be equal to the natural frequency of the limited-bandwidth TIA (*i.e.*, $\omega_z = \omega_n$). Moreover, $\omega_{n,e} = 2\omega_n$ is assumed. That is, an equalizer with complex zeros that perfectly match the TIA's complex poles restores the bandwidth by a factor of 2× if the TIA and the equalizer have the same pole quality factor. The bandwidth extension factor becomes less than 2× when a CTLE with a single zero is used. The integral coefficients ($I_{n0}$ and $I_{n2}$) that convert the 3dB bandwidth of the overall front-end to the corresponding noise bandwidths, the bandwidth extension factor ($\chi$), and the amplitude peaking in the overall response are shown in Table 3.3 for different equalizer designs and a TIA with Q = 0.707. As expected, the first row in Table 3.3 has the same integral coefficients $I_{n0}$ and $I_{n2}$ as the full-bandwidth TIA with Q = 0.707 in Table 3.1, since the overall response is again a second-order response with Q = 0.707.
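The bandwidth extension factor and amplitude peaking of Table 3.3 can be checked numerically from (3.20) and (3.21). The sketch below assumes $\omega_z = \omega_n$ and $\omega_{n,e} = 2\omega_n$ as stated above, normalizes frequency to the TIA's $f_n$, and integrates the overall response on a logarithmic grid to obtain $I_{n0}$ and $I_{n2}$. Function names are illustrative.

```python
import numpy as np

def lp2(x, q):
    """|1 / ((jx)^2 + jx/Q + 1)|^2, a normalized second-order low-pass."""
    return 1.0 / ((1 - x**2) ** 2 + (x / q) ** 2)

def front_end_mag2(x, design, q_tia=0.707, q_z=0.707, q_e=0.707):
    """|Z_FE / Z_FE(0)|^2 with Z_FE = Z_TIA * H_CTLE and x = f / f_n of the TIA."""
    if design == "complex":
        num = 1.0 / lp2(x, q_z)          # |Num|^2 for two complex zeros at omega_n
    elif design == "real":
        num = 1.0 + x**2                 # |Num|^2 for a single real zero at omega_n
    else:
        raise ValueError(design)
    return lp2(x, q_tia) * num * lp2(x / 2.0, q_e)   # CTLE poles at 2*omega_n

def first_crossing(x, mag2):
    """First grid point where the squared magnitude falls below 1/2 (the 3-dB point)."""
    return x[np.argmax(mag2 < 0.5)]

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def ctle_metrics(design, q_e, q_tia=0.707):
    """Return (chi, peaking in dB, I_n0, I_n2) for the TIA + CTLE front-end."""
    x_lin = np.linspace(0.0, 20.0, 200_001)
    fe_lin = front_end_mag2(x_lin, design, q_tia=q_tia, q_z=q_tia, q_e=q_e)
    x3_fe = first_crossing(x_lin, fe_lin)
    chi = x3_fe / first_crossing(x_lin, lp2(x_lin, q_tia))
    peaking_db = 10 * np.log10(fe_lin.max())
    x_log = np.logspace(-5, 5, 400_001)
    fe_log = front_end_mag2(x_log, design, q_tia=q_tia, q_z=q_tia, q_e=q_e)
    i_n0 = trapezoid(fe_log, x_log) / x3_fe
    i_n2 = (3 * trapezoid(fe_log * x_log**2, x_log)) ** (1 / 3) / x3_fe
    return chi, peaking_db, i_n0, i_n2

for design, q_e in (("complex", 0.707), ("real", 0.707), ("real", 0.8)):
    chi, pk, i_n0, i_n2 = ctle_metrics(design, q_e)
    print(f"{design:>7}, Q_e = {q_e}: chi = {chi:.2f}, peaking = {pk:.2f} dB, "
          f"I_n0 = {i_n0:.2f}, I_n2 = {i_n2:.2f}")
```

In this idealized model, χ and the peaking of all three designs come out close to Table 3.3, and the complex-zero design returns the Q = 0.707 coefficients of about 1.11 and 1.49. The integral coefficients of the single-zero rows may differ from the table in the second decimal, since the exact numerical settings behind the table are not stated.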

The data in Table 3.3 are used to plot the input-referred noise of the CTLE-based receiver, as shown in Fig. 3.10, where the horizontal axis represents the *overall* front-end (TIA/CTLE) bandwidth-to-data rate ratio. The front-end that consists of a limited-bandwidth TIA and a complex-zeros equalizer has nearly the same noise performance as the full-bandwidth TIA with Q = 0.707 (this statement is further quantified later). The minimum noise occurs for $f_{3dB} = 0.4f_{bit}$, the same as in the case where no equalizer is used (black curve in Fig. 3.5). Note that in this case, the TIA's bandwidth is $0.2f_{bit}$ and it is extended to $0.4f_{bit}$ by the equalizer. When the limited-bandwidth TIA is followed by a single-zero equalizer (second row in Table 3.3), the overall front-end has a higher-order amplitude response and a steeper high-frequency roll-off. This filters out more high-frequency noise and pushes the optimum $f_{3dB}/f_{bit}$ to a higher value. However, the higher-order amplitude response also adds more pre-cursor ISI, which degrades the noise performance for $f_{3dB}$ lower than the optimum point. A further increase in $Q_e$ to 0.8 (third row in Table 3.3) has an insignificant impact on the noise performance.

To investigate the notion that the CTLE amplifies the high-frequency noise, the input-referred noise power is plotted as a function of $f_{3dB}/f_{bit}$ for a full-bandwidth TIA (FBW-TIA) and a CTLE-based front-end with a complex-zeros equalizer. In this simulation, both full- and limited-bandwidth TIAs have a pole Q of 0.707. The equalizer is assumed to be noiseless and is designed as shown in the first row of Table 3.3, with $\omega_z = \omega_n$ and $\omega_{n,e} = 2\omega_n$. The results are shown in Fig. 3.11, which reveals the following. 1) The colored-noise contributions are identical in both cases (red and black square-markers). This *negates* the claim that the CTLE amplifies the high-frequency noise compared to the conventional FBW-TIA. A more accurate way to describe the noise behavior of a CTLE is that, compared to a DFE-based receiver, the CTLE extends the noise bandwidths from being a function of the low-bandwidth TIA's bandwidth to being a function of the combined TIA/CTLE bandwidth. 2) The white-noise power in the CTLE-based front-end is reduced by a factor of $\chi^2$ ($\chi = 2$ in this simulation) compared to its counterpart in the FBW-TIA. This reduction can be explained as follows: the bandwidth of the TIA in the CTLE-based front-end is reduced by a factor of $\chi$ compared to the bandwidth of the FBW-TIA. This in turn allows the low-bandwidth TIA to employ a feedback resistor that is $\chi^2$ times larger, which improves the $\beta_1$ term in (3.11). In the equalized front-end, despite the significant improvement in the white noise, the total noise power is reduced by less than 21 % at the minimum point. This is because the noise is dominated by the $f^2$-noise. These findings coincide with the analysis in [23].

| CTLE design     | $Q_z$ | $Q_e$ | $\chi$ | $I_{n0}$ | $I_{n2}$ | Amplitude peaking |
|-----------------|-------|-------|--------|----------|----------|-------------------|
| 2-complex zeros | 0.707 | 0.707 | 2      | 1.11     | 1.49     | 0 dB              |
| 1-real zero     | 0.5   | 0.707 | 1.4    | 1.16     | 1.25     | 0.77 dB           |
| 1-real zero     | 0.5   | 0.8   | 1.5    | 1.18     | 1.23     | 0.97 dB           |

Table 3.3: Integral coefficients, bandwidth extension factor ($\chi$), and amplitude peaking for a CTLE-equalized receiver with TIA Q = 0.707.



Fig. 3.10. Input-referred noise of the CTLE-based receiver. The horizontal axis represents the bandwidth of the *overall* front end to the data rate ratio. The TIA pole Q is set to 0.707 and its  $f_{3dB}$  is swept according to the procedure and values in Table 3.2. The CTLE different designs are summarized in Table 3.3.



Fig. 3.11. The input-referred noise power as a function of the bandwidth-to-data rate ratio for both full-bandwidth TIA and CTLE-equalized front-end. For the latter, the horizontal axis represents the bandwidth of the *overall* front end to the data rate ratio. In this simulation, both full- and limited-bandwidth TIAs have pole Q of 0.707 and the equalizer is assumed to be noiseless and designed as shown in the first row in Table 3.3 with  $\omega_z = \omega_n$  and  $\omega_{n,e} = 2\omega_n$ .

Fig. 3.12 shows an early implementation of the FFE-based optical receiver, where the input current is integrated on the total input capacitance $C_T$, producing an input voltage ($V_{IN}$) [17]. This voltage is the sum of the incoming signal and the voltage of this node at the end of the previous UI. The integrating receiver samples the voltage $V_{IN}$ every UI and then compares every two consecutive samples to resolve the current bit. That is, if $\Delta V_{IN} = V_{IN}[N] - V_{IN}[N-1]$ is positive, a "1" is resolved; otherwise, the incoming bit is a "0". The process of double-sampling and differencing is a single-tap FFE that implements the function $(1 - z^{-1})$. To function properly, this receiver requires data encoding to limit the number of consecutive identical digits (CIDs), because a long run of CIDs causes $V_{IN}$ to drift toward one of the supply rails, which alters the receiver's DC biasing. This limitation is avoided in Fig. 3.13 [18] by adding a resistor (R) between the input node and the supply voltage. The time constant $\tau = RC_T$ is much greater than the UI, which limits the integration gain and prevents the input from going out of range during long CIDs. However, the insertion of the resistor makes the double-sampled voltage $\Delta V_{IN}$ data dependent, as shown in Fig. 3.13: a "1" following a long run of "0"s generates a larger $\Delta V_{IN}$ than a "1" following a long run of "1"s. Dynamic offset modulation (DOM) is introduced to address this problem and deliver a constant voltage difference $\Delta V'_{IN}$ to the input of the comparator.
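The difference between the integrating receiver of Fig. 3.12 and the RC receiver of Fig. 3.13 can be illustrated with a noiseless discrete-time sketch. It assumes on-off keying (zero current for a "0"), normalized units ($i_{ph} = T_b = C_T = 1$), a zero decision threshold, and an arbitrary $\tau = 10\,T_b$; the function name is hypothetical and DOM is not modeled.

```python
import numpy as np

def double_sample(bits, i_ph=1.0, t_b=1.0, c_t=1.0, tau=None):
    """End-of-UI input voltage V_IN[N] and Delta V_IN[N] = V_IN[N] - V_IN[N-1].

    tau=None: ideal integration on C_T (Fig. 3.12).
    tau=R*C_T: leaky integration through the added resistor (Fig. 3.13), with
    V_IN measured relative to the node's resting voltage.
    """
    v, v_end = 0.0, []
    for b in bits:
        i = i_ph * b                              # on-off keying: '1' -> i_ph, '0' -> 0
        if tau is None:
            v += i * t_b / c_t                    # charge accumulates without bound
        else:
            a = np.exp(-t_b / tau)
            v = a * v + (1 - a) * i * tau / c_t   # settles toward i*R with R = tau / C_T
        v_end.append(v)
    v_end = np.array(v_end)
    return v_end, np.diff(v_end, prepend=0.0)

bits = np.array([0] * 30 + [1] * 30 + [0, 1, 0, 1])
v_int, dv_int = double_sample(bits)
v_rc, dv_rc = double_sample(bits, tau=10.0)
print("decisions (integrating) match:", np.array_equal((dv_int > 0).astype(int), bits))
print("decisions (RC) match:         ", np.array_equal((dv_rc > 0).astype(int), bits))
print(f"final V_IN: integrating = {v_int[-1]:.1f}, RC = {v_rc[-1]:.2f}")
print(f"RC Delta V_IN for a '1' after 0s = {dv_rc[30]:.3f}, after 1s = {dv_rc[59]:.3f}")
```

The integrating node keeps growing with every "1", while the RC node stays bounded; the price is a $\Delta V_{IN}$ that shrinks during a long run of "1"s, which is the data dependence that DOM addresses.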

The sensitivity of the receivers in Fig. 3.12 and Fig. 3.13 is inversely proportional to the total capacitance at the input node. However, charge sharing between the photodiode capacitance and the sampling capacitors  $C_S$  limits receiver performance. To avoid the problem of charge sharing, the double-sampling receiver in Fig. 3.14 [19] [20] employs a low-bandwidth TIA to provide isolation between the photodiode's capacitor and the sampling capacitors. This allows the use of ultra-low capacitance photodiodes available in scaled silicon-photonic technologies. Also, it allows the use of a bigger sampling capacitor (even comparable to the PD's capacitance) to mitigate the  $kT/C_S$  noise. The TIA in Fig. 3.14 is designed to have a first-order amplitude response with dominant pole at the output node. This is achieved thanks to the combination of the advanced 28 nm CMOS technology and the ultra-low capacitance provided by the silicon-photonic photodiode (the total capacitance due to the photodiode, bond-wire, and pad is less than 30 fF).
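The benefit of a larger sampling capacitor follows from the sampled thermal noise $\sqrt{kT/C_S}$. The sketch below assumes T = 300 K; the capacitor values are illustrative only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def ktc_noise_rms(c_s: float, temp_k: float = 300.0) -> float:
    """RMS sampled noise voltage sqrt(kT/C) on a sampling capacitor c_s (F)."""
    return math.sqrt(K_B * temp_k / c_s)

for c_s in (10e-15, 30e-15, 100e-15):
    print(f"C_S = {c_s * 1e15:.0f} fF -> {ktc_noise_rms(c_s) * 1e6:.0f} uV_rms")
```

Since the noise voltage scales as $1/\sqrt{C_S}$, quadrupling $C_S$ halves it.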



Fig. 3.12. Integrating double-sampling receiver and its waveform [17].



Fig. 3.13. RC double-sampling receiver and its waveform [18].



Fig. 3.14. Double-sampling receiver employing a low-BW TIA to avoid the charge sharing problem [19], [20].



Fig. 3.15. Block diagram representation of the FFE-based receivers.

The work in [20] presents a thorough noise analysis to calculate the receiver sensitivity considering all of the noise sources in the receiver. However, it is not clear how the input-referral gain of the low-bandwidth TIA (denoted by $R_{LBW,TIA}$) is calculated. From the paper, $R_{LBW,TIA}$ appears to be the TIA's feedback resistor value, which is approximately equal to the DC gain. However, the effective gain of the TIA for the targeted $f_{3dB}/f_{bit}$ would be approximately one-third of $R_{LBW,TIA}$, so using $R_{LBW,TIA}$ underestimates the input-referred noise. In addition, [20] assumes that the input to the latch sees two independent samples of amplified TIA noise. However, the two samples are taken from the same signal, offset in time by $T_b$. Due to the limited bandwidth of the front-end, they are correlated, which makes the rms value of $v_n - v_{n-1}$ smaller than $\sqrt{2}\,v_{n,rms}$. By considering the transfer function of the equalizer and the TIA and calculating the rms noise at the input of the latch, we account for both the effective gain and the correlation between consecutive noise samples.
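The correlation argument can be quantified for the simplest case. Assuming the noise at the latch is white noise shaped by a single pole at $f_{3dB}$ (consistent with a first-order TIA, but an assumption here), its autocorrelation is $\sigma^2 e^{-2\pi f_{3dB}|\tau|}$, so $\mathrm{var}(v_n - v_{n-1}) = 2\sigma^2\left(1 - e^{-2\pi f_{3dB} T_b}\right)$. The sketch below evaluates this ratio and checks it with a seeded Monte Carlo run; function names are illustrative.

```python
import numpy as np

def diff_to_single_ratio(f3db_over_fbit):
    """rms(v_n - v_{n-1}) / rms(v_n) for single-pole-filtered white noise."""
    rho = np.exp(-2 * np.pi * f3db_over_fbit)   # correlation of samples one UI apart
    return np.sqrt(2 * (1 - rho))

def simulate_ratio(f3db_over_fbit, n_ui=100_000, seed=1):
    """Monte Carlo check: single-pole noise sampled once per UI is an AR(1) sequence."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-2 * np.pi * f3db_over_fbit)
    s = np.sqrt(1 - rho**2)                     # keeps the variance stationary
    w = rng.standard_normal(n_ui)
    v = np.empty(n_ui)
    v[0] = w[0]
    for k in range(1, n_ui):
        v[k] = rho * v[k - 1] + s * w[k]
    return np.std(np.diff(v)) / np.std(v)

for r in (0.1, 0.2, 0.5):
    print(f"f_3dB = {r} f_bit: analytic {diff_to_single_ratio(r):.3f}, "
          f"simulated {simulate_ratio(r):.3f}, independent-sample value {np.sqrt(2):.3f}")
```

At $f_{3dB} = 0.2 f_{bit}$ the ratio is about 1.20 rather than $\sqrt{2} \approx 1.41$, so treating the two samples as independent overestimates the differenced noise.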

Another attempt to get around the charge-sharing problem is proposed in [21], where the double sampler is replaced by a continuous-time (CT) delay. The CT-FFE-based receiver consists of a first-order inverter-based TIA, a CT delay cell, and a differencing amplifier. Unlike the work in [20], the TIA in [21] is designed with a dominant pole at the input node.

All the previously described FFE-based receivers can be modeled by the front-end in Fig. 3.15 where the output of the TIA is processed by a transfer function

$$H_{FFE}(s) = 1 - \alpha e^{-sT_b}.$$
(3.22)

This function is equivalent to a 1-tap FFE where $\alpha$ is the tap coefficient. Starting with a low-bandwidth TIA having a pole Q of 0.5, Fig. 3.16 shows the pulse response at different points in the front-end of Fig. 3.15. Because of the second-order nature of the TIA, its pulse response peaks at a time $T_1 > UI$. Since the delayed pulse is delayed by only one UI, it has a non-zero value at $T_1$, and this value is subtracted from the main-cursor sample. Therefore, the strength of the delayed pulse ($\alpha$) must be carefully chosen to optimize the trade-off between main-cursor reduction and residual 1<sup>st</sup> post-cursor ISI (see Fig. 3.16). Moreover, if $\alpha$ is selected to fully cancel the 1<sup>st</sup> post-cursor ISI, the delayed pulse will rise above the TIA's pulse for the rest of the tail, over-equalizing the post-cursor ISI for n > 1. The noise performance of the FFE-based receivers can be examined using the model in Fig. 3.15, where the noise bandwidths are calculated from this model as

$$BW_{n0} = \frac{1}{|Z_{FE}(0)|^2} \int_0^\infty |Z_{TIA}(f)H_{FFE}(f)|^2 df \qquad (3.23.a)$$

$$BW_{n2}^{3} = \frac{3}{|Z_{FE}(0)|^{2}} \int_{0}^{\infty} |Z_{TIA}(f)H_{FFE}(f)|^{2} f^{2} df \qquad (3.23.b)$$

where  $Z_{FE}(0)$  is the midband gain of the overall front-end and given by

$$Z_{FE}(0) = Z_{TIA}(0)(1-\alpha).$$
(3.24)

Therefore, the total integrated input-referred noise power of the FFE-based front-end is given by

$$\overline{\iota_n^2} = \beta_1 B W_{n0} \left| \frac{Z_{FE}(0)}{Z_{e,FFE}} \right|^2 + \frac{\beta_2}{3} B W_{n2}^3 \left| \frac{Z_{FE}(0)}{Z_{e,FFE}} \right|^2.$$
(3.25)
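The noise bandwidths in (3.23) can be evaluated numerically using $|H_{FFE}(f)|^2 = 1 - 2\alpha\cos(2\pi f T_b) + \alpha^2$ and the normalization $|Z_{FE}(0)|^2$ from (3.24). The sketch below assumes a normalized second-order TIA and expresses frequency in units of $f_{bit}$; grid limits and function names are illustrative choices.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of samples y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def ffe_noise_bandwidths(f_n, q, alpha, t_b=1.0, f_max=200.0, n_pts=400_001):
    """BW_n0 and BW_n2 of (3.23) for a second-order TIA followed by
    H_FFE(s) = 1 - alpha * exp(-s * T_b), with 0 <= alpha < 1."""
    f = np.linspace(0.0, f_max, n_pts)
    x = f / f_n
    tia2 = 1.0 / ((1 - x**2) ** 2 + (x / q) ** 2)                 # |Z_TIA(f) / Z_TIA(0)|^2
    ffe2 = 1 - 2 * alpha * np.cos(2 * np.pi * f * t_b) + alpha**2  # |H_FFE(f)|^2
    g = tia2 * ffe2 / (1 - alpha) ** 2                            # divide by |Z_FE(0)|^2 / |Z_TIA(0)|^2
    bw_n0 = trapezoid(g, f)
    bw_n2 = (3.0 * trapezoid(g * f**2, f)) ** (1.0 / 3.0)
    return bw_n0, bw_n2

for alpha in (0.0, 0.35, 0.7):
    bw0, bw2 = ffe_noise_bandwidths(f_n=0.2, q=1 / np.sqrt(2), alpha=alpha)
    print(f"alpha = {alpha}: BW_n0 = {bw0:.3f} f_bit, BW_n2 = {bw2:.3f} f_bit")
```

With $\alpha = 0$ the results reduce to $1.11 f_{3dB}$ and $1.49 f_{3dB}$ for Q = 1/√2. For $\alpha > 0$, $|H_{FFE}(f)|^2 \ge (1-\alpha)^2$ at every frequency, so both noise bandwidths grow, which is the FFE counterpart of the CTLE's bandwidth extension.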

The noise bandwidths from (3.23) are used to plot the input-referred noise current as a function of the overall front-end's bandwidth-to-data rate ratio. The effective gain ($Z_{e,FFE}$) is calculated from the VEO at the input of the latch by (3.9). The results are shown in Fig. 3.17, where the low-bandwidth TIA is designed to have Q = 0.5. For a given bandwidth, increasing $\alpha$ from 0 (no equalization) to 0.35 improves the VEO and hence the noise performance of the front-end. A further increase in $\alpha$ causes the output pulse to be over-equalized and reduces its main cursor, as described above. This is shown in Fig. 3.17, where the case of $\alpha = 0.6$ shows worse noise performance than $\alpha = 0.35$. For a given $\alpha$, the input noise improves as the bandwidth decreases, until further reduction in bandwidth results in significant pre-cursor and residual 1<sup>st</sup> post-cursor ISI. At this point, the effective gain starts to drop, causing the input noise to increase.
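The cursor-level effect of the 1-tap FFE in (3.22) can be reproduced for the Q = 0.5 TIA of Fig. 3.16 ($f_{3dB} = 0.2 f_{bit}$). The sketch below assumes the FFE output is sampled at the peak of the TIA pulse ($T_1$), uses a peak-distortion eye opening as a simple stand-in for the VEO of (3.9), and computes the $\alpha$ that nulls the 1<sup>st</sup> post-cursor; all function names are hypothetical.

```python
import numpy as np

def step_q05(t, f_n):
    """Unit step response of a second-order low-pass with Q = 0.5 (double real pole at f_n)."""
    t = np.asarray(t, dtype=float)
    w = 2 * np.pi * f_n
    tt = np.clip(t, 0.0, None)
    return np.where(t > 0, 1 - np.exp(-w * tt) * (1 + w * tt), 0.0)

def pulse(t, f_n, t_b=1.0):
    return step_q05(t, f_n) - step_q05(t - t_b, f_n)

def ffe_cursors(f_n, alpha, t_b=1.0, n_pre=2, n_post=20):
    """Cursors of y(t) = p(t) - alpha * p(t - T_b), sampled at the TIA pulse peak T_1."""
    t = np.linspace(0.0, 20 * t_b, 200_001)
    t1 = t[np.argmax(pulse(t, f_n, t_b))]
    n = np.arange(-n_pre, n_post + 1)
    ts = t1 + n * t_b
    h = pulse(ts, f_n, t_b)                           # TIA cursors
    return n, h - alpha * pulse(ts - t_b, f_n, t_b), h

def worst_case_eye(n, y):
    """Main cursor minus the sum of all |ISI| cursors."""
    return y[n == 0][0] - np.abs(y[n != 0]).sum()

f_n = 0.2 / np.sqrt(np.sqrt(2) - 1)                   # f_3dB = 0.2 f_bit for Q = 0.5
n, _, h = ffe_cursors(f_n, 0.0)
alpha_full = h[n == 1][0] / h[n == 0][0]              # nulls the 1st post-cursor
for alpha in (0.0, 0.35, 0.6, alpha_full):
    n, y, _ = ffe_cursors(f_n, alpha)
    print(f"alpha = {alpha:.2f}: main = {y[n == 0][0]:.3f}, 1st post = {y[n == 1][0]:+.4f}, "
          f"2nd post = {y[n == 2][0]:+.4f}, eye = {worst_case_eye(n, y):.3f}")
```

With $\alpha$ chosen to null the 1<sup>st</sup> post-cursor, every later cursor becomes negative, which is the over-equalization of the tail described for Fig. 3.16.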

If Q is increased above 0.5, the TIA's pulse response exhibits an overshoot that causes the post-cursor ISI to take both positive and negative values. This leads to a sign mismatch between the tails of the TIA's pulse and the delayed pulse. It also increases the pre-cursor ISI, which the FFE cannot remove. Simulations show that increasing Q above 0.5 is not beneficial from the noise point of view. The noise performance of the FFE-based receiver is also examined for the case where the TIA has two real poles. To do so, the TIA is designed to have a dominant input pole with a nondominant-to-dominant pole ratio (*PR*) of

$$PR = \frac{\omega_{non,d}}{\omega_d} = \frac{1 + \sqrt{1 - 4Q^2}}{1 - \sqrt{1 - 4Q^2}} = 1, 3, 5 \text{ or } 9.$$
(3.26)

This is equivalent to Q = 0.5, 0.433, 0.373, and 0.3, respectively. For each value of *PR*, the input-referred noise is plotted as a function of the bandwidth-to-data rate ratio for $\alpha$ ranging from 0 to 1. The curve with the deepest noise minimum for each *PR* value is shown in Fig. 3.18. The case of (*PR* = 1 and $\alpha = 0.35$) is the same as ($Q = 0.5$ and $\alpha = 0.35$) in Fig. 3.17. It can be observed that increasing the pole separation (*PR*) while selecting the proper amount of equalization ($\alpha$) improves the noise performance of the FFE-based receiver. This explains why all reported FFE-based receivers in the literature employ a first-order TIA. Wider pole separation (*PR* > 9) does not provide further improvement in the noise performance.

Fig. 3.16. Pulse response at different points in the front-end shown in Fig. 3.15. The low-BW TIA has a pole Q of 0.5, $R_F = 9.7\ \text{k}\Omega$, and $f_{3dB} = 0.2 f_{bit}$.

Fig. 3.17. Input-referred noise current of the FFE-based front-end shown in Fig. 3.15. The horizontal axis represents the 3dB bandwidth of the overall front-end. The TIA's bandwidth is changed according to the procedure and values in Table 3.2 while its pole Q is kept constant at 0.5.

Fig. 3.18. The noise performance of the FFE-based receiver in the case where the TIA has two real poles. The horizontal axis represents the 3dB bandwidth of the overall front-end. The TIA's bandwidth is changed according to the procedure and values in Table 3.2 while its pole Q is shown in the legend.
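Equation (3.26) can be inverted in closed form: with $r = \sqrt{1-4Q^2}$, $PR = (1+r)/(1-r)$ gives $r = (PR-1)/(PR+1)$, hence $4Q^2 = 1 - r^2 = 4PR/(1+PR)^2$ and $Q = \sqrt{PR}/(1+PR)$. The short sketch below (function names are illustrative) reproduces the Q values quoted for PR = 1, 3, 5, and 9.

```python
import math

def q_from_pole_ratio(pr: float) -> float:
    """Invert (3.26): Q = sqrt(PR) / (1 + PR) for two real poles with ratio PR >= 1."""
    return math.sqrt(pr) / (1.0 + pr)

def pole_ratio_from_q(q: float) -> float:
    """Evaluate (3.26) for 0 < Q <= 0.5 (real poles)."""
    r = math.sqrt(1.0 - 4.0 * q**2)
    return (1.0 + r) / (1.0 - r)

for pr in (1, 3, 5, 9):
    print(f"PR = {pr}: Q = {q_from_pole_ratio(pr):.3f}")
```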

## 3.5 Comparison and Discussion

### 3.5.1 Noise Bandwidths

Fig. 3.19 shows the best-case noise performance for each described receiver architecture. For each receiver, the design variables, bandwidth, and input-referred noise current at the minimum noise point are listed in Table 3.4. For the noise-optimized full-bandwidth TIA, the pole *Q* is set to 0.707 [16]. Two remarkable conclusions can be drawn from Fig. 3.19. First, all equalization techniques have a limited capability in improving the noise performance of the receiver. Each curve in Fig. 3.19 decreases as the front-end bandwidth is reduced, reaching a minimum point. As the bandwidth is further reduced, the noise starts to increase due to the residual ISI outpacing improvements in pulse height. Second, the DFE-based receiver shows the deepest minimum noise, and this minimum occurs at a lower bandwidth than the other minima. The reason is that the DFE-based receiver has the smallest noise bandwidth, since it is set by the TIA's bandwidth, whereas in the CTLE- and FFE-based receivers the noise bandwidth is determined by the 3-dB bandwidth of the *overall* front-end which, due to the equalizer, is larger than the TIA's bandwidth.

### 3.5.2 Simulation at Higher $f_{bit}$ and $C_D$

As described in Table 3.2, the proposed noise optimization model starts with three initial inputs ($f_T$, $f_{bit}$, and $C_D$) and provides the optimum pole locations and optimum $f_{3dB}/f_{bit}$ depending on the front-end architecture. So far, all simulations have been performed for constant values of $f_T$, $f_{bit}$, and $C_D$, as shown in Table 3.2. The proposed noise analysis provides a general methodology for calculating the receiver sensitivity more accurately by considering the gain reduction due to the ISI and reduced settling time. That is, in equalizer-based receivers, the input-referral gain needs to be calculated from the pulse response regardless of the exact values of $f_T$, $f_{bit}$, and $C_D$. However, to investigate the effect of changing these initial values on the optimum design points listed in Table 3.4, all simulations are repeated for two scenarios. First, $f_{bit}$ is increased to 30 Gb/s while keeping the values of $f_T$ and $C_D$ constant at 150 GHz and 80 fF, respectively.



Fig. 3.19. The best-case noise performance for each receiver architecture. The horizontal axis represents the bandwidth of the overall front-end. The TIA's pole Q and equalizer design for each curve are listed in Table 3.4.

| Architecture | TIA $Q$ | $R_A$ (k$\Omega$) | $R_F$ (k$\Omega$) | TIA $f_{3dB}/f_{bit}$ | FE $f_{3dB}/f_{bit}$ | $i_{n,rms}$ ($\mu A_{rms}$) |
|--------------|---------|-------------------|-------------------|-----------------------|----------------------|-----------------------------|
| FBW-TIA | 0.707 | 0.26 | 5.45 | 0.41 | NA | 0.293 |
| TIA + 2-tap DFE | 0.577 | 0.375 | 18.3 | 0.18 | 0.18 | 0.192 |
| TIA + IIR DFE | 0.707 | 0.5 | 21.2 | 0.21 | 0.21 | 0.183 |
| TIA + CTLE <sup>(1)</sup> | 0.707 | 0.3 | 7.3 | 0.35 | 0.49 | 0.246 |
| TIA + FFE <sup>(2)</sup> | 0.3 | 0.25 | 32.3 | 0.06 | 0.32 | 0.24 |

Table 3.4: Optimum design point for different receivers

<sup>(1)</sup>CTLE design according to the second row in Table 3.3.

<sup>(2)</sup> PR = 9 and  $\alpha = 0.7$ 

In the second scenario, $f_T$ and $f_{bit}$ are kept constant at 150 GHz and 10 Gb/s, respectively, while $C_D$ is increased to 200 fF. Increasing $C_D$ has the effect of increasing $C_I$, $C_T$, $g_m$, and $C_L$ by the same factor, as indicated in Table 3.2. The best-case noise performance of each receiver architecture in the increased-$f_{bit}$ and increased-$C_D$ scenarios is shown in Fig. 3.20 and Fig. 3.21, respectively. Comparing these two figures with Fig. 3.19 and Table 3.4 shows that changing the initial values of $f_{bit}$ and $C_D$ only scales the vertical axes and does not change any conclusion about the optimum pole Q or the optimum $f_{3dB}/f_{bit}$.



Fig. 3.20. The best-case noise performance for each receiver architecture at $f_{bit} = 30$ Gb/s. The horizontal axis represents the ratio of the overall front-end bandwidth to the data rate. The optimum value of the TIA's pole Q is found to be 0.707 for the IIR-DFE and CTLE-based front-ends (the CTLE is designed as in the second row of Table 3.3), whereas the optimum values of the TIA's pole Q for the 2-tap FIR-DFE and FFE-based front-ends are found to be 0.577 and 0.3 (with $\alpha = 0.7$), respectively.



Fig. 3.21. The best-case noise performance of each receiver architecture at  $C_D = 200$ fF. The horizontal axis is the ratio of the overall front-end bandwidth to the data rate. The optimum TIA pole Q is 0.707 for the IIR-DFE and CTLE-based front-ends (the CTLE is designed as in the second row of Table 3.3), whereas the optimum TIA pole Q values of the 2-tap FIR-DFE and FFE-based front-ends are 0.577 and 0.3 (with  $\alpha = 0.7$ ), respectively.

### **3.6 Conclusions**

This chapter presented general guidelines for noise optimization in equalizer-based optical receivers. The proposed optimization model allows designers to compare the noise performance of different receiver architectures for a given technology, photodiode capacitance, and data rate. Key modifications are introduced to correctly calculate the input-referral gain and noise bandwidths. The proposed notion of the effective gain accounts for the gain reduction due to the introduced ISI and insufficient settling time in narrow-bandwidth front-ends. The proposed calculation of the noise bandwidths considers how the TIA's noise is processed by the subsequent equalizer. Based on this model, the integrated input-referred noise is derived and compared for front-ends using DFEs, CTLEs, and FFEs. In each case, the TIA's pole Q is chosen to optimize the noise performance for the given receiver architecture. It has been shown that DFEs enable the lowest input-referred noise. The optimum design point of all receivers is summarized in Table 3.4. Simulations showed that the conclusions about the optimum Q and the optimum  $f_{3dB}/f_{bit}$  are robust to changes in the data rate and photodiode capacitance for all receiver architectures.

This work is published in the IEEE Transactions on Circuits and Systems I: Regular Papers [28].

## Chapter 4

# Optimization of the Power-Sensitivity Trade-off in CMOS Receivers for Energy-Efficient Short-Reach Optical Links

## 4.1 Introduction

Fig. 1.2 in the first chapter of this thesis shows the system-level diagram of a vertical-cavity surface-emitting laser (VCSEL)-based multi-mode fiber (MMF) optical link typically used for short-reach (up to a few hundred meters) communication. The link operation is explained in Chapter 1. In short-reach photonic links, the transmitted optical modulation amplitude (OMA) must be large enough that, despite coupling and fiber losses, the received optical power exceeds the receiver's sensitivity. Better sensitivity reduces transmitter power dissipation. However, improving the sensitivity can incur significant power overhead in the receiver. Therefore, the power-sensitivity trade-off in optical receivers needs to be optimized to minimize the link's total power dissipation.

Sensitivity is a function of both the input-referred noise current of the receiver's analog front-end (TIA/MA) and the voltage amplitude requirements of the CDR driven by the front-end [8]. The input-referred noise of optical receivers with a FET front-end is usually minimized by choosing the receiver's input capacitance ( $C_I$ ) equal to the total parasitic capacitance from the PD, pad, and wiring ( $C_D$ ) [16]. The receiver's power dissipation is proportional to its transistor size and hence to its input capacitance. Therefore, maintaining the capacitive matching rule for high values of  $C_D$  leads to a significant power overhead in the receiver for only a marginal improvement in input-referred noise. The increased total input capacitance ( $C_T = C_D + C_I$ ) also restricts the TIA's maximum achievable gain for a targeted bandwidth [7]. This in turn necessitates cascading more MA stages to mitigate the power penalty incurred by the swing requirements of the CDR, further increasing power dissipation.

In this chapter, we show that energy-efficient links require low-power receivers with an input capacitance much smaller than that required for noise-optimum performance. The TIA's transistor sizes not only set the power dissipation and sensitivity of the receiver but also set the required transmitted optical power. Thus, transmitter power dissipation must be accounted for accurately when considering a noisier yet lower-power receiver. Co-optimization of the transmitter and the receiver is essential to achieve optimum energy efficiency for the overall link.

The main challenges of transceiver co-optimization are discussed intuitively in [29]. In [12], [30], [31], co-optimization is performed on actual links by changing supply voltages and/or bias currents to achieve the best link energy efficiency at a given data rate and bit-error rate (BER). In [32], the trade-offs that limit receiver sensitivity are analyzed, and the energy efficiency of the link is then calculated using state-of-the-art photonic devices and laser drivers. The end-to-end link modeling in [33] optimizes receiver sensitivity and power by studying their dependence on front-end design as well as on follow-on digital sampler requirements.

The on-bench optimization in [12], [30], [31] is the most accurate methodology. However, the input capacitance cannot be adjusted after fabrication. The equation-based approaches in [32], [33] rely on idealized approximations and assumptions, which introduce modeling inaccuracies. In this thesis, a simulation-based design flow for optimizing the energy efficiency of short-reach photonic links is presented. The design framework, based on extracted parameters, selects the optimum FET size, number of MA stages, and transmitted OMA for minimum link power dissipation. It considers both frequency- and time-domain representations to accurately model the impact of design parameters on signal integrity. Transistor-level Spectre simulations confirm the accuracy of the framework.

The rest of this chapter is organized as follows. Section 4.2 discusses receiver modeling and revisits the analysis of the inverter-based TIA. Section 4.3 investigates the power-sensitivity trade-off for various receiver architectures, showing that maintaining the capacitive matching rule leads to increased power dissipation for only a marginal improvement in sensitivity. Section 4.4 models the transmitter side of the optical link and discusses the link budget. The optimization procedure is presented in Section 4.5 and then used to study how small (and hence how noisy) the receiver should become to minimize the link's total power dissipation. Section 4.6 discusses the impact of improvements in photonic, interconnect, and CMOS technology on link performance. Finally, Section 4.7 concludes the work.

## 4.2 Optical Receiver Modelling

#### 4.2.1 Transimpedance Amplifier

The inverter-based TIA (Inv-TIA) in Fig. 4.1 (a) is chosen for this work because of its superior noise performance and its moderate power dissipation, which results from current reuse between the PMOS and NMOS transistors. Further, unlike the common-gate (CG) TIA, the Inv-TIA is a self-biased topology that decouples the gain element from the transconductance of the input transistor and allows optimization without being limited by DC bias constraints. The Inv-TIA is extensively used in recent research, either as a wideband pre-amplifier followed by a multi-stage MA [10], [12], [34] or as a limited-bandwidth pre-amplifier followed by an equalizer [20], [23], [25], [35].

#### 4.2.2 Small-Signal Model

The small-signal model of the Inv-TIA is depicted in Fig. 4.1 (b). The CMOS inverter is modeled by its total transconductance  $g_m$  and equivalent output resistance  $r_{ds}$ .  $C_D$  includes the photodiode, wiring, and pad capacitance.  $C_{gs}$  is the total gate-to-source capacitance, and  $C_f$  is the total gate-to-drain capacitance  $C_{gd}$ . The capacitance  $C_o$  includes the total drain-to-bulk capacitance  $C_{db}$  and the loading capacitance of the subsequent stage  $C_{next}$ . The open-loop transfer function of the voltage amplifier can therefore be written as  $A(s) = A_0/(1 + sT_A)$ , where  $A_0 = g_m r_{ds}$  is the low-frequency voltage gain of the core amplifier and  $T_A = r_{ds}C_o$  is the time constant at the output node. For a particular technology,  $A_0$  is constant for a given supply voltage and  $W_p$ -to- $W_n$  ratio. Considering this model, the Inv-TIA exhibits a second-order transfer function given by

$$Z_{TIA}(s) = \frac{\left(R_{F,TIA}C_{f}s + 1 - g_{m}R_{F,TIA}\right)r_{ds}}{D_{1}s^{2} + D_{2}s + A_{0} + 1}$$
(4.1.a)

where

$$D_{1} = R_{F,TIA} r_{ds} (C_{f} C_{o} + C_{i} C_{o} + C_{i} C_{f})$$
(4.1.b)

$$D_2 = R_{F,TIA} \left( (1 + A_0)C_f + C_i \right) + r_{ds}(C_o + C_i)$$
(4.1.c)

where  $C_i = C_D + C_{gs}$ . Therefore, the low-frequency transimpedance gain is given by

$$Z_{TIA,0} = \frac{-(g_m R_{F,TIA} - 1)r_{ds}}{A_0 + 1}$$
(4.2)

Comparing the denominator of (4.1.a) with the standard transfer function of a second-order system gives the natural frequency  $\omega_n = \sqrt{(A_0 + 1)/D_1}$  and the pole quality factor  $Q = \sqrt{(A_0 + 1)D_1}/D_2$ . The TIA's 3 dB bandwidth ( $f_{TIA}$ ) is calculated as  $f_{TIA} = \rho(Q)\omega_n/2\pi$ , where  $\rho$ , given in (3.6), is a function of the pole quality factor that converts the natural frequency to the corresponding 3 dB bandwidth based on the shape of the TIA's amplitude response [16].
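As a minimal sketch, the natural frequency, pole Q, 3 dB bandwidth, and low-frequency gain can be evaluated directly from (4.1) and (4.2). The sketch assumes that  $\rho(Q)$  in (3.6) takes the standard form for an all-pole second-order response,  $\rho = \sqrt{1 - 1/(2Q^2) + \sqrt{(1 - 1/(2Q^2))^2 + 1}}$ , and it ignores the zero in the numerator of (4.1.a). The function names are illustrative only.

```python
import math


def rho(q: float) -> float:
    """f_3dB / f_n of an all-pole second-order low-pass (assumed form of (3.6))."""
    a = 1.0 - 1.0 / (2.0 * q * q)
    return math.sqrt(a + math.sqrt(a * a + 1.0))


def inv_tia_response(r_f, g_m, r_ds, c_i, c_f, c_o):
    """Return (omega_n [rad/s], Q, f_TIA [Hz], Z_TIA0 [ohm]) from (4.1) and (4.2).

    c_i = C_D + C_gs is the total capacitance at the TIA input node.
    """
    a0 = g_m * r_ds                                             # core-amplifier DC gain A_0
    d1 = r_f * r_ds * (c_f * c_o + c_i * c_o + c_i * c_f)       # (4.1.b)
    d2 = r_f * ((1.0 + a0) * c_f + c_i) + r_ds * (c_o + c_i)    # (4.1.c)
    # Match D1 s^2 + D2 s + (A0 + 1) to D1 (s^2 + (wn/Q) s + wn^2)
    omega_n = math.sqrt((a0 + 1.0) / d1)
    q = math.sqrt((a0 + 1.0) * d1) / d2
    f_tia = rho(q) * omega_n / (2.0 * math.pi)
    z0 = -(g_m * r_f - 1.0) * r_ds / (a0 + 1.0)                  # (4.2)
    return omega_n, q, f_tia, z0
```

The zero at  $(g_m R_{F,TIA} - 1)/(R_{F,TIA}C_f)$  lies well above the poles for practical sizes, which is why the sketch uses  $\rho(Q)$  alone; a frequency sweep of the full (4.1.a) would be the more careful check.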



Fig. 4.1. Inv-TIA (a) circuitry, (b) small-signal model with noise sources.

Due to the pole-splitting effect introduced by  $C_f$ , the TIA's effective input and output capacitances differ from  $C_{gs}$  and  $C_o$ . They are calculated as  $C_I = C_{gs} + (1 + A_0)C_f$  and  $C_L = [C_iC_o + (C_i+C_o)C_f]/[C_i + (1 + A_0)C_f]$ , respectively. Hence, owing to the well-known Miller effect, the input capacitance  $C_I$  is much larger than the gate-to-source capacitance, whereas  $C_L$  is smaller than  $C_o$ . Ignoring  $C_f$  oversimplifies the model and may lead to inaccurate results [11].

Although the model includes many variables, the parasitic capacitances  $C_{gs}$ ,  $C_{db}$ , and  $C_{gd}$ , the transconductance  $g_m$ , and the output conductance  $r_{ds}^{-1}$  are all proportional to the transistor width (*W*). Therefore, the TIA's design space is defined by only three variables:  $R_{F,TIA}$ ,  $C_D$ , and *W*. The number of variables can be further reduced by fixing  $C_D$  at 200 fF. The effect of changing  $C_D$  is studied in Section 3.5.

The parameters of a CMOS inverter with  $C_{next} = C_I$  are extracted through simulation using Cadence Spectre and listed in Table 4.1. The circuit is simulated in 65 nm CMOS with a 1 V supply and biased at  $V_{IN} = V_{OUT} = 0.44$  V. The bias point is slightly below  $V_{DD}/2$  because the PMOS and NMOS transistors have equal widths ( $W_p = W_n = 1 \ \mu m \times NF$ ), where NF is the number of fingers. Equal sizing maximizes the total transconductance for a given total width ( $W = W_p + W_n$ ) [36]. Using NF as a proxy for the parasitic capacitances, transconductance, and output resistance allows the TIA's bandwidth, sensitivity, and power dissipation to be calculated.
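A minimal sketch of this scaling, assuming the per-finger values of Table 4.1 and interpreting the  $r_{ds}$  entry as 4.31 kΩ·µm (so that  $r_{ds} = 4.31\ \text{k}\Omega/NF$ , which makes  $g_m r_{ds}$  consistent with the listed  $A_0$ ). The helper names `inverter_params` and `nf_for_ratio` are hypothetical; the sketch maps NF to  $C_I$ ,  $C_L$ , and DC power, and inverts the Miller expression for  $C_I$  to find the NF that gives a target  $C_I/C_D$ .

```python
from typing import Optional

# Per-finger values from Table 4.1 (W_p = W_n = 1 um per finger).
GM_PER_NF = 1.45e-3        # S
CGS_PER_NF = 1.39e-15      # F
CGD_PER_NF = 0.37e-15      # F
CDB_PER_NF = 0.45e-15      # F
RDS_TIMES_NF = 4.31e3      # ohm; assumed kOhm*um, so r_ds = 4.31 kOhm / NF
P_PER_NF = 0.098e-3        # W, P_DC,1um
A0 = GM_PER_NF * RDS_TIMES_NF   # about 6.25, close to the 6.23 V/V listed in Table 4.1


def inverter_params(nf: float, c_d: float = 200e-15, c_next: Optional[float] = None) -> dict:
    """Scale Table 4.1 to NF fingers; c_next=None means replica loading (C_next = C_I)."""
    c_gs, c_f = CGS_PER_NF * nf, CGD_PER_NF * nf
    c_in = c_gs + (1.0 + A0) * c_f                        # Miller-multiplied input capacitance C_I
    c_o = CDB_PER_NF * nf + (c_in if c_next is None else c_next)
    c_i = c_d + c_gs                                      # total capacitance at the input node
    c_l = (c_i * c_o + (c_i + c_o) * c_f) / (c_i + (1.0 + A0) * c_f)   # effective C_L
    return {"g_m": GM_PER_NF * nf, "r_ds": RDS_TIMES_NF / nf, "C_gs": c_gs, "C_f": c_f,
            "C_o": c_o, "C_i": c_i, "C_I": c_in, "C_L": c_l, "P_DC": P_PER_NF * nf}


def nf_for_ratio(ratio: float, c_d: float = 200e-15) -> float:
    """Number of fingers that gives C_I / C_D = ratio."""
    return ratio * c_d / (CGS_PER_NF + (1.0 + A0) * CGD_PER_NF)
```

For  $C_I/C_D = 1$  this gives about 49 fingers and a TIA power of roughly 4.8 mW, in line with the noise-optimum TIA-only entry in Table 4.2.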

#### 4.2.3 Bandwidth and Transimpedance Gain

In Fig. 4.2 (a),  $R_{F,TIA}$  is swept for three different values of NF to calculate the TIA's 3 dB bandwidth ( $f_{TIA}$ ). For each NF value, the corresponding parameters are calculated from Table 4.1 and then used with  $R_{F,TIA}$  to calculate the bandwidth using (4.1). Points with amplitude peaking (Q > 0.707) are indicated by hollow markers. For a given NF, the bandwidth decreases as  $R_{F,TIA}$  increases because of the direct trade-off between bandwidth and gain. For a targeted bandwidth,  $R_{F,TIA}$  must be reduced when NF is either too large or too small, indicating that there is an optimum NF that maximizes the gain for a fixed  $f_{TIA}$ . For example, in Fig. 4.2 (b), the required  $R_{F,TIA}$  and the resulting pole Q are plotted as a function of  $C_I/C_D$  for  $f_{TIA} = 8$ GHz. For a very small TIA ( $C_I \ll C_D$ ), the effective output capacitance  $C_L$  is much smaller than  $C_D$ , while the total input capacitance  $C_T$  is dominated by the parasitic capacitance  $C_D$ . This gives the Inv-TIA two real poles (*i.e.*, Q < 0.5), with the input pole at the lower frequency. As the transistor width increases,  $C_L$  increases while  $C_T$  is still dominated by  $C_D$ . As a result, the TIA exhibits an underdamped response with Q > 0.5. The increased Q allows the TIA to employ a higher  $R_{F,TIA}$  for a fixed  $f_{TIA}$ . As the width continues to increase, the self-loading from  $C_f$  forces the pole Q to drop, which necessitates reducing  $R_{F,TIA}$  to maintain the targeted bandwidth [11]. The gain from (4.2) is also plotted in Fig. 4.2 (b); it follows the shape of  $R_{F,TIA}$  and reaches a maximum of 384  $\Omega$  at  $C_I/C_D = 0.48$ , compared with 330  $\Omega$  at  $C_I/C_D = 1$ .
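The sweep behind Fig. 4.2 (b) can be sketched as follows, under the same assumptions as before (Table 4.1 per-finger values with  $r_{ds} = 4.31\ \text{k}\Omega/NF$ , replica loading, the all-pole  $\rho(Q)$ , and  $C_D = 200$ fF). For each NF, the hypothetical helper `max_gain_rf` returns the largest  $R_{F,TIA}$  that still meets the bandwidth target, since the gain in (4.2) rises with  $R_{F,TIA}$ . A log-spaced scan followed by bisection is used because  $f_{TIA}$  is not monotonic in  $R_{F,TIA}$  at very small resistances.

```python
import math

# Per-finger values from Table 4.1 (assumed: r_ds = 4.31 kOhm / NF).
GM, CGS, CGD, CDB, RDS = 1.45e-3, 1.39e-15, 0.37e-15, 0.45e-15, 4.31e3
A0 = GM * RDS            # g_m * r_ds, independent of NF
C_D = 200e-15            # photodiode + pad + wiring capacitance


def tia_at(nf, r_f):
    """(f_TIA [Hz], Q, Z_TIA0 [ohm]) of a replica-loaded Inv-TIA with NF fingers, via (4.1)-(4.2)."""
    g_m, r_ds = GM * nf, RDS / nf
    c_gs, c_f = CGS * nf, CGD * nf
    c_i = C_D + c_gs
    c_o = CDB * nf + c_gs + (1 + A0) * c_f          # C_db + C_next, with C_next = C_I
    d1 = r_f * r_ds * (c_f * c_o + c_i * c_o + c_i * c_f)
    d2 = r_f * ((1 + A0) * c_f + c_i) + r_ds * (c_o + c_i)
    omega_n = math.sqrt((1 + A0) / d1)
    q = math.sqrt((1 + A0) * d1) / d2
    a = 1 - 1 / (2 * q * q)
    rho = math.sqrt(a + math.sqrt(a * a + 1))       # assumed all-pole form of rho(Q)
    z0 = -(g_m * r_f - 1) * r_ds / (1 + A0)
    return rho * omega_n / (2 * math.pi), q, z0


def max_gain_rf(nf, f_target, r_lo=1.0, r_hi=20e3, points=400):
    """Largest R_F,TIA with f_TIA >= f_target; None if the target is unreachable."""
    grid = [r_lo * (r_hi / r_lo) ** (k / (points - 1)) for k in range(points)]
    passing = [r for r in grid if tia_at(nf, r)[0] >= f_target]
    if not passing:
        return None
    lo = passing[-1]                                 # every grid point above lo fails
    above = [r for r in grid if r > lo]
    if not above:
        return lo
    hi = above[0]
    for _ in range(60):                              # bisect the last crossing
        mid = 0.5 * (lo + hi)
        if tia_at(nf, mid)[0] >= f_target:
            lo = mid
        else:
            hi = mid
    return lo


def nf_for_ratio(ratio):
    """NF giving C_I / C_D = ratio, with C_I = C_gs + (1 + A0) C_gd."""
    return ratio * C_D / (CGS + (1 + A0) * CGD)


if __name__ == "__main__":
    for ratio in (0.25, 0.48, 1.0, 2.0):
        nf = nf_for_ratio(ratio)
        r_f = max_gain_rf(nf, 8e9)
        if r_f is None:
            print(f"C_I/C_D = {ratio:4.2f}: 8 GHz not reachable")
            continue
        _, q, z0 = tia_at(nf, r_f)
        print(f"C_I/C_D = {ratio:4.2f}  R_F = {r_f:6.1f} ohm  Q = {q:4.2f}  |Z0| = {abs(z0):5.1f} ohm")
```

This is an illustration of the procedure rather than a reproduction of the Spectre-based results; exact values depend on the extracted parameters.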

| Parameters that scale with NF ( $r_{ds}$  scales as 1/NF) | $g_m$ | $C_{gs}$ | $C_{gd}$ | $C_{db}$ | $r_{ds}$ | $P_{DC,1\mu m}$ |
|---|---|---|---|---|---|---|
| Value | 1.45 | 1.39 | 0.37 | 0.45 | 4.31 | 0.098 |
| Unit | $m\Omega^{-1}/\mu m$ | fF/µm | fF/µm | fF/µm | kΩ·µm | mW/µm |
| **Parameters that depend on the biasing but not on NF** | $A_0$ | $f_T$ | | | | |
| Value | 6.23 V/V | 57.3 GHz | | | | |

Table 4.1: Extracted parameters of a replica-loaded CMOS inverter with  $W_p = W_n = 1 \ \mu m \times NF$ , simulated in 1 V, 65 nm CMOS technology.

 $A_0$ : Low-frequency voltage gain of the core amplifier

 $f_T$ : Transit frequency at the biasing point

 $P_{DC,1\mu m}$ : DC power dissipation of an inverter with  $W_p = W_n = 1 \ \mu m$ .



Fig. 4.2. (a) Inv-TIA bandwidth as a function of  $R_F$  for different total transistor widths W. (b) The required  $R_F$  and the resulting gain and pole Q as a function of  $C_I/C_D$  for a targeted bandwidth of 8 GHz.

#### 4.2.4 Input-Referred Noise Current

As explained in the previous chapter, the main noise contributors in the Inv-TIA are the thermal noise of the transistors and feedback resistor, depicted in Fig. 4.1 (b) as  $I_{n,ch}^2$  and  $I_{n,RF}^2$ , respectively. The total integrated input-referred noise power  $i_n^2$  is determined by [16]

$$i_n^2 = \left(\frac{4kT}{R_F} + \frac{4kT\gamma}{g_m R_F^2}\right) BW_{n0} + \left(\frac{4kT\gamma(2\pi C_T)^2}{3g_m}\right) BW_{n2}^3$$
(4.3)

where k is the Boltzmann constant, T is the absolute temperature in kelvin, and  $\gamma$  is the excess noise factor.  $BW_{n0} = \pi Q f_{TIA}/2\rho$  and  $BW_{n2}^3 = 3\pi Q f_{TIA}^3/2\rho^3$  are the noise bandwidths for white and colored noise, respectively [16]. The root-mean-square input-referred noise current  $i_{n,rms}$  is the square root of (4.3). Fig. 4.3 (a) shows  $i_{n,rms}$  as a function of  $C_I/C_D$  for a TIA bandwidth of 8 GHz. Setting  $\gamma = 0.75$  achieves the best match between model-generated and circuit-simulated noise. The noise current reaches a minimum value of 0.91  $\mu$A<sub>rms</sub> at  $R_{F,TIA} = 397 \Omega$  and  $C_I/C_D = 1$ , in good agreement with the capacitive matching rule. However, simulation results show that the noise-optimum size depends on the 3 dB bandwidth. For example, at  $f_{TIA} = 12.5$  GHz, the noise-optimum size is  $C_I = 1.25C_D$ .
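A minimal sketch of (4.3), assuming T = 300 K and the fitted  $\gamma = 0.75$ ; the function name is hypothetical, and Q,  $\rho$ , and  $f_{TIA}$  are taken as inputs (for instance from the second-order analysis of Section 4.2.2).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def input_referred_noise_rms(r_f, g_m, c_t, q, rho, f_tia, gamma=0.75, temp=300.0):
    """RMS input-referred noise current of the Inv-TIA from (4.3), in A_rms."""
    bw_n0 = math.pi * q * f_tia / (2 * rho)                       # white-noise bandwidth
    bw_n2_cubed = 3 * math.pi * q * f_tia ** 3 / (2 * rho ** 3)   # colored-noise bandwidth, cubed
    four_kt = 4 * K_B * temp
    white = (four_kt / r_f + four_kt * gamma / (g_m * r_f ** 2)) * bw_n0   # R_F and channel noise
    colored = four_kt * gamma * (2 * math.pi * c_t) ** 2 / (3 * g_m) * bw_n2_cubed  # f^2 channel noise
    return math.sqrt(white + colored)
```

The colored term grows with  $C_T^2/g_m$ , which is the origin of the capacitive matching trade-off: widening the transistor raises  $g_m$  but also raises  $C_T$ .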

The capacitive matching rule in [16] is derived under the assumptions of constant  $R_{F,TIA}$  and constant pole Q, where Q can be approximated as  $\sqrt{A_0R_{F,TIA}C_TT_A}/(R_{F,TIA}C_T + T_A)$ . When the TIA is sized up, a large  $R_{F,TIA}$  makes  $R_{F,TIA}C_T \gg T_A$ . Therefore, maintaining a constant Q requires both  $A_0$  and  $T_A$  to increase. In practice, this is not feasible, since the voltage gain of a single-stage CMOS inverter is constant for a given bias and its maximum value is limited by the technology node. In this work, when the TIA is sized up,  $R_{F,TIA}$  is chosen to satisfy the required bandwidth under a constant- $A_0$  constraint. This makes both the resulting Q and the noise-optimum size depend on the bandwidth.
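Writing out the limit used in this argument makes the constraint explicit. Starting from the approximation for Q above and taking  $R_{F,TIA}C_T \gg T_A$ :

$$Q \approx \frac{\sqrt{A_0 R_{F,TIA} C_T T_A}}{R_{F,TIA} C_T + T_A} \approx \sqrt{\frac{A_0 T_A}{R_{F,TIA} C_T}}$$

Hence, as sizing up increases  $R_{F,TIA}C_T$ , a constant Q can only be held if the product  $A_0 T_A$  grows in proportion. For example, doubling  $R_{F,TIA}C_T$  at a fixed  $A_0 T_A$  lowers Q by a factor of  $\sqrt{2}$ .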



Fig. 4.3. (a) TIA's input-referred noise current as a function of  $C_I/C_D$  for a fixed 3 dB bandwidth of 8 GHz. (b) Receiver sensitivity as a function of  $C_I/C_D$  for an FE that includes only a TIA.  $f_{TIA}$  and  $V_S^{PP}$  are fixed at 8 GHz and 50  $mV_{pp}$ , respectively. The bold markers indicate the locations of maximum gain (MG), minimum noise (MN), and best overall sensitivity (BS).

## 4.3 Receiver Sensitivity-Power Trade-Off

#### 4.3.1 Power Penalty due to the Swing Requirements of the CDR

As explained in Chapter 2, a noise-limited input signal produces a peak-to-peak voltage  $V_0^{PP}$  at the output of the receiver's analog front-end (FE), given by  $V_0^{PP} = SNR \cdot i_{n,rms} Z_{FE,0}$ , where SNR is the required signal-to-noise ratio for a given bit-error rate (BER); it equals 14.07 for a BER of  $10^{-12}$ . Here,  $i_{n,rms}$  and  $Z_{FE,0}$  are the input-referred noise current and the mid-band gain of the overall FE, respectively.  $V_0^{PP}$  is sufficient to drive an ideal CDR circuit to achieve the desired BER. However, the decision circuit in a realistic CDR has finite sensitivity and requires a certain minimum peak-to-peak input voltage swing ( $V_S^{PP}$ ) to function properly. Therefore, the FE's output voltage needs to be increased by  $V_S^{PP}$  to attain the same BER as with an ideal CDR. The receiver OMA sensitivity (in linear units) is then calculated as

$$OMA_{RX}^{sens} = \frac{SNR \, i_{n,rms}}{\mathcal{R}_{PD}} \left( 1 + \frac{V_S^{PP}}{SNR \, i_{n,rms} Z_{FE,0}} \right) \quad (\text{W})$$
(4.4)

where  $\mathcal{R}_{PD}$  is the responsivity of the photodiode in A/W. Unless mentioned otherwise,  $\mathcal{R}_{PD}$  is fixed at 0.55 A/W. The first and second terms in (4.4) represent the noise-based and swing-based sensitivities, respectively. Thus, the term in brackets represents the power penalty (PP) incurred by the swing requirements of the CDR.
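A minimal sketch of (4.4) follows. It assumes Gaussian noise of equal rms on both NRZ levels and a mid-level decision threshold, so that  $BER = 1 - \Phi(SNR/2)$ ; this reproduces the quoted SNR of 14.07 at a BER of  $10^{-12}$ . The function names are hypothetical; only Python's standard `statistics.NormalDist` is used.

```python
import math
from statistics import NormalDist


def snr_for_ber(ber: float) -> float:
    """Required peak-to-peak-signal / rms-noise ratio for NRZ with Gaussian noise: SNR = 2Q."""
    q_factor = -NormalDist().inv_cdf(ber)        # BER = 1 - Phi(Q) for a mid-level threshold
    return 2.0 * q_factor


def to_dbm(p_watts: float) -> float:
    return 10.0 * math.log10(p_watts / 1e-3)


def oma_sensitivity(i_n_rms, z_fe0, v_s_pp, responsivity=0.55, ber=1e-12):
    """(4.4): return (noise-based [dBm], power penalty [dB], overall [dBm]) OMA sensitivity."""
    snr = snr_for_ber(ber)
    oma_noise = snr * i_n_rms / responsivity                 # W, ideal-CDR sensitivity
    penalty = 1.0 + v_s_pp / (snr * i_n_rms * abs(z_fe0))    # bracketed term in (4.4)
    return to_dbm(oma_noise), 10.0 * math.log10(penalty), to_dbm(oma_noise * penalty)
```

For example, `oma_sensitivity(0.9059e-6, 329.9, 0.05)`, which uses the TIA-only MN entries of Table 4.2, lands within a few hundredths of a dB of the tabulated noise-based sensitivity, PP, and overall sensitivity.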

In Fig. 4.3 (b), the sensitivity is plotted as a function of  $C_I/C_D$  for a front-end that includes only a TIA. In this simulation,  $f_{TIA}$  and  $V_S^{PP}$  are fixed at 8 GHz and 50 mV<sub>pp</sub>, respectively. The maximum gain (MG), minimum noise (MN), and best overall sensitivity (BS) points are indicated by bold markers, and the performance at these points is summarized in Table 4.2. With no MA, the gain is limited and the overall sensitivity is dominated by the swing requirements. As a result, the BS and MG points are almost identical. Moving from MN to MG improves the transimpedance gain by a factor of 1.16 but worsens the input-referred noise by a factor of 1.12. This reduces the PP due to the CDR requirements by 1.04 dB while worsening the noise-based sensitivity by 0.48 dB, for a net sensitivity improvement of 0.56 dB. In addition, a higher TIA gain helps suppress the noise contribution of downstream circuits, and the DC power dissipation drops from 4.9 mW to 2.35 mW, further motivating a reduced TIA input capacitance.

#### 4.3.2 Main Amplifier

To alleviate the PP incurred by the swing requirements of the CDR, the TIA is followed by an n-stage inverter-based Cherry-Hooper (Inv-CH) main amplifier (MA). The schematic of the Inv-CH is shown in Fig. 4.4. Inv1 acts as a transconductance (voltage-to-current) stage, while Inv2 together with  $R_{F,CH}$  implements a transimpedance transfer function. This topology is widely adopted across various data rates and technologies [9], [10], [11], [12]. Similar to Section 4.2, the transfer function of the Inv-CH amplifier is derived taking into account the output resistance and Miller capacitance of both inverters. The voltage gain of Inv1 is reduced by the low input impedance of the transimpedance stage formed by Inv2 and  $R_{F,CH}$ . This in turn reduces the Miller multiplication of  $C_{gd}$  at the input of Inv1, minimizing the capacitive load presented to the preceding stage.

Cascaded MA stages can have the same device dimensions as the TIA's inverter [10], be scaled up [6] (Section 5.1.2), or be inversely scaled [37], depending on the ratio of the total output capacitance to the total input capacitance. Once the scaling factor is fixed, the receiver's design space is defined by only three variables, W,  $R_{F,TIA}$ , and  $R_{F,CH}$ , assuming that  $C_D$  is still fixed at 200 fF. Identical inverters are assumed in this work.

The sensitivity is plotted in Fig. 4.5 as a function of  $C_I/C_D$  for receiver architectures with a single-stage and a three-stage MA,  $V_S^{PP}$  of 50 mV<sub>pp</sub>, and data rate  $(f_{bit})$  of 16 Gb/s. To calculate the sensitivity for a given NF and receiver architecture,  $R_{F,CH}$  is first chosen to set the bandwidth of the MA  $(f_{MA})$  to the targeted  $f_{bit}$ . Then,  $R_{F,TIA}$  is chosen to achieve an overall receiver bandwidth  $(f_{FE})$  of  $0.5f_{bit}$ . To avoid signal distortion due to circuit nonlinearities, a constraint on the maximum peak-to-peak voltage amplitude at the output of the MA is set. Whenever this voltage exceeds 600 mV<sub>pp</sub>, the MA's gain is reduced to keep the output voltage within the permitted range. The input-referred noise current is calculated taking into consideration all noise sources from the TIA and the MA.

In Fig. 4.5, both the MG and MN points are set by the TIA and stay relatively constant as the number of MA stages increases. However, more gain stages reduce the CDR's PP, which in turn moves the receiver's overall sensitivity minimum (BS) toward the noise-optimum size (MN). Therefore, the power dissipation of a sensitivity-optimized receiver increases due to the increase in both the number of stages and the per-stage power dissipation.

| $f_{bit} = 16$ Gb/s, $V_S^{PP} = 50$ mV<sub>pp</sub> | Inv-TIA only | | | Inv-TIA + single-stage MA | | | Inv-TIA + three-stage MA | | |
|---|---|---|---|---|---|---|---|---|---|
| | MG | BS | MN | MG | BS | MN | MG | BS | MN |
| $C_I/C_D$ | 0.48 | 0.5 | 1 | 0.5 | 0.65 | 0.95 | 0.5 | 0.89 | 0.99 |
| Gain (kΩ) | 0.3839 | 0.3837 | 0.3299 | 4.01 | 3.93 | 3.56 | 38.57 | 34.83* | 33.44 |
| $i_{n,rms}$ (µA<sub>rms</sub>) | 1.012 | 0.9986 | 0.9059 | 0.9165 | 0.8687 | 0.8447 | 0.974 | 0.892 | 0.890 |
| PP (dB) | 10.08 | 10.14 | 11.12 | 2.95 | 3.1 | 3.42 | 0.39 | 0.47 | 0.49 |
| Noise-based sensitivity (dBm) | -15.89 | -15.95 | -16.37 | -16.32 | -16.55 | -16.68 | -16.06 | -16.44 | -16.45 |
| Overall sensitivity (dBm) | -5.81 | -5.81 | -5.25 | -13.37 | -13.44 | -13.26 | -15.66 | -15.97 | -15.96 |
| DC power (mW) | 2.35 | 2.45 | 4.90 | 7.34 | 9.40 | 13.80 | 17.13 | 30.15 | 33.58 |

MG: maximum gain; BS: best sensitivity; MN: minimum noise.

Table 4.2: Model-predicted performance for various receiver architectures

\* For n = 3, the MA's bandwidth is extended to  $1.275 f_{bit}$  to reduce its gain and satisfy the linearity constraint.



Fig. 4.4. Inv-based Cherry-Hooper MA.



Fig. 4.5. Receiver sensitivity for  $f_{bit} = 16 \ Gb/s$ ,  $V_s = 50 \ mV_{pp}$ , and various receiver architectures (a) n = 1, and (b) n = 3.

#### 4.3.3 Receiver Power Dissipation

At a fixed  $V_{DD}$  and hence fixed current density, the power dissipation of a CMOS inverter increases linearly with its input capacitance. The receiver's front-end employs an inverter for the TIA and two inverters for each MA stage. Defining the power dissipation of an inverter with  $W_p = W_n =$ 1 µm as  $P_{DC,1µm}$  and considering that all inverters are identical in device dimensions, the receiver power dissipation is calculated as

$$P_{DC,RX} = (2n+1)NFP_{DC,1\mu m} \tag{4.5}$$

where *n* is the number of main amplifier stages. Given the simulated value of  $P_{DC,1\mu m}$  in Table 4.1,  $P_{DC,RX}$  can be calculated as a function of the TIA size and the number of MA stages.
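Equation (4.5) and the corresponding energy efficiency reduce to a couple of one-line helpers. The sketch below assumes identical inverters and the per-finger power  $P_{DC,1\mu m} = 0.098$ mW from Table 4.1; the function names are hypothetical.

```python
P_DC_1UM = 0.098e-3   # W, per-finger inverter power from Table 4.1


def receiver_power(nf: float, n_stages: int, p_per_nf: float = P_DC_1UM) -> float:
    """(4.5): one TIA inverter plus two inverters per Inv-CH stage, all with NF fingers."""
    return (2 * n_stages + 1) * nf * p_per_nf


def energy_per_bit(power_w: float, f_bit: float) -> float:
    """Receiver energy efficiency in J/bit."""
    return power_w / f_bit
```

For instance, dividing the 33.58 mW of the three-stage noise-optimized receiver in Table 4.2 by 16 Gb/s gives about 2.1 pJ/bit.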

Table 4.2 shows that the energy efficiency of a noise-optimized receiver with a single-stage and a three-stage MA is 0.86 pJ/bit and 2.1 pJ/bit, respectively. As the number of gain stages increases to improve the sensitivity, the energy efficiency becomes inadequate for standards that require links with 1 pJ/bit efficiency at data rates of at least 25 Gb/s [31]. Even at the best overall sensitivity point, the energy efficiency is 0.59 pJ/bit and 1.88 pJ/bit for n = 1 and 3, respectively. On the other hand, for  $n \ge 1$ , the shallowness of the overall sensitivity curves around their minima motivates reducing the power dissipation of the receiver. For example, for n = 3, reducing the transistor dimensions such that  $C_I/C_D$  drops from 0.89 (BS) to 0.5 (MG) decreases the power dissipation from 30.15 mW to 17.13 mW while degrading the sensitivity by only 0.3 dB. However, determining exactly how small the receiver can become before its power reduction is offset by the increase in transmitter power requires an appropriate calculation of the transmitter power dissipation as well as of the link budget.

## 4.4 Optical Transmitter and Link Budget

#### 4.4.1 Laser Diode

Most short-reach optical links in data centers are based on VCSELs operating at 850 nm over MMF [5]. The VCSEL is an electro-optical converter that emits an optical power  $(P_{out})$  proportional to its current  $(I_v)$ , as shown in Fig. 4.6 (a); this is approximated as  $P_{out} = \eta (I_v - I_{th})$ , where  $\eta$  is the slope efficiency in W/A and  $I_{th}$  is the threshold current.  $I_{bias}$  is the VCSEL's bias current, which is supplied by the laser driver to transmit a binary "0". The modulation current  $(I_{mod})$  is the current added to the bias current to transmit a binary "1". The peak-to-peak value of the VCSEL current is therefore  $I_{mod}$ , giving an OMA of  $\eta I_{mod}$ . The output power shows diminishing returns beyond a current  $I_{v,max}$ , which must not be exceeded to avoid spending electrical power that is not converted into optical power. The lower limit of the VCSEL current, on the other hand, is set by the threshold current. The further the VCSEL is biased above threshold, the faster it becomes. The diode-like V-I characteristic of the VCSEL is illustrated in Fig. 4.6 (b). It can be approximated as  $V_v = V_{th} + R_v I_v$ , where  $V_v$ ,  $V_{th}$ , and  $R_v$  are the forward voltage, the threshold voltage, and the differential resistance, respectively. The V-I curve gives the voltages  $V_{v,min}$  and  $V_{v,max}$  across the VCSEL terminals when its current is set to  $I_{bias}$  or  $I_{bias} + I_{mod}$ , respectively.
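The piecewise-linear P-I and V-I approximations can be captured in a small static model. This is a sketch only: the class name `VcselStatic` and the numbers in the test are hypothetical placeholders, not device data from this work, and the V-I line is assumed valid only above threshold.

```python
from dataclasses import dataclass


@dataclass
class VcselStatic:
    """Static VCSEL model of Fig. 4.6; all parameter values are user-supplied (hypothetical)."""
    eta: float      # slope efficiency, W/A
    i_th: float     # threshold current, A
    v_th: float     # threshold voltage, V
    r_v: float      # differential resistance, ohm
    i_max: float    # I_v,max, beyond which output power shows diminishing returns, A

    def power(self, i_v: float) -> float:
        """P-I curve: P_out = eta * (I_v - I_th) above threshold, zero below."""
        return self.eta * max(i_v - self.i_th, 0.0)

    def voltage(self, i_v: float) -> float:
        """V-I curve: V_v = V_th + R_v * I_v (valid above threshold)."""
        return self.v_th + self.r_v * i_v

    def operating_point(self, i_bias: float, i_mod: float) -> dict:
        """OMA and terminal voltages for the '0' (I_bias) and '1' (I_bias + I_mod) levels."""
        if not (self.i_th < i_bias and i_bias + i_mod <= self.i_max):
            raise ValueError("need I_th < I_bias and I_bias + I_mod <= I_v,max")
        return {
            "OMA": self.power(i_bias + i_mod) - self.power(i_bias),   # equals eta * I_mod
            "V_v_min": self.voltage(i_bias),
            "V_v_max": self.voltage(i_bias + i_mod),
        }
```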

The static characteristics in Fig. 4.6 provide an intuitive understanding of the VCSEL's operation but are not sufficient to describe its dynamic behavior and inherent nonlinearity. Therefore, more accurate modeling of the VCSEL, driver, and packaging parasitics is considered later in this section.



Fig. 4.6. VCSEL characteristics (a) P-I curve (b) V-I curve. Curves are not drawn to scale.

#### 4.4.2 Laser Diode Driver

The laser diode driver (LDD) consists of two stages: the pre-driver and the driver to which the VCSEL is connected. The pre-driver decouples the large input capacitance of the driver from the signal source and provides broadband matching to the 50  $\Omega$  environment. The main task of the driver is to provide the required current to the VCSEL. The current-steering circuit in Fig. 4.7 (a) is a common implementation [12]. The circuit is a differential amplifier with one side wirebonded to the VCSEL while the other side is terminated by an on-chip dummy load. The driver is powered by  $V_{DD_{-}D}$ . The VCSEL is biased by  $V_{DD_{-}V}$ , and its DC bias current is tuned by  $I_{bias}$ . The pre-driver is usually operated in limiting mode, and therefore the driver's differential input voltage  $V_{IN}$  is large enough to switch the tail current  $I_0$  entirely to either the left or the right transistor, as explained using the current-switch model in Fig. 4.7. To transmit a binary "0", the tail current  $I_0$  in Fig. 4.7 (b) is switched to the left transistor (the dummy-load side), and the bias current of the VCSEL is supplied by  $V_{DD_{-}V}$ . To avoid DC current flowing through the load resistor of the right transistor, the DC voltage of the cathode terminal of the laser diode must be fixed at  $V_{DD_{-}D}$ , and therefore its anode must be raised to

$$V_{DD_{-}V} = V_{DD_{-}D} + V_{v,min}$$
(4.6)

To transmit a binary "1", Fig. 4.7 (c), the tail current is switched to the right transistor drawing current  $I_0$  from the parallel combination of  $R_D$  and  $R_v$ . The required tail current can be calculated from the modulation current as

$$I_0 = \frac{R_D + R_v}{R_D} I_{mod} \tag{4.7}$$

A small driver output resistance is required to damp any undesired ringing caused by the parasitic inductance of the supply and signal package [38]. However, too small an  $R_D$  increases the driver's power dissipation [38]. Considering this trade-off,  $R_D$  is chosen to be equal to the VCSEL's differential resistance  $R_v$  [12]. Therefore, the tail current is split equally between the two resistors (*i.e.*,  $I_0 = 2I_{mod}$ ).



Fig. 4.7. Circuit and operation of the VCSEL driver (a) circuit, (b) current switch model to transmit a binary "0" and (c) to transmit a binary "1".

The maximum modulation current that can be supplied by the driver depends on the permitted output voltage range. Too large an output voltage may break down the transistors, whereas too small an output voltage may push them into the triode region, which in turn produces pulse-width distortion and jitter [8]. The output voltage changes from  $V_{DD_D}$  when transmitting a logic "0" to  $V_{DD_D} - I_{mod}R_v$  when transmitting a logic "1". If the output voltage is allowed to change by  $0.5V_{DD_D}$  between the two cases, the maximum modulation current is  $I_{mod,max} = V_{DD_D}/(2R_v)$ .
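The driver relations (4.6), (4.7), and the swing limit on  $I_{mod}$  can be collected in one helper. This is a sketch under the stated choice  $R_D = R_v$  by default; the function name `driver_design` and the example values in the test are hypothetical. Note that the output drop  $I_{mod}R_v$  does not depend on  $R_D$ , because  $R_D$  and the VCSEL share the same voltage change.

```python
from typing import Optional, Tuple


def driver_design(i_mod: float, r_v: float, v_dd_d: float, v_v_min: float,
                  r_d: Optional[float] = None) -> Tuple[float, float, float]:
    """Return (V_DD_V, I_0, I_mod,max) for the current-steering driver of Fig. 4.7."""
    if r_d is None:
        r_d = r_v                                  # R_D = R_v, so I_0 = 2 * I_mod
    v_dd_v = v_dd_d + v_v_min                      # (4.6): cathode held at V_DD_D
    i_0 = (r_d + r_v) / r_d * i_mod                # (4.7): current division between R_D and R_v
    i_mod_max = v_dd_d / (2.0 * r_v)               # output swing limited to 0.5 * V_DD_D
    if i_mod > i_mod_max:
        raise ValueError("I_mod exceeds the permitted output swing")
    return v_dd_v, i_0, i_mod_max
```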

Although other, more power-efficient approaches to driving a VCSEL are possible [38], we consider this conventional implementation so as to obtain a pessimistic estimate of the transmitter power, and hence of the increase in transmitter power that results when the receiver is designed with slightly worse sensitivity but significantly lower power dissipation.

#### 4.4.3 Transmitter Power Consumption

For DC-balanced non-return-to-zero (NRZ) data, the DC power consumption of the transmitter, including both the driver and the VCSEL, can be calculated as

$$P_{DC,TX} = \frac{P_{DC,0} + P_{DC,1}}{2} \tag{4.8}$$

where $P_{DC,0} = 2I_{mod}V_{DD_D} + I_{bias}V_{DD_V}$ and $P_{DC,1} = I_{mod}V_{DD_D} + (I_{bias} + I_{mod})V_{DD_V}$ are the DC power required to transmit a logic "0" and "1", respectively, and $V_{DD_V}$ is given by (4.6). Since $V_{DD_D}$ is set by the nominal supply voltage of the CMOS technology, the above equation reveals that the transmitter power increases at higher data rates, with poorer receiver sensitivity, and with less efficient optical devices, all of which raise the required VCSEL currents.
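A short sketch of (4.6) and (4.8) follows. $V_{v,min}$ is not given numerically in this section, so the value used here (1.84 V) is an assumption; it was chosen only because it brings the 16 Gb/s, $n = 1$ result close to the TX power reported in Table 4.5.

```python
def tx_dc_power(i_mod, i_bias, vdd_d, v_v_min):
    """Average TX DC power (4.8) for DC-balanced NRZ data, in watts (assumes R_D = R_v)."""
    vdd_v = vdd_d + v_v_min                           # (4.6)
    p0 = 2 * i_mod * vdd_d + i_bias * vdd_v           # logic "0": I_0 = 2*I_mod steered to dummy side
    p1 = i_mod * vdd_d + (i_bias + i_mod) * vdd_v     # logic "1": I_mod from R_D, I_mod from the VCSEL
    return (p0 + p1) / 2


# Assumed V_v,min = 1.84 V (hypothetical value, not stated in this section).
p = tx_dc_power(i_mod=1.15e-3, i_bias=4e-3, vdd_d=1.0, v_v_min=1.84)
print(f"P_DC,TX = {p * 1e3:.2f} mW")
```

Because the bias term $I_{bias}V_{DD_V}$ is independent of the receiver, only the $I_{mod}$ terms respond to changes in receiver sensitivity.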

#### 4.4.4 VCSEL and Driver Modeling

The dynamic behavior of the VCSEL is described by a second-order transfer function obtained by solving the rate equations as [39]

$$\frac{P_{out}}{I_v} = \text{constant}\cdot\frac{f_r^2}{f_r^2 - f^2 + j\left(\frac{f}{2\pi}\right)\gamma_v} \tag{4.9.a}$$

$$f_r = D_v \sqrt{I_v - I_{th}}, \qquad \gamma_v = K_v f_r^2 + \gamma_{v,0} \tag{4.9.b}$$

where $f_r$ and $\gamma_v$ are the relaxation frequency and damping factor of the VCSEL, $D_v$ and $K_v$ are the D-factor and the K-factor, respectively, and $\gamma_{v,0}$ is the damping offset. As the VCSEL current increases, the relaxation frequency increases, but so does the damping factor. Therefore, the VCSEL bandwidth can be enhanced by increasing the VCSEL current until it becomes limited by damping. As a result, the bandwidth is not fixed but signal-dependent: it varies as $I_v$ changes from $I_{bias}$ (to transmit a binary 0) to $I_{bias} + I_{mod}$ (to transmit a binary 1). This inherent nonlinearity of the VCSEL is modeled in [39] as shown in Fig. 4.8. The description and values of the model parameters are summarized in Table 4.3. The model consists of an electrical part, which models the VCSEL's electrical impedance (the junction, mirror, and pad elements in Table 4.3), and an optical part, which accounts for the VCSEL's optical dynamics and inherent nonlinearity. The optical part is a second-order RLC circuit with signal-dependent oscillation frequency and damping factor, driven by a current-dependent voltage source. The emitted power $P_{out}$ is represented by the voltage across the capacitor $C_V$. Therefore, comparing the transfer function from the voltage source to the output with (4.9), while arbitrarily fixing $C_V$ at 100 fF, allows $R_V$ and $L_V$ to be calculated as functions of the current flowing through the VCSEL's junction ($R_j$), as given in Table 4.3.



Fig. 4.8. The complete model of the driver, package, and VCSEL.
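The mapping from (4.9) to the RLC elements can be sketched as follows. For a series RLC whose output is taken across $C_V$, the transfer function is $1/(1 - \omega^2 L_V C_V + j\omega R_V C_V)$; matching it to (4.9.a) gives $L_V = 1/(4\pi^2 C_V f_r^2)$ and $R_V = \gamma_v/(4\pi^2 C_V f_r^2)$, the latter being the expression in Table 4.3. The values of $D_v$, $K_v$, and $\gamma_{v,0}$ below are hypothetical placeholders, not the polynomial fits extracted from [40], and the constant in (4.9.a) is normalized to 1.

```python
import math

C_V = 100e-15   # fixed arbitrarily, as in Table 4.3 (F)
I_TH = 0.6e-3   # threshold current from Table 4.3 (A)

# Hypothetical rate-equation parameters, for illustration only.
D_V = 8e9 / math.sqrt(1e-3)   # D-factor: 8 GHz per sqrt(mA), expressed in Hz/sqrt(A)
K_V = 0.2e-9                  # K-factor (s)
GAMMA_V0 = 1e10               # damping offset (1/s)


def relaxation_frequency(i_v):
    return D_V * math.sqrt(i_v - I_TH)          # (4.9.b)


def damping_factor(f_r):
    return K_V * f_r**2 + GAMMA_V0              # (4.9.b)


def response(f, i_v):
    """Normalized P_out/I_v from (4.9.a) with the constant set to 1."""
    f_r = relaxation_frequency(i_v)
    gamma = damping_factor(f_r)
    return f_r**2 / (f_r**2 - f**2 + 1j * (f / (2 * math.pi)) * gamma)


def rlc_elements(i_v):
    """R_V and L_V of a series RLC (output across C_V) that reproduces (4.9.a)."""
    f_r = relaxation_frequency(i_v)
    gamma = damping_factor(f_r)
    l_v = 1 / (4 * math.pi**2 * C_V * f_r**2)
    r_v = gamma / (4 * math.pi**2 * C_V * f_r**2)
    return r_v, l_v


for i_v in (4e-3, 5e-3):  # I_bias and I_bias + I_mod of the Fig. 4.10 example
    f_r = relaxation_frequency(i_v)
    r_v, l_v = rlc_elements(i_v)
    peak_db = 20 * math.log10(abs(response(f_r, i_v)))
    print(f"I_v = {i_v * 1e3:.0f} mA: f_r = {f_r / 1e9:.1f} GHz, R_V = {r_v:.1f} ohm, "
          f"L_V = {l_v * 1e9:.3f} nH, |H(f_r)| = {peak_db:.1f} dB")
```

Recomputing $R_V$ and $L_V$ from the instantaneous current is what makes the circuit's bandwidth signal-dependent; in the thesis this update is performed at every timestep by the Verilog-A model.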

For accurate modeling of the VCSEL, the P-I characteristic, the relation between the resonance frequency and the square root of the bias current above threshold, and the relation between the damping factor and the squared resonance frequency are extracted as polynomial functions from the measured performance in [40]. These functions are then used to calculate the model's optical parameters. The optical part of the model is implemented in Verilog-A, so the values of the current-dependent voltage source, $R_V$, and $L_V$ are updated at each simulation timestep to account for the VCSEL's signal-dependent behavior. Fig. 4.8 also shows the model of the driver's output impedance ($R_o$ and $C_o$) and the packaging inductance ($L_{pkg1}$ and $L_{pkg2}$) between the driver and the VCSEL chip. The model-generated P-I characteristic and modulation response at various VCSEL currents, excluding the effect of the driver impedance and packaging inductance, are shown in Fig. 4.9 (a) and (b), respectively. Both agree well with the measured performance in [40], which validates the VCSEL model. The work in [40] is used because it provides the most complete set of measurements for accurate modeling of the VCSEL.

| Parameter | Description | Value | Unit |
|-----------|-------------|-------|------|
| *VCSEL's electrical parameters* | | | |
| $R_j$ | Junction resistance | 50-40 | Ω |
| $C_j$ | Junction capacitance | 270 | fF |
| $R_s$ | DBR mirror resistance | 35 | Ω |
| $R_P$ | Pad resistance | 1 | Ω |
| $C_P$ | Pad capacitance | 10 | fF |
| | The sum of $R_j$ and $R_s$ is the VCSEL's differential resistance $R_v$. | | |
| *VCSEL's optical parameters* | | | |
| $\eta_{max}$ | Maximum slope efficiency | 0.78 | W/A |
| $I_{th}$ | Threshold current | 0.6 | mA |
| | $C_V$, $L_V$, and $R_V$ form a second-order circuit with signal-dependent damping factor and oscillation frequency that accounts for the VCSEL's optical dynamics. | | |
| $C_V$ | Capacitance of the second-order circuit (fixed arbitrarily) | 100 | fF |
| $L_V$ | Inductance of the second-order circuit (current-dependent) | | |
| $R_V$ | Resistance of the second-order circuit (current-dependent) | $\frac{\gamma_v}{4\pi^2 C_V f_r^2}$ | Ω |
| *Driver and wire inductance* | | | |
| $R_o$ | Driver's output resistance, taken equal to the VCSEL's differential resistance $R_v$ | 85 | Ω |
| $C_o$ | Driver's output capacitance | 150 | fF |
| $L_{pkg1\&2}$ | Bonding wire inductance | 1 | nH |

Table 4.3: VCSEL and driver model parameters



Fig. 4.9. Modeled VCSEL performance excluding driver and package (a) P-I curve and (b) modulation response at various values of VCSEL current.

The main objective of modeling the transmitter is to choose the bias and modulation conditions of the VCSEL while considering all effects that could degrade the transmitted signal quality, so that the transmitter power dissipation can be calculated accurately. To this end, $I_{mod}$ and $I_{bias}$ are chosen based on eye-diagram simulations at the transmitter output. For example, Fig. 4.10 shows the simulated eye diagrams at the transmitter output for data rates of 16 Gb/s and 25 Gb/s, a bias current of 4 mA, and a modulation current of 1 mA. The OMA is measured as the internal vertical eye opening, which is smaller than $\eta_{max}I_{mod} = 0.78$ mW. This way of calculating the OMA accounts for the impact of ringing and inter-symbol interference on the quality of the transmitted signal.

#### 4.4.5 Link Budget

The OMA emitted by the laser must be large enough that, despite link losses and penalties, the received optical power exceeds the receiver's sensitivity limit. An example of a link budget for a short-reach optical link is given in [35]. In the worst case, losses and penalties add up to 10.6 dB, including a 1 dB fiber dispersion penalty to account for up to 100 m of OM4 fiber at $\geq$ 25 Gb/s. A margin of 2 dB above the receiver sensitivity limit at a BER of $10^{-12}$ is also included to ensure that the target BER is achieved even if some of the losses or penalties were underestimated. The link budget therefore totals 12.6 dB, meaning that the launched OMA must be 12.6 dB larger than the receiver sensitivity limit at a BER of $10^{-12}$.
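The budget arithmetic is simple in the log domain. The sketch below uses the 10.6 dB and 2 dB figures stated above; the receiver sensitivity of -12 dBm is a hypothetical value used only to show the conversion.

```python
losses_and_penalties_db = 10.6   # worst-case total from [35], including 1 dB dispersion penalty
margin_db = 2.0                  # margin above the sensitivity limit at BER = 1e-12
link_budget_db = losses_and_penalties_db + margin_db


def required_launch_oma_dbm(rx_sensitivity_dbm, budget_db=link_budget_db):
    """Launched OMA needed so that the received OMA still meets the sensitivity limit."""
    return rx_sensitivity_dbm + budget_db


def dbm_to_mw(p_dbm):
    return 10 ** (p_dbm / 10)


oma_tx = required_launch_oma_dbm(-12.0)   # hypothetical RX sensitivity of -12 dBm
print(f"LB = {link_budget_db:.1f} dB -> OMA_TX = {oma_tx:.1f} dBm ({dbm_to_mw(oma_tx):.2f} mW); "
      f"linear factor {10 ** (link_budget_db / 10):.1f}x")
```

In linear terms, a 12.6 dB budget means the launched OMA must be about 18.2 times the sensitivity limit.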



Fig. 4.10. Model-generated eye diagrams at the output of the transmitter, considering the driver, package, and VCSEL, for $I_{bias} = 4$ mA, $I_{mod} = 1$ mA, and (a) $f_{bit} = 16$ Gb/s and (b) $f_{bit} = 25$ Gb/s.

## 4.5 Optimization Procedure and Link Evaluation

At this point, we can calculate the DC power dissipation of all active parts of the link (TIA, MA, VCSEL, and LDD) for a given data rate and optical channel (PD, MMF, and VCSEL). Table 4.4 shows the procedure, values, and bounds used to calculate the energy efficiency of the receiver (RX), transmitter (TX), and overall link as a function of  $C_I/C_D$ .

#### 4.5.1 Link Evaluation for Moderate Data Rate and Swing Requirement

Fig. 4.11 shows the calculated efficiency as a function of $C_I/C_D$ for a data rate of 16 Gb/s, a swing requirement of 50 mV<sub>pp</sub>, and receiver architectures with a single-stage and a three-stage main amplifier. The vertical lines indicate the receiver's minimum-noise (MN), best-sensitivity (BS), and maximum-gain (MG) points obtained in Section 4.3. The bold markers indicate the minimum of each curve. The TX energy dissipation naturally reaches its minimum at the receiver size that achieves the best receiver sensitivity, since this size minimizes the VCSEL modulation current and hence the TX power dissipation. Note that the VCSEL bias current depends on the VCSEL diode and the data rate but not on the receiver sensitivity. More importantly, the overall link energy dissipation reaches its minimum at a smaller receiver size than that required to minimize the TX energy dissipation. This is because, as the receiver width increases, the receiver's own power dissipation quickly dominates the link's energy efficiency.



Fig. 4.11. Energy efficiency as a function of $C_I/C_D$ for $f_{bit} = 16$ Gb/s, $V_s^{pp} = 50$ mV<sub>pp</sub>, and (a) $n = 1$ and (b) $n = 3$.

In contrast, the TX energy-efficiency curves vary little with receiver size because the sensitivity curves in Fig. 4.5 are shallow. This allows the receiver size to be reduced significantly before its power savings are offset by the increase in transmitter power caused by the higher modulation current requirement.

Due to the moderate data rate and swing requirements, a single MA stage is sufficient to optimize the performance. For $n = 1$, Table 4.5 and Fig. 4.11 indicate that the link achieves an efficiency of 1.51 pJ/bit and 1.79 pJ/bit when the receiver is optimized for sensitivity ($C_I/C_D = 0.65$) and noise ($C_I/C_D = 0.95$), respectively. Downsizing the receiver to $C_I = 0.28C_D$ improves the efficiency to 1.24 pJ/bit. This clearly implies that energy-efficient links require low-power receivers with transistors smaller than those required for optimized sensitivity or noise performance. Table 4.5 also shows that as $n$ increases, the receiver must employ smaller transistors to compensate for the power added by the additional stages. For $n = 3$, the link achieves an optimum efficiency of 1.38 pJ/bit at $C_I/C_D = 0.2$, which is 1.54 pJ/bit better than the efficiency achieved when the receiver's noise is optimized at $C_I/C_D = 0.99$.
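The energy-per-bit figures quoted in this section follow from dividing the link DC power by the data rate (mW divided by Gb/s gives pJ/bit directly). The sketch below recomputes them from the link powers listed in Table 4.5; the dictionary layout is only an illustrative choice.

```python
# Link DC power (mW) from Table 4.5, keyed by (f_bit in Gb/s, n, design point).
link_power_mw = {
    (16, 1, "BS"): 24.11, (16, 1, "BEL"): 19.82,
    (16, 3, "BS"): 43.33, (16, 3, "BEL"): 22.04,
    (25, 1, "BS"): 44.80, (25, 1, "BEL"): 42.76,
    (25, 3, "BS"): 47.50, (25, 3, "BEL"): 35.20,
}


def energy_per_bit_pj(power_mw, f_bit_gbps):
    """1 mW / 1 Gb/s = 1e-3 J/s / 1e9 b/s = 1 pJ/bit."""
    return power_mw / f_bit_gbps


efficiency = {key: energy_per_bit_pj(p, key[0]) for key, p in link_power_mw.items()}
for key, e in efficiency.items():
    print(key, f"{e:.2f} pJ/bit")
```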

| Step | Action | Quantity | Value and Bounds <sup>(1)</sup> |
|------|--------|----------|---------------------------------|
| 1 | Give the data rate and parameters of the optical devices | Data rate $f_{bit}$ | 16 Gb/s |
| | | PD capacitance $C_D$ | 200 fF |
| | | PD responsivity $\mathcal{R}_{pd}$ | 0.55 A/W |
| | | VCSEL and driver model | Table 4.3 and Fig. 4.8 |
| | | Link budget (LB) | 12.6 dB |
| 2 | Set the width of the TIA by choosing a single value for NF | | 100 |
| 3 | Find $R_{F,MA}$ that sets the receiver's overall bandwidth to $0.5 f_{bit}$ | | |
| 4 | Calculate | RX DC power $P_{DC,RX}$ (4.5) | $V_{DD_{RX}} = 1$ V |
| | | RX sensitivity $OMA_{RX}^{sens}$ | Using (4.4) |
| 5 | Calculate the required OMA of the TX using the link budget provided in Step 1 | $OMA_{TX} = OMA_{RX}^{sens} + \text{LB}$ (in dB) | |
| 6 | Calculate the VCSEL's bias and modulation currents | $I_{mod}$ is determined by the required OMA, while $I_{bias}$ is chosen to achieve the best … | $I_{bias} > I_{th}$, $I_{mod} < I_{mod,max}$, and $(I_{bias} + I_{mod}) < I_{v,max}$ |
| 7 | Calculate the TX DC power | $P_{DC,TX}$ (4.8) | $V_{DD_D} = 1$ V |
| 8 | Scatter plot the energy efficiency of the receiver, transmitter, and overall link as a function of $C_I/C_D$ | | |
| 9 | Repeat Steps 2 to 8 for a different value of NF | | |

Table 4.4: Optimization procedure and bounds.

<sup>(1)</sup> These values are used in simulations unless mentioned otherwise. The effect of changing these initial values is studied in Section 4.6.
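The loop in Table 4.4 can be sketched as follows. The receiver equations (4.4) and (4.5) are not reproduced in this section, so `toy_rx_model` is a hypothetical stand-in: its power grows linearly with input-transistor width and its sensitivity has a shallow minimum near $C_I/C_D = 0.65$, with magnitudes chosen only to resemble the 16 Gb/s, $n = 1$ case. Step 6 here assumes a linear slope ($OMA_{TX} = \eta_{max} I_{mod}$, as in Section 4.6.1) rather than eye-diagram simulations, $V_{v,min} = 1.84$ V is an assumed value, and Step 9 is approximated by sweeping $C_I/C_D$ directly.

```python
import math

F_BIT = 16e9           # data rate (b/s), Table 4.4
LB_DB = 12.6           # link budget (dB)
ETA = 0.78             # slope efficiency (W/A); assumes OMA_TX = ETA * I_mod
I_BIAS = 4e-3          # VCSEL bias current (A)
VDD_D = 1.0            # driver supply (V)
V_V_MIN = 1.84         # assumed value (hypothetical)
R_V = 85.0             # VCSEL differential resistance (ohm)
I_MOD_MAX = VDD_D / (2 * R_V)


def toy_rx_model(ci_over_cd):
    """HYPOTHETICAL stand-in for (4.4)/(4.5). Returns (P_DC,RX in W, OMA_RX^sens in W)."""
    p_rx = 14.5e-3 * ci_over_cd
    oma_rx = 50e-6 * (1 + 0.3 * math.log(ci_over_cd / 0.65) ** 2)
    return p_rx, oma_rx


def tx_power(i_mod):
    """TX DC power from (4.6) and (4.8)."""
    vdd_v = VDD_D + V_V_MIN
    p0 = 2 * i_mod * VDD_D + I_BIAS * vdd_v
    p1 = i_mod * VDD_D + (I_BIAS + i_mod) * vdd_v
    return (p0 + p1) / 2


def sweep(ratios):
    """Steps 4 to 8 for each C_I/C_D; returns (ratio, RX, TX, link) energies in pJ/bit."""
    results = []
    for x in ratios:
        p_rx, oma_rx = toy_rx_model(x)               # Step 4
        oma_tx = oma_rx * 10 ** (LB_DB / 10)         # Step 5
        i_mod = oma_tx / ETA                         # Step 6 (linear slope assumed)
        if i_mod >= I_MOD_MAX:
            continue                                 # driver cannot deliver this OMA
        p_tx = tx_power(i_mod)                       # Step 7
        results.append((x, p_rx / F_BIT * 1e12, p_tx / F_BIT * 1e12,
                        (p_rx + p_tx) / F_BIT * 1e12))  # data for Step 8
    return results


ratios = [round(0.05 * k, 2) for k in range(1, 31)]    # stands in for Step 9
points = sweep(ratios)
best = min(points, key=lambda r: r[3])
print(f"best C_I/C_D = {best[0]:.2f}, link = {best[3]:.2f} pJ/bit")
```

Even with this toy receiver model, the link optimum falls well below the sensitivity optimum, because the fixed bias term dominates the TX power while the RX power keeps growing with width.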

| | $f_{bit} = 16$ Gb/s, $V_s^{pp} = 50$ mV<sub>pp</sub> | | | | $f_{bit} = 25$ Gb/s, $V_s^{pp} = 100$ mV<sub>pp</sub> | | | |
|---|---|---|---|---|---|---|---|---|
| | $n = 1$ | | $n = 3$ | | $n = 1$ ($V_{DD_D} = 1.2$ V) | | $n = 3$ | |
| | BS | BEL | BS | BEL | BS | BEL | BS | BEL |
| $C_I/C_D$ | 0.65 | 0.28 | 0.89 | 0.2 | 0.75 | 0.52 | 0.83 | 0.38 |
| RX power (mW) | 9.4 | 4.11 | 30.15 | 6.85 | 10.87 | 7.64 | 28.1 | 13.02 |
| $I_{mod}$ (mA) | 1.15 | 1.49 | 0.62 | 1.31 | 6.6 | 6.96 | 2.76 | 3.71 |
| $I_{bias}$ (mA) | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| TX power (mW) | 14.72 | 15.71 | 13.18 | 15.19 | 33.93 | 35.12 | 19.4 | 22.19 |
| Link power (mW) | 24.11 | 19.82 | 43.33 | 22.04 | 44.8 | 42.76 | 47.5 | 35.2 |

Table 4.5: Performance comparison between the receiver's best-sensitivity and the link's best-energy-efficiency design points.

BS: best receiver sensitivity; BEL: best energy efficiency of the link.

#### 4.5.2 Link Evaluation for High Data Rate and Swing Requirements

The optimization of the link is repeated for a data rate of 25 Gb/s and a swing of 100 mV<sub>pp</sub>, as shown in Fig. 4.12. The hollow markers in the figure indicate the points where the required OMA exceeds the transmitter capability, which is limited by the maximum modulation current that the LDD can provide. Therefore, in Fig. 4.12 (a), $V_{DD_D}$ is increased to 1.2 V to increase $I_{mod,max}$ to 7.1 mA<sub>pp</sub>. At this high data rate, the bandwidth requirements of the receiver front-end (TIA/MA) become more difficult to meet in the given CMOS process, which limits its gain. This, together with the increased swing requirement, moves the receiver's BS point toward the MG point, and three MA stages become necessary to optimize the link performance. Table 4.5 and Fig. 4.12 (b) show that the link with $n = 3$ achieves an efficiency of 1.90 pJ/bit and 2.55 pJ/bit when the receiver is optimized for sensitivity ($C_I/C_D = 0.83$) and noise ($C_I/C_D = 1.29$), respectively. The efficiency improves to 1.41 pJ/bit when the receiver is downsized to $C_I = 0.38C_D$, confirming that transistors much smaller than the noise-optimum size, and even smaller than those required for optimized sensitivity, are needed for optimal energy efficiency. Table 4.5 also indicates that more gain stages in the receiver reduce the modulation current requirement, which is desirable for the long-term reliability of the VCSEL.



Fig. 4.12. Energy efficiency as a function of $C_I/C_D$ for $f_{bit} = 25$ Gb/s, $V_s^{pp} = 100$ mV<sub>pp</sub>, and (a) $n = 1$ ($V_{DD_D}$ increased to 1.2 V) and (b) $n = 3$.

#### 4.5.3 Validation of Model Accuracy

To validate the accuracy of the presented model and optimization procedure, receivers with a single-stage and a three-stage MA are designed and simulated in Cadence Spectre. The circuit parameters (NF, $R_{F,TIA}$, and $R_{F,CH}$) required to achieve the best energy efficiency of the overall link are obtained from the MATLAB code and then used in circuit simulations. The simulated and modeled bandwidth, gain, and input-referred noise of the overall FE agree well in all comparison scenarios, with maximum errors of less than 1 GHz, 2 dBΩ, and $0.12\,\mu A_{rms}$, respectively. Further, the TX model in Fig. 4.8 is used with the designed receivers to simulate the eye diagrams at the receiver outputs, as shown in Fig. 4.13. The output power of the TX (the voltage across $C_V$) is converted to a current by an ideal voltage-controlled current source (VCCS) and then fed to the RX input. The VCCS has a gain of 30.225 mA/V to account for the link budget (12.6 dB) and the photodiode responsivity (0.55 A/W). The internal vertical eye opening (IVEO) is better than 88 % and 80 % of the peak-to-peak output ($V_{out,pp}$) required for a BER of $10^{-12}$ at 16 Gb/s and 25 Gb/s, respectively. $V_{out,pp}$ is calculated from circuit simulations as $V_{out,pp} = \mathrm{SNR}\cdot V_{n,rms} + V_s^{pp}$, where $V_{n,rms}$ is the simulated rms output-referred noise voltage. The close agreement between the IVEO and $V_{out,pp}$ validates the accuracy of the presented optimization procedure.
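The VCCS gain can be reproduced from the link budget and the responsivity. The sketch assumes that the voltage across $C_V$ is scaled so that 1 V represents 1 W of optical power; this scale is an assumption, but it is the one under which the quoted 30.225 mA/V comes out.

```python
RESPONSIVITY = 0.55      # A/W, Table 4.4
LINK_BUDGET_DB = 12.6    # dB, Section 4.4.5

# Assumption: 1 V across C_V represents 1 W of emitted optical power.
link_loss = 10 ** (-LINK_BUDGET_DB / 10)   # optical power ratio reaching the PD
vccs_gain = RESPONSIVITY * link_loss       # A/W, numerically equal to A/V under the assumed scale
print(f"VCCS gain = {vccs_gain * 1e3:.3f} mA/V")
```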

### 4.6 Discussion

The initial values in Table 4.4 greatly impact the link energy-efficiency. This section investigates the impact of technology advances on the receiver power-sensitivity trade-off. The performance of the link across a broad range of technologies and data rates is summarized in Table 4.6.



Fig. 4.13. Simulation results for the eye diagrams at the receiver output for various data rates and receiver architectures. The circuit parameters and the required peak-to-peak output voltage are also listed for each eye.

#### 4.6.1 Advances in Photonic and Interconnect Technologies

Advanced photonic and interconnect technologies are assumed, in which the combined photodiode and pad capacitance is reduced to 120 fF and the photodiode responsivity is increased to 0.8 A/W. The link budget is reduced to 8.6 dB, and signal degradation due to package inductance is ignored. The VCSEL is assumed to have sufficient bandwidth that its OMA can be calculated using the maximum slope efficiency of 0.78 W/A, instead of being extracted from eye-diagram simulations as in Section 4.5 (see Fig. 4.10).

This advanced platform is used with the extracted parameters of the CMOS inverter in Table 4.1 to evaluate the link performance for various data rates and swing requirements, as shown in Fig. 4.14 (a). The advances in photonic and interconnect technologies improve the receiver sensitivity, reduce the cost of converting the receiver's current sensitivity into the optical power the transmitter must emit, and improve the laser's modulation efficiency. These factors significantly improve the link's energy efficiency and allow the receiver power to be reduced further. For example, at 25 Gb/s, the energy dissipation of the link in Fig. 4.14 (a) reaches its minimum for $n = 1$ and $C_I/C_D = 0.4$, compared with $n = 3$ and $C_I/C_D = 0.38$ for the link in Fig. 4.12, which uses a typical photonic platform (Table 4.6). The table also shows that at lower data rates, the optimum energy efficiency of the overall link is achieved by drastically undersizing the receiver, far from the capacitive matching rule. Downsizing the receiver improves the efficiency of the overall link by 0.27 pJ/bit and 0.52 pJ/bit at 25 Gb/s and 10 Gb/s, respectively.
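One of these factors, the cost of converting receiver current into launched optical power, can be isolated: the launched OMA needed per ampere of received photocurrent swing is $10^{LB/10}/\mathcal{R}_{pd}$. The sketch below compares the typical (0.55 A/W, 12.6 dB) and advanced (0.8 A/W, 8.6 dB) platforms; it deliberately ignores the change in $C_D$, which affects the receiver sensitivity itself.

```python
import math


def launch_oma_per_photocurrent(responsivity, link_budget_db):
    """Launched OMA (W) required per ampere of received photocurrent swing."""
    return 10 ** (link_budget_db / 10) / responsivity


typical = launch_oma_per_photocurrent(0.55, 12.6)
advanced = launch_oma_per_photocurrent(0.8, 8.6)
ratio = typical / advanced
print(f"typical: {typical:.1f} W/A, advanced: {advanced:.1f} W/A, "
      f"ratio {ratio:.2f}x ({10 * math.log10(ratio):.1f} dB)")
```

For the same receiver current sensitivity, the advanced platform needs roughly 3.65 times (about 5.6 dB) less launched OMA, and hence proportionally less modulation current.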

#### 4.6.2 Advances in CMOS Technology

As CMOS technology scales, the peak transit frequency improves. Further, FinFET processes overcome the low intrinsic gain of scaled CMOS technologies and offer an improved transconductance-to-drain-current ratio [41]. To capture these effects, the parasitic capacitances in Table 4.1 are scaled by a factor of $0.5\times$ while the transconductance and the output resistance are unchanged. This doubles the transit frequency at the bias point to $f_{\rm T} = 114$ GHz while keeping the DC gain of the inverter fixed at $A_0 = 6.2$ V/V. Further, the supply voltage, $P_{\rm DC,1\mu m}$, and the excess noise factor are assumed to be 0.8 V, 0.058 mW/µm, and 2, respectively. This hypothetical CMOS technology is used with the typical photonic platform in Table 4.4 to evaluate the link performance for various data rates and swing requirements, as shown in Fig. 4.14 (b). Advances in CMOS technology improve the sensitivity of the receiver and reduce the DC power dissipation of both the receiver and the transmitter. This in turn improves the link's energy efficiency and allows the receiver to shrink further below its noise-optimum size. As a result, at 25 Gb/s, the energy dissipation of the link in Fig. 4.14 (b) reaches its minimum for a receiver with $n = 1$ and $C_I/C_D = 0.27$, compared with $n = 3$ and $C_I/C_D = 0.38$ for the link in Fig. 4.12, where 65 nm CMOS technology is used.
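The scaling argument can be written out with the usual first-order expressions, which are an assumption here since the exact definitions behind Table 4.1 are not repeated in this section; $g_m$, $r_o$, and $C_{in}$ denote the inverter's transconductance, output resistance, and total input capacitance:

$$f_T \approx \frac{g_m}{2\pi C_{in}}, \qquad A_0 = g_m r_o$$

$$C_{in} \rightarrow 0.5\,C_{in} \;\Rightarrow\; f_T' = \frac{g_m}{2\pi\,(0.5\,C_{in})} = 2 f_T, \qquad A_0' = g_m r_o = A_0$$

Under this model, the quoted $f_T' = 114$ GHz corresponds to $f_T \approx 57$ GHz before scaling, while $A_0$ remains 6.2 V/V because neither $g_m$ nor $r_o$ changes.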

| | | 65 nm CMOS and<br>Typical Photonics | | | 65 nm CMOS and<br>Advanced Photonics | | | Advanced CMOS and<br>Typical Photonics | | |
|---|---|---|---|---|---|---|---|---|---|---|
| $f_{bit}$ (Gb/s) | | 10 | 16 | 25 | 10 | 16 | 25 | 10 | 16 | 25 |
| Best Link<br>Efficiency | $n$ | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 |
| | $C_I/C_D$ | 0.18 | 0.28 | 0.38 | 0.13 | 0.24 | 0.40 | 0.11 | 0.16 | 0.27 |
| | Link Efficiency<br>(pJ/bit) | 1.63 | 1.24 | 1.41 | 1.32 | 0.897 | 0.73 | 1.44 | 1.06 | 0.96 |
| If RX MN is maintained | $C_I/C_D$ | 0.77 | 0.95 | 1.29 | 0.77 | 0.97 | 1.24 | 0.82 | 0.95 | 1.12 |
| | Link Efficiency<br>(pJ/bit) | 2.41 | 1.79 | 2.55 | 1.84 | 1.23 | 1.00 | 2.58 | 1.82 | 1.51 |

Table 4.6: Link performance across a broad range of technologies and data rates.



Fig. 4.14. Link performance at various data rates and swing requirements (a) using 65 nm CMOS technology and advanced photonic and interconnect technologies (b) using advanced CMOS technology and typical photonic and interconnect technologies. A receiver with a single-stage MA is used for both simulations.

Table 4.6 shows that, with this advanced CMOS technology, selecting $C_I/C_D$ based on link efficiency rather than noise optimization improves energy efficiency by 0.55 pJ/bit and 1.14 pJ/bit at 25 Gb/s and 10 Gb/s, respectively. As expected, the improvement is larger than in the advanced-photonics case of Fig. 4.14 (a) because of the larger $C_D$.
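The savings quoted in Sections 4.6.1 and 4.6.2 are differences between the two efficiency rows of Table 4.6. The sketch below recomputes them for every column; the platform labels are shorthand for the table's column groups.

```python
# Link efficiency (pJ/bit) from Table 4.6 at 10, 16, and 25 Gb/s.
best_link = {
    "65 nm CMOS, typical photonics": (1.63, 1.24, 1.41),
    "65 nm CMOS, advanced photonics": (1.32, 0.897, 0.73),
    "advanced CMOS, typical photonics": (1.44, 1.06, 0.96),
}
noise_optimum = {
    "65 nm CMOS, typical photonics": (2.41, 1.79, 2.55),
    "65 nm CMOS, advanced photonics": (1.84, 1.23, 1.00),
    "advanced CMOS, typical photonics": (2.58, 1.82, 1.51),
}

# Saving = efficiency at the RX minimum-noise size minus the best link efficiency.
savings = {
    platform: tuple(round(mn - best, 3)
                    for mn, best in zip(noise_optimum[platform], best_link[platform]))
    for platform in best_link
}
for platform, s in savings.items():
    print(f"{platform}: saving at 10/16/25 Gb/s = {s} pJ/bit")
```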

#### 4.6.3 Other Implementations of Transmitter and Receiver Subblocks

All receivers discussed in this work are based on a single-ended CMOS inverter implementation, which makes them vulnerable to power-supply noise, especially in noisy environments such as multichannel links. A differential implementation is a potential solution to overcome supply variations. However, it doubles the power dissipation and silicon area compared with a single-ended design that achieves the same gain and bandwidth. The output noise power also doubles in a differential implementation, which means that the input-referred noise is increased by a factor of $\sqrt{2}$. Hence, the DC power dissipation increases faster than the input-referred noise, further motivating the design of low-power receivers with input transistors much smaller than the noise-optimum size.
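The comparison can be made quantitative under the stated equal-gain, equal-bandwidth assumption, additionally assuming that the noise-limited optical sensitivity scales linearly with the rms input-referred noise current $i_{n,in}$ (which neglects the swing-related penalty):

$$P_{DC}^{diff} = 2\,P_{DC}^{se}\ (+3\ \text{dB}), \qquad i_{n,in}^{diff} = \sqrt{2}\,i_{n,in}^{se} \;\Rightarrow\; 10\log_{10}\sqrt{2} \approx 1.5\ \text{dB}$$

That is, under these assumptions the differential receiver spends 3 dB more power for roughly 1.5 dB worse noise-limited sensitivity.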

In this chapter, for simplicity, $V_s^{pp}$ is chosen to be 50 mV<sub>pp</sub> for $f_{bit} \leq 16$ Gb/s and 100 mV<sub>pp</sub> for 16 Gb/s $< f_{bit} \leq 25$ Gb/s. An accurate choice of $V_s^{pp}$ requires accurate modeling of the decision circuit. In a hypothetical scenario where $V_s^{pp}$ can be significantly reduced, the CDR's PP can be ignored, and the receiver should therefore be designed based on the noise-power trade-off. Table 4.2 shows, for example, that reducing the receiver size from the MN point to the MG point saves about 50 % of the power dissipation for only a minor degradation in the noise-based sensitivity for all RX architectures. At the other extreme, where $V_s^{pp}$ is very large, the CDR's PP dominates the overall sensitivity, and the receiver should therefore be designed for maximum gain. This indicates that our conclusion, that energy-efficient links require low-power receivers with input transistors much smaller than the noise-optimum size, extends to a wide range of $V_s^{pp}$.

Simulation results show that reducing $C_D$ enhances the sensitivity of the receiver and reduces the power dissipation of the overall link for a given data rate. This motivates research into high-speed PDs with improved responsivity, together with advanced integration, packaging, and ESD techniques that minimize the parasitic capacitances at the TIA input. A significant reduction in $C_D$ (below a few tens of fF) emphasizes the role of $C_F$ in determining the sensitivity. As a result, any reduction in $C_F$ would greatly enhance the RX sensitivity, which suggests that CMOS technology scaling can be leveraged to improve the RX sensitivity and the link's overall energy efficiency. With very low $C_D$ and $C_F$, the RX energy efficiency improves significantly. However, this does not lead to substantial improvements in the TX energy efficiency, because the TX power dissipation becomes dominated by the bias current requirement of the VCSEL, which does not depend on the RX design. As a result, the VCSEL starts to limit the performance of the overall link. Transmitter-side equalization and more efficient laser devices then become necessary to leverage the improved RX performance at low $C_D$ and $C_F$.

## 4.7 Conclusion

The sensitivity-power trade-off in optical receivers is analyzed to minimize the energy-per-bit dissipation of the overall link. The sensitivity is calculated as a function of the receiver's input capacitance relative to the detector capacitance for various receiver architectures, data rates, and swing requirements. The observed shallowness of the sensitivity curves around their minima suggests that maintaining the capacitive matching rule to optimize the noise performance significantly degrades the energy efficiency of the receiver for only a minor improvement in sensitivity. This observation motivated the investigation of how small the receiver can become before its power reduction is offset by the transmitter's increase in power. For that purpose, accurate models of the transmitter and link budget are presented. Across a broad range of technologies and data rates, the simulation results in Table 4.6 show that the optimum energy efficiency of the overall link is achieved by drastically undersizing the receiver, far from its noise-optimum size.

## Chapter 5

# An Inductorless Power-Efficient Design Technique for Linear Equalization in CMOS Optical Receivers

## 5.1 Introduction

This chapter presents a novel inductorless design technique for high-gain optical receiver front-ends. Fig. 5.1 illustrates the operation of the proposed front-end in contrast to the traditional wideband front-end. Conventionally, the TIA and the follow-on MA are designed to have bandwidths on the order of  $0.6f_{bit}$  and  $f_{bit}$ , respectively, to achieve an overall bandwidth of approximately  $0.5f_{bit}$  [8]. In the proposed receiver, the TIA's bandwidth is first reduced to approximately 25 % of the targeted data rate. The reduced TIA bandwidth allows for higher gain, lower input-referred noise, and fewer follow-on gain stages.



Fig. 5.1. The proposed and the conventional receivers are represented by the same block diagram (top). The bottom graph illustrates the operation of the proposed receiver (black) in contrast to that of the conventional receiver (gray).

The reduction in bandwidth also introduces inter-symbol interference (ISI) to the extent that the TIA's output eye diagram is fully closed. Unlike a band-limited electrical channel, which can introduce more than 30 dB of loss at the Nyquist frequency ( $f_N = 0.5 f_{bit}$ ), the low-bandwidth TIA introduces only a moderate frequency-dependent attenuation. Consequently, a few dB of amplitude peaking at  $f_N$  is sufficient to restore the required bandwidth. Therefore, in the second step of the proposed design technique, high-frequency peaking is intentionally introduced in the main amplifier's amplitude response without impairing its low-frequency gain. This peaking is realized by inserting a pole in the feedback loop of one of several possible active-feedback MA architectures [9] [10] [13] [14] [42]. The amplitude peaking in the resulting equalizing main amplifier (EMA) then compensates for the TIA's limited bandwidth and restores an overall bandwidth of approximately  $0.5 f_{bit}$ . Although Fig. 5.1 shows only the magnitude responses of the TIA and EMA, group-delay variation must also be considered.

Like traditional continuous-time linear equalizer (CTLE)-based designs [22] [23] [43], the proposed front-end attains improved sensitivity and high gain, but it achieves better energy efficiency because the standalone equalizer stage(s) are eliminated. Further, the traditional approach to CTLE design suffers from limited bandwidth and, consequently, insufficient peaking at high frequencies, so inductive peaking is usually employed to extend the bandwidth. In contrast, various inductorless feedback techniques can be used to design main amplifiers with a gain-bandwidth product (GBW) far superior to that of a cascade of first-order stages. The improvement results from poles moving away from the negative real axis: a combination of poles with high and low quality factors gives a better GBW for the same pole magnitude. The proposed EMA therefore improves overall receiver performance by increasing the TIA gain and improving noise performance, as argued in [23], while retaining the wideband performance of state-of-the-art MA designs.

The proposed design technique requires co-designing the TIA and the subsequent equalizing amplifier; therefore, both stages receive equal attention in the analysis. Section 5.2 provides a detailed analysis of the TIA, highlighting the trade-off between its gain and bandwidth. Section 5.3 introduces the concept and the block diagram of the proposed EMA. The performance of the overall FE (TIA/EMA) is studied in Section 5.4. Section 5.5 shows the circuitry and simulation results of the implemented FE. Section 5.6 describes the measured performance of the implemented prototype in comparison to prior work. Finally, Section 5.7 concludes the work.
## 5.2 Low-Bandwidth TIA

#### 5.2.1 Small-Signal Model and Frequency Response

The inverter-based TIA (Inv-TIA) is used in this work because of its superior noise performance over its common-gate (CG) counterpart. Further, unlike the CG-TIA, the Inv-TIA is a self-biased topology that decouples the gain from the transconductance of the input device, allowing performance optimization without being limited by DC biasing constraints. The circuitry and the small-signal model of the Inv-TIA are shown in Fig. 3.2. The analysis in chapter 3 shows that the Inv-TIA exhibits a second-order transfer function characterized by a natural oscillation frequency  $\omega_0$ , a pole quality factor  $Q_0$ , and a midband transimpedance gain of  $Z_{TIA,0} \cong R_F$  [8]. The natural oscillation frequency  $\omega_0$  is converted to the corresponding TIA 3-dB bandwidth  $(f_{TIA})$  through a coefficient  $\rho$  that depends on the shape of the TIA's amplitude response; that is,  $\rho$  is a function of  $Q_0$ , as shown in (3.6) [16].

In the Inv-TIA,  $A_0$  is constant for a given bias condition,  $W_p/W_n$  ratio, and technology node. For example, an inverter with  $W_n = W_p$  biased at  $V_{DD} = 1$  V and simulated in TSMC 65 nm CMOS technology achieves an  $A_0$  of 6 V/V. Further, the gain-bandwidth product of the core amplifier ( $GBW_A$ ) is also constant. The circuit's input capacitance ( $C_I$ ) is determined by the total transistor width and is usually chosen as a fraction of the photodiode capacitance based on noise and power constraints [16]. Therefore, for a given  $C_D$ , once  $C_I$  is fixed, the TIA's performance can be controlled only through the feedback resistor. In this chapter, unless mentioned otherwise,  $A_0$ ,  $GBW_A$ , and  $C_T$  are set to 6 V/V, 75 GHz, and 200 fF, respectively, in simulations.

Fig. 5.2 (a) shows that both the 3-dB bandwidth and the pole quality factor  $Q_0$  decrease with a larger feedback resistor  $R_F$ . The bandwidth is almost inversely proportional to the feedback resistor; it does not follow the square-law relation  $(R_F \propto f_{TIA}^{-2})$  predicted by the transimpedance limit [7]. This discrepancy can be explained as follows. Unlike [7], the model in this work allows  $Q_0$  to change with  $R_F$ , where  $Q_0 = \sqrt{(A_0 + 1)R_F C_T T_A}/(R_F C_T + T_A)$ . For a sufficiently large  $R_F$  such that  $R_F C_T \gg T_A$ ,  $Q_0 \approx \sqrt{(A_0 + 1)T_A/(R_F C_T)}$ , i.e.,  $Q_0$  is proportional to  $R_F^{-0.5}$ . Consequently, it is reasonable to assume that  $\rho$  is also proportional to  $R_F^{-0.5}$ , with an error of less than ±8 %, as shown in Fig. 5.2 (b). Using this relation to rearrange the transimpedance limit from [7] gives



Fig. 5.2. (a) TIA's 3dB bandwidth and pole  $Q_0$  as a function of the feedback resistor. (b) The exact and the approximate calculations of  $\rho$  as a function of the feedback resistor.

$$f_{TIA}^2 = \frac{(A_0 + 1)}{A_0} \frac{GBW_A \rho^2}{2\pi R_F C_T}$$
(5.1.a)

which, since  $\rho^2 \propto R_F^{-1}$ , implies

$$f_{TIA}^2 \propto \frac{1}{R_F^2} \rightarrow f_{TIA} \propto \frac{1}{R_F}$$
(5.1.b)

This means that changing  $R_F$  changes both the pole magnitude ( $\omega_0$ ) and the pole quality factor ( $Q_0$ ), which modifies the dependence of the bandwidth on the feedback resistor from that given in [7], where  $Q_0$  is assumed to be constant. Keeping  $Q_0$  constant when  $R_F$  is increased by a factor of  $r$  would require both  $A_0$  and  $T_A$  to scale up by a factor of  $\sqrt{r}$ . In practice, this approach is not feasible, since the voltage gain of a single-stage CMOS inverter is constant for a given bias condition and its maximum value is limited by the technology node.
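The scaling argument above can be checked numerically with a minimal Python sketch. It assumes the standard second-order form  $Z_{TIA}(s) = Z_{TIA,0}/(1 + s/(\omega_0 Q_0) + s^2/\omega_0^2)$  with  $\omega_0 = \sqrt{(A_0+1)/(R_F C_T T_A)}$ , which is consistent with the  $Q_0$  expression above, and it further assumes  $T_A = A_0/(2\pi GBW_A)$  for a single-pole core amplifier. The helper names are hypothetical, and the absolute bandwidths depend on these assumptions and need not match Fig. 5.2; the point is the log-log slope of  $f_{TIA}$  versus  $R_F$ , which comes out close to -1 rather than the -0.5 implied by a fixed- $Q_0$  transimpedance limit.

```python
import math

def inv_tia_params(r_f, a0=6.0, gbw_a=75e9, c_t=200e-15):
    """Hypothetical helper: (omega_0, Q_0, rho, f_TIA) of an assumed second-order Inv-TIA model."""
    t_a = a0 / (2.0 * math.pi * gbw_a)      # ASSUMPTION: T_A = A_0 / (2*pi*GBW_A)
    tau_f = r_f * c_t                       # R_F * C_T
    omega_0 = math.sqrt((a0 + 1.0) / (tau_f * t_a))
    q_0 = math.sqrt((a0 + 1.0) * tau_f * t_a) / (tau_f + t_a)
    # rho = (3-dB frequency) / omega_0 for 1 / (1 + s/(omega_0*Q_0) + (s/omega_0)^2)
    a = 2.0 - 1.0 / q_0 ** 2
    rho = math.sqrt((a + math.sqrt(a * a + 4.0)) / 2.0)
    return omega_0, q_0, rho, rho * omega_0 / (2.0 * math.pi)

def bandwidth_slope(r1, r2):
    """Log-log slope d(ln f_TIA)/d(ln R_F) between two feedback-resistor values."""
    f1 = inv_tia_params(r1)[3]
    f2 = inv_tia_params(r2)[3]
    return math.log(f2 / f1) / math.log(r2 / r1)

for r_f in (1.25e3, 2.5e3, 5e3):
    _, q_0, rho, f_tia = inv_tia_params(r_f)
    print(f"R_F = {r_f / 1e3:.2f} kOhm: Q_0 = {q_0:.3f}, rho = {rho:.3f}, f_TIA = {f_tia / 1e9:.2f} GHz")
print("slope between 2 kOhm and 4 kOhm:", round(bandwidth_slope(2e3, 4e3), 2))
```

Because  $\rho$  tracks  $Q_0$  as  $Q_0$  falls, doubling  $R_F$  in this model roughly halves  $f_{TIA}$ , in line with (5.1.b).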

#### 5.2.2 Effective Gain

As explained in chapter 3, when  $f_{TIA}$  is reduced far below  $f_{bit}$ , severe ISI is introduced to the extent that the output eye diagram is fully closed. Therefore, the DC value of  $Z_{TIA}(s)$  becomes a deceptive measure of the gain, and the effective gain must instead be calculated from the transient response, more precisely, from the pulse response [28]. The TIA's pulse response is its response to an isolated binary one transmitted in a long sequence of binary zeros. Assuming *linear* time-invariant (LTI) operation, if the TIA's response to a step input with a peak-to-peak value of  $i_{pp}$  is x(t), then its pulse response is y(t) = x(t) - x(t - UI), where UI is the unit interval. The output pulse response of the Inv-TIA is plotted in Fig. 5.3 (a) for a data rate of 10 Gb/s with  $i_{pp} = 10 \ \mu A_{pp}$  and a bandwidth ranging from  $0.2f_{bit}$  to  $0.6f_{bit}$ . To quantify the ISI, y(t) is sampled at the symbol rate relative to its peak (as shown by the marker points in Fig. 5.3 (a)), resulting in a discrete-time sequence  $V_{h,n}$  given by

$$V_{h,n} = y(nT_b) \qquad -\infty < n < \infty \tag{5.2}$$

The sample at the peak of the pulse is denoted as the main-cursor sample  $(V_{h,0})$ . If all ISI is canceled, the effective gain is  $Z_{h,0} = V_{h,0}/i_{pp}$ . In the absence of equalization, the ISI samples  $(V_{h,n\neq 0})$  subtract from  $V_{h,0}$  in the worst case, closing the vertical eye opening (VEO) to

$$VEO = V_{h,0} - \sum_{\substack{n = -\infty \\ n \neq 0}}^{\infty} |V_{h,n}|$$
(5.3)

The VEO can be used to determine an effective gain of  $Z_{VEO} = VEO/i_{pp}$  for the case in which the ISI is not removed or is only partially removed. The midband gain  $Z_{TIA,0}$  can also be interpreted as an effective gain if an ideal unity-gain continuous-time linear equalizer (CTLE) is employed. The CTLE compensates for the bandwidth limitation of the TIA and restores an overall bandwidth on the order of  $0.6f_{bit}$  without impairing the low-frequency gain. Therefore, the TIA's midband gain  $Z_{TIA,0}$  at the low bandwidth point can be used as the effective gain for the combined (TIA/CTLE).
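The pulse-response and VEO definitions in (5.2) and (5.3) can be illustrated with a minimal Python sketch. For simplicity it assumes a hypothetical single-pole TIA with step response  $x(t) = i_{pp} Z_{TIA,0}(1 - e^{-t/\tau})$ ,  $\tau = 1/(2\pi f_{TIA})$ , rather than the second-order Inv-TIA of Fig. 3.2, so its numbers will not reproduce Fig. 5.3. For this model the pulse peaks at  $t = UI$ , all pre-cursors vanish, and (5.3) reduces to  $VEO = i_{pp} Z_{TIA,0}(1 - 2e^{-2\pi f_{TIA}/f_{bit}})$ , which the sketch reproduces numerically. The function names and the 2 k $\Omega$  midband gain are illustrative.

```python
import math

def step_response(t, i_pp, z0, f_tia):
    """Single-pole TIA step response x(t) for a current step of i_pp (illustrative model)."""
    if t <= 0.0:
        return 0.0
    tau = 1.0 / (2.0 * math.pi * f_tia)
    return i_pp * z0 * (1.0 - math.exp(-t / tau))

def pulse_response(t, i_pp, z0, f_tia, ui):
    """y(t) = x(t) - x(t - UI): response to an isolated binary one."""
    return step_response(t, i_pp, z0, f_tia) - step_response(t - ui, i_pp, z0, f_tia)

def cursors(i_pp, z0, f_tia, f_bit, n_pre=5, n_post=200):
    """Samples V_h,n of (5.2), taken once per UI relative to the pulse peak."""
    ui = 1.0 / f_bit
    t_peak = ui  # a single-pole pulse response peaks at the end of the input pulse
    return {n: pulse_response(t_peak + n * ui, i_pp, z0, f_tia, ui)
            for n in range(-n_pre, n_post + 1)}

def vertical_eye_opening(v):
    """VEO of (5.3): main cursor minus the sum of |ISI| samples."""
    return v[0] - sum(abs(val) for n, val in v.items() if n != 0)

for ratio in (0.6, 0.3, 0.2, 0.1):
    v = cursors(10e-6, 2000.0, ratio * 10e9, 10e9)
    print(f"f_TIA/f_bit = {ratio}: Z_VEO = {vertical_eye_opening(v) / 10e-6:.0f} Ohm")
```

In this simplified model the VEO becomes negative (closed eye) once  $f_{TIA}/f_{bit}$  drops below  $\ln 2/(2\pi) \approx 0.11$ , which is why the unequalized  $Z_{VEO}$  collapses at low TIA bandwidth while  $Z_{TIA,0}$  keeps growing.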



Fig. 5.3. (a) Output pulse response for various values of  $f_{TIA}/f_{bit}$ . The input current pulse has a peak-to-peak value of 10  $\mu A_{pp}$  and width of 100 ps. (b) Different gains as a function of  $f_{TIA}/f_{bit}$ .  $f_{bit}$  is fixed at 10 Gb/s while  $f_{TIA}$  is swept by varying  $R_F$ . The labeled points in (b) illustrate that linear equalization is favorable for applications that require high gain in the receiver FE.

Fig. 5.3 (b) shows that linear equalization improves the effective gain over both full-bandwidth and ISI-canceller-based designs. For example, if the TIA's bandwidth is reduced from  $0.6f_{bit}$ (point a) to  $0.3f_{bit}$  and an ideal CTLE is employed (point b), the effective gain improves by a factor of 1.86× compared to point a. The gain at  $0.3f_{bit}$  (point b) is also 1.23× larger than that obtained when an ideal ISI canceller is employed (point c). This is because ISI cancellers do not extend the TIA's bandwidth, so the output pulse of a limited-bandwidth TIA does not have enough time to settle at  $i_{pp}Z_{TIA,0}$ . Further, ideal cancellers that remove all pre- and post-cursor ISI are not implementable. For example, decision feedback equalizers (DFEs) [25] cancel only the post-cursor ISI. DFEs also suffer from a tight timing constraint: the feedback signal from the previously decided bit must arrive within one unit interval (UI) to resolve the current bit. These limitations make linear equalization a more attractive choice for applications that require high gain in the receiver FE. DFEs, on the other hand, are favorable over CTLEs from a noise point of view, because CTLEs extend the noise bandwidth to that of the combined TIA/CTLE, whereas in DFE-based receivers the noise bandwidth remains set by the low-bandwidth TIA [28].

## 5.3 Equalizing Main Amplifier

In addition to high-gain and broadband operation, adjustable high-frequency peaking (HFP) is a desirable feature in MA design. Amplitude peaking at the Nyquist frequency can mitigate the bandwidth limitations introduced by other components in the optical link. For example, in [44], shunt and series passive inductors are employed between cascaded stages of a programmable-gain amplifier to realize HFP, which is then used to partially compensate for the varying performance of the multi-mode fiber. In this work, passive inductors are avoided because they consume significant silicon area and potentially increase substrate coupling. Instead, the HFP is realized by introducing a pole in the feedback loop of an active-feedback MA architecture and is used to compensate for the TIA's limited bandwidth.

#### 5.3.1 Equalizing MA Based on a Third-Order Gain Stage

The block diagrams of the conventional and proposed gain stages are shown in Fig. 5.4 (a) and (b), respectively. The conventional architecture is presented in [15], where a third-order nested-feedback technique achieves high-speed operation while maintaining more robust stability than the traditional third-order gain stage. In Fig. 5.4 (a), the first-order gain cell, A(s), is modeled by the transconductance of the input device  $g_{m1}$ , the load resistance  $R_1$ , and the load capacitance  $C_1$ . The adjustable active-feedback cell  $\beta_{con}(s)$  is modeled by the transconductance  $-g_{mf}$ . Therefore, the transfer functions of the first-order gain and feedback cells are given by

$$A(s) = \frac{A_1}{\frac{s}{\omega_1} + 1}, \qquad \beta_{con}(s) = \frac{\beta_1}{\frac{s}{\omega_1} + 1}, \tag{5.4}$$

where  $A_1 = g_{m1}R_1$  and  $\omega_1 = (R_1C_1)^{-1}$  are the DC gain and cut-off frequency of the first-order gain cell, respectively.  $\beta_1 = g_{mf}R_1$  is the DC feedback gain. The transfer function of the overall architecture in Fig. 5.4 (a) is given by

$$H_{MA}(s) = \frac{A^3(s)}{A^2(s)\beta_{con}(s) + A(s)\beta_{con}(s) + 1}$$
(5.5)

In this work, a pole is introduced in each of the two feedback loops to create an adjustable HFP without impairing the low-frequency gain. The transfer function of the proposed EMA is obtained from (5.5) by replacing  $\beta_{con}(s)$  with  $\beta_{pro}(s)$ , given in (5.6).

$$\beta_{pro}(s) = \frac{\beta_1}{\left(\frac{s}{\omega_1} + 1\right)\left(\frac{s}{\omega_Z} + 1\right)}$$
(5.6)

where  $\omega_Z = (R_Z C_Z)^{-1}$  is the cut-off frequency of the introduced low-pass filter which is assumed to have negligible loading on the output node. Therefore, the transfer function of the EMA in Fig. 5.4 (b) is given by

$$H_{EMA}(s) = \frac{A_1^3 \left(\frac{s}{\omega_Z} + 1\right)}{\left(\frac{s}{\omega_1} + 1\right)^3 \left(\frac{s}{\omega_Z} + 1\right) + A_1 \beta_1 \left(\frac{s}{\omega_1} + 1\right) + A_1^2 \beta_1}$$
(5.7)

The pole-zero locations of (5.7) are plotted in Fig. 5.5 (a) in comparison with those of (5.5) for  $\beta_1$ ,  $A_1$ , and  $\omega_1$  fixed at 0.25, 2.5, and  $2\pi \times 30$  GHz, respectively. The poles of the conventional architecture are indicated by black x-markers. For the proposed EMA,  $\omega_Z$  is swept from  $0.5\omega_1$  to  $5\omega_1$ . Inserting the LPF in the feedback loops of the proposed EMA creates a real zero at  $\omega_Z$  (shown in blue). It also increases the order of the denominators of  $\beta_{pro}(s)$  and  $H_{EMA}(s)$  compared to their conventional counterparts. As a result, for low values of  $\omega_Z$ , the proposed EMA has two pairs of complex-conjugate poles ( $P_A$  and  $P_B$ ) (shown in red). As  $\omega_Z$  increases,  $P_A$  moves toward the complex poles of (5.5), while the damping factor of  $P_B$  increases until the two poles become real and start moving in opposite directions. At sufficiently high  $\omega_Z$ ,  $P_{B2}$  and the real zero cancel each other,  $P_{B1}$  reaches the real pole of (5.5), and the overall architecture degenerates to the third-order gain stage in [15].

The impact of varying  $\omega_Z$  on the amplitude response of the proposed EMA is depicted in Fig. 5.5 (b). For a given  $\beta_1$ , HFP can be introduced independently of the low-frequency gain. The peak of the amplitude response moves to a lower frequency as  $\omega_Z$  is reduced.



Fig. 5.4. Block diagram of (a) the third-order gain stage in [15] and (b) the proposed EMA with an LPF inserted in each feedback path.



Fig. 5.5. (a) Pole-zero locations of the proposed EMA for various values of  $\omega_Z$  in comparison to the conventional third-order gain stage, where  $\omega_Z = \infty$ . The dashed arrows indicate the direction of pole-zero movements as  $\omega_Z$  increases. (b) Amplitude response of the proposed EMA for various ratios of  $\omega_Z/\omega_1$ .  $\beta_1$ ,  $A_1$ , and  $\omega_1$  are fixed at 0.25, 2.5, and  $2\pi \times 30$  GHz, respectively.

As a numerical example, for  $\omega_Z = 0.1\omega_1$ , the EMA achieves 6 dB of amplitude peaking at 5 GHz, which increases to 10.5 dB at 11 GHz. In the presence of such high amplitude peaking, the bandwidth of the EMA alone is not an instructive metric. Instead, the bandwidth extension ratio and the signal integrity of the overall front-end, which includes the limited-bandwidth TIA and the EMA, are examined in the following section.
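The numerical example above can be reproduced with a minimal Python/NumPy sketch that evaluates (5.7) directly with  $A_1 = 2.5$ ,  $\beta_1 = 0.25$ , and  $\omega_1 = 2\pi \times 30$  GHz. To keep polynomial root-finding well conditioned, the sketch works in the normalized variable  $p = s/\omega_1$ ; `ema_tf` and `response_db` are hypothetical helper names. Passing `wz_over_w1=None` removes the feedback LPF and yields the conventional stage of (5.5), which makes it easy to confirm that the inserted pole leaves the DC gain unchanged.

```python
import numpy as np

def ema_tf(a1=2.5, beta1=0.25, wz_over_w1=0.1):
    """Hypothetical helper: numerator/denominator of (5.7) as np.poly1d in p = s/omega_1.

    wz_over_w1=None drops the feedback LPF and gives the conventional stage of (5.5).
    """
    p1 = np.poly1d([1.0, 1.0])                      # (s/omega_1 + 1)
    if wz_over_w1 is None:
        pz = np.poly1d([1.0])                       # omega_Z -> infinity
    else:
        pz = np.poly1d([1.0 / wz_over_w1, 1.0])     # (s/omega_Z + 1) = p*omega_1/omega_Z + 1
    num = a1 ** 3 * pz
    den = p1 ** 3 * pz + a1 * beta1 * p1 + a1 ** 2 * beta1
    return num, den

def response_db(f, f1=30e9, **kw):
    """|H_EMA(j*2*pi*f)| relative to its DC value, in dB, with omega_1 = 2*pi*f1."""
    num, den = ema_tf(**kw)
    p = 1j * f / f1
    return 20.0 * np.log10(abs(num(p) / den(p)) / abs(num(0.0) / den(0.0)))

num, den = ema_tf()
print("DC gain (V/V):", num(0.0) / den(0.0))
print("poles / (2*pi*1 GHz):", den.roots * 30.0)   # p*omega_1/(2*pi*1e9) = p*30
print("peaking at 5 and 11 GHz (dB):", response_db(5e9), response_db(11e9))
```

With these values the sketch gives roughly 6 dB at 5 GHz and 10.5 dB at 11 GHz, and all four poles lie in the left half-plane.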

## 5.4 Front-end Performance Analysis

#### 5.4.1 Performance Requirements for the EMA

As explained in chapter 2, assuming that the EMA restores a wide overall bandwidth, a noise-limited input signal produces a peak-to-peak voltage of  $V_0^{PP} = SNR \, i_{n,in}^{rms} Z_{TIA,0} A_{EMA,0}$  at the output of the front-end, where *SNR* is the required signal-to-noise ratio, which equals 14.07 for a BER of  $10^{-12}$  [8],  $i_{n,in}^{rms}$  is the input-referred noise current, and  $A_{EMA,0}$  is the DC gain of the EMA.  $V_0^{PP}$  is sufficiently large to drive an ideal clock-and-data recovery (CDR) circuit at the desired BER. However, the decision circuit in a realistic CDR has finite sensitivity and requires a minimum input voltage amplitude ( $V_{CDR}^{PP}$ ). Therefore, the FE's output voltage must be increased by  $V_{CDR}^{PP}$  to attain the same BER as with an ideal CDR. The finite sensitivity of the CDR thus incurs a power penalty (PP) of

$$PP = \frac{V_0^{PP} + V_{CDR}^{PP}}{V_0^{PP}} = 1 + \frac{V_{CDR}^{PP}}{SNR \ i_{n,in}^{rms} \ Z_{TIA,0} \ A_{EMA,0}}$$
(5.8)

Equation (5.8) reveals that a higher transimpedance gain relaxes the gain requirement of the EMA for a given PP. Fig. 5.3 (b) indicates that reducing the ratio  $f_{TIA}/f_{bit}$  benefits the gain as long as the equalizer can recover an overall bandwidth of approximately 50 % to 60 % of the targeted data rate. Therefore, the equalizer's capability to restore bandwidth determines how far the TIA's bandwidth can be reduced below the data rate. Excessive reduction of the TIA's bandwidth would require the equalizer to introduce a large amount of amplitude peaking, which translates into large group-delay variation (GDV). The GDV causes horizontal and vertical eye closure, which erodes the gain and noise improvements obtained from equalization. In [23], it is concluded that the equalizer can restore the bandwidth by a factor of approximately 2× while simultaneously maintaining good noise performance and a good-quality equalized eye diagram.

For the conventional wideband TIA, a feedback resistor of  $1.25 \text{ k}\Omega$  is chosen to achieve a bandwidth of  $0.57f_{bit}$ , wide enough to introduce negligible ISI. Because the TIA's bandwidth drops almost in inverse proportion to  $R_F$ , as observed in Fig. 5.2 (a), doubling the feedback resistor in the proposed design reduces the bandwidth to  $0.26f_{bit}$ . At this bandwidth, the TIA achieves a  $Z_{TIA,0}$  of 66.6 dB $\Omega$  (2143  $\Omega$ ) while introducing an attenuation of 7.2 dB at the Nyquist frequency ( $f_N = 0.5f_{bit} = 5$  GHz). The EMA is now required to extend the bandwidth by a factor of 1.9× to 2.3× to achieve an overall bandwidth of 50 % to 60 % of  $f_{bit}$ . In addition to recovering the bandwidth, the EMA must amplify the TIA's output. For example, using the gain of the low-bandwidth TIA and assuming  $V_{CDR}^{PP}$ , SNR, and  $i_{n,in}^{rms}$  of 50 mV<sub>pp</sub>, 14.07, and 1  $\mu$ A<sub>rms</sub>, respectively, (5.8) shows that the EMA requires a low-frequency gain of approximately 20 dB to reduce the PP to less than 0.67 dB (1.17). In practice, the EMA's gain is chosen to reduce the PP to a target value obtained from link-budget analysis.
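The power-penalty arithmetic above can be checked with a short Python sketch of (5.8), using the values quoted in the text ( $Z_{TIA,0}$  = 2143  $\Omega$ ,  $V_{CDR}^{PP}$  = 50 mV<sub>pp</sub>, SNR = 14.07,  $i_{n,in}^{rms}$  = 1  $\mu$ A<sub>rms</sub>). The function names are hypothetical. Since the PP is a ratio of optical powers, the sketch converts it to dB as  $10\log_{10}(PP)$ , which matches the 0.67 dB (1.17) figure.

```python
import math

def power_penalty(a_ema, z_tia0=2143.0, v_cdr_pp=50e-3, snr=14.07, i_n_rms=1e-6):
    """Linear power penalty of (5.8) for an EMA DC gain a_ema (V/V)."""
    v0_pp = snr * i_n_rms * z_tia0 * a_ema   # noise-limited output swing V_0^PP
    return 1.0 + v_cdr_pp / v0_pp

def required_ema_gain(pp_target, z_tia0=2143.0, v_cdr_pp=50e-3, snr=14.07, i_n_rms=1e-6):
    """Invert (5.8): minimum EMA DC gain (V/V) for a target linear PP > 1."""
    return v_cdr_pp / (snr * i_n_rms * z_tia0 * (pp_target - 1.0))

pp = power_penalty(10.0)  # 20 dB of EMA gain
print(f"PP = {pp:.3f} ({10 * math.log10(pp):.2f} dB)")
g = required_ema_gain(1.17)
print(f"EMA gain for PP = 1.17: {g:.2f} V/V ({20 * math.log10(g):.1f} dB)")
```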

#### 5.4.2 Bandwidth Extension Ratio and Signal Integrity

Fig. 5.6 shows the block diagram of the proposed front-end, in which the limited-bandwidth TIA is followed by a two-stage EMA. The EMA's second stage is added to relax the gain requirements. The two-stage EMA is derived from the two-stage MA presented in [14] by inserting low-pass filters in the feedback loops of the second stage. Therefore, the transfer function of the overall front-end (FE) is  $Z_{FE}(s) = Z_{TIA}(s)H_{2-EMA}(s)$ , where  $H_{2-EMA}(s)$  is the transfer function of the two-stage EMA, given by

$$H_{2-EMA}(s) = \frac{A^5(s)}{Den(s)}$$
 (5.9)



Fig. 5.6. Block diagram of the proposed front-end. The two-stage EMA is modified based on the two-stage MA in [14]. The grayed feedback cells indicate the locations of the inserted poles.

The denominator Den(s) is expressed as

$$Den(s) = 1 + A(s) [\beta_{con}(s) + \beta_{pro}(s)] + A^2(s) [\beta_{con}(s) + \beta_{pro}(s) + \beta_{con}(s)\beta_{pro}(s)] + A^3(s)\beta_{con}(s)\beta_{pro}(s)$$
(5.10)
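Equations (5.9) and (5.10) can be evaluated directly once  $A(s)$ ,  $\beta_{con}(s)$ , and  $\beta_{pro}(s)$  from (5.4) and (5.6) are substituted. The NumPy sketch below assumes, for illustration only, the same first-order cell values used for Fig. 5.5 ( $A_1 = 2.5$ ,  $\beta_1 = 0.25$ ,  $\omega_1 = 2\pi \times 30$  GHz) and  $\omega_Z = 0.075\omega_1$ ; `h_2ema` is a hypothetical name. At DC every  $s/\omega$  term vanishes, so the inserted LPF cannot change the low-frequency gain, which the sketch confirms.

```python
import numpy as np

def h_2ema(s, a1=2.5, beta1=0.25, w1=2 * np.pi * 30e9, wz=None):
    """Evaluate H_2-EMA(s) = A^5(s)/Den(s) from (5.9)-(5.10).

    A(s) and beta_con(s) follow (5.4); beta_pro(s) follows (5.6).
    wz=None removes the feedback LPF, i.e. beta_pro(s) -> beta_con(s).
    """
    a = a1 / (s / w1 + 1.0)
    b_con = beta1 / (s / w1 + 1.0)
    b_pro = b_con if wz is None else b_con / (s / wz + 1.0)
    den = (1.0 + a * (b_con + b_pro)
           + a ** 2 * (b_con + b_pro + b_con * b_pro)
           + a ** 3 * b_con * b_pro)
    return a ** 5 / den

wz = 0.075 * 2 * np.pi * 30e9
for f in (0.0, 2.5e9, 5e9, 10e9):
    s = 2j * np.pi * f
    print(f"{f / 1e9:4.1f} GHz: |H| no LPF = {abs(h_2ema(s)):.2f}, with LPF = {abs(h_2ema(s, wz=wz)):.2f}")
```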

Once the TIA's feedback resistor is fixed, the design space reduces to only two variables:  $\omega_Z$  and  $\beta_1$ . These two variables are swept, and the following equations are solved numerically to calculate the bandwidth ( $f_{FE}$ ), the low-frequency gain ( $Z_{FE,0}$ ), and the peaking ( $M_P$ ) of the overall FE:

$$|Z_{FE}(j2\pi f_{FE})| = \frac{1}{\sqrt{2}} |Z_{FE}(j\omega)|_{\omega=0}$$
(5.11.a)

$$Z_{FE,0} = 20 \log_{10} |Z_{FE}(j\omega)|_{\omega=0}$$
(5.11.b)

$$M_{P} = 20 \log_{10} \frac{\max(|Z_{FE}(j\omega)|)}{|Z_{FE}(j\omega)|_{\omega=0}}$$
(5.11.c)
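The numerical solution of (5.11.a)-(5.11.c) can be sketched in Python/NumPy as a dense frequency sweep with linear interpolation at the -3 dB crossing. The helper `fe_metrics` is hypothetical; it accepts any callable transfer function of  $s$ , such as a product of  $Z_{TIA}(s)$  and  $H_{2-EMA}(s)$ . The sanity checks use textbook second-order low-pass responses (a Butterworth with  $f_0$  = 5 GHz and a  $Q = 2$  case), not the thesis front-end.

```python
import numpy as np

def fe_metrics(h, f_max, n=200_001):
    """Evaluate (5.11.a)-(5.11.c) on a linear frequency grid.

    h: callable transfer function of s (rad/s), e.g. Z_FE(s) = Z_TIA(s) * H_2-EMA(s).
    Returns (f_3dB in Hz, DC gain in dB, peaking M_P in dB).
    """
    f = np.linspace(0.0, f_max, n)
    mag = np.abs(h(2j * np.pi * f))
    h0 = mag[0]
    target = h0 / np.sqrt(2.0)
    below = np.nonzero(mag < target)[0]
    if below.size == 0:
        f_3db = float("nan")            # no -3 dB crossing inside the sweep
    else:
        k = below[0]                    # first grid point below -3 dB (k >= 1)
        # linear interpolation between grid points k-1 and k
        f_3db = f[k - 1] + (f[k] - f[k - 1]) * (mag[k - 1] - target) / (mag[k - 1] - mag[k])
    return f_3db, 20.0 * np.log10(h0), 20.0 * np.log10(mag.max() / h0)

# Sanity checks on textbook second-order low-pass responses with f0 = 5 GHz.
w0 = 2 * np.pi * 5e9
butterworth = lambda s: 1.0 / (1.0 + np.sqrt(2.0) * s / w0 + (s / w0) ** 2)
peaky = lambda s: 1.0 / (1.0 + s / (2.0 * w0) + (s / w0) ** 2)   # Q = 2
print(fe_metrics(butterworth, 20e9))
print(fe_metrics(peaky, 20e9))
```

For the Butterworth case the sketch returns a 5 GHz bandwidth with no peaking; for  $Q = 2$  the peaking is about 6.3 dB, matching  $20\log_{10}\left(Q/\sqrt{1 - 1/(4Q^2)}\right)$ .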

Several combinations of  $\beta_1$  and  $\omega_Z$  can achieve the required bandwidth extension, but with different noise performance (the noise analysis is presented in the following section). The feedback gain  $\beta_1$  directly sets the low-frequency gain of the EMA and is chosen to satisfy the power-penalty condition indicated earlier. Then,  $\omega_Z$  is swept to achieve the required bandwidth extension ratio, defined as  $f_{FE}/f_{TIA}$ . The pair  $\omega_Z = 0.075\omega_1$  and  $\beta_1 = 0.25$  is chosen because it achieves good noise performance as well as a good-quality output eye. The corresponding frequency response is plotted in Fig. 5.7 (a), where the EMA introduces 5 dB of peaking and extends the bandwidth by a factor of 2.2×. The gain peaking in the overall frequency response is less than 0.1 dB. Fig. 5.7 (b) shows the pulse response at the output of the FE. To quantify the vertical and horizontal eye openings, the output pulse is sampled by a bit-rate clock aligned to its peak, at both the rising and the falling edges of the clock.

The sum of the magnitudes of the samples at the even clock edges (filled markers for  $n \neq 0$ ) quantifies the ISI. The sum of the samples at the odd clock edges (hollow markers) is used as a jitter indicator (JI). Note that the falling edges of the clock are the zero-crossing points of the data. Therefore, the defined JI includes only the deterministic jitter caused by residual ISI or ringing in the time domain [45]. The sum of the ISI and JI samples is less than 6.5% of the main-cursor sample, which implies that the eye has a wide internal opening, as is also evident from the simulated eye diagram in Fig. 5.8 (a). Fig. 5.8 (b) shows the output eye diagram when the limited-bandwidth TIA is followed by a wideband MA that consumes the same power as the EMA. The comparison between the two eyes in Fig. 5.8 demonstrates the capability of the presented technique to restore the bandwidth without impairing the midband gain.



Fig. 5.7. (a) Amplitude response (b) output response to an input current pulse with a peak-to-peak value of  $15 \,\mu A_{pp}$  and width of 100 ps. The EMA parameters are  $\omega_Z/\omega_1 = 0.075$  and  $\beta_1 = 0.25$ .



Fig. 5.8. Matlab-generated 10 Gb/s output eye diagrams when the limited-bandwidth TIA is followed by (a) an EMA and (b) a wideband MA. The peak-to-peak value of the input current is fixed at 15  $\mu A_{pp}$ .

#### 5.4.3 Noise Analysis

Fig. 5.9 (a) shows the model used for noise analysis. The main noise sources in the Inv-TIA are the channel and feedback-resistor thermal noise, shown in Fig. 3.2 as  $I_{n,ch}^2$  and  $I_{n,RF}^2$ , respectively. Their power spectral densities can be expressed as  $I_{n,ch}^2 = 4kT\gamma g_m$  and  $I_{n,RF}^2 = 4kT/R_F$ , where k is the Boltzmann constant, T is the absolute temperature in kelvin, and  $\gamma$  is the excess noise factor. Under a constant gain-bandwidth product constraint, the noise-optimum FET size is  $C_I = 0.7C_D$  [16]. Therefore, the transconductance of the TIA's input device can be calculated as  $g_m = 2\pi f_T C_I$ , where  $f_T$  is the transit frequency of the technology at the selected bias point. In Fig. 5.9 (a), the amplifier following the TIA is modeled by  $H_{post}(s)$ , and its input-referred noise PSD is denoted by  $V_{n,in}^2 = 4kT/g_{m,post}$ .  $H_{post}(s)$  is given by (5.9)-(5.10) for both the proposed and conventional designs, using  $\beta_{pro}(s)$  and  $\beta_{con}(s)$ , respectively. In the simulations that follow,  $g_{m,post}$ ,  $\gamma$ , and  $f_T$  are fixed at 10 mS, 2, and 150 GHz, respectively.

As explained in chapter 3, linear equalization extends both the signal and the noise bandwidths [28]. Therefore, the noise power spectral density (PSD) must be integrated at the receiver output to account for how the equalizer processes the noise. To do so, the contribution of each noise source to the output noise PSD is first calculated. Because all noise sources are uncorrelated, the total output noise PSD is obtained by adding the individual power spectra. The total output noise PSD is then integrated up to infinity to obtain the integrated output-referred noise power  $(v_{n,total}^2)$ , in units of  $V^2$ . The total integrated input-referred noise power  $(i_{n,total}^2)$  is then determined by dividing  $v_{n,total}^2$  by the squared effective gain  $(Z_{TIA,eff})^2$ , calculated from the VEO at the output of the FE. This gain calculation accounts for the residual ISI in the signal presented to the decision circuit. The input-referred noise current is then the square root of  $i_{n,total}^2$ . Further discussion of noise analysis for equalizer-based optical receivers is available in our previously published work [28].
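The integration procedure above can be sketched in Python/NumPy. Each uncorrelated source is represented by its one-sided input PSD and the transfer function from that source to the FE output; the output PSDs are summed, integrated with the trapezoidal rule up to a finite upper frequency standing in for infinity, and referred to the input through an effective gain. The helper names, the single-pole example transfer function, and the values (R_F = 2.5 k $\Omega$ , a 7 GHz pole, T = 300 K, and  $Z_{eff} = R_F$ ) are assumptions for illustration, not the model of Fig. 5.9 (a).

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def integrated_output_noise(sources, f):
    """Sum the output PSDs of uncorrelated sources and integrate over f (trapezoidal rule).

    sources: iterable of (psd_in, h) pairs, where psd_in(f) is a one-sided input PSD and
    h(s) is the transfer function from that source to the FE output.
    """
    psd_out = np.zeros_like(f)
    for psd_in, h in sources:
        psd_out = psd_out + psd_in(f) * np.abs(h(2j * np.pi * f)) ** 2
    return float(np.sum(0.5 * (psd_out[1:] + psd_out[:-1]) * np.diff(f)))

def input_referred_noise(v2_out, z_eff):
    """i_n,rms = sqrt(v_n,total^2) / Z_eff, with Z_eff taken from the VEO-based effective gain."""
    return float(np.sqrt(v2_out) / z_eff)

# Example: feedback-resistor noise 4kT/R_F through a hypothetical single-pole transimpedance.
T, R_F, F_C = 300.0, 2.5e3, 7e9
sources = [(lambda f: 4 * K_B * T / R_F * np.ones_like(f),
            lambda s: R_F / (1 + s / (2 * np.pi * F_C)))]
f = np.linspace(0.0, 1000 * F_C, 1_000_001)
v2 = integrated_output_noise(sources, f)
print("v_n,rms (mV):", 1e3 * np.sqrt(v2))
print("i_n,rms (uA):", 1e6 * input_referred_noise(v2, R_F))
```

For this single-pole case the integral has the closed form  $4kT R_F f_c \pi/2$ , which provides a quick check on the truncation of the upper integration limit.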



Fig. 5.9. (a) Circuit model used for noise analysis. (b) Matlab-simulated noise reduction in the proposed FE compared to its conventional counterpart. The arrows indicate the amount of change for each noise component.

#### 5.4.4 Performance Comparison

To assess the improvement of the proposed FE over its conventional counterpart, both FEs are simulated in Matlab. The conventional FE has the same block diagram as in Fig. 5.6, but without the poles inserted in the feedback loops. Therefore, its analysis is the same as presented earlier, with each  $\beta_{pro}(s)$  in (5.10) replaced by  $\beta_{con}(s)$ . The TIA's feedback resistor is tuned to set  $f_{TIA}/f_{bit}$  to 0.57 and 0.26 for the conventional and the proposed FEs, respectively. In the latter,  $\beta_1$  and  $\omega_Z$  are chosen to achieve an overall bandwidth of  $f_{FE} = 0.56 f_{bit}$ . The power consumption and the DC gain of the proposed EMA are kept equal to those of the conventional MA by using the same values of  $A_1$  and  $\beta_1$  in both circuits. The performance of the two FEs is summarized in Table 5.1. Although the two FEs have approximately the same overall bandwidth, the proposed FE achieves about 6 dB higher gain than its conventional version. This improvement in transimpedance gain results from the increased  $R_F$  of the limited-bandwidth TIA. Notably, this gain improvement comes without any additional power dissipation, because changing  $R_F$  and  $\omega_Z$  does not affect the DC power dissipation, as will be shown in the practical implementation in the next section.

| Block | Parameter | MATLAB<sup>(1)</sup>, 10 Gb/s, Con. | MATLAB<sup>(1)</sup>, 10 Gb/s, Pro. | Spectre<sup>(2)</sup>, 10 Gb/s, Con. | Spectre<sup>(2)</sup>, 10 Gb/s, Pro. | Spectre<sup>(2)</sup>, 20 Gb/s<sup>(4)</sup>, Con. | Spectre<sup>(2)</sup>, 20 Gb/s<sup>(4)</sup>, Pro. |
|---|---|---|---|---|---|---|---|
| TIA | $R_F$ (k $\Omega$ ) | 1.25 | 2.5 | 0.7 | 1.6 | 0.4 | 0.8 |
| | $f_{TIA}/f_{bit}$ | 0.57 | 0.26 | 0.64 | 0.27 | 0.68 | 0.3 |
| MA/EMA | $\omega_Z/2\pi$ (GHz) | 8 | 2.25 | 8 | 5.25 | 8 | 11.47 |
| | $\beta_1$ | 0.25 | 0.25 | 0.14 | 0.14 | 0.15 | 0.15 |
| | Peaking @ $f_N$ (dB) | 0 | 5.05 | 0 | 4.8 | 0 | 3.5 |
| FE | $Z_{VEO}$ (dB $\Omega$ ) | 83.6 | 89.98 | 79.7 | 87.1 | 71.2 | 77.2 |
| | $f_{FE}/f_{bit}$ | 0.57 | 0.56 | 0.6 | 0.61 | 0.59 | 0.54 |
| | Peaking (dB) | 0 | 0.084 | 0 | 0 | 0 | 0 |
| | $i_{n,rms}$ ( $\mu A_{rms}$ ) | 0.598 | 0.531 | 1.2 | 0.95 | 2.41 | 1.74 |
| Sensitivity improvement (dB) | Noise-based | - | 0.52 | - | 1 | - | 1.4 |
| | PP<sup>(3)</sup> | - | 0.61 | - | 0.5 | - | 0.84 |
| | Total | - | 1.125 | - | 1.5 | - | 2.24 |

Table 5.1: Design parameters and performance summary of the proposed front-end in comparison to its conventional counterpart.

<sup>(1)</sup> Simulations based on Fig. 5.6.

<sup>(2)</sup> Simulations based on Fig. 5.10.

<sup>(3)</sup> For  $V_{CDR}^{PP} = 50 \text{ mV}_{pp}$ .

<sup>(4)</sup> The 20 Gb/s simulations are discussed in Section 5.6.4.

The input-referred noise power of both FEs is compared in Fig. 5.9 (b). In the proposed FE, the noise contributions of the feedback resistor and the post-amplifier are lower than their counterparts in the conventional design. Increasing $R_F$ in the proposed FE reduces its thermal-noise contribution and raises the TIA gain through which the follow-on amplifier's noise is referred to the input, thereby suppressing that noise. The channel noise is slightly higher in the proposed FE because the HFP also amplifies high-frequency noise. Overall, the presented design technique reduces the input-referred noise current by 11.2 %. The lower noise and higher gain of the presented FE lead to improvements of 0.52 dB in the noise-based sensitivity and 0.61 dB in the PP compared to the traditional design.
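The noise-based entries of Table 5.1 can be cross-checked from the simulated $i_{n,rms}$ values. The short Python sketch below assumes (our reading, not stated in the text) that the noise-based sensitivity improvement is expressed in optical dB, i.e. $10\log_{10}$ of the noise-current ratio, since photocurrent is proportional to optical power. Under that assumption, it reproduces the 11.2 % noise reduction and the 0.52 dB, 1 dB, and 1.4 dB entries.

```python
import math

# Input-referred noise currents from Table 5.1 in uA_rms,
# given as (conventional FE, proposed FE) for each simulated case.
NOISE_UA_RMS = {
    "10 Gb/s, Fig. 5.6": (0.598, 0.531),
    "10 Gb/s, Fig. 5.10": (1.2, 0.95),
    "20 Gb/s, Fig. 5.10": (2.41, 1.74),
}


def noise_reduction_percent(i_conv, i_prop):
    """Relative reduction of the input-referred rms noise current."""
    return 100.0 * (i_conv - i_prop) / i_conv


def noise_sensitivity_gain_db(i_conv, i_prop):
    """Noise-based sensitivity improvement in optical dB (assumed convention).

    Photocurrent is proportional to optical power, so a noise-limited optical
    sensitivity scales with i_n,rms and is expressed with 10*log10.
    """
    return 10.0 * math.log10(i_conv / i_prop)


for case, (i_conv, i_prop) in NOISE_UA_RMS.items():
    print(f"{case}: noise -{noise_reduction_percent(i_conv, i_prop):.1f} %, "
          f"sensitivity +{noise_sensitivity_gain_db(i_conv, i_prop):.2f} dB")
```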

# 5.5 Circuitry of the Implemented Front-End

Fig. 5.10 (a) shows the block diagram and the circuitry of the implemented front-end. A replica TIA is used to provide pseudo-differential power-supply noise rejection. The TIA is followed by a three-stage EMA. A series resistor ($R_Z$) is inserted in the feedback loops of the second and third stages; in combination with the parasitic capacitance of the transistor in each feedback loop, this resistor creates the zero required for bandwidth extension. Compared to Fig. 5.6, the EMA's third stage is added to relax the gain requirements and assist in recovering the bandwidth. A low-pass feedback network (LPFN) is connected between the output of the EMA and the input of the TIA. The LPFN amplifies the difference between the DC levels of the differential outputs $V_{out}$ and returns a feedback voltage $V_F$, which is converted to a current $I_{os}$ by the transconductance of $M_{os}$ and subtracted from the input current for offset compensation. The LPFN is a single-pole RC filter using a Miller-boosted 5 pF capacitor and a 1.1 M$\Omega$ resistor. A low cut-off frequency of 1 MHz is chosen as a trade-off between on-chip area and tolerable baseline wander during long runs of identical bits. The low common-mode voltage at the TIA's output prevents the use of a tail current source for the first differential pair in the EMA's first stage; therefore, a polysilicon resistor is used instead.
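The baseline-wander side of this trade-off can be estimated with a simple calculation. The sketch below idealizes the offset-compensation loop as a first-order high-pass filter with the stated 1 MHz corner. This is an assumption, since the actual loop response is not given here. It also uses the fact that the longest run of identical bits in a PRBS-$n$ pattern is $n$ bits. Under these assumptions, the droop during the longest PRBS31 run at 10 Gb/s is roughly 2 %.

```python
import math


def baseline_droop(f_cutoff_hz, bit_rate_bps, run_length_bits):
    """Fractional decay of a run of identical bits through a first-order high-pass.

    Idealizes the offset-compensation loop as a single-pole high-pass filter with
    corner f_cutoff_hz; the real loop response may differ.
    """
    tau = 1.0 / (2.0 * math.pi * f_cutoff_hz)   # high-pass time constant (s)
    t_run = run_length_bits / bit_rate_bps      # duration of the run (s)
    return 1.0 - math.exp(-t_run / tau)


# The longest run of identical bits in a PRBS-n pattern is n bits.
for n in (7, 31):
    droop = baseline_droop(1e6, 10e9, n)
    print(f"PRBS{n} at 10 Gb/s: {100.0 * droop:.2f} % baseline droop")
```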

The FE is simulated in TSMC 65 nm CMOS using the Cadence Spectre simulator. The input parasitics are modeled by a pad capacitance $C_{Pad} = 45$ fF, a photodiode capacitance $C_D = 80$ fF, and a bond-wire inductance $L_{wire} = 0.5$ nH. The loading from the subsequent output buffer is modeled by a load capacitance $C_L = 150$ fF connected at the output of the EMA. An additional 50 fF capacitance is added to all nodes to model wiring and layout parasitics. The receiver's output stage (not shown in Fig. 5.10 (a)) is a conventional differential amplifier with a load resistance of 100 $\Omega$, chosen as a trade-off between output signal amplitude and compatibility with the off-chip 50 $\Omega$ environment.

#### 5.5.1 Validation of Bandwidth Extension

As in the previous section, both the proposed and the conventional FEs are simulated and compared. The bandwidth of the proposed FE's TIA is set to 27% of the targeted 10 Gb/s data rate (i.e., 2.7 GHz). The tail current source $I_F$ of the feedback pair sets the feedback gain $\beta_1$ and is chosen to satisfy the power penalty condition. The series resistor $R_Z$ is then chosen to achieve the required bandwidth extension. The device dimensions and component values for nominal 10 Gb/s operation are tabulated in Fig. 5.10. The corresponding amplitude responses are shown in Fig. 5.11 (a). The EMA introduces 4.8 dB of peaking at the Nyquist frequency and extends the bandwidth by a factor of 2.28×, achieving an overall bandwidth of 6.1 GHz.

The simulated group delay is shown in Fig. 5.11 (b), where the GDV is within $\pm 10$ % of the unit interval over the frequency range of interest. Fig. 5.12 (a) and (b) show the 10 Gb/s eye diagrams at the output of the FE when the limited-bandwidth TIA is followed by a wideband MA or by the EMA, respectively. These simulated eye diagrams demonstrate the capability of the proposed peaking technique to restore the bandwidth without impairing the low-frequency gain: the bandwidth extension improves the VEO by a factor of 1.7×. Fig. 5.12 (c) shows the eye diagram of the traditional FE. In this simulation, $R_Z$ is shorted and $R_F$ is reduced to widen the TIA's bandwidth, while the current sources ($I_F$ and $I_B$) are unchanged. Comparing Fig. 5.12 (b) and (c) shows that the presented design technique improves the effective gain by a factor of 2.34×. Interestingly, for the proposed design, the gain is improved by almost the same factor by which the TIA bandwidth is reduced, which reflects the linear relation between gain and bandwidth in the single-stage Inv-TIA. Table 5.1 summarizes the simulated performance of the two FEs, where the presented FE shows 1.5 dB better sensitivity than its conventionally designed counterpart.
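The 2.34× effective-gain improvement appears to correspond to the 7.4 dB difference in $Z_{VEO}$ between the two Fig. 5.10 columns of Table 5.1. A minimal Python sketch of that conversion follows. Transimpedance in dBΩ uses the $20\log_{10}$ convention. The other two column pairs are included for comparison.

```python
# Z_VEO from Table 5.1 in dB-ohm, given as (conventional FE, proposed FE).
ZVEO_DBOHM = {
    "10 Gb/s, Fig. 5.6": (83.6, 89.98),
    "10 Gb/s, Fig. 5.10": (79.7, 87.1),
    "20 Gb/s, Fig. 5.10": (71.2, 77.2),
}


def effective_gain_ratio(z_conv_dbohm, z_prop_dbohm):
    """Linear ratio of two transimpedances expressed in dB-ohm (20*log10 scale)."""
    return 10.0 ** ((z_prop_dbohm - z_conv_dbohm) / 20.0)


for case, (z_conv, z_prop) in ZVEO_DBOHM.items():
    print(f"{case}: effective gain x{effective_gain_ratio(z_conv, z_prop):.2f}")
```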




Fig. 5.10. Block diagram and circuitry of the implemented front-end. Parameter values for 10 Gb/s operation are tabulated.



Fig. 5.11. (a) Simulated amplitude response. (b) Simulated group delay.




Fig. 5.12. Simulation results for the 10 Gb/s output eye diagrams when the limited-bandwidth TIA is followed by (a) a wideband MA and (b) the proposed EMA. In (c), the TIA's bandwidth is widened and a wideband MA is employed. The input current is fixed at  $15 \mu A_{pp}$  for all simulations.

#### 5.5.2 Sensitivity to Process and Temperature Variations

Fig. 5.13 shows the simulated performance of the presented receiver under process and temperature variations. As Fig. 5.13 (a) shows, the EMA exhibits more peaking at lower temperatures, and for a given temperature, the peaking can vary by up to 6.5 dB over different process corners. The FE gain and bandwidth in Fig. 5.13 (b) can vary by up to 13.5 dB and 3.4 GHz, respectively, over different corners. As the temperature varies from 20 °C to 80 °C, the gain and bandwidth deviate from their room-temperature values by up to 24.3 % and 22.5 %, respectively. These variations are due to changes in transconductance and resistor values over process corners and temperature. Temperature-dependent biasing can be employed, and adaptation techniques can continuously monitor the output eye diagram and set the circuit parameters accordingly to maintain the best quality for the equalized eye. In the implemented prototype, the TIA's feedback resistor and the current sources in the forward and feedback paths are made variable. This allows post-fabrication control of the peaking frequency, the peaking magnitude, and the TIA's high-frequency roll-off, so that the amplitude responses of the EMA and the TIA can be tuned to track each other and achieve the targeted bandwidth with minimal GDV.



Fig. 5.13. Simulated performance under process and temperature variations (a) EMA's peaking at Nyquist frequency (b) gain and bandwidth of the overall FE.

#### 5.5.3 Stability

In the presence of complex feedback and high amplitude peaking in the EMA, the stability of the presented FE becomes an important consideration. The pole-zero simulation in Fig. 5.5 (a) shows that a pair of complex poles ($P_A$) moves toward the imaginary ($j\omega$) axis as $\omega_z$ is reduced, where $\omega_z$ is the frequency of the introduced zero that ideally cancels the bandwidth-limiting pole of the low-bandwidth TIA. As a result, the TIA's 3-dB bandwidth cannot be made arbitrarily small without the EMA's pole pair moving into the right-half plane (RHP). Further, for a given $\omega_z$, the poles $P_A$ may enter the RHP at an excessively large feedback gain $\beta_1$. However, the values of $\beta_1$ that lead to RHP poles are far from those used in the proposed design. For example, in the FE in Fig. 5.6, when $\omega_z$ is set to $2\pi f_{bit}/4$, the poles $P_A$ do not enter the RHP until $\beta_1 > 6$ and $\beta_1 > 5.5$ for $f_{bit}$ of 10 Gb/s and 20 Gb/s, respectively.
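A classical toy model illustrates the mechanism behind this stability limit: a complex pole pair migrating toward the $j\omega$-axis as the feedback gain grows. The Python sketch below is not the EMA's transfer function. It assumes a hypothetical loop with three identical poles at $-\omega_0$ and DC loop gain $K$, whose closed-loop poles follow in closed form from $(1 + s/\omega_0)^3 = -K$. In this toy model, the complex pair crosses into the RHP at $K = 8$. The thresholds quoted above for the actual FE ($\beta_1 > 6$ and $\beta_1 > 5.5$) come from the pole-zero simulations, not from this model.

```python
import cmath
import math


def closed_loop_poles(loop_gain, w0=1.0):
    """Roots of 1 + K / (1 + s/w0)^3 = 0, i.e. (1 + s/w0) = cube roots of -K.

    Hypothetical toy loop with three identical poles at -w0 and DC loop gain K;
    it is NOT a model of the EMA, only an illustration of root migration.
    """
    r = loop_gain ** (1.0 / 3.0)
    # Cube roots of -K lie at angles pi/3, pi, and 5*pi/3 with magnitude K^(1/3).
    return [w0 * (-1.0 + r * cmath.exp(1j * math.pi * (2 * m + 1) / 3)) for m in range(3)]


def is_stable(loop_gain):
    """True if all closed-loop poles lie strictly in the left-half plane."""
    return all(p.real < 0 for p in closed_loop_poles(loop_gain))


for k in (1.0, 4.0, 7.5, 8.5):
    worst = max(p.real for p in closed_loop_poles(k))
    print(f"K = {k}: max Re(pole)/w0 = {worst:+.3f}, stable = {is_stable(k)}")
```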

# 5.6 Experimental Validation

Fig. 5.14 (a) shows the micrograph of the prototype chip fabricated in TSMC 65 nm CMOS technology. The chip includes two standalone FEs: one is the direct implementation of the circuit in Fig. 5.10 (a), while the other is its conventional version (i.e., $R_Z$ is replaced by a short circuit). The total size of the chip is $1 \text{ mm} \times 0.7 \text{ mm}$. Each front-end is pad-limited and occupies 665 µm × 460 µm (0.31 mm<sup>2</sup>) including the I/O RF pads, while the active area, including the offset compensation loop, is about 0.0114 mm<sup>2</sup>. The fabricated chip is packaged in a CQFP80 ceramic quad flat package and is partially wire-bonded. Since each FE has differential inputs and outputs, the high-speed RF input and output probing pads use a differential G-S-G-S-G configuration. The TIA, the MA/EMA, and the output buffer are powered by separate supplies. The supply voltages and the power breakdown of the different blocks in each FE are listed in Table 5.2.

#### 5.6.1 Transient Measurement

The implemented FEs are characterized electrically: a voltage signal is applied to the input and the output voltage is measured. In this configuration, each implemented FE acts as a multistage CMOS voltage amplifier in which the CMOS inverter with shunt feedback (SF-Inv) is a first gain stage with limited bandwidth. Whether the SF-Inv is driven by a 50 $\Omega$ voltage source or by the current from a PD, its bandwidth is reduced by increasing the feedback resistor. The EMA is then responsible for providing more gain and compensating for any bandwidth limitation from the SF-Inv, the output buffer, or the last MA stage, which is loaded by a large capacitance from the output buffer. Therefore, electrical measurements are sufficient to demonstrate the capability of the presented peaking technique to restore the bandwidth.

The test setup used for BER and eye-diagram measurements is shown in Fig. 5.14 (b). The output of an Anritsu MP1800A bit pattern generator (BPG) is attenuated before being applied to one of the SF-Inv's differential inputs, while the other input is left floating. One of the amplified differential outputs is detected by the MP1800A error analyzer (EA) for BER measurement or by a 30 GHz scope for eye-diagram measurement (one at a time), while the other output is terminated in 50 $\Omega$. The loss of the cables and connectors is not de-embedded from the measurement results.



Fig. 5.14. (a) Chip micrograph (b) Test setup for electrical characterization.

Fig. 5.15 shows the measured BER as a function of the peak-to-peak input voltage. Three different BER measurements are performed to validate the equalization capability of the presented technique and its performance advantage over its conventional wideband counterpart. The three experiments are described below.

The first experiment is performed on the conventional FE (SF-Inv with wideband MA), and its results are shown in Fig. 5.15 by circle markers. In this experiment, the SF-Inv's bandwidth is set to its minimum by setting to zero the gate voltage of the NMOS transistor that shunts the feedback resistor (i.e., $V_c$ in Fig. 5.10 (a) is set to zero). Then, the FE is optimized to achieve the best sensitivity at 10 Gb/s with a PRBS31 pattern. The bandwidth limitation of this FE manifests itself in two ways. First, the shallow slope of the BER curve with circle markers indicates that the performance is limited by ISI rather than by noise. Second, the FE achieves a poor sensitivity of 17 mV<sub>pp</sub> for a BER of  $10^{-12}$  at 10 Gb/s.



Fig. 5.15. Electrically measured BER as a function of input voltage amplitude for PRBS pattern length of 31. The inset shows the measured 10 Gb/s single-ended eye diagrams for both the conventional (black) and the proposed (white) FEs. The eye diagrams are measured for an input voltage set to the receiver's sensitivity limit and a PRBS31 pattern.

In the second experiment, a similar set of measurements is applied to the proposed FE (SF-Inv with EMA), with the SF-Inv's bandwidth again kept at its minimum ($V_c = 0$). The results are shown in Fig. 5.15 by diamond markers. Compared to the bandlimited conventional FE (circle markers), the proposed FE shows significantly better sensitivity and a steeper BER slope. To maintain a BER of  $10^{-12}$  at 10 Gb/s, the proposed FE requires an input voltage of only 9 mV<sub>pp</sub>, compared to 17 mV<sub>pp</sub> for the bandlimited conventional one. These measurements demonstrate the effectiveness of the introduced peaking technique in widening the bandwidth to mitigate ISI.
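The 2.76 dB improvement quoted in the conclusions (Section 5.7) appears to be the ratio of 17 mV<sub>pp</sub> to 9 mV<sub>pp</sub> expressed in optical dB ($10\log_{10}$). We assume this convention for consistency with optical sensitivity. In electrical dB ($20\log_{10}$), the same ratio is about 5.5 dB. A minimal sketch:

```python
import math


def sensitivity_gain_db(v_ref_pp, v_new_pp, optical_db=True):
    """Sensitivity improvement of v_new_pp relative to v_ref_pp.

    optical_db=True uses 10*log10 (signal proportional to optical power);
    optical_db=False uses the electrical 20*log10 convention.
    """
    factor = 10.0 if optical_db else 20.0
    return factor * math.log10(v_ref_pp / v_new_pp)


print(f"optical dB: {sensitivity_gain_db(17, 9):.2f} dB")
print(f"electrical dB: {sensitivity_gain_db(17, 9, optical_db=False):.2f} dB")
```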

The first experiment showed that the sensitivity of the conventional FE is severely affected by its limited bandwidth. Therefore, in the third experiment, the bandwidth of the SF-Inv is extended by increasing $V_c$ to 0.8 V, compared to $V_c = 0$ in the first two setups. The measurements in this experiment are taken from another die that includes an identical copy of the conventional FE, but with the 50 $\Omega$ output buffer replaced by four interleaved quarter-rate CML latches. An injection-locked oscillator and clock distribution circuitry are also included to provide the required clock for the latches. Tested separately (i.e., without the analog FE), the latches and the clocking circuitry operate up to 12.5 Gb/s with a minimum latch input of 40 mV<sub>pp</sub> for a BER of 10<sup>-12</sup>. However, when the analog FE is included, a maximum data rate of only 8 Gb/s is obtained (even with the increased $V_c$), with an input voltage of 10 mV<sub>pp</sub> for a BER of $10^{-12}$. The BER measurements from this setup are shown in Fig. 5.15 by the line with triangle markers. The proposed peaking technique increases the operating speed by a factor of $1.25\times$ while achieving 1 mV<sub>pp</sub> better sensitivity compared to the wideband conventional design approach. The performance of all measured FEs is summarized and compared in Table 5.2. Despite its better energy efficiency, the conventional wideband FE cannot support 10 Gb/s operation even with the high-power setting used for the proposed FE. Finally, post-layout (extracted) simulations verify that the four interleaved latches introduce less capacitive loading than the 50 $\Omega$ output buffer, so the different loading is not the cause of the lower speed obtained in this setup. Further details about the conventional design with on-chip latches are available in [46].

| Parameter | Block | Proposed | Conventional (limited bandwidth) | Conventional (wide bandwidth) |
|---|---|---|---|---|
| $V_{DD}$ (V) | TIA | 1.2 | 1 | 1 |
| | MA/EMA | 1 | 1 | 1 |
| | Buffer | 1.2 | 1.15 | |
| $P_{DC}$ (mW) | TIA | 6.5 | 3 | 5 |
| | MA/EMA | 19 | 17 | 10 |
| | Buffer | 8 | 7 | |
| $V_C$ (V) | | 0 | 0 | 0.8 |
| Input impedance ($Z_{11}$) | | 90 Ω | 92 Ω | 69.5 Ω |
| **Performance summary** | | | | |
| Data rate (Gb/s) | | 10 | 10 | 8 |
| Sensitivity (mV<sub>pp</sub>) | | 9 | 17 | 10 |
| Power (mW) | | 25.5 | 20 | 15 |
| Diff. output voltage amplitude (mV<sub>pp</sub>) | | 664 | 602 | NA |

Table 5.2: Performance comparison of the three measured FEs.
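In Table 5.2, the Power row equals the sum of the TIA and MA/EMA powers, with the output buffer excluded. Dividing it by the data rate gives the energy per bit. The sketch below reproduces the 2.55 pJ/b listed for this work in Table 5.3. It also shows why the wideband conventional FE, at 15 mW and 8 Gb/s, is more energy-efficient despite its lower speed.

```python
# Table 5.2 entries: (TIA power in mW, MA/EMA power in mW, data rate in Gb/s).
# Output-buffer power is excluded, matching the table's "Power (mW)" row.
FRONT_ENDS = {
    "Proposed": (6.5, 19.0, 10.0),
    "Conventional, limited bandwidth": (3.0, 17.0, 10.0),
    "Conventional, wide bandwidth": (5.0, 10.0, 8.0),
}


def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = 1e-3 J/s / 1e9 b/s = 1e-12 J/b, i.e. pJ/b."""
    return power_mw / rate_gbps


for name, (p_tia, p_ma, rate) in FRONT_ENDS.items():
    total = p_tia + p_ma
    print(f"{name}: {total:.1f} mW, {energy_per_bit_pj(total, rate):.3f} pJ/b")
```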

The input impedances of the FEs described above are also compared in Table 5.2. Simulation results indicate that the limited-bandwidth conventional FE has the largest input impedance, 92 $\Omega$, which is sufficiently low not to introduce a significant mismatch in the presence of the 50 $\Omega$ probe impedance. Further, the variation of input impedance between the different FEs is limited to less than 22% over the entire frequency range of interest. This variation is sufficiently small not to change the conclusions drawn from the voltage-mode sensitivity comparisons.



Fig. 5.16. (a) Bathtub curves measured at 10 Gb/s and PRBS pattern length of 31 (y-axis is shown in log scale) (b) receiver sensitivity as a function of the input PRBS length.

The inset of Fig. 5.15 shows the measured 10 Gb/s single-ended eye diagrams for both the bandlimited conventional (black) and the proposed (white) FEs. For an input voltage set to the receiver's sensitivity limit and a PRBS31 pattern, the two FEs show measured eye widths of 82.4 ps and 82 ps, respectively, and output peak-to-peak voltages of 301 mV<sub>pp</sub> and 332 mV<sub>pp</sub>, respectively, across the scope's 50 $\Omega$ input impedance.

The effect of sampling phase error on the BER is investigated by plotting the bathtub curve for both FEs as shown in Fig. 5.16 (a). In this test, the input voltage is fixed at  $1 \text{ mV}_{pp}$  above the sensitivity level with a PRBS31 pattern. At 10 Gb/s, both FEs show BER better than  $10^{-12}$  even with a sampling time error of  $\pm 10\%$  UI. The widely open eye diagram and bathtub curve obtained in Fig. 5.15 and Fig. 5.16 demonstrate that the introduced peaking (and the resultant GDV) in the proposed FE does not degrade its performance. The effect of changing the length of the input PRBS on the sensitivity is also investigated for both FEs as shown in Fig. 5.16 (b). In both designs, the sensitivity is improved by less than  $1 \text{ mV}_{pp}$  when the PRBS length is reduced from 31 to 7. This indicates that the lower cut-off frequency introduced by the offset network is sufficiently low not to limit the performance of the receiver.

#### 5.6.2 Noise Measurement

To further characterize the sensitivity of the proposed front-end, the noise standard deviation is measured at the receiver output in the absence of an input signal. The total standard deviation $(\sigma_{total})$ is 16 mV<sub>rms</sub>. The scope's own noise $(\sigma_{scope})$, measured with its input disconnected, is a negligible 0.25 mV<sub>rms</sub>. The receiver noise, calculated as $\sigma_{RX} = (\sigma_{total}^2 - \sigma_{scope}^2)^{0.5}$, is therefore 16 mV<sub>rms</sub> [47]. Referring this noise to the input using the gain calculated from the measured eye diagram in Fig. 5.15 gives an input-referred noise voltage of 0.43 mV<sub>rms</sub>, which translates to a sensitivity of 6.1 mV<sub>pp</sub>. The difference between this sensitivity and the value obtained from the BER measurements in Fig. 5.16 is due to the thermal noise contribution of the 50  $\Omega$  resistance of the measurement equipment connected to the TIA's input during the BER measurements [14].
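The noise-to-sensitivity steps above can be reproduced as follows. The sketch assumes Gaussian noise and a BER target of $10^{-12}$. It also assumes that the gain "calculated from the measured eye diagram" is the 332 mV<sub>pp</sub> single-ended output swing divided by the 9 mV<sub>pp</sub> input at the sensitivity limit. That ratio is our inference from Section 5.6.1, not stated explicitly. With these assumptions, the sketch reproduces the 0.43 mV<sub>rms</sub> and 6.1 mV<sub>pp</sub> figures.

```python
import math


def q_for_ber(ber, lo=0.0, hi=20.0):
    """Invert BER = 0.5 * erfc(Q / sqrt(2)) by bisection (Gaussian noise)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > ber:
            lo = mid   # BER still too high -> Q must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma_total = 16.0   # mV_rms, measured at the receiver output
sigma_scope = 0.25   # mV_rms, scope with input disconnected
sigma_rx = math.sqrt(sigma_total**2 - sigma_scope**2)

# Assumed gain: single-ended output swing / input amplitude at sensitivity (V/V).
gain = 332.0 / 9.0
sigma_in = sigma_rx / gain            # input-referred rms noise (mV_rms)
q = q_for_ber(1e-12)
sensitivity_pp = 2.0 * q * sigma_in   # noise-limited sensitivity (mV_pp)
print(f"sigma_in = {sigma_in:.3f} mV_rms, Q = {q:.3f}, sensitivity = {sensitivity_pp:.2f} mV_pp")
```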

The output noise can also be referred to the input by the simulated transimpedance gain to calculate the input-referred noise current as  $i_{n,rms} = 2\sigma_{RX}/Z_{FE} = 1.313 \,\mu\text{A}_{rms}$, where $Z_{FE}$ is the midband value of the FE's amplitude response in Fig. 5.11 (a) and the factor 2 is due to the differential implementation of the FE [48]. The impact of the photodiode capacitance is included in the simulated gain but not in the noise measurements, which account only for the circuit's input capacitance and the pad capacitance. In [25], it has been shown that the input-referred noise current power is linearly proportional to the total input capacitance for a given bandwidth, technology, and shape of the TIA's amplitude response. Therefore, to account for the photodiode capacitance $C_{PD}$, the calculated $i_{n,rms}$ must be scaled by a factor of $\sqrt{(C_{PD} + C_I + C_{Pad})/(C_I + C_{Pad})}$. An input-referred noise current of 1.61 µA<sub>rms</sub> is anticipated for a $C_{PD}$ of 80 fF and a $C_I + C_{Pad}$ of 160 fF estimated from post-layout simulations. This calculation assumes that when the photodiode is connected at the input, the feedback resistor is reduced to maintain a fixed bandwidth and shape of the amplitude response.
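The capacitance scaling can be applied numerically as below. Multiplying the result by $2Q$ with $Q = 7$ gives about 22.5 µA<sub>pp</sub>. This appears consistent with the current sensitivity listed for this work in Table 5.3. The rounded value $Q = 7$ is our assumption.

```python
import math


def scale_noise_for_pd(i_n_rms, c_pd, c_in_plus_pad):
    """Scale an input-referred rms noise current for added photodiode capacitance.

    Assumes noise power is proportional to the total input capacitance at a fixed
    bandwidth and response shape (the result cited from [25]), so the rms current
    scales with the square root of the capacitance ratio.
    """
    return i_n_rms * math.sqrt((c_pd + c_in_plus_pad) / c_in_plus_pad)


i_n_pd = scale_noise_for_pd(1.313, c_pd=80.0, c_in_plus_pad=160.0)  # uA_rms; fF

# Current sensitivity for BER ~ 1e-12, assuming the rounded value Q = 7
# (the exact Gaussian value is about 7.03).
sens_uapp = 2.0 * 7.0 * i_n_pd
print(f"i_n,rms with PD: {i_n_pd:.2f} uA_rms, sensitivity: {sens_uapp:.1f} uA_pp")
```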

#### 5.6.3 Discussion and Comparison to Prior Work

The performance of the proposed FE is compared to other 10 Gb/s high-gain receivers in the literature in Table 5.3. Although electrical measurements are sufficient to prove the concept behind our design, the absence of optical measurements complicates the comparison with the prior art. Therefore, when possible, only electrical measurements are considered for a fair comparison. A good starting point is [10], which consists of an Inv-TIA followed by a three-stage Inv-based Cherry-Hooper voltage amplifier. In this architecture, active interleaving feedback and local positive feedback are applied to extend the bandwidth. The circuit is implemented in a single-ended structure and measured with electrical and optical inputs for various data rates; only the electrical measurements at 10 Gb/s are listed in Table 5.3. The work in [10] is measured in two modes of operation, denoted in Table 5.3 as the best-sensitivity mode and the lowest-power mode (see Fig. 18 in [10]). The average of these two modes shows approximately  $2\times$  better sensitivity and  $2.3\times$  better energy efficiency than the work presented here. This advantage is mainly due to the single-ended structure of [10]. Further, the single-ended implementation of [10] enabled measurements at low supply voltages, which is not possible in this work due to the DC biasing requirements of the differential amplifiers. On the other hand, the proposed design has a much higher output peak-to-peak amplitude at the sensitivity level than [10], which is not optimized for high-gain operation and incurs a significant PP when followed by a practical decision circuit.
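The "approximately 2×" and "2.3×" figures follow from averaging the two operating modes of [10] in Table 5.3. A minimal sketch of the arithmetic:

```python
# Table 5.3 values for [10] (lowest-power and best-sensitivity modes) and this work.
REF10_SENS_MVPP = (5.0, 3.0)
REF10_ENERGY_PJ = (0.6, 1.6)
THIS_SENS_MVPP = 9.0
THIS_ENERGY_PJ = 2.55


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Ratios > 1 mean [10] is better (lower mV_pp sensitivity, lower pJ/b).
sens_ratio = THIS_SENS_MVPP / mean(REF10_SENS_MVPP)      # 9 / 4
energy_ratio = THIS_ENERGY_PJ / mean(REF10_ENERGY_PJ)    # 2.55 / 1.1
print(f"sensitivity ratio: {sens_ratio:.2f}, energy-efficiency ratio: {energy_ratio:.2f}")
```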

The presented receiver shows better energy efficiency than [48], which is implemented in a more advanced technology node, and comparable energy efficiency to [25], which is implemented in the same technology. The combination of a multistage shunt-feedback TIA and the noiseless DFE in [25] results in excellent sensitivity at the cost of more complexity and higher power dissipation in the equalizer, which consumes 74 % of the total power. Therefore, a design that combines the high-gain FE in [25] with our proposed equalization technique, which adds no power dissipation, could significantly improve the energy efficiency of the receiver while maintaining good sensitivity. The work presented here shows voltage sensitivity comparable to that of the limiting amplifier in [13], which applies active interleaving feedback to third-order gain cells. Finally, our work shows the largest output voltage amplitude for an input set to the sensitivity limit, which makes it suitable to drive the subsequent clock and data recovery (CDR) circuit with a negligible power penalty.

| Performance parameter | [13] | [25] | [10] Lowest power | [10] Best sens. | [48] | This work |
|---|---|---|---|---|---|---|
| RX topology | Diff. | Diff. | Sing. | Sing. | Diff. | Diff. |
| Passive inductor | No | No | No | No | Yes | No |
| CMOS tech. (nm) | 130 | 65 | 65 | 65 | 40 | 65 |
| $f_T$ (GHz) | 85 | 150 | 150 | 150 | 250 | 150 |
| Data rate (Gb/s) | 10 | 10 | 10 | 10 | 10 | 10 |
| $C_{PD}$ (fF) | NA | 50 | 60<sup>(2)</sup> | 60<sup>(2)</sup> | 100<sup>(1)</sup> | 60<sup>(2)</sup> |
| PRBS length | 31 | 31 | 7 | 7 | 7 | 31 |
| Sens. (mV<sub>pp</sub>) | 10 | | 5 | 3 | | 9 |
| Sens. (µA<sub>pp</sub>) | | 13 | | | 23.9<sup>(3)</sup> | <u>22.5</u> |
| Output voltage (mV<sub>pp</sub>) | 175 | 400 | 15.85<sup>(4)</sup> | 53.55<sup>(4)</sup> | 136 | 664 |
| Energy efficiency (pJ/b) | 18.9 | 2.3 | 0.6 | 1.6 | 7.5 | 2.55 |

Table 5.3: Performance comparison with published 10 Gb/s receivers.

<sup>(1)</sup> On-chip capacitor is added to consider the effect of the PD junction capacitance.

<sup>(2)</sup> Parasitic capacitance due to probing pad and wiring

<sup>(3)</sup> Calculated from the average input-referred noise current

<sup>(4)</sup> Calculated from measured eye diagrams that are not shown in [10].

#### 5.6.4 Operation at Higher Data Rate

The circuit in Fig. 5.10 is also examined for 20 Gb/s operation with the same simulation setups described in Section 5.5.1. First, the TIA's bandwidth is set to 6 GHz (30% of the targeted data rate) by employing a feedback resistor of 800 $\Omega$. Then, the limited-bandwidth TIA is followed by either a wideband MA or the EMA, one at a time. Both amplifiers have the same values of $I_B$ and $I_F$ and therefore consume the same DC power. The MA has a flat amplitude response with a bandwidth of 18.7 GHz; however, the overall bandwidth of the combined TIA/MA is dominated by the TIA's bandwidth. The EMA, on the other hand, introduces 3.5 dB of amplitude peaking at 10 GHz, which extends the overall bandwidth of the combined TIA/EMA to 10.9 GHz. Fig. 5.17 (a) and (b) show the simulated output eye diagrams for both scenarios.



Fig. 5.17. Simulation results for the 20 Gb/s output eye diagrams when the limited-bandwidth TIA is followed by (a) a wideband MA and (b) the proposed EMA. In (c), the TIA's bandwidth is widened and a wideband MA is employed. The input current is fixed at  $25 \mu A_{pp}$  for all simulations.

The internal eye opening improves by $1.6\times$ when the EMA is employed instead of the wideband MA, demonstrating the capability of the presented technique to restore the targeted bandwidth. The eye diagram in Fig. 5.17 (c) is obtained from the TIA/MA FE after extending the TIA's bandwidth to 13.5 GHz by reducing its feedback resistor to 400 $\Omega$, which achieves an overall bandwidth of 11.8 GHz. Comparing (b) and (c) shows that the presented design technique improves the effective gain compared to its conventional wide-bandwidth counterpart. The performance of the proposed FE at 20 Gb/s in comparison to its conventional counterpart is summarized in Table 5.1.

#### 5.6.5 Operation with Large Input Signal

The presented analysis assumes that the gain cells operate linearly. In reality, the circuit performance is strongly affected by the signal amplitude. As the signal propagates through the cascaded stages, the later gain cells start to saturate as a result of the increased voltage swing. Eventually, these cells act as unity-gain buffers, and, because of the active feedback, the loop gain falls below unity, which in turn reduces the bandwidth. The impact of large input levels on the bandwidth of active-feedback-based structures is observed in [13], and an inverse scaling technique [37] is proposed as a potential solution to the problem. However, inverse scaling complicates the system analysis, especially in the presence of interleaving feedback.

Alternatively, a straightforward automatic gain control similar to that presented in [23] can be employed. The technique has three steps: (1) aggressively reducing the TIA's gain at the cost of introducing severe peaking in its amplitude response; (2) reconfiguring one of the MA stages to act as a low-pass filter that suppresses the TIA's peaking and sets the receiver bandwidth; and (3) increasing the transconductance of the active feedback cell in the remaining MA stages to reduce their gain. In other words, at very high input levels, the TIA and the EMA interchange their roles: the TIA introduces high-frequency peaking that is then suppressed by the subsequent low-bandwidth amplifier. Fig. 5.18 (a) and (b) show the simulated output eye diagrams when the input is set to 1 mA<sub>pp</sub> at 10 Gb/s and 20 Gb/s, respectively. To generate these eyes, the TIA's feedback resistor is reduced to 60 $\Omega$ and the LPFs are de-embedded from the EMA circuit. Despite the 7 dB of peaking in the TIA's amplitude response, the overall FE shows a flat amplitude response and a bandwidth of 12 GHz. The eye is fully open at 10 Gb/s, and at 20 Gb/s the internal eye opening is better than 60% of its maximum value. At both data rates, the eye opening is larger than at the sensitivity level, demonstrating the capability of the circuit to handle large input signals.



Fig. 5.18. Simulation results for the output eye diagram when the input current is set to  $1 \ mA_{pp}$  at (a) 10 Gb/s (b) 20 Gb/s.

# 5.7 Conclusions

A design technique that mitigates the trade-off between gain and bandwidth in CMOS multistage amplifiers has been presented. To improve gain and reduce noise, the transimpedance amplifier is designed with a larger feedback resistor, and its bandwidth limitation is compensated by a follow-on equalizing main amplifier (EMA). The EMA leverages the improved performance of state-of-the-art active-feedback main amplifier designs, with the added benefit of high-frequency peaking. By embedding the equalizer in the gain stage, the overall circuit attains the improved performance of traditional equalizer-based designs while achieving better energy efficiency due to the elimination of the standalone equalizer stage.

Both the conventional and the proposed receiver FEs are implemented in TSMC 65 nm CMOS technology. The electrical measurements at 10 Gb/s show that using the EMA after the limited-bandwidth SF-Inv instead of the traditional wideband MA improves the sensitivity by 2.76 dB, demonstrating the capability of the proposed technique to restore the targeted bandwidth. The presented FE achieves an energy efficiency of 2.55 pJ/bit. Its single-ended output eye diagram has a peak-to-peak amplitude of 332 mV<sub>pp</sub>, which is sufficiently large to drive a subsequent decision circuit with a negligible power penalty. Simulation results also verify that the presented FE functions properly at 20 Gb/s and with large input signals.

# Chapter 6

# Conclusions and Future Work

In this thesis, we have investigated the design of high-speed, area- and power-efficient receiver circuits for short-reach optical links. For that purpose, three main research directions have been pursued. In the first direction, presented in Chapter 3, we studied optical receiver front-ends that are intentionally designed to have a bandwidth much lower than the targeted data rate, and we provided a methodology to accurately calculate the noise performance of these receivers depending on the type of equalizer used. In the second direction, presented in Chapter 4, the power-sensitivity trade-off in the optical receiver was explored to minimize the link's overall power dissipation. Finally, in the third direction, presented in Chapter 5, an inductorless, power-efficient design technique for linear equalization in optical receivers was presented, and a prototype chip was fabricated in 65 nm CMOS technology.

# 6.1 Thesis Highlights

First, we proposed a novel methodology for noise optimization in equalizer-based optical receivers. The proposed notion of effective gain is used to calculate the input-referred noise; it accounts for the gain reduction caused by the introduced ISI and the insufficient settling time in narrow-bandwidth front-ends. The proposed calculation of the noise bandwidths considers how the TIA's noise is processed by the subsequent equalizer. The resulting optimization model allows designers to compare the noise performance of different receiver architectures for a given technology, photodiode capacitance, and data rate. Based on this model, the integrated input-referred noise was derived and compared for front-ends using DFEs, CTLEs, and FFEs. In each case, the quality factor (Q) of the TIA's poles was chosen to optimize the noise performance for the given receiver architecture. The analysis showed that DFEs enable the lowest input-referred noise.

Second, we explored the power-sensitivity trade-off in optical receivers. Traditionally, optical receivers with FET front-ends are designed for optimum noise-limited sensitivity by matching the circuit's input capacitance ($C_I$) to the total parasitic capacitance ($C_D$) at the input node. However, maintaining this capacitive matching rule at high values of $C_D$ leads to excessive power dissipation in the receiver. It also degrades the gain, which increases the power penalty incurred by the voltage amplitude requirements of the decision circuit. In the second research direction, the trade-off between the sensitivity and the power dissipation of the receiver was optimized to reduce the energy per bit of the overall link. Design trade-offs for the receiver, the transmitter, and the overall link were presented, and comparisons were made to study how small (and hence noisy) the receiver can become before its power reduction is offset by the increase in transmitter power. Simulation results showed that energy-efficient links require low-power receivers with an input capacitance much smaller than that needed for noise-optimum performance.
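The capacitive matching rule mentioned above can be recovered from a textbook-style simplification, sketched here only to make the trade-off concrete. The sketch assumes (these assumptions are not taken from Chapter 4) that the FET channel thermal noise dominates, that its input-referred current noise density scales as $(2\pi f)^2 (C_D + C_I)^2 / g_m$, and that $g_m = 2\pi f_T C_I$ at a fixed transit frequency $f_T$, so that the bias power also scales with $C_I$. Feedback-resistor and flicker noise are ignored.

$$\overline{i_n^2}(f) \propto \frac{(2\pi f)^2 (C_D + C_I)^2}{g_m}, \qquad g_m = 2\pi f_T C_I \;\Rightarrow\; \overline{i_n^2}(f) \propto \frac{(C_D + C_I)^2}{C_I}$$

$$\frac{d}{dC_I}\left[\frac{(C_D + C_I)^2}{C_I}\right] = \frac{(C_D + C_I)(C_I - C_D)}{C_I^2} = 0 \;\Rightarrow\; C_I = C_D$$

The minimum is shallow. For example, shrinking the input device to $C_I = C_D/4$ cuts $g_m$ (and, in this model, the power) by a factor of 4, while the noise density rises only by

$$\frac{(C_D + C_D/4)^2/(C_D/4)}{(2C_D)^2/C_D} = \frac{6.25\,C_D}{4\,C_D} \approx 1.56,$$

that is, about 25% more rms noise current. Within this simplified model, this illustrates why a receiver much smaller than the noise-optimum one can reduce the overall link energy.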


Finally, we presented the design and demonstration of a novel linear equalization technique for optical receivers. In this research direction, we showed that receivers with decision feedback equalizers (DFEs) achieve the best noise performance, while receivers based on continuous-time linear equalizers (CTLEs) provide the highest gain. Therefore, considering the receiver's overall sensitivity, CTLE-based receivers are favorable for applications that require high gain in the receiver front-end. Conventionally, CTLEs are designed by cascading several inductively peaked stages, which leads to significant area and power overhead. To overcome these limitations, the peaking is realized by adding a pole in the feedback paths of an active-feedback wideband amplifier. By embedding the peaking in the main amplifier (MA), the front-end matches the sensitivity and gain of conventional CTLE-based receivers with better energy efficiency, since the standalone equalizer stages are eliminated. A receiver front-end (FE) that employs a high-gain narrowband transimpedance amplifier (TIA) followed by the proposed equalizing main amplifier (EMA) was simulated in TSMC 65 nm CMOS technology, targeting 20 Gb/s. The EMA provides high-frequency peaking that extends the FE's bandwidth from 25% to 60% of the targeted data rate. The proposed FE achieves 6 dB higher gain and 2.24 dB better sensitivity than a conventional wideband FE with the same power consumption.
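For the 20 Gb/s target, the relative figures quoted above translate into absolute values as follows. This is a simple conversion that assumes the percentages refer to the FE's -3 dB bandwidth normalized to the data rate, and that the 6 dB refers to a voltage-like (transimpedance) ratio:

$$0.25 \times 20\ \mathrm{GHz} = 5\ \mathrm{GHz} \;\rightarrow\; 0.60 \times 20\ \mathrm{GHz} = 12\ \mathrm{GHz}, \qquad 10^{6/20} \approx 2.0$$

In other words, the EMA widens the FE bandwidth by a factor of about 2.4, and the 6 dB gain advantage corresponds to roughly doubling the transimpedance.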

# 6.2 Potential Areas for Future Work

## 6.2.1 Extension of the Proposed Equalization Technique

### **Optical Measurements**

The equalization technique presented in Chapter 5 was validated through electrical measurements, since the optical interface is not the main focus of the work; it only provides the input signal to the proposed peaking technique. The presented technique is also applicable to the design of multistage voltage amplifiers. Moreover, whether the inverter with shunt feedback is driven by a 50-Ω voltage source or by a photodiode current, its bandwidth is reduced by increasing its feedback resistor. Finally, the proposed modification is in the main amplifier, which has a voltage-to-voltage transfer function. These points make electrical measurements sufficient to validate the concept behind our work. However, optical measurements would validate the technique in the exact context for which it was designed.

### **Monitoring and Self-Adaptation**

The performance of the front-end designed in Chapter 5 is sensitive to process, voltage, and temperature (PVT) variations, because the transconductance and resistor values change across process corners and temperatures. Temperature-dependent biasing and adaptation techniques can be employed to continuously monitor the output eye diagram and set the circuit parameters accordingly, maintaining the best quality of the equalized eye. Adaptation techniques are discussed further in Section 6.2.3.

### **Other Implementations**

The equalizing main amplifier proposed in Chapter 5 is a modified version of the third-order gain stage in [14]. The same peaking can also be realized by inserting a pole in the feedback loop of other active-feedback MA architectures [9] [10] [13] [42]. Combined with non-identical active feedback in the cascaded stages, this approach could improve the group delay variation, the sensitivity to PVT variations, and the capability of restoring the targeted wide bandwidth.

The presented front-end can also be integrated with decision circuits. In such a system, the gain of the front-end can be adjusted to achieve the best overall sensitivity, highlighting the importance of high front-end gain given the voltage amplitude requirements of the decision circuit. The proposed peaking technique can also be implemented in a more advanced technology node to support higher data rates.

## 6.2.2 Design of Receiver Circuits for Higher-Order Modulation Schemes

So far in this document, binary NRZ signaling has been assumed for data encoding. In NRZ, the signal is high for the entire bit period ($T_b$) to transmit a logic "1" and low for the entire bit period to transmit a logic "0". Fig. 6.1 (a) and (b) show the NRZ amplitude levels and eye diagram, respectively. With this signaling scheme, doubling the number of bits transmitted per unit interval requires doubling the number of channels, which in turn doubles both the power consumption and the hardware requirements. Alternatively, the data rate can be increased by encoding more data into the same time frame. This can be achieved by using multi-level signaling, i.e., pulse amplitude modulation (PAM). For example, the PAM-4 scheme shown in Fig. 6.1 (c) has four amplitude levels. Compared to NRZ (also referred to as PAM-2), PAM-4 doubles the channel throughput because each level (symbol) conveys two bits of information (i.e., 20 GBd PAM-4 carries 40 Gb/s, the same bit rate as 40 Gb/s NRZ).

As shown in Fig. 6.1 (d), the four voltage levels in PAM-4 create three stacked eyes, whereas NRZ has only one eye, as shown in Fig. 6.1 (b). Assuming the same transmitter swing in both cases, each PAM-4 eye therefore has only one-third of the vertical opening of the NRZ eye. Consequently, the PAM-4 receiver has a smaller signal-to-noise ratio and is more susceptible to noise. At the same bit rate, the PAM-4 symbol period is twice the NRZ bit period, so the horizontal eye-opening in PAM-4 is expected to be wider than in NRZ. However, transitions between non-adjacent levels in the PAM-4 eye take longer than transitions between adjacent levels in the NRZ eye. Together with deterministic and random jitter, this makes the horizontal eye-opening slightly narrower in the PAM-4 system. The reduced vertical and horizontal eye-openings make receiver design critical in PAM-4 systems.
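The throughput and vertical-eye arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes ideal, equally spaced levels and the same peak-to-peak swing for every scheme, ignoring ISI, noise, and transmitter nonlinearity; the function names are illustrative only.

```python
from math import log2, log10


def bit_rate(symbol_rate_baud: float, levels: int) -> float:
    """Bit rate in b/s: each symbol carries log2(levels) bits."""
    return symbol_rate_baud * log2(levels)


def eye_height_fraction(levels: int) -> float:
    """Vertical opening of each of the (levels - 1) stacked eyes, as a
    fraction of the full peak-to-peak swing (ideal, equally spaced levels)."""
    return 1.0 / (levels - 1)


def eye_penalty_db(levels: int) -> float:
    """Vertical eye-opening reduction relative to NRZ (PAM-2), in dB.
    A voltage ratio, hence 20*log10."""
    return 20.0 * log10(eye_height_fraction(2) / eye_height_fraction(levels))


if __name__ == "__main__":
    print(bit_rate(20e9, 4) / 1e9, "Gb/s")    # 20 GBd PAM-4
    print(round(eye_penalty_db(4), 2), "dB")   # PAM-4 vs. NRZ, same swing
```

Running it prints `40.0 Gb/s` and `9.54 dB`: the one-third eye height corresponds to roughly a 9.5 dB reduction in vertical opening, which the receiver's noise and gain budget must absorb.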

Very low noise, linearity, and wide bandwidth are desirable features of an analog front-end employed in a PAM-4 receiver. Low-voltage operation, robustness against process and temperature variations, and small silicon area are also desirable for PAM-4 receivers to achieve a performance advantage over NRZ receivers. PAM-4 receivers can be seen as an extension of the research directions presented in this thesis, as follows:

- The noise analysis presented in Chapter 3 can be extended to equalizer-based receivers designed for PAM-4 signaling. Since the four voltage levels in PAM-4 create three eyes, investigating the effective gain becomes even more important to accurately calculate the vertical opening and the signal-to-noise ratio of each eye.


Fig. 6.1. NRZ versus PAM-4 signaling schemes: (a) NRZ amplitude levels, (b) NRZ eye diagram, (c) PAM-4 amplitude levels, and (d) PAM-4 eye diagram.

- The investigation of the power-sensitivity trade-off presented in Chapter 4 can be extended to PAM-4 receivers for VCSEL-based optical links. In this investigation, special care must be given to the nonlinearity introduced by the VCSEL, which may increase the transmitter power and limit how far the power dissipation in the receiver can be reduced.
- The front-end presented in Chapter 5 provides a high gain that can mitigate the problem of the reduced eye-opening in PAM-4 receivers. However, due to the presence of active feedback, the linearity of this front-end must be carefully examined before being employed for PAM-4 receivers.

## 6.2.3 Design of Adaptive Receiver Circuits for Optimized Link Performance

Multi-mode fiber (MMF) provides a cost-efficient solution for short-reach optical links up to 300 m. Compared to its single-mode fiber (SMF) counterpart, MMF has a larger core diameter, which enables the use of optical connectors with relaxed tolerances and inexpensive optical components. However, the ISI characteristics and channel pulse response of MMF vary significantly from fiber to fiber and also over time. Due to these variations, some channels in a multi-channel system may require receiver circuits with improved sensitivity or wider bandwidth. Using a single receiver designed for the worst-case link budget would waste power and overdesign the channels that operate in better-than-worst-case conditions. Therefore, adaptability is a crucial feature to add to the receiver in MMF-based links.

Monitoring the quality of the received signal is key to any adaptation technique. The bit-error rate (BER), i.e., the number of bits detected in error relative to the total number of transmitted bits, is considered the ideal performance metric for making adaptation decisions. However, measuring the BER at the receiver side is not possible unless a training sequence is available. Therefore, BER-indicative parameters are usually used for adaptation.

For a non-return-to-zero (NRZ) data pattern, the noiseless transmitted data are represented by the voltage levels $\mu_1$ and $\mu_0$ for logic one and logic zero, respectively, as shown in Fig. 6.2 (a). At the receiver side, the signal is distorted by noise and ISI, which spreads the received signal amplitude over a range of values, as shown in Fig. 6.2 (b). The received signal is no longer confined to two specific voltage levels but instead has a Gaussian distribution around each level (assuming additive Gaussian noise) [49]. The mean values and standard deviations of the received ones and zeros are denoted by $\mu_1$, $\sigma_1$ and $\mu_0$, $\sigma_0$, respectively. With the decision threshold placed at its optimum, the BER is given by

$$BER = Q\left(\frac{|\mu_1 - \mu_0|}{\sigma_1 + \sigma_0}\right) \tag{6.1}$$

where $Q(\cdot)$ is the Q-function, i.e., the tail probability of the standard normal distribution, given by

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-u^2/2}\, du \tag{6.2}$$
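Equations (6.1) and (6.2) are straightforward to evaluate numerically. The sketch below uses the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ and Python's built-in `math.erfc`; the helper names are illustrative, and the Gaussian-noise assumption of (6.1) carries over.

```python
from math import erfc, sqrt


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) of (6.2), via Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))


def q_factor(mu1: float, mu0: float, sigma1: float, sigma0: float) -> float:
    """Argument of the Q-function in (6.1)."""
    return abs(mu1 - mu0) / (sigma1 + sigma0)


def ber(mu1: float, mu0: float, sigma1: float, sigma0: float) -> float:
    """BER of (6.1), assuming Gaussian noise on both levels and an optimum threshold."""
    return q_function(q_factor(mu1, mu0, sigma1, sigma0))


if __name__ == "__main__":
    for q in (6.0, 7.0):
        print(f"Q-factor {q}: BER ~ {q_function(q):.2e}")
```

The loop shows the familiar rule of thumb that a Q-factor of about 6 gives a BER near $10^{-9}$ and a Q-factor of about 7 gives a BER near $10^{-12}$.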

The argument of the Q-function in (6.1), $|\mu_1 - \mu_0|/(\sigma_1 + \sigma_0)$, is known as the Q-factor. It can be measured directly from the received electrical signal without the need for a training sequence. Having this indicative parameter in hand allows us to monitor the transmission performance and make the required adaptation decisions.
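One simple way to obtain the Q-factor, i.e., $|\mu_1 - \mu_0|/(\sigma_1 + \sigma_0)$ from (6.1), without a training sequence is to use the receiver's own decisions as labels. The sketch below is a hypothetical illustration, not one of the cited monitors. It splits sampled voltages at the decision threshold, treats each side as ones or zeros, and computes the means and standard deviations. This is only reasonable at low BER. Otherwise, wrongly classified samples and the truncation of each Gaussian at the threshold bias the estimate.

```python
import random
import statistics


def estimate_q_factor(samples, threshold):
    """Blind Q-factor estimate from voltages sampled at the decision-circuit input.

    Samples above `threshold` are treated as ones, the rest as zeros.
    Assumes a low BER, so few samples fall on the wrong side of the threshold.
    """
    ones = [v for v in samples if v > threshold]
    zeros = [v for v in samples if v <= threshold]
    if len(ones) < 2 or len(zeros) < 2:
        raise ValueError("need at least two samples on each side of the threshold")
    mu1, mu0 = statistics.fmean(ones), statistics.fmean(zeros)
    sigma1, sigma0 = statistics.pstdev(ones), statistics.pstdev(zeros)
    if sigma1 + sigma0 == 0.0:
        return float("inf")  # noiseless samples: the Q-factor is unbounded
    return abs(mu1 - mu0) / (sigma1 + sigma0)


if __name__ == "__main__":
    # Synthetic NRZ samples: levels of +/-0.2 V with 20 mV rms Gaussian noise,
    # so the true Q-factor is 0.4 / (0.02 + 0.02) = 10.
    random.seed(1)
    bits = [random.getrandbits(1) for _ in range(10_000)]
    rx = [(0.2 if b else -0.2) + random.gauss(0.0, 0.02) for b in bits]
    print(estimate_q_factor(rx, threshold=0.0))  # should be close to 10
```

In hardware, the same statistics would come from an eye-opening monitor or a secondary sampler, rather than from stored samples as assumed here.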

Recently, several techniques have been proposed to monitor the eye-opening at the decision circuit input and infer the Q-factor (or, equivalently, the BER) [49] [50] [51] [52]. These techniques differ in accuracy, convergence time, and hardware requirements. For example, the technique in [51] automatically adapts the control signal of an equalizer by examining the probability density function (PDF) of the received data. It aims to minimize the spread of the PDF of the received signal while adding minimal complexity and power overhead. The technique was successfully demonstrated in a wireline receiver fabricated in 65-nm CMOS technology.
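Once a Q-factor (or eye-opening) monitor is available, the simplest adaptation policy is an exhaustive sweep over a discrete control setting. The sketch below is a deliberately simple, hypothetical illustration of that policy. It is not the algorithm of [51] or of any other cited work. The `measure_q` callback stands in for "program the setting, wait for it to settle, read the monitor", and the toy monitor in the usage block is invented purely for demonstration.

```python
from typing import Callable, Iterable, Optional, Tuple


def adapt_by_sweep(codes: Iterable[int],
                   measure_q: Callable[[int], float]) -> Tuple[int, float]:
    """Exhaustive-search adaptation: try each candidate control code, read the
    Q-factor monitor, and keep the code with the largest Q-factor."""
    best_code: Optional[int] = None
    best_q = float("-inf")
    for code in codes:
        q = measure_q(code)  # hardware: apply code, settle, read monitor
        if q > best_q:       # strict '>' keeps the first code on ties
            best_code, best_q = code, q
    if best_code is None:
        raise ValueError("no candidate settings supplied")
    return best_code, best_q


if __name__ == "__main__":
    # Hypothetical toy monitor: residual ISI shrinks as the peaking code
    # approaches the channel-matched value (code 5 here), then grows again.
    noise_rms = 0.02

    def toy_q(code: int) -> float:
        residual_isi = abs(0.15 - 0.03 * code)
        return 0.2 / (noise_rms + residual_isi)

    print(adapt_by_sweep(range(8), toy_q))  # selects code 5
```

A sweep like this is easy to verify but scales poorly with the number of control knobs. This is one reason the published monitors trade off accuracy, convergence time, and hardware cost differently.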



Fig. 6.2. Waveforms and power distributions of (a) noiseless transmitted signal (b) noisy received signal.

Adaptation techniques in the literature need to be carefully compared to select the scheme that best fits our application. The chosen scheme will then be integrated into an optical receiver that self-adapts to variations in the MMF channel to optimize the overall link performance.

## References

- [1] Cisco, "Cisco global cloud index: Forecast and methodology, 2015–2020," 2016.
- [2] Cisco, "The Zettabyte era: Trends and analysis," June 2017.
- [3] J. A. Kash, et al., "Optical interconnects in exascale supercomputers," in 23rd Annual Meeting of the IEEE Photonics Society, Denver, CO, 2010.
- [4] H. Liu, C. F. Lam, and C. Johnson, "Scaling optical interconnects in datacenter networks: Opportunities and challenges for WDM," in *IEEE Symp. on High Performance Interconnects*, Mountain View, CA, 2010.
- [5] J. A. Tatum, *et al.*, "VCSEL-based interconnects for current and future data centers," *J. Lightw. Technol.*, vol. 33, no. 4, pp. 727-732, Feb. 15, 2015.
- [6] B. Razavi, *Design of Integrated Circuits for Optical Communications*. New York: McGraw-Hill, 2003.
- [7] E. Säckinger, "The transimpedance limit," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 8, pp. 1848-1856, Aug. 2010.
- [8] E. Säckinger, *Broadband Circuits for Optical Fiber Communication*. New Jersey: John Wiley & Sons, 2005.
- [9] S. Galal and B. Razavi, "10-Gb/s limiting amplifier and laser/modulator driver in 0.18-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2138-2146, Dec. 2003.
- [10] M. M. P. Fard, O. Liboiron-Ladouceur, and G. E. R. Cowan, "1.23-pJ/bit 25-Gb/s inductor-less optical receiver with low-voltage silicon photodetector," *IEEE J. Solid-State Circuits*, vol. 53, no. 6, pp. 1793-1805, June 2018.
- [11] I. Ozkaya, *et al.*, "A 64-Gb/s 1.4-pJ/b NRZ optical receiver data-path in 14-nm CMOS FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3458-3473, Dec. 2017.
- [12] J. Proesel, C. Schow, and A. Rylyakov, "25Gb/s 3.6pJ/b and 15Gb/s 1.37pJ/b VCSEL-based optical links in 90nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, 2012.
- [13] H. Huang, J. Chien, and L. Lu, "A 10-Gb/s inductorless CMOS limiting amplifier with third-order interleaving active feedback," *IEEE J. Solid-State Circuits*, vol. 42, no. 5, pp. 1111-1120, May 2007.
- [14] S. Ray and M. M. Hella, "A 53 dBΩ 7-GHz inductorless transimpedance amplifier and a 1-THz+ GBP limiting amplifier in 0.13-µm CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 8, pp. 2365-2377, Aug. 2018.
- [15] S. Ray, A. Chowdhury, and M. M. Hella, "Enhancing the stability of broadband amplifiers using third-order nested feedback," in *Proc. IEEE ISCAS*, Florence, 2018.
- [16] E. Säckinger, "On the noise optimum of FET broadband transimpedance amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 12, pp. 2881-2889, Dec. 2012.
- [17] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s transceiver for optical interconnects," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1235-1246, May 2008.
- [18] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s double-sampling receiver for ultra-low-power optical communication," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344-357, Feb. 2013.
- [19] S. Saeedi and A. Emami, "A 25Gb/s 170µW/Gb/s optical receiver in 28nm CMOS for chip-to-chip optical communication," in *IEEE Radio Frequency Integrated Circuits Symp.*, Tampa, FL, pp. 283-286, 2014.

- [20] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 Gb/s 3D-integrated CMOS/silicon-photonic receiver for low-power high-sensitivity optical communication," *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2924-2933, June 2016.
- [21] M. Madani and G. E. R. Cowan, "10 Gb/s optical receiver with continuous-time feed-forward equalization," in *Proc. IEEE 60th Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Boston, MA, 2017.
- [22] D. Abd-elrahman, O. Liboiron-Ladouceur, and G. Cowan, "Low-noise optical receiver front-end using narrow-bandwidth TIA and cascaded linear equalizer," in *Proc. IEEE 60th Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Boston, MA, pp. 735-738, 2017.
- [23] D. Li, *et al.*, "A low-noise design technique for high-speed CMOS optical receivers," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1437-1447, June 2014.
- [24] A. V. Rylyakov, C. L. Schow, and J. A. Kash, "A new ultra-high sensitivity, low-power optical receiver based on a decision-feedback equalizer," in *Proc. Opt. Fiber Commun. Conf. Exhib. (OFC)*, Los Angeles, CA, pp. 1-3, 2011.
- [25] M. G. Ahmed, *et al.*, "A 12-Gb/s -16.8-dBm OMA sensitivity 23-mW optical receiver in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 445-457, Feb. 2018.
- [26] J. Proesel, A. Rylyakov, and C. Schow, "Optical receivers using DFE-IIR equalization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, 2013.
- [27] A. Sharif-Bakhtiar and A. Chan Carusone, "A 20 Gb/s CMOS optical receiver with limited-bandwidth front end and local feedback IIR-DFE," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679-2689, Nov. 2016.
- [28] D. Abdelrahman and G. E. R. Cowan, "Noise analysis and design considerations for equalizer-based optical receivers," *IEEE Trans. Circuits Syst. I, Reg. Papers,* vol. 66, no. 8, pp. 3201-3212, Aug. 2019.
- [29] A. Emami, "Optical interconnects: Design and analysis," in *Proc. Opt. Fiber Commun. Conf. Exhib. (OFC)*, Los Angeles, CA, 2017.
- [30] J. E. Proesel, B. G. Lee, A. V. Rylyakov, C. W. Baks, and C. L. Schow, "Ultra-low-power 10 to 28.5 Gb/s CMOS-driven VCSEL-based optical links [Invited]," *IEEE/OSA J. Opt. Commun. Netw.*, vol. 4, no. 11, pp. B114-B123, Nov. 2012.
- [31] J. E. Proesel, B. G. Lee, C. W. Baks, and C. L. Schow, "35-Gb/s VCSEL-based optical link using 32-nm SOI CMOS circuits," in *Proc. Opt. Fiber Commun. Conf. Exhib. (OFC)*, Anaheim, CA, 2013.
- [32] A. H. Ahmed, A. Sharkia, B. Casper, S. Mirabbasi, and S. Shekhar, "Silicon-photonics microring links for datacenters—Challenges and opportunities," *IEEE J. Sel. Topics Quantum Electron.*, vol. 22, no. 6, pp. 194-203, Nov.-Dec. 2016.
- [33] K. T. Settaluri, C. Lalau-Keraly, E. Yablonovitch, and V. Stojanović, "First principles optimization of optoelectronic communication links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 5, pp. 1270-1283, May 2017.
- [34] K. Lakshmikumar, A. Kurylak, M. Nagaraju, R. Booth, and J. Pampanin, "A process and temperature insensitive CMOS linear TIA for 100 Gbps/λ PAM-4 optical links," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, San Diego, CA, 2018.
- [35] J. E. Proesel et al., "A 32 Gb/s, 4.7 pJ/bit optical link with -11.7 dBm sensitivity in 14-nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1214-1226, 2018.
- [36] P. Dash, "A variable bandwidth, power-scalable optical receiver front-end," *Masters thesis, Concordia Univ.*, *Montreal, Canada, 2013.* Available: https://spectrum.library.concordia.ca/977824. Accessed on Dec. 7, 2020.
- [37] E. Säckinger and W. C. Fischer, "A 3-GHz 32-dB CMOS limiting amplifier for SONET OC-48 receivers," *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1884-1888, Dec. 2000.
- [38] A. S. Ramani, S. Nayak, and S. Shekhar, "A differential push-pull voltage mode VCSEL driver in 65-nm CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers,* vol. 66, no. 11, pp. 4147-4157, Nov. 2019.
- [39] M. Raj, M. Monge, and A. Emami, "A Modelling and nonlinear equalization technique for a 20 Gb/s 0.77 pJ/b VCSEL transmitter in 32 nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1734-1743, Aug. 2016.
- [40] P. Westbergh, J. S. Gustavsson, Å. Haglund, M. Skold, A. Joel, and A. Larsson, "High-speed, low-current-density 850 nm VCSELs," *IEEE J. Sel. Topics Quantum Electron.*, vol. 15, no. 3, pp. 694-703, May-June 2009.

- [41] J. Li, X. Zheng, A. V. Krishnamoorthy, and J. F. Buckwalter, "Scaling trends for picojoule-per-bit WDM photonic interconnects in CMOS SOI and FinFET processes," *J. Lightw. Technol.*, vol. 34, no. 11, pp. 2730-2742, June 2016.
- [42] O. T. Chen, C. Chan, and R. R. Sheen, "Transimpedance limit exploration and inductor-less bandwidth extension for designing wideband amplifiers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 1, pp. 348-352, Jan. 2016.
- [43] Q. Pan, Y. Wang, Y. Lu, and C. P. Yue, "An 18-Gb/s fully integrated optical receiver with adaptive cascaded equalizer," *IEEE J. Sel. Topics Quantum Electron.*, vol. 22, no. 6, pp. 361-369, Dec. 2016.
- [44] F. Radice, M. Bruccoleri, E. Mammei, M. Bassi, and A. Mazzanti, "A low-noise programmable-gain amplifier for 25 Gb/s multi-mode fiber receivers in 28nm CMOS FDSOI," in *Proc. ESSCIRC 41st Eur. Solid-State Circuits Conf. (ESSCIRC)*, Graz, Austria, 2015.
- [45] S. Shahramian, H. Yasotharan, and A. C. Carusone, "Decision feedback equalizer architectures with multiple continuous-time infinite impulse response filters," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 6, pp. 326-330, June 2012.
- [46] C. Williams, D. Abdelrahman, X. Jia, A. I. Abbas, O. Liboiron-Ladouceur, and G. E. R. Cowan, "Reconfiguration in source-synchronous receivers for short-reach parallel optical links," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 27, no. 7, pp. 1548-1560, July 2019.
- [47] R. Ma, M. Liu, H. Zheng, and Z. Zhu, "A 66-dB linear dynamic range, 100-dBΩ transimpedance gain TIA with high-speed PDSH for LiDAR," *IEEE Trans. Instrum. Meas.*, vol. 69, no. 4, pp. 1020-1028, April 2020.
- [48] Y. Chien, K. Fu, and S. Liu, "A 3–25 Gb/s four-channel receiver with noise-canceling TIA and power-scalable LA," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no. 11, pp. 845-849, Nov. 2014.
- [49] S. Ohteru and N. Takachio, "Optical signal quality monitor using direct Q-factor measurement," *IEEE Photon. Technol. Lett.*, vol. 11, no. 10, pp. 1307-1309, Oct. 1999.
- [50] H. Won *et al.*, "A 28-Gb/s receiver with self-contained adaptive equalization and sampling point control using stochastic sigma-tracking eye-opening monitor," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 3, pp. 664-674, 2017.
- [51] D. Dunwell and A. C. Carusone, "Gain and equalization adaptation to optimize the vertical eye opening in a wireline receiver," in *CICC Dig. Tech. Papers*, San Jose, CA, 2010.
- [52] C. Li et al., "A ring-resonator-based silicon photonics transceiver with bias-based wavelength stabilization and adaptive-power-sensitivity receiver," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1419-1436, June 2014.