# Towards the Design of Robust High-Speed and Power Efficient Short-Reach Photonic Links

Christopher Williams

A Thesis

In the Department

of

Electrical and Computer Engineering

Presented in Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy (Electrical and Computer Engineering) at

Concordia University

Montréal, Québec, Canada

July 2019

© Christopher Williams, 2019

#### **CONCORDIA UNIVERSITY**

#### SCHOOL OF GRADUATE STUDIES

This is to certify that the thesis prepared

By: Christopher Williams

Entitled: Towards the Design of Robust High-Speed and Power Efficient Short-Reach Photonic Links

and submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Electrical Engineering)

complies with the regulation of the University and meets the accepted standards with respect to originality and quality.

Signed by the final examining committee:

| Name                        | Role                 |
|-----------------------------|----------------------|
| Dr. Luis Amador             | Chair                |
| Dr. Anthony Chan Carusone   | External Examiner    |
| Dr. Pablo Bianucci          | External to Program  |
| Dr. Rabin Raut              | Examiner             |
| Dr. John Xiupu Zhang        | Examiner             |
| Dr. Glenn Cowan             | Thesis Co-Supervisor |
| Dr. Odile Liboiron-Ladouceur | Thesis Co-Supervisor |

Approved by: Dr. Rastko R. Selmic, Graduate Program Director

nber 3, 2019

Dr. Amir Asif, Dean, Gina Cody School of Engineering and Computer Science

#### ABSTRACT

## Towards the Design of Robust High-Speed and Power Efficient Short-Reach Photonic Links

### Christopher Williams, Ph.D Candidate Concordia University, 2019

In 2014, approximately eight trillion transistors were fabricated every second thanks to improvements in integration density and fabrication processes. This increase in integration and functionality has also enabled system-on-chip (SoC) designs and high-performance computing (HPC). Electrical interconnects presently dominate the very-short-reach interconnect landscape (< 5 cm) in these applications, but this is expected to change. Their downfall will be caused by their need for impedance matching, their limited pin density and their frequency-dependent loss, which leads to intersymbol interference. To address these limitations, researchers have increasingly explored integrated silicon photonics, which is compatible with current CMOS processes and creates many possibilities for short-reach applications.

Many see optical interconnects as the high-speed link solution for applications ranging from intra-data center (~200 m) down to module or even chip scales (< 2 cm). The attractive properties of optical interconnects, such as low loss and multiplexing abilities, are expected to enable the Exascale high-performance computers of the future (capable of $1 \times 10^{18}$ calculations per second). In fact, forecasts predict that by 2025 photonics at the smallest levels of the interconnect hierarchy will be a reality. This thesis presents three novel research projects, all of which work towards increasing robustness and cost-efficiency in short-reach optical links. They address three parts of the optical link: the interconnect, the receiver and the photodiode.

The first topic of this thesis is exploratory work on the use of an optical multiplexing technique, mode-division multiplexing (MDM), to carry multiple data lanes along with a forwarded clock for very short-reach applications. The second topic is a novel reconfigurable CMOS receiver, proposed as a method to map the clock signal to the interconnect lane with the lowest optical crosstalk in an MDM source-synchronous link. The receiver is designed so that the electronic chip adapts to the needs of the optical one. By leveraging the more robust electronic integrated circuit, link solutions can be tuned to meet the needs of photonic chips on a die-by-die basis. The third topic is a novel photodetector which uses photonic grating couplers to redirect vertically incident light to the horizontal direction. With this technique, the light is applied along the entire length of a p-n junction to improve the responsivity and speed of the device. Experimental results for this photodetector at 35 Gb/s are published, showing it to be the fastest all-silicon photodetector reported in the literature at the time of publication.

#### **ACKNOWLEDGEMENTS**

I would like to start by expressing my deepest appreciation to Dr. Glenn Cowan (Concordia University, Montreal, Canada), a professor who gave a struggling undergraduate student a chance when others wouldn't, and who never stopped believing in me. Without you, this dissertation would have never been possible. I am also extremely grateful to Dr. Odile Liboiron-Ladouceur (McGill University, Montreal, Canada) for pushing me to see my full potential. The countless hours of editing and the number of red pens you must have used on my work, I will never be able to repay.

I would like to acknowledge the financial support I received from the Gina Cody Faculty of Engineering at Concordia University, the Fonds de recherche du Québec - Nature et technologies (FRQNT), the Natural Sciences and Engineering Research Council of Canada (NSERC), the CREATE Silicon Electronic-Photonic Integrated Circuits (SiEPIC) program and the Canadian Microelectronics Corporation (CMC).

I would like to thank Professor Lukas Chrostowski of the SiEPIC program (University of British Columbia, Vancouver, Canada) for his enthusiasm in teaching and Jim Quinn of CMC (Kingston, Canada) for his technical contributions and testing equipment. I would like to thank Sheryl Tablan and Maria Fasciano for their administrative support throughout my academic time at Concordia University. I would also like to thank and acknowledge every professor at Concordia University (Montreal, Canada) and the Electrotech program at Dawson College (Montreal, Canada), in particular Nick Markou and Glen Goodale. You have all helped me, in one way or another, get to this point.

I'm extremely grateful to Ted Obuchowicz for the technical support, advice and friendship. I would like to extend my deepest gratitude to Dr. Pouya Valizadeh, for reminding me how interesting university can be. I am also grateful to Alexander Rylyakov for his guidance, understanding and support. I would like to recognize many people for their advice and invaluable contributions, in particular Abdullah Ibn Abbas, Diaa Abdelrahman, Marc-Alexandre Chan, Weihao Ni, Xiangdong Jia, Marjan Madani, Monir Moayedi Pour Fard, Behnam Banan, Reza Nezami, Guowu Zhang, Partha Dash, Rubana Priti, Michael Segev and Michael DiPerna. Don't worry, I am almost done.

I am deeply indebted to many people for my success. My mom, dad, aunts and uncles (shout out to auntie Audrey), extended family and friends; I owe a large portion of this success to you. Oh, my brother Michael too, he's ok. Saving the best for last, I cannot begin to express my gratitude to Edith Groulx-Robert. The tape-outs, paper deadlines, conferences, exams that you put up with, let's hope your investment pays off! I promise, no more degrees are in the plans. Ok, now let's get to it.

## **TABLE OF CONTENTS**

- List of Figures
- List of Acronyms
- Chapter 1 - Introduction
  - Motivation and Problem Statement
  - Source-Synchronous Architecture Using MDM
  - Reconfigurable Optical Receiver
  - Other Work
  - Thesis Objective
  - Claim of Originality
  - Publications and Contributions
  - Thesis Organization
- Chapter 2 - Background and Literature Review
  - Frequency Dependent Loss
  - Mode-Division Multiplexing
  - Source-Synchronous Architectures
  - Modal Crosstalk
  - Photodiodes
- Chapter 3 - Modal Crosstalk in Silicon Photonic Multimode Interconnects
  - Modal Leakage in MDM Interconnects
  - Mode Crosstalk Measurements in the Frequency Domain
  - Experimental Mode Crosstalk in the Time Domain
  - Conclusion
- Chapter 4 - A Source-Synchronous Architecture Using Mode-Division Multiplexing for SiP Interconnects
  - Proposed Architecture
  - Design of the MDM SiP Waveguide
  - Impact of Modal Crosstalk on Data/Clock Signals
  - Modal Time Skew on Data/Clock Signals
  - Impact of Jitter on MDM Source-Synchronous Links
  - Experimental Results
  - Conclusion
- Chapter 5 - Reconfiguration in Source-Synchronous Receivers for Short-Reach Parallel Optical Links
  - Proposed Reconfigurable Architecture
    - Overview
    - Path Selection
    - Analog Front-End
    - Oscillators
    - Clock Distribution
    - Data Path
  - Experimental Results
  - Discussion
  - Conclusion
  - Layout Aware Design Methodology
    - Floorplanning
    - Supply Regulation
    - Crosstalk Reduction Through ESD Path
    - Ground Splitting
    - Substrate Noise Paths
    - Substrate Biasing
    - Packaging Considerations
    - Power Distribution Considerations
    - Supply Routing
    - Post-Layout Simulations
- Chapter 6 - Other Work
  - Diode Junction Design
  - Experimental Results
- Chapter 7 - Conclusion
  - Thesis Highlights
  - Future Work
    - Source-Synchronous Architecture Using MDM
    - Reconfigurable CMOS Circuits
    - Grating-Assisted Si-PD
- References

## **LIST OF FIGURES**

| Figure 1-1 - Illustration of possible HPC node architecture to enable Exascale computing using       |
|------------------------------------------------------------------------------------------------------|
| photonic interconnects [1]                                                                           |
| Figure 1-2 - (a) Conventional source-synchronous optical RX with set clock and data paths; (b)       |
| proposed RX with reconfigurable clock and data paths                                                 |
| Figure 2-1 - Image showing input and output eye diagrams at 10 Gb/s for a backplane                  |
| application. The backplane trace shows a loss of > 20 dB at 5 GHz [49] 16                            |
| Figure 2-2 - (Left) S-parameters of 20 and 30 inch PCB traces showing $> 25$ dB loss at 9 GHz.       |
| (Right) Output eye diagrams of PCB trace for 17 Gb/s signal (a) without and (b) with                 |
| equalization [50]                                                                                    |
| Figure 2-3 – S21 measurements of 5 and 7 mm on-chip electrical interconnect [48] 17                  |
| Figure 2-4 - Lumerical simulation showing cross-over distance between two waveguides. Color          |
| scale indicates normalized optical intensity, while $X$ and $Y$ indicate spatial location. Waveguide |
| structures shown using dotted lines                                                                  |
| Figure 2-5 – Mode-multiplexing technique from [51]                                                   |
| Figure 2-6 – Modes found in a multimode waveguide                                                    |
| Figure 2-7 - Comparison of (a) CDR and (b) clock-forwarded (source-synchronous) receiver             |
| architectures                                                                                        |
| Figure 2-8 - (a) Ideal data transfer for jitter-free source-synchronous operation, (b) clock delayed |
| (skewed) from data by one bit period after transmission and (c) source-synchronous operation         |
| with jitter and skew added                                                                           |
| Figure 2-9 – Block diagram showing time domain response at the output, $y(t)$ , due to input $x(t)$  |
| and delayed input $x(t-\Delta T)$                                                                    |
| Figure 2-10 - 100 MHz sinusoidal jitter on clock and data with (a) 500 ps of delay (5 UI of skew     |
| at 10 Gb/s) and (b) 5 ns of delay (50 UI of skew at 10 Gb/s)27                                       |
| Figure 2-11 – Photon absorption in silicon                                                           |
| Figure 2-12 – Drift and diffusion currents in photodiodes                                            |
| Figure 2-13 - (Top) Separate p-type, intrinsic and n-type doped material at equilibrium, far from    |
| the interface; (Bottom) The interface of all three doped materials showing continuous Fermi-         |
| level                                                                                                |

Figure 2-14: Side view of substrate for (a) shallow n-well, (b) spatially modulated light detector and (c) silicon-on-insulator. All PDs are excited by optical input (red traces) from the fibers (blue cylinders) above each. Photon movement and carrier movements are illustrated for all three Figure 2-15 - Cross-sectional view of a p-i-n photodiode grating coupler on SOI substrate [62]. Figure 3-1 - (a) Directional coupler illustrating the input signal coupling from the add port WG to the bus WG, with incorrectly coupled signal from ADD WG shown as leakage; (b) transmission characteristics of M2 from add WG at the MUX output, with a subplot of transmission across width variation at 1570 nm as an example; (c) crosstalk characteristics of M2 to M1 from add WG at the MUX output, with a subplot of crosstalk across width variation at 1545 nm as an example. Both (b) and (c) are for ADD WG width variations between -10 nm and Figure 3-3 - Simulated tapered MDM MUX transmission for (a) M2 MUX input port to M2 mode in the bus interconnect and (b) M2 MUX input port to M1 mode in the bus interconnect while varying the drop WG width from -10 nm to 10 nm. Subplot in (b) shows crosstalk across Figure 3-4 - Simulated crosstalk using ideal structures (MUX, interconnect, and DEMUX) for interconnect lengths of (a) 100 µm and 250 µm; (b) 750 µm and 1000 µm. FSR is indicated for Figure 3-5 - 1 mm interconnect with (a) +/- 10 nm width variation at input port 2 (add WG); (b) Combined effects of 5 nm and 10 nm width variation at input port 2 (add WG) along with 5 nm Figure 3-6 - Experimental setup for crosstalk measurement of Mode 2 to Mode 1 (M2M1) using Figure 3-7 - Experimental results of wavelength sweeps for (a) 100 µm, (b) 250 µm, (c) 750 µm Figure 3-8 - Comparison between simulated, calculated (model), and experimentally measured FSR values. 50 Figure 3-9 - Experimental results of two identical 1 mm MDM interconnects A and B. ..... 51

| Figure 3-10 - Experimental setup for crosstalk data measurements in the time domain with one      |
|---------------------------------------------------------------------------------------------------|
| PRBS aggressor signal at M2 input port 2                                                          |
| Figure 3-11 - Eye diagram showing impact of modal crosstalk in the time domain for an             |
| aggressor signal from M2 onto a decorrelated signal on M1 for effective crosstalk of (a) -29.2    |
| dB, (b) -24.2 dB, (c)-22.2 dB and (d) -19.2 dB. Inset text indicates input optical power to the   |
| grating couplers for each mode                                                                    |
| Figure 3-12 - Illustration in MATLAB of the resulting eye diagrams for both extremes of the       |
| beat term signs, (a) positive and (b) negative                                                    |
| Figure 3-13 - (a) Graph showing effective crosstalk versus the vertical eye opening (normalized   |
| to the opening at 29.2 dB crosstalk) and the horizontal eye opening (with respect to the UI of an |
| ideal 8 Gb/s bit period); (b) effective crosstalk versus calculated BER from oscilloscope         |
| measurements                                                                                      |
| Figure 3-14- Crosstalk spectrum overlaid onto calculated BER bar graph (approximated and with     |
| BER floor of 10-12) for interconnect lengths of (a) 250 µm and (b) 1 mm                           |
| Figure 4-1 - Proposed MDM architecture [26]                                                       |
| Figure 4-2 - Passive SiP MDM device structure supporting three modes                              |
| Figure 4-3 - Effective index versus waveguide width for TE modes with subset figures showing      |
| waveguide cross-sections with mode spatial distribution                                           |
| Figure 4-4- Experimental results of 2-mode MDM devices with varying interconnect length           |
| versus measured CW crosstalk. Results presented are for best and worst crosstalk found across     |
| working wavelengths (from 1550 nm to 1570 nm)                                                     |
| Figure 4-5 - Theoretical skew between three modes                                                 |
| Figure 4-6 - Microscope images of the SiP 750 µm optical interconnect. Distance between           |
| grating couplers (127 µm) given for scale                                                         |
| Figure 4-7 - Experimental setup for dual data mode source-synchronous operation                   |
| Figure 4-8 - CW measurements highlighting clock traces at best (1560 nm) and worst (1553 nm)      |
| isolated wavelengths with different combinations of mode 1 (M1), mode 2 (M2) and mode 3           |
| (M3) co-propagating data signals. Inset figures show optically forwarded clock on oscilloscope    |
| with histogram jitter measurements                                                                |
| Figure 4-9- Measured RMS jitter for various experimental crosstalk values, showing                |
| exponentially increasing jitter trend                                                             |

| Figure 4-10 - Electrical eye diagrams of data transmission at 1553 nm captured with oscilloscope  |
|---------------------------------------------------------------------------------------------------|
| triggered by the forwarded (a) optical clock on mode 2 and (b) electrical clock bypassing the     |
| DUT                                                                                               |
| Figure 4-11 - Power penalty plots for (a) 1553 nm and (b) 1560 nm wavelengths77                   |
| Figure 5-1 - Proposed reconfigurable receiver architecture for parallel optical links             |
| Figure 5-2 - (a) TIA fan-out configuration using pass-transistor switches to shield CL loading    |
| capacitance from the TIA; (b) TIA fan-out configuration without using switches                    |
| Figure 5-3 - Circuit simulation for design in 5(a) with CMOS pass-transistor switches with        |
| varying width (minimum length) versus bandwidth at the output of the TIA/MS and switch,           |
| normalized to switchless design in 5(b)                                                           |
| Figure 5-4 – TIA capacitive loading for both implementations in Figure 5-2                        |
| Figure 5-5 - Block diagram and circuits of the designed analog front-end (AFE)                    |
| Figure 5-6 – Digital-to-analog control blocks for ILO and injection circuits                      |
| Figure 5-7 - (a) CML ILO structure in each receiver, (b) one delay cell of ILO, and (c) injection |
| circuit for ILO locking                                                                           |
| Figure 5-8 - ILO output "phase 0" showing coarse skew ability using four different injection      |
| configurations                                                                                    |
| Figure 5-9 - ILO output "phase 0" showing fine skew ability using many different injection        |
| configurations                                                                                    |
| Figure 5-10 – Layout of ILO showing differential delay cells, injection blocks and clock buffers. |
|                                                                                                   |
| Figure 5-11 – Simulated differential output phases                                                |
| Figure 5-12 – Analog delay line for clock phase alignment                                         |
| Figure 5-13 - Simulation results for ILO differential delay stage, showing gain and common-       |
| mode rejection ratio across corners                                                               |
| Figure 5-14 - Reconfigurable on-chip clock distribution network between receivers. In this        |
| example, RX(n) is the clock receiver. Sizes indicate W/L transistor ratios used                   |
| Figure 5-15 - Implemented GSGSG transmission line for clock distribution network with metal 4     |
| ground plane and metal 6 signal routing (not to scale)                                            |
| Figure 5-16 - (a) Primary/secondary latch architecture and (b) CML latch circuit                  |
| Figure 5-17 - Extracted simulations of latches showing functionality at 24 Gb/s                   |

| Figure 5-18 - Die photo of 1 mm x 0.7 mm receiver chip in 65 nm CMOS, containing three                   |
|----------------------------------------------------------------------------------------------------------|
| reconfigurable receivers                                                                                 |
| Figure 5-19 - Power consumption of source-synchronous link during experimental verification.             |
| Figure 5-20 - Experimental setup. 103                                                                    |
| Figure 5-21 - ILO clock output                                                                           |
| Figure 5-22 - BER curves for (a) clock input to RX1 and data input to RX2, (b) data input to             |
| RX1 and clock input to RX2. Measurements are for a per latch data speed of 2 Gb/s (8 Gb/s                |
| total) with a clock input amplitude of 194 $\mu A_{pp}$                                                  |
| Figure 5-23 - Bathtub curves for data input to RX1 and clock input to RX2, then clock input to           |
| RX1 and data input to RX2                                                                                |
| Figure 5-24 - Measured bathtub curves of each latch during quarter-rate operation at 8 Gb/s 105          |
| Figure 5-25 - Effect of reference clock input strength on ILO clock jitter                               |
| Figure 5-26 - Effect of reference clock input strength on BER curves. Clock attenuation of 0 dB          |
| corresponds to an input signal of 194 $\mu A_{pp}$ and -4 dB corresponds to 124 $\mu A_{pp}$             |
| Figure 5-27 - Effect of reference clock input strength on bathtub curves. Clock attenuation of           |
| 0 dB corresponds to an input signal of 194 $\mu A_{pp}$ , data input amplitude held at 130 $uA_{pp}$ 107 |
| Figure 5-28 - Layout of the first version of configurable receiver. The receiver had many                |
| problems dealing with inter-receiver crosstalk                                                           |
| Figure 5-29- (a) ESD connection with pads; (b) ESD coupling paths between two I/O pads 113               |
| Figure 5-30 - Separated supply and grounds                                                               |
| Figure 5-31 - Ground separation using SUB2 layer                                                         |
| Figure 5-32 - Substrate isolation using NT_N layer                                                       |
| Figure 5-33 - Current draw from supply of CMOS strong-arm latches (red) and CML latches                  |
| (blue)                                                                                                   |
| Figure 5-34 – (a) Direct versus (b) star interconnect connections                                        |
| Figure 5-35 - Power supply routing using star configuration. Example taken from version two of           |
| chip layout                                                                                              |
| Figure 5-36 – Bonding diagram and actual chip image of packaging using fan out design 120                |
| Figure 5-37 - Supply and ground routing using (a) indirect and (b) direct connections                    |

| Figure 6-1 - Proposed grating-assisted horizontal photodetector showing (a) conceptual              |
|-----------------------------------------------------------------------------------------------------|
| illustration; (b) grating coupler and PiN structure; (c) side view of grating coupler [88] 124      |
| Figure 6-2 - Bandwidth limitations on lateral photodetector junctions                               |
| Figure 6-3 - Top view of proposed Si-PDs in waveguides showing direction of applied light. (a)      |
| PiN structure with large intrinsic region; (b) PiN structure with small intrinsic region and (c) PN |
| junction without intrinsic region                                                                   |
| Figure 6-4 – Illustrated comparisons between proposed photodetector designs                         |
| Figure 6-5 - Top view of proposed multi-finger Si-PD, showing interconnection of PD and             |
| direction of applied light                                                                          |
| Figure 6-6 - Layout (top) and microscope images (bottom) of three SOI based grating-assisted        |
| SI-PDs; (a) variant with large intrinsic region added, (b) variant with three finger design and (c) |
| variant with focusing grating coupler [88]                                                          |
| Figure 6-7 - Experimental DC measurements of the Si-PDs showing (a) dark current along with         |
| photocurrent for an optical input of 0 dBm CW light; (b) responsivity; (c) photocurrent of the Si-  |
| PD variants with a reverse-bias voltage of 8V [88]                                                  |
| Figure 6-8 - (a) Experimental setup for S-parameter measurement; (b) measured S21 OE                |
| frequency response of the selected grating-assisted Si-PD variants; (c) resulting bandwidths for    |
| all measured devices [88]                                                                           |
| Figure 6-9 - (a) Measured electrical eye-diagrams of Si-PD variant 1 (5 um intrinsic width) with    |
| an applied reverse-bias of 20 V; (b) Measured electrical eye-diagrams of Si-PD variant 2 (2 um      |
| intrinsic width) with an applied reverse-bias of 12 V; (c) Measured eye-diagrams of Si-PD           |
| variant 3 (1 um intrinsic width) with an applied reverse-bias of 20 V; (d) Measured eye-diagrams    |
| of Si-PD variant 4 (0.3 um intrinsic width) with an applied reverse-bias of 14 V [88]               |

## LIST OF ACRONYMS

| BER    | Bit Error Rate                                    |
|--------|---------------------------------------------------|
| CMOS   | Complementary Metal-Oxide Semiconductor           |
| CDR    | Clock and Data Recovery                           |
| CML    | Current-Mode Logic                                |
| CMRR   | Common-Mode Rejection Ratio                       |
| CW     | Continuous Wave                                   |
| DAC    | Digital-to-Analog Converter                       |
| DSP    | Digital Signal Processing                         |
| EIC    | Electronic Integrated Circuit                     |
| FOM    | Figure of Merit                                   |
| Gb/s   | Giga-bit per Second                               |
| HPC    | High-Performance Computer                         |
| I/O    | Input / Output                                    |
| ILO    | Injection-Locked Oscillator                       |
| ISI    | Intersymbol Interference                          |
| LVS    | Layout Versus Schematic                           |
| MDM    | Mode-Division Multiplexing                        |
| MOSFET | Metal-Oxide Semiconductor Field Effect Transistor |
| MZM    | Mach-Zehnder Modulator                            |
| PCB    | Printed-Circuit Board                             |
| PD     | Photodetector                                     |
| PIC    | Photonic Integrated Circuit                       |

| PLL   | Phase-Locked Loop                |
|-------|----------------------------------|
| RX    | Receiver                         |
| SDM   | Space-Division Multiplexing      |
| SiP   | Silicon Photonic                 |
| Si-PD | Silicon photodetector            |
| SML   | Spatially Modulated Light        |
| SoC   | System-on-Chip                   |
| SOI   | Silicon-On-Insulator             |
| TB/s  | Tera-Byte per Second             |
| TX    | Transmitter                      |
| UI    | Unit Interval                    |
| WDM   | Wavelength-Division Multiplexing |

"ARE YOU FINISHED SCHOOL YET?"

- Everyone

#### **MOTIVATION AND PROBLEM STATEMENT**

Smaller CMOS technology nodes and improvements in fabrication techniques have enabled circuits to be highly integrated, bringing about the possibility of system-on-chip (SoC) and high-performance computing (HPC). This increase in functionality contributes to continuously rising transfer speeds in short-reach interconnects. Electrical interconnects are one of the most common methods used to answer the input/output (I/O) needs of these applications. For these interconnects, however, the challenge is to keep up with improvements in processing speed [1]. Optical interconnects using silicon photonic integrated circuits (PIC) are one proposed method for enabling scalability to higher interconnect bandwidths, for length scales from the intra-data center range (~200 m) right down to the chip-to-chip (~3 cm) [2] and on-chip (< 2 cm) [3] scales of the future [4], [5].

For the moment, very-short reach electrical interconnects, which are able to handle the traffic on these high-speed links, are the dominant choice. This, however, is expected to change. They will be gradually replaced because of their need for impedance matching, their high channel-loss and limited pin-density [6].

On the other hand, optical interconnects, with attractive properties such as low loss and multiplexing abilities, may be an enabling technology for intra-data center connections and Exascale high-performance computers of the future. Photonic interconnects using silicon-on-insulator (SOI) based waveguides take advantage of the large refractive index difference between the core and the cladding. This index contrast allows for high confinement of light and relatively low loss in the integrated waveguides [7]. Forecasts predict that photonics will be a feasible solution for the lowest levels of interconnections, the board and chip levels, by approximately 2025 [6]. In fact, it was found that the critical length at which it is advantageous to use optical interconnects in place of electrical ones is 1.8 mm (one-tenth the chip edge of an 18 mm chip) at the 22 nm CMOS node [8].

Electronic/photonic co-designed systems offer both the functionality of electronic integrated circuits (EIC) and the bandwidth of optical devices. Several methods have been proposed to support the co-integration of these two technologies, such as front or back end-of-line integration (FEOL and BEOL), flip-chip and hybrid-technology integration [6], [9]. As CMOS technology nodes scale down to allow for higher integration (reaching 5 nm in 2019), system issues related to I/O bandwidth will continue to worsen. For example, the first Exaflop system expected in 2020 (equal to $1 \times 10^{18}$ calculations per second) is predicted to require each of the 100,000 computing nodes of an HPC to have a > 4 TB/s memory link bandwidth, a > 1 TB/s node-to-node interconnect bandwidth and a power limit of < 200 W [5]. The illustration in Figure 1-1 shows a possible Exaflop node architecture using photonic devices, which provide the necessary memory link bandwidth while minimizing power [1].

To explore current and future issues that these new and powerful integrated systems are facing, this thesis investigates the way short reach optical interconnects can improve data transfer efficiency for interconnects at low-end length scales. In particular, three different components of an optical interconnect are studied: 1) the optical link, 2) the receiver and 3) the photodiode.



Figure 1-1 - Illustration of possible HPC node architecture to enable Exascale computing using photonic interconnects [1].

### SOURCE-SYNCHRONOUS ARCHITECTURE USING MDM

Optical multiplexing significantly increases the aggregated throughput of interconnects in transmission lengths ranging from inter- and intra-data center down to intra-chip communication. Techniques such as wavelength-division multiplexing (WDM) using silicon photonic (SiP) waveguides have been studied [10], [11].

An interesting multiplexing dimension is mode-division multiplexing (MDM) which uses orthogonal optical modes as transmission channels [12], [13]. This method has the advantage of using a single laser to modulate all channels, which could lead to lower complexity and power dissipation than WDM [14]. MDM has already shown promise, for example, with eight data channels transmitted simultaneously in [15]. As opposed to a parallel electrical interconnect bus, the optical MDM channels share the same physical space in a waveguide as they propagate. Multiple Gb/s channels can thus share an optical waveguide about 1 µm wide [16].

For data to be transferred from the transmitter to the receiver correctly, the receiver needs to be synchronized with the transmitter, regardless of the channel. The choices for synchronization fall into two categories: 1) clock-recovery and 2) clock-forwarding. Recovering the clock is done using a clock-and-data recovery (CDR) architecture, where transitions in the data channel are used as a synchronizing signal for a local clock signal at the receiver, allowing alignment of the two. The second method, a source-synchronous link where the clock is forwarded along with the transmitted data, has been used extensively in parallel electrical links as a power-efficient means to provide synchronization over short channels [17], [18]. By forwarding the clock to the receiver (RX) on a separate channel, one removes the need for a full clock and data recovery circuit to synchronize the incoming data, saving chip area and power [18]. For example, the CDR implementations in [19-21] require > 26 mW per lane while most source-synchronous implementations in the literature only require a fraction of that [17], [22], [23]. Parallel optical links have also started using clock-forwarding with WDM [24] and space-division multiplexing (SDM) [25]. However, clock forwarding in MDM optical links remains to be assessed. This topic of the thesis looks at using the emerging multiplexing technique MDM for a high-speed source-synchronous communication link, in an effort to investigate novel alternatives to the popular WDM or SDM architectures. Using MDM as an alternative to WDM could reduce power and tuning requirements for the light source. Also, the MDM alternative may reduce the total cross-sectional area of optical waveguides by the inter-waveguide pitch compared to SDM, increasing bandwidth density. It should be noted that we are not necessarily indicating that MDM should replace other optical multiplexing techniques in all applications. Instead, we are interested in the idea of using it in conjunction with other forms of multiplexing, such as WDM, to increase total link throughput.

The limitations of a source-synchronous architecture are determined by the implementation. As stated in [17], the optimum source-synchronous link has the clock-to-data skew limited to a few unit intervals (UI) and the aggregate throughput of the data channel(s) large enough to amortize the power of the dedicated clock channel transmitter and receiver. As will be discussed, it was estimated that a short-reach chip-to-chip or on-chip interconnect ($\sim 2$ cm) satisfies the low skew requirements of a source-synchronous architecture, with a calculated channel-to-channel skew of 0.7 UI at 25 Gb/s [26]. The low skew of this proposed technique indicates that a source-synchronous architecture using short-reach multiplexed optical interconnects is a candidate to replace electrical interconnects.
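To put the quoted skew into absolute terms, the short sketch below converts a skew expressed in unit intervals into picoseconds. Only the 0.7 UI and 25 Gb/s values come from the text; the function names are illustrative and not taken from [26].

```python
def unit_interval_ps(bit_rate_gbps: float) -> float:
    """Duration of one unit interval (one bit period) in picoseconds."""
    return 1e3 / bit_rate_gbps


def skew_ps(skew_ui: float, bit_rate_gbps: float) -> float:
    """Convert a skew given in unit intervals into picoseconds."""
    return skew_ui * unit_interval_ps(bit_rate_gbps)


# Values quoted in the text: 0.7 UI of channel-to-channel skew at 25 Gb/s.
ui = unit_interval_ps(25.0)
skew = skew_ps(0.7, 25.0)
print(f"UI = {ui:.1f} ps, skew = {skew:.1f} ps")
```

At 25 Gb/s one UI lasts 40 ps, so a skew of 0.7 UI corresponds to roughly 28 ps, which is within the "few UI" limit stated above.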

To properly study this topic, it is important to fully understand the problematic aspects of this method. One of the challenges in optically multiplexed links is crosstalk. In MDM-based interconnects, since all modes are transmitted using the same wavelength, crosstalk between channels is coherent in nature and cannot simply be filtered out [27]. Optical crosstalk disturbs the clock's sampling point of the data and causes horizontal and vertical eye closure; reducing crosstalk thus improves data transmission [28]. However, optical crosstalk in MDM links is not easily predicted at design time due to process variation [29], [30]. During our initial experimental measurements, the amount of optical crosstalk seen on each channel varied considerably across wavelengths, and the wavelength with the lowest crosstalk varied considerably from chip to chip. This led to a need to investigate crosstalk across the transmission spectrum to offer insight for system-level design of wideband WDM-MDM compatible receivers.

Experimental work on source-synchronous optical links in [31] led to the observation that jitter on the clock channel was dependent on the mode crosstalk characteristics of the interconnect.

This resulted in the insightful solution to dynamically route the clock and data paths using a configurable receiver, improving signal transmission on a per die basis.

#### **RECONFIGURABLE OPTICAL RECEIVER**

To take full advantage of the proposed source-synchronous architecture, we also improved on the optical receiver. Optical receivers in both [24] and [25], which use a source-synchronous architecture, receive and then distribute the forwarded clock on-chip to the other data receivers, where deskew on each lane is performed. A receiver normally follows the conventional source-synchronous architecture illustrated in Figure 1-2(a). There, the clock receiver's photodetector is hardwired, by means of wire-bonding [32] or integration on the CMOS chip [33], to receive the off-chip clock and distribute it to the other on-chip receivers, while the data receivers are hardwired to receive data from off-chip and a forwarded clock from the designated on-chip clock receiver.



Figure 1-2 - (a) Conventional source-synchronous optical RX with set clock and data paths; (b) proposed RX with reconfigurable clock and data paths.

While photonic integrated circuit (PIC) fabrication techniques are continuously improving, they can still be considered in their infancy compared to the more mature CMOS processes. The PIC structures can be impacted by fabrication variation causing optical loss [29], [34] or parasitic coupling [35], or may require post-fabrication tuning [36]. Our proposed solution to reduce the impact of optical crosstalk in a passive MDM interconnect is to select, on a per-die basis, the channel with the least amount of optical crosstalk at a given wavelength to transmit the sensitive forwarded clock, reducing added clock jitter. This is done using a novel reconfigurable CMOS receiver for parallel optical links. The configurability of the proposed receiver allows the clock signal to be rerouted amongst the other parallel data channels and placed on the optical channel with the lowest amount of crosstalk on a per-die basis. For example, to select the proper channel for clock forwarding in the architecture of Figure 1-2(a), one would need to measure each passive SiP die individually first, then provide a customized wire-bonding solution to map the clock receiver to the lowest crosstalk channel. This would be expensive, time consuming, and impractical for high-volume manufacturing. Instead, we propose the reconfigurable approach in Figure 1-2(b), where the system can accept a clock in any lane of the parallel link by easily reconfiguring a data-ready receiver into a clock-ready one. In this case, automatic self-test routines could be run once during the first use of the module to determine the best configuration for the clock and data channels as a system.
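The per-die selection described above can be sketched as a simple lookup. This is only an illustration of the idea, not the self-test routine of the thesis: it assumes each lane's crosstalk has already been reduced to a single number in dB at the operating wavelength, and the lane names and measurement values are hypothetical.

```python
from typing import Dict


def choose_clock_lane(crosstalk_db: Dict[str, float]) -> str:
    """Return the lane with the lowest (most negative) crosstalk in dB."""
    if not crosstalk_db:
        raise ValueError("no lanes were measured")
    return min(crosstalk_db, key=crosstalk_db.get)


def configure_receivers(crosstalk_db: Dict[str, float]) -> Dict[str, str]:
    """Assign the clock role to the cleanest lane and the data role to the rest."""
    clock_lane = choose_clock_lane(crosstalk_db)
    return {
        lane: ("clock" if lane == clock_lane else "data")
        for lane in crosstalk_db
    }


# Hypothetical per-die measurements (illustrative values, not thesis data).
measured = {"lane0": -18.5, "lane1": -23.0, "lane2": -15.2}
config = configure_receivers(measured)
print(config)
```

A practical self-test might rank lanes by measured clock jitter or bit error rate rather than by a single crosstalk figure; the selection step would stay the same.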

#### **OTHER WORK**

When receiving information over an optical medium, the photodetector (PD) is responsible for performing the optical-to-electrical conversion. This device converts the photons into electrons and holes. CMOS nodes are not ideal for photodetectors because common wavelengths used in telecom optics, such as 1310 nm and 1550 nm, are not absorbed well by silicon. Yet, silicon may be the only material able to support the high volumes of transceivers needed for the future of datacom [4], [37]. Discrete photodetectors are thus normally created with exotic materials on a separate die, where they are wire-bonded to the CMOS chips. This adds cost and capacitive and inductive parasitics to the link. These parasitics lower bandwidth and cause transmission errors due to ISI.

In this thesis, a team-based project will be presented on a novel all-silicon PD in a CMOS compatible SOI process. The proposed all-silicon PD targets 850 nm wavelengths, used in vertical-cavity surface-emitting lasers (VCSELs), which are presently popular in short-reach links due to their yield and ease of manufacturing [38]. Conventional all-silicon PDs suffer from bandwidth and responsivity limitations due to the movement and collection of carriers in the substrate, as will be discussed in Chapter two. This PD borrows grating-coupler design theory from optics, allowing light to be redirected on-chip and absorbed in the horizontal direction, in contrast to the classic vertical direction, which in turn allows customization of the PN junction. The goal of this work is to increase responsivity and bandwidth in all-silicon PDs. In doing so, the resulting design will contribute to higher transfer speeds and lower assembly costs.

#### **THESIS OBJECTIVE**

The objective of this thesis is to create novel electronic and optical blocks for use in high-speed and low-power short-reach links. To achieve this goal, we break down the link and target several problematic areas.

An analysis of the interconnect carrying high-speed data pointed to the issue of frequency dependent loss as an inevitable limitation in electrical links. It is thus of interest to explore alternative methods to transmit data at short distances, targeting the chip-to-chip or future on-chip interconnect environment. It is also recognized that when optics are used, laser power can become a significant portion of the link budget. To reduce this power, by reducing the number of lasers required, it is of interest to explore MDM as a transmission method [14].

With the receiver being one of the key elements of the link, research efforts are also dedicated to this important block. Learning about crosstalk issues in MDM leads to the creation of a new receiver architecture, designed with silicon photonics in mind. The architecture is used to make electronics that can be reconfigured post-fabrication to suit the needs of the SiP interconnect, on a per die basis, by leveraging the more robust electronic chip to optimize the link. This has the possibility of increasing yield on electronic-photonic co-packaged parts, and ultimately reducing the cost of the devices.

Once data is transmitted over an optical link, the photodiode's optical to electrical conversion can pose limitations. Exotic materials and multiple chips can cause the cost to rise and the performance to fall. The objective here is to create a CMOS compatible photodiode using a novel method to exploit known optics theory.

The objectives of this thesis are thus summarized as follows:

- Explore a novel source-synchronous architecture using an emerging optical multiplexing technique, mode-division multiplexing (MDM).
- Investigate mode crosstalk to improve future wideband MDM links.
- Design an innovative reconfigurable CMOS receiver for use in parallel optics to optimize photonic/electronic co-packaged components.
- Create a high-performance all silicon photodetector (Si-PD) for use in 850 nm applications.

#### **CLAIM OF ORIGINALITY**

- MDM crosstalk causes are explored across wavelengths to allow for increased throughput in future wideband WDM-MDM links. Frequency-domain and time-domain measurements using a variable crosstalk method for MDM interconnects are used, demonstrating the effects of optical crosstalk on eye diagrams and showing why minimizing optical crosstalk is essential. This work has been submitted for publication [39].
- We propose a novel source-synchronous architecture using MDM theory. The architecture is exploratory in nature and experimentally tested using fabricated silicon photonic passive structures. We propose wavelength or mode selection techniques to reduce clock jitter [16], [31]. This is the first demonstration of using the MDM architecture for a source-synchronous application.
- Following the observations of the MDM source-synchronous architecture, a novel CMOS receiver for parallel optics is proposed [40]. The architecture allows any receiver to be repurposed as either a clock or data receiver. Its flexibility gives the possibility to optimize the link on a per-die basis, improving the yield of co-packaged parts. Since each receiver can be repurposed using a novel circuit reuse concept, the clock distribution among receivers must be dynamic as well. This is accomplished using a configurable clock distribution driver allowing any clock-configured receiver to properly feed all data-configured receivers on the chip, with complexity and power that scales with the number of receivers added. To the authors' best knowledge, the proposed reconfiguration and clock distribution scheme has not been seen in the literature.

#### PUBLICATIONS AND CONTRIBUTIONS

#### **Provisional Patents Directly Related to This Thesis**

 M. Moayedi Pour Fard, C. Williams, G. Cowan and O. Liboiron-Ladouceur, October 2018, U.S. Provisional Patent 62/567,981 "Photodetector for Detecting Incoming Infrared Light".

#### Peer-Reviewed Journal Articles Directly Related to This Thesis

**J1) C. Williams**, G. Zhang, R. Priti, G. Cowan, O. Liboiron-Ladouceur, "Modal Crosstalk in Silicon Photonic Multimode Interconnects," *Optics Express*, Sept. 2019.

C. Williams: Initial simulations, design and layout of MDM interconnects, derivation of equations, planning and testing of devices, writing of manuscript.

G. Zhang: Simulation of optical devices, derivation of equations, writing of manuscript.

R. Priti: Discussion of methodology, editing of manuscript.

G. Cowan: Supervised project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised project, revision of manuscript.

**J2) C. Williams**, D. Abdelrahman, X. Jia, A. Ibn Abbas, O. Liboiron-Ladouceur, G. Cowan, "Reconfiguration in Source-Synchronous Receivers for Short-Reach Parallel Optical Links," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, issue 7, p. 1548-1560, Jul. 2019.

*C. Williams*: Designed system approach and circuits, organization of team and project, layout of RF path and assembly of top level chip, testing of device and writing of manuscript.

*D. Abdelrahman*: Design and layout of TIA, wrote section on TIA in manuscript, revision of manuscript.

X. Jia: Optimization of latches, system-level verification, revision of manuscript.

A. Ibn Abbas: Took part in initial design of clocking network and ILO, revision of manuscript.

O. Liboiron-Ladouceur: Supervised the project, revision of manuscript.

G. Cowan: Supervised the project, revision of manuscript.

**J3)** M. Moayedi Pour Fard, **C. Williams**, G. Cowan, and O. Liboiron-Ladouceur, "High-speed grating-assisted all-silicon photodetectors for 850 nm applications" *Optics Express*, vol. 25, issue 5, p. 5107-5118, Mar. 2017.

*M. Moayedi Pour Fard:* Designed optical grating coupler for 850 nm, performed measurements on the fabricated device, wrote manuscript.

*C. Williams:* Initial background research, designed photodiode structures, aided in design of grating couplers, layout of all variants and revision of manuscript.

G. Cowan: Supervised the project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised the project, revision of manuscript.

**J4) C. Williams**, B. Banan, G. Cowan and O. Liboiron-Ladouceur, "A Source-Synchronous Architecture Using Mode Division Multiplexing for On-Chip Silicon Photonic Interconnects," *Journal of Selected Topics in Quantum Electronics*, vol. 22, issue 6, Apr. 2016.

C. Williams: Designed MDM structures, tested devices and wrote manuscript.

B. Banan: Supervised testing, revision of manuscript.

G. Cowan: Supervised project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised project, revision of manuscript.

### Peer-Reviewed Conference Papers Directly Related to This Thesis

**C1) C. Williams**, M. Moayedi Pour Fard, G. Cowan, and O. Liboiron-Ladouceur, "An all-silicon photodetector for 850 nm wavelength applications," *Integrated Photonics Research Conference*, (*Invited talk, presenting author*), Jul. 2018.

C. Williams: Initial background research, designed photodiode structures, layout of all variants and wrote manuscript.

*M. Moayedi Pour Fard*: Designed optical grating coupler for 850 nm, performed measurements on the fabricated device and revision of manuscript.

G. Cowan: Supervised the project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised the project, revision of manuscript.

**C2)** M. Moayedi Pour Fard, **C. Williams**, G. Cowan, and O. Liboiron-Ladouceur, "A 35 Gb/s silicon photodetector for 850 nm wavelength applications," *IEEE Photonics Conference (IPC)*, paper IW1B.1, Oct. 2016.

*M. Moayedi Pour Fard:* Designed optical grating coupler for 850 nm, performed measurements on the fabricated device, wrote manuscript.

*C. Williams:* Initial background research, designed photodiode structures, aided the design of the grating couplers, layout of all variants, and revision of manuscript.

G. Cowan: Supervised the project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised the project, revision of manuscript.

**C3) C. Williams**, B. Banan, G. Cowan, and O. Liboiron-Ladouceur, "Demonstration of Mode-Division Multiplexing for On-Chip Source-Synchronous Communications," *Asia Communication and Photonics Conference*, Nov. 2015. (Nominated for best student paper)

C. Williams: Designed MDM structures, tested devices and wrote manuscript.

B. Banan: Supervised testing, revision of manuscript.

G. Cowan: Supervised the project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised the project, revision of manuscript.

**C4) C. Williams**, B. Banan, G. Cowan, and O. Liboiron-Ladouceur, "Source-synchronous optical link using mode-division multiplexing," *Group IV Photonics*, Aug. 2015. (Poster presentation)

C. Williams: Designed MDM structures, tested devices and wrote manuscript.

B. Banan: Supervised testing, revision of manuscript.

G. Cowan: Supervised the project, revision of manuscript.

O. Liboiron-Ladouceur: Supervised the project, revision of manuscript.

#### Patents Not Directly Related to This Thesis

1) C. Williams, M. Ahmed, A. Rylyakov, R. Younce, Y. Liu, R. Ding and A. Ahmed, Jan. 1, 2019, U.S. Patent 10,168,596 "Optical Waveguide Modulator".

#### Peer-Reviewed Journal Papers Not Directly Related To This Thesis

1) M. Ahmed, T. Huynh, C. Williams, Y. Wang, R. Shringarpure, R. Yousefi, J. Roman, N. Ophir, A. Rylyakov, "34 Gbaud Linear Transimpedance Amplifier for 200 Gb/s DP-16QAM Optical Coherent Receivers," *Journal of Solid-State Circuits*, Mar. 2019.

#### Peer-Reviewed Conference Papers Not Directly Related To This Thesis

1) Y. Ma, C. Williams, M. Ahmed, A. Elmoznine, D. Lim, Y. Liu, R. Shi, T. Huynh, J. Roman, A. Ahmed, L. Vera, *et al.*, "An All-Silicon Transmitter with Co-Designed Modulator and DC-Coupled Driver," *Optical Communication Conference (OFC)*, Mar. 2019.

**2)** A. Ahmed, D. Lim, A. Elmoznine, Y. Ma, T. Huynh, **C. Williams**, L. Vera, Y. Liu, R. Shi, M. Streshinsky, A. Novack, *et al.*, "A 6 V Swing 3.6% THD >40 GHz Driver with 4.5x Bandwidth Extension for a 272 Gb/s Dual-Polarization 16-QAM Silicon Photonic Transmitter", *ISSCC*, Feb. 2019.

**3)** A. Novack, M. Streshinsky, T. Huynh, T. Galfsky, H. Guan, Y. Liu, Y. Ma, R. Shi, A. Horth, Y. Chen, A. Hanjani, J. Roman, Y. Dziashko, R. Ding, S. Fathololoumi, *et al.*, "A Silicon Photonic Transceiver and Hybrid Tunable Laser for 64 Gbaud Coherent Communication," *Optical Communication Conference (OFC) post deadline session*, Mar. 2018.

**4)** M. Ahmed, T. Huynh, C. Williams, Y. Wang, R. Shringarpure, R. Yousefi, J. Roman, N. Ophir, A. Rylyakov, "A 34 Gbaud Linear Transimpedance Amplifier with Automatic Gain Control for 200 Gb/s DP-16QAM Optical Coherent Receivers", *Optical Communication Conference (OFC)*, Mar. 2018.

#### THESIS ORGANIZATION

This thesis is divided into three topics, each of which is part of the same theme of robust short-reach optical communications. After an introductory chapter, the second chapter gives the necessary background information on all topics covered in this thesis, along with a literature review of work done by others in the field. Chapter three presents the first topic of the work, MDM crosstalk and its causes. It shows some of the mechanisms which shape the output spectrum and explains why crosstalk is hard to predict in fabricated devices. Chapter four discusses the second topic, an experimental source-synchronous architecture using mode-division multiplexing. Chapter five presents a reconfigurable optical receiver for use with parallel optics. This chapter, which shows the design of the CMOS receiver along with experimental results, also discusses the layout-aware design methodology used to design the CMOS chip. Chapter six discusses work carried out on a novel all-silicon photodetector (Si-PD). Chapter seven contains conclusions of the work as well as a discussion of possible future research directions.

# CHAPTER 2 - BACKGROUND AND LITERATURE REVIEW

This chapter will discuss the background theory for the subjects discussed in this thesis. It will start first with the reason optical interconnects were chosen, leading into topics such as receiver architectures and photodiode theory.

#### FREQUENCY DEPENDENT LOSS

As CMOS technologies get smaller, transistors get faster due to several factors. One of these is that carriers have a shorter distance to travel along the channel between source and drain. Another is that smaller gates contribute smaller loading capacitance, allowing for reduced time constants. One problem with devices getting smaller is that interconnections to these devices must also reduce in size if the goal is to maintain a high packing density. This results in higher parasitic capacitance and crosstalk as the conductors get closer together. These smaller wires also contribute to higher parasitic resistance for locally routed interconnects and higher parasitic inductance for globally routed interconnects. These parasitic elements all contribute to lower bandwidth and an increase in ISI of the link. Repeaters can reduce total wire delay for on-chip applications by breaking longer wires into shorter ones; however, this results in area and power penalties [41], [42].

As data rates increase and the spectral content moves to higher frequencies, the electromagnetic laws described by Maxwell's equations cause the current density distribution to be pushed out of the center of the conductor and towards the edges of the wire, a phenomenon referred to as the skin effect [43], [44]. This causes the effective cross-section of the electrical interconnect to shrink further as frequency content moves higher, leading to higher resistance. In addition, wires, ground planes or substrates in the vicinity also lead to other complex interactions, collectively referred to as proximity effects [45].
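As a rough illustration of the skin effect, the sketch below evaluates the standard skin-depth expression $\delta = \sqrt{\rho / (\pi f \mu)}$ for a good conductor. The resistivity is an assumed room-temperature value for bulk copper; it is not a parameter taken from the interconnects cited in this chapter, and on-chip wires may differ.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, in H/m


def skin_depth_m(frequency_hz: float, resistivity_ohm_m: float,
                 relative_permeability: float = 1.0) -> float:
    """Skin depth of a good conductor: sqrt(rho / (pi * f * mu))."""
    mu = MU_0 * relative_permeability
    return math.sqrt(resistivity_ohm_m / (math.pi * frequency_hz * mu))


RHO_COPPER = 1.68e-8  # ohm-metre; assumed bulk copper at room temperature

for f_ghz in (1, 5, 10):
    depth_um = skin_depth_m(f_ghz * 1e9, RHO_COPPER) * 1e6
    print(f"{f_ghz:>2} GHz: skin depth = {depth_um:.2f} um")
```

Under these assumptions the skin depth falls below one micrometre at 10 GHz, and it shrinks with the square root of frequency, which is why conductor resistance keeps rising as the spectral content moves higher.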

In the literature, it is clear that loss becomes large as interconnects get longer or data rates rise. To deal with frequency dependent loss, electrical interconnects use equalization [46] or several shorter interconnects along with repeaters to propagate a high-speed signal [47]. These solutions can be energy inefficient and will increase routing complexity [48]. For example, Figure 2-1 shows a backplane interconnect application from [49], with input and output eye diagrams over a 30 inch trace with a loss of > 20 dB at 5 GHz. From [50], Figure 2-2 shows a loss of 25 dB over a 20 inch printed circuit board (PCB) trace at 9 GHz, with eye diagrams demonstrating the output of the trace with and without electronic equalization. In [48], a loss of 35 dB at 10 GHz is reported for a 7 mm on-chip interconnect, as shown in Figure 2-3.



Figure 2-1 – Image showing input and output eye diagrams at 10 Gb/s for a backplane application. The backplane trace shows a loss of > 20 dB at 5 GHz [49].



Figure 2-2 - (Left) S-parameters of 20 and 30 inch PCB traces showing > 25 dB loss at 9 GHz. (Right) Output eye diagrams of PCB trace for 17 Gb/s signal (a) without and (b) with equalization [50].



Figure 2-3 – S21 measurements of 5 and 7 mm on-chip electrical interconnect [48].
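To give a feel for the loss figures quoted above, the sketch below converts a channel loss in dB into the fraction of signal amplitude and power that survives. The three loss values are those stated in the text; the helper names are illustrative.

```python
def voltage_ratio(loss_db: float) -> float:
    """Fraction of the signal amplitude remaining after a loss given in dB."""
    return 10 ** (-loss_db / 20)


def power_ratio(loss_db: float) -> float:
    """Fraction of the signal power remaining after a loss given in dB."""
    return 10 ** (-loss_db / 10)


# Loss values quoted in the text (the first one is a lower bound, "> 20 dB").
examples = [
    ("30 inch backplane trace at 5 GHz [49]", 20.0),
    ("20 inch PCB trace at 9 GHz [50]", 25.0),
    ("7 mm on-chip interconnect at 10 GHz [48]", 35.0),
]

for label, loss_db in examples:
    print(f"{label}: amplitude x{voltage_ratio(loss_db):.4f}, "
          f"power x{power_ratio(loss_db):.5f}")
```

A 20 dB loss leaves one tenth of the amplitude at that frequency, and 35 dB leaves less than 2 %, which is the gap that equalization or repeaters must make up.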

One proposed solution to this problem is using optical interconnects at the on-chip or chip-to-chip scales, which do not show the same frequency dependent loss characteristics as electrical interconnects. The work in this thesis uses this fact as a starting point and investigates one type of optical multiplexing technique, mode-division multiplexing, as a method of high-speed data transfer at these short distance scales.

#### **MODE-DIVISION MULTIPLEXING**

When solving the wave equations for a waveguide (thus applying boundary conditions), valid solutions are referred to as guided modes. There are a finite number of guided modes that allow a wavelength to propagate with minimal loss. There are an infinite number of lossy modes, and as their name suggests, these modes have a large amount of leakage. If these modes are excited, the energy will quickly be lost in the cladding.

The wave equation solutions thus depend on the sizing of the waveguide. In fabrication, we are usually limited to the waveguide heights supported (90 nm and 220 nm for example in the chosen technology), so the waveguide width is used as a design variable. A difference in the refractive index between the waveguide and the cladding is what allows the wave to be guided. If the waveguide's index is higher than the exterior (oxide or air) then total internal reflection is possible (and thus guiding).
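The condition for total internal reflection can be made concrete with the critical angle, $\theta_c = \arcsin(n_{clad}/n_{core})$, measured from the normal to the interface. The sketch below uses a simple ray picture, and the refractive indices are typical assumed values near 1550 nm (silicon about 3.48, silicon dioxide about 1.44, air 1.0) rather than values given in this thesis.

```python
import math


def critical_angle_deg(n_core: float, n_clad: float) -> float:
    """Critical angle for total internal reflection, in degrees from the normal."""
    if n_clad >= n_core:
        raise ValueError("guiding requires n_core > n_clad")
    return math.degrees(math.asin(n_clad / n_core))


# Assumed indices near 1550 nm (illustrative, not taken from the thesis).
N_SILICON = 3.48
N_OXIDE = 1.44
N_AIR = 1.0

print(f"Si/SiO2: {critical_angle_deg(N_SILICON, N_OXIDE):.1f} degrees")
print(f"Si/air:  {critical_angle_deg(N_SILICON, N_AIR):.1f} degrees")
```

The large index contrast gives a small critical angle, so rays over a wide range of incidence angles remain confined to the core.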

Coupling of light between two waveguides is an important phenomenon in photonics. If we place two waveguides next to each other, energy transfer begins to occur. The distance between the two waveguides, the effective index of each, and the length over which they stay in that proximity will determine how much energy is transferred. The transfer of energy is periodic, meaning that the optical energy will move between waveguides as shown in Figure 2-4. This image is a top view of a directional coupler, and shows the energy found in two waveguides in close proximity to each other. The color scale indicates the normalized optical intensity. The coupling length is the length associated with the peak-to-peak transfer distance for a specific wavelength in a particular waveguide, at a specified gap.



Figure 2-4 – Lumerical simulation showing cross-over distance between two waveguides. Color scale indicates normalized optical intensity, while *X* and *Y* indicate spatial location. Waveguide structures shown using dotted lines.

Figure 2-5 from [51] demonstrates the functionality of the mode-multiplexer. The transmitter sends out three optical signals, of equal wavelength (1550 nm), into three separate waveguides of equal width (450 nm). Within these waveguides only a single mode is present, the fundamental. Recall that the number of modes present is determined by the wavelength and the waveguide width. These waveguides are then brought within close proximity to a microring resonator, which results in light coupling between the two structures. Design parameters of the ring, such as the gap between the two structures and the radius, will create a resonance effect whereby only a
limited band of wavelengths can be supported and thus coupled. The same is true on the upper side of the microrings, where the gap and coupling length determine if energy can be transferred between the structures. For coupling to occur, the effective index of the mode in the transport waveguide (450 nm, 930 nm or 1.41  $\mu$ m) must equal the effective index of the microrings. In the figure, the first case shows that the single-mode microring will couple to the TE<sub>0</sub> mode in a 450 nm waveguide. In the middle case, the same single-mode microring will couple to the second-mode, TE<sub>1</sub>, given another set of design parameters (coupling distance, gap, temperature, etc.). In theory, this can continue for as many modes as desired, providing that the parameters of both the microring and transport waveguide are altered accordingly.



Figure 2-5 – Mode-multiplexing technique from [51].

In order for mode coupling to occur without continuously adding data on the same mode, the behavior of the mode in the waveguide must be understood. As the waveguide increases in size, the lower modes get spread out, and the effective index associated with each mode increases. This is captured in Figure 2-6 from [51], which illustrates the effective index of each TE mode as the waveguide width is increased. As can be seen, with the width of the waveguide fixed, each mode will be experiencing a different effective index.



Figure 2-6 – Modes found in a multimode waveguide.

Any of these optical modes can be used to transmit useful information. Using this idea, a concept from electrical links, clock forwarding, was applied to this new optical multiplexing method and an investigation was done on the challenges associated with the proposed architecture. Sending a clock associated with the transmitter using one of the optical channels has interesting implications on the receiver design, which will be discussed in the next section.

Packing density is an important metric, especially for an integrated chip. Where chip real estate is precious, wasted space results in higher costs or unneeded routing complexity. When two waveguides are within close proximity, a super-mode is created and analysis can be done to find the distance over which periodic energy transfer will occur, as in [7]. This motivates bringing the waveguides as close together as possible, but not so close that they suffer strong coupling effects between them.

Assuming an SDM application with waveguide widths of 500 nm, an eigenmode solver (Lumerical's MODE) was used to find coupling strength between neighboring waveguides. The assumption is that if a waveguide is routed for 2 cm for on-chip applications, the coupling length should be at least an order of magnitude greater. This works out to about -8 dB of optical crosstalk over the 2 cm. It was found that this results in a minimum distance between parallel single mode waveguides of 1.2 µm, with a greater pitch required for longer distances. This would result in a cross-sectional width of 3.9 µm for three single-mode waveguide channels over 2 cm (three 500 nm waveguides with a pitch of 1.2 µm). Comparing this to an MDM implementation, a three-mode waveguide would measure approximately 1.5 µm. For multiple MDM waveguides running in parallel, this would require a pitch of 1.15 µm between each. As the number of waveguides increases, the cross-sectional width can become a significant amount of chip area. In these cases, the amount of area saved on waveguide pitch will allow for higher integration.
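The width comparison above can be sketched in a few lines of Python. This is only an illustration: the helper name `total_width_um` is hypothetical, and the "pitch" is treated as the edge-to-edge spacing between neighboring waveguides, an assumption chosen because it reproduces the 3.9 µm figure quoted in the text.

```python
def total_width_um(n_waveguides: int, waveguide_width_um: float, spacing_um: float) -> float:
    """Cross-sectional width of n parallel waveguides.

    Assumes the quoted pitch is the edge-to-edge spacing between neighbors
    (an interpretation that reproduces the 3.9 um figure in the text).
    """
    if n_waveguides < 1:
        raise ValueError("at least one waveguide is required")
    return n_waveguides * waveguide_width_um + (n_waveguides - 1) * spacing_um


# Three single-mode (SDM) channels: 500 nm wide, 1.2 um apart.
sdm_width = total_width_um(3, 0.5, 1.2)

# One three-mode (MDM) waveguide carrying the same three channels,
# using the approximate 1.5 um width quoted in the text.
mdm_width = total_width_um(1, 1.5, 1.15)

print(f"SDM, 3 channels: {sdm_width:.2f} um")
print(f"MDM, 3 channels: {mdm_width:.2f} um")

# Scaling to four MDM waveguides (12 channels) against 12 SDM waveguides.
print(f"SDM, 12 channels: {total_width_um(12, 0.5, 1.2):.2f} um")
print(f"MDM, 12 channels: {total_width_um(4, 1.5, 1.15):.2f} um")
```

The same helper can be reused for other channel counts to see how the saved area grows with the number of parallel waveguides.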

#### SOURCE-SYNCHRONOUS ARCHITECTURES

One method of clock recovery is to use a clock and data recovery (CDR) architecture, where transitions in the data channel are used to synchronize a locally generated clock signal at the receiver, allowing alignment of the clock and data. One implementation of a CDR is illustrated in Figure 2-7(a), where the oscillator used on the receiver side is phase-adjusted to the data using a larger CDR control loop. The second method, clock forwarding, uses a reserved transmission channel, in addition to the data channel(s), to allow the transmitter to send the clock along with the data, as shown in Figure 2-7(b).



Figure 2-7 - Comparison of (a) CDR and (b) clock-forwarded (source-synchronous) receiver architectures.

In general, a source-synchronous link can be less complex and power-hungry than a CDR approach [18]. They also possess an important characteristic related to the jitter tolerance of the receiver. Jitter from the transmitter-side clock is unavoidably mixed with the data stream. By using a clock signal at the receiver that is derived from the same transmitter-side clock as the data, the receive-side clock jitter is correlated with that of the data stream without having to directly detect data jitter [28]. A source-synchronous architecture, however, has overhead power due to the added clock receiver and an extra interconnect lane needed to transmit the clock that must be taken into consideration. For this reason, the source-synchronous approach is often used in parallel links where the overhead power from the clock receiver is amortized over several lanes [52].

Jitter is defined as the deviation of a signal's actual zero-crossing from its ideal zero-crossing location. This deviation can be caused by thermal noise in circuits or deterministic phenomena. Jitter is unavoidably mixed with the data stream, and thus the jitter of the transmitted data and the forwarded clock are correlated. For data rates of 10 Gb/s and higher, the main source of noise is the power supply [25]. By using a clock signal at the receiver that originated from the same transmitter as the data, the clock will also contain jitter which is correlated to that of the data stream [28]. This allows the receiver-side clock to track the data edges and can result in improved data reception.

Skew is the time difference between multiple signals on different signal paths, due to different transmission distances or speeds. For the electrical domain example, the common PCB trace comes to mind where signals on a longer trace will arrive after signals on a shorter trace. This is due to the different path lengths, as electromagnetic radiation has a finite propagating speed. In the optical domain, different modes propagate at various velocities, v, depending on their effective index of refraction given by (1).

$$v = \frac{c}{n_{eff}} \tag{1}$$

Where c is the speed of light in a vacuum and  $n_{eff}$  is the effective refractive index. In the context of mode multiplexing in a source-synchronous link, skew has been identified as a possible issue, particularly due to the forwarded clock at longer distances.
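To make the link between (1) and skew concrete, the sketch below computes the arrival-time difference between a clock mode and a data mode over a fixed waveguide length. The effective index values, the 2 cm length, and the function names are hypothetical placeholders chosen for illustration and are not taken from the thesis; the velocity model simply follows (1).

```python
C_M_PER_S = 299_792_458.0  # speed of light in a vacuum


def propagation_delay_s(length_m: float, n_eff: float) -> float:
    """Time of flight over length_m for a mode travelling at v = c / n_eff, as in (1)."""
    return length_m * n_eff / C_M_PER_S


def skew_s(length_m: float, n_eff_clock: float, n_eff_data: float) -> float:
    """Data arrival time minus clock arrival time over the same physical path."""
    return propagation_delay_s(length_m, n_eff_data) - propagation_delay_s(length_m, n_eff_clock)


# Hypothetical example values (not from the thesis).
length_m = 0.02       # 2 cm on-chip route
n_eff_clock = 2.8     # assumed effective index of the clock mode
n_eff_data = 2.6      # assumed effective index of a data mode
bit_period_s = 1 / 10e9  # 10 Gb/s unit interval

skew = skew_s(length_m, n_eff_clock, n_eff_data)
print(f"skew = {skew * 1e12:.2f} ps = {skew / bit_period_s:.3f} UI")
```

A negative result means the data mode arrives before the clock mode. Because the skew scales linearly with length, doubling the route doubles the skew in unit intervals.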

Whether they are different physical lines (space-division multiplexing or electrical interconnects), different wavelengths (WDM interconnects) or different modes (MDM interconnects), the clock and data both travel on different channels. This introduces a difference in arrival times between the clock and the data due to a difference in the propagation velocity. As the skew increases for a given data channel, the correlation between the clock and the data will reduce. As a basis, let us consider the case in Figure 2-8(a) of an ideal, jitter-free clock with velocity and length-matched clock and data paths. Here the clock edge that launches the data is the same one that latches it in on the receiver end. If the path that the data travels is now increased relative to the clock by exactly one bit, then the clock edge that launched the bit will no longer be the same one that receives the bit, as illustrated in Figure 2-8(b). Different path lengths here are equivalent to differences in mode or wavelength propagation speed in optical waveguides. In Figure 2-8(c), jitter from the transmitter is added to both the clock and data signals with unequal path lengths. Different amounts of jitter are present on each bit at the transmitter, and are drawn to be correlated to the jitter on the clock. On the receiving side in (c), the data that was delayed by one bit now has less jitter correlation with the clock signal that is clocking it in. If the jitter frequency is low compared to the bit rate, then edges of neighbouring bits will be relatively similar and thus will not greatly affect transmission. If, however, the jitter frequency is high compared to the bit rate, then jitter correlation between neighbouring bits becomes low (as is the case in Figure 2-8(c)).



Figure 2-8 - (a) Ideal data transfer for jitter-free source-synchronous operation, (b) clock delayed (skewed) from data by one bit period after transmission and (c) source-synchronous operation with jitter and skew added.

As shown in [54], differential jitter is the difference between data and clock jitter. The differential jitter term reduces to zero in the ideal case where both are equal and no delays are present, in which case jitter tracking is perfect. Using the block diagram in Figure 2-9 to model this difference of arrival times, the time domain response of the system y(t) is shown in (2). This is the result of the combination of a path with no delay, x(t), and a path with a delay of  $\Delta T$ . Next, a Fourier transform is performed on the signal to convert to the frequency domain, resulting in (3).



Figure 2-9 – Block diagram showing time domain response at the output, y(t), due to input x(t) and delayed input  $x(t-\Delta T)$ .

$$y(t) = x(t) - x(t - \Delta T) \tag{2}$$

$$Y(j\omega) = X(j\omega) - X(j\omega)e^{-j\omega\Delta T} \tag{3}$$

The overall transfer function of the system,  $H(j\omega)$ , in (4) follows from (3), and its magnitude is the normalized jitter (gain of the system) in (5).

$$H(j\omega) = \frac{Y(j\omega)}{X(j\omega)} = 1 - e^{-j\omega\Delta T} \tag{4}$$

$$|J_{NOR}(\omega)| = |1 - e^{-j\omega\Delta T}| \tag{5}$$

The authors in [54] proposed a transfer function for jitter in skewed clocking using the transfer function in (5). This led to defining equation (6) for the minimum frequency,  $f_j$ , where the normalized differential jitter,  $J_{nor}$ , at the output is equal to the jitter at the input (unity gain) and related to the skew of the system,  $\Delta T$ . Below this frequency, the differential jitter is tracked by the system and results in lower differential jitter at the output ( $J_{nor} < 1$ ).

$$|J_{NOR}| = 1 \to f_j = \frac{1}{6\Delta T} \tag{6}$$
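As a brief check of (6), the magnitude in (5) can be expanded with Euler's formula. The steps below use a standard trigonometric identity and are offered as a sketch, not as a reproduction of the derivation in [54]:

$$|1 - e^{-j\omega\Delta T}| = \sqrt{(1 - \cos\omega\Delta T)^2 + \sin^2\omega\Delta T} = \sqrt{2 - 2\cos\omega\Delta T} = 2\left|\sin\left(\frac{\omega\Delta T}{2}\right)\right|$$

Setting this magnitude to unity with  $\omega = 2\pi f_j$  and taking the lowest positive solution gives

$$2\sin\left(\pi f_j \Delta T\right) = 1 \;\Rightarrow\; \pi f_j \Delta T = \frac{\pi}{6} \;\Rightarrow\; f_j = \frac{1}{6\Delta T}$$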

If the skew increases,  $f_j$  decreases, reducing the tracking ability of the system. In the worst-case scenario, the clock and data jitter terms become 180 degrees out of phase and the effective differential jitter is higher than either of the single jitter terms. This is referred to as jitter amplification  $(J_{nor} > 1)$, and is illustrated in Figure 2-10. In this example, the correlated clock and data jitter are both represented as a 100 MHz sinusoidal fluctuation in zero crossing. In Figure 2-10(a), the delay in arrival time between the clock and data is 500 ps, equivalent to 5 UI at 10 Gb/s, and in (b) the delay is 5 ns, equivalent to 50 UI at 10 Gb/s. These skew values were chosen to show the extreme cases of low and high skew at 10 Gb/s operation. The differential jitter is the resulting difference between the two jitter terms, which is attenuated at low skew (normalized jitter  $< 1 \text{ UI}_{p-p}$ ) in (a) and amplified at high skew (normalized jitter  $> 1 \text{ UI}_{p-p}$ ) in (b). To reiterate, for low-skew scenarios the optimum jitter tracking bandwidth (JTB) should be high in order to track correlated jitter. As skew increases, the amount of jitter that should be intentionally filtered increases (the tracking bandwidth reduces). This is because correlated noise cancellation no longer helps at large skews, owing to the nature of the differential jitter and the possible jitter amplification in out-of-phase circumstances. Many designs thus use an adjustable JTB to accommodate different path delays and bit rates.



Figure 2-10 - 100 MHz sinusoidal jitter on clock and data with (a) 500 ps of delay (5 UI of skew at 10 Gb/s) and (b) 5 ns of delay (50 UI of skew at 10 Gb/s).
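The two cases in Figure 2-10 can be checked numerically from (5) and (6). The sketch below is a minimal illustration with hypothetical function names; it evaluates the magnitude in (5) through the equivalent closed form $2|\sin(\pi f \Delta T)|$.

```python
import math


def normalized_differential_jitter(jitter_freq_hz: float, skew_s: float) -> float:
    """Magnitude of 1 - exp(-j*2*pi*f*dT), written as 2*|sin(pi*f*dT)|."""
    return 2.0 * abs(math.sin(math.pi * jitter_freq_hz * skew_s))


def unity_gain_frequency_hz(skew_s: float) -> float:
    """Lowest jitter frequency at which the normalized jitter reaches 1, per (6)."""
    return 1.0 / (6.0 * skew_s)


JITTER_FREQ_HZ = 100e6  # sinusoidal jitter frequency used in Figure 2-10

low_skew = normalized_differential_jitter(JITTER_FREQ_HZ, 500e-12)  # case (a)
high_skew = normalized_differential_jitter(JITTER_FREQ_HZ, 5e-9)    # case (b)

for label, skew, gain in (("(a) 500 ps", 500e-12, low_skew), ("(b) 5 ns", 5e-9, high_skew)):
    f_j = unity_gain_frequency_hz(skew)
    print(f"{label}: |J_nor| = {gain:.3f}, f_j = {f_j / 1e6:.1f} MHz")
```

With 500 ps of skew the 100 MHz jitter lies below $f_j$ and is attenuated, while with 5 ns of skew it lies above $f_j$ and the delay equals half the jitter period, which is the 180 degree out-of-phase case described above.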

Added noise beyond the correlated jitter of the source-synchronous link is undesirable, as it does nothing but degrade transmission by increasing uncertainty in the clock crossings. It is thus important to investigate crosstalk mechanisms of the modes transmitting the clock and data to understand limitations of the method.

### MODAL CROSSTALK

Complex simulators, such as Lumerical's MODE, use numerical methods to calculate the complex movement of many modes in a uniform medium [7]. Many authors have explored the mathematics behind mode coupling in MDM architectures to understand the fundamentals of wave propagation and mode interaction. In most cases, solving the coupled mode equations becomes very complex since the equations require a large amount of information on the waveguide structure. However, many of these waveguide variables are unknown in laboratory settings when dealing with actual fabricated devices [35]. It is possible to predict mode propagation characteristics in deterministic structures; with unknown or random parameters, however, it becomes impossible.

Random variation in a waveguide's dimensions during the fabrication process will cause changes to the effective index of the structure. These changes to the refractive index of a waveguide will alter the properties of guided modes in an MDM interconnect. In an ideal straight waveguide with multiple modes, no modal crosstalk occurs. Modes are orthogonal solutions to the wave equation and cannot interfere with other modes [55]. However, coherent light with the same polarization within the same mode can interfere [55]. Crosstalk, involving one mode leaking into another mode, occurs when perturbations in the waveguide structure are present in a fabricated, non-ideal device [35], [56]. This causes the modes to redistribute energy amongst themselves at the interface of a change in the waveguide structure.

A second source of crosstalk is found in the multiplexing and demultiplexing stages. As with all integrated designs, variation of parameters due to fabrication errors causes a deviation in the intended functionality of the device. In MDM structures, variation due to fabrication affects the mode coupling mechanism in the multiplexing stages and causes channels to couple to incorrect modes. In [29], a design for an integrated mode-multiplexing circuit is presented which incorporates the idea of creating a robust design against process variation. The proposed method uses a tapered coupler, in place of a straight one.

In this thesis, one of the investigations focuses on crosstalk due to device variation. The crosstalk spectrum is also found to be correlated to the length of the MDM interconnect and is discussed. Using simulations and experimental results, the effects are analyzed for a wideband MDM application using the crosstalk spectrum. The correlation between MDM crosstalk and eye closure is also discussed, which then allows us to make estimations about the bit error rate of the interconnect. From an application standpoint, it is important to understand the limitations of MDM crosstalk for the proposed source-synchronous link. Using one optical mode as a clock channel and two as data channels, experiments are carried out. This is the first demonstration of using the MDM architecture for such an application. These experiments confirm that crosstalk to the clock carrying channel must be kept low to avoid an exponential increase in clock jitter as well as bit error rate.

### **PHOTODIODES**

In the hopes of making a CMOS compatible photodetector using the same silicon material throughout the link to reduce packaging complexity and bandwidth lowering parasitics, another project discussed in this thesis deals with all-silicon photodiodes.

Each photon has a fixed amount of energy equal to:

$$E_{photon} = h\nu = \frac{hc}{\lambda} \tag{7}$$

Where *h* is Planck's constant (6.63x10<sup>-34</sup> J·s), *ν* is the frequency of the light, *c* is the speed of light in a vacuum (3x10<sup>8</sup> m/s) and  $\lambda$  is the wavelength of light [57]. This simple equation shows that as the wavelength gets smaller, the photon energy gets larger. This is important in understanding why some materials react to a band of wavelengths and others do not.
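A short numerical sketch of (7) illustrates this point for the two wavelengths used in this thesis, 850 nm and 1550 nm. The silicon bandgap of roughly 1.12 eV at room temperature is a textbook value brought in for the comparison and is not quoted in the text above; the function name is hypothetical.

```python
PLANCK_J_S = 6.626e-34         # Planck's constant
C_M_PER_S = 2.998e8            # speed of light in a vacuum
ELECTRON_CHARGE_C = 1.602e-19  # converts joules to electron-volts
SILICON_BANDGAP_EV = 1.12      # assumed room-temperature textbook value


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = h*c/lambda, expressed in electron-volts."""
    energy_j = PLANCK_J_S * C_M_PER_S / (wavelength_nm * 1e-9)
    return energy_j / ELECTRON_CHARGE_C


for wavelength_nm in (850, 1550):
    energy = photon_energy_ev(wavelength_nm)
    absorbed = energy > SILICON_BANDGAP_EV
    print(f"{wavelength_nm} nm: {energy:.3f} eV, exceeds silicon bandgap: {absorbed}")
```

Under these assumptions an 850 nm photon carries more energy than the silicon bandgap while a 1550 nm photon does not, which is consistent with silicon being used as a waveguide material at 1550 nm and as an absorber at 850 nm.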

Photodetectors absorb light and create electron-hole pairs. This occurs when the energy of the incident photon is larger than the material bandgap. If this requirement is satisfied, then an electron can absorb the photon energy and be injected into the conduction band (using the energy band visualization), leaving behind a hole in the valence band. When the light penetrates the surface of the substrate, absorption of light happens over a distance *d* related to the wavelength of incident light. The photon absorption characteristic follows an exponential decay, shown as the normalized photon flux in Figure 2-11, and follows equation (8), which gives photons per second per unit area [58].

$$\varphi(d) = \varphi_o e^{-\alpha d} \quad \mathrm{m^{-2}\,s^{-1}} \tag{8}$$

Where  $\alpha$  is the absorption coefficient,  $\varphi_o$  is the incident flux and *d* is the depth into the substrate. For example, the value of  $\alpha$  in silicon is 0.0535 /µm at 850 nm and 0.414 /µm at 600 nm. These two wavelengths are given as examples to show the effect on penetration depth as the wavelength gets shorter. The penetration depth is equal to  $1/\alpha$, the depth at which the remaining photon flux is only 37% of the initial value. The penetration depth is therefore 18.7 µm at 850 nm and 2.4 µm at 600 nm. These values are the minimum depths needed to absorb 63% of the incident photons.



Figure 2-11 – Photon absorption in silicon.
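The penetration depths quoted above follow directly from (8). The sketch below reproduces them from the two absorption coefficients given in the text; the function names are hypothetical, and the helper for an arbitrary absorbed fraction is simply an algebraic rearrangement of (8).

```python
import math

# Absorption coefficients of silicon quoted in the text, in 1/um.
ALPHA_PER_UM = {850: 0.0535, 600: 0.414}


def remaining_flux_fraction(alpha_per_um: float, depth_um: float) -> float:
    """Fraction of the incident photon flux remaining at a given depth, from (8)."""
    return math.exp(-alpha_per_um * depth_um)


def penetration_depth_um(alpha_per_um: float) -> float:
    """Depth at which the flux has dropped to 1/e (about 37%) of its initial value."""
    return 1.0 / alpha_per_um


def depth_for_absorbed_fraction_um(alpha_per_um: float, fraction: float) -> float:
    """Depth needed to absorb a given fraction (between 0 and 1) of the incident photons."""
    return -math.log(1.0 - fraction) / alpha_per_um


for wavelength_nm, alpha in ALPHA_PER_UM.items():
    depth = penetration_depth_um(alpha)
    left = remaining_flux_fraction(alpha, depth)
    print(f"{wavelength_nm} nm: penetration depth {depth:.1f} um, flux remaining {left:.0%}")
    print(f"  depth to absorb 90%: {depth_for_absorbed_fraction_um(alpha, 0.90):.1f} um")
```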

Currents from photogenerated charge carriers are grouped into two categories: diffusion currents and drift currents. Diffusion currents are caused by carriers generated outside of the charge-free or depletion region, and the transport mechanism is slow. Drift currents are caused when carriers are accelerated by the electric field found in the depletion region, and are much faster than their diffusion counterparts [57]. As diffusion currents take longer to reach the metal contacts, they can either be lost to minority carrier recombination or cause ISI by adding a slow tail to the time domain response of the output. These two scenarios are presented in Figure 2-12 along with their respective time-domain eye-diagrams.



Figure 2-12 – Drift and diffusion currents in photodiodes.

The built-in potential of a p-n structure (or p-i-n when using an intrinsic region) refers to the potential created across the junction in thermal equilibrium. The conduction (*Ec*) and valence (*Ev*) bands for separated *p*-type (*p*), intrinsic (*i*) and *n*-type (*n*) materials are shown in Figure 2-13. When the three are brought into contact, the conduction and valence bands must be adjusted since the equilibrium energy levels of the electrons, the Fermi levels, must be continuous throughout the system. The built-in potential  $\varphi_i$  exists at the interface of the two different doping types for a p-i-n structure, where *qX* is the electron affinity, an attribute of silicon equal to the difference between the conduction band and the free-electron energy. To draw this image, we assume the depletion approximation at the interface of the two doping materials, where the majority-carrier density changes abruptly. The difference in work functions,  $q\varphi s_{n,p}$, which creates the built-in potential, is an attribute derived from the doping of the *n* and *p* regions, and so the intrinsic region, *i*, has no influence on this value. However, the intrinsic region will lower the electric field because the uncompensated dopant ions in the *n* and *p* regions are now further from each other.



Figure 2-13 - (Top) Separate p-type, intrinsic and n-type doped material at equilibrium, far from the interface; (Bottom) The interface of all three doped materials showing continuous Fermi-level.

Equation (9) can be used to find the built-in potential  $\varphi_i$  of a PN diode [58]:

$$\varphi_i = v_t \ln\left(\frac{N_a N_d}{n_i^2}\right) \tag{9}$$

Where  $v_t$  is the thermal voltage,  $N_a$  and  $N_d$  are the acceptor (hole) and donor (electron) densities and  $n_i$  is the intrinsic free-carrier density of the silicon material. The charge-free (depletion region) width,  $x_d$, exists at the interface of a PN junction in thermal equilibrium. Equation (10) shows that this depletion region depends on the bias voltage  $V_a$  that is applied to the junction and on the doping level of the regions [58].

$$x_d = \sqrt{\frac{2\varepsilon_s}{q}\left(\frac{1}{N_a} + \frac{1}{N_d}\right)(\varphi_i - V_a)} \tag{10}$$
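As a worked illustration of the built-in potential and depletion width, the sketch below evaluates both for an abrupt junction. It uses the standard square-root form of the depletion width so that the result has units of length. The doping densities, the intrinsic density of $10^{10}$ cm<sup>-3</sup>, the thermal voltage of 25.85 mV and the bias values are hypothetical example inputs and are not taken from the thesis.

```python
import math

ELECTRON_CHARGE_C = 1.602e-19
EPS_SILICON_F_PER_CM = 11.7 * 8.854e-14  # relative permittivity of Si times eps_0


def built_in_potential_v(na_cm3: float, nd_cm3: float,
                         ni_cm3: float = 1e10, v_t: float = 0.02585) -> float:
    """Built-in potential of an abrupt PN junction, following (9)."""
    return v_t * math.log(na_cm3 * nd_cm3 / ni_cm3 ** 2)


def depletion_width_cm(na_cm3: float, nd_cm3: float, phi_i_v: float, v_a: float) -> float:
    """Depletion width of an abrupt junction (square-root form); v_a < 0 is reverse bias."""
    return math.sqrt(2.0 * EPS_SILICON_F_PER_CM / ELECTRON_CHARGE_C
                     * (1.0 / na_cm3 + 1.0 / nd_cm3) * (phi_i_v - v_a))


# Hypothetical doping levels, chosen only for illustration.
NA_CM3 = 1e17
ND_CM3 = 1e17

phi_i = built_in_potential_v(NA_CM3, ND_CM3)
x_d_zero_bias_um = depletion_width_cm(NA_CM3, ND_CM3, phi_i, 0.0) * 1e4
x_d_reverse_um = depletion_width_cm(NA_CM3, ND_CM3, phi_i, -2.0) * 1e4

print(f"built-in potential: {phi_i:.3f} V")
print(f"depletion width at 0 V: {x_d_zero_bias_um:.3f} um")
print(f"depletion width at -2 V: {x_d_reverse_um:.3f} um")
```

The reverse-biased case gives a wider depletion region than the zero-bias case, which is the behavior exploited when a photodiode is operated under reverse bias.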

Another method of increasing the depletion region is to add an area of undoped intrinsic silicon in between the n and p doping profiles to create a PiN structure. The larger the area, the more carriers will be created in the electric field region and accelerated in the semiconductor as drift current. On the downside, the wider the region the lower the internal electric field, as described by (11) [58].

$$E = \frac{2(\varphi_i - V_a)}{x_d} \tag{11}$$

The maximum velocity of a particle in a semiconductor is referred to as the saturation velocity,  $V_{sat}$. Holes travel slower than electrons in the silicon lattice, with a mobility of 450 cm<sup>2</sup>/V·s compared to 1400 cm<sup>2</sup>/V·s for electrons. This particle travel time is the basis of carrier transit limitations, and so we would like to maximize the particle's velocity. Maximizing the velocity will minimize the travel time and thus increase the speed (bandwidth) of the device. While it is true that transit time can be reduced by using a narrow intrinsic region or highly doped *n* and *p* regions, this decreases the quantity of photons absorbed (responsivity) and also increases capacitance (higher RC time constant). This results in a trade-off between responsivity and bandwidth in a photodetector.

As an example, using (12), a charge-free region of 10  $\mu$ m wide will yield a junction capacitance of 0.0103 fF /  $\mu$ m<sup>2</sup>, with a transit time of 100 ps found from (13) [58]. If we narrow this region to 1  $\mu$ m wide, this will yield a junction capacitance of 0.103 fF /  $\mu$ m<sup>2</sup> with a transit time of 10 ps. The junction capacitance will affect the RC bandwidth limitation whereas the particle travel time sets the transit time limitation.

$$C_{j} = \frac{\varepsilon_{s}}{x_{d}} = \frac{11.7 \times 8.854 \times 10^{-18}\ \mathrm{F/\mu m}}{x_{d}} \tag{12}$$

$$\tau_{\text{transit}} = \frac{\text{Distance}}{\text{Speed of Particle}} \tag{13}$$
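The trade-off in the example above can be reproduced with a few lines of Python. The carrier speed of $10^7$ cm/s (0.1 µm/ps) is an assumption: it is a typical saturation velocity for silicon and is the value implied by the 10 µm and 100 ps figures in the text. The function names are hypothetical.

```python
EPS_SILICON_F_PER_UM = 11.7 * 8.854e-18  # permittivity of silicon, as in (12)
V_SAT_UM_PER_PS = 0.1                    # assumed saturation velocity: 1e7 cm/s


def junction_capacitance_ff_per_um2(depletion_width_um: float) -> float:
    """Junction capacitance per unit area from (12), in fF/um^2."""
    return EPS_SILICON_F_PER_UM / depletion_width_um * 1e15


def transit_time_ps(depletion_width_um: float) -> float:
    """Transit time from (13), assuming carriers cross the region at V_SAT_UM_PER_PS."""
    return depletion_width_um / V_SAT_UM_PER_PS


for width_um in (10.0, 1.0):
    c_j = junction_capacitance_ff_per_um2(width_um)
    tau = transit_time_ps(width_um)
    print(f"x_d = {width_um:4.1f} um: C_j = {c_j:.4f} fF/um^2, transit time = {tau:.0f} ps")
```

Narrowing the region by a factor of ten shortens the transit time by the same factor and raises the capacitance per unit area tenfold, which is the RC versus transit-time trade-off described above.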

In a standard CMOS process, doping layers are usually designed with transistors or guard rings in mind, and so a CMOS photodetector designer's tool box is limited. Ideally, a low-doped or intrinsic region would be implanted within the substrate, forming a p-i-n structure (p-type semiconductor, intrinsic region and n-type semiconductor). Three basic photodetectors that are classically used are illustrated in Figure 2-14.



Figure 2-14: Side view of substrate for (a) shallow n-well, (b) spatially modulated light detector and (c) silicon-on-insulator. All PDs are excited by optical input (red traces) from the fibers (blue cylinders) above each. Photon movement and carrier movements are illustrated for all three designs.

In Figure 2-14(a), shallow doping of an N+ region along with reverse biasing of the PN-junction structure gives rise to a region of depleted charge causing an electric field. As photons penetrate the surface and create electron-hole pairs (charge carriers), they contribute to drift (fast) or diffusion (slow) currents. If the charge carrier arrives at the depletion region quickly, then it spends more time as drift current and thus can be moved out of the photodetector fast. If, however, the photon penetrates deeper, then the charge carrier will spend more time as diffusion current and will be the speed limitation of the photodiode, causing inter-symbol interference (ISI) and pulse-spreading.

Spatially modulated light (SML) detectors help eliminate the slow-moving diffusion carriers using an alternate method [59]. Figure 2-14(b) shows the concept behind the SML detector. Here, two identical photodiodes are placed within close proximity; however, the one on the left has a metal layer masking the top. As light is incident on the top, photons propagate into the substrate on the right photodiode, but are blocked on the left by the mask. The drift current is collected by the unmasked photodiode just as previously seen. The interesting part is that the diffusion currents deep in the substrate will have an equal probability of going into either p-n diode structure. The current from the right (unmasked) photodiode therefore contains both drift and diffusion currents, while the current from the left (masked) photodiode contains only diffusion current. These currents are then passed to a differential transimpedance amplifier, where the diffusion current is seen as a common-mode disturbance and eliminated from the output, leaving only the drift current to propagate to the following stage. In [60] and [61], the authors varied the mask shapes to see the effect of the photodiode dimensions. The work showed that different mesh patterns allowed for smaller RC delay, resulting in a larger bandwidth and thus faster devices.

In an effort to increase responsivity, the authors in [62] used grating couplers over a lateral p-n junction structure in an SOI technology, similar to the illustration in Figure 2-14(c). The design targeted 850 nm applications and is shown in Figure 2-15. The design in [62] used p-i-n structures running perpendicularly to the light input direction. The novel design increased quantum efficiency by four times compared to the reference design without the grating coupler. Due to the large size of the device, however, the bandwidth was limited to 4.1 GHz.



Figure 2-15 - Cross-sectional view of a p-i-n photodiode grating coupler on SOI substrate [62].

For this project, we used a focused grating coupler to design the photodiode. This all-silicon photodiode uses a grating-assisted coupler to redirect vertically applied light to the horizontal direction and a lateral PiN reverse-biased junction which then absorbs the 850 nm laser light.

The lateral p-i-n structure is designed to be parallel to the travelling light's direction, and allows for the creation of more drift current along the entire length of the structure. This leads to a faster PD with a higher responsivity.

# CHAPTER 3 - MODAL CROSSTALK IN SILICON PHOTONIC MULTIMODE INTERCONNECTS

This chapter investigates modal crosstalk in silicon photonic MDM-based interconnects using tapered multiplexers. Crosstalk from coherent optical interference originates from variation in the physical structure and alters the transmission link performance. Through simulations and experimental work, optical crosstalk as a function of wavelength is analyzed to understand its impact in MDM and MDM-WDM dual-multiplexing applications. The detrimental effects are validated in the frequency and time domains through fabricated MDM interconnects of various lengths. Results indicate modal crosstalk must be < -22 dB to maintain a BER of  $10^{-12}$ . The experimental methodology assesses the optical modal crosstalk's impact on the data, towards a mitigation approach to improve the payload signal integrity and enable system-level optimization such as channel wavelength allocation.

### MODAL LEAKAGE IN MDM INTERCONNECTS

As the crosstalk power between one signal channel (aggressor) and another (victim) increases, it can become detrimental to the latter's transmission. This crosstalk comes from different sources, one of which is modal crosstalk [35]. In an ideal straight waveguide, it is nonexistent since modes are mathematically orthogonal and theoretically do not interfere with other modes in the interconnect [56], [63]. However, crosstalk involving one mode coupling into another can occur when perturbations or surface roughness in a fabricated waveguide structure are present. These perturbations cause a redistribution of energy in the modes, a phenomenon previously investigated [35], [56]. Another source of crosstalk is mode leakage, where a mode is incorrectly coupled to or from a mode in a bus waveguide. Although mode crosstalk occurs within the interconnect waveguides [35], [56], [63], leakage as a form of crosstalk, specifically in tapered multiplexers, is the focus of this chapter.

Mode multiplexers (MUX), used for coupling signals in and out of an optical multimode bus waveguide, have been implemented in numerous ways using passive components [13]. One approach, illustrated in Figure 3-1, is an asymmetric directional coupler (ADC) used to excite supermodes between the waveguide that adds and drops a specific mode and an appropriately designed bus waveguide supporting multiple modes [7]. In the image, port 1 handles mode 1 (M1) transmission and port 2 handles mode 2 (M2) transmission. The coupling between the add/drop waveguide (WG) and the bus waveguide in the ADC is sensitive to fabrication variations in the waveguides' physical dimensions. To find the most effective dimensions to vary, simulations similar to those in [29] were carried out to evaluate the sensitivity of the ADD WG and BUS WG to variation. It was observed that the ADD WG effective index changes twice as much as the BUS WG effective index over a given span. The ADD WG width was thus chosen as the dimension to demonstrate fabrication sensitivity.

These sensitivity simulations are explored for a MUX ADC using FDTD simulation in which a directional coupler (length 34  $\mu$ m; gap 150 nm) adds the mode through a waveguide (ADD WG) with its width varying +/- 10 nm from its designed width of 430 nm to account for fabrication variation within a die [7], [29]. Figure 3-1(b) illustrates the transmission spectrum of the first fundamental mode M1 (TE0) which is sent through the add waveguide input port (ADD WG) coupling to the second mode M2 (TE1) in the bus waveguide. This is done while the add waveguide is subject to width variations. The width variations result in a spread of transmission for a given wavelength, for example > 6 dB at 1570 nm as plotted in the subfigure of Figure 3-1(b). This suggests that within even the same die the multiplexer design may fail to couple modes between the add and the bus waveguides, resulting in poor transmission or failure of the multiplexed link.

Coupling of energy between the multiplexer waveguide and bus waveguide is determined by the phase-matching condition, which relates the propagation constants of both modes [35]. As shown in [51], coupled mode theory indicates that the closer the two propagation constants are, the higher the coupling strength between the two modes. The coupling strength between the bus and add waveguides is thus highest for the modes intended to couple (i.e., input port 2 to mode 2 in the bus waveguide), indicating that most of the energy transfers to this mode. However, weak coupling may still occur unintentionally to any other guided mode or to the infinite number of radiative (lossy) modes of the bus waveguide. Since radiative modes do not guide light well and the energy coupled to them dissipates quickly [7], we assume that any light at the output ports was transmitted on guided modes. It is thus the strength of the mode coupling to a given guided mode that determines the leakage and, ultimately, the measured crosstalk. The signal that is coupled to the incorrect mode (i.e., coupled to M1 instead of M2) in the bus waveguide is presented in Figure 3-1(c). The inset image shows how the crosstalk spectrum also changes in the presence of parameter variation, for example with a spread of 8 dB at 1545 nm. Crosstalk is defined as the ratio of the sum of optical power, *P*, leaked from all other modes to the expected signal power at the given output port, an example of which is given in (14); that is, the ratio of the power at a given wavelength in Figure 3-1(c) to that in Figure 3-1(b). In equation (14), the crosstalk at output port 1, $XTalk_{Out\_Port1}$, is the ratio of the optical power leaked from input port 2 to output port 1, $P_{In\_Port2 \rightarrow Out\_Port1}$, to the optical power transmitted from input port 1 to that output (Output Port 1), $P_{In\_Port1 \rightarrow Out\_Port1}$.

$$XTalk_{Out\_Port1}\ [dB] = 10 \log_{10} \left( \frac{P_{In\_Port2 \rightarrow Out\_Port1}}{P_{In\_Port1 \rightarrow Out\_Port1}} \right)$$
(14)
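As a minimal sketch of how equation (14) can be evaluated, the snippet below sums the power leaked into an output port and expresses it relative to the intended signal power in dB. The function name and the power values are hypothetical and chosen only for illustration; they are not measured data from this work.

```python
import math

def crosstalk_db(leaked_powers_mw, signal_power_mw):
    """Crosstalk at one output port following equation (14).

    leaked_powers_mw: powers (mW) arriving at the port from all other inputs.
    signal_power_mw:  power (mW) arriving at the port from its intended input.
    """
    total_leak = sum(leaked_powers_mw)
    if total_leak <= 0 or signal_power_mw <= 0:
        raise ValueError("optical powers must be positive")
    return 10.0 * math.log10(total_leak / signal_power_mw)

# Hypothetical example: 0.01 mW leaks from Input Port 2 to Output Port 1,
# while 1 mW arrives from Input Port 1 (illustrative numbers only).
xt_out_port1_db = crosstalk_db([0.01], 1.0)
print(f"{xt_out_port1_db:.1f} dB")  # prints "-20.0 dB"
```

For a two-mode link the list holds a single entry; with more modes, each additional aggressor contributes one more term to the sum.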



Figure 3-1 - (a) Directional coupler illustrating the input signal coupling from the add port WG to the bus WG, with incorrectly coupled signal from ADD WG shown as leakage; (b) transmission characteristics of M2 from add WG at the MUX output, with a subplot of transmission across width variation at 1570 nm as an example; (c) crosstalk characteristics of M2 to M1 from add WG at the MUX output, with a subplot of crosstalk across width variation at 1545 nm as an example. Both (b) and (c) are for ADD WG width variations between -10 nm and 10 nm.

To make the coupler more tolerant to waveguide width variation, a tapered coupler, illustrated in Figure 3-2, has been proposed [29]. Tapered couplers use a waveguide that gradually varies in width along the length of the coupling region.



Figure 3-2 - Two-mode MDM interconnect structure using tapered couplers.

A tapered multiplexer (length 103 µm; gap 150 nm) as in [16] and [31] is simulated and subjected to the same +/- 10 nm variation conditions as the ADC in Figure 3-1. On the MUX side, as with the ADC, the mode 2 (M2) input port is excited but not mode 1 (M1). The simulated transmitted output for both M2 and M1 is illustrated in Figure 3-3(a) and (b), respectively. The transmitted signal in Figure 3-3(a) shows much smaller variation in signal transmission across wavelength and width variations than its equivalent signal in the ADC. Crosstalk in Figure 3-3(b), however, still shows a large variation both across wavelengths for a given waveguide width and across widths for a given wavelength (e.g., subfigure in Figure 3-3(b) at 1575 nm). The crosstalk is thus wavelength dependent and caused by incorrect coupling, where the crosstalk mode is excited (M1) instead of the transmission mode (M2) due to effective index mismatch. The plots in Figure 3-3 demonstrate the optical spectrum profile of the crosstalk where only the MUX is subjected to width variation. However, the couplers on both sides of the interconnect suffer from these variations. This leads to complex wavelength dependencies in the transmission spectrum.



Figure 3-3 - Simulated tapered MDM MUX transmission for (a) M2 MUX input port to M2 mode in the bus interconnect and (b) M2 MUX input port to M1 mode in the bus interconnect while varying the drop WG width from -10 nm to 10 nm. Subplot in (b) shows crosstalk across width variation at 1575 nm as an example.

As discussed briefly in [64], it was observed that the length of the MDM interconnect impacted the crosstalk spectrum. To investigate, simulations are conducted on the structure in Figure 3-2 using an ideal multiplexer, interconnect, and demultiplexer. The multiplexer and demultiplexer are identical and without parameter variation, with the only change being in the length of the interconnect between the two. Interconnect lengths of 100 µm, 250 µm, 750 µm, and 1 mm are simulated and plotted in Figure 3-4, with the through mode transmission (i.e., M1 to M1 through input port 1 to output port 1) used to normalize the crosstalk mode (i.e., M2 to M1 from input port 2 to output port 1). The spectrum of the crosstalk shows an interferometer-like shape. This originates from the interference between the multiplexer and demultiplexer leakage signals. The free-spectral range (FSR) is the distance between nulls and is indicated for each interconnect length. The values of the peaks and nulls are also seen to vary across the spectrum. This is expected since the coupling properties of the tapered coupler vary across the usable range of wavelengths.



Figure 3-4 - Simulated crosstalk using ideal structures (MUX, interconnect, and DEMUX) for interconnect lengths of (a) 100  $\mu$ m and 250  $\mu$ m; (b) 750  $\mu$ m and 1000  $\mu$ m. FSR is indicated for each interconnect length.

The MUX can be divided into two parts, the bend and the taper. For the taper, the crosstalk can be derived using coupled-mode theory (CMT) as previously demonstrated by Lipson [51]. For the bend, however, since the gap and width vary together, the coupling coefficient between the add and bus waveguides varies as well, and deriving an analytical expression of the coupling coefficient for CMT becomes challenging. For practical purposes, the commercially available and widely used simulation software Lumerical FDTD is thus employed to obtain the crosstalk amplitude instead of using the CMT method.

When the input is the M2 input port, two crosstalk components are derived as having the following form

$$E_{XT\_MX} = a_{XT\_MX} e^{j\varphi_{XT\_MX}} e^{j\frac{2\pi n_{eff1}}{\lambda}L}$$
(15)

$$E_{XT\_DX} = a_{XT\_DX} e^{j\varphi_{XT\_DX}} e^{j\frac{2\pi n_{eff2}}{\lambda}L}$$
(16)

where $E_{XT\_MX}$ is the crosstalk field (*XT*) generated at the multiplexer by the signal at input port 2, which propagates along the interconnect with effective index $n_{eff1}$ (i.e., the crosstalk in Figure 3-3(b)). $E_{XT\_DX}$ is the crosstalk field generated by the portion of the M2 signal that fails to couple to the demultiplexer output port 2 (DROP WG) after propagating through the interconnect with effective index $n_{eff2}$. The variables $a_{XT}$ and $\varphi_{XT}$ are the amplitude and phase of the crosstalk generated at the multiplexer (MX) and demultiplexer (DX). Since these two fields propagate with different effective indices, they interfere with each other at the output of the interconnect. The intensity, *I*, at the output port has the form in (17):

$$I = \left|E_{XT\_MX} + E_{XT\_DX}\right|^2 = |a_{XT\_MX}|^2 + |a_{XT\_DX}|^2 + 2|a_{XT\_MX}||a_{XT\_DX}|\cos\left[\left(\varphi_{XT\_MX} - \varphi_{XT\_DX}\right) + \frac{2\pi(n_{eff2} - n_{eff1})}{\lambda}L\right]$$
(17)

The first term in the argument of the cosine function in equation (17) is the difference in phase between the two interfering signals. The second term describes the difference between the phase constants of each interfering mode, which is sensitive to the waveguide length *L* and the wavelength $\lambda$. We can observe in Figure 3-4 that the actual crosstalk amplitude across the spectrum does not increase significantly between the 100 µm and 1 mm interconnects; however, the spectrum pattern does change, giving more peaks and valleys as the length *L* becomes longer. If one ignores the phase term $\varphi_{XT\_MX} - \varphi_{XT\_DX}$ in equation (17), the FSR in the crosstalk spectrum can be calculated using equation (18):

$$FSR = \frac{2\pi}{\frac{d\left(\varphi_{XT\_MX} - \varphi_{XT\_DX}\right)}{d\lambda} + \frac{2\pi(\Delta n_g)L}{\lambda^2}} \approx \frac{\lambda^2}{(\Delta n_g)L} \quad (\text{for } L > 600\ \mu m)$$
(18)

Here $\Delta\varphi = \varphi_{XT\_MX} - \varphi_{XT\_DX}$ is the difference in phase between the two crosstalk contributions, $\Delta n_g$ is the difference in group index between the modes and *L* is the length of the interconnect. However, equation (18) is only valid when the phase terms can be neglected in equation (17) (i.e., $\frac{2\pi(n_{eff2}-n_{eff1})}{\lambda}L \gg \varphi_{XT\_MX} - \varphi_{XT\_DX}$), indicating that it should only be used for interconnect lengths where this condition is met. In the following sections, we will show that equation (18) aligns with simulations and that experimentally the length condition is met for interconnect lengths greater than approximately 600 µm.
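As a rough numerical check of equations (15) to (18), the sketch below builds the two crosstalk fields, locates the nulls of the resulting spectrum, and compares their spacing with the approximation $\lambda^2 / (\Delta n_g L)$. The effective indices, amplitudes and wavelength grid are hypothetical values chosen for illustration, not the simulated or measured values of this work, and the indices are assumed to be dispersion-free so that the group-index difference equals the effective-index difference.

```python
import numpy as np

def crosstalk_intensity(wl, length, n_eff1, n_eff2, a_mx, a_dx,
                        phi_mx=0.0, phi_dx=0.0):
    """|E_XT_MX + E_XT_DX|^2 built directly from the fields in (15) and (16)."""
    e_mx = a_mx * np.exp(1j * (phi_mx + 2 * np.pi * n_eff1 * length / wl))
    e_dx = a_dx * np.exp(1j * (phi_dx + 2 * np.pi * n_eff2 * length / wl))
    return np.abs(e_mx + e_dx) ** 2

def fsr_approx(wl, delta_ng, length):
    """Right-hand side of (18) with the phase-slope term neglected."""
    return wl ** 2 / (delta_ng * length)

# Hypothetical, dispersion-free parameters (assumptions for illustration only).
N_EFF1, N_EFF2 = 2.75, 2.45      # effective indices of the two bus modes
LENGTH = 1e-3                    # 1 mm interconnect, in metres
wl = np.linspace(1.54e-6, 1.57e-6, 30001)

intensity = crosstalk_intensity(wl, LENGTH, N_EFF1, N_EFF2, a_mx=0.1, a_dx=0.08)

# Nulls are the interior samples lower than both neighbours.
interior = intensity[1:-1]
is_null = (interior < intensity[:-2]) & (interior < intensity[2:])
null_wl = wl[1:-1][is_null]

fsr_numeric = float(np.mean(np.diff(null_wl)))
fsr_model = fsr_approx(1.555e-6, abs(N_EFF2 - N_EFF1), LENGTH)
print(f"numeric FSR: {fsr_numeric * 1e9:.2f} nm, model FSR: {fsr_model * 1e9:.2f} nm")
```

Because the fields are summed directly, the check does not depend on how the cosine argument in (17) is written; rerunning with a shorter `LENGTH` shows the null spacing widening in inverse proportion to the length.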

The 1 mm long interconnect structure in Figure 3-2 is simulated for width variations in the multiplexer and demultiplexer. First, in Figure 3-5(a), the width of the add waveguide is varied ±10 nm from the designed width while keeping the nominal dimensions for the demultiplexer. One can see how the null, for example found at 1572 nm for the ideally sized waveguide (black dotted curve), changes to a peak for a width variation of only 5 nm. Second, in Figure 3-5(b), the width of the demultiplexer taper in the bus waveguide is reduced by 5 nm in addition to the add waveguide width variation of 5 nm and 10 nm in the multiplexer. Here, a peak is suppressed around 1574 nm for a 5 nm change to the input port 2 waveguide width (blue curve). Two peaks also merge when the width change is increased to 10 nm (red curve). These changes alter the effective index ($n_{eff}$) and coupling strength to a given mode, thus changing the spectrum. Results in Figure 3-5(b) demonstrate how the combined effects of interferometric patterns and random structure variation can change the crosstalk spectrum across chips and cause complex crosstalk patterns.



Figure 3-5 - 1 mm interconnect with (a) +/- 10 nm width variation at input port 2 (add WG); (b) Combined effects of 5 nm and 10 nm width variation at input port 2 (add WG) along with 5 nm reduction in the width of the DEMUX taper in the bus WG.

Varying the widths of the ADD and DROP waveguides simultaneously, and therefore also the gap adjacent to each waveguide, results in a wavelength shift of approximately 1 nm in the nulls of the spectrum without affecting the FSR. This is why two different aspects of the structure were varied (e.g., the ADD WG in the MUX and the taper in the DEMUX), highlighting how the combined effects of interferometric patterns and random structure variation can change the crosstalk spectrum across chips and cause complex crosstalk patterns.

## MODE CROSSTALK MEASUREMENTS IN THE FREQUENCY DOMAIN

The experimental setup in Figure 3-6 is used to characterize interconnect lengths of 100 µm, 250 µm, 750 µm, and 1 mm. Each interconnect is designed using the same multiplexer structure illustrated in Figure 3-2. In presenting the results, the input mode, *X*, corresponds to where the continuous wave (CW) light is injected and the output mode, *Y*, is the mode which is measured at the output of the interconnect. The graphs in this section are labeled using a mode-input to mode-output scheme (*MxMy*). For example, in Figure 3-6, the injected light from the source is added to mode 2 (M2) at the input while crosstalk is measured by monitoring mode 1 (M1) at the output. This situation leads to a label of *M2M1*.



Figure 3-6 - Experimental setup for crosstalk measurement of Mode 2 to Mode 1 (M2M1) using a wideband ASE source as input and an optical spectrum analyzer (OSA) at the output.

Experimental results for M1M2 and M2M1 are presented in Figure 3-7. These results were obtained using a broadband ASE source (erbium-doped fiber amplifier) as input (total output power 0 dBm), and the output is recorded using an optical spectrum analyzer (OSA) (sensitivity: -80 dBm; resolution: 0.06 nm). The results are normalized to the input mode straight-through case (i.e., M1M1) as a reference to show crosstalk power across the spectrum. The wavelength range was chosen based on the usable region of the ASE source. For the 100 µm interconnect length, the indicated FSR is approximated by where the lobe abruptly ends; however, the actual range appears to be larger.



Figure 3-7 - Experimental results of wavelength sweeps for (a) 100  $\mu$ m, (b) 250  $\mu$ m, (c) 750  $\mu$ m and (d) 1000  $\mu$ m MDM interconnects.

In Figure 3-7(c) at 1550 nm and Figure 3-7(d) at 1557 nm, a merging of peaks in the crosstalk spectrum is observed. This effect is similar to the simulation results in Figure 3-5(b), indicating multiple variations from the design parameters of the interconnect. Qualitatively, as expected, the FSR is shown to decrease as the interconnect increases in length.

Figure 3-8 plots the FSR from the model in (18) (blue curve), the results from the simulated structure in Figure 3-4 (red curve) and the experimentally measured interconnects in Figure 3-7 (dotted black curve). It shows that the measured FSR matches the calculated and simulated values well.

Since the full derivation is hard to evaluate without simulation, the approximation is also plotted in Figure 3-8 (green dotted curve). The approximation of (18), however, only begins to match the others as the interconnect length increases. This is expected, as the model approximation given in equation (18) is only valid for lengths which satisfy the stated requirement. The FSR of the interconnect is determined by both the first and second terms of the denominator in the full form of (18). The contribution of the first denominator term in (18) comes from the wavelength dependence of the phase term, which is fixed and does not change for any of the different interconnect lengths. The second term comes from the phase difference of the different modes. This term increases as the interconnect length increases and becomes the dominant term. In practice, for this particular device, the approximation closely resembles the full solution beyond an interconnect length of 600 µm, with higher accuracy as the interconnect length increases. This is because the first denominator term of the full form in (18) becomes less than one-quarter the value of the second denominator term at a length of 600 µm.
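The one-quarter observation can be turned into a rough bound on the approximation error. As a sketch, let $A$ denote the first denominator term of (18) and $B$ the second; these labels are introduced here for illustration only.

$$\frac{FSR_{approx}}{FSR_{full}} = \frac{2\pi / B}{2\pi / (A + B)} = 1 + \frac{A}{B}, \qquad A = \frac{d\left(\varphi_{XT\_MX} - \varphi_{XT\_DX}\right)}{d\lambda}, \quad B = \frac{2\pi(\Delta n_g)L}{\lambda^2}$$

With $|A/B| < 1/4$, the approximate FSR lies within about 25% of the full expression. Because $B$ grows linearly with $L$ while $A$ is treated as independent of length, the ratio $|A/B|$ scales as $1/L$; if it equals 1/4 at 600 µm, it would fall to about 0.15 at 1 mm under this assumption.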

By knowing what shapes the crosstalk spectrum, designers can use the nulls to their advantage and place channels in these low crosstalk windows. Control over the spectrum will lead to better performance for multichannel (MDM-WDM) applications by exploiting FSR nulls to minimize crosstalk.



Figure 3-8 - Comparison between simulated, calculated (model), and experimentally measured FSR values.

To show the combined effect of deterministic crosstalk effects (such as interferometric interference) and random crosstalk effects (such as variation due to fabrication), two identical 1 mm interconnect devices, Interconnect A and Interconnect B, were fabricated on the same SiP chip. The measurements are presented in Figure 3-9(a) and (b) and normalized to the straight-through case. As predicted by simulation in Figure 3-5(a), the two devices show similar FSR but have shifted spectra due to variation of waveguide parameters. These results highlight the difficulty in predicting crosstalk in MDM links, where fabricated devices will differ even within a die, making mitigation of crosstalk difficult. This may require optimized optical or electronic solutions to mitigate crosstalk in large-volume consumer applications [40].

Another way of understanding the differences in crosstalk spectra due to variation is through the accumulation of phase of one signal with respect to the other. Assuming the undesired phase introduced by fabrication imperfections is constant over the entire range of wavelengths, the phase offset is calculated by fitting the experimental results to equation (17) for 100 µm, 250 µm, 750 µm and 1 mm, resulting in 1.8, 3.5, 5 and 4 radians, respectively. The phase offset will vary from die to die, impacting the spectrum differently for the same interconnect length (e.g., 4 and 4.7 rad, respectively, for Figures 3-5(a) and 3-5(b)).
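One possible way to carry out such a fit is sketched below: a constant phase offset is swept over one full period and the value that minimizes the squared error between the model of equation (17) and the spectrum is kept. This is an illustrative grid search on synthetic data, not the fitting procedure used in this work; the index difference, amplitudes, noise level and function names are all assumptions.

```python
import numpy as np

def model_spectrum(wl, length, delta_n, a_mx, a_dx, phase_offset):
    """Intensity of equation (17) with a wavelength-independent phase offset."""
    arg = phase_offset + 2 * np.pi * delta_n * length / wl
    return a_mx ** 2 + a_dx ** 2 + 2 * a_mx * a_dx * np.cos(arg)

def fit_phase_offset(wl, measured, length, delta_n, a_mx, a_dx, n_grid=3601):
    """Grid search over [0, 2*pi) for the least-squares phase offset."""
    candidates = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    errors = [
        np.sum((model_spectrum(wl, length, delta_n, a_mx, a_dx, p) - measured) ** 2)
        for p in candidates
    ]
    return float(candidates[int(np.argmin(errors))])

# Synthetic "measurement" (hypothetical parameters, 2 % multiplicative noise).
rng = np.random.default_rng(0)
wl = np.linspace(1.54e-6, 1.57e-6, 601)
LENGTH, DELTA_N, A_MX, A_DX = 1e-3, 0.30, 0.1, 0.08
TRUE_OFFSET = 4.0  # radians, chosen to echo the 1 mm value quoted in the text

clean = model_spectrum(wl, LENGTH, DELTA_N, A_MX, A_DX, TRUE_OFFSET)
noisy = clean * (1 + 0.02 * rng.standard_normal(wl.size))

fitted = fit_phase_offset(wl, noisy, LENGTH, DELTA_N, A_MX, A_DX)
print(f"fitted phase offset: {fitted:.2f} rad")
```

A grid search is used here because the model is periodic in the offset, so a local optimizer could settle on the wrong branch; in practice the amplitudes and index difference would also need to be estimated from the data.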



Figure 3-9 - Experimental results of two identical 1 mm MDM interconnects A and B.

## EXPERIMENTAL MODE CROSSTALK IN THE TIME DOMAIN

This section ties the previously discussed theoretical topics together, showing why the MDM crosstalk spectrum is an important metric for the quality of transmission in MDM and WDM-MDM multiplexed links. It explores the crosstalk effects in the time domain using an oscilloscope in an experimental setup, shown in Figure 3-10, similar to the one used in [26]. To that setup an optical attenuator is added to vary the crosstalk coupling strength. The 1 mm MDM interconnect device under test (DUT) on a SiP chip receives modulated optical input from a PRBS31 data sequence generator at 8 Gb/s. It has been observed that the data rate applied to the interconnect does not affect the crosstalk signature on the spectrum. The data rate in this experimental setup was chosen based on equipment availability. The DATA and DATABAR outputs are fed to two electro-optic modulators (MOD), where DATABAR is electrically delayed by a significant differential RF cable path length (45 cm), giving approximately 14 bits of decorrelation between the two PRBS NRZ data streams. To make the effects of crosstalk from the 1 mm interconnect more prominent, the wavelength is set to 1545.2 nm to obtain a high amount of crosstalk for interconnect B in the M2M1 scenario (-17.4 dB), as shown in Figure 3-9(b). A variable optical attenuator (VOA) is used at the M2 input port to vary the input signal strength on this mode. This creates a variable crosstalk mechanism, whereby the maximum crosstalk is set by the physical attributes of the interconnect when the optical attenuator is set to 0 dB. This mimics sweeping wavelengths with different peak crosstalk values. As the attenuation is increased, the optical power into the M2 input port is reduced and thus the crosstalk coupling strength is effectively lowered. The output is then taken from the M1 output port, amplified using an EDFA and filtered before being measured on the oscilloscope. Due to the VOA, there is a 1.8 dB difference in input power between the input grating couplers of the two modes. This is taken into account in the following section by adding the 1.8 dB loss to the intentional attenuation of the VOA.



Figure 3-10 - Experimental setup for crosstalk data measurements in the time domain with one PRBS aggressor signal at M2 input port 2.

Figure 3-11 shows eye diagrams recorded in the time domain using a sampling oscilloscope (Agilent DCA X 86100D) for the M1 input signal receiving optical crosstalk from the M2 input (aggressor) as the M2 input strength is increased (attenuation reduced). The image is labeled using an effective crosstalk value, which takes into account the inherent crosstalk value of the interconnect (-17.4 dB plus VOA loss) and the optical attenuator value. The VOA is adjusted to give an effective crosstalk of -29.2 dB, -24.2 dB, -22.2 dB and -19.2 dB in Figure 3-11(a) to (d), corresponding to VOA attenuation values of -10 dB, -5 dB, -3 dB and 0 dB, respectively. The inset text indicates the optical input to the grating couplers (GC) as the aggressor signal is increased. As the crosstalk from M2 increases, the received eye at the M1 output port closes and the timing jitter worsens. Our experimental setup is limited to a crosstalk coupling strength of -19.2 dB, although it is expected that the eye will continue to worsen as crosstalk increases. On the lower end, the experimental setup can reduce the crosstalk to as low as -39.2 dB (not shown), although insignificant changes occur for crosstalk below -29.2 dB.
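The effective crosstalk labels follow from simple dB bookkeeping, sketched below using the values quoted in the text: the inherent crosstalk of the interconnect, the fixed 1.8 dB input power difference, and the VOA setting. The constant and function names are hypothetical, and the attenuation is entered as a positive magnitude.

```python
INHERENT_XT_DB = -17.4     # M2M1 crosstalk of interconnect B at 1545.2 nm
INPUT_IMBALANCE_DB = 1.8   # fixed input power difference due to the VOA path

def effective_crosstalk_db(voa_attenuation_db):
    """Effective crosstalk for a given VOA attenuation magnitude (in dB)."""
    return INHERENT_XT_DB - INPUT_IMBALANCE_DB - voa_attenuation_db

voa_settings_db = [10.0, 5.0, 3.0, 0.0]  # panels (a) to (d) of Figure 3-11
effective = [round(effective_crosstalk_db(a), 1) for a in voa_settings_db]
print(effective)  # prints [-29.2, -24.2, -22.2, -19.2]
```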



Figure 3-11 - Eye diagram showing impact of modal crosstalk in the time domain for an aggressor signal from M2 onto a decorrelated signal on M1 for effective crosstalk of (a) -29.2 dB, (b) -24.2 dB, (c)-22.2 dB and (d) -19.2 dB. Inset text indicates input optical power to the grating couplers for each mode.

To understand what is happening in the images of Figure 3-11, let us consider two electromagnetic fields of propagating signals, $E_{Sig}$ and $E_{XT}$, of equal carrier frequency in a waveguide. Similar to equation (17), equation (19) describes the resulting output intensity of a port (optical power), $I_{OUT}$, due to the interaction between the two fields at the input to a photodetector.

$$I_{OUT} = |(E_{Sig} + E_{XT})|^2 = I_{Sig} + I_{XT} + 2\sqrt{I_{Sig}I_{XT}}\cos(\varphi_{Sig} - \varphi_{XT})$$
(19)

The first term, $I_{Sig}$, is the power (intensity) of the through case (e.g., M1M1); the second term, $I_{XT}$, is the crosstalk from an aggressor mode into the through mode (e.g., M2M1). The third term, referred to as the beat term, results from the coherent nature of the crosstalk in the MDM interconnect. It describes the field addition and has a range of $\pm 2\sqrt{I_{Sig}I_{XT}}$, depending on the argument of the cosine term [65]. Assuming a digital signal changing from a logical 0 to a logical 1 in normalized units without crosstalk, the effect the beat term has on the hypothetical eye diagram is illustrated in Figure 3-12. The diagrams are generated in MATLAB and show the positive and negative extremes. A positive term (i.e., $+2\sqrt{I_{Sig}I_{XT}}$) results in crosstalk that does not decrease the eye opening but adds to the total height of the signal, as illustrated in Figure 3-12(a). A negative term (i.e., $-2\sqrt{I_{Sig}I_{XT}}$), however, is detrimental to transmission and results in closure of the eye, as illustrated in Figure 3-12(b).
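To give a feel for the size of the beat term, the sketch below evaluates the two extremes of equation (19) for a normalized "1" level at several crosstalk values. It assumes the aggressor is also at its "1" level and that the victim "0" level carries no signal power, which is an idealization introduced here for illustration.

```python
import math

def one_level_bounds(xt_db, i_sig=1.0):
    """Lowest and highest '1' level from equation (19) for a crosstalk ratio in dB."""
    i_xt = i_sig * 10 ** (xt_db / 10)        # crosstalk intensity
    beat = 2 * math.sqrt(i_sig * i_xt)       # magnitude of the beat term
    return i_sig + i_xt - beat, i_sig + i_xt + beat

for xt_db in (-29.2, -24.2, -22.2, -19.2):
    low, high = one_level_bounds(xt_db)
    print(f"XT = {xt_db:6.1f} dB: '1' level between {low:.3f} and {high:.3f}")
```

Although the crosstalk intensity at -19.2 dB is only about 1.2% of the signal, the beat term can move the "1" level by roughly 22% in either direction, which is why the "1" rail thickens much faster than the "0" rail.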



Figure 3-12 - Illustration in MATLAB of the resulting eye diagrams for both extremes of the beat term signs, (a) positive and (b) negative.

The amount of peak-to-peak increase or reduction in the eye opening is determined by both the amount of crosstalk from the aggressor to the victim channel and the argument of the cosine term. The sign of the cosine term (positive or negative) cannot be predicted in an actual system [65]. The beat term results in the uneven growth of the "1" logic level of the eye diagram compared to the "0" level as crosstalk increases. This can be clearly seen in Figure 3-11 by comparing the higher crosstalk in Figure 3-11(d) with the lower crosstalk in Figure 3-11(a).

Using the oscilloscope measurements obtained, a few examples of which are shown in Figure 3-11, the vertical and horizontal eye openings are recorded. The vertical eye opening is normalized to the widest point, found at the effective crosstalk value of -29.2 dB. The horizontal eye opening is plotted on a secondary axis in unit intervals (UI) with respect to the ideal 8 Gb/s bit period. It is observed that both traces worsen above a crosstalk value of approximately -27 dB in Figure 3-13(a).



Figure 3-13 - (a) Graph showing effective crosstalk versus the vertical eye opening (normalized to the opening at -29.2 dB crosstalk) and the horizontal eye opening (with respect to the UI of an ideal 8 Gb/s bit period); (b) effective crosstalk versus calculated BER from oscilloscope measurements.

Using oscilloscope measurements of the eye, the bit error rate (BER) is estimated and plotted in Figure 3-13(b) [66]. The slice point is assumed to be in the center of the eye, anticipating instances where the jitter or timing error might be higher towards the edges. Also, as the crosstalk is increased, the decision threshold ( $v_{th}$ ) is varied so as to always be in the middle of the two logic bands ("1" and "0"). This is assumed since many optical receivers have the functionality to do these adjustments using eye monitoring circuitry. When dealing with crosstalk, it is important to understand that the dominant impact on the BER is due to closure of the eye, caused by the "1" logic level approaching the "0" logic level, and not an increase in the noise. This is analogous to how increased ISI increases BER [67]. Based on these calculations, the crosstalk must be less than -21 dB to maintain a BER better than  $10^{-12}$ . The plot in Figure 3-13(b) focuses only on data points which give a BER worse than  $10^{-12}$ , highlighting the crosstalk values that become problematic.
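A common way to estimate BER from eye measurements is the Gaussian Q-factor approximation, sketched below. Whether this matches the exact procedure of [66] is an assumption; the level and noise values are hypothetical, and equal noise on both rails is assumed so that the mid-level threshold described above is also the optimum one.

```python
import math

def q_factor(mu1, mu0, sigma1, sigma0):
    """Q factor from the mean and standard deviation of the '1' and '0' rails."""
    return (mu1 - mu0) / (sigma1 + sigma0)

def ber_from_q(q):
    """Gaussian-noise BER estimate: 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2))

# Hypothetical normalized eye: same noise, but the '1' rail pulled down by crosstalk.
SIGMA = 0.05
q_open = q_factor(1.00, 0.0, SIGMA, SIGMA)    # eye without significant crosstalk
q_closed = q_factor(0.79, 0.0, SIGMA, SIGMA)  # '1' level reduced by eye closure

print(f"open eye:   Q = {q_open:.1f}, BER = {ber_from_q(q_open):.1e}")
print(f"closed eye: Q = {q_closed:.1f}, BER = {ber_from_q(q_closed):.1e}")
```

The noise is held constant in both cases, so the BER penalty in this sketch comes entirely from the reduced level separation, in line with the eye-closure argument above.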

Referring to Figure 3-7, the FSR lobes have maxima and minima corresponding to -19 dB and -30 dB, respectively. Referring back to the BER estimation in Figure 3-13(b), this corresponds to a BER ranging from $10^{-6}$ to $10^{-12}$, respectively. The inset image in Figure 3-13(b) shows an example of degradation due to crosstalk in eye diagrams, illustrating the difference between modal crosstalk at -19 dB and -29 dB.

Using the information found in Figure 3-13(b), a theoretical BER response across wavelengths for interconnect lengths of 250 µm and 1 mm is plotted in Figure 3-14(a) and (b), respectively. A BER floor of 10<sup>-12</sup> is used to simplify the image. The estimated BER plot provides insight into the system-level planning of channel wavelength allocation. Inset eye diagrams are added at locations corresponding to crosstalk values as a visual aid.



Figure 3-14 - Crosstalk spectrum overlaid onto calculated BER bar graph (approximated and with BER floor of 10<sup>-12</sup>) for interconnect lengths of (a) 250 µm and (b) 1 mm.

One observation made from Figure 3-14 is that channels across the spectrum will inherently experience a wide range of BER outcomes, based solely on the interconnect itself without other system-level considerations taken into account. Depending on the wavelengths used, this will limit the ability to reduce laser power in the link, since the poorer-performing channels impose these limits. One can also see how a larger FSR, as in Figure 3-14(a), leads to both a wider window of poor BER performance (i.e., 1540 nm to 1550 nm) and a wider window of improved performance (i.e., 1550 nm to 1560 nm). On the other hand, at longer interconnect lengths, as in Figure 3-14(b), there still exist opportunities to intelligently place channels at wavelengths corresponding to low BER. This indicates that MDM can be used efficiently for a variety of interconnect lengths, with some trade-offs and considerations of the channel placement and spacing. These trade-offs may also require further investigation, since interconnects with distances greater than 250 µm will most likely be of interest for the foreseeable future. Also, variation of device parameters due to fabrication will cause the spectrum to change, possibly requiring post-processing of data or the use of optical tuning mechanisms.
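The channel placement idea can be sketched as a simple search for wavelength windows in which the crosstalk stays below a chosen limit. The example below uses the -21 dB limit quoted above for a BER better than 10<sup>-12</sup> and a synthetic sinusoidal spectrum swinging between -19 dB and -30 dB; the 8 nm period, the wavelength grid and the function name are assumptions made for illustration, not measured data.

```python
import numpy as np

def low_crosstalk_windows(wl_nm, xt_db, threshold_db=-21.0):
    """Return (start, stop) wavelength pairs where crosstalk is below the threshold."""
    windows, start, previous = [], None, None
    for wl, xt in zip(wl_nm, xt_db):
        if xt < threshold_db and start is None:
            start = float(wl)                    # entering a usable window
        elif xt >= threshold_db and start is not None:
            windows.append((start, previous))    # leaving the window
            start = None
        previous = float(wl)
    if start is not None:                        # window runs to the end of the sweep
        windows.append((start, previous))
    return windows

# Synthetic crosstalk spectrum (hypothetical): peaks of -19 dB, nulls of -30 dB.
wl_nm = np.linspace(1540.0, 1565.0, 2501)
xt_db = -24.5 + 5.5 * np.cos(2 * np.pi * (wl_nm - 1545.0) / 8.0)

windows = low_crosstalk_windows(wl_nm, xt_db)
for start, stop in windows:
    print(f"usable window: {start:.2f} nm to {stop:.2f} nm")
```

With real data, the spectrum would differ from die to die, so the windows would need to be found per device or combined with a tuning mechanism.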

## CONCLUSION

This section discusses the impact of modal crosstalk in silicon photonic MDM-based interconnects. Using simulations and experimental measurements of tapered couplers, fabrication variation and its effects on the crosstalk spectrum have been investigated. Next, the influence of interconnect length on the crosstalk spectrum was investigated, and it was shown that the FSR, which depends on the interconnect length, is correctly captured by the presented equations and simulations. Following this, time-domain experimental results highlighted the effects of low and high crosstalk strength from an aggressor to a victim signal and gave a rationale for the importance of crosstalk coupling strength to transmission quality. The optical crosstalk-shaping topics presented in this section are thus essential for enabling high-throughput WDM-MDM dual multiplexed links of the future.

# CHAPTER 4 - A SOURCE-SYNCHRONOUS ARCHITECTURE USING MODE-DIVISION MULTIPLEXING FOR SIP INTERCONNECTS

A source-synchronous interconnect using mode-division multiplexing (MDM) for potential use in on-chip applications is experimentally demonstrated using a 3-mode, 750 µm silicon photonic structure. Results are presented for simultaneous transmission of two data channels on two separate modes (bit error rate <10<sup>-12</sup> at 10 Gb/s) sampled by an optically forwarded clock sent on a third separate mode. A performance assessment of the mode assignment for the clock is presented. The investigation shows that optimum clock placement is important at wavelengths where modal crosstalk is higher. For example, at 1553 nm, the clock's jitter decreases from 45 ps to 2.7 ps when the clock is moved from a mode with high crosstalk (-18.6 dB) to one with less crosstalk (-28.6 dB). At 1560 nm, where modal crosstalk is lower, the clock's jitter is 2.6 ps (-27.8 dB crosstalk) and 1.1 ps (-34 dB crosstalk) without and with optimum clock placement, respectively. With proper clock-to-mode assignment, the optical interconnect becomes functional across an optical bandwidth of 11 nm, enabling MDM–wavelength-division multiplexing architectures.

## **PROPOSED ARCHITECTURE**

When transmitting data, synchronization is essential. Source-synchronous links, where a clock signal is forwarded along with the data, are commonly used in electrical links [68]. This is done to avoid the need to recover a clock on the receiver side, which uses power-hungry clock and data recovery (CDR) schemes. Source-synchronous links are favorable for data transmission because of the noise correlation that exists between the clock and data signals [54]. The clock edge and data bit track each other, thus allowing for an improvement in timing margin due to a reduction in the difference between the clock jitter and data jitter. In optical interconnects, the same problem of synchronization must be resolved. The source-synchronous architecture has also been shown for optical links using WDM in [69], and more recently in [70].

When using WDM for increased interconnection capacity a plurality of lasers is required, increasing complexity, power consumption and operating costs. Mode-division multiplexing (MDM) only requires one laser, and has recently gained attention in on-chip applications. Unlike WDM, MDM transfers information on different electromagnetic confinements, or modes, using only one wavelength. In other MDM work, either new structures were analyzed or BER testing was done for asynchronous data (clock recovered at receiver) [51], [71]. Our previous work in [16] and [31] showed how MDM can be used in a novel way to create a source-synchronous link using a single optical channel, with one clock and one data mode. Using multiple modes, the link can transmit a clock signal to keep synchronization along with multiple data channels in the same waveguide, while using only one wavelength. The proposed MDM architecture is illustrated in Figure 4-1. The transmit side (TX) modulates data on single-mode waveguides (SM WG), which are multiplexed onto different modes (1 to N), along with the transmit clock on a separate mode (Mode 1 in this illustration). On the receive side (RX), the optical signals are optically mode-demultiplexed onto single-mode waveguides, and then electrically recovered using the forwarded clock which is deskewed (phase aligned) and used in the latches.

The work presented in this chapter demonstrates how MDM source-synchronous links function when multiple data modes are co-propagating with the clock signal. The results highlight the importance of understanding how the clock signal is sensitive to modal crosstalk (XT). The channel lengths considered in this work are for the application of on-chip (< 2 cm) interconnects.

Modal crosstalk was first characterized using continuous wave (CW) laser measurement which revealed a 3 dB optical bandwidth of 11 nm from 1553 nm to 1564 nm. The lowest (-41.7 dB) and highest (-19.6 dB) crosstalk wavelengths were then identified at 1560 nm and 1553 nm, respectively.

Experimental results in this chapter show the feasibility of mode-division multiplexing by simultaneously transmitting on a single wavelength two data channels at 10 Gb/s along with a 10 GHz clock, the first demonstration of its kind. The forwarded clock is directly used in this experimental demonstration to sample the data at the receiver. Bit error rates (BER) below 10<sup>-12</sup> are obtained on both co-propagating data channels for wavelengths corresponding to the lowest (1560 nm) and the highest (1553 nm) modal crosstalk. The forwarded clock signals are then analyzed in the time domain to see the effects of modal crosstalk on clock jitter. Identification of the mode with the lowest crosstalk at the two wavelengths listed above led to optimum clock placement, leading to lower clock jitter and successful data transmission across the entire optical bandwidth.



Figure 4-1 - Proposed MDM architecture [26].

### **DESIGN OF THE MDM SIP WAVEGUIDE**

The MDM structure allows for mode-based source-synchronous transmissions by encoding the clock on one mode and the data onto others. The device is designed using a Silicon-on-Insulator (SOI) chip with a waveguide thickness of 220 nm. The three-mode MDM waveguide design is a passive structure. Single-mode waveguides, each with a width of 430 nm, are fed with off-chip modulated data (DATA1 and DATA2) and clock (CLK) signals through vertical grating couplers (Figure 4-2). These three waveguides then couple to a larger bus waveguide with a width of 1450 nm that supports all three modes. The tapered coupler design allows for increased robustness to fabrication variation [29]. Simulation showed that this tapered approach also provides robustness to temperature variations.

The MDM design is based on the phase matching condition whereby the add/drop port's effective index ( $n_{eff}$ ) of the coupler must be equal to the targeted mode in the bus waveguide, thereby only exciting a particular mode. When coupling to higher-order modes, lower-order modes still exist, however their effective indexes are much higher preventing coupling. Figure 4-3 shows simulated results using a commercial eigenmode solver (Lumerical MODE) of the optical waveguide's effective index as a function of waveguide width for the first four TE modes. The effective index of 2.3 was selected to enable the phase matching condition as it was found to be achievable by all TE modes of interest on the bus waveguide by varying its width. The 3-mode design exploits an 8-fiber array with three inputs, three outputs and two for chip alignment.



Figure 4-2 - Passive SiP MDM device structure supporting three modes.



Figure 4-3 - Effective index versus waveguide width for TE modes with subset figures showing waveguide cross-sections with mode spatial distribution.

Generally, bus waveguides can be designed by adjusting both width and thickness such that they support multiple modes with two polarizations. However, in order to couple (or multiplex) TE and TM polarizations from a single mode fiber, a different type of grating coupler (e.g. [72]) must be used, or alternatively, an on chip polarization rotator must be designed. Both of these approaches add complexity to the design, so we did not consider them for this proof of concept.

In Figure 4-2, the tapered design is labeled *Wo* for the center width shown in Figure 4-3 which corresponds to the intersection of  $n_{eff} = 2.3$  for each mode.  $W_{MIN}$  and  $W_{MAX}$  are the minimum and maximum taper widths, respectively, and were found through simulation while investigating fabrication tolerances for +/- 10 nm coupler width and gap variations. The taper lengths were optimized through simulation as well, ensuring a gradual increase in the waveguide width. The chosen design parameters are summarized in Table 5.

|              | W<sub>MIN</sub> (nm) | Wo (nm) | W<sub>MAX</sub> (nm) | Gap (nm) | Taper Length (µm) |
|--------------|----------------------|---------|----------------------|----------|-------------------|
| Mode 2 (TE1) | 810                  | 885     | 960                  | 150      | 103               |
| Mode 3 (TE2) | 1250                 | 1350    | 1450                 | 150      | 120               |

Table 5 - Mode MUX/DEMUX Design Parameters

## IMPACT OF MODAL CROSSTALK ON DATA/CLOCK SIGNALS

In an ideal straight waveguide with multiple modes, no modal crosstalk would occur. Intermodal mixing occurs when perturbations in the waveguide structure or bends are present [56]. Crosstalk degrades the link performance due to power from one mode leaking into another. If the isolation between modes is low with large modal crosstalk, the transmitted signal degrades. If the signal in question is transmitting the clock, data patterns on other modes leaking onto the clock signal will impact the zero crossings of the clock, directly affecting the clock jitter.

In this work, crosstalk is characterized using a continuous-wave (CW) laser, with input to one mode of the structure at a time. It was found that different modes exhibit different interference patterns across the optical bandwidth of the device. This leads to varying amounts of crosstalk depending on the mode in question and the wavelength being used to transmit information through the interconnect. An optical bandwidth of 11 nm spanning wavelengths 1553 nm to 1564 nm showed a worst case crosstalk value of -19.6 dB and a best case of -41.7 dB. This experimental work focuses on an interconnect length of 750 µm. The length of the interconnect used is based solely on available chip area and can be longer.

The results we presented in [31] showed that crosstalk interference patterns change and crosstalk increases as the interconnect length extends from 100  $\mu$ m to 1 mm for both two and three mode devices. This was also observed experimentally for two modes for interconnect lengths from 100  $\mu$ m to 2 cm that included waveguide bends. Results for these measurements are shown in Figure 4-4. These results were measured using the same CW measurement procedure as previously discussed. The results are presented showing the best (lowest value) and worst (highest value) crosstalk found across all wavelengths, from 1550 nm to 1570 nm. These interconnects differ from the work in this chapter as some of the larger lengths contain bends of 100  $\mu$ m radii to achieve the desired on-chip lengths, as opposed to the straight interconnects used in this proof of concept. As interconnect lengths increase, larger fluctuations in crosstalk with peaks and troughs across wavelength are seen, motivating techniques to mitigate the impact of crosstalk on data transmission. This effect narrows the usable bandwidth of the interconnect due to the increase in amplitude and occurrence of the crosstalk. Because this behavior is static, these fluctuations can be accounted for during initialization of the link with optimized wavelength assignment for MDM transmission.



Figure 4-4- Experimental results of 2-mode MDM devices with varying interconnect length versus measured CW crosstalk. Results presented are for best and worst crosstalk found across working wavelengths (from 1550 nm to 1570 nm).

Other works involving MDM have shown a higher number of transmitted modes using both TE and TM polarizations. In [15], eight channels were reported using four TE and four TM modes. This interconnect was only 100 µm long but showed the same interference pattern that was also seen in our work in [31] at distances up to 1 mm. The simulation outputs in [15] did not capture the interference patterns seen in their experimental results, leading to much higher crosstalk at some points than expected (< -50 dB simulated versus -18 dB measured). This shows an important challenge with a higher number of propagating modes, where it becomes difficult to control and predict the resulting crosstalk of each mode. With the addition of more modes, the cumulative crosstalk in any one mode becomes higher because the number of sources (modes) leaking into a given mode is simply higher. Work presented in the next section of this chapter leads to the conclusion that if the clock signal is restricted to a mode (TE or TM) with the lowest amount of cumulative crosstalk (less than -21 dB with equipment used in this experiment), then the link will work for a longer transmission length or a greater number of modes because the data channels are more robust to the crosstalk than the clock.

### MODAL TIME SKEW ON DATA/CLOCK SIGNALS

As information is transmitted on different modes, the propagation velocity will vary between modes as the spatial confinement of each mode leads to different refractive indexes. The propagation speed of a pulse is determined by the group velocity, and is equal to the speed of light (*c*) divided by its group index ( $n_g$ ) [7]. Skew is the relative difference in arrival time between the data and the clock. In an ideal source-synchronous link, the clock edge travels alongside the data at the same velocity. However as the two signals are transmitted on different modes, they will both arrive at different times. Methods used in electrical channels, such as in [73], can be adapted for clock deskew and alignment in the proposed MDM architecture. It is desirable to keep skew low as data rates increase because skew increases differential jitter thereby reducing jitter correlation between clock and data signals. This is exacerbated as the link length is extended. When this happens, the clock edge that transmitted the data is no longer necessarily the one that will clock it back on the receiver side.

Simulations were run to find the group index of the 1.45 µm width bus interconnect, capable of transmitting all three modes. The group velocities and differences were found for each case. Figure 4-5 shows the skew (in unit intervals, UI) for 10 and 25 Gb/s data rates between mode 1 and mode 2 (M1-M2), mode 1 and mode 3 (M1-M3), and mode 2 and mode 3 (M2-M3) for an optical carrier at 1558 nm (approximately mid-band over the optical bandwidth). Logically, the skew becomes a larger part of the UI as the data rate increases, and so higher rates suffer more from skew as the interconnect length increases. One can see from Figure 4-5 that the center mode (in this case M2, green and blue lines) has the least timing skew from either end mode (M1 and M3). This leads to the desire to place the clock on the center mode to minimize clock-to-data skew. It also indicates that with MDM, a lower data rate utilizing the parallel nature of the architecture may be favored from a system perspective in order to minimize skew, maximizing noise correlation. Another argument for reducing data rates and increasing parallelism is also found in [74] where the authors argue that current optical devices have an optimal data rate in the range of 4 to 8 Gb/s, which maximize optical power savings. Therefore it can be an advantage to reduce data rates and increase parallelism from both a skew and power savings viewpoint.
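The relationship between group index, interconnect length and skew described above can be sketched numerically. This is a minimal illustration, not the simulation used to produce Figure 4-5: the function names are hypothetical, and the group index values in `EXAMPLE_NG` are placeholders chosen only to exercise the formula, not the simulated values for the 1.45 µm bus waveguide.

```python
# Sketch: modal skew from group-index difference, expressed in unit intervals (UI).
# Arrival time over a length L is t = L * n_g / c, so skew = L * |n_g,a - n_g,b| / c.

C_CM_PER_S = 2.998e10  # speed of light in vacuum, cm/s


def skew_ps(length_cm, ng_a, ng_b):
    """Arrival-time difference (ps) between two modes over length_cm."""
    return abs(ng_a - ng_b) * length_cm / C_CM_PER_S * 1e12


def skew_ui(length_cm, ng_a, ng_b, rate_gbps):
    """Same skew as a fraction of the bit period (1 UI = 1000 / rate_gbps ps)."""
    return skew_ps(length_cm, ng_a, ng_b) * rate_gbps / 1e3


# Placeholder group indexes (assumption, for illustration only).
EXAMPLE_NG = {"M1": 4.0, "M2": 4.3, "M3": 4.6}

if __name__ == "__main__":
    for pair in (("M1", "M2"), ("M1", "M3"), ("M2", "M3")):
        a, b = (EXAMPLE_NG[m] for m in pair)
        for rate in (10.0, 25.0):
            ui = skew_ui(2.0, a, b, rate)
            print(f"{pair[0]}-{pair[1]} @ {rate:g} Gb/s over 2 cm: {ui:.3f} UI")
```

Because the skew in picoseconds is fixed by length and group index, the same physical skew occupies 2.5 times more of the UI at 25 Gb/s than at 10 Gb/s, which is the scaling argument made in the text.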

In order to see how the mode properties change with temperature, the material refractive indices of silicon ( $n_{Si}$ ) and silicon dioxide ( $n_{SiO2}$ ) were found using the theory presented in [75] and [76] for temperatures of 23.5°C and 71°C, respectively. The group velocities vary only slightly by no more than  $0.08 \times 10^9$  cm/s for all three modes. Simulation results show that this variation translates to a time skew between mode 1 and 2, and mode 2 and 3 over 2 cm of less than 1 ps and 2 ps, respectively, indicating that temperature fluctuation leads to manageable time skew for the receiver. The electronics in the receiver for the source-synchronous link can include the ability to adapt the deskew circuitry in response to slowly varying phenomena such as temperature fluctuations.



Figure 4-5 - Theoretical skew between three modes.

## IMPACT OF JITTER ON MDM SOURCE-SYNCHRONOUS LINKS

Deviations in a signal's zero crossings from their ideal crossings in time are referred to as jitter [77]. Clock and data jitter in a source-synchronous link is studied in depth in [54]. If no skew or delay exists between the two signals at the receiver, then the system provides ideal clock and data jitter tracking. However as the skew increases, differential jitter occurs due to the clock and data arriving at the receiver out of phase, lowering the noise correlation between the two.

In a source-synchronous link, correlated jitter between the clock and data is a definite advantage, as it allows tracking between the two and improves reception. However, as skew increases, a high-frequency jitter limit is imposed as a point beyond which jitter is amplified and degrades the system performance [54]. This indicates that if the interconnect increases clock-to-data skew, then the jitter tracking bandwidth at the receiver should be reduced [53]. This does not only apply to optical links, but is seen in electrical interconnects as well. Clock placement thus involves a tradeoff: keeping the clock centralized minimizes the clock-to-data skew (minimizes differential jitter), but may expose the sensitive clock signal to higher crosstalk from both neighboring optical modes, as discussed in the following section.

In digital system applications, 50 to 400 MHz jitter tracking bandwidth is normally targeted because the jitter is strongly correlated to the power supply noise [53]. Using theory in [54], this indicates that an absolute clock-to-data delay of  $\leq$  416 ps is required for a  $\geq$  400 MHz tracking bandwidth. If only considering jitter correlation for modes with sufficiently low crosstalk for a desired bandwidth of 400 MHz, an upper theoretical limit of the interconnect length is calculated to be approximately 20 cm if the clock is on mode 1 or 3, and < 30 cm if the clock is on mode 2. Links can be longer if the jitter tracking bandwidth is reduced. Likewise, if one were to increase the number of propagating modes, the same issues must be considered. Using the same specifications as above with a 3 µm wide waveguide for a seven mode-multiplexed scheme, the increased difference in group velocity would reduce the maximum interconnect length down to 17 cm with proper clock placement. This corresponds to a 43% loss in the length of an ideal interconnect with a possible 300% increase in aggregate throughput. Because this is a trade-off, an optimal balance may be found on a per-case basis. All these calculations assume the calculated group velocity holds for extended lengths for all modes, and that any bends or waveguides induce low modal crosstalk and loss. This puts the application of on-chip and chip-to-chip interconnects within this range. For the chip-to-chip case, it is assumed that some form of silicon interposer is used, with group indexes that remain the same as in the on-chip scenario calculated.
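The 416 ps figure can be reproduced with a short calculation. The sketch below assumes that the criterion taken from [54] is that the differential-jitter gain between the delayed clock and the data, $2|\sin(\pi f \Delta t)|$, must not exceed unity, which gives $\Delta t \leq 1/(6f)$. This criterion is an assumption used here because it matches the quoted number; the function names are hypothetical, and the group-index difference passed to `max_length_cm` would have to come from simulation.

```python
import math

C_CM_PER_S = 2.998e10  # speed of light in vacuum, cm/s


def differential_jitter_gain(jitter_freq_hz, skew_s):
    """Magnitude of J(t) - J(t - skew) relative to J for sinusoidal jitter."""
    return 2.0 * abs(math.sin(math.pi * jitter_freq_hz * skew_s))


def max_skew_s(tracking_bw_hz):
    """Largest skew for which the gain stays <= 1 up to tracking_bw_hz (assumed criterion)."""
    return 1.0 / (6.0 * tracking_bw_hz)


def max_length_cm(tracking_bw_hz, delta_ng):
    """Interconnect length at which clock-to-data skew reaches max_skew_s."""
    return max_skew_s(tracking_bw_hz) * C_CM_PER_S / delta_ng


def relative_length_loss_pct(reference_cm, reduced_cm):
    """Percentage reduction in maximum length, e.g. 30 cm -> 17 cm."""
    return 100.0 * (reference_cm - reduced_cm) / reference_cm


if __name__ == "__main__":
    dt = max_skew_s(400e6)
    print(f"max skew for 400 MHz tracking: {dt * 1e12:.1f} ps")
    print(f"gain at that skew: {differential_jitter_gain(400e6, dt):.3f}")
    print(f"length loss 30 cm -> 17 cm: {relative_length_loss_pct(30, 17):.0f}%")
```

Lowering the tracking bandwidth raises the allowed skew in inverse proportion, which is why longer links are possible when the jitter tracking bandwidth is reduced.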

This study focuses on TE modes only. Note that harnessing both TE and TM modes with a compatible (de)multiplexer would increase aggregate throughput. This is considered possible because the TM modes' group velocity can be similar to that of the TE modes. In numerical terms, the difference in arrival times between TE0 and TM0 in a 1.45  $\mu$ m wide waveguide over 2 cm is 28 ps. Since this amount is less than 416 ps, the jitter tracking bandwidth can meet the previously discussed 400 MHz target such that the skew is manageable by the receiver.

#### **EXPERIMENTAL RESULTS**

The passive structure was fabricated using E-beam fabrication techniques at the University of Washington (Figure 4-6). The device used for testing is 750 µm in length and capable of transmitting three modes simultaneously, and is the longest length possible without bends at the time of testing this proof of concept. Continuous wave (CW) measurements were first done in order to obtain the characteristics of the device using a broadband source and an optical spectrum analyzer (OSA; sensitivity: -80 dBm; resolution: 0.06 nm). For CW measurements, the OSA provides power values corresponding to the through and leakage (crosstalk) components of the optical power for each mode. The powers corresponding to the two data-mode crosstalk values are combined in order to obtain the *cumulative modal crosstalk*, accounting for the total optical power leakage onto the clock mode.



Figure 4-6 - Microscope images of the SiP 750  $\mu$ m optical interconnect. Distance between grating couplers (127  $\mu$ m) given for scale.

In order to analyze the effect of modal crosstalk on the propagating signals, the experimental setup illustrated in Figure 4-7 is used. A laser is split through a 1:4 optical coupler and sent to electro-optic Mach-Zehnder modulators which are fed clock (CLK) and data signals (DATA and DATABAR). The two data channels were generated by the programmable pattern generator (PPG) with significant RF cable differential path delay (45 cm) for approximately 18 bits of decorrelation between the two  $2^{31}$ -1 PRBS NRZ data streams. In this setup, the extra delays and decorrelation remove the possibility of measuring skew or jitter correlation.



Figure 4-7 - Experimental setup for dual data mode source-synchronous operation.

Figure 4-8 shows the optical transmission (TX) response of the device to a broadband source input on (a) mode 1, (b) mode 2, and (c) mode 3. The interconnection link is found to have a 3 dB optical bandwidth of 11 nm, ranging from 1553 nm to 1564 nm. The spectrum shows the wavelengths with best and worst modal crosstalk. The best isolation (smallest crosstalk, -27.8 dB) from another mode occurs at 1560.4 nm and the worst isolation (largest crosstalk, -19.6 dB) from another mode occurs at 1553 nm. Superimposed on these graphs are subfigures of clock signals forwarded on the corresponding mode, captured using a sampling oscilloscope (Agilent DCA-X 86100D) after an optical-to-electrical conversion (responsivity: 0.7 A/W; bandwidth: 46 GHz). For comparison, the clock was also sent without any co-propagating data (*no data active*). The other modes carrying co-propagating data were then selectively enabled, one at a time, before enabling both co-propagating data modes. This allowed further confirmation of the effect of different crosstalk sources.

Upon analysis of the acquired clock signals, it became clear that mode selection for carrying the clock is important. Looking at the most isolated wavelength at 1561 nm in Figure 4-8(c), the clock on mode 3 exhibits the narrowest trace, indicating the least amount of edge uncertainty or jitter. For the worst case wavelength at 1553 nm, the mode 2 and 3 clocks (Figure 4-8(b) and (c)) show a large amount of movement, making mode 1 the best candidate for clock transmission at that specific wavelength. These observations are supported by the recorded jitter values listed in Table 6, along with the crosstalk per mode (crosstalk due to leakage from another mode) and cumulative crosstalk (sum of crosstalk due to leakage from all modes into the mode of interest). For example, at 1553 nm, the cumulative modal crosstalk to mode 2 is -18.61 dB (from Table 6), which is the aggregate, summed in linear power units, of the crosstalk from mode 3 to mode 2 (-19.61 dB) and the crosstalk from mode 1 to mode 2 (-25.48 dB).
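The cumulative values can be checked by converting each per-mode crosstalk from dB to linear power, summing, and converting back. The sketch below does this with the per-mode values copied from Table 6 and then picks the clock mode with the lowest cumulative crosstalk at each wavelength. The function and dictionary names are hypothetical, and the selection rule is a simplified reading of the discussion above, not a procedure stated in the experiment.

```python
import math


def cumulative_crosstalk_db(sources_db):
    """Sum several crosstalk contributions (dB) in linear power; return dB."""
    total_linear = sum(10.0 ** (x / 10.0) for x in sources_db)
    return 10.0 * math.log10(total_linear)


# Per-mode crosstalk onto each candidate clock mode (dB), from Table 6.
# Keys: clock mode -> crosstalk from each of the two other modes.
BEST_WAVELENGTH = {1: [-29.68, -32.62], 2: [-29.81, -30.23], 3: [-34.85, -41.71]}
WORST_WAVELENGTH = {1: [-29.61, -35.90], 2: [-25.48, -19.61], 3: [-35.75, -21.63]}


def best_clock_mode(per_mode_db):
    """Return the mode whose cumulative incoming crosstalk is lowest."""
    return min(per_mode_db, key=lambda m: cumulative_crosstalk_db(per_mode_db[m]))


if __name__ == "__main__":
    for label, table in (("1560.4 nm", BEST_WAVELENGTH), ("1553 nm", WORST_WAVELENGTH)):
        for mode, sources in table.items():
            print(f"{label} mode {mode}: {cumulative_crosstalk_db(sources):.2f} dB")
        print(f"{label}: lowest cumulative crosstalk on mode {best_clock_mode(table)}")
```

With these inputs the selection lands on mode 3 at the best wavelength and mode 1 at 1553 nm, in line with the clock traces discussed above.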

Correlation between jitter and crosstalk is plotted in Figure 4-9. Here the single-source crosstalk (XT) data indicates modal crosstalk from one other data mode only (other mode has no data propagating), and the dual-source crosstalk points indicate the cumulative crosstalk from two modes simultaneously transmitting data, both taken from Table 6. A trend line is created using the cumulative crosstalk data points, and shows exponentially increasing jitter as crosstalk goes above -25 dB. Higher jitter leads to more complexity in the electronics at the receiver, something that should be avoided for low-power operation. It is thus recommended that crosstalk on all modes be below -25 dB to avoid the exponential jitter increase seen beyond this crosstalk amount.
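An exponential trend of this kind can be sketched by fitting a straight line to the logarithm of the RMS jitter versus crosstalk in dB. The example below uses the six cumulative crosstalk and RMS jitter pairs from Table 6 and an ordinary least-squares fit written with the standard library. It is an illustration of the approach only: the fitting method behind the trend line in Figure 4-9 is not stated, so the coefficients produced here should not be read as the ones plotted.

```python
import math

# (cumulative crosstalk in dB, RMS jitter in ps), copied from Table 6.
CUMULATIVE_POINTS = [
    (-27.89, 2.628), (-28.69, 2.753),   # clock on mode 1
    (-27.00, 2.02), (-18.61, 45.0),     # clock on mode 2
    (-34.03, 1.18), (-21.46, 18.11),    # clock on mode 3
]


def fit_exponential(points):
    """Least-squares fit of jitter = a * exp(b * xt_db) via a line in log space."""
    xs = [x for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_jitter_ps(a, b, xt_db):
    """Evaluate the fitted trend at a given crosstalk level (dB)."""
    return a * math.exp(b * xt_db)


if __name__ == "__main__":
    a, b = fit_exponential(CUMULATIVE_POINTS)
    for xt in (-35, -30, -25, -20):
        print(f"{xt} dB -> {predict_jitter_ps(a, b, xt):.2f} ps RMS (fitted)")
```

A positive slope `b` means each additional dB of crosstalk multiplies the jitter by a constant factor, which is what makes the region above roughly -25 dB so costly for the receiver.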



(a) Inset figures show the optically forwarded clock signal on mode 1 under different crosstalk scenarios.



(b) Inset figures show the optically forwarded clock signal on mode 2 under different crosstalk scenarios.



(c) Inset figures show the optically forwarded clock signal on mode 3 under different crosstalk scenarios.

Figure 4-8 - CW measurements highlighting clock traces at best (1560 nm) and worst (1553 nm) isolated wavelengths with different combinations of mode 1 (M1), mode 2 (M2) and mode 3 (M3) co-propagating data signals. Inset figures show optically forwarded clock on oscilloscope with histogram jitter measurements.

| CLK<br>Mode | Data Mode            | Best Wavelength<br>(1560.4 nm) |                       |                    | Worst Wavelength<br>(1553 nm) |                       |                      |
|-------------|----------------------|--------------------------------|-----------------------|--------------------|-------------------------------|-----------------------|----------------------|
|             |                      | Modal<br>Crosstalk<br>(dB)     | Jitter<br>RMS<br>(ps) | Jitter<br>p-p (ps) | Modal<br>Crosstalk<br>(dB)    | Jitter<br>RMS<br>(ps) | Jitter<br>p-p (ps)   |
| 1           | Base                 |                                | 0.8984                | 8.6                |                               | 0.996                 | 8.8                  |
|             | 2                    | -29.68                         | 2.267                 | 16.2               | -29.61                        | 2.4                   | 19.2                 |
|             | 3                    | -32.62                         | 1.495                 | 12.6               | -35.9                         | 1.679                 | 13.6                 |
|             | Cumulative (2 and 3) | -27.89                         | 2.628                 | 21.6               | -28.69                        | 2.753                 | 23.6                 |
| 2           | Base                 |                                | 0.874                 | 7.4                |                               | 0.974                 | 8.2                  |
|             | 1                    | -29.81                         | 1.765                 | 13.6               | -25.48                        | 2.11                  | 16.2                 |
|             | 3                    | -30.23                         | 1.249                 | 9.8                | -19.61                        | 19.92                 | 47.4+<sup>2</sup>    |
|             | Cumulative (1 and 3) | -27                            | 2.02                  | 16.6               | -18.61<sup>1</sup>            | 45                    | 47.4+<sup>2</sup>    |
| 3           | Base                 |                                | 0.839                 | 7.8                |                               | 0.922                 | 7.4                  |
|             | 1                    | -34.85                         | 1.0762                | 10.4               | -35.75                        | 1.22                  | 9.6                  |
|             | 2                    | -41.71                         | 0.992                 | 8.6                | -21.63                        | 18.65                 | 34                   |
|             | Cumulative (1 and 2) | -34.03                         | 1.18                  | 10.6               | -21.46                        | 18.11                 | 35.6                 |

Table 6 - Experimentally Measured Clock Jitter on Oscilloscope

<sup>1</sup>No BER measurement possible with optical CLK, only with electrically forwarded clock.

<sup>2</sup> Jitter measurement went beyond the range of the jitter histogram measurement window.

For the wavelength with worst crosstalk and the clock on mode 2, Table 6 indicates that no BER is obtainable when using the optically forwarded clock. The error detector, however, is able to lock to the pattern with an electrically forwarded clock bypassing the DUT in Figure 4-7. This confirms that the jitter on the clock (45 ps) caused by modal crosstalk (cumulative crosstalk -18.6 dB) creates a condition where the BERT is unable to lock. Eye diagrams were captured on the electrical sampling oscilloscope using the data as the input and the forwarded clock as the trigger signal. In Figure 4-10(a), an optically forwarded clock is the trigger; in Figure 4-10(b), the trigger is an electrically forwarded clock bypassing the DUT. As observed, the method of sending the clock notably changes the eye. Closure of the eye horizontally is caused by modal crosstalk induced jitter.

The power penalty plots (Figure 4-11) are derived from the BER curves obtained through experimental results for a PRBS data sequence of 2<sup>31</sup>-1. These plots show the extra power needed to achieve a BER of 10<sup>-12</sup> while two data modes and one clock mode are co-propagating in the waveguide compared to the back-to-back case. In these measurements, the back-to-back case consists of an optical attenuator that replaces the photonic chip and matches its attenuation, with an electrically forwarded clock. The graph's points are labeled with the mode used to forward the clock (CLK Mx) along with the mode which underwent the BER test (D Mx). The third mode not listed in each of the points also had data propagating on it and thus contributed to crosstalk. Figure 4-11(a) has two fewer points than (b) due to the error detector locking issues while transmitting the clock on mode 2 (as indicated in Table 6). As illustrated in the graph, there may be different penalties for the same cumulative mode crosstalk. These differences depend on the clock/data to mode assignment based on crosstalk from the data channel. For example, at 1553 nm when the clock is on mode 1 (CLK M1) and BER measurements are for the data on mode 2 (D M2), the cumulative crosstalk onto the clock is -28.69 dB (from Table 6) and the cumulative crosstalk onto the data is -18.61 dB (blue diamond point in Figure 4-11(a)), leading to a power penalty of 2.5 dB. When the clock is on mode 3 (CLK M3) and data measured on mode 2 (D M2), the cumulative crosstalk to the data remains the same in Figure 4-11(a); however, the cumulative crosstalk onto the clock is in fact worse at -21.46 dB, which results in a 1 dB higher power penalty.



Figure 4-9- Measured RMS jitter for various experimental crosstalk values, showing exponentially increasing jitter trend.



Figure 4-10 - Electrical eye diagrams of data transmission at 1553 nm captured with oscilloscope triggered by the forwarded (a) optical clock on mode 2 and (b) electrical clock bypassing the DUT.



(a)



(b)

Figure 4-11 - Power penalty plots for (a) 1553 nm and (b) 1560 nm wavelengths.

### CONCLUSION

This chapter investigated clock signal integrity in an on-chip interconnect architecture exploiting MDM. Measurements show that, because different modes receive crosstalk from different sources, the choice of which mode carries the sensitive clock signal is crucial. Identification of the mode with the lowest crosstalk at a given wavelength led to optimum clock-to-mode assignment, leading to lower clock jitter and successful data transmission across the entire optical bandwidth. Results using the presented experimental setup showed that higher cumulative crosstalk led to exponentially higher jitter, resulting in poor data reception. As such, to increase the throughput in this proposed source-synchronous scheme, one may exploit polarization schemes utilizing TE and TM modes, provided they meet the crosstalk/jitter requirements of the receiver. Theoretically, the number of modes is limited by the resulting cumulative modal crosstalk and not the skew differences, mainly because most skew problems will not occur at the on-chip scale targeted here (< 2 cm). The electronics at the receiver handles deskew on a per-channel basis and only requires one transmitted clock to align the receiver and transmitter. The number of data channels sent to the receiver, we believe, is thus only limited by cumulative crosstalk, space and power specifications of the electronics at the receiver. The use of the MDM technique over a wide band response is interesting, as it increases the application possibilities of the proposed technique, since MDM can be employed on multiple wavelengths (WDM) as well. For example, this can be extended to multiple source-synchronous links where each WDM channel has its own separate clock.

# CHAPTER 5 - RECONFIGURATION IN SOURCE-SYNCHRONOUS RECEIVERS FOR SHORT-REACH PARALLEL OPTICAL LINKS

This chapter demonstrates an architecture to make electronics that can be reconfigured post-fabrication to suit the needs of the SiP interconnect by leveraging the more robust electronic chip to optimize the link. Configurable circuit blocks allow the incoming signal to be routed within each receiver to various data or clock specific circuit blocks, with unused blocks disabled. Since each receiver can be repurposed using a novel circuit reuse concept, the clock distribution among receivers must be dynamic as well. This is accomplished using a configurable clock distribution driver, allowing any clock-configured receiver to properly feed all data-configured receivers on the chip, with complexity and power that scale with the number of receivers added. To the authors' best knowledge, the proposed reconfiguration and clock distribution scheme has not been seen in the literature. The clock's ring-based oscillators in each receiver use a multi-location and variable-strength injection circuit as a method of per-lane phase adjustment. The experimental chip is implemented in TSMC's 65 nm CMOS process; however, the architecture is portable to other technologies. The proposed design achieves 8 Gb/s with a bit error rate (BER) of 10<sup>-12</sup>, with similar sensitivity when swapping data and clock inputs to the receivers.

## PROPOSED RECONFIGURABLE ARCHITECTURE

#### **OVERVIEW**

The proposed lane repurposing architecture is illustrated in Figure 5-1. This architecture gives system-level configurability for parallel optical links. The architecture can reconfigure each optical receiver with its transimpedance amplifier and main amplifier (TIA+MA) to accept either a clock signal or data signal. This is accomplished by powering down unused circuits and rerouting clock and data signals within the receiver. Using Figure 5-1 as an example, receiver 1 (*RX1*) is configured for receiving a full or sub-rate forwarded clock from the optical interconnect and will regenerate and distribute the clock signal to the other receiver(s). This is done using the injection-locked oscillator (*ILO*) in RX1. In general, sub-rate architectures allow circuits to run at lower frequencies resulting in reduced power dissipation and design complexity, making them an interesting low-power option.

The ILO in the data configured receiver (*RX2*) will then lock to the distributed clock signal from the ILO in RX1. This is accomplished using the clock distribution network to dynamically route the clock signal from RX1 to RX2 on-chip. The data receiving path uses phase adjusted versions of the distributed forwarded clock to recover the data signals. This phase adjustment allows for independent alignment of the clock in each receiver with respect to the data. The data configured paths feed high-speed latches, whose output can be selected to be sent off chip during testing using multiplexers.

As opposed to using one distinct ILO structure for clock recovery and a different one for phase skewing, as for example in [17] or [24], this design incorporates a novel two-step clock alignment architecture using identical circuit blocks. The two ILOs have different purposes, one configured for global clock recovery (RX1 ILO in Figure 5-1) and the second for phase alignment (RX2 ILO in Figure 5-1), but share the same structure and differ only in configuration. Repurposing a receiver is done simply by switching the role of each of the two oscillators and rerouting the clock path. This minimizes overhead in each of the configurable receivers since the same circuits are reused.
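The repurposing idea can be pictured as a small configuration table. The following is a minimal sketch only: the names `Role`, `BlockEnables`, `enables_for` and `swap_roles` are hypothetical and do not correspond to the chip's register map. It records which blocks the text describes as powered for a clock-configured versus a data-configured receiver.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CLOCK = "clock"
    DATA = "data"


@dataclass(frozen=True)
class BlockEnables:
    afe: bool            # TIA + main amplifier
    ilo: bool            # injection-locked oscillator
    tia_to_ilo: bool     # clock path from the TIA/MA into the ILO
    clock_buffer: bool   # buffer between the ILO and the latches
    latches: bool        # data-sampling latches


def enables_for(role: Role) -> BlockEnables:
    """Return which blocks are powered for a given receiver role."""
    if role is Role.CLOCK:
        # Clock receiver: the ILO locks to the forwarded clock from the TIA,
        # and the sampling path is powered down.
        return BlockEnables(afe=True, ilo=True, tia_to_ilo=True,
                            clock_buffer=False, latches=False)
    # Data receiver: the ILO locks to the distributed clock instead,
    # and its output phases clock the latches.
    return BlockEnables(afe=True, ilo=True, tia_to_ilo=False,
                        clock_buffer=True, latches=True)


def swap_roles(roles: list[Role], new_clock_rx: int) -> list[Role]:
    """Repurpose the lanes so that exactly one receiver takes the clock."""
    return [Role.CLOCK if i == new_clock_rx else Role.DATA
            for i in range(len(roles))]


print(swap_roles([Role.CLOCK, Role.DATA], new_clock_rx=1))
```

Both roles use the same set of blocks, so changing a lane's role is only a change of enables and routing, which is the point of the circuit reuse concept.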

A reference clock (*REF CLK*), intended to be used during testing or as part of a larger system, can be used in place of a forwarded clock receiver. This is done as a method to test each ILO individually during initial chip verification and mimics the role of the forwarded clock in ILO locking in each data receiver.



Figure 5-1 - Proposed reconfigurable receiver architecture for parallel optical links.

#### **PATH SELECTION**

In this design, the fan-out of the TIA and main amplifier stages enables multiple paths to be fed by the same source. Consider two situations, one using pass-transistor switches to shield loading capacitance from the TIA, Figure 5-2(a), and the second without switches, Figure 5-2(b). The TIA and main amplifier blocks make up the transimpedance stage, where the output voltage  $V_{TIA}$ is controlled by the input current from the PD. In both cases, unused circuitry represented by  $C_L$  can be disabled but still presents a capacitive load. Note that in a system of more than two receivers, each TIA will still only drive two possible paths (data and clock).



Figure 5-2 - (a) TIA fan-out configuration using pass-transistor switches to shield CL loading capacitance from the TIA; (b) TIA fan-out configuration without using switches.

In the circuit implementation in (a), the TIA is loaded by two parasitic capacitors  $C_{gd}$ , accounting for MOS overlap and fringing capacitance of the switch in the off state, and a triode  $C_{gs}$  of the switch in the on state. In this case, the triode capacitance of the MOS switch is larger than the MOS overlap capacitance. TIA output resistance is given by  $R_{TIA}$  and load capacitors  $C_{L1}$  and  $C_{L2}$  account for the gate capacitances in the circuits in each activated path. The resistance in the switch,  $R_{sw}$ , is due to the resistive channel in the MOSFET when active. In this implementation, the deactivated switch effectively shields the TIA/MA output from the loading capacitance ( $C_{Lx}$ ) in the unused path. In implementation (b), no switches are used and the loading on the TIA/MA output is always two parasitic capacitors  $C_{L1}$  and  $C_{L2}$ .

Circuit simulations were done to investigate the trade-offs in sizing of the CMOS pass-transistor switch in Figure 5-2(a), with results presented in Figure 5-3. This shows the effect of increasing the width of the switch transistors (devices with minimum length) on the bandwidth at the output of the TIA/MA and the switch. The bandwidth plot is normalized to the switchless design in Figure 5-2(b) for comparison. Since, at small sizes, the switches provide a smaller capacitive load than the latches do, the bandwidth at the TIA/MA output is greater than in the switchless design. This scenario, however, results in a very low bandwidth at the output of the CMOS pass-transistor switch due to the switch's on-resistance.

For the design in Figure 5-2(b), the clock buffer presents a smaller capacitive load than the latches in the data path. Hence the reasons one would use the switch design in Figure 5-2(a) to shield the transimpedance stage driving the high-speed data path ( $C_{L1}$ ) from the clock path ( $C_{L2}$ ) are outweighed by the bandwidth loss associated with the implementation of the actual switch. The implementation in Figure 5-2(b) was thus chosen and the transimpedance stage was directly loaded by the latches and clock buffer circuitry in parallel without any switches in the high-speed path.



Figure 5-3 - Circuit simulation for the design in Figure 5-2(a) with CMOS pass-transistor switches of varying width (minimum length) versus bandwidth at the output of the TIA/MA and switch, normalized to the switchless design in Figure 5-2(b).

For analysis purposes, we will now see when it is beneficial to use the switch implementation. Taking the best case switch multiplier found in Figure 5-3 of 10, we will vary the capacitive load of the TIA. In Figure 5-4 the two implementations in Figure 5-2 are compared. First, no switches are used with the TIA loaded by the capacitive load indicated on the axis. Next, the implemented pass-transistor switch design is used for the two paths, each with a value of half the capacitive load. The resulting bandwidth is then normalized to the switchless design. We can see that as the capacitance increases, there may be a crossover point where it would be beneficial to adopt the switch implementation to shield unwanted loading capacitance. It is thus important to revisit this analysis when considering new implementations.
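The crossover behaviour can be illustrated with a first-order RC estimate. This is a sketch under stated assumptions, not the simulation behind Figure 5-4: the switchless case is treated as a single pole, the switched case uses an Elmore-delay estimate, and the values of `R_TIA`, `R_SW` and `C_PAR` are hypothetical numbers chosen only to make the trend visible.

```python
import math


def bw_switchless(r_tia, c_total):
    """Single-pole bandwidth with both path loads tied directly to the TIA/MA output."""
    return 1.0 / (2.0 * math.pi * r_tia * c_total)


def bw_switched(r_tia, c_total, r_sw, c_par):
    """Elmore-delay bandwidth estimate at the load behind the on switch.

    The off switch shields its path, so only half of c_total is driven,
    but the switches add parasitic capacitance c_par at the TIA/MA output
    and a series resistance r_sw in the active path.
    """
    c_path = c_total / 2.0
    tau = r_tia * (c_par + c_path) + r_sw * c_path
    return 1.0 / (2.0 * math.pi * tau)


# Hypothetical values, chosen only to show the crossover.
R_TIA, R_SW, C_PAR = 200.0, 100.0, 7.5e-15

for c_total in (10e-15, 30e-15, 100e-15):
    ratio = bw_switched(R_TIA, c_total, R_SW, C_PAR) / bw_switchless(R_TIA, c_total)
    print(f"C_load = {c_total * 1e15:5.0f} fF -> normalized bandwidth = {ratio:.2f}")
```

In this simplified model the switch only pays off once the shielded load is large enough to outweigh the added parasitic capacitance and series resistance, which is the qualitative behaviour described above.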



Figure 5-4 – TIA capacitive loading for both implementations in Figure 5-2.

#### **ANALOG FRONT-END**

In this proposed reconfigurable architecture, the TIA and MA should be able to optimize the power and bandwidth requirements of a given receiver. This can lead to power savings because the clock receiver requires less bandwidth than the data receivers due to the sub-rate architecture of the proposed design. For example, increasing the feedback resistance can be used to trade bandwidth for gain in the lower-speed clock path. The analog front-end (AFE) block diagram and circuits are shown in Figure 5-5.

The AFE consists of an inverter-based TIA, DC level shifter (LS) and three-stage main amplifier (MA) in a pseudo-differential structure. The single-ended photo-current is fed to a main TIA while the input of a TIA replica is left open. Both TIA inputs are connected to pads for equal loading, as well as for the ability to use either as an input, as will be discussed later. The TIA replica doubles the power consumption and input-referred noise power. It is, however, necessary to improve the common-mode rejection of both supply and substrate noise. The TIA itself is a CMOS inverter with resistive shunt feedback. An additional NMOS shunts the feedback resistor to control the TIA's bandwidth and gain when the receiver is reconfigured. This topology is a low-noise and high-gain TIA with reasonable power consumption due to the high input transconductance achieved by the reuse of current in the PMOS and NMOS transistors. For DC-offset compensation, a differential low-pass filter extracts the difference between the MA outputs and produces a feedback voltage  $V_{fb}$ . The feedback voltage is then applied to a differential pair  $M_{o1}$  and  $M_{o2}$  to steer its tail current I<sub>offset</sub> into the TIA input nodes.
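The gain and bandwidth trade-off set by the feedback resistor can be sketched with the textbook first-order model of a shunt-feedback TIA. The model assumes an ideal core amplifier of gain `gain_a` with no bandwidth limit of its own. The gain value and the feedback resistances below are hypothetical; only the 200 fF input capacitance is taken from the design target stated in this section.

```python
import math


def shunt_feedback_tia(r_f, gain_a, c_in):
    """First-order model of an ideal shunt-feedback TIA.

    r_f    : feedback resistance in ohms
    gain_a : magnitude of the core amplifier voltage gain (assumed value)
    c_in   : total input capacitance in farads
    Returns (transimpedance in ohms, -3 dB bandwidth in Hz).
    """
    z_t = r_f * gain_a / (1.0 + gain_a)
    bw = (1.0 + gain_a) / (2.0 * math.pi * r_f * c_in)
    return z_t, bw


C_IN = 200e-15   # 200 fF, the input capacitance assumed for the front-end design
GAIN_A = 5.0     # hypothetical inverter gain, for illustration only

for r_f in (500.0, 1000.0, 2000.0):   # hypothetical feedback resistances
    z_t, bw = shunt_feedback_tia(r_f, GAIN_A, C_IN)
    print(f"R_F = {r_f:6.0f} ohm -> Z_T = {z_t:6.0f} ohm, BW = {bw / 1e9:5.2f} GHz")
```

Doubling the feedback resistance doubles the transimpedance and halves the bandwidth in this model, which is why a larger effective feedback resistance suits the lower-speed clock path.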

A low cut-off frequency of 0.72 MHz is chosen as a trade-off between on-chip passive area and baseline wander. It introduces baseline wander for long runs of consecutive identical digits, but is sufficient for PRBS31 data sequences in our application. The low common-mode voltage at the TIA output ( $\sim 0.45$  V) is not sufficient to properly bias the subsequent MA ( $\sim 0.7$  V is required). Therefore, a differential pair that uses a poly-silicon resistor in place of the tail current source is introduced between the TIA and the MA to provide the required DC level shifting [78]. The LS also prevents the active feedback used in the MA from loading the TIA's output resistance. To provide a signal with sufficiently large amplitude for the subsequent latches (or ILOs), the LS is followed by a three-stage MA. Each MA stage consists of two differential amplifiers in the forward path and two-level active feedback (AF) as shown in Figure 5-5 [79].
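A rough estimate shows why a 0.72 MHz cut-off is compatible with PRBS31. The sketch below assumes the offset-compensation loop behaves as a first-order high-pass response and uses the 8 Gb/s rate of the demonstrated link. The longest run of identical bits in a PRBS31 pattern is 31.

```python
import math


def cid_droop(f_cutoff_hz, bit_rate_bps, run_length_bits):
    """Fractional droop after a run of identical bits through a first-order high-pass response."""
    tau = 1.0 / (2.0 * math.pi * f_cutoff_hz)
    run_time = run_length_bits / bit_rate_bps
    return 1.0 - math.exp(-run_time / tau)


droop = cid_droop(0.72e6, 8e9, 31)
print(f"Droop after 31 identical bits: {100 * droop:.2f} %")
```

Under these assumptions the droop stays below 2 % of the signal amplitude. Longer runs or lower data rates would raise it.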

Inductorless AF-MAs were used in this implementation as the total amplifier size is considerably reduced when on-chip passive inductors are avoided. In addition, since this architecture has multiple receivers nearby, the inductorless design avoids inductive coupling between neighboring channels. These features make the inductorless AF-MAs the preferable choice for our application.

The multi-level AF-MA architecture achieves high gain and wide bandwidth without using on-chip passive inductors. Moreover, this architecture enables straightforward gain/bandwidth adjustment when the RX is reconfigured by controlling the tail current sources in the forward and feedback paths. The design of the front-end targeted data rates up to 24 Gb/s assuming a total input capacitance of 200 fF. The targeted energy efficiency was 1.5 pJ/b.



Figure 5-5 - Block diagram and circuits of the designed analog front-end (AFE).

#### **OSCILLATORS**

Each receiver in the proposed architecture contains one oscillator. Injection locked oscillators (ILOs) are one type of oscillator, commonly used due to their compact size and low power [18]. An *N*-stage differential ILO also provides multiple output phases, each delayed by 360/2N degrees, which is useful for sub-rate latched data receiver architectures such as this one. The ILO block in this implementation consists of a four-stage differential oscillator with four output phases, shown in Figure 5-7(a).

Once it is decided which receiver will accept the forwarded clock from the transmitter, a multiplexing clock buffer routes the TIA and main amplifier output to the ILO stage, illustrated in RX1 of Figure 5-1. The TIA and MAs only need to amplify the signal to a fraction of the full-rail signal before it feeds into an ILO buffer and injection circuitry. Signal regeneration is a very efficient method of signal amplification [80], and the ILO will act as a regeneration stage to the small output clock signal of the main amplifier (~ 150 mV<sub>pp</sub>). Once passed to the ILO, the oscillator will injection lock to the forwarded-clock, tracking the phase and frequency within a given range.

The ILO delay cells are designed using fully differential current-mode logic (CML) circuits, illustrated in Figure 5-7(b). The differential delay cell with symmetric loads is controlled using the *Vtune*, *Vc* and *Vb* voltages, generated using replica biasing circuits similar to those in [81]. The replica bias circuit uses digital-to-analog converters (DACs), where binary control bits raise or lower the current in the replica bias circuit, which in turn increases or decreases the free-running oscillation frequency. In the proposed design, six binary and ten thermometer-coded bits were used in the replica bias stage, providing the ILO with both coarse and fine frequency adjustments (1.8 GHz – 10 GHz). Three DAC blocks are shown in Figure 5-6, used to control the ILO free-running frequency (center) and injection strength to the ILO (top and bottom). These are quite large and are intended to facilitate testing; in a follow-up design they can be made smaller.



Figure 5-6 - Digital-to-analog control blocks for ILO and injection circuits.

The ILO can be locked to an input signal using the injection circuit, illustrated in Figure 5-7(c). The injection circuitry sinks pulses of current corresponding to the input reference clock frequency, causing the ILO to lock to the reference signal. Using the analysis approach in [54] and circuit simulations, the jitter tracking bandwidth (JTB) was found to vary from approximately 50 MHz to 500 MHz based on the injection current strength ranging from 0.05 to 0.5 times the clock signal current.



Figure 5-7 - (a) CML ILO structure in each receiver, (b) one delay cell of ILO, and (c) injection circuit for ILO locking.

The ILO signal can then be distributed to other receivers that are configured to accept data input (*RX2* in Figure 5-1), each containing their own ILO. This two-stage ILO approach serves as a method to phase align the on-chip distributed clock to each individual channel [73]. By controlling the location of injection into the ILO in the data receiver, the output phase can be varied without the input reference phase changing. This allows for phase alignment with the incoming data stream on a per-channel basis [17]. For quarter-rate operation, the ILO must be tunable over  $\pm$  0.5 UI, a 90 degree range of the quarter-rate clock. The ILO output phase can be set to 0 or 90 degrees when the reference clock is injected into the ILO with injection current set to maximum from *INJ CIRCUIT 1* or *INJ CIRCUIT 2*, respectively (Figure 5-7(a)). The injection signal can also be inverted due to the complementary structure illustrated in Figure 5-7(c) (*EN* and *ENBAR*) to obtain 180 and 270 degree ILO output phase shifts. Coarse phase adjustments are illustrated in Figure 5-8, with injection configuration (*inj config*) corresponding to the two injection points (*INJ CIRCUIT 1* and *INJ CIRCUIT 2*) and their complements.



Figure 5-8 – ILO output "phase 0" showing coarse skew ability using four different injection configurations.

By injecting into the ILO from both *INJ CIRCUIT 1* and 2 while varying the injection strength of each in a complementary manner (interpolation), ILO output phases between 0 and 90, 90 and 180, 180 and 270 or 270 and 360 degrees can be obtained. The ability to fine-tune the output phases is shown in Figure 5-9, where one ILO output can be interpolated between two phase configurations (for example between *Inj Config A* and *Inj Config D* in Figure 5-8). The figure overlays several runs of the same simulation, with the control bits of the injection strength DAC varied for each run. Unlike [17], which only had three interpolation settings, our injection strength is based on six binary-weighted current source settings in each of the two injection circuits per ILO. Only the outputs of these independently controlled injection circuits are interpolated, eliminating the need to detune the free-running frequency of the ring. Since any ILO can be tuned over 360 degrees, static phase differences between the injected clock and the output of the ILO can be accounted for using an external control loop.
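The complementary interpolation can be pictured with an idealized phasor model. This is an assumption made for illustration, not the measured or simulated transfer curve: the two injections are treated as orthogonal vectors whose weights are set by a six-bit code, and the coarse quadrant is chosen by the injection point and polarity. The function name and its arguments are hypothetical.

```python
import math

MAX_CODE = 63  # six binary-weighted bits per injection circuit


def ilo_phase_deg(quadrant, code):
    """Idealized ILO output phase for a complementary injection setting.

    quadrant : 0..3, selects the coarse range via injection point and polarity
    code     : 0..63, strength of the second injection; the first gets 63 - code
    """
    if not 0 <= quadrant <= 3 or not 0 <= code <= MAX_CODE:
        raise ValueError("quadrant must be 0..3 and code 0..63")
    w_first = MAX_CODE - code
    w_second = code
    fine = math.degrees(math.atan2(w_second, w_first))
    return (90.0 * quadrant + fine) % 360.0


for code in (0, 16, 32, 48, 63):
    print(f"code {code:2d}: {ilo_phase_deg(0, code):6.2f} degrees")
```

In this model the phase steps are not uniform across the code range, since the angle follows an arctangent of the weight ratio. A real ILO adds its own nonlinearity on top of that.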


Figure 5-9 - ILO output "phase 0" showing fine skew ability using many different injection configurations.

The Cadence layout of the ILO is shown in Figure 5-10. Care was taken to ensure the differential interconnect between each delay stage was length-matched. This reduces the chance of systematic offset in output phases. Isolation and proper substrate biasing were also applied in and around the ILO, to prevent external signals from getting in or internal oscillations from unintentionally propagating out of the ILO.


Figure 5-10 – Layout of ILO showing differential delay cells, injection blocks and clock buffers.

Using back-annotation and mismatch models for schematic simulations, each of the four ILO output phases was monitored (Figure 5-11), and a worst-case offset of 8 ps was found between the two differential output stage buffers. To account for phase-to-phase misalignment during post-fabrication tuning (if required), analog delay interpolators in the clock buffer (*Buffer* in Figure 5-1) allow for independent delay control of differential ILO output phases *0 and 180* relative to *90 and 270* (Figure 5-7(a)). The analog delay line can interpolate between a path with zero delay and a path with two delay stages, enabling up to 15 ps of delay, with a block diagram shown in Figure 5-12.



Figure 5-11 – Simulated differential output phases.



Figure 5-12 – Analog delay line for clock phase alignment.

Corner analysis was then performed on the delay stages in the ILO, with Cadence simulation results shown in Figure 5-13. Across corners the gain is greater than  $\sqrt{2}$ , indicating that the four-stage differential oscillator will start oscillating [77]. The common-mode rejection ratio (CMRR) is also plotted for the delay stage, and found to be < -20 dB at low frequencies and at most -14 dB at high frequencies.
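The  $\sqrt{2}$  threshold follows from the standard start-up condition for a ring of identical single-pole stages with one net inversion: each stage contributes 180/N degrees of phase at the oscillation frequency, so its low-frequency gain must be at least 1/cos(180°/N). A minimal sketch of that textbook relation, assuming ideal single-pole stages (the function name is illustrative):

```python
import math


def min_stage_gain(n_stages):
    """Minimum low-frequency gain per stage for an n-stage ring to start up.

    Assumes identical single-pole stages and one net inversion around the loop.
    """
    if n_stages < 3:
        raise ValueError("model needs at least three stages")
    return 1.0 / math.cos(math.pi / n_stages)


for n in (3, 4, 8):
    print(f"{n} stages: gain per stage >= {min_stage_gain(n):.3f}")
```

For the four-stage ring of Figure 5-7(a) this evaluates to  $\sqrt{2}$ , matching the threshold used in the corner analysis.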



Figure 5-13 – Simulation results for ILO differential delay stage, showing gain and common-mode rejection ratio across corners.

#### **CLOCK DISTRIBUTION**

When receivers are repurposed, the distribution network of the forwarded clock signal needs to change as well. This is a required feature since any of the receivers on-chip can be used as a clock receiver, meaning the clock must be able to be sent to, or received from, any receiver. The clock distribution block is shown in Figure 5-14, designed as two unidirectional drivers with transmission lines spanning two receivers. To satisfy all routing requirements, one driver is used to transmit a clock signal to the previous receiver (*PR*) and one to transmit to the next (*NX*), with unused driver circuits disabled. Using this layout, a clock receiver can broadcast the clock signal to neighboring data receivers, while data receivers can use the incoming clock signal and repeat it to neighboring data receivers in the chain.

Using Figure 5-14 as an example, once the forwarded clock is injection locked to the clock receiver RX(n), a CML clock driver (CLK DRV(n)) transmits a differential clock signal to the data receivers. On the receiving end, CLK DRV(n+1) uses current-mode signaling to receive the clock signal (green dotted trace Figure 5-14) using input from the previous receiver (IN PR path). The clock signal is then used to injection lock the ILO in data receiver RX(n+1) using the path CLK IN (red trace). Since there are two possible CLK IN paths per clock driver, both are multiplexed and only one is chosen at a time to send to RX(n+1). The clock distribution feeds into the same ILO amplifying stage in RX(n+1) that the TIA output does, as shown in Figure 5-1, thus the signal peak-to-peak amplitude can be kept relatively small to save power. A buffer exists between the clock distribution input and the ILO, minimizing any crosstalk due to feedthrough. CLK DRV(n+1) of the data receiver is then configured as a repeater to send the clock to the next data receiver CLK DRV(n+2) using the OUT NX path. If more receivers are needed, the next clock driver block can keep the same repeater configuration as CLK DRV(n+1), which allows the clock to be sent on further down the chain if required. This design allows for a variable number of parallel data receivers to be added as needed, possibly based on link throughput requirements.
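The routing rule described above can be sketched as a small function that, for a linear chain of receivers and a chosen clock receiver, returns the input selection and enabled output drivers of each clock driver block. The key names `clk_in`, `out_pr` and `out_nx` are illustrative labels based on the path names in Figure 5-14, not the chip's actual control registers.

```python
def clock_driver_config(n_receivers, clock_rx):
    """Return one routing entry per clock driver in a linear chain.

    'clk_in' selects the multiplexed ILO input ('TIA', 'IN_PR' or 'IN_NX');
    'out_pr' and 'out_nx' enable the drivers towards the previous and
    next receiver. Unused drivers are left disabled.
    """
    if not 0 <= clock_rx < n_receivers:
        raise ValueError("clock_rx must index a receiver in the chain")
    config = []
    last = n_receivers - 1
    for i in range(n_receivers):
        if i == clock_rx:
            # Clock receiver broadcasts to whichever neighbors exist.
            entry = {"clk_in": "TIA", "out_pr": i > 0, "out_nx": i < last}
        elif i > clock_rx:
            # Right of the clock receiver: take the clock from the previous
            # receiver and repeat it onward unless this is the end of the chain.
            entry = {"clk_in": "IN_PR", "out_pr": False, "out_nx": i < last}
        else:
            # Left of the clock receiver: mirror image of the case above.
            entry = {"clk_in": "IN_NX", "out_pr": i > 0, "out_nx": False}
        config.append(entry)
    return config


for index, entry in enumerate(clock_driver_config(3, clock_rx=0)):
    print(index, entry)
```

Because each entry depends only on the receiver's position relative to the clock receiver, adding a lane at either end changes the configuration of its immediate neighbor only.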

This proposed bi-directional clock distribution architecture has advantages in this reconfigurable architecture over one-way clock distribution rings and clock buses as seen in works such as [25] and [82]. Firstly, since the clock recovery receiver can change dynamically, each driver is designed to drive the same interconnect load and reach receivers to the left or right of the clock receiver. This removes the need for a one-way clock distribution network to loop around from the last receiver on one end of the chip back to the first on the other end to cover all receivers. Secondly, because the architecture allows for dynamically adding or removing receivers from the chain without affecting the clock distribution network, it has less effect on the system performance since the transmission lines are only as long as needed to reach the neighboring receiver. This has the attractive feature of completely disabling unused circuitry in drivers and receivers not needed for a given system configuration without them adding parasitic capacitance to a clock distribution bus.

In this design, the transmission lines are shorter than  $\lambda / 10$  of the clock's electrical signal and they are a standard, straight-line structure. Hence it was concluded that matching and an in-depth electromagnetic analysis were not required. Instead, extracted simulations were done investigating transmission line design and capacitive loading. These looked at spacing between conductors, transmission length, conductor width and ground plane options. The chip contained a total of three receivers, each 375 µm in length. The width of the wire was set using DC current flow analysis to satisfy design rules. Wire widths of 0.5  $\mu$m and wire spacing of 1  $\mu$m were found to give sufficient performance. It was decided to use a GSGSG (Ground; Signal) configuration on metal 6 (M6), with one ground plane on the metal 4 (M4) layer below as illustrated in Figure 5-15. This allowed metals 1 to 3 underneath the ground plane to be used for basic control signal routing without coupling with the RF signal path. The metals above the interconnect were kept clear up to the top level (metal 9), which is used for power supply routing and runs perpendicular to the transmission lines. The power grid over the clock distribution lines will add some capacitance to the structure. However, since the clock distribution runs across the chip, grid traces were required to pass over the lines to maintain a strong and continuous power grid for proper power distribution. The bandwidth of the 375 µm interconnect was simulated using extracted RLC blocks and found to be approximately 10 GHz when loaded with the driver circuitry.
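The  $\lambda / 10$  criterion can be checked with a quick calculation. The effective relative permittivity of 4.0 below is an assumption (roughly that of silicon dioxide) and is not a value given in this chapter. The 2 GHz and 10 GHz frequencies correspond to the demonstrated clock and the upper end of the ILO tuning range.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_m(freq_hz, eps_r_eff):
    """Guided wavelength for a given effective relative permittivity."""
    return C0 / (freq_hz * eps_r_eff ** 0.5)


EPS_R_EFF = 4.0       # assumed effective permittivity; not from the source
LINE_LENGTH = 375e-6  # interconnect length between neighboring receivers, m

for f in (2e9, 10e9):
    limit = wavelength_m(f, EPS_R_EFF) / 10.0
    print(f"{f / 1e9:4.0f} GHz: lambda/10 = {limit * 1e3:.2f} mm, "
          f"line length is {LINE_LENGTH / limit:.2f} of that limit")
```

Under this assumption the 375 µm line stays well inside the  $\lambda / 10$  limit even at 10 GHz, consistent with treating it as a lumped RLC load.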



Figure 5-14 - Reconfigurable on-chip clock distribution network between receivers. In this example, RX(n) is the clock receiver. Sizes indicate W/L transistor ratios used.



Figure 5-15 - Implemented GSGSG transmission line for clock distribution network with metal 4 ground plane and metal 6 signal routing (not to scale).

#### **DATA PATH**

When a receiver is repurposed to accept incoming data, the ILO path from the TIA is disabled and the latches are enabled. Four fully differential CML latches are implemented in each receiver, in a primary/secondary configuration (Figure 5-16). The latches use the clock phases from the ILO within the data receiver, thereby giving the ability to phase align the clock to the data on a per-channel basis. Latches in the data path can be enabled or disabled to support full-, half- or quarter-rate architectures. This feature allows for optimizing performance and power to cover a larger range of input data rates. As discussed in [83], CML latches have the desired power-supply and common-mode rejection qualities and also offer better sensitivity than the strong-arm latch design, although at the expense of higher power consumption. The latches were designed for up to 6 Gb/s operation to accommodate multiple rate architectures. Parasitic extracted simulations showing operation at a full data rate of 24 Gb/s (6 Gb/s per latch) are presented in Figure 5-17. Several lone ones are strategically sent on the input data port to allow each latch to receive one of these logic-high bits. The outputs of the latches are multiplexed and one of the four possible outputs can be sent off chip for measurement. The data output path uses a system like the clock distribution drivers to select one latch output from among all data receivers to route off the chip for high-speed probing.
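The quarter-rate sampling and the lone-one test pattern can be illustrated with an idealized bit-level model in which latch k simply captures every fourth bit starting at position k. Setup and hold behaviour and the actual assignment of clock phases to latches are not modelled, and the bit positions below are arbitrary choices for illustration.

```python
def quarter_rate_demux(bits):
    """Split a serial bit sequence across four latches, one bit in four each."""
    return [bits[phase::4] for phase in range(4)]


# Lone ones placed so that each falls on a different latch
# (positions 0, 9, 18 and 27 are 0, 1, 2 and 3 modulo 4).
stream = [0] * 32
for position in (0, 9, 18, 27):
    stream[position] = 1

for phase, latch_bits in enumerate(quarter_rate_demux(stream)):
    print(f"latch {phase}: {latch_bits}")
```

Each latch output then contains exactly one logic-high bit, which makes it easy to confirm that all four sampling phases are working.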



Figure 5-16 - (a) Primary/secondary latch architecture and (b) CML latch circuit.



Figure 5-17 - Extracted simulations of latches showing functionality at 24 Gb/s.

## **EXPERIMENTAL RESULTS**

The reconfigurable receiver prototype is implemented in TSMC 65 nm technology and contains a total of three receivers (Figure 5-18). Two of the three receivers are identical, containing a TIA, ILO and latches (RX1 and RX2). The layout was not optimized for area; however, each of these receivers fits within a 210  $\mu$m x 160  $\mu$m space (active area of TIA 6000  $\mu$m<sup>2</sup>; core 5500  $\mu$m<sup>2</sup>; DACs 10000  $\mu$m<sup>2</sup>). A third receiver (RX3) was also implemented on-chip and has an identical circuit core; however, it does not contain a TIA and was used for characterization purposes. The entire chip measures 1 mm x 0.7 mm and is wire-bonded inside a 64-pin LQFP package. High-speed probes are used for input and output signals on two sides of the chip. The power breakdown of the active blocks used during the experimental testing is illustrated in Figure 5-19.



Figure 5-18 - Die photo of 1 mm x 0.7 mm receiver chip in 65 nm CMOS, containing three reconfigurable receivers.



Figure 5-19 - Power consumption of source-synchronous link during experimental verification.

The analog front-end (TIA and MA) circuits consume between 10 and 15 mW each on a 1 V supply, depending on the gain and bandwidth settings required by the channel. The ILO blocks are tunable from 2 to 10 GHz, and consume 1.5 mW each during the demonstrated operation at 2 GHz for an 8 Gb/s input data stream. This consumption includes a CML buffer in the injection circuitry path, used for switching between TIA and clock network input. The four CML latches consume a total of 1.9 mW, with an additional 1 mW for the clock buffer circuit between the ILO and latches. The buffer can also delay the clock phases independently to adjust for phase offsets due to variation if needed. Future iterations could include more methods of tuning to ensure proper spacing between the four phases. The CML clock distribution network, consisting of the two blocks in Figure 5-19, consumes 2.6 mW. Breaking this down into two categories, a repurposed data receiver dissipates 19.9 mW at 8 Gb/s, or 2.49 pJ/b (0.61 pJ/b for the core, excluding the TIA block). The clock receiver, whose power consumption is envisioned as being shared and amortized amongst multiple data receivers, consumes 14.6 mW (4.6 mW for the core, excluding the TIA block). The power of the clock distribution network would increase by 2.6 mW, regardless of the clock speed, for each data receiver added, to account for the extra clock routing. Considering that a per-lane CDR implementation can require more than 26 mW in a similar technology and at a similar data rate [19], the 5.6 mW of per-lane clock circuitry overhead in the proposed design is small. The clock receiver power includes the TIA, ILO and clock distribution network. Not included in this power breakdown are the output drivers used to extract the clock and data to off-chip 50 Ohm terminated measurement equipment.
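The energy-per-bit figures follow directly from dividing power by data rate. The sketch below reproduces the 2.49 pJ/b data receiver figure and adds a deliberately simplified amortization model. Treating the link power as one clock receiver plus n identical data receivers, with no additional routing power, is an assumption made here for illustration and is not a breakdown given in the text.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Power in mW divided by rate in Gb/s gives energy in pJ/b directly."""
    return power_mw / rate_gbps


DATA_RX_MW = 19.9    # repurposed data receiver at 8 Gb/s
CLOCK_RX_MW = 14.6   # clock receiver, shared across data lanes
RATE_GBPS = 8.0

print(f"data receiver: {energy_per_bit_pj(DATA_RX_MW, RATE_GBPS):.2f} pJ/b")


def link_energy_per_bit_pj(n_data_lanes):
    """Link energy per bit with the clock receiver shared by n data lanes.

    Simplified additive model: one clock receiver plus n data receivers.
    """
    total_mw = CLOCK_RX_MW + n_data_lanes * DATA_RX_MW
    return energy_per_bit_pj(total_mw, n_data_lanes * RATE_GBPS)


for lanes in (1, 2, 4, 8):
    print(f"{lanes} data lane(s): {link_energy_per_bit_pj(lanes):.2f} pJ/b")
```

In this model the per-bit cost of the clock receiver shrinks as lanes are added, and the link figure approaches the per-lane value.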

The experimental setup is illustrated in Figure 5-20. Experimental results in this chapter do not include optical signals or crosstalk and focus only on the proposed reconfigurable receiver architecture. To show repurposing of the receivers, a GSGSG probe was landed on the chip and used to apply attenuated RF electrical signals from a programmable pattern generator (PPG) as input. The probe was positioned to span two optical-ready receivers and provided single-ended electrical input to two separate differential TIAs simultaneously. One probe output provided an 8 Gb/s, PRBS 2<sup>31</sup>-1 data sequence to a data-configured receiver. The other probe output provided a quarter-rate clock signal (2 GHz) to a second TIA, configured as a clock receiver. A microcontroller was used to serially program registers inside the chip, providing the ability to set up the circuits and configure receivers. Clock or data output from the chip can be obtained using a GSG probe. The data output of the latches was extracted in this way and sent to a bit error rate tester (BERT) to obtain BER values. It was observed that the ILOs in both the clock and data receivers successfully locked to the input reference clock signal, thus showing proper functionality of the clock driver network. The output of the clock after passing three clock drivers and an output stage is shown in Figure 5-21.

The voltage applied to the TIA during experimental measurement was then used in simulation to obtain the approximate corresponding input current, which was used to plot the BER curves in Figure 5-22(a) for each of the four latches during quarter-rate operation. This simulation accounted for all RF attenuators used in the experimental setup. Following this, the inputs to the TIAs were interchanged (PPG CLK and PPG Data) and the receivers repurposed. This tests the second case of a clock input to the first TIA and data input to the second TIA. BER curves were obtained and are presented in Figure 5-22(b). Converting this TIA input current to an optical power as a reference, 100  $\mu A_{pp}$  corresponds to approximately 167 µW of peak optical power, or about -7.8 dBm (assuming a germanium photodetector with a responsivity of 0.6 A/W). These results are important for the proof of concept of the design, as they show that each receiver can be successfully reconfigured from a clock receiver to a data receiver. The BER curves show similar sensitivity for both receivers when accepting data input, and thus show the intended system functionality. The reference clock, feeding input to one of the TIAs, was finely delayed and bathtub curves for each of the two repurposing scenarios are presented in Figure 5-23. During the bathtub curve measurements, the data input amplitude was held at 130  $\mu A_{pp}$. The bathtub curve represents only one latch output of the possible four. Next, bathtub curves for each of the four latches are presented in Figure 5-24 for 8 Gb/s operation. They are created by sweeping the delay of the programmable pattern generator (PPG) clock, which serves as the chip's input clock. The delay of the clock is varied with respect to the data input.
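The conversion from TIA input current to optical power is a one-line calculation. The sketch below uses the 0.6 A/W responsivity assumed in the text; the function name is illustrative.

```python
import math


def optical_power_dbm(current_a, responsivity_a_per_w):
    """Convert a photocurrent to the equivalent optical power in dBm."""
    power_w = current_a / responsivity_a_per_w
    return 10.0 * math.log10(power_w / 1e-3)


current = 100e-6      # 100 uA peak-to-peak input current
responsivity = 0.6    # A/W, germanium photodetector assumed in the text

print(f"optical power: {current / responsivity * 1e6:.0f} uW")
print(f"optical power: {optical_power_dbm(current, responsivity):.2f} dBm")
```

The same function converts the other input amplitudes quoted in this section, for example the 130  $\mu A_{pp}$  used during the bathtub measurements.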

The input clock signal during the bathtub and BER measurements at 2 Gb/s (quarter-rate of 8 Gb/s) was set by default to 194  $\mu A_{pp}$ . To show the effect of clock strength on the ILO locking characteristics, the reference clock input amplitude was attenuated and the locked ILO jitter recorded. The TIA input current versus clock jitter is shown in Figure 5-25. During the clock input attenuation, all gains of the on-chip circuits were held constant.



Figure 5-20 - Experimental setup.



Figure 5-21 - ILO clock output.







Figure 5-22 - BER curves for (a) clock input to RX1 and data input to RX2, (b) data input to RX1 and clock input to RX2. Measurements are for a per latch data speed of 2 Gb/s (8 Gb/s total) with a clock input amplitude of 194  $\mu$ A<sub>pp</sub>.



Figure 5-23 - Bathtub curves for data input to RX1 and clock input to RX2, then clock input to RX1 and data input to RX2.



Figure 5-24 - Measured bathtub curves of each latch during quarter-rate operation at 8 Gb/s.




Figure 5-25 - Effect of reference clock input strength on ILO clock jitter.

A knee in the curve is observable at about 75 µA input current, where the jitter starts to rise quickly. As discussed in [84], the increase in jitter can be due to the reduced locking strength of the forwarded clock signal. With lower injection strength, ILO jitter dominates and more closely resembles the free-running ILO characteristics. Considering the TIA is the first block in the chain, its sensitivity also determines the lowest usable input signal strength. Below this level, noise dominates the signal. For different clock attenuations, BER and bathtub curves were recorded to see the effect of higher ILO jitter and are plotted in Figure 5-26 and Figure 5-27. As the input signal is reduced, so is the injection strength, and a rising BER floor is observed. The 0 dB attenuation corresponds to an input current of 194  $\mu$ A<sub>pp</sub>. The 4 dB attenuation corresponds to 123  $\mu$ A<sub>pp</sub>, approximately equal to the input data signal for a BER of 10<sup>-12</sup>. It is thus important not to underestimate the importance of the clock power in optically forwarded links, as an optimum setting must be found to minimize both laser power and BER.



Figure 5-26 - Effect of reference clock input strength on BER curves. Clock attenuation of 0 dB corresponds to an input signal of 194  $\mu A_{pp}$  and -4 dB corresponds to 123  $\mu A_{pp}$ .



Figure 5-27 - Effect of reference clock input strength on bathtub curves. Clock attenuation of 0 dB corresponds to an input signal of 194  $\mu A_{pp}$ , data input amplitude held at 130  $\mu A_{pp}$ .

## DISCUSSION

The receiver presented in this chapter is compared to other state-of-the-art work in the literature in Table 3. The TIA used here is designed as a high-performance differential device for higher data rates and for this reason is the main power consumer. Relative to the other works with similar link applications, an optimized TIA would significantly improve the energy efficiency and sensitivity of the system. The power dissipation of the proposed core circuitry is comparable to others in the literature, with the lower power due to the quarter-rate architecture. The LC oscillators used in [24] provide better phase noise than ring oscillators, but occupy much more chip area and are not practical when many oscillators are needed. The implementation of one oscillator per receiver is not uncommon, especially when per-lane deskew is done, and the architecture complexity is comparable to other optical receivers in the literature [18]. There is very little area overhead for data-configured receivers, as all blocks are used to latch incoming bits. Once the extensive use of tuning DACs in the proposed design and the smaller technology node of [25] are accounted for, the proposed design is of competitive area. The area overhead for the single clock-configured receiver is higher, since the clock buffer, latches and their respective tuning DACs are disabled yet still present. However, this area can be relatively small and accounts for an area overhead of 24 % of the clock-configured receiver in the implemented design. There is no overhead in energy consumption in the proposed architecture, since disabled blocks do not draw current. The TIA block is also bandwidth and power scalable to increase energy efficiency in the lower-speed clock receiver.

The test receiver without a TIA, containing only the core receiver circuitry, was found to operate at speeds greater than 12.5 Gb/s (limited by available test equipment) with a minimum input to the latches of 40 mVpp for a BER of  $1 \times 10^{-12}$ . A bathtub curve opening of 16 ps demonstrates the latches can properly function at 6.25 Gb/s each with additional room for setup and hold-time variation.

Parasitic-extracted simulations were run on the proposed design as well as on a version in which the clock path buffer is physically disconnected from the TIA+MA output node. The simulations showed a 6 % decrease in bandwidth on the TIA+MA node with the clock path buffer connected. This indicates that the extra clock path buffer in the proposed design is not the main contributor of bandwidth-lowering parasitic capacitance. The main loading on the TIA with MA is thus due to routing capacitance and the four differential latches in the data path.

The TIA's power dissipation is higher than some state-of-the-art designs optimized for 8 Gb/s, in part because it was optimized to support a higher data rate. However, the TIA is not the focus of this work and only serves as an input to the proposed reconfigurable architecture. In simulation, our design shows much better energy-efficiency compared to inductor-based design in [78], [85] and [86]. Even after fabrication, the energy-efficiency is still comparable to [86] and much better than in [78] and [86]. Moreover, our receiver occupies the smallest area compared to these other full-bandwidth reference designs.

Future iterations of this architecture can, however, focus on low-power or low-bandwidth optimized TIAs to achieve better power savings, as in [25]. The core circuits were overdesigned to provide a wide tuning range, allowing the bandwidth limitations of the test chip to be explored. To address this in future iterations, load resistors can be increased for a given bandwidth specification, saving power by reducing the current needed to achieve an equal voltage amplitude.

# CONCLUSION

An efficient and reconfigurable source-synchronous receiver architecture is demonstrated. Each on-chip receiver can be repurposed to act as a clock or data receiver. The receiver is meant to explore the domain of electronics made for optics, and work towards addressing the manufacturing variable in photonic integrated circuits. The experimental proof of concept implementation shows a quarter-rate receiver functioning at 8 Gb/s with an energy efficiency of 2.4 pJ/b. Repurposing was demonstrated using BER and bathtub curves from two identical receivers with clock input to one and data input to another. The inputs to the receivers were then reversed and show similar functionality, thus showing the desired repurposing ability. The clock driver network discussed can be extended to more receivers as needed. The extra clock path at the output of the TIA with MA is only one-fourth the size of the latch input of the data path. This overhead accounts for only a small increase in capacitive loading of the TIA, translating into a 6 % bandwidth reduction on the TIA with MA output node in extracted simulations. Since unused blocks can be completely turned off, this represents no power overhead for repurposing receivers in the proposed architecture.

|                                | [25]               | [17]               | [24]              | This Work       |
|--------------------------------|--------------------|--------------------|-------------------|-----------------|
| Link Application               | Optical Clk Fwd    | Electrical Clk Fwd | Optical Clk Fwd   | Optical Clk Fwd |
| Technology (nm)                | 28                 | 65                 | 65                | 65              |
| Reconfiguration of lanes?      | No                 | No                 | No                | Yes             |
| Clock Frequency (GHz)          | 8                  | 3.7                | 12.5              | 2               |
| Data Rate (Gb/s)               | 32                 | 7.4                | 25                | 8               |
| Sensitivity                    | 118 $\mu A_{pp}$   | -                  | 74 $\mu A_{pp}$   | 120 $\mu A_{pp}$ |
| Clock Architecture             | CMOS ring          | CML ring           | LC                | CML ring        |
| Data RX                        |                    |                    |                   |                 |
| Active Area (mm<sup>2</sup>)   | 0.004<sup>1</sup>  | 0.026<sup>2</sup>  | 0.06<sup>3</sup>  | 0.022           |
| TIA (mW)                       |                    |                    | 6.5               | 15              |
| Core (mW)                      |                    | 6.8                | 7                 | 4.9             |
| Core Efficiency (pJ/b)         |                    | 0.92               | 0.26              | 0.61            |
| Total (mW)                     | 4.22               | 6.8                | 13.5              | 19.9            |
| Total Efficiency (pJ/b)        | 0.13               | 0.92               | 0.54              | 2.49            |
| Clock RX                       |                    |                    |                   |                 |
| Active Area (mm<sup>2</sup>)   | 0.003              | 0.026<sup>1</sup>  | 0.35<sup>3</sup>  | 0.022           |
| TIA (mW)                       |                    |                    | 5                 | 10              |
| Core (mW)                      | 2.77               | 8                  | 9                 | 4.6             |
| Total (mW)                     | 2.77               | 8                  | 14                | 14.6            |

TABLE 3 - COMPARISONS WITH STATE-OF-THE-ART

1. Inferred from text or chip photo.

2. Does not include input equalization.

3. Does not include DACs
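The efficiency rows of the comparison table follow directly from the power and data-rate rows, since power in mW divided by data rate in Gb/s gives energy per bit in pJ/b. A minimal sketch using the "This Work" column is shown below; the function name is illustrative.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/b: 1 mW / 1 Gb/s = 1e-3 J/s / 1e9 b/s = 1 pJ/b."""
    return power_mw / data_rate_gbps


# "This Work" data-receiver values from the comparison table
total_pj = energy_per_bit_pj(19.9, 8.0)
core_pj = energy_per_bit_pj(4.9, 8.0)
print(f"total: {total_pj:.2f} pJ/b, core: {core_pj:.2f} pJ/b")
```

The same helper can be applied to the other columns to compare designs at their own data rates.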

# LAYOUT AWARE DESIGN METHODOLOGY

The first version of the CMOS receiver failed due to poor layout choices. Placing three receivers on one chip led to electrical crosstalk in several places and to unexpected bandwidth limitations; the layout is shown in Figure 5-28. This brought about an important realization: the layout process is just as important as the schematic design process. A schematic can be optimized to outperform anything presented in the literature, but without proper layout the circuit will not live up to expectations. This section presents some items that were considered before any circuit schematics were completed. This is a *layout-aware* design methodology, where the idea of the final layout influences the schematic and architecture design choices from the very start of the project.



Figure 5-28 – Layout of the first version of configurable receiver. The receiver had many problems dealing with inter-receiver crosstalk.

## FLOORPLANNING

Floorplanning is the first and most important step in the layout process. Here, approximate locations of circuit blocks and the required interconnections between these blocks are considered. At the lowest level, device placement should consider things such as matching and orientation, heat dissipation and gradients, coupling, input and output locations, etc. Any high-frequency signal lines should be properly shielded, and their lengths considered for parasitic loading or matching. Power and ground routing within each circuit block can also be worked into the planning of device locations, to ensure strong connections to supplies. This should be an iterative process, where the planning is dynamic and is updated at different levels of abstraction as the top level becomes more complete.

# SUPPLY REGULATION

Supply regulation is an important consideration when using sensitive or voltage-dependent circuits. A few methods were found in the literature to keep the voltage supply stable. The first method is on-chip local capacitance for filtering. This is the simplest method: the capacitor sinks or sources current when the supply voltage deviates from its average value. Local capacitance refers to keeping the capacitors as close as possible to the sensitive nodes. Examples of such on-chip devices are metal-insulator-metal (MIM) capacitors and MOSFET capacitors. A second method is the low-dropout (LDO) regulator. This circuit uses either a PMOS or NMOS device, controlled by a feedback loop, to regulate the supply voltage. The feedback loop often uses an op-amp circuit to regulate to a reference voltage. LDOs are used extensively in larger systems. The downside is that they can require large on-chip capacitors, or connections to off-chip capacitors, which require a chip pad. In both cases the area requirements can be very costly. In this project, where chip area was tight and multiple receivers are in close proximity, an LDO per receiver was not feasible. We thus had to explore alternatives to reduce or avoid coupling.

## **CROSSTALK REDUCTION THROUGH ESD PATH**

Electrostatic discharge (ESD) can be a significant problem, possibly leading to permanent failure of the chip or parts of it. Protection against the high-voltage discharge, caused by a human or machine, can be accomplished using the diode structure illustrated in Figure 5-29(a). When multiple pads use ESD protection, a dedicated voltage supply (*VDD*), set to the highest supply on-chip, is normally used as a reference to reverse bias the upper diode. However, this diode biasing scheme allows high-frequency signals to see a low-impedance path between protected nodes. This can be seen by considering the capacitance in the high-frequency model of a diode when *VDD* is effectively not connected directly to an AC ground, as shown in Figure 5-29(b). This situation occurred due to insufficient and poorly placed on-chip decoupling capacitance, which reduced its effectiveness. In version one of the project, all inputs and outputs used ESD protection biased by a shared supply and ground. This was found to cause large noise signals at the inputs, due to the signals at the outputs, and led to major issues with the operation of the first version of the receiver.



Figure 5-29 - (a) ESD connection with pads; (b) ESD coupling paths between two I/O pads.

## **GROUND SPLITTING**

When designing sensitive circuits, one method to reduce coupling is to use separate supply or ground signals. An example of this is the separation of digital and analog grounds on-chip, as in Figure 5-30. One problem encountered with this is that the layout versus schematic (LVS) checker understands both grounds as a single connection through the substrate. Short of a tool that extracts substrate coupling paths, the substrate is assumed to be a perfect conductor that connects the body terminals of all NMOS devices together as well. Assume, for example, two NMOS devices with their body and source terminals tied together, one labeled *ANALOG\_GROUND* and the other *DIGITAL\_GROUND*. This would result in an LVS error due to the substrate connection linking the two net names to the same substrate node.



Figure 5-30 - Separated supply and grounds.

Another LVS error may occur in this situation if global net names or inherited connections are used. For example, I decided to break up the ESD protection devices into multiple portions, as illustrated in Figure 5-31(a) (break in ESD ring), to eliminate any coupling paths. A problem in Cadence software is that the ESD devices provided in the TSMC PDK work with global nodes. To split up ESD instances, LVS must understand that one set of global nodes is intentionally not connected to the other set of global nodes. To obtain an LVS-clean design in these circumstances, it must be conveyed to the tool that these items are to be treated differently. This can be accomplished using the *SUB2* layer in Cadence. By drawing a box around these items, the tool understands that global nodes, such as substrate and inherited connections, do not apply within. An example of this is shown on the separated ESD pads in Figure 5-31(b).



Figure 5-31 - Ground separation using SUB2 layer.

## SUBSTRATE NOISE PATHS

Once grounds are separated and LVS has passed, the designer must also think about coupling paths not visible in normal extracted simulations. The substrate itself is a conductor, offering a coupling path for unwanted signals to impose themselves on sensitive circuits. High-impedance paths in the substrate, such as specially doped regions or deep trenches, help reduce coupling. These high-impedance paths vary from process to process. An example of one method used in this receiver design is shown in Figure 5-32. Here a high-impedance layer (NT\_N layer) separates two properly biased substrate regions. It offers a high-impedance path, pushing signals lower into the substrate and thus increasing impedance. The result is that signals take the path of least resistance, exiting the substrate through the ground connection that biases the substrate (metal connected to substrate connections). This also highlights the need to properly bias the substrate with as many VIA connections as possible.



Figure 5-32 - Substrate isolation using NT\_N layer.

#### SUBSTRATE BIASING

The substrate and wells serve as the body connection to MOSFET devices. Poor biasing of the substrate can result in latch-up, a slow exponential increase of leakage current within the substrate that can cause circuit failure. This is due to parasitic bipolar transistors starting to conduct because of substrate current and parasitic substrate resistance [87]. To avoid latch-up effects, the p-type substrate should contain as many connections to the lowest potential as possible (normally ground). The same can be done for PMOS devices in n-wells, only using the highest supply for well biasing.

# PACKAGING CONSIDERATIONS

The designed chip must communicate with the outside world to be useful. Voltage supplies are normally brought on-chip via bond wires or bumps. These connections, both on and off chip, present parasitic resistances, inductances and capacitances. Wire bonding presents a parasitic inductance of approximately 1 nH/mm. The voltage across an inductor is related to the change in current through the inductor, shown in (20).

$$v = L \frac{di}{dt} \tag{20}$$

In Figure 5-33, the current consumption of two latches is shown. The red trace plots a strong-arm latch and the blue trace a CML-type latch. This highlights one of the trade-offs between the two latch types: the CML latch has higher average power consumption but less variation in current consumption. With the CMOS-type circuit, the high rate of change of current in the latch will modulate the voltage supply. A fluctuating power supply can affect many other circuits that rely on a steady supply, such as oscillators or low-noise amplifiers. The supply fluctuations can be reduced through the use of on-chip capacitors if needed, but should be investigated first.
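To get a feel for the magnitude of this effect, equation (20) can be evaluated with the bond-wire inductance of approximately 1 nH/mm quoted above. The wire length, current step and edge time in the sketch below are hypothetical values chosen only for illustration; they are not measurements from this work.

```python
BOND_WIRE_NH_PER_MM = 1.0  # approximate bond-wire inductance quoted in the text


def supply_bounce_v(length_mm: float, delta_i_a: float, delta_t_s: float) -> float:
    """Voltage across the bond wire, v = L * di/dt, with di/dt taken as a linear ramp."""
    inductance_h = BOND_WIRE_NH_PER_MM * 1e-9 * length_mm
    return inductance_h * delta_i_a / delta_t_s


# Hypothetical example: 2 mm wire, 10 mA current step over 100 ps
bounce = supply_bounce_v(length_mm=2.0, delta_i_a=10e-3, delta_t_s=100e-12)
print(f"{bounce * 1e3:.0f} mV")
```

Even with these modest assumed numbers the disturbance is a significant fraction of a typical supply voltage, which is why the current profile of each latch type matters.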



Figure 5-33 - Current draw from supply of CMOS strong-arm latches (red) and CML latches (blue).

# POWER DISTRIBUTION CONSIDERATIONS

How power will be distributed to the various circuits on chip should be considered early in the layout process. Compact layouts may be good for speed, but area should also be dedicated to solid ground and supply connections. Two limitations that may occur when planning the chip layout are a lack of chip pads and a lack of package pins. Both can result in trade-offs that reduce the performance of the chip.

Ideally, each circuit block would have its own supply voltage, as illustrated in Figure 5-34(a). When a chip is pad limited, there is not enough chip space to fit all the pads the designer may want. This results in many circuits sharing the same supply voltage, as illustrated in Figure 5-34(b). The problems that can result from this sharing are due to items previously discussed in this chapter, such as supply variation due to current consumption changes through the wire bond inductance. When the design in Figure 5-34(a) cannot be used, the star connection in Figure 5-34(b) can be considered.



Figure 5-34 – (a) Direct versus (b) star interconnect connections.

As implemented in version two of the receiver, the star connection provided each receiver a different branch of the supplies that were shared. In addition, MIM capacitors were added to each branch of the star connection to further isolate the receivers from one another. The layout of this is shown in Figure 5-35.



Figure 5-35 - Power supply routing using star configuration. Example taken from version two of chip layout.

In another scenario, the package may be the limiting factor that forces supply sharing. In version two of this receiver, this was found to be a problem: the bond wires required to connect the chip pads to the package pins would have become too long. To solve this, I decided to use one package pin for two separate on-chip supplies, as illustrated in the bonding diagram in Figure 5-36. In this configuration, the bond wire also acts as a low-pass filter, attenuating signals that couple through paths created off-chip.





Figure 5-36 – Bonding diagram and actual chip image of packaging using fan out design.

# SUPPLY ROUTING

Supplying a PCB with a voltage supply on the bench is relatively simple. Properly getting the voltage from the package to the on-chip circuit may be difficult. Because of the non-ideal nature of on-chip metals, supply voltages on these lines can vary with the current that is used by a circuit. To reduce the effect of one circuit's current draw on another, it is important to consider the supply and return paths for each circuit block.

Assuming the chip has a limited number of pins, a designer may choose the star configuration for power distribution discussed previously. As opposed to sharing main branches of the star connection, circuits can directly use power and ground connections to main nodes. This is illustrated in Figure 5-37, where (a) shows the supply and return path for one circuit running through another circuit. In (b), both circuits are fed directly from the main branch of the distribution network. The same is done with the ground connections. As illustrated, the main branches are normally thicker than the smaller connections to the circuit. This helps to reduce resistance and inductance of the supply and ground wires.



Figure 5-37 - Supply and ground routing using (a) indirect and (b) direct connections.

## **POST-LAYOUT SIMULATIONS**

Schematic simulations are important, but extracted simulations after layout should be considered even more so. The extracted simulations give feedback on the layout choices made, usually showing a reduction in performance in high-speed circuits. The extraction tool creates extracted views by calculating and then adding parasitic components to the netlist. This, however, can severely slow down the simulation once the netlist becomes very large. The designer should therefore decide what the simulation being performed is meant to accomplish. For example, if the bandwidth of a circuit is important to characterize, the designer may choose to extract only the parasitic capacitances of the layout. By ignoring the parasitic resistance, parallel capacitors may be grouped together, reducing the complexity of the netlist by avoiding a large number of small RC parasitic circuits. In another scenario, if the designer is worried about voltage loss over a circuit due to current draw, an extracted simulation including only the parasitic resistors can reduce the netlist complexity. After several iterations of the layout and schematic, the designer can then use the full extraction of parasitic capacitors and resistors to get a more complete picture of the performance. Inductance is another option available in some extraction tools. This might be of use when simulating interconnects or long supply or ground connections used by high-speed circuits.
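The netlist reduction obtained from a capacitance-only extraction can be illustrated with a toy example: once parasitic resistors are ignored, every capacitor between the same pair of nets is in parallel and can be summed into one element. The sketch below is a simplified illustration of that idea, not the algorithm used by any particular extraction tool; the net names and capacitance values are hypothetical.

```python
from collections import defaultdict


def lump_parallel_caps(caps):
    """Sum capacitors that connect the same (unordered) pair of nets."""
    lumped = defaultdict(float)
    for net_a, net_b, value_f in caps:
        key = tuple(sorted((net_a, net_b)))  # (out, gnd) and (gnd, out) are the same pair
        lumped[key] += value_f
    return dict(lumped)


# Hypothetical C-only extraction output: (net, net, capacitance in farads)
extracted = [
    ("out", "gnd", 1.2e-15),
    ("gnd", "out", 0.8e-15),
    ("out", "vdd", 0.5e-15),
    ("out", "gnd", 2.0e-15),
]
lumped = lump_parallel_caps(extracted)
print(len(extracted), "->", len(lumped), "capacitors")
```

With resistors present, the intermediate nodes between RC segments prevent this merging, which is why a full RC extraction grows so much faster than a capacitance-only one.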

Presented in this section is the grating-assisted all silicon photodetector (Si-PD) project that has been designed, fabricated and tested. This work was done in collaboration with Monireh Moayedi Pour Fard, a McGill doctoral student in photonics at the time. For my contribution, I investigated the photodiode structure, designed the different p-i-n and p-n junction structures and completed the layout design that was then fabricated. The work presented in this chapter has been published in conference and journal papers in [88], [89] and [90] respectively.

## **DIODE JUNCTION DESIGN**

In present-day data centers, 90% of optical connections are on the scale of 100 m or less and use 850 nm VCSELs [38]. As discussed earlier, silicon is capable of absorbing light at this 850 nm wavelength. In addition, silicon has dominated the integrated circuit domain and is thus an ideal candidate to investigate as a material for use in photodiodes. The proposed design is shown in Figure 6-1. It is implemented in a 0.12 $\mu$m SOI process. Laser light is applied at an angle of 24 degrees from the *Z* axis to optical grating couplers, which redirect the light to the horizontal *Y* direction into a silicon waveguide, where it can be absorbed by a reverse-biased p-n junction. The structure extends the p-n or p-i-n junction, responsible for capturing the carriers created by the applied laser light, along the same path as the light in the horizontal *Y* direction. This is in contrast to the common vertical PD design, which captures light applied in the *Z* direction within the substrate in a small region (as discussed in Chapter 2). The grating coupler uses an alternating pattern, whose period is chosen to target the specific applied wavelength of 850 nm laser light. The grating structure is based on the Bragg condition and then optimized using 3D FDTD CAD tools from Lumerical [91].
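As a rough illustration of the Bragg (phase-matching) condition mentioned above, a first-order grating period can be estimated from the wavelength, the incidence angle and the effective index of the grating region. The sketch below is not the design flow used in this work: the effective index `n_eff` and cladding index `n_clad` are placeholder values assumed for illustration, the in-plane component of the incident wavevector is assumed to point along the guided-mode direction, and the final geometry in the text was obtained through FDTD optimization.

```python
import math


def grating_period_um(wavelength_um, angle_deg, n_eff, n_clad=1.0, order=1):
    """Period from phase matching: n_eff = n_clad*sin(theta) + order*wavelength/period."""
    sin_theta = math.sin(math.radians(angle_deg))
    return order * wavelength_um / (n_eff - n_clad * sin_theta)


# 850 nm light at 24 degrees (from the text); n_eff = 3.0 is an assumed placeholder
period = grating_period_um(0.85, 24.0, n_eff=3.0)
print(f"{period:.3f} um")
```

A starting value of this kind would normally only seed the numerical optimization, since the effective index itself depends on etch depth and fill factor.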



Figure 6-1 - Proposed grating-assisted horizontal photodetector showing (a) conceptual illustration; (b) grating coupler and PiN structure; (c) side view of grating coupler [88].

An important characteristic of a photodetector in optical data communication is the bandwidth of the device. The bandwidth limitations of the photodetector can be broken down into two types, carrier transit and RC limits. When light is absorbed in the substrate, an electron-hole pair is created. The two particles, opposite in charge, are quickly pulled apart by either the built-in potential or applied electric field. For the carriers to be used, they must first make their way out of the substrate and into the circuit stages following the photodetector. The speed at which this happens is thus limited by either the speed of the carrier in the substrate or the distance which it must travel to exit the substrate, referred to as the transit time. If the particles are accelerated by the internal electric fields they add to the faster drift current, otherwise they make up the diffusion current which is a slower mechanism. The other limitation a device can have is the RC limit, created by a time constant equal to the product of the resistance and capacitance of the output node of the photodetectors. Either of these limitations can dominate the frequency response. Examples of each are illustrated in Figure 6-2. Initial calculations indicate that our devices would most likely be transit limited [91].
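The two limits described above can be compared with a quick estimate. The sketch below uses common textbook approximations that are assumptions here rather than formulas from this work: a transit-limited bandwidth of roughly 0.45·v_sat/W, an RC-limited bandwidth of 1/(2πRC), and a root-sum-square combination of the two. The saturation velocity, load resistance and junction capacitance are placeholder values.

```python
import math


def transit_limited_bw_hz(width_m, v_sat_m_per_s=1.0e5):
    """Rule-of-thumb transit limit, f_tr ~ 0.45 * v_sat / W (assumed approximation)."""
    return 0.45 * v_sat_m_per_s / width_m


def rc_limited_bw_hz(r_ohm, c_farad):
    """Single-pole RC limit, f_rc = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def combined_bw_hz(f_tr, f_rc):
    """Approximate overall bandwidth when both limits contribute."""
    return 1.0 / math.sqrt(1.0 / f_tr**2 + 1.0 / f_rc**2)


# Hypothetical device: 1 um intrinsic width, 50 ohm load, 50 fF junction capacitance
f_tr = transit_limited_bw_hz(1e-6)
f_rc = rc_limited_bw_hz(50.0, 50e-15)
f_total = combined_bw_hz(f_tr, f_rc)
print(f"transit {f_tr / 1e9:.1f} GHz, RC {f_rc / 1e9:.1f} GHz, combined {f_total / 1e9:.1f} GHz")
```

In this example the transit limit is the lower of the two, so narrowing the intrinsic region would raise the bandwidth more than reducing the capacitance would.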



Figure 6-2 - Bandwidth limitations on lateral photodetector junctions.

To test different methods of designing the proposed devices, wide and narrow PiN junctions were designed and are shown in Figure 6-3(a) - (c). The figure illustrates a top view, looking onto the surface of the substrate. The doping is done within optical waveguides 220 nm high. Not shown here are the optical grating couplers, into which the light is applied and redirected. Using the calculations described above, the large intrinsic design was chosen as a variant to increase sensitivity by extending the electric field region. By using a long absorption length, the applied light is subject to the largest possible area containing an electric field. To obtain a faster design, the variant in Figure 6-3(b) was created. Its intrinsic region is on the order of 1  $\mu$m wide, compared to approximately 10  $\mu$m in (a). It was thought to be a good middle ground between speed and responsivity needs. The variant in Figure 6-3(c) is a p-n type diode with no added intrinsic region, and thus has the smallest transit distance of the three. Since the light travels along the PN junction interface, a majority of the created carriers will contribute to drift current due to the positioning of the electric field by design, as opposed to diffusion current in traditional vertical PDs. Increasing the length of the horizontal PD increases the capacitance of the junction, but the device can absorb more photons as discussed in Chapter 2, thus also increasing sensitivity.



Figure 6-3 - Top view of proposed Si-PDs in waveguides showing direction of applied light. (a) PiN structure with large intrinsic region; (b) PiN structure with small intrinsic region and (c) PN junction without intrinsic region.

Another downside to decreasing the intrinsic region is higher dark current. To explain, Figure 6-4(a) - (c) illustrates the effect of the physical dimensions on measurements. In (a), the wide intrinsic region of device (a) collects the most light and thus has the best responsivity. Since the intrinsic region is wide, the bias at reasonable values has little effect on the recorded responsivity. Since the intrinsic region is smaller in design (b), the responsivity is also lower than in design (a). As the intrinsic region becomes narrower, the dark current begins to rise due to tunneling between the *n* and *p* regions. This is due to the higher electric field and the shorter distance between the two semiconductor types, making it easier for a particle to be pulled across (tunnel). In image (c), a sharp increase in dark current is observed as the reverse bias is increased. This is due to the very high electric field caused by the narrow width and high applied bias, which causes impact ionization (avalanche effect).



Figure 6-4 – Illustrated comparisons between proposed photodetector designs.

To improve sensitivity in the narrower PDs, a multi-finger device variant structure was proposed, illustrated in Figure 6-5. This structure directs the light from the input to several identical, narrow detectors. The narrow detectors reduce transit time and the parallel configuration attempts to increase the sensitivity by collecting more photons over the multiple devices. This design thus aimed to use the strengths of both the narrow and wide PD designs together.



Figure 6-5 - Top view of proposed multi-finger Si-PD, showing interconnection of PD and direction of applied light.

During the design stage, several different types of grating couplers were also designed. Layout and microscope images are shown in Figure 6-6. These show the different grating patterns used to redirect light from the vertical to the horizontal direction. The *p* and *n* doped regions are labelled, showing the previously discussed designs and doping considerations.



Figure 6-6 - Layout (top) and microscope images (bottom) of three SOI-based grating-assisted Si-PDs; (a) variant with large intrinsic region added, (b) variant with three-finger design and (c) variant with focusing grating coupler [88].

## **EXPERIMENTAL RESULTS**

The following experiments were performed by Monireh Moayedi Pour Fard and published in [88]. The four variants that performed best are presented in this section, all of which contain the focused grating coupler design. The variants differ in the width of the intrinsic region, with sizes of 5  $\mu$m, 2  $\mu$m, 1  $\mu$m and 0.3  $\mu$m for variants 1 to 4 respectively. Figure 6-7(a) shows the measured dark current, along with the photocurrent for an optical input of 0 dBm, versus applied reverse-bias voltage. The dark current is measured with an ammeter (Keithley, sensitivity: 0.1 pA). The responsivity versus applied reverse-bias voltage is presented in Figure 6-7(b), obtained using a VCSEL CW laser input (Thorlabs) at 848.2 nm. To test the usefulness of the grating coupler, a variant containing the PD without a grating coupler was also tested. Compared to variant 1, the responsivity fell from 0.24 A/W to 0.006 A/W. This indicates that the grating coupler is vital to the design, as it increases the responsivity by 40x. In variants 3 and 4, responsivity increases dramatically at higher bias voltages due to the impact ionization effect, known as avalanche mode in PDs. Figure 6-7(c) shows the generated photocurrent versus input optical power, with a constant reverse bias of 8 V.
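The relationship between optical input power, responsivity and photocurrent used in these measurements can be written out in a few lines. The sketch below uses the two responsivity values quoted above; the helper names are illustrative, and the photocurrent figure is only what those responsivities would imply at 0 dBm, not a separately reported measurement.

```python
def dbm_to_mw(power_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10 ** (power_dbm / 10.0)


def photocurrent_ma(responsivity_a_per_w: float, power_dbm: float) -> float:
    """Photocurrent in mA: responsivity (A/W) times optical power (mW)."""
    return responsivity_a_per_w * dbm_to_mw(power_dbm)


resp_with_grating = 0.24      # A/W, variant 1, from the text
resp_without_grating = 0.006  # A/W, no grating coupler, from the text

i_ph = photocurrent_ma(resp_with_grating, 0.0)         # implied current at 0 dBm
improvement = resp_with_grating / resp_without_grating  # grating-coupler gain
print(f"{i_ph:.2f} mA, {improvement:.0f}x")
```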



Figure 6-7 - Experimental DC measurements of the Si-PDs showing (a) dark current along with photocurrent for an optical input of 0 dBm CW light; (b) responsivity; (c) photocurrent of the Si-PD variants with a reverse-bias voltage of 8V [88].

In order to verify the bandwidths of the variants, S-parameter measurements were done using a 50 GHz network analyzer (Agilent PNA-X N5245A). The experimental setup for obtaining the S21 parameter is shown in Figure 6-8(a). The optical CW input is obtained from a Thorlabs 848.2 nm VCSEL source. Figure 6-8(b) shows the S21 results from the four variants. The highest OE bandwidth is obtained, as expected, from the PD with the smallest intrinsic region. The largest intrinsic region also gave the lowest OE bandwidth. Variant 3 has a smaller intrinsic region than variant 2, but a smaller OE bandwidth due to larger diffusion current [91].
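The trend of higher bandwidth with a narrower intrinsic region is consistent with a carrier transit-time limit. As a rough sketch only, the snippet below evaluates the commonly used approximation $f_{tr} \approx 0.45\,v_{sat}/W$ for the four intrinsic widths. The saturation velocity of $10^5$ m/s and the 0.45 prefactor are assumed textbook values, not parameters taken from [88]. The estimate also ignores RC limits and the diffusion current noted above, so it is not expected to reproduce the measured bandwidths in Figure 6-8.

```python
# Assumed textbook values, not taken from the measurements in [88]
V_SAT_M_S = 1.0e5          # saturated carrier velocity in silicon (~1e7 cm/s)
TRANSIT_PREFACTOR = 0.45   # prefactor of the transit-time approximation

# Intrinsic-region widths of variants 1 to 4, in micrometres (from the text)
INTRINSIC_WIDTHS_UM = {1: 5.0, 2: 2.0, 3: 1.0, 4: 0.3}


def transit_limited_bandwidth_ghz(width_um: float,
                                  v_sat_m_s: float = V_SAT_M_S) -> float:
    """Transit-time-limited bandwidth f = 0.45 * v_sat / W, in GHz."""
    width_m = width_um * 1e-6
    return TRANSIT_PREFACTOR * v_sat_m_s / width_m / 1e9


estimates_ghz = {
    variant: transit_limited_bandwidth_ghz(width_um)
    for variant, width_um in INTRINSIC_WIDTHS_UM.items()
}

for variant, f_ghz in estimates_ghz.items():
    width_um = INTRINSIC_WIDTHS_UM[variant]
    print(f"Variant {variant} (W = {width_um} um): ~{f_ghz:.1f} GHz transit limit")
```

Under these assumptions the limit scales inversely with the intrinsic width. The measured result that variant 3 is slower than variant 2 shows that transit time alone does not set the bandwidth of these devices.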



Figure 6-8 - (a) Experimental setup for S-parameter measurement; (b) measured S21 OE frequency response of the selected grating-assisted Si-PD variants; (c) resulting bandwidths for all measured devices [88].

Measured eye diagrams for variants 1 to 4 are shown in Figure 6-9(a)-(d) respectively. Data rates of 8 Gb/s and 10 Gb/s for variant 1 are obtained with a PRBS-31 NRZ-OOK pattern at 20 V reverse bias, shown in Figure 6-9(a). Variant 2 results are shown in (b) with data rates of 25 Gb/s, 30 Gb/s and 35 Gb/s with a PRBS-31 NRZ-OOK pattern at 12 V reverse bias. Variant 3 results are shown in (c) with data rates of 25 Gb/s, 30 Gb/s and 35 Gb/s with a PRBS-31 NRZ-OOK pattern at 20 V reverse bias. Variant 4 results are shown in (d) with data rates of 25 Gb/s, 30 Gb/s and 35 Gb/s with a PRBS-31 NRZ-OOK pattern at 14 V reverse bias. This agrees with the OE bandwidth measurements, and the measured SNR follows the trend of the observed responsivity. Thorough comparisons are done in [88] between the proposed method and other photodetectors found in the literature.




Figure 6-9 - (a) Measured electrical eye-diagrams of Si-PD variant 1 (5 $\mu$m intrinsic width) with an applied reverse-bias of 20 V; (b) Measured electrical eye-diagrams of Si-PD variant 2 (2 $\mu$m intrinsic width) with an applied reverse-bias of 12 V; (c) Measured eye-diagrams of Si-PD variant 3 (1 $\mu$m intrinsic width) with an applied reverse-bias of 20 V; (d) Measured eye-diagrams of Si-PD variant 4 (0.3 $\mu$m intrinsic width) with an applied reverse-bias of 14 V [88].

# **CHAPTER 7 – CONCLUSION**

In this thesis, three main contributions to knowledge in the field were discussed, all dealing with optical communications. First, an investigation into mode-division multiplexing (MDM) device variation and its effect on crosstalk was presented. Frequency- and time-domain simulations and experimental results were shown, which helped draw conclusions about crosstalk limits and their effect on the transmitted eye. Second, we explored a novel source-synchronous architecture using MDM, which was shown to reduce the number of required lasers and physical channels. Third, a reconfigurable architecture for use in source-synchronous links using parallel optics was presented. This architecture allows each receiver to be repurposed to accept either a clock or a data signal. Experimental results for a design in 65 nm CMOS were presented and a detailed discussion of a layout-aware design methodology was given. In addition, a novel silicon photodetector (Si-PD) was presented, with an in-depth explanation of the diode design choices.

## THESIS HIGHLIGHTS

- An investigation of MDM crosstalk was carried out. Device variation and interconnect length were used as design variables, with frequency-domain simulations and experimental results showing how they shape the crosstalk spectrum. Time-domain experiments were used to show how crosstalk affects the transmitted eye. Using measurements, a maximum crosstalk value to maintain a given BER was estimated.
- A novel source-synchronous architecture using mode-division multiplexing (MDM) was proposed. A passive structure was designed and fabricated, and experimental results show the architecture working at 10 Gb/s. It was observed that optical mode crosstalk varies across wavelengths and across SiP chips. We used the measured data to relate crosstalk to clock jitter, and proposed placing the clock on the mode with the lowest crosstalk for a given wavelength. This resulted in wideband usage of the architecture. We then used time-domain measurements to show how clock jitter is the result of common-mode shifts that are data dependent and caused by co-propagating data signals.
- Using observations from the MDM architecture, a CMOS reconfigurable receiver was proposed. This architecture enables each receiver to be repurposed so that any channel can carry either clock or data. This gives the flexibility to move the clock channel to the optical mode with the least optical crosstalk and thus allows per-die optimization. A layout-aware methodology was presented, where we described layout ideas and concepts that influenced the initial schematic design of the second version of the chip.

#### **FUTURE WORK**

This section offers some possible research directions related to the work done in each area so far.

## SOURCE-SYNCHRONOUS ARCHITECTURE USING MDM

Silicon photonics has seen a great increase in popularity since its inception. We have watched optical interconnects make their way from long-haul use down to server-to-server applications. With the critical length scale at which it becomes advantageous to use optics continuously shrinking, photonic interconnects will continue to have a bright future. The idea of optics penetrating the very short interconnect market is thus not surprising.

With exascale high-performance computers from Chinese and American researchers predicted for 2020 and 2021, new methods of connecting chips and cores will be needed. It will be important to investigate the pros and cons of each optical interconnect approach, especially since power efficiency will be an important metric to meet. Considering that a source-synchronous approach will reduce receiver complexity and an MDM architecture will reduce the number of wavelengths needed, the proposed approach may be a good candidate.

Looking further into the future, several outcomes are possible for this approach if some key issues are resolved. First, this method essentially takes a well-behaved single-mode interconnect, such as one used in a WDM approach, and makes it multimode. This can be inherently problematic if the mode crosstalk is not controlled. However, the upside is that only a single wavelength is needed. Bandwidth density arguments might also allow MDM to be the preferred approach over space-division multiplexing in some applications. Research into signal processing for MDM links can enable longer distances or improved transmission at shorter distances. Another approach may be to improve waveguide design or crosstalk mitigation techniques. Second, lasers and packaging may also delay or limit the use of this type of solution at small scales. It is thus critical to involve people with diverse backgrounds, from mechanical engineers to material scientists, to tackle this issue.

With respect to the project discussed in this thesis, this work could be continued through an electronic/optic co-packaging exercise. By integrating an MDM interconnect and receiver together, a link can be assembled with minimal optical loss. Closing the link in this way would demonstrate a feasible approach to next-generation on-chip MDM optical links. It may also reveal unforeseen issues that will require innovative solutions.

A research question that I would like to see answered is whether or not an expression or equivalent circuit can be derived to model crosstalk in many-mode MDM interconnects. This would be very useful in system modelling and circuit simulations.

### **RECONFIGURABLE CMOS CIRCUITS**

In some cases, the manufacturing of photonic components is the bottleneck for performance. The underlying idea of the configurable architecture was to create electronics that can improve photonics. Many photonic techniques aimed at improving yield also require special processing. This can delay optics from being introduced into consumer products. By offering configurable electronics that are co-designed with the optics in mind, the jump from academic project to consumer product can be easier. In turn, this would allow companies to inject money into improving photonic chip fabrication further. Some situations will require more advanced equalization methods using signal processing to account for high MDM crosstalk. However, minimizing the power-intensive operation of equalization using the methods described in this thesis is beneficial.

One interesting direction that this work can take is applying this method to other applications using tunable optics. For example, heaters are used to fine-tune optical components such as Mach-Zehnder Modulators (MZM), but they require constant power to be spent on this operation. Instead of using heaters to vary the optical delay, one could investigate whether an electronic delay can be implemented with higher efficiency, triggering each arm with the timing offset required to achieve an optical phase offset.

## **GRATING-ASSISTED SI-PD**

The all-silicon photodetector is a very interesting device. Once the device was fabricated, I started to look into the possibility of designing the device in an advanced SOI CMOS process. I identified some of the issues that would need to be resolved to do this successfully. Most deal with the thickness of the available silicon layers. In order to create the grating coupler, the incident light would need to see a large enough index difference between periodic silicon/silicon-dioxide patterns. This is required if the light is to be successfully redirected from the vertical to the horizontal direction. Once the designer can satisfy this requirement, another challenge is the large number of metal layers in advanced nodes, which increases the oxide thickness. Incident light needs to travel through all these layers before even reaching the silicon layer. Research would need to be done to see how refraction between the different oxide layers would change the direction of applied light by the time it reaches the device in the substrate.
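As a starting point for such a study, Snell's law can be applied layer by layer. The sketch below traces a ray through a stack of parallel planar layers and reports the propagation angle and accumulated lateral shift in each one. The layer names, refractive indices, thicknesses and the 10° incidence angle are hypothetical placeholders, not values from any specific process. The model also ignores reflections, diffraction and the metal features themselves.

```python
import math


def trace_through_stack(incidence_deg, n_incident, layers):
    """Trace a ray through parallel planar layers using Snell's law.

    layers: list of (name, refractive_index, thickness_um), top to bottom.
    Returns a list of (name, angle_deg, cumulative_lateral_shift_um).
    """
    # n * sin(theta) is conserved across every parallel interface.
    invariant = n_incident * math.sin(math.radians(incidence_deg))
    trace = []
    shift_um = 0.0
    for name, index, thickness_um in layers:
        sin_theta = invariant / index
        if abs(sin_theta) > 1.0:
            raise ValueError(f"total internal reflection before layer {name!r}")
        theta = math.asin(sin_theta)
        # Lateral displacement accumulated while crossing this layer.
        shift_um += thickness_um * math.tan(theta)
        trace.append((name, math.degrees(theta), shift_um))
    return trace


# Hypothetical back-end stack (illustrative indices and thicknesses only)
EXAMPLE_STACK = [
    ("passivation nitride", 2.00, 0.5),
    ("inter-metal oxide", 1.45, 8.0),
    ("oxide above silicon", 1.45, 1.0),
]

# Assumed: light arrives from air (n = 1.0) at 10 degrees from the normal.
ray_trace = trace_through_stack(10.0, 1.0, EXAMPLE_STACK)
for name, angle_deg, shift in ray_trace:
    print(f"{name:22s} angle = {angle_deg:5.2f} deg, lateral shift = {shift:5.2f} um")
```

For ideal parallel layers, the angle in the final layer depends only on the incident medium and that layer's index, so in this simplified model the intermediate layers mainly add lateral shift. Any stronger effect on the direction of the light would have to come from non-planar features that the sketch leaves out.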

Another research project can deal with converting the multimode VCSEL light into single mode, possibly using mode converters. This would improve efficiency in the grating couplers of the PD since they are based on single mode input.

## REFERENCES

- [1] R. G. Beausoleil, M. McLaren, and N. P. Jouppi, "Photonic Architectures for High-Performance Data Centers," *IEEE J. Sel. Top. Quantum Electron.*, vol. 19, no. 2, Mar. 2013.
- [2] K. T. Settaluri, S. Lin, S. Moazeni, E. Timurdogan, C. Sun, M. Moresco, Z. Su, Y. Chen, G. Leake, D. LaTulipe, C. McDonough, J. Hebding, D. Coolbaugh, M. Watts and V. Stojanovic, "Demonstration of an optical chip-to-chip link in a 3D integrated electronic-photonic platform," *European Solid-State Circuit Conf.*, p. 156–159, Sep. 2015.
- [3] C. Sun, M. Wade, Y. Lee, J. Orcutt, L. Alloatti, M. Georgas, A. Waterman, J. Shainline, R. Avizienis, S. Lin, B. Moss, R. Kumar, F. Pavanello, A. Atabaki, H. Cook, A. Ou, J. Leu, Y. Chen, Y. Chen, K. Asanovic, R. Ram, M. Popovic and V. Stojanovic, "Single-chip microprocessor that communicates directly using light," *Nature*, vol. 528, p. 534–538, Dec. 2015.
- [4] L. C. Kimerling, "Silicon Microphotonics Roadmap Status and Outcomes," *LETI Silicon Photonic Workshop*, Jun. 2013.
- [5] S. J. B. Yoo, "The Role of Photonics in Future Computing and Data Centers," *IEICE Trans. Commun.*, vol. E97.B, no. 7, p. 1272–1280, Jul. 2014.
- [6] C. A. Thraskias, E. Lallas, N. Neumann, L. Schares, B. Offrein, R. Henker, D. Plettemeier, F. Ellinger, J. Leuthold and I. Tomkos, "Survey of Photonic and Plasmonic Interconnect Technologies for Intra-Datacenter and High-Performance Computing Communications," *IEEE Commun. Surv. Tutor.*, vol. 20, no. 4, p. 2758–2783, May 2018.
- [7] L. Chrostowski and M. Hochberg, *Silicon Photonics Design*. Cambridge, 2015.
- [8] G. Chen *et al.*, "Predictions of CMOS compatible on-chip optical interconnect," *Integration*, vol. 40, no. 4, p. 434–446, Jul. 2007.
- [9] W. Bogaerts *et al.*, "Optical Interconnect Technologies based on Silicon Photonics," *MRS Proc.*, vol. 1335, Apr. 2011.
- [10] P. Dong, "Silicon Photonic Integrated Circuits for Wavelength-Division Multiplexing Applications," *IEEE J. Sel. Top. Quantum Electron.*, vol. 22, no. 6, p. 370–378, Nov. 2016.
- [11] D. Dai and J. E. Bowers, "Silicon-based on-chip multiplexing technologies and devices for Peta-bit optical interconnects," *Nanophotonics*, vol. 3, no. 4–5, Jan. 2014.
- [12] X. Wu, C. Huang, K. Xu, C. Shu, and H. K. Tsang, "Mode-Division Multiplexing for Silicon Photonic Network-on-Chip," J. Light. Technol., vol. 35, no. 15, p. 3223–3228, Aug. 2017.
- [13] D. Dai, "Advanced Passive Silicon Photonic Devices With Asymmetric Waveguide Structures," *Proc. IEEE*, vol. 106, no. 12, p. 2117–2143, Dec. 2018.
- [14] C. P. Chen, J. B. Driscoll, R. R. Grote, B. Souhan, R. M. Osgood, and K. Bergman, "Mode and Polarization Multiplexing in a Si Photonic Chip at 40Gb/s Aggregate Data Bandwidth," *IEEE Photonics Technol. Lett.*, vol. 27, no. 1, p. 22–25, Jan. 2015.
- [15] J. Wang, P. Chen, S. Chen, Y. Shi, and D. Dai, "Improved 8-channel silicon mode demultiplexer with grating polarizers," *Opt. Express*, vol. 22, no. 11, p. 12799–12807, Jun. 2014.
- [16] C. Williams, B. Banan, G. Cowan, and O. Liboiron-Ladouceur, "Source-Synchronous Optical Link Using Mode-Division Multiplexing," *IEEE Group IV Photonics Conference*, Aug. 2015.

- [17] M. Hossain and A. Chan Carusone, "7.4 Gb/s 6.8 mW Source Synchronous Receiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, p. 1337–1348, Jun. 2011.
- [18] Y.-H. Song, R. Bai, K. Hu, H.-W. Yang, P. Y. Chiang, and S. Palermo, "A 0.47–0.66 pJ/bit, 4.8–8 Gb/s I/O Transceiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, p. 1276–1289, May 2013.
- [19] S. Choi et al., "A 0.65-to-10.5 Gb/s Reference-Less CDR With Asynchronous Baud-Rate Sampling for Frequency Acquisition and Adaptive Equalization," *IEEE Trans. Circuits Syst. Regul. Pap.*, vol. 63, no. 2, p. 276–287, Feb. 2016.
- [20] T. Toifl et al., "A 72 mW 0.03mm<sup>2</sup> Inductorless 40 Gb/s CDR in 65 nm SOI CMOS," IEEE Solid-State Circuits Conf., pp. 226–598, Jun 2007.
- [21] L. Rodoni, G. von Buren, A. Huber, M. Schmatz, and H. Jackel, "A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 7, p. 1927–1941, Jul. 2009.
- [22] S.-H. Chung and L.-S. Kim, "A 9.6-Gb/s 1.22-mW/Gb/s Data-Jitter Mixing Forwarded-Clock Receiver in 65-nm CMOS," *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 23, no. 10, p. 2023–2033, Oct. 2015.
- [23] Y.-J. Kim and L.-S. Kim, "A 12 Gb/s 0.92 mW/Gb/s forwarded clock receiver based on ILO with 60MHz jitter tracking bandwidth variation using duty cycle adjuster in 65 nm CMOS," Symp. on VLSI Circuits, Aug. 2013.
- [24] K. Yu et al., "A 25 Gb/s Hybrid-Integrated Silicon Photonic Source-Synchronous Receiver With Microring Wavelength Stabilization," *IEEE J. Solid-State Circuits*, vol. 51, no. 9, p. 2129–2141, Sep. 2016.
- [25] M. Raj, S. Saeedi, and A. Emami, "A 4-to-11GHz injection-locked quarter-rate clocking for an adaptive 153fJ/b optical receiver in 28 nm FDSOI CMOS," *IEEE International Solid-State Circuits Conf.*, Mar. 2015.
- [26] C. Williams, B. Banan, G. Cowan, and O. Liboiron-Ladouceur, "A Source-Synchronous Architecture Using Mode-Division Multiplexing for On-Chip Silicon Photonic Interconnects," *IEEE J. Sel. Top. Quantum Electron.*, vol. 22, no. 6, p. 473–481, Nov. 2016.
- [27] Y. Shen and W. Gu, "Coherent and Incoherent Crosstalk in WDM Optical Networks," J. Light. Technol., vol. 17, no. 5, p. 759–764, May 1999.
- [28] J. F. Buckwalter and A. Hajimiri, "Analysis and Equalization of Data-Dependent Jitter," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, p. 607–620, Mar. 2006.
- [29] Y. Ding, J. Xu, F. Da Ros, B. Huang, H. Ou, and C. Peucheret, "On-chip two-mode division multiplexing using tapered directional coupler-based mode multiplexer and demultiplexer," *Opt. Express*, vol. 21, no. 8, p. 10376–10382, Apr. 2013.
- [30] M. Nikdast, G. Nicolescu, J. Trajkovic, and O. Liboiron-Ladouceur, "Chip-Scale Silicon Photonic Interconnects: A Formal Study on Fabrication Non-Uniformity," J. Light. Technol., vol. 34, no. 16, p. 3682–3695, Aug. 2016.
- [31] C. Williams, B. Banan, G. Cowan, and O. Liboiron-Ladouceur, "Demonstration of Mode-Division Multiplexing for on-chip source-synchronous communications," *Asia Comm. and Photonics Conference*, Nov. 2015.
- [32] P. Huapu *et al.*, "40 Gbps optical receiver based on Germanium waveguide photodetector hybrid-integrated with 90 nm CMOS amplifier," *Conf. on Lasers and Electro-Optics*, Oct. 2012.
- [33] F. Tavernier and M. Steyaert, "A 5.5 Gbit/s optical receiver in 130 nm CMOS with speed-enhanced integrated photodiode," ESSCIRC proc., p. 542–545, Nov. 2010.

- [34] X. Wang *et al.*, "Lithography simulation for the fabrication of silicon photonic devices with deep-ultraviolet lithography," *Group IV Photonics Conf.*, p. 288–290, Oct. 2012.
- [35] D. Marcuse, *Theory of dielectric optical waveguides*, 2nd ed. Boston: Academic Press, 1991.
- [36] A. Waqas, D. Melati, and A. Melloni, "Sensitivity Analysis and Uncertainty Mitigation of Photonic Integrated Circuits," J. Light. Technol., vol. 35, no. 17, p. 3713–3721, Sep. 2017.
- [37] M. A. Taubenblatt, "Optical Interconnects for High-Performance Computing," J. Light. Technol., vol. 30, no. 4, p. 448–457, Feb. 2012.
- [38] J. A. Tatum *et al.*, "VCSEL-Based Interconnects for Current and Future Data Centers," *J. Light. Technol.*, vol. 33, no. 4, p. 727–732, Feb. 2015.
- [39] C. Williams, G. Zhang, R. Priti, G. Cowan, and O. Liboiron-Ladouceur, "Modal Crosstalk in Silicon Photonic Multimode Interconnects," *Submitted to Opt. Express J.*, Jun. 2019.
- [40] C. Williams, D. Abdelrahman, X. Jia, A. I. Abbas, O. Liboiron-Ladouceur, and G. E. R. Cowan, "Reconfiguration in Source-Synchronous Receivers for Short-Reach Parallel Optical Links," *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 27, issue 7, Jul. 2019.
- [41] Shyh-Chyi Wong, Gwo-Yann Lee, and Dye-Jyun Ma, "Modeling of interconnect capacitance, delay, and crosstalk in VLSI," *IEEE Trans. Semicond. Manuf.*, vol. 13, no. 1, p. 108–111, Feb. 2000.
- [42] S. Rakheja and V. Kumar, "Comparison of electrical, optical and plasmonic on-chip interconnects based on delay and energy considerations," *Int. Conf. Qual. Electron. Des.*, pp. 732–739, Mar. 2012.
- [43] T. C. Carusone, "Introduction to Digital I/O," IEEE Solid-State Circuits Magazine, vol. 7, no. 4, 2015.
- [44] D. M. Cortes-Hernandez, R. Torres-Torres, O. Gonzalez-Diaz, and M. Linares-Aranda, "Experimental Characterization of Frequency-Dependent Series Resistance and Inductance for Ground Shielded On-Chip Interconnects," *IEEE Trans. Electromagn. Compat.*, vol. 56, no. 6, p. 1567–1575, Dec. 2014.
- [45] Kai Kang et al., "A Wideband Scalable and SPICE-Compatible Model for On-Chip Interconnects Up to 110 GHz," *IEEE Trans. Microw. Theory Tech.*, vol. 56, no. 4, p. 942–951, Apr. 2008.
- [46] F. Yuan, A. R. AL-Taee, A. Ye, and S. Sadr, "Design techniques for decision feedback equalisation of multi-giga-bit-per-second serial data links: a state-of-the-art review," *IET Circuits Devices Syst.*, vol. 8, no. 2, p. 118–130, Mar. 2014.
- [47] Guoqing Chen *et al.*, "Electrical and Optical On-Chip Interconnects in Scaled Microprocessors," IEEE Symp. on Circuits and Systems, pp. 2514–2517, Jul. 2005.
- [48] M. H. Nazari and A. Emami-Neyestanak, "A 20 Gb/s 136 fJ/b 12.5 Gb/s/um on-chip link in 28 nm CMOS," *IEEE Radio Freq. Integrated Circuits Symp.*, p. 257–260, Jun. 2013.
- [49] J. F. Bulzacchelli et al., "A 10-Gb/s 5-Tap DFE/4-Tap FFE Transceiver in 90-nm CMOS Technology," IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
- [50] A. Agrawal, J. F. Bulzacchelli, T. O. Dickson, Y. Liu, J. A. Tierno, and D. J. Friedman, "A 19-Gb/s Serial Link Receiver With Both 4-Tap FFE and 5-Tap DFE Functions in 45-nm SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, p. 3220–3231, Dec. 2012.
- [51] L.-W. Luo *et al.*, "WDM-compatible mode-division multiplexing on a silicon chip," *Nat. Commun.*, vol. 5, Jan. 2014.
- [52] B. Casper, "Clocking Wireline Systems," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 4, Nov. 2015.

- [53] M. Hossain and A. Chan Carusone, "7.4 Gb/s 6.8 mW Source Synchronous Receiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, p. 1337–1348, Jun. 2011.
- [54] A. Ragab, Y. Liu, K. Hu, P. Chiang, and S. Palermo, "Receiver Jitter Tracking Characteristics in High-Speed Source Synchronous Links," J. Electr. Comput. Eng., vol. 2011, Jun. 2011.
- [55] R. Daendliker, "Concept of modes in optics and photonics," *Conf. on Educational Training in Optics and Photonics*, Jun. 2000.
- [56] L. H. Gabrielli, D. Liu, S. G. Johnson, and M. Lipson, "On-chip transformation optics for multimode waveguide bends," *Nat. Commun.*, vol. 3, p. 1217, Nov. 2012.
- [57] F. Tavernier and M. Steyaert, *High-Speed Optical Receivers with Integrated Photodiode in Nanoscale CMOS.* New York, NY: Springer New York, 2011.
- [58] R. S. Muller, T. I. Kamins, and M. Chan, *Device electronics for integrated circuits*. New York, NY: John Wiley & Sons, 2003.
- [59] A. C. Carusone, H. Yasotharan, and T. Kao, "CMOS Technology Scaling Considerations for Multi-Gbps Optical Receivers With Integrated Photodetectors," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, p. 1832–1842, Aug. 2011.
- [60] W.-Z. Chen *et al.*, "A 3.125 Gbps CMOS fully integrated optical receiver with adaptive analog equalizer," *Asian Solid-State Circuits Conf.*, p. 396–399, Jan. 2008.
- [61] S.-H. Huang and W.-Z. Chen, "A 20-Gb/s optical receiver with integrated photo detector in 40-nm CMOS," Asian Solid-State Circuits Conf., p. 225–228, Jun. 2013.
- [62] S. M. Csutak, S. Dakshina-Murthy, and J. C. Campbell, "CMOS-compatible planar silicon waveguide-grating-coupler photodetectors fabricated on silicon-on-insulator (SOI) substrates," *IEEE J. Quantum Electron.*, vol. 38, no. 5, p. 477–480, May 2002.
- [63] K.-P. Ho and J. M. Kahn, *Mode Coupling and its Impact on Spatially Multiplexed Systems*, Optical Fiber Telecommunications, Elsevier, p. 491–568, 2013.
- [64] C. Sun, Y. Yu, M. Ye, G. Chen, and X. Zhang, "An ultra-low crosstalk and broadband two-mode (de)multiplexer based on adiabatic couplers," *Sci. Rep.*, vol. 6, no. 1, Dec. 2016.
- [65] T. Gyselings, G. Morthier, and R. Baets, "Crosstalk analysis of multiwavelength optical cross connects," J. Light. Technol., vol. 17, no. 8, p. 1273–1283, Aug. 1999.
- [66] G. P. Agrawal, *Lightwave technology: telecommunication systems*. Hoboken, N.J: Wiley-Interscience, 2010.
- [67] J. D. Downie, "Relationship of Q penalty to eye-closure penalty for NRZ and RZ signals with signal-dependent noise," *J. Light. Technol.*, vol. 23, no. 6, p. 2031–2038, Jun. 2005.
- [68] M. Mansuri et al., "A Scalable 0.128-1 Tb/s, 0.8-2.6 pJ/bit, 64-Lane Parallel I/O in 32-nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 12, p. 3229–3242, Dec. 2013.
- [69] O. Liboiron-Ladouceur, C. Gray, D. C. Keezer, and K. Bergman, "Bit-parallel message exchange and data recovery in optical packet switched interconnection networks," *IEEE Photonics Technol. Lett.*, vol. 18, no. 6, p. 779–781, Mar. 2006.
- [70] K. Yu *et al.*, "A 24 Gb/s 0.71 pJ/b Si-photonic source-synchronous receiver with adaptive equalization and microring wavelength stabilization," *IEEE Solid-State Circuits Conf.*, Feb. 2015.
- [71] D. Dai, "Silicon mode-(de)multiplexer for a hybrid multiplexing system to achieve ultrahigh capacity photonic networks-on-chip with a single-wavelength-carrier light," *Asia Commun. Photonics Conf.*, Nov. 2012.
- [72] M. Streshinsky *et al.*, "A compact bi-wavelength polarization splitting grating coupler fabricated in a 220 nm SOI platform," *Opt. Express*, vol. 21, no. 25, p. 31019, Dec. 2013.

- [73] K. Hu, T. Jiang, J. Wang, F. O'Mahony, and P. Y. Chiang, "A 0.6 mW/Gb/s, 6.4-7.2 Gb/s Serial Link Receiver Using Local Injection-Locked Ring Oscillators in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, p. 899–908, Apr. 2010.
- [74] M. Georgas, J. Orcutt, R. J. Ram, and V. Stojanovic, "A Monolithically-Integrated Optical Receiver in Standard 45-nm SOI," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, p. 1693–1702, Jul. 2012.
- [75] N. Rouger, L. Chrostowski, and R. Vafaei, "Temperature Effects on Silicon-on-Insulator (SOI) Racetrack Resonators: A Coupled Analytic and 2-D Finite Difference Approach," J. Light. Technol., vol. 28, no. 9, p. 1380–1391, May 2010.
- [76] C. Z. Tan and J. Arndt, "Temperature dependence of refractive index of glassy SiO2 in the infrared wavelength range," J. Phys. Chem. Solids, vol. 61, no. 8, p. 1315–1320, Aug. 2000.
- [77] B. Razavi, Design of Integrated Circuits for Optical Communications, 2nd ed. Wiley.
- [78] D. Li et al., "A Low-Noise Design Technique for High-Speed CMOS Optical Receivers," IEEE J. Solid-State Circuits, vol. 49, no. 6, p. 1437–1447, Jun. 2014.
- [79] S. Ray and M. M. Hella, "A 53 dB-Ohm 7 GHz Inductorless Transimpedance Amplifier and a 1 THz+ GBP Limiting Amplifier in 0.13 um CMOS," *IEEE Trans. Circuits Syst. Regul. Pap.*, vol. 65, no. 8, p. 2365–2377, Aug. 2018.
- [80] Jieh-Tsorng Wu and B. A. Wooley, "A 100-MHz pipelined CMOS comparator," IEEE J. Solid-State Circuits, vol. 23, no. 6, p. 1379–1385, Dec. 1988.
- [81] S. Chen, Q. Ao, H. Jing, X. Shi, S. Meng, and X. Chen, "An 8 Gb/s 0.75 mW/Gb/s injection-locked receiver with constant jitter tracking bandwidth and accurate quadrature clock generation in 40 nm CMOS," International Conf. on Electronics, Circuits and Systems, Feb. 2015.
- [82] D. Lee, Y.-H. Kim, D. Lee, and L.-S. Kim, "A 0.65-V, 11.2-Gb/s Power Noise Tolerant Source-Synchronous Injection-Locked Receiver With Direct DTLB DFE," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 65, no. 11, p. 1564–1568, Nov. 2018.
- [83] T. Toifl et al., "A 22-Gb/s PAM-4 Receiver in 90-nm CMOS SOI Technology," IEEE J. Solid-State Circuits, vol. 41, no. 4, p. 954–965, Apr. 2006.
- [84] M. Hossain and A. C. Carusone, "CMOS Oscillators for Clock Distribution and Injection-Locked Deskew," *IEEE J. Solid-State Circuits*, vol. 44, no. 8, p. 2138–2153, Aug. 2009.
- [85] S. Galal and B. Razavi, "10-Gb/s limiting amplifier and laser/modulator driver in 0.18-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, p. 2138–2146, Dec. 2003.
- [86] J. Proesel, C. Schow, and A. Rylyakov, "25 Gb/s 3.6 pJ/b and 15 Gb/s 1.37 pJ/b VCSEL-based optical links in 90 nm CMOS," in 2012 IEEE International Solid-State Circuits Conference, p. 418–420, Apr. 2012.
- [87] T. C. Carusone, D. Johns, and K. W. Martin, *Analog integrated circuit design*, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2012.
- [88] M. M. Pour Fard, C. Williams, G. Cowan, and O. Liboiron-Ladouceur, "High-speed grating-assisted all-silicon photodetectors for 850 nm applications," *Opt. Express*, vol. 25, no. 5, Mar. 2017.
- [89] C. Williams, M. M. P. Fard, G. Cowan, and O. Liboiron-Ladouceur, "An All-Silicon Photodetector for 850 nm Wavelength Applications," in *Advanced Photonics 2018* Zurich, p. IW1B.1, Jul. 2018.
- [90] M. M. P. Fard, C. Williams, G. Cowan, and O. Liboiron-Ladouceur, "A 35 Gb/s silicon photodetector for 850 nm wavelength applications," *IEEE Photonics Conf.*, Jan. 2017.

- [91] M. M. P. Fard, "Towards Design of Power-Efficient and Cost-Effective Optical Interconnect for Short-Reach Applications," McGill University Thesis, 2017.

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Concordia University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to:

http://www.ieee.org/publications\_standards/publications/rights/rights\_link.html

to learn how to obtain a License from RightsLink. If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.