## Design of High Performance Multiple-Input Pass-Transistor-Logic XOR Circuits

Yuling Yang

A Thesis in the Department of Electrical and Computer Engineering

Presented in Partial Fulfilment of the Requirements for the Degree of Master of Applied Science at Concordia University, Montreal, Quebec, Canada

January 2003

© Yuling Yang, 2003

The author has granted a nonexclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats. The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

0-612-77726-X

#### **ABSTRACT**

XOR gates are basic building blocks in the design of almost all kinds of digital circuits for signal processing, generation and control. The performance of the XOR gates can be a factor determining the performance of the complete circuits. In particular, XOR gates with a large number of inputs, which are often used for parallel processing, may make a major contribution to the delay of the circuits. Some reduction of the delay can be achieved, but usually at the expense of the power dissipation.

In this thesis, a comprehensive study of XOR gate design is presented. Based on the study, an approach to designing multiple-input pass-transistor-logic XOR gates is proposed. This approach consists of two aspects. On one hand, the imperfect high voltage levels at some selected intermediate circuit nodes, which result from the imperfect voltage transmission of single-MOS switches, are used to reduce the signal swing so that the power dissipation can be minimized. On the other hand, the voltages at some circuit nodes are compensated to maximize the current capacity in order to reduce the delay.
Thus, the XOR gates can be designed to have a significantly higher performance, in terms of speed, than other pass-transistor-logic XOR gates, and the power dissipation of the circuits can also be lowered. An example of application of this design approach is also presented in this thesis.

### Acknowledgments

The present work has benefited from a number of people who have provided their important help and support. My first and earnest acknowledgment must go to my supervisor, Professor Chunyan Wang, for all her enthusiasm, inspiration, and continuous support through my study at Concordia University. She is acknowledged, in particular, for her substantial dedication of time and energy in helping me in the experiments and thesis writing throughout the work of my thesis. She provided encouragement, good teaching, fruitful ideas, and invaluable suggestions. I have learned various things from her, not only knowledge, but also new ways of thinking and working. I will forever remain indebted to her for being my mentor and advisor.

Thanks to all at the VLSI laboratory for providing such a friendly and supportive environment. I am grateful to my friends David Claveau, Bin Qiu, Jian Zhang, Yi Yang and others for their help with all types of research and technical problems.

The most heartfelt thank-you goes to my wonderful parents, for always being there when I needed them most. It was their love and trust that made my work worthwhile. They deserve far more credit than I can ever give them.

Last, but by no means least, my deepest thanks must go to my husband, Jianlong Lin, for his eternal support and enthusiastic encouragement. Thanks for being part of my life, and thanks for many years of unfailing love and for pushing me to keep my balance.

## **Table of Contents**

- List of Figures
- List of Tables
- List of Acronyms
- List of Primary Symbols
- I. Introduction
  - I-1 Introduction
  - I-2 Perspective and Motivation
  - I-3 Objective of the Thesis
  - I-4 Brief Presentation of the Thesis
- II. Basic Building Block - XOR Circuits
  - II-1 Introduction
  - II-2 Specifications of Digital Gates
    - 2-2-1 Delay
    - 2-2-2 Power Dissipation
    - 2-2-3 Input Resistance - Fan in and Fan out
    - 2-2-4 Output Voltage Level
    - 2-2-5 Size
  - II-3 Two-Input XOR Cell
    - 2-3-1 Conventional CMOS XOR
    - 2-3-2 Pass-Transistor Logic XOR
    - 2-3-3 Mixed Logic XOR Circuits
  - II-4 Level Recovering Techniques
    - 2-4-1 Partial CMOS Switch XOR Cell
    - 2-4-2 Level Recovering Inverter
    - 2-4-3 Single Pull-Up PMOS
    - 2-4-4 Self-Cross Pull-Up Pull-Down Structure
    - 2-4-5 Cross-Coupled Pull-Up PMOS
  - II-5 Conclusion
- III. Multiple-Input XOR Circuits
  - III-1 Introduction
  - III-2 Structure of Multiple-Input XOR
    - 3-2-1 Chain Structure
    - 3-2-2 Tree Structure
    - 3-2-3 Mixed Structure
  - III-3 An Approach to the Improvement of Speed and Power Dissipation
    - 3-3-1 Description of the Proposed Approach
    - 3-3-2 Different Schemes of the XOR Gates with Level Restoring
  - III-4 Analysis of the Pass-Transistor-Logic XOR Gates with a Large Number of Inputs
    - 3-4-1 Analysis of the Delay Caused by the Change of the Input Applied at the Diffusion Terminal
    - 3-4-2 … Terminal
    - 3-4-3 Effect of the Pull-Up Transistors
    - 3-4-4 Summary of Section 3-4
  - III-5 Simulation of the Multiple-Input XOR Gates
    - 3-5-1 Simulation of the Four Four-Input XOR Gates
    - 3-5-2 Simulation Results and Comparison
    - 3-5-3 Simulation of XOR Gates with a Large Number of Inputs
  - III-6 Conclusion
- IV. An Application of Multiple-Input XOR Gates - Design of a Hamming Decoder
  - IV-1 Introduction
  - IV-2 Hamming Decoder Logic Diagram
  - IV-3 Approach of the Evaluation of the Operation Speed
  - IV-4 Simulation of the Critical-Path of the Hamming Decoder
    - 4-4-1 Test Circuits and Test Conditions
    - 4-4-2 Results of the Comparison
  - IV-5 Design of a Prototype Circuit for Estimation of the Propagation Delay
    - 4-5-1 Design Approach for the Evaluation of the Operation Speed by Oscillation-Test Method
    - 4-5-2 Estimation of Signals Generated by the Test Circuits
    - 4-5-3 Realization of the Oscillation-Test Circuit
  - IV-6 Conclusion
- V. Conclusion
- Bibliography

# **List of Figures**

| Figure | Caption |
|--------|---------|
| Figure 1.1 | A typical data transmission or storage system with error control process |
| Figure 2.1 | Example of the conventional CMOS XOR circuit |
| Figure 2.2 | Basic scheme of two-input pass-transistor-logic XOR cell |
| Figure 2.3 | Pass-transistor-logic XOR cell, using complementary switches |
| Figure 2.4 | Cross-coupled XOR and XNOR cell |
| Figure 2.5 | Pseudo-NMOS and cross-coupled pass-transistor XOR cell |
| Figure 2.6 | One modified version of cross-coupled XNOR cell |
| Figure 2.7 | Level recovering inverter and its operating points |
| Figure 2.8 | Single pull-up PMOS circuit |
| Figure 2.9 | Simulation results |
| Figure 2.10 | Self-cross pull-up XOR-XNOR cell |
| Figure 2.11 | Example of a troubled dynamic operation of the circuit shown in Figure 2.10 |
| Figure 2.12 | NMOS output $V_F$ voltage level problem solved by pull-up PMOS using $V_{\overline{F}}$ |
| Figure 2.13 | Pull-up transistor works in this circuit |
| Figure 2.14 | Cross-coupled pull-up PMOS circuit |
| Figure 2.15 | Cross-coupled pull-down NMOS circuit |
| Figure 3.1 | Multiple-input XOR with a chain structure of two-input XOR gates |
| Figure 3.2 | Two-input pass-transistor-logic XOR gates arranged in a chain structure |
| Figure 3.3 | Tree structure |
| Figure 3.4 | Level problem when two-input pass-transistor-logic XOR cell used in the tree structure |
| Figure 3.5 | Example of mixed structure of XOR gate |
| Figure 3.6 | Multiple-input XOR circuit using the proposed approach |
| Figure 3.7 | Using inverters as the level-restoring cell |
| Figure 3.8 | PMOS and NMOS tree structure. PMOS pass-transistor-logic cells are used to insure a "good-1" voltage level of the signals at the MOS gates |
| Figure 3.9 | Short-circuit current in PMOS and NMOS tree structure circuit |
| Figure 3.10 | Improving the performance of XOR circuit of tree structure by using pull-up PMOS transistors at gate nodes |
| Figure 3.11 | Charging and discharging processes with and without pull-up PMOS |
| Figure 3.12 | Pass-logic XOR cell |
| Figure 3.13 | MOS pass transistors connecting one of the diffusion input node and the output node, when the voltage change occurs at the diffusion input while the gate voltages of the pass transistors are VDD |
| Figure 3.14 | Equivalent RC-network for the transistor chain |
| Figure 3.15 | First transistor in the pass transistor chain |
| Figure 3.16 | Modified model of the pass-logic-transistor chain |
| Figure 3.17 | Computation results using the models of the classical RC-network, modified model with the current-source, respectively, and the Hspice simulation results of the gate circuit |
| Figure 3.18 | Voltage change at gate terminal |
| Figure 3.19 | Current $i_2$ generated by a changing gate voltage $V_{G2}$ and another current $i_1$ by a changing source (drain) voltage $V_I$. On average, $i_2$ is weaker than $i_1$ during the transition |
| Figure 3.20 | Finite rise/fall time of the gate signals resulting in short-circuit currents |
| Figure 3.21 | Discharging and charging process for the complementary output |
| Figure 3.22 | Scheme (b): Pass-transistor-logic XOR gates of tree structure without pull-up PMOS transistors |
| Figure 3.23 | Scheme (c): XOR gate consisting of PMOS and NMOS pass-transistor-logic cells |
| Figure 3.24 | Scheme (d): XOR gates consisting of NMOS pass-transistor-logic cells with pull-up PMOS transistors used to restore the level of the gate voltages |
| Figure 3.25 | Test circuit |
| Figure 3.26 | The variation of the input signals corresponding to the worst case delay. For each of the cells, the inputs applied at the gates of the MOS pass transistors are indicated by thick lines |
| Figure 3.27 | Currents in the transistors of the load inverter in cases of Scheme (b) and Scheme (c) in Figure 3.23 and Figure 3.24 |
| Figure 3.28 | Delay comparison of the three XOR gates |
| Figure 4.1 | Parity check matrix of the (72, 64) SEC-DED code |
| Figure 4.2 | Logic diagram of a (64, 72) Hamming decoder |
| Figure 4.3 | Critical path of the Hamming decoder for testing the operation speed of the circuit |
| Figure 4.4 | Simplified critical path of the Hamming decoder. U1, U2, U4 are four-input XOR gates. U3, U5 are two-input XOR gates. U6 is a CMOS two-input AND gate. U7 is a multiplexer |
| Figure 4.5 | Part of the signals applied to the circuit shown in Figure 4.4 for the simulation and the expected responses at the MOS gate nodes of the circuit, as well as that at the output node |
| Figure 4.6 | Digital oscillation-test method for delay test |
| Figure 4.7 | Critical-path block works as an inverter |
| Figure 4.8 | Structure of the oscillation-test circuit with the controls signals. The detail of each block is shown in Figure 4.7. The control signals are as follows |

# **List of Tables**

| Table | Caption |
|-------|---------|
| Table 3.1 | Simulation result with 1.8V power supply |
| Table 4.1 | Simulation result of the critical-path circuit of the decoder |
| Table 4.2 | Cycle-time estimation of the test circuit |

### **List of Acronyms**

| Acronym | Meaning |
|---------|---------|
| ASIC | Application Specific Integrated Circuit |
| CMOS | Complementary Metal-Oxide Semiconductor |
| ECC | Error Correcting Circuit |
| EDC | Error Detection and Correction |
| DCVS | Differential Cascade Voltage Switch |
| DFF | D-type Flip Flop |
| GND | Ground |
| IC | Integrated Circuit |
| IEEE | Institute of Electrical and Electronics Engineers |
| MOS | Metal-Oxide Semiconductor |
| MOSFET | Metal-Oxide Semiconductor Field-Effect Transistor |
| NMOS | N-channel Metal-Oxide Semiconductor |
| PMOS | P-channel Metal-Oxide Semiconductor |
| PTL | Pass Transistor Logic |
| SEC-DED | Single-Error-Correction Double-Error-Detection |
| SOP | Sum-of-Products |
| VLSI | Very Large Scale Integration |

# **List of Primary Symbols**

| Symbol | Description |
|--------|-------------|
| $\alpha$ | Size scale factor in tapered buffers |
| $\alpha_k$ | Coefficient describing the switching activities at the $k^{th}$ node |
| $C_k$ | Capacitance of the $k^{th}$ node |
| $f_{clk}$ | Clock frequency of a signal |
| GND | Ground |
| $i(V)$ | Voltage-controlled current source |
| $i_{scn}$ | Node short-circuit current |
| $i_N$ | Current of the $N^{th}$ path |
| $I_{avg}$ | Average current |
| $I_D$ | Drain current of a MOSFET |
| $k$ | Number of circuit stages |
| $K_n$ | Trans-conductance coefficient |
| $n$ | Number of inputs of a multiple-input XOR circuit |
| $N$ | Number of layers of circuits |
| $O_{k-i}$ | Intermediate or output node in a tree structure XOR circuit, the $k^{th}$ layer and the $i^{th}$ node |
| $P_d$ | Power dissipation |
| $P_{dyn}$ | Dynamic power dissipation |
| $R_j$ | Resistance of the $j^{th}$ resistor in an RC-network |
| $\tau$ | Time constant of an RC-network |
| $\tau_0$ | Delay of a minimum-sized gate |
| $T_{ck}$ | System clock period |
| $t_d$ | Delay of a basic cell |
| $T_d$ | Total delay of an $n$-input XOR |
| $t_{fall}$ | Fall time |
| $t_{r\_in}$ | Rise time of an input signal |
| $t_{r\_out}$ | Rise time of an output signal |
| $t_{rise}$ | Rise time |
| $t_{pd}$ | Propagation delay of a combinational circuit |
| $t_{pHL}$ | Propagation delay time, HIGH-to-LOW-level output |
| $t_{pLH}$ | Propagation delay time, LOW-to-HIGH-level output |
| $t_q$ | Delay of a flip-flop |
| $V_D$ | Voltage of the drain terminal of a MOSFET |
| $V_{DD}$ | Positive power supply |
| $V_G$ | Voltage of the gate terminal of a MOSFET |
| $V_i$ | Voltage of the $i^{th}$ node |
| $V_{OH}$ | High-level output voltage |
| $V_{OL}$ | Low-level output voltage |
| $V_S$ | Voltage of the source terminal of a MOSFET |
| $V_{SS}$ | Negative power supply |
| $V_{t0}$ | Zero-bias threshold voltage |
| $V_{tp}$ | Threshold voltage of PMOS |
| $V_{tp0}$ | Zero-bias threshold voltage of PMOS |
| $V_{tn}$ | Threshold voltage of NMOS |
| $V_{tn0}$ | Zero-bias threshold voltage of NMOS |
| $W/L$ | Width/Length of a MOS gate |
| $X_i$ | $i^{th}$ input |

# Chapter 1

## Introduction

#### I-1 Introduction

In the design of integrated circuits, high speed, low power dissipation, and small size are often the essential objectives to be achieved.
A higher operation speed is always needed to meet the increasing demand for more capacity of signal processing and communication, whereas minimizing the overall power dissipation and the size of a system has become an important concern as the scale of device integration increases. In order to improve circuit performance in terms of speed and power dissipation, we have been working on the design and implementation of Very Large Scale Integration (VLSI) circuits at the transistor level, with the emphasis on problems in the basic aspects of the circuits.

#### I-2 Perspective and Motivation

The work of this thesis aims at improving the performance of digital systems involving XOR gates. Beyond its well-known role in ALUs, the XOR gate is a fundamental unit in all kinds of digital circuits for signal processing, computation and control. XOR gates are also widely used in communications and, in particular, they are basic building blocks in error detection and correction systems, e.g. the Error-Correcting Circuits (ECC) in data transmission or storage systems, as shown in Figure 1.1 [44]. Research on XOR gates becomes extremely valuable when the major parts of a digital circuit are composed of XOR gates. For example, XOR gates account for over three fourths of the total number of gates used in a Hamming decoder for ECC, and about 90% of the delay of a Hamming decoder is usually contributed by the XOR gates employed [2][3][6]. In other words, an improvement of the XOR gates can result in considerably better performance of the circuit.

Figure 1.1 Typical data transmission or storage system with error control process.

Besides error coding circuits, XOR gates are also frequently used in other kinds of digital logic circuits, such as equality (or inequality) comparators, pattern generators, and so on.
Some circuits, like those realizing Exclusive-OR Sum-of-Products (SOP) expressions, e.g., Reed-Muller canonical circuits, which are widely used in logic synthesis [28], are composed entirely of XOR gates. The design of a high-speed and low-power XOR gate is therefore of particular importance in these kinds of digital circuits.

#### I-3 Objective of the Thesis

Since the XOR function has some special features, it can be implemented more easily and efficiently in other kinds of logic gates than the complementary CMOS ones [19][21][22][45]. As a result, the design and implementation of XOR gates are very diversified. Therefore, the first objective of the thesis work is to investigate the performance of different kinds of XOR circuits in terms of robustness, power dissipation and delay. The results of the investigation show that some measures have been taken to improve these circuits, but that new problems emerge along with them. Based on the analyses of previous work, the second objective of the thesis is to design XOR circuits with better performance without losing the robustness of the circuit operations.

It should be noted that the main focus of this thesis is on the design of XOR gates with a large number of inputs, as applied in signal processing, communication and storage systems, for example in an error correcting circuit such as a Hamming decoder. Hence, the third and most important objective of this thesis is to propose a design approach that improves the performance, in terms of speed and power dissipation, of XOR gates that have a large number of inputs. These improvements should be verified by evaluating the performance of a prototype circuit, such as a Hamming decoder in which the XOR gates are designed using the proposed approach.

#### I-4 Brief Presentation of the Thesis

The thesis is divided into five chapters including this introduction.
In Chapter 2, previous work on the design and implementation of a variety of XOR gates is described. Most of them are two-input gates. In particular, a comprehensive study of level restoring techniques used in non-conventional CMOS designs is presented, and the effectiveness of these techniques is evaluated.

In Chapter 3, different structures of XOR circuits with a large number of inputs are presented. The speed and power dissipation of each of the structures are described. The emphasis of the work is on the design and analysis of Pass-Transistor-Logic (PTL) XOR circuits. An approach to improving the performance of XOR gates with a large number of inputs is proposed.

Chapter 4 is dedicated to a circuit example, a Hamming decoder, for an assessment of the XOR gates in a processing circuit. The role of XOR gates in the operation of the Hamming decoder is described. An approach to the evaluation of the circuit speed is presented and a related test circuit is proposed. Some important issues in the implementation of the test circuit are discussed.

Chapter 5 summarizes the work of the thesis and briefly describes future research on the design and implementation of pass-transistor-logic XOR circuits.

# Chapter 2

# **Basic Building Block – XOR Circuits**

#### II-1 Introduction

The exclusive-OR gate is a fundamental unit in digital circuits. Typical applications include full adders, binary comparators, pattern generators and Error Correcting Circuits (ECC). For example, in many circuits implementing error control codes, the XOR gate is one of the most frequently used logic units. The global specifications of ECC circuits depend greatly on the performance of the XOR cells integrated. In this chapter, we discuss in detail the performance of different existing two-input XOR circuits and present a comparison of their performance.
#### II-2 Specifications of Digital Gates

Before presenting different kinds of XOR circuits, we need to look at the important specifications of a digital gate, including speed, power dissipation, size, etc.

#### 2-2-1 Delay

The delay of a gate is related to the current driving capacity of the transistors in the gate and the load capacitances. The current is determined by the size ratios (Width/Length) of the transistors in the current path. In the case of a complex network circuit, such as a multiple-input XOR gate, the delay also depends on the number of stages and the capacitances contributed by the interconnections. Another factor related to the delay is the voltage levels of the input signals and those at the intermediate circuit nodes. This aspect will be elaborated in later sections.

#### 2-2-2 Power Dissipation

In the work related to this thesis, power dissipation is an important concern. Power dissipation can be divided into two parts: static power dissipation and dynamic power dissipation. The static power dissipation includes that contributed by leakage currents and, in some cases, by the $V_{DD}$-to-$V_{SS}$ currents in some circuit branches. The latter often result from the imperfection of the voltage levels of the binary voltage signals. Dynamic power dissipation has two components: the switching power consumption and the short-circuit power consumption. They are determined by the switching activities and the node capacitances contributed by the silicon layers and metal wires. The dynamic power dissipation can be estimated as [45]:

$$P_{dyn} = V_{DD}^2 \cdot f_{clk} \cdot \sum_{k} \alpha_k \cdot C_k + V_{DD} \cdot \sum_{k} i_{scn} \qquad (2.1)$$

where $V_{DD}$ is the supply voltage, $f_{clk}$ is the clock frequency of the signal, $\alpha_k$ is the coefficient describing the switching activities at the $k^{th}$ node, $C_k$ is the node capacitance at the output node of the $k^{th}$ stage, $i_{scn}$ is the node short-circuit current, and $k$ is the stage index.

#### 2-2-3 Input Resistance

The input resistance of a CMOS gate is infinite. The input resistance can be finite if the input nodes are not gate terminals.

#### 2-2-4 Output Voltage Level

This specification closely relates to the noise margins of the circuits. Moreover, it also relates to the current driving capacity, which determines the speed and power dissipation of the circuits. Details will be presented in Sections II-3 and II-4.

#### 2-2-5 Size

The circuit size depends on the number of transistors and their sizes. It also depends on the wiring complexity. Minimizing the circuit size facilitates the integration of a system. Besides, it may reduce the parasitic capacitance of the circuit and the power dissipation. However, in terms of operation speed, it may not lead to optimum results because of the smaller transistor sizes.

#### II-3 Two-Input XOR Cell

We describe some two-input XOR cells in this section to show some of the problems in the design of XOR circuits.

#### 2-3-1 Conventional CMOS XOR

An example of a CMOS XOR gate is shown in Figure 2.1 [20][45]. Compared to other kinds of XOR gates, a complementary CMOS XOR circuit requires a relatively large number of transistors. Moreover, the circuit can be very complex if the number of inputs is large. Therefore, some other kinds of XOR circuits have been proposed to fit different requirements.

Figure 2.1 Example of a conventional CMOS XOR circuit.

#### 2-3-2 Pass-Transistor-Logic XOR

One commonly used logic style for building XOR gates is pass-transistor logic, which has the structure of a multiplexer, as shown in Figure 2.2.

Figure 2.2 Basic scheme of two-input pass-transistor-logic XOR cell.

If each switch is a single MOSFET, the cell requires a very small number of transistors, which reduces delay, power dissipation and size.
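The multiplexer structure of the pass-transistor-logic XOR cell can be sketched at the logic level. The snippet below is only an illustration, not the circuit of Figure 2.2 itself: the function names `mux2` and `ptl_xor` are hypothetical, and it is an assumption that input A drives the switch gates while B and its complement are applied to the switch data terminals.

```python
from itertools import product


def mux2(select: int, when_low: int, when_high: int) -> int:
    """Ideal 2:1 multiplexer: route one of two data inputs to the output."""
    return when_high if select else when_low


def ptl_xor(a: int, b: int) -> int:
    """Two-input XOR expressed as a multiplexer.

    Assumption (for illustration): A controls the switches, and B and its
    complement are the data inputs passed through the switches.
    """
    b_bar = 1 - b
    # A = 0 passes B unchanged; A = 1 passes the complement of B.
    return mux2(a, when_low=b, when_high=b_bar)


# Print the truth table and check it against the built-in XOR operator.
for a, b in product((0, 1), repeat=2):
    assert ptl_xor(a, b) == (a ^ b)
    print(a, b, ptl_xor(a, b))
```

This logic-level view ignores voltage levels entirely; it only shows why two switches and one complemented input are enough to realize the function.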
However, a cell built from single-MOS switches has the weakness of non-ideal output voltage levels, called "poor 1" and "poor 0" [45]. The problem can be solved by using complementary switches, as shown in Figure 2.3. In this case, however, complementary switches and control signals have to be used, the number of transistors is doubled, and the circuit wiring becomes much more complex, which reduces the advantages the pass-transistor logic gates have over the CMOS ones.

Figure 2.3 Pass-transistor-logic XOR cell, using complementary switches.

It should be noted that some of the input terminals of pass-transistor logic circuits are source/drain terminals with finite resistances. Therefore, any non-ideal operation state in the pass-transistor-logic circuits can affect the operation of the preceding stages. Usually, buffers are needed to separate the stages.

#### 2-3-3 Mixed Logic XOR Circuits

Some other simple XOR circuits based on pass-transistor logic mixed with other logic gates have been proposed. They are discussed below.

#### 2-3-3-a Cross-Coupled XOR and XNOR [45]

The circuits shown in Figure 2.4 are two examples. Each of them consists of only four transistors to realize the XOR or XNOR function, but both have the level problem. For example, in the circuit shown in Figure 2.4(a), when both inputs are LOW, the output level is higher than $V_{SS}$. Moreover, the circuits do not have infinite input resistances. In conclusion, they have all the problems of pass-transistor-logic circuits built from single-MOS transistor switches.

Figure 2.4 Cross-coupled XOR and XNOR cell.

- (a) Logic gate realizing XOR function. In this circuit, the low level of the output can be higher than zero due to the "poor-0" problem of PMOS.
- (b) Logic gate realizing XNOR function. In this circuit, the high level of the output can be lower than $V_{DD}$ due to the "poor-1" problem of NMOS.
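The "poor 1" and "poor 0" levels discussed in this section can be illustrated with a first-order, steady-state model of a single-MOS switch. This is a minimal sketch under stated assumptions: the threshold values `VTN` and `VTP` are illustrative numbers rather than values from the thesis, the body effect is ignored (in a real device it raises the threshold and makes the degraded level worse), and the functions `nmos_pass` and `pmos_pass` are hypothetical helpers that only model a switch that is turned on.

```python
VDD = 1.8   # supply voltage in volts (1.8 V is the supply used in the thesis simulations)
VTN = 0.5   # assumed NMOS threshold voltage in volts (illustrative value)
VTP = 0.5   # assumed magnitude of the PMOS threshold voltage in volts (illustrative value)


def nmos_pass(v_in: float, v_gate: float = VDD) -> float:
    """Steady-state output of a conducting NMOS switch.

    The output follows v_in but cannot rise above v_gate - VTN ("poor 1").
    """
    return min(v_in, v_gate - VTN)


def pmos_pass(v_in: float, v_gate: float = 0.0) -> float:
    """Steady-state output of a conducting PMOS switch.

    The output follows v_in but cannot fall below v_gate + VTP ("poor 0").
    """
    return max(v_in, v_gate + VTP)


poor_one = nmos_pass(VDD)    # NMOS passing a full high level
good_zero = nmos_pass(0.0)   # NMOS passing a low level
good_one = pmos_pass(VDD)    # PMOS passing a high level
poor_zero = pmos_pass(0.0)   # PMOS passing a full low level

print(f"NMOS switch: high -> {poor_one:.2f} V, low -> {good_zero:.2f} V")
print(f"PMOS switch: high -> {good_one:.2f} V, low -> {poor_zero:.2f} V")
```

Under these assumptions the NMOS switch passes a clean low level but a degraded high level, and the PMOS switch does the opposite, which matches the behaviour attributed to the two cells of Figure 2.4.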
#### 2-3-3-b A Combination of the Pseudo-NMOS and the Pass-Transistor Circuit

The circuit shown in Figure 2.5 [20] is even simpler. There are only three transistors in this XOR cell, which is a combination of a pseudo-NMOS and a pass-transistor circuit. In terms of the number of transistors required to form the cell, it is so far the best. However, like pseudo-NMOS gates, this circuit has the problems of non-zero static power dissipation and non-ideal low voltage levels.

Figure 2.5 Pseudo-NMOS and cross-coupled pass-transistor XNOR cell.

#### II-4 Level Recovering Techniques

CMOS XOR gates have advantages such as robustness and regularity in the circuit structure. However, they require a large number of transistors, which can cause a larger delay and requires more silicon area. The pass-transistor-logic and mixed-logic XOR gates are simpler. However, the "poor-1" or "poor-0" problem of single-MOS switches affects the noise margins, speed and power dissipation of the gates. Thus, it is necessary to solve this problem. Several techniques have been proposed, and they are discussed in this section.

#### 2-4-1 Partial CMOS Switch XOR Cell

As discussed in Section 2-3-2, the level problem can be solved by using complementary switches. However, this requires both inputs to have complementary signals controlling the switches, which can eliminate the advantages of the pass-transistor-logic gates over the CMOS ones. For this reason, a modified version, shown in Figure 2.6 [23], has been proposed, which can be seen as a partial CMOS switch circuit. It is also similar to the cross-coupled XOR cell shown in Figure 2.4, except that a PMOS transistor $P_3$ is added in order to compensate for the level difference when A = B = "1". Meanwhile, only one complementary input signal is required to control the additional switch $P_3$. This circuit attempts to keep the structure simple while solving the level problem.
However, this version still does not reach the goal of having a small number of transistors. Another drawback is that its structure is irregular.

Figure 2.6 One modified version of cross-coupled XNOR cell.

#### 2-4-2 Level Recovering Inverter

One of the approaches to compensating for the output level is to add a level-recovering inverter, as illustrated in Figure 2.7.

(a) Level recovering using an inverter

Figure 2.7 Level recovering inverter and its operating points. Poor voltage levels can result in static current and power dissipation.

With this approach, the output voltage of the inverter shown in Figure 2.7 can be almost full swing, i.e. $V_{OH} \approx V_{DD}$, $V_{OL} \approx V_{SS}$. However, the following two points should be underlined.

- At least one of the two input voltage levels of the inverter is different from the supply voltage (or the GND) and the difference is larger than the zero-bias threshold voltage of the transistors, $V_{tn0}$ or $|V_{tp0}|$. Thus, the static power dissipation of the CMOS inverter is non-zero, as shown in Figure 2.7(b).
- The transistors of the inverter have to be carefully sized so that the threshold voltage of the inverter is correctly set and is not too close to the voltage level of "poor 1" (or "poor 0"). In other words, there is a critical limitation on sizing the transistors.

#### 2-4-3 Single Pull-Up PMOS

Another solution proposed to solve the "poor 1" problem is to apply a pull-up PMOS transistor [22][45]. As shown in Figure 2.8, if $V_X$ is high, its output $V_Y$ will be high, which drives $V_{\overline{Y}}$ to a low level. This low level of $V_{\overline{Y}}$ turns on the pull-up PMOS $P_2$, and $V_Y$ is expected to reach $V_{DD}$ after $i_1$ charges the capacitor $C_Y$. Thus, the level of $V_Y$ is restored.

However, this circuit can have problems in dynamic operation. Assume the circuit is initialized as $V_{\overline{X}} = 0$ V, $V_X = V_{DD}$, and $V_Y = V_{DD}$.
Then $V_{\overline{X}}$ changes from 0 to $V_{DD}$, and $V_X$ tends to change to 0 V, as does $V_Y$. While $V_{\overline{Y}}$ is still lower than $V_{DD} + V_{tp0}$, both $P_2$ and $N_2$ are on. Whether the voltage $V_Y$ can be reduced to a low level depends on the difference between the two currents $i_1$ and $i_3$. If $i_3 > i_1$, the capacitor $C_Y$ will be discharged and $V_Y$ will reach zero. However, if $i_3 \le i_1$, $V_X$ and $V_Y$ will increase or remain the same. An equilibrium may be reached in which $V_Y$ remains at a relatively high level. The simulation results of such a case are shown in Figure 2.9.

Figure 2.8 Single pull-up PMOS circuit.

- (a) Single pull-up PMOS circuit. The PMOS transistor $P_2$ and the inverter are added to compensate for the voltage drop at $V_Y$ in case $V_X = V_{DD}$.
- (b) Equivalent circuit during the transition in case $V_X$ is changed from $V_{DD}$ to 0 V.

Figure 2.9 Simulation results. During the period, $i_3 \le i_1$, so $V_Y$ remains high and $V_{\overline{Y}}$ remains low, which causes logic errors.

To ensure $i_3 > i_1$ during the transition, we should carefully choose the size ratios of the transistors concerned. This is a limitation of this level-recovering technique.

#### 2-4-4 Self-Cross Pull-Up Pull-Down Structure

Another version of the XOR circuit with level compensation has been presented by D. Radhakrishna [27]. As shown in Figure 2.10, the circuit consists of two parts: a "pseudo-NMOS-like" cross-coupled pass-transistor XOR circuit, similar to that shown in Figure 2.5, and its complementary version. The two parts form a closed loop, with the output signal of each part fed to the other one. In each part, the pull-up PMOS (or pull-down NMOS) is controlled by the output from the other part.
It attempts to perform both the XOR and XNOR functions simultaneously with full-swing voltage levels.

Figure 2.10 Self-cross pull-up XOR-XNOR cell [27].

The circuit seems to have a full-swing output in static operation. However, it does have the problems of poor levels and static power dissipation due to its feedback structure. In the case shown in Figure 2.11, the resulting outputs $V_F$ and $V_{\overline{F}}$ may not have good voltage levels, and static currents $i_1$ and $i_2$ may exist in the circuit.

Figure 2.11 Example of a troubled dynamic operation of the circuit shown in Figure 2.10.

Assume that the inputs $V_X$ and $V_Y$ come from the preceding gates, and that the circuit is initialized as $V_{\overline{X}} = V_{DD}$, $V_{\overline{Y}} = V_{DD}$, $V_X = 0$ V, $V_Y = 0$ V, the output $V_F = 0$ V, and $V_{\overline{F}} = V_{DD}$. Then, $V_{\overline{X}}$ changes from $V_{DD}$ to $V_{SS}$, and $V_X$ rises toward $V_{DD}$. $V_F$ tends to change to $V_{DD}$. However, if $i_1$ is comparable with $i_3$, $V_F$ cannot rise to a high voltage. Then $P_3$ cannot be cut off completely and $V_{\overline{F}}$ remains higher than zero. This non-zero signal is fed back to $N_1$ and keeps it on. Therefore, the currents $i_1$, $i_2$ and $i_3$ are non-zero in the static state. This can cause not only static power dissipation but also a logic error.

From the above discussion, we conclude that although this circuit requires only a small number of transistors to realize the XOR and XNOR functions, there is a great risk that it will not work properly.

#### 2-4-5 Cross-Coupled Pull-Up PMOS

Another solution to the voltage level problem in an NMOS pass-transistor logic circuit is to use only PMOS transistors, as shown in Figure 2.12 [22]. The PMOS transistor functions as a pull-up transistor.
However, unlike the case described in Section 2-4-4, the gate voltage $V_{\overline{F}}$ shown in Figure 2.12 is not produced by the pass-transistor logic gate itself, but provided by an independent gate performing the complementary function. Since only NMOS transistors are used in the gate, the output levels should have only the "poor-1" problem. When it occurs, the complementary signal $V_{\overline{F}}$, which is also generated by an NMOS pass-transistor logic gate, is expected to be at a low level. This low level turns the pull-up PMOS transistor on, which brings the level of $V_F$ to $V_{DD}$. It should be noted that, unlike the case shown in Figure 2.11, in the NMOS pass-transistor logic illustrated in Figure 2.12 there is no node eventually switched to ground.

Figure 2.12 Voltage level problem of the NMOS output $V_F$ solved by a pull-up PMOS using $V_{\overline{F}}$.

This circuit avoids the possible dynamic operation problem of the single pull-up PMOS and self-cross pull-up pull-down structures. As shown in Figure 2.13, since there is no current path connecting to $V_{SS}$, the output $V_F$ can be charged to the perfect high voltage level $V_{DD}$ by $i_1$ and $i_2$.

Figure 2.13 Pull-up transistor works in this circuit.

Figure 2.14 Cross-coupled pull-up PMOS circuit.

As the true signal $V_F$ and its complementary signal $V_{\overline{F}}$ are required in most large circuits, two blocks performing the functions F and $\overline{F}$ can be placed together with their pull-up PMOS transistors, as shown in Figure 2.14. Both output signals have full-swing logic levels.

The circuit also works with cross-coupled pull-down NMOS transistors if PMOS pass-transistor logic is used. This is shown in Figure 2.15.

Figure 2.15 Cross-coupled pull-down NMOS circuit.

#### II-5 Conclusion

In this chapter, the basic building blocks, XOR circuits, have been discussed.
Besides the conventional CMOS and pass-transistor logic XOR gates, some existing mixed-logic XOR gates, which are simpler than the CMOS ones, are also described. In these circuits, level-restoring techniques are usually applied in order to solve the "poor 0" or "poor 1" level problems of NMOS or PMOS switches. However, these techniques have their shortcomings. The partial CMOS switch XOR cell does not reduce the number of transistors and loses the regularity in structure. The level-recovering inverter may cause static power dissipation, and the method limits the selection of the transistor sizes. The single pull-up PMOS method may cause logic errors if the size ratios of the transistors are not chosen properly. A similar logic error can occur in the self-cross pull-up pull-down structure. Using cross-coupled pull-up PMOS transistors is so far a good solution for level restoring without the risk of static power dissipation.

The technique of adding cross-coupled pull-up PMOS transistors can be further explored in the design of multiple-input XOR circuits, which will be discussed in the next chapter.

### Chapter 3

### **Multiple-Input XOR Circuits**

#### **III-1 Introduction**

XOR gates employed in signal detecting and processing circuits, such as ECC circuits, often have a large number of inputs ($\geq 4$). To combine a reasonable speed with a reasonable power dissipation, we need to design specifically structured multiple-input XOR gates, instead of mechanically connecting single two-input XOR gates.

In this chapter, we first present the analysis of the existing multiple-input XOR circuits. Based on the analysis, an approach to the improvement of speed and power dissipation of PTL XOR circuits is presented. A study of the performance improvements and the implementation of the approach are also described, as well as the evaluation of the performance by Hspice simulation.
#### **III-2** Structure of Multiple-Input XOR

Because of the simplicity of the circuit structure, the PTL approach is often taken for the design of low-power and high-speed XOR gates. Thus, two-input pass-transistor-logic XOR gates are usually used as the basic units to build multiple-input XOR blocks. Three types of XOR structures are commonly used: the chain structure, the tree structure, and the mixed structure.

#### 3-2-1 Chain Structure

The basic scheme of a multiple-input XOR gate with a chain structure is shown in Figure 3.1. It should be noted that the basic cells in the scheme are two-input XOR gates.

Figure 3.1 Multiple-input XOR with a chain structure of two-input XOR gates.

Assume that all the inputs are provided by CMOS gates with full-swing voltage levels. All the gates of the MOS pass transistors of the circuit are then driven by full-swing voltage levels, as shown in Figure 3.2. The low level of the output voltage is zero volts, and the high level is $V_{DD} - V_{tn}$ in the worst case.

Figure 3.2 Two-input pass-transistor-logic XOR gates arranged in a chain structure.

If n is the number of inputs of a multiple-input XOR gate, n-1 two-input XOR gates are required to build the circuit. If each unit has a delay of $t_d$, then the total delay of the circuit, consisting of n-1 cascaded stages, is $T_d$, where $T_d \ge (n-1) \cdot t_d$. The best case, $(n-1) \cdot t_d$, occurs only when each pass-transistor-logic stage is well isolated, usually by inverters. Otherwise, the delay increases much faster than linearly with the number of stages (roughly quadratically in a simple RC model of the chain), because the charging or discharging current required to change the voltage at the output terminal is provided by the preceding gate and flows through a chain of pass transistors. Thus, in order to avoid a serious delay, a large number of chained stages should be avoided, which limits the number of inputs of the XOR gates.
The power dissipation of this structure is $P_D \approx (n-1) \cdot p_D$, i.e., it is nearly a linear function of the number of stages.

#### 3-2-2 Tree Structure

The multiple-input XOR gate can also be made using basic XOR cells in a tree structure, as shown in Figure 3.3.

Figure 3.3 Tree structure.

For an n-input XOR gate with a tree structure, the total number of basic two-input units is n-1, the same as for the chain structure. Because the same number of units is used, the power dissipation of the tree structure circuits is similar to that of the chain structure circuits.

To estimate the delay, the number of stages for signal propagation, N, needs to be examined. In an n-input XOR gate with a tree structure, $N = \log_2 n$, instead of n-1 in a chain structure. Therefore, the former has a total delay of $T_{d-tree} \ge (\log_2 n) \cdot t_d$, where $t_d$ is the delay of the basic unit, compared to $T_{d-chain} \ge (n-1) \cdot t_d$ for the latter.

Evidently, the tree structure XOR gates have a speed advantage over the chain structure ones, because of their smaller number of layers. However, this advantage may not be significant in some cases, and some critical level problems may occur. One such example is shown in Figure 3.4.

Figure 3.4 Level problem when two-input pass-transistor-logic XOR cell used in the tree structure.

As we can see in a basic cell of the second layer (Figure 3.4), the four inputs of the basic cell come from the first layer, and all of them have "poor-1" problems ($V_{OH} = V_{DD} - V_{tn1}$, where $V_{tn1}$ is the threshold voltage of the transistors). Two of these signals are applied at the gate terminals of the cell. The high output level of the second layer will be the voltage applied at the gate minus the threshold voltage $V_{tn2}$, i.e., $V_{OH}=(V_{DD}-V_{tn1})-V_{tn2}$. When this signal continues to be transmitted to the $k^{th}$ stage, the high output level will be $V_{OH}\approx V_{DD}-kV_{tn}$.
Therefore, the poor voltage level of each layer will be propagated and accumulated. In order to ensure sufficient noise margins of the circuits, the number of layers cannot exceed two. Otherwise, level-recovering techniques have to be applied for the circuits to operate correctly.

The level problem described above affects not only the noise margins of the circuits but also the operation speed. If the "poor-1" voltage signals from the preceding layer are applied at the gates of the pass transistors, the transistor currents will be weakened compared to the case of "good-1" gate voltages. Consequently, the propagation delay will be increased.

From the above discussion, we can conclude that, unless some measures are taken to solve the level problem, the pass-transistor-logic circuits of the tree structure will not have significant advantages over those of the chain structure in terms of power dissipation and operating speed. Moreover, the routing for the tree structure is more complex than that of the chain structure.

#### 3-2-3 Mixed Structure

There are some types of multiple-input XOR gates whose features apparently differ from those of the chain structure and the tree structure. One example is the DCVS circuit [12][13][16][45].

A DCVS (Differential Cascode Voltage Switch) XOR is, in fact, a pass-transistor-logic network transferring logic "0" to one of the output terminals, while the voltage of the other output node is pulled to $V_{DD}$ by means of a PMOS. Since the pass-transistor-logic network consists of NMOS pass transistors, the low level at the output is zero volts. Thus, there is neither a "poor-1" nor a "poor-0" problem. A typical DCVS XOR circuit is shown below in Figure 3.5.

Figure 3.5 Example of mixed structure of XOR gate.

- (a) DCVS XOR [45].
- (b) The same circuit (a) placed horizontally to show the transmission of the "0" to the node out or $\overline{out}$.
The speed of the DCVS XOR circuit is related to the number of layers in the pass-transistor-logic network. A large number of layers results in a large delay. The advantage of this circuit is that all the input voltage signals are applied to the gate terminals of the pass-transistor-logic network. Thus, the resistance of each input terminal is infinite, and there is no current flowing from the preceding stages, which is the same as in CMOS PUN/PDN gates.

#### III-3 An Approach to the Improvement of Speed and Power Dissipation

We have discussed three kinds of multiple-input XOR gates composed of two-input pass-transistor-logic XOR units: the chain structure, the tree structure and the mixed structure. The chain structure is simple, but in order to avoid a long current path, the number of inputs, i.e., the number of stages, has to be limited. The same problem also exists in the mixed structure. The tree structure has fewer layers, with a seemingly smaller delay compared to the chain structure and the mixed structure. However, due to the "poor-1" or "poor-0" problem of MOS pass transistors, a successive voltage level loss can be produced, which not only weakens the current driving capacity of the transistors, but also causes logic errors in some cases. Therefore, the chain, tree and mixed structure XOR circuits all have disadvantages in different aspects. A design approach for improving the performance in terms of speed and power dissipation is presented in the following sub-sections.

#### 3-3-1 Description of the Proposed Approach

The improved circuit is based on the tree structure, for its advantage of a small number of gate layers. By using the tree structure instead of the chain structure, the total delay can be reduced by a factor of $\frac{n-1}{\log_2 n}$ (Section 3-2-2), where n is the number of inputs. The larger the number of inputs, the greater the speed improvement of the tree structure XOR gates over the chain structure gates, provided that the problem of the poor levels can be solved.
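The stage counts behind this ratio can be tabulated with a few lines of Python. This is only a minimal sketch: the function names are illustrative and not part of the thesis, n is assumed to be a power of two, and the ratio is the idealized bound from Section 3-2-2, which ignores level degradation and loading between stages.

```python
def stage_counts(n):
    """Return (chain_stages, tree_stages) for an n-input XOR gate.

    Assumes n is a power of two (n >= 2), so the tree depth is exactly log2(n).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, n >= 2")
    chain_stages = n - 1               # n-1 cascaded two-input cells
    tree_stages = n.bit_length() - 1   # log2(n) layers of two-input cells
    return chain_stages, tree_stages


def ideal_delay_ratio(n):
    """Idealized ratio T_d-chain / T_d-tree = (n-1) / log2(n)."""
    chain_stages, tree_stages = stage_counts(n)
    return chain_stages / tree_stages


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        chain, tree = stage_counts(n)
        print(f"n={n:3d}  chain={chain:3d}  tree={tree}  ratio={chain / tree:.2f}")
```

For example, a 16-input gate has 15 chained stages against 4 tree layers, an idealized ratio of 3.75.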
The level problem in the tree structure pass-transistor-logic circuit is due to the successive voltage level reduction. This problem can be solved by applying level-restoring cells so that the gate voltages of all the MOS pass transistors have a full swing. By doing so, we can not only eliminate successive voltage level degradation, but also maximize the current driving capacity of the MOS pass transistors, which results in a maximum speed. However, for both speed and power concerns, we propose to make only the gate voltages full swing, not the voltages at the diffusion terminals.

An example illustrating the approach is shown in Figure 3.6. This is a multiple-input XOR circuit based on basic pass-transistor-logic XOR units and level-restoring cells. Each XOR unit has two inputs. The one drawn as a thick gray line (Figure 3.6b) is applied at the MOS gate, and the other, drawn as a thin line, is applied at the diffusion node (Figure 3.6a). Level-restoring cells are placed only at the branches connecting to MOS gates, so that all of the gate signals have a full voltage swing. The analysis of the improvement resulting from this arrangement is elaborated in the following paragraphs.

Figure 3.6 Multiple-input XOR circuit using the proposed approach.

- (a) Basic XOR cell with asymmetric inputs. $X_1$ and $\overline{X}_1$ are the inputs controlling the gates of the MOS pass transistors, indicated by thick lines. $X_2$ and $\overline{X}_2$ are the signals applied at the diffusion terminals.
- (b) Circuit scheme consisting of the basic XOR cells and level-restoring cells, which are used to recover only the levels of the gate voltages.

#### 3-3-1-a Voltage Level

By means of the level-restoring cells, the levels of the gate voltages are 0 V and $V_{DD}$. Those at the diffusion nodes are 0 V and $V_{DD} - V_{tn}$ in an NMOS pass transistor network. Therefore, the output voltage loss is limited to $V_{tn}$, no matter how many gate layers there are in the circuit.
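The contrast between accumulated and bounded level loss can be illustrated numerically. The sketch below is a first-order illustration only: it assumes a constant threshold voltage (the body effect is ignored), the function names are hypothetical, and the values $V_{DD} = 1.8$ V and $V_{tn} = 0.5$ V are taken as example values.

```python
def v_oh_unrestored(vdd, vtn, layers):
    """High output level after `layers` NMOS layers when each poor-1 output
    drives the gates of the next layer: V_OH ~ V_DD - k * V_tn.

    First-order estimate with a constant V_tn; clipped at 0 V.
    """
    return max(vdd - layers * vtn, 0.0)


def v_oh_gate_restored(vdd, vtn, layers):
    """High output level when every gate signal is restored to V_DD.

    The loss stays at a single threshold drop regardless of the depth.
    """
    if layers < 1:
        return vdd
    return vdd - vtn


if __name__ == "__main__":
    VDD, VTN = 1.8, 0.5  # assumed example values
    for k in range(1, 4):
        print(k,
              round(v_oh_unrestored(VDD, VTN, k), 2),
              round(v_oh_gate_restored(VDD, VTN, k), 2))
```

Under these assumptions the unrestored high level falls by one threshold per layer, while the gate-restored circuit stays at $V_{DD} - V_{tn}$ for any depth.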
#### 3-3-1-b Speed

The level-restoring cell makes the voltage at every gate terminal full swing. The maximized $V_G$ results in a maximized current driving capacity of the MOS pass transistor and, consequently, a short delay.

The charging time is usually longer than the discharging time, because the driving capacity of PMOS transistors is weaker than that of NMOS transistors of similar size in the preceding stages. Therefore, the delay due to the rise-time is more critical than that due to the fall-time. For each MOS pass transistor, during the rise-time, the input of the pass transistor is the drain terminal of the transistor. The NMOS pass transistor is always in saturation during the transition, as $V_D \ge V_G - V_{tn}$. The current is approximately $i_D \approx K_n \cdot \frac{W}{L} (v_{GS} - V_{tn})^2$. For the two cases, $v_G = V_{DD}$ for a full-swing gate voltage and $v_G = V_{DD} - V_{tn}$ for a "poor-1" gate voltage, the current ratio can be expressed as

$$\frac{i_D|_{v_G = V_{DD}}}{i_D|_{v_G = V_{DD}-V_{tn}}} = \frac{(V_{DD}-v_S-V_{tn})^2}{(V_{DD}-v_S-2V_{tn})^2}.$$

If $V_{tn} \approx 0.5$ V and $V_{DD}=1.8$ V, then at the beginning of the switching period, when $v_S \approx 0$, the current in the case of $v_G = V_{DD}$ is $\frac{(1.8 - 0.5)^2}{(1.8 - 2 \cdot 0.5)^2} \approx 2.6$ times that of the other case. This example shows that, by raising the gate voltage from $V_{DD} - V_{tn}$ to $V_{DD}$, the current can be increased by a factor of about 2.6. With the same voltage variation $\Delta V$ and the same load capacitor, the rise-time of the transient period is greatly reduced with a full-swing high voltage level at the MOS gate, which results in a faster speed for the whole circuit.

#### 3-3-1-c Power Dissipation

The power dissipation of the circuit is optimized by not restoring the voltage level of the signal applied at the diffusion nodes of the basic XOR cell.
In this case, the maximum voltage at this node is $V_{DD}-V_{tn}$ ($V_{tn}>V_{tn0}$ due to the body effect), and the energy stored and dissipated is about $C(V_{DD}-V_{tn})^2$, instead of $CV_{DD}^2$, where C is the node capacitance. Therefore, without the level restoration, the energy loss by charge and discharge at this node is about $\frac{V_{DD}^2}{(V_{DD}-V_{tn})^2}$ times smaller than that when the voltage levels are restored. If $V_{tn} \approx 0.5$ V and $V_{DD} = 1.8$ V, it is $\frac{(1.8)^2}{(1.8 - 0.5)^2} \approx 1.9$ times smaller. This results in less power dissipation at those nodes.

#### 3-3-2 Different Schemes of the XOR Gates with Level Restoring

#### 3-3-2-a XOR Gates with Level-Restoring Inverters

Several level-restoring techniques have been described in Section 2-4. By using an inverter, as shown in Figure 3.7, we can easily restore the signal levels. However, the introduction of the inverter brings an additional delay and additional power dissipation. Moreover, the additional power dissipation is mainly caused by a static current in the inverter. Because of the additional delay, the improvement of speed may become insignificant or even negative.

Figure 3.7 Using inverters as the level-restoring cell.

#### 3-3-2-b PMOS and NMOS Pass-Transistor-Logic Tree Structure

As discussed above, the basic point of improving the speed is to make the gate voltages of the pass transistors equal to $V_{DD}$. For this purpose, instead of inserting a level-restoring cell, we may use PMOS pass-transistor-logic cells to generate the "good-1" signals to be applied to the MOS gates of the next stages, as shown in Figure 3.8. Thus, without placing any level-restoring cells, i.e., with no additional delay, high gate voltages of $V_{DD}$ are ensured and the high level of the output voltages will not be lower than $V_{DD}-V_{tn}$, the same as that of the circuit in Figure 3.8.

Figure 3.8 PMOS and NMOS tree structure.
PMOS pass-transistor-logic cells are used to ensure a "good-1" voltage level of the signals at the MOS gates.

It should be noted that the use of the PMOS cells provides a good high level of the output voltage of the cell. However, it results in a "poor-0" level, which leads to a problem of large static power dissipation in the succeeding stage. The poor low voltages applied at the MOS gates of the succeeding stage leave the NMOS transistors of that stage not completely off. As the two inputs of each XOR gate unit are complementary and, as mentioned, neither of the two pass transistors is off, a current path from $V_{DD}$ to $V_{SS}$ is formed via the two pass transistors, as shown in Figure 3.9. This current results in static power dissipation.

Figure 3.9 Short-circuit current in PMOS and NMOS tree structure circuit.

In conclusion, this approach of the PMOS and NMOS pass-transistor-logic tree structure improves the speed at the expense of power dissipation. Moreover, because of the static currents in the succeeding stage, the levels of the output voltage of that stage are not perfect. This level loss will affect the operation of the following stages. Thus, the number of stages (or layers) of this kind of XOR gate is limited, and inverters must be added at the end to recover the voltage levels.

#### 3-3-2-c NMOS Pass-Transistor-Logic Tree Structure with Pull-Up PMOS

An effective method to restore the signal level without introducing additional delay and power dissipation is to use a pull-up PMOS transistor pair controlled by complementary signals, as shown in Figure 3.10.

Figure 3.10 Improving the performance of XOR circuit of tree structure by using pull-up PMOS transistors at gate nodes.

The use of these pull-up PMOS transistors in Figure 3.10 introduces almost zero additional delay to the circuit. Moreover, the voltage rise during the transition is accelerated, because the PMOS transistors can provide an additional path for the charging current (Figure 3.11).
Therefore, with the pull-up PMOS transistors, the rise-time can be shortened and the speed of the circuit can be further improved. This speed improvement is achieved at almost no expense of power dissipation, as the current from the PMOS is designed to contribute neither to the short-circuit current during the transition nor to the static current in the steady state, as shown in Figure 3.11 (b) and (d).

Figure 3.11 Charging and discharging processes with and without pull-up PMOS.

- (a) The charging process in the circuit without pull-up PMOS. There is one current path to charge the capacitor at the output node $V_O$.
- (b) The charging process in the circuit with pull-up PMOS. There are two current paths to charge $V_O$. Thus, the rise-time is shorter. No path for static current exists.
- (c) The discharging process in the circuit without pull-up PMOS transistor.
- (d) The discharging process in the circuit with pull-up PMOS transistor. The pull-up PMOS is supposed to be off. The difference is that the voltage from which $V_O$ is discharged is $V_{DD}$ instead of $V_{DD} - V_{tn}$, which is desirable.

As discussed above, we have presented three schemes for level restoring of the gate voltages of the pass transistors: level-restoring inverters, the PMOS and NMOS tree structure, and "complementary" pull-up PMOS transistors. Among these three schemes, only that with the pull-up PMOS transistors can solve the problems of voltage levels and static power dissipation. Moreover, with the pull-up transistors, the tree structure can be easily expanded to multiple layers without loss of the voltage level. A more detailed analysis of this circuit structure will be given in the next sub-section.

#### III-4 Analysis of the Pass-Transistor-Logic XOR Gates with a Large Number of Inputs

As mentioned in previous sub-sections, for a tree structure, if n is the number of inputs and $n = 2^N$, the number of layers is $N = \log_2 n$.
The tree structure is beneficial for a large n, which corresponds to a relatively small $\log_2 n$, i.e., a relatively short path of transistors in series. Before starting the discussion about the delay of an XOR circuit with a tree structure, it should be noted that such an XOR circuit has a particularity, which is discussed in the following paragraph with reference to Figure 3.12.

Figure 3.12 Four-input pass-transistor-logic gate with two layers of two-input XOR cells. In each cell, shown in a dashed frame, the complementary input pairs are applied at the gate terminals and the diffusion terminals of the MOS pass transistors respectively. For each of the input cases, the number of transistors between the output node $O_{2-1}$ (or $O_{\overline{2-1}}$) and the input node is two. (The level-restoring cells are not shown here in order to keep the diagram simple.)

It is evident that, in a two-input XOR circuit, the change of the output signal from its high level to its low level, or vice versa, is caused by the change of only one of the two inputs, e.g., $(X_1, \overline{X_1})$ or $(X_2, \overline{X_2})$. In other words, if both inputs change, the output should remain the same. In a pass-transistor-logic XOR circuit such as that shown in Figure 3.12, the output voltage of each basic cell can change only when one of the input pairs changes, and one pair of the inputs is applied at the gate terminals of the pass transistors, while the other is applied at the diffusion terminals. Therefore, the change of the output of the n-input XOR circuit ($n = 2^N$) is caused by a change either at the input gates or at the input diffusion terminals.

To evaluate the time required for the output voltage to change from one level to the other, we can consider the case in which the input at the gate terminal is changing while that at the diffusion terminal remains the same, and vice versa. These two cases will be discussed below.
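The logical behaviour described above can be checked with a small behavioural model of the tree. The sketch below is purely logical (no voltages, transistors or level-restoring cells are modelled), and the function names are illustrative rather than taken from the thesis. It evaluates a tree of two-input XOR cells, counts the layers, and verifies that flipping one input toggles the output while flipping two inputs leaves it unchanged.

```python
from itertools import product


def xor_tree(bits):
    """Evaluate a tree of two-input XOR cells over `bits`.

    Returns (output, layers). Assumes len(bits) is a power of two (>= 2),
    so that every layer pairs its inputs exactly.
    """
    level = list(bits)
    if len(level) < 2 or len(level) & (len(level) - 1):
        raise ValueError("number of inputs must be a power of two, >= 2")
    layers = 0
    while len(level) > 1:
        # Each two-input cell combines a neighbouring pair of signals.
        level = [level[i] ^ level[i + 1] for i in range(0, len(level), 2)]
        layers += 1
    return level[0], layers


def flip_properties_hold(n):
    """Check, for every input pattern, that one flipped input toggles the
    output and two flipped inputs restore it."""
    for bits in product((0, 1), repeat=n):
        out, _ = xor_tree(bits)
        one = list(bits)
        one[0] ^= 1                      # change a single input
        if xor_tree(one)[0] != out ^ 1:
            return False
        two = list(one)
        two[-1] ^= 1                     # change a second input as well
        if xor_tree(two)[0] != out:
            return False
    return True


if __name__ == "__main__":
    print(xor_tree([1, 0, 1, 1]))        # output bit and number of layers
    print(flip_properties_hold(4))
```

For the four-input case of Figure 3.12 the model reports two layers, in agreement with $N = \log_2 n$.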
#### 3-4-1 Analysis of the Delay Caused by the Change of the Input Applied at a Diffusion Terminal

As shown in Figure 3.12, at any time, there is only one current path between the output node OUT (or $\overline{OUT}$) and one of the diffusion input nodes. It should be noted that, for each of the input cases, the number of pass transistors between the input and output nodes remains the same, namely $\log_2 n$, where n is the number of inputs of the circuit. To evaluate the delay of an n-input pass-transistor-logic XOR gate, the circuit can be simplified into a chain of $\log_2 n$ pass transistors ($n = 2^1, 2^2, \ldots$), as shown in Figure 3.13, for the case in which the signal is transmitted through the channels of the transistors while the gate voltage of each transistor is $V_{DD}$.

Figure 3.13 MOS pass transistors connecting one of the diffusion input nodes and the output node, when the voltage change occurs at the diffusion input while the gate voltages of the pass transistors are $V_{DD}$.

The structure shown in Figure 3.13 is often modeled as an RC-network [46] with a time constant $\tau = \sum_{k} C_k \sum_{j=0}^{k} R_j$, where the outer sum runs over the nodes of the chain and $R_j$ and $C_k$ are the resistances and capacitances of the RC network, as illustrated in Figure 3.14. However, this model may not be suitable for a transient analysis of the pass-transistor-logic circuit, as the transistors may not always act as resistors during the transient period.

Figure 3.14 Equivalent RC-network for the transistor chain.

In a gate circuit, the rise-time is usually more critical than the fall-time, because the driving capacity of PMOS transistors is weaker than that of NMOS transistors of similar size in the preceding stages. The pass-transistor-logic gate is not an exception. Thus, we focus on the evaluation of the rise-time.

It should be noted that when the input voltage of the circuit changes from low to high, the RC-network model shown in Figure 3.14 needs to be modified for evaluating the delay.
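For reference, the classical RC-network time constant can be computed as follows. This is a hedged sketch of the standard Elmore form, $\tau = \sum_k C_k (R_1 + \dots + R_k)$, applied to a uniform chain; the indexing from 1, the function names, and the idea of using one identical resistance and capacitance per stage are assumptions made for illustration, not values or methods taken from the thesis.

```python
def elmore_time_constant(resistances, capacitances):
    """Elmore time constant of an RC ladder.

    resistances[k] is the series resistance leading into node k and
    capacitances[k] is the capacitance at node k (node 0 is nearest the
    driver). Computes sum_k C_k * (R_0 + ... + R_k).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance and one capacitance per node")
    tau = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance between the driver and this node
        tau += c * upstream_r    # this node's contribution to the time constant
    return tau


def uniform_chain_tau(n_inputs, r_stage, c_stage):
    """Time constant of the log2(n)-transistor chain of an n-input tree,
    assuming identical (hypothetical) per-stage R and C values."""
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two, >= 2")
    stages = n_inputs.bit_length() - 1
    return elmore_time_constant([r_stage] * stages, [c_stage] * stages)


if __name__ == "__main__":
    # Hypothetical per-stage values: 10 kOhm and 5 fF.
    for n in (2, 4, 8, 16, 32, 64, 128):
        print(n, uniform_chain_tau(n, 10e3, 5e-15))
```

With identical stages the result is $RC \cdot N(N+1)/2$ for $N = \log_2 n$ transistors, which shows the faster-than-linear growth of the RC estimate with the chain length.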
As shown in Figure 3.15, during the rise process, $V_1$ changes from zero to $V_{DD} - V_{tn}$, where $V_{tn}$ is the threshold voltage of the MOS pass transistor $M_1$. This pass transistor, if it is on, is always in the saturation mode and behaves as a voltage-controlled current source, and its current is determined by $V_{GS1} = V_{DD} - V_1$, the gate-to-source voltage of the transistor. For each of the other NMOS transistors of the circuit shown in Figure 3.13, the drain-source voltage is between zero and $V_{DD} - V_{tn}$. These transistors are always in the linear region. Therefore, the model for the evaluation of the delay should be that shown in Figure 3.16, where $R_1$ is the resistance of $M_1$ in the saturation mode, which is much larger than that of the transistors in the linear mode, and the current $i(V_{GS1})$ is determined by $V_{GS1}$.

Figure 3.15 First transistor in the pass transistor chain.

Figure 3.16 Modified model of the pass-transistor-logic chain.

The new model presented above can be used for an evaluation of the delay of an n-input XOR gate with the tree structure when one of the input signals applied at the diffusion input terminals changes, while the gate voltage of each of the transmission transistors is $V_{DD}$. The delays of the XOR gates with 2, 4, 8, ..., up to 128 inputs have been calculated using the two models, the classical RC model and the one of Figure 3.16. The results are plotted in Figure 3.17. The electrical simulation results of the NMOS pass transistor circuit shown in Figure 3.13 are also plotted in the same figure for comparison.

Figure 3.17 Computation results using the models of the classical RC-network, modified model with the current-source, respectively, and the Hspice simulation results of the gate circuit.

From Figure 3.17, we can see a significant discrepancy between the curve obtained by using the classical RC-network model and the results of the electrical simulation.
The curve of the RC-network model deviates from that of the electrical simulation by more than 30%. However, the curve obtained by using the modified model including the current source is almost superposed on the curve of the simulation, and the deviation is less than 2%. Therefore, without requiring more computational power, the modified model provides much better results for the evaluation of the delay of the circuit than the RC model.

In conclusion, we have proposed a new circuit model for evaluating the delay of the multiple-input XOR circuits in the case that the signal is transmitted via the diffusion nodes of the transistors. This model provides a much more accurate delay estimation of the XOR circuits than the classical RC-network model. As an extension, this model can also be used for transmission lines such as pass-transistor-logic chains.

As mentioned previously, in the XOR circuit, a change of the output voltage can also result from a change of the input applied at a gate terminal, which will be discussed in the next sub-section.

#### 3-4-2 Analysis of the Delay Caused by the Change of the Input Applied at a Gate Terminal

As mentioned in the beginning of Section III-4, the level of the output voltage can change because an input signal applied at a gate terminal changes, while the inputs at the diffusion terminals remain the same. The time required for the output voltage to change is different from the case discussed in Section 3-4-1.

If the gate voltage of the pass transistor $M_1$ is initially zero, as shown in Figure 3.18, the transistor is initially off. If the voltages $V_{O1-2}$ and $V_{O2-1}$ at the two nodes separated by $M_1$ are initially different, then when the gate input voltage changes from zero to $V_{DD}$, $V_{O2-1}$ starts to change. This $V_{O2-1}$ may be applied to the gate of a MOS pass transistor in the next stage.
The worst case of delay occurs when all the gate voltages of the MOS pass transistors in the path of the signal propagation need to be charged from the low level to the high level.

Figure 3.18 Voltage change at gate terminal.

As shown in Figure 3.18, the gate voltage is a response to the signal at the preceding stage, and it has a finite rise-time as shown in Figure 3.19(a). Let us look at the two cases shown in Figure 3.19. As the gate voltage $V_{G2}$ rises gradually during the transition, as shown in Figure 3.19(b), the current $i_2$ is weaker than the current $i_1$ that is produced by a constant $V_{G1} = V_{DD}$. Thus, the propagation delay due to a voltage change at the gate is much larger than that due to a change at the diffusion. Moreover, due to the change of the gate voltages, some other problems that are not critical in the case of a small number of inputs may have to be taken into consideration in the case of a large number of inputs. The problem of short-circuit currents may be the most critical one.

Figure 3.19 Current $i_1$ generated when the gate voltage $V_{G1}$ is constantly high and the source (drain) voltage $V_1$ is changing, whereas $i_2$ is generated by a changing gate voltage $V_{G2}$. On average, $i_2$ is weaker than $i_1$ during the transition.

As shown in Figure 3.20, each pair of NMOS pass transistors is controlled by a pair of complementary gate voltages. The complementary output voltages change simultaneously in opposite directions. During the transition, both NMOS pass transistors are turned on simultaneously. The short-circuit current flowing through them weakens the charging or discharging current to the targeted circuit node capacitor and increases the power dissipation. The duration of the short-circuit current is related to the rise/fall time of the input voltages at the gate terminals. Thus, a slow variation of the gate signals of the pass transistor results in an even slower variation of the output voltage of the pass transistor.
If this output voltage is to be applied to the gate in the succeeding stage, the propagation delay and power dissipation will be even more critical. Therefore, it is very important that the input signals applied at the gate terminals have short rise and fall times.

Figure 3.20 Finite rise/fall time of the gate signals resulting in short-circuit currents.

#### 3-4-3 Effect of the Pull-Up Transistors

When the number of inputs of the XOR circuits is large, the current path for charging or discharging consists of several transistors, as shown in Figure 3.21. The transient time increases with the number of transistors. Placing pull-up PMOS transistors can not only restore the high level but also reduce the rise-time of the output voltage by providing an additional current for charging the capacitor at the circuit node. This additional current, shown as $i_1$ in Figure 3.21(a), flows through only the pull-up PMOS transistor itself. If the PMOS transistor is fully on, $i_1$ will usually be stronger than $i_3$, the current flowing through the pass transistors in series. Thus, the pull-up PMOS can help to reduce the rise-time significantly. However, it should be noted that such a speed-up can only be obtained when certain conditions are satisfied.

Figure 3.21 Discharging and charging process for the complementary output.

- (a) Pull-up PMOS helping to accelerate the charging process at one of the output nodes.
- (b) Pull-up PMOS slowing down the discharging process at the other output node, as it contributes negatively to the discharge current.
- (c) Transient period divided into three phases.

As described in previous sub-sections, the gate voltages of the NMOS pass transistors in the charging/discharging paths are made to be full swing by means of the pull-up PMOS transistor pairs.
As shown in Figure 3.21(a) and (b), when one of the complementary output voltages, e.g., $V_{\overline{Oi}}$, changes from zero to $V_{DD}$, the other one, $V_{Oi}$, is expected to change from $V_{DD}$ to zero. Each of the two PMOS transistors, $M_1$ and $M_2$, can provide a current to charge its node capacitor at a certain moment. The current from $M_1$, in this case, helps to reduce the delay, but that from $M_2$ is not desirable. For a better analysis of the transition, the transient period is divided into three phases as shown in Figure 3.21(c). Let us look at the operation of the transistors during these three phases.

- Phase 1: $t_0 \le t < t_1$, $V_{Oi} > V_{DD} - |V_{tp}|$. As $V_{Oi}$ is larger than $V_{DD} - |V_{tp}|$, $M_1$ is off, making no contribution to the charging process. $V_{\overline{Oi}}$ is increased by the charging current $i_3$ flowing through the NMOS pass transistors. During this phase, $V_{Oi}$ is still high, and $V_{\overline{Oi}}$ is low enough to make $M_2$ strongly conductive. Two currents act on the capacitor $C_{Oi}$: the one from the pass transistors discharges it, and the other one, from the PMOS $M_2$, charges it. Hence, $V_{Oi}$ decreases only if the discharging current is stronger than the charging current, and it may decrease very slowly during this phase. The temporary existence of a strong current $i_2$, due to the low level of $V_{\overline{Oi}}$, slows the discharging process at the node $V_{Oi}$ and the charging process at the node $V_{\overline{Oi}}$.
- Phase 2: $t_1 \le t < t_2$, $V_{Oi} < V_{DD} - |V_{tp}|$ and $V_{\overline{Oi}} < V_{DD} - |V_{tp}|$. Because the rise of the voltage $V_{\overline{Oi}}$ gradually weakens the current $i_2$ of $M_2$, the discharging is faster than that in Phase 1. Meanwhile, once $V_{Oi}$ is lower than $V_{DD} - |V_{tp}|$, the pull-up transistor $M_1$ provides a current $i_1$.
This accelerates the rise of $V_{\overline{Oi}}$, which makes the current $i_2$ decrease faster. The circuit enters a cycle of accelerating both the rise of $V_{\overline{Oi}}$ and the fall of $V_{Oi}$.
- Phase 3: $t_2 \le t < t_3$, $V_{\overline{Oi}} > V_{DD} - |V_{tp}|$. In the last period, $V_{\overline{Oi}}$ is charged to a level higher than $V_{DD} - |V_{tp}|$. $M_2$ is gradually turned off, while $M_1$ is fully conductive. $V_{\overline{Oi}}$ rises quickly to $V_{DD}$.

As discussed above, in order to accelerate the rise of $V_{\overline{Oi}}$, we wish to turn on the pull-up PMOS transistor $M_1$ as quickly as possible. The condition for turning on $M_1$ is that $V_{Oi}$ is lowered to a voltage smaller than $V_{DD} - |V_{tp}|$ by the discharging current $i_{discharge}$. The discharging current is $i_{discharge} = i_4 - i_2$, where $i_4$ is the current that flows through a series of NMOS transistors and $i_2$ is the drain current of the pull-up PMOS $M_2$. Thus, $i_2$ should be zero or as small as possible to obtain a large discharging current. As a transistor current is proportional to the aspect ratio W/L, this ratio should be minimized for the pull-up PMOS transistors, which can also result in a smaller gate area $W \times L$. Thus, the capacitance of the nodes $O_i$ and $O_{\overline{i}}$ is also minimized, which shortens the charging and discharging periods and accelerates the rise of $V_{\overline{Oi}}$.

From the above discussion, it can be seen that adding the pull-up PMOS transistors can reduce the rise-time of the output voltage by providing additional currents, but the size of the PMOS transistors should be small to minimize the node capacitance and the undesired current that may contribute negatively to the circuit operation.

#### 3-4-4 Summary of Section 3-4

In this sub-section, the analysis of the NMOS pass-transistor-logic XOR gate with a large number of inputs involving the pull-up PMOS is presented.
Two extreme cases of the delay of the XOR circuit are evaluated, i.e., the inputs at the diffusion terminals change while those at the gate terminals remain the same, and vice versa. In the first case, we proposed a model for the delay estimation of a pass-transistor-logic chain, and this modified model has been shown to be more accurate than the classical RC-network model. The phenomena related to the delay when the gate voltages change in the circuit are also discussed. In particular, the functions of the pull-up PMOS transistors are analyzed. It should be stressed that the PMOS transistors are used to recover the high level of the gate voltages, maximizing the currents of the NMOS pass transistors, and to provide an additional charging current to reduce the rise-time of the node voltages.

The Hspice simulations of the circuits described in Section III-3 and Section III-4 have been done to verify the analysis results. They are presented in the following sub-section.

#### III-5 Simulation of the Multiple-Input XOR Gates

We have described the approach to improve the speed and the power dissipation of the multiple-input XOR circuits in Sections III-3 and III-4. In order to estimate the performance of the improved circuits compared with that of the conventional CMOS and pass-transistor-logic circuits, Hspice simulation has been done using the models of a 0.18 micron process.

#### 3-5-1 Simulation of the Four Four-Input XOR Gates

Four types of four-input XOR gates are chosen in this simulation. The first one is a conventional CMOS XOR gate. The second one consists of two-input pass-transistor-logic XOR cells without level restoring (Figure 3.22). The third one is a combination of one PMOS pass-transistor-logic and two NMOS pass-transistor-logic XOR cells (Figure 3.23). The last one consists of NMOS pass-transistor-logic cells with pull-up PMOS transistors restoring the levels of the gate voltages (Figure 3.24). All MOS transistors are sized 0.22 µm/0.18 µm.
The input voltages of the four XOR gates are provided by inverters placed at the preceding stages. Each of the XOR gates has a minimum-sized inverter as its load, as shown in Figure 3.22 to Figure 3.24. The test circuit is shown in Figure 3.25.

Figure 3.22 Scheme (b): Pass-transistor-logic XOR gates of tree structure without pull-up PMOS transistors.

Figure 3.23 Scheme (c): XOR gate consisting of PMOS and NMOS pass-transistor-logic cells.

Figure 3.24 Scheme (d): XOR gates consisting of NMOS pass-transistor-logic cells with pull-up PMOS transistors used to restore the level of the gate voltages.

Figure 3.25 Test circuit.

As discussed in Section III-4, for the multiple-input XOR gates shown in Figure 3.22 to Figure 3.24, the worst-case delay occurs when the signal propagates through the gate voltages of the pass transistors, as shown in Figure 3.26.

Figure 3.26 The variation of the input signals corresponding to the worst-case delay. For each of the cells, the inputs applied at the gates of the MOS pass transistors are indicated by thick lines.

#### 3-5-2 Simulation Results and Comparison

The simulation results with a 1.8 V power supply are shown in Table 3.1. The delay and power dissipation of the loads (inverters) are included.

Table 3.1 Simulation results with a 1.8 V power supply.
| Scheme | XOR gate | Max delay (ns) | Power dissipation (10<sup>-5</sup> W) |
|--------|----------|----------------|----------------------------------------|
| (a) | Complementary CMOS | 0.42 | 0.389 |
| (b) | NMOS pass-transistor-logic XOR gate without pull-up PMOS transistors | 0.31 | 1.753 |
| (c) | NMOS-PMOS combined pass-transistor-logic XOR gate | 0.23 | 4.760 |
| (d) | NMOS pass-transistor-logic XOR gate with pull-up PMOS transistors | 0.13 | 0.284 |

#### 3-5-2-a Speed

The simulation results presented in Table 3.1 show that the NMOS pass-transistor-logic XOR gate with the pull-up PMOS transistors has the highest operating speed, which is 3.23 times faster than the complementary CMOS circuit and 2.38 times faster than the NMOS pass-transistor-logic gate without the pull-up PMOS transistors. The speed of the NMOS-PMOS combined pass-transistor-logic XOR circuit is also higher than that of the conventional CMOS XOR gate and that of the NMOS pass-transistor-logic gate without pull-up PMOS transistors. These results are justified as follows.

- Scheme (b) suffers from the poor high level at the gate terminals of the MOS pass transistors in the second layer and, consequently, from a weak current driving capacity and a low speed.
- In Scheme (c), the problem of the poor high level is solved by using a PMOS pass-transistor-logic cell in the first stage. The speed is improved; however, it is not as fast as Scheme (d) because of the weaker current driving capacity of PMOS transistors compared to similarly sized NMOS transistors.
- Adding pull-up PMOS transistors in Scheme (d) restores only the level of the gate voltages in the second layer to ensure a maximum current driving capacity of the NMOS pass transistors, and provides an additional path for the charging current, which greatly improves the speed of the circuit.
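The speed ratios quoted above follow directly from the delay column of Table 3.1. A minimal sketch of that arithmetic is given below; the dictionary keys and the function name `speedup` are labels chosen for this illustration only, and the delays are copied from the table.

```python
# Max delays in ns, copied from Table 3.1.
# The keys "a".."d" are just labels for Schemes (a) to (d) in this sketch.
max_delay_ns = {"a": 0.42, "b": 0.31, "c": 0.23, "d": 0.13}


def speedup(reference: str, improved: str = "d") -> float:
    """Ratio of the reference scheme's delay to the improved scheme's delay."""
    return max_delay_ns[reference] / max_delay_ns[improved]


for scheme in ("a", "b", "c"):
    print(f"Scheme (d) versus Scheme ({scheme}): {speedup(scheme):.2f}x")
```

The ratio is computed as reference delay over improved delay, so a value above one means that Scheme (d) is the faster of the two.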
#### 3-5-2-b Power Dissipation

From the results in Table 3.1, the improved circuit, Scheme (d), has the lowest power dissipation, which is 73% of that of the conventional CMOS XOR gate and only 16% of that of Scheme (b), the circuit without the pull-up PMOS transistors.

As mentioned earlier, the XOR gates of Scheme (b) and Scheme (c) have a large power dissipation due to the static currents in the load inverters and, in the case of Scheme (c), also in the PTL part of the circuit. These currents result from the poor high level of the NMOS pass transistors in the case of Schemes (b) and (c), as shown in Figure 3.27, or from the poor low level of the PMOS pass transistors in the case of Scheme (c). These results confirm that the poor voltage levels not only affect the noise margins of the gate circuits, but also cause static power dissipation.

Figure 3.27 Currents in the transistors of the load inverter in the cases of Scheme (b) and Scheme (c) in Figure 3.22 and Figure 3.23.

- (a) Pass-transistor-logic XOR circuit with the inverter as a load of the circuit.
- (b) Voltage transfer characteristic and current profile of the inverter. It shows that the inverter has a non-zero current in the steady state due to the poor high level of the output of the NMOS pass transistors in Scheme (b) and Scheme (c).

The low power dissipation of Scheme (d) results from the following factors:

- In terms of power dissipation, the XOR gate with the pull-up PMOS transistors used for recovering the gate voltages has a clear advantage. As the gate voltages have perfect high and low levels, there is no static current path in the pass-transistor-logic circuit or in the inverter.
- The voltage swing at the intermediate nodes is $V_{DD} - V_{tn}$, instead of $V_{DD}$, which reduces the dynamic power dissipation.
- With the pull-up PMOS transistors, the gate voltages have a shorter rise time, compared with the pass-transistor-logic circuits without pull-up PMOS transistors, which also contributes to the reduction of the dynamic power dissipation.

## 3-5-3 Simulation of XOR Gates with a Large Number of Inputs

In order to compare the speed of different XOR gates with a large number of inputs, three kinds of circuits are simulated: the complementary CMOS XOR gate, the NMOS pass-logic XOR gate of chain structure, and the NMOS pass-logic XOR gate of tree structure with pull-up PMOS transistors restoring the gate voltages. The simulation results are shown in Figure 3.28. The number of inputs of the gates in the simulations is up to 128.

Figure 3.28 Delay comparison of the three XOR gates.

The delay of the NMOS pass-logic circuits of chain structure increases exponentially with the number of inputs, which agrees with the current-source-plus-RC model presented in Section 3-4-1. Therefore, the pass-transistor-logic chain structure is not suitable for XOR gates with a large number of inputs.

As shown in Figure 3.28, compared with the complementary CMOS circuit, the modified NMOS pass-transistor-logic XOR gate with the pull-up PMOS operates nearly two times faster. By placing pull-up PMOS transistors in the pass-transistor-logic XOR circuit, the rise-time of the output is reduced, as the drain currents of the pass transistors are maximized by the raised gate voltages. The pull-up PMOS transistors also provide additional currents to the nodes expected to be raised, which results in a faster operating speed.

In conclusion, both the theoretical analysis and the electrical simulation show that the NMOS pass-transistor-logic XOR gate of tree structure with pull-up PMOS transistors performs better than the CMOS gate and the pass-logic circuit of chain structure.
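One reason the chain structure scales poorly is the number of cells a signal may have to traverse. The sketch below is an illustration under a stated assumption rather than a reproduction of the simulated netlists: it assumes that both structures are built from two-input XOR cells, that a chain cascades the cells one after another, and that a tree is balanced. The function names are hypothetical.

```python
def chain_layers(n_inputs: int) -> int:
    """Cells in series for a chain of two-input XOR cells (assumed: n - 1)."""
    return n_inputs - 1


def tree_layers(n_inputs: int) -> int:
    """Layers of a balanced tree of two-input XOR cells (assumed: ceil(log2 n))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for any integer n >= 1.
    return (n_inputs - 1).bit_length()


# Input counts used in the simulations of this section: 2, 4, 8, ..., 128.
for n in (2, 4, 8, 16, 32, 64, 128):
    print(f"n = {n:3d}: chain layers = {chain_layers(n):3d}, tree layers = {tree_layers(n)}")
```

Under these assumptions a 128-input gate has 127 cells in series as a chain but only 7 layers as a tree. The sketch counts layers only and says nothing about the delay of each layer.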
#### **III-6 Conclusion**

In this chapter, we discussed different structures of multiple-input XOR circuits: chain structure, tree structure, and mixed structure. The focus of the study was on the tree structure, as it has a smaller number of transistor layers than the chain structure for the same number of inputs. However, it may suffer from some critical problems due to the level loss. This level problem results in not only a reduced noise margin, as identified by many designers, but also a low operating speed and a large power dissipation.

Based on the analysis of the speed and power dissipation of the XOR gates of the three structures, an improved XOR gate structure is presented in Section III-3. The improved gate structure is composed of the basic two-input NMOS pass-transistor-logic XOR units and the pull-up PMOS transistors used to recover the levels of the voltages applied at the gate terminals of the MOS pass transistors.

In Section III-4, we discussed the case of n-input pass-transistor-logic XOR gates of tree structure with $n \ge 4$. For such a gate circuit, half of the inputs are connected to the gate terminals of the MOS pass transistors, and the other half to the diffusion input terminals. Because of the nature of the XOR function, a voltage change at one of the outputs of an XOR unit cell is caused by a change at either a gate or a diffusion input of a MOS pass transistor. Thus, two important cases of the delay of the circuit are evaluated: the input at the gate terminal changes while that at the diffusion terminal remains the same, and vice versa.

To evaluate the delay of the output signal with respect to a voltage change of an input at a diffusion terminal, with the signal propagating through a chain of MOS pass transistors, we have proposed a current-source-plus-RC model, which differs from the commonly used RC-network model.
Qualitative analysis based on MOS transistor characteristics and the results of numerical computations show that the modified model provides significantly better results than the conventional RC-network model. The deviation is reduced from over 30% to less than 2% for the delay estimation of pass-transistor-logic XOR gates. In fact, this model can be used not only in the case of pass-transistor-logic XOR gates, but also in the case of transmission lines involving transistor chains.

The second case is that in which a pair of input signals applied at the gate terminals of a pair of pass transistors changes while the inputs at the diffusion terminals remain the same, so that the signals are propagated while the gate voltages are changing. During the transient period, the gate-to-source voltages of the transistors rise gradually and result in weaker currents, compared to the case in which the gate voltages remain at $V_{DD}$ while the inputs applied at the diffusion terminals change. Thus, the delay caused by the change of the input applied at the gate terminals is more critical than that caused by the change at the diffusion terminals. Moreover, during the transient period, the short-circuit current weakens the charging/discharging current to the circuit node. In the case of multi-layer pass-transistor-logic circuits, the propagation delay and power dissipation will be more critical in succeeding layers.

The pull-up PMOS transistors are added for level recovery, so that the output voltages can have full-swing levels. During the transition, one of them can also provide an additional charging current to reduce the rise-time of the output voltage, while the other PMOS transistor may contribute an undesirable current to the discharging process at the other output node. In order to minimize this undesirable contribution, the aspect ratio of the PMOS transistors should be small, which also avoids adding a significant capacitive load at the output terminals.
In order to verify the results of the analysis in Section III-3 and Section III-4, Hspice simulations have been done, and the results are presented in Section III-5. They show that the proposed PTL XOR gates with pull-up PMOS transistors provide a significantly higher operation speed and, at the same time, a lower power dissipation, compared to the other XOR gates.

Based on the theoretical analysis and the experience of the electrical simulation given in this chapter, the improved multiple-input XOR circuit will be further evaluated by its application in Error Correction circuits (ECC) in the next chapter.

# Chapter 4

## An Application of Multiple-Input XOR Gates

## - Design of a Hamming Decoder

#### IV-1 Introduction

In Chapter 3, we discussed different kinds of multiple-input PTL XOR gates and proposed an approach for speed-power improvement. In this chapter, we present a further assessment of the performance of the multiple-input XOR gates employed in digital systems and appraise the improvement resulting from the use of the XOR gates in these systems.

A Hamming decoder involves a large number of multiple-input XOR gates. These gates, in turn, significantly affect the performance of the decoder. The Hamming decoder is thus chosen as a circuit example for this assessment and further studies.

## IV-2 Hamming Decoder Logic Diagram

The Hamming decoder in question is a (72, 64) SEC-DED (single-error-correction double-error-detection) circuit, in which a 64-bit code-word and 8 check bits go through a 72-bit parity checker [5][6]. The parity check matrix **H** is shown in Figure 4.1.
[Parity check matrix **H**: the rows correspond to the syndrome bits S1 to S8; the columns correspond to the data bits 0 to 63, grouped into bytes 1 to 8, followed by the eight check bits. The matrix entries are not legible in this reproduction.]

Figure 4.1 Parity check matrix of the (72, 64) SEC-DED code [44].

The Hamming decoder logic diagram is shown in Figure 4.2. The inputs of the decoder are the 64-bit code-word and the 8 check bits. A large number of XOR gates are used to perform modulo-2 addition and to generate the syndrome bits. Calculating each syndrome bit requires an XOR tree with 27 inputs selected from the 64 data bits. As the parity check matrix is based on a four-bit structure, the basic XOR gates of the XOR tree are four-input XOR gates [5].

According to the error detecting and correcting rule, if one or more bits in the syndrome are not "0", at least one error occurred during the transmission.
The eight-input OR gate, as shown in Figure 4.2, is used to detect whether there is an error. For the SEC-DED code, if there is an odd number of "1"s in the syndrome, a single error is considered to have occurred, and the error is correctable. If there is an even number of "1"s in the syndrome, a double error has occurred, which cannot be corrected. Thus, as shown in Figure 4.2, an eight-input XOR gate is also used in the circuit to determine whether the error is a single error or a double error. The AND gate array and the XOR gates shown at the bottom of Figure 4.2 are used to correct the single error.

Figure 4.2 Logic diagram of a (72, 64) Hamming decoder [5].

It can be seen that the multiple-input XOR cells form the most important part of this circuit in terms of the operation speed. Further research on this circuit application will be presented in the following sub-sections.

## IV-3 Approach of the Evaluation of the Operation Speed

From the logic diagram of the system shown in Figure 4.2, we can see that the decoder circuit includes a large tree of XOR gates. The speed, the power dissipation and the size of the Hamming decoder depend greatly on the XOR gates employed. We are going to evaluate the performance, in terms of the speed and the power dissipation, of the Hamming decoder in which the proposed XOR gates described in Chapter 3 are used.

It is difficult to simulate the entire circuit due to its complexity. Moreover, it is unnecessary to do so for evaluating the operation speed. In fact, the evaluation can be done by testing only the delay in the critical paths. For a (72, 64) Hamming decoder, the critical path is indicated by the thick lines shown in Figure 4.3. The simplified gate scheme showing the critical path of the Hamming decoder is illustrated in Figure 4.4. This critical-path approach facilitates the evaluation of the speed of the decoder circuit. The simulation results will be presented in the next sub-section.
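The error-classification rule of Section IV-2 (the eight-input OR gate detects an error, and the eight-input XOR gate distinguishes a single error from a double error) can be sketched in software as follows. This is only a behavioural illustration: the function name and the returned strings are hypothetical, and the syndrome values used in the example are arbitrary bit patterns, not values derived from the matrix **H**.

```python
from typing import Sequence


def classify_syndrome(syndrome: Sequence[int]) -> str:
    """Classify an 8-bit syndrome (S1..S8, each 0 or 1) by the SEC-DED rule."""
    error_detected = any(syndrome)      # role of the eight-input OR gate
    odd_parity = sum(syndrome) % 2 == 1  # role of the eight-input XOR gate
    if not error_detected:
        return "no error"
    if odd_parity:
        return "single error (correctable)"
    return "double error (detected, not correctable)"


# Arbitrary example syndromes, for illustration only.
print(classify_syndrome([0, 0, 0, 0, 0, 0, 0, 0]))
print(classify_syndrome([1, 1, 1, 0, 0, 0, 0, 0]))
print(classify_syndrome([1, 0, 0, 1, 0, 0, 0, 0]))
```

The sketch covers only the detection and classification step; locating and flipping the erroneous bit, which the AND gate array and the output XOR gates perform in Figure 4.2, is not modelled.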
Figure 4.3 Critical path of the Hamming decoder for testing the operation speed of the circuit.

Figure 4.4 Simplified critical path of the Hamming decoder. U1, U2, U4 are four-input XOR gates. U3, U5 are two-input XOR gates. U6 is a CMOS two-input AND gate. U7 is a multiplexer.

#### IV-4 Simulation of the Critical-Path of the Hamming Decoder

In this sub-section, the performance, in terms of speed and power dissipation, of Hamming decoder circuits using the different kinds of multiple-input XOR gates described in Chapter 3 is evaluated by simulation, and the results are compared.

#### 4-4-1 Test Circuits and Test Conditions

Four types of multiple-input XOR cells have been chosen for the electrical simulation; Schemes (b) to (d) are presented in Figures 3.22 to 3.24.

- Scheme (a): Complementary CMOS XOR cell.
- Scheme (b): XOR gate consisting of NMOS pass-transistor-logic cells without pull-up transistors.
- Scheme (c): XOR gate consisting of PMOS and NMOS pass-transistor-logic cells.
- Scheme (d): XOR gate consisting of NMOS pass-transistor-logic XOR cells with pull-up PMOS transistors used to restore the level of the gate voltage.

The other components used in the circuits are one AND gate and one two-to-one multiplexer. The AND gate uses CMOS logic. The multiplexer is a pass-transistor-logic switch.

The circuits are simulated using the transistor models of a 0.18 micron process. The supply voltage is 1.8 V. For the pass-transistor-logic circuits, the size (W/L) of the PMOS and NMOS transistors is 0.22 $\mu$m/0.18 $\mu$m.

Instead of applying different test vectors to measure the propagation delay of the decoder, we take a simple approach to test the worst case of delay. As described in Section 3-5-1, the longest delay occurs when the signals propagate through the MOS gates of the pass transistors of the XOR gate tree. As the delay of the XOR gates dominates that of the decoder, the worst case of the XOR gates reflects that of the decoder.
Figure 4.5 illustrates part of the signals applied and the expected responses. The power dissipation is that of the devices constituting the critical path, measured under the condition of the worst case of delay.

Figure 4.5 Part of the signals applied to the circuit shown in Figure 4.4 for the simulation and the expected responses at the MOS gate nodes of the circuit, as well as that at the output node.

## 4-4-2 Results of the Comparison

The simulation results are shown in Table 4.1.

Table 4.1 Simulation results of the critical-path circuit of the decoder

| Scheme | XOR gates used in the decoder circuit | Max delay (ns) | Power dissipation (10<sup>-5</sup> W) |
|--------|---------------------------------------|----------------|----------------------------------------|
| (a) | Complementary CMOS | 1.82 | 2.818 |
| (b) Figure 3.22 | NMOS pass-transistor-logic XOR gate without pull-up PMOS transistors | 1.55 | 20.67 |
| (c) Figure 3.23 | NMOS-PMOS pass-transistor-logic combined XOR gate | 1.04 | 23.26 |
| (d) Figure 3.24 | NMOS pass-transistor-logic XOR gate with pull-up PMOS transistors | 0.59 | 2.1396 |

From the simulation results presented in Table 4.1, we can see that the circuit using NMOS pass-transistor-logic XOR gates with pull-up transistors has the highest operation speed, which is 3.1 times faster than that using the complementary CMOS gates, 1.8 times faster than that using NMOS-PMOS pass-transistor-logic combined XOR gates, and 2.6 times faster than that using the NMOS pass-transistor-logic XOR gates without pull-up PMOS transistors. It is evident that the improvement of the speed of the XOR gates significantly improves that of the decoder.

As shown in Table 4.1, the circuit with Scheme (a), the complementary CMOS XOR gates, has a low power dissipation, as we expected.
However, the power dissipation of the circuit involving Scheme (d) (using NMOS pass-transistor-logic XOR gates with pull-up PMOS transistors) is even lower, by a factor of 1.3, than that of Scheme (a) (using complementary CMOS XOR gates). Such results are consistent with those presented in Section 3-5-2.

It should be mentioned that the circuit used for the simulation does not contain the complete XOR gate tree of the decoder. The benefit of the power reduction of the proposed XOR gates would be more significant in the complete decoder circuit, as the XOR gates play a more important part in that circuit.

#### IV-5 Design of a Prototype Circuit for Estimation of the Propagation Delay

To further analyze the performance of the circuit, a prototype scheme for estimating the operation speed of the decoder is proposed using a 0.18 µm technology. The design approach, the structure of the circuit, and the required control signals are described in the following sub-sections.

# 4-5-1 Design Approach for the Evaluation of the Operation Speed by Oscillation-Test Method

The core of the prototype circuit is the set of gates forming the critical path for the delay of the Hamming decoder. The simulation results presented in Table 4.1 show that the delay of the decoder is about 1 ns or below. Thus, the frequency of the input signal should be on the order of hundreds of megahertz. Operation at such a high frequency is not supported by the available test facilities in our lab. In particular, it would be very difficult to input such a high-speed signal to the circuit. One of the solutions to the problem is to generate the input signals inside the test chip. Moreover, the I/O delay may affect the observation and measurement of the fast-varying output signal.
Hence, in the test circuit, the circuit unit of the critical path is repeated a number of times to multiply the delay, so that the frequency of the signal passing through the output pad is divided, which facilitates the measurements. A frequently used method, called the oscillation-test method [37][38], is applied to test the delay of the circuit. An odd number of inversion blocks, each of which consists of an odd number of inversion gates, are cascaded in the test circuit. The output terminal of the last block is shorted to the input terminal of the first block. The circuit thus becomes a ring oscillator and generates a signal that feeds itself. To secure a good initial condition for the oscillation of the circuit, an input stimulus is applied to initiate the oscillation.

Figure 4.6 Digital oscillation-test method for delay test.

The number of inversion blocks in the proposed test circuit is twenty-three. Each inversion block is a critical-path circuit of the Hamming decoder, as shown in Figure 4.7. The block has an inversion function, as required by the oscillation-test method. Therefore, the cycle-time of the ring oscillator shown in Figure 4.6 is expected to be $2 \times 23$ times the delay of one critical-path block shown in Figure 4.7.

Figure 4.7 Critical-path block as an inversion unit.

## 4-5-2 Estimation of Signals Generated by the Test Circuits

Four oscillation-test circuits are built. Each of them has the structure shown in Figure 4.6, and each inversion block in this structure is the critical-path circuit shown in Figure 4.7. In these four oscillation-test circuits, four kinds of XOR gates, namely the CMOS gate, the NMOS PTL gate without pull-up transistors, the PMOS-NMOS PTL combined gate, and the NMOS PTL gate with pull-up PMOS transistors, are employed, respectively.
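The expected cycle-time of such a ring can be sketched numerically. The following minimal Python sketch assumes an ideal ring in which each of the $n$ inverting blocks contributes the same delay $t_d$, so that one full period takes $2 \times n \times t_d$. The function names and the 1 ns block delay are illustrative assumptions, not values measured on the test chip.

```python
def ring_period_ns(block_delay_ns: float, n_blocks: int = 23) -> float:
    """Ideal ring-oscillator period: a transition travels the ring twice per cycle."""
    if n_blocks < 1 or n_blocks % 2 == 0:
        raise ValueError("the ring needs an odd number of inverting blocks")
    return 2 * n_blocks * block_delay_ns


def ring_frequency_mhz(block_delay_ns: float, n_blocks: int = 23) -> float:
    """Oscillation frequency in MHz, given the block delay in ns."""
    return 1000.0 / ring_period_ns(block_delay_ns, n_blocks)


if __name__ == "__main__":
    t_d = 1.0  # ns; hypothetical block delay ("about 1 ns or below" in the text)
    print(f"period    = {ring_period_ns(t_d):.1f} ns")
    print(f"frequency = {ring_frequency_mhz(t_d):.2f} MHz")
```

With a block delay near 1 ns, a 23-block ring would oscillate at tens of megahertz rather than hundreds, which is the purpose of repeating the block.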
The cycle-time of the output signal of each of the four oscillation-test circuits has been estimated by $T = 2 \times n \times t_d$, where $n$ is the number of inversion blocks and $t_d$ is the delay of each inversion block, obtained from the simulations in Section 4-4-2. The estimated values are shown in Table 4.2. These results show that, with the twenty-three repeated blocks, the cycle-time of the output signal of the test circuits is long enough to be measured easily. These results can also be used to verify the correctness of the signal in the chip test.

Table 4.2 Cycle-Time Estimation of the Test Circuit

| XOR gate used in the test circuit | Cycle-time (ns) |
|------------------------------------|-----------------|
| CMOS | 82.8 |
| NMOS PTL gate without pull-up PMOS | 40.0 |
| PMOS-NMOS PTL combined gate | 31.3 |
| NMOS PTL gate with pull-up PMOS | 36.3 |

## 4-5-3 Realization of the Oscillation-Test Circuit

When we implement the proposed test circuit, some issues should be addressed to facilitate the circuit test.

#### 4-5-3-a Control Signals

Because the chip size and the number of I/O pins are usually restricted, we have to minimize the number of control signals in the test circuit. However, three kinds of control signals, as shown in Figure 4.8, cannot be spared.

- Loopctrl, NODE1, and NODE2 are loop controls for initiating the logic state of the blocks of the oscillator, and for checking the logic state of these blocks.
- INITa, INITb, and RESETi are used to control the inputs of the circuit.
- ZERO is a forcing signal for manually resetting the logic state of the blocks.

Figure 4.8 Structure of the oscillation-test circuit with the control signals. The detail of each block is shown in Figure 4.7.

The control signals are as follows.

- **INITa, INITb**: Used to initiate the circuit.
- **RESETi**: Enables the initiation of the circuit.
- **Loopctrl**: Used to make the loop open or closed.
- **NODE1, NODE2**: Used to separate blocks for problem-finding in case of malfunction.
- **ZERO**: Forces the output voltage to zero when it is low.
It can be used to check the function of the output buffer and the test bed.

Several voltage followers are needed to observe the voltage signals at some internal nodes, such as OUT<sub>1</sub>, OUT<sub>2</sub>, ..., OUT<sub>i</sub>, in order to verify the functionality of the circuit.

#### 4-5-3-b Specific Considerations for Some Details

Besides the core part of the oscillation-test circuit, some specific considerations need to be taken into account to ensure the testability of the circuit.

#### Output Buffer

To output the signals of the test circuit, buffers are needed to drive the large capacitive loads contributed by the connections and the output pads. Tapered buffers can be used in this case. The total delay of such buffers can be estimated using the following equations [46].

$$\tau_{total} = (N+1) \times \tau_0 \left(\frac{C_{load}}{C_g}\right)^{\frac{1}{N+1}} \text{ and } (N+1) = \frac{\ln\left(\frac{C_{load}}{C_g}\right)}{\ln \alpha}$$ (4.1)

where $N$ is the number of stages, $\alpha$ is the size scale factor, $C_g$ is the input capacitance of the first-stage inverter, and $\tau_0$ is the delay of the $i^{th}$ stage. In the case that $C_{load} = 0.05\,\mathrm{pF}$, $\tau_0 \approx 0.06\,\mathrm{ns}$, $\frac{C_{load}}{C_g} = 50$, and $\tau_{total} < 1\,\mathrm{ns}$ is required, which is less than 3% of the cycle-time of the oscillation-test circuit, a three-stage buffer with the scale factor $\alpha = 3$ is sufficient to meet the delay requirement.

#### Pass-Transistor Switches for Control Signals

As shown in Figure 4.8, each of the control switches is a complementary switch. If single-NMOS switches were used, the "poor-1" problem could result in static power dissipation in the following stages of the circuit. In some cases, the circuit delay may also be affected by the poor high level. Thus, complementary switches are used in the test circuits to transfer the control signals with a full voltage swing, which shortens the delay and provides good output voltage levels.
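Returning to the output buffer, the sizing quoted above can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses the usual tapered-buffer model, in which every stage is scaled by the same factor and contributes a delay equal to that factor times $\tau_0$, and it treats the stage count as the quantity $(N+1)$ of Equation (4.1). The function names are hypothetical; the numerical inputs are the ones quoted in the text.

```python
import math


def ideal_stage_count(cap_ratio: float, alpha: float) -> float:
    """Real-valued (N + 1) of Equation (4.1): ln(C_load / C_g) / ln(alpha)."""
    return math.log(cap_ratio) / math.log(alpha)


def tapered_delay_ns(cap_ratio: float, stages: int, tau0_ns: float) -> float:
    """Total delay when `stages` equally scaled stages bridge the capacitance ratio.

    Assumption: each stage is scaled by cap_ratio ** (1 / stages), and its delay
    is that scale factor times tau0_ns.
    """
    per_stage_scale = cap_ratio ** (1.0 / stages)
    return stages * tau0_ns * per_stage_scale


if __name__ == "__main__":
    ratio, alpha, tau0 = 50.0, 3.0, 0.06  # C_load/C_g, scale factor, tau_0 in ns
    print(f"ideal stage count: {ideal_stage_count(ratio, alpha):.2f}")
    for stages in (3, 4):
        print(f"{stages} stages: {tapered_delay_ns(ratio, stages, tau0):.3f} ns")
```

Under these assumptions, both three and four stages stay below the 1 ns budget, which is in line with the choice of a short tapered buffer.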
#### Observation of the Voltage Signal at Some Critical Nodes

In order to observe the variation of a voltage signal at an internal node for circuit analysis, two measures should be taken. Firstly, a voltage follower should be placed. It transfers the voltage signal from the node without adding a significant resistive or capacitive load to it. Secondly, an analog output pad, instead of a digital one, is needed in order to observe a continuously varying signal rather than a binary one.

Other special considerations in the chip design, such as ESD protection and circuit redundancy, are also taken into account to ensure that the circuit can be properly tested.

#### **IV-6** Conclusion

In this chapter, we have evaluated the speed and the power dissipation of a (64, 72) Hamming decoder, which is mainly composed of multiple-input PTL XOR cells. Many of the XOR gates have 27 inputs. Therefore, the speed and power dissipation of the Hamming decoder circuit are mainly determined by the XOR gates used in the circuit. To avoid a complex simulation of the entire decoder circuit, which has a large number of transistors and nodes, we have built a critical-path circuit of the decoder and evaluated the speed of the decoder by means of this critical-path approach. The simulation results show that the decoder circuit using the NMOS pass-transistor-logic XOR gate with pull-up PMOS transistors has a shorter delay than those using the CMOS XOR gate and the other PTL XOR gates. There is no doubt that the use of the improved multiple-input XOR gates will also lower the power dissipation of the decoder. These advantages of using the improved multiple-input XOR gates are also significant in other circuits, such as adders, parity checkers and pattern generators.

In order to further estimate the circuit performance, a scheme of a prototype circuit is proposed using a 0.18 µm technology.
Since the output signal of the test circuit has a very high operation frequency, on the order of hundreds of megahertz, we replicate the circuit and use the ring-oscillation method to generate the input signals inside the chip and to facilitate the measurement of the output signals.

For the implementation of the test circuit, several issues have been addressed to make the circuit testable. Firstly, we have summarized the effective control signals in the circuit. It should be mentioned that the number of control signals should be minimized. Secondly, the need for buffers has been explained. Thirdly, the choice of the switches for the control signals in the test circuit has been justified: CMOS switches are used to optimize the speed of the signal transfer. Finally, we have explained the importance of voltage followers and of the appropriate use of the output pads.

The test circuit has been implemented. However, due to a misconnection of the power pads, the circuit test could not be completed. Nevertheless, we have used this implementation to identify, apart from the connection faults, the measures to be taken for the next prototype implementation. The possibility of using other methods for evaluating the performance of the decoder, such as the DFF method or pattern generators [43], will also be considered.

## Chapter 5

## Conclusion

The objective of this research is to design XOR gates at the transistor level to achieve high performance in terms of delay, power dissipation and robustness. The main focus is on XOR gates with a large number of inputs. The research has been done in two major steps: the investigation of previous work on the design of XOR gates, and the proposal of an approach to designing high-speed and low-power XOR circuits with a large number of inputs.

A variety of XOR gate designs, most of which are non-conventional CMOS two-input XOR gates, have been studied.
These circuits, such as pass-logic circuits and mixed-logic circuits, are usually much simpler than the conventional CMOS ones, but have disadvantages, such as poor output voltage levels, which affect the noise margins, speed and power dissipation of the gate. Some level-recovering techniques have been proposed to solve these problems. However, most of these techniques have their limitations: the level restoring is done either by adding a level-recovering inverter or by compensating for the poor level through feedback (Section II-4). The former does not solve the problem of additional power dissipation due to the poor level, and the latter can create a new level problem. Having studied different schemes, we concluded that placing a pair of PMOS transistors on a complementary output node pair of the pass-transistor-logic gate can be an efficient solution to the level problem, one that does not introduce extra power dissipation. However, the way to apply this technique for the optimization of the performance of pass-transistor-logic circuits in general, and multiple-input XOR circuits in particular, needs to be studied.

Since the focus of the thesis is mainly on the design of multiple-input XOR gates, a comparative analysis of different structures of XOR gates is presented in Chapter 3. Based on this analysis, an approach is proposed to improve the speed of large pass-transistor-logic gates without sacrificing the power dissipation. In this approach, the voltage level recovery technique is used to shorten the transient period. This is done in two ways. On one hand, all gate voltages of the pass transistors are recovered to the full-swing magnitude to maximize the transistor currents during transient periods. On the other hand, the voltage swings at the other intermediate nodes are kept as small as possible, so that only a very small voltage variation is required during the transition.
Using this approach, we have presented a design example of four four-input pass-transistor-logic XOR gates. A comparative study of the four different four-input XOR gates (including the proposed one) has been carried out. The results of the theoretical analysis and the electrical simulations of the circuits have shown that the speed of the circuit using the proposed approach is more than tripled compared with the CMOS gate and more than doubled compared with the pass-transistor-logic gate without compensation. Its power dissipation is also lower than that of the other schemes. For XOR gates with a large number of inputs (greater than 4), the effectiveness of the proposed approach has also been demonstrated by the studies presented in the thesis.

We have also presented a detailed analysis of the XOR circuits with a large number of inputs based on the proposed approach. For this analysis, we have presented a modified equivalent RC-network model that characterizes the multiple-input XOR circuits more realistically than the classical RC-network model. We have also studied the dynamic operation of the pass-transistor-logic circuits with the pull-up PMOS transistors, and proposed a guideline for determining the parameters of the transistors.

To further evaluate the performance of the XOR circuits designed using the proposed approach, a circuit example of a Hamming decoder involving the new XOR gates has been presented. A prototype circuit for estimating the propagation delay of the decoder has been proposed and simulated. The results show that the circuit using the proposed approach is more than three times faster than the conventional CMOS one. We plan to have the circuit processed for further evaluation.

Based on the work of the thesis, our further research on the design of XOR circuits with a large number of inputs will focus on the adaptation to different applications. As mentioned previously,
XOR gates are widely used in signal processing and generation. However, the types of implementation differ, because the XOR operations are performed differently. For example, the XOR gates in a parallel multiplier need to be arranged in an array together with other logic gates. Rather than a single multiple-variable XOR function, as in the case of a Hamming decoder, the multiplier performs a large number of two-variable XOR operations combined, in almost all the stages, with other logic operations. The structure of such combined logic functions can be found in many other circuits, and the regularity of the combinations may change from circuit to circuit. Using the proposed design approach, we will study and explore the regularities in the structure of these functions in order to propose approaches to designing pass-transistor-logic processing circuits with better operation speed and power efficiency than the existing ones.

## **Bibliography**

- [1] K. Yeo, et al., "CMOS/BiCMOS ULSI: Low-Voltage, Low-Power," Prentice Hall, 2002.
- [2] K. Gray, "Adding error-correcting circuitry to ASIC memory," *IEEE Spectrum*, vol. 37, no. 4, pp. 55-60, Apr. 2000.
- [3] L. Litwin, et al., "Error control coding: an overview of modern coding techniques in a digital communications system," *IEEE Potentials*, vol. 20, no. 1, pp. 26-28, Feb.-Mar. 2001.
- [4] L. Litwin, et al., "Linear block codes: a popular type of error correction codes," *IEEE Potentials*, vol. 20, no. 1, pp. 29-31, Feb.-Mar. 2001.
- [5] M. Y. Hsiao, "A class of optimal minimum odd-weight-column SEC-DED codes," *IBM Journal of Research and Development*, vol. 14, pp. 395-401, July 1970.
- [6] C. L. Chen, et al., "Error-correcting codes for semiconductor memory applications: a state-of-the-art review," *IBM Journal of Research and Development*, vol. 28, no. 2, pp. 124-134, Mar. 1984.
- [7] F. Alzahrani, et al., "On-chip TEC-QED ECC for ultra-large, single-chip memory systems," *IEEE International Conference on Computer Design: VLSI in Computers and Processors*, pp. 132-137, 1994.
- [8] B. Benjauthrit, et al., "An overview of error control codes for data storage," *IEEE International NonVolatile Memory Technology Conference*, pp. 120-126, 1996.
- [9] L. K. Wang, et al., "A low power high speed error correction code macro using complementary pass transistor logic circuit," *IEEE ASIC Conference and Exhibition*, pp. 17-20, 1997.
- [10] C. Santos, "Byte-serial (72, 64) Hamming error correcting encoder for high speed data transmission," MIT Electrical Engineering and Computer Science, http://cerberus.lcs.mit.edu/6.371/reports/santos/final.html.
- [11] M. Heshami, et al., "A 250MHz skewed-clock pipelined data buffer," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 3, pp. 376-383, Mar. 1996.
- [12] G. Giacalone, et al., "A 1MB, 100MHz integrated L2 cache memory with 128b interface and ECC protection," *IEEE International Solid-State Circuits Conference*, pp. 370-371, 1996.
- [13] R. Vancu, et al., "A 35nS 256k CMOS EEPROM with error correcting circuitry," *IEEE International Solid-State Circuits Conference*, pp. 64-65, 1990.
- [14] H. L. Davis, "A 70nS word-wide 1-M bit ROM with on-chip error-correction circuits," *IEEE Journal of Solid-State Circuits*, vol. SC-20, no. 5, pp. 958-963, Oct. 1985.
- [15] M. Asakura, et al., "An experimental 1-M bit cache DRAM with ECC," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 1, pp. 5-10, Feb. 1990.
- [16] J. A. Fifield, et al., "High-speed on-chip ECC for synergistic fault-tolerant memory chips," *IEEE Journal of Solid-State Circuits*, vol. 26, no. 10, pp. 1449-1452, Oct. 1991.
- [17] C. Su, et al., "Structural approach for performance driven ECC circuit synthesis," *IEEE Proceedings of the ASP-DAC'97 Asia and South Pacific Design Automation Conference*, pp. 89-94, 1997.
- [18] A. Wu, et al., "Fast, area-efficient CMOS parity generation," *IEEE Proceedings of the 33rd Midwest Symposium on Circuits and Systems*, pp. 874-876, 1990.
- [19] J. Wang, et al., "New efficient designs for XOR and XNOR functions on the transistor level," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 7, pp. 780-786, 1994.
- [20] D. Huang, et al., "On CMOS exclusive OR design," *IEEE Proceedings of the 32nd Midwest Symposium on Circuits and Systems*, vol. 2, pp. 829-832, 1990.
- [21] K. Cheng, et al., "High efficient 3-input XOR for low-voltage low-power high speed applications," *IEEE ASIC, AP-ASIC'99*, pp. 166-169, 1999.
- [22] R. Zimmermann, et al., "Low-power logic styles: CMOS versus pass-transistor logic," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 7, pp. 1079-1090, 1997.
- [23] H. Lee, et al., "New low-voltage circuits for XOR and XNOR," *IEEE Southeastcon '97 Proceedings*, pp. 225-229, 1997.
- [24] H. A. Mahmoud, et al., "A 10-transistor low-power high-speed full adder cell," *Proceedings of the 1999 IEEE International Symposium*, vol. 1, pp. I43-I46, 1999.
- [25] A. M. Shams, et al., "A novel low-power building block CMOS cell for adders," *IEEE Circuits and Systems, ISCAS'98, Proceedings of the 1998 IEEE International Symposium*, vol. 2, pp. 153-156, 1998.
- [26] T. S. Cheung, "Pass-transistor logic and its sub-V<sub>DD</sub> voltage swing behaves in low-voltage circuit design," *IEEE Microelectronics and VLSI, TENCON'95 Region 10 International Conference*, pp. 307-310, 1995.
- [27] D. Radhakrishnan, "Low voltage CMOS full adder cells," *IEEE Electronics Letters*, vol. 35, no. 21, pp. 1792-1794, 14th Oct. 1999.
- [28] K. Lin, et al., "A low-cost realization of multiple-input exclusive-OR gates," *IEEE ASIC Conference and Exhibit, Proceedings of the Eighth Annual IEEE International*, pp. 307-310, 1995.
- [29] C. Wickman, et al., "Cost models for large file memory DRAMs with ECC and bad block marking," *IEEE 1999 International Symposium on Defect and Fault Tolerance in VLSI Systems*, pp. 319-327, 1999.
- [30] M. Rudack, et al., "Yield enhancement considerations for a single-chip multiprocessor system with embedded DRAM," *IEEE 1999 International Symposium on Defect and Fault Tolerance in VLSI Systems*, pp. 31-39, 1999.
- [31] C. H. Stapper, "Synergistic fault-tolerance for memory chips," *IEEE Transactions on Computers*, vol. 41, no. 9, pp. 1078-1087, Sept. 1992.
- [32] N. Zhuang, et al., "A new design of the CMOS full adder," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 5, pp. 840-844, May 1992.
- [33] A. Parameswar, et al., "A swing restored pass-transistor logic-based multiply and accumulate circuit for multimedia applications," *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 805-809, June 1996.
- [34] M. Song, et al., "Design methodology for high speed and low power digital circuits with energy economized pass transistor logic (EEPL)," *Proceedings of the 22nd European Solid-State Circuits Conference*, Neuchatel, Switzerland, pp. 120-123, Sept. 1996.
- [35] K. Furutani, et al., "A built-in Hamming code ECC circuit for DRAM's," *IEEE Journal of Solid-State Circuits*, vol. 24, pp. 50-56, Feb. 1989.
- [36] H. Kalter, et al., "A 50 ns 16Mbit DRAM with a 10-ns data rate and on-chip ECC," *IEEE Journal of Solid-State Circuits*, vol. 25, pp. 1118-1128, Oct. 1990.
- [37] K. Arabi, et al., "Digital oscillation-test method for delay and stuck-at fault testing of digital circuits," *IEEE Proceedings, International Test Conference*, pp. 91-100, 1998.
- [38] C. Dufaza, "Multiple paths sensitization of digital oscillation built-in self test," *International Conference on Computer Design (ICCD'99)*, pp. 166-174, 1999.
- [39] P. Maurine, et al., "Output transition time modeling of CMOS structures," *IEEE 2001 International Symposium on Circuits and Systems (ISCAS 2001)*, vol. 5, pp. 363-366, 2001.
- [40] T. Sakurai, et al., "Alpha-power model, and its application to CMOS inverter delay and other formulas," *IEEE Journal of Solid-State Circuits*, vol. 25, pp. 584-594, Apr. 1990.
- [41] S. Dutta, et al., "A comprehensive delay model for CMOS inverters," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 8, pp. 864-871, 1995.
- [42] T. Sakurai, et al., "A simple MOSFET model for circuit analysis," *IEEE Transactions on Electron Devices*, vol. 38, no. 4, pp. 887-894, Apr. 1991.
- [43] P. Franco, et al., "Analysis and detection of timing failures in an experimental test chip," *Proceedings of IEEE International Test Conference*, pp. 691-700, Oct. 1996.
- [44] S. Lin, "Error Control Coding," Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1983.
- [45] N. H. Weste, et al., "Principles of CMOS VLSI Design: A Systems Perspective, 2nd Edition," Addison Wesley, 1993.
- [46] S. Kang, et al., "CMOS Digital Integrated Circuits: Analysis and Design," The McGraw-Hill Companies, Inc., 1999.
- [47] W. Wolf, "Modern VLSI Design: System-on-Chip Design," 3rd edition, Prentice Hall, 2002.
- [48] T. Rao, et al., "Error-Control Coding for Computer Systems," Prentice-Hall, Englewood Cliffs, New Jersey, 1989.
- [49] A. Krstic, et al., "Delay Fault Testing for VLSI Circuits," Kluwer Academic Publishers, Boston, 1998.
- [50] K. Seng, et al., "CMOS/BiCMOS ULSI: Low-Voltage, Low-Power," Prentice Hall, 2002.