A VHDL Code Generator for Reed-Solomon Encoders and Decoders

Vladimir Glavac

A Thesis
In
The Department
Of
Electrical and Computer Engineering

Presented in Partial Fulfillment of the Requirements
For the Degree of Master of Applied Science at
Concordia University
Montreal, Quebec, Canada

April 2003
© Vladimir Glavac, 2003
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author’s permission.

Abstract

A VHDL Code Generator for Reed-Solomon Encoders and Decoders

Vladimir Glavac

Reed-Solomon codes are error-correcting codes used in many applications, such as satellite communications, digital audio tape, and CD-ROMs. Such diverse applications call for many different Reed-Solomon codes. The topic of this thesis is the development of a program that produces synthesizable VHDL code for an arbitrary Reed-Solomon encoder or decoder. A novel extension of the Massey-Berlekamp algorithm for solving the key equation is presented. This modified algorithm is a key aspect of the Reed-Solomon decoder designs discussed in this thesis. The design of both RS encoders and decoders is presented in detail. A program written in a high-level language generates the VHDL code that corresponds to the encoding and decoding algorithms. Several encoders and decoders were synthesized for the Xilinx XCV1000 series of field programmable gate arrays (FPGAs). The resulting area and speed metrics are presented for these designs.
Dedicated to my parents, and my children Emilie and Jeremy.
ACKNOWLEDGEMENTS

I would like to thank Dr. Reza Soleymani for having inspired me, and guided me during my graduate studies and my thesis. I would also like to thank the management of EMS Technologies Canada Limited for having encouraged me to go ahead and do this work. I would especially like to thank all of my friends and colleagues at work who provided me with many hours of lively discussions on digital communications and mathematics. Finally, I would like to thank my parents, for having given all that they could, and who have taught me that perseverance pays off.
# Table of Contents

List of Figures
List of Tables
Table of Acronyms

1 Introduction
   1.1 Error-Correcting Codes
      1.1.1 Linear Block Codes
      1.1.2 Convolutional Codes
   1.2 Digital Design Implementation Technologies
      1.2.1 Integrated Circuits (SSI, MSI, LSI, VLSI)
      1.2.2 Application Specific Integrated Circuits (ASICs)
      1.2.3 Programmable Logic
   1.3 Contributions and Contents of the Thesis
   1.4 Previous Work

2 Reed-Solomon Encoding and Decoding Algorithms
   2.1 RS Encoding
   2.2 Syndrome Calculation
   2.3 Chien Search
   2.4 Decoding for 1 Error
   2.5 Decoding for 2 Errors
   2.6 Decoding For 3 or More Errors
      2.6.1 Original Massey-Berlekamp Algorithm
      2.6.2 Inversionless Massey-Berlekamp Algorithm
      2.6.3 Extended Inversionless Massey-Berlekamp Algorithm

3 Structure of the RS Encoder/Decoder Core Generator
   3.1 File Structure
   3.2 VHDL Core Generator for a RS Encoder
   3.3 VHDL Core Generator for a RS Decoder for 1 or 2 Errors
   3.4 VHDL Core Generator for a RS Decoder for More Than 2 Errors
   3.5 Test Bench
   3.6 Utility Functions and Procedures

4 VHDL Implementation of Galois Field Operations
   4.1 VHDL Implementation of a Galois Field Adder
   4.2 VHDL Implementation of a Galois Field Multiplier
   4.3 VHDL Implementation of a Galois Field Inverter

5 VHDL Design of a RS Encoder
   5.1 Encoder Overview
   5.2 VHDL Implementation of an RS Encoder
      5.2.1 Encoder Timing Diagram
      5.2.2 Encoder Control Logic
      5.2.3 Parity Registers and Output Signals

6 Synthesis and Test Results for RS Encoders
   6.1 General Remarks
   6.2 RS Encoder Synthesis Speed Results
   6.3 RS Encoder Synthesis Area Results
   6.4 Encoder Testbench

7 VHDL Design of a General RS Decoder (1 or 2 errors)
   7.1 Decoder Overview
   7.2 Syndrome Calculation
      7.2.1 Syndrome Calculation Timing Diagram
      7.2.2 Syndrome Calculation Constants, Signals, and Control Logic
      7.2.3 Syndrome Registers
   7.3 Chien Search and Error Correction for 1 Error
      7.3.1 Constants and Signal Definition
      7.3.2 Control Signals
      7.3.3 Delay Block
      7.3.4 Chien Search, Error Calculation and Error Correction
   7.4 Chien Search and Error Correction for 2 Errors
      7.4.1 Constants and Signal Definition
      7.4.2 Control Signals
      7.4.3 Delay Block
### List of Figures

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Figure 1</td>
<td>Block Diagram of a Digital Communication System</td>
<td>3</td>
</tr>
<tr>
<td>Figure 2</td>
<td>Block Diagram of a (4,3,4) Convolutional Encoder</td>
<td>14</td>
</tr>
<tr>
<td>Figure 3</td>
<td>Schematic Diagram Example</td>
<td>16</td>
</tr>
<tr>
<td>Figure 4</td>
<td>VHDL Example For a Counter</td>
<td>18</td>
</tr>
<tr>
<td>Figure 5</td>
<td>Chien Search Block Diagram</td>
<td>26</td>
</tr>
<tr>
<td>Figure 6</td>
<td>Reduced Form of Chien Search For Single Error Correcting Code</td>
<td>29</td>
</tr>
<tr>
<td>Figure 7</td>
<td>Single Error Correcting RS Decoder Block Diagram</td>
<td>30</td>
</tr>
<tr>
<td>Figure 8</td>
<td>Double Error Correcting RS Decoder Block Diagram</td>
<td>35</td>
</tr>
<tr>
<td>Figure 9</td>
<td>Multi-Error (&gt;2) Correcting RS Decoder Block Diagram</td>
<td>44</td>
</tr>
<tr>
<td>Figure 10</td>
<td>Example of a Generics.txt File Used by the Testbench</td>
<td>49</td>
</tr>
<tr>
<td>Figure 11</td>
<td>VHDL Code for Galois Field Addition, $GF(2^8)$</td>
<td>55</td>
</tr>
<tr>
<td>Figure 12</td>
<td>VHDL Code for Galois Field Multiplication, $GF(2^4)$, $p(x)=x^4+x+1$</td>
<td>59</td>
</tr>
<tr>
<td>Figure 13</td>
<td>VHDL Code for Galois Field Inversion, $GF(2^4)$, $p(x)=x^4+x+1$</td>
<td>60</td>
</tr>
<tr>
<td>Figure 14</td>
<td>Alternate VHDL Code for Galois Field Inversion, $GF(2^4)$, $p(x)=x^4+x+1$</td>
<td>60</td>
</tr>
<tr>
<td>Figure 15</td>
<td>Block Diagram of a Generic Reed-Solomon Encoder</td>
<td>62</td>
</tr>
<tr>
<td>Figure 16</td>
<td>RS Encoder Timing Diagram</td>
<td>64</td>
</tr>
<tr>
<td>Figure 17</td>
<td>VHDL Code of Encoder Control Signals</td>
<td>65</td>
</tr>
<tr>
<td>Figure 18</td>
<td>VHDL Code of Encoder Parity Registers</td>
<td>66</td>
</tr>
<tr>
<td>Figure 19</td>
<td>VHDL Code of Encoder Output Signals</td>
<td>67</td>
</tr>
<tr>
<td>Figure 20</td>
<td>Reed-Solomon Encoder Maximum Speed</td>
<td>69</td>
</tr>
<tr>
<td>Figure 21</td>
<td>Reed-Solomon Encoder Area (in slices)</td>
<td>71</td>
</tr>
<tr>
<td>Figure 22</td>
<td>RS Encoder Testbench Structure</td>
<td>72</td>
</tr>
<tr>
<td>Figure 23</td>
<td>RS Encoder Simulation Waveforms</td>
<td>72</td>
</tr>
<tr>
<td>Figure 24</td>
<td>Syndrome Calculation Block Diagram</td>
<td>73</td>
</tr>
<tr>
<td>Figure 25</td>
<td>Syndrome Calculation Timing Diagram</td>
<td>74</td>
</tr>
<tr>
<td>Figure 26</td>
<td>Syndrome Calculation - VHDL Constant and Signal Definition</td>
<td>75</td>
</tr>
<tr>
<td>Figure 27</td>
<td>Syndrome Calculation - VHDL Code For Control Logic</td>
<td>77</td>
</tr>
<tr>
<td>Figure 28</td>
<td>Syndrome Calculation Simulation Waveforms</td>
<td>78</td>
</tr>
</tbody>
</table>

Figure 60. Chien Search Pipeline Block Diagram, $t=5$, $m_0 = 1$
Figure 61. Chien Search Pipeline Block Diagram, $t=5$, $m_0$ not 1
Figure 62. Error Correction
Figure 63. Chien Search Constants and Signals
Figure 64. Internal Lambda and Omega Process and System Counter
Figure 65. Internal Control Signal Processes
Figure 66. Present Location X1 and Powers of X1 Pipelines
Figure 67. Evaluation of Omega and Lambda Prime Pipelines, and Calculation of the Correction Factor
Figure 68. Chien Sum Pipeline
Figure 69. Data Delay, and Error Correction Processes
Figure 70. Reed-Solomon Decoder Maximum Speed
Figure 71. Reed-Solomon Decoder Area (in slices)

List of Tables

Table 1. BCH Codes Generated by Primitive Elements of Order Less Than $2^7$
Table 2. Equivalent Gate Count for SSI / MSI / LSI / VLSI
Table 3. RS Encoder/Decoder Core Generator Files
Table 4. An Example of an RS Encoder/Decoder Core Generator Parameter File
Table 5. Binary addition
Table 6. Binary multiplication
Table 7. Elements of $GF(2^4)$, $p(x)=x^4+x+1$
Table 8. Maximum Encoder Speed (MHz) vs. Error Correcting Ability
Table 9. Encoder Area (in slices) vs. Error Correcting Ability
Table 10. Chien Search Pipeline Lengths
Table 11. Maximum Decoder Speed (MHz) vs. Error Correcting Ability
Table 12. Decoder Area (in slices) vs. Error Correcting Ability
Table 13. Device Utilization and Performance of Existing Encoder Cores
Table 14. Xilinx Virtex FPGA Family Members
Table 15. Accelerator Family Selection Guide
Table 16. Device Utilization and Performance of Existing Decoder Cores

Table of Acronyms

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASCII</td>
<td>American Standard Code for Information Interchange</td>
</tr>
<tr>
<td>ASIC</td>
<td>Application Specific Integrated Circuit</td>
</tr>
<tr>
<td>BCH</td>
<td>Bose-Chaudhuri-Hocquenghem</td>
</tr>
<tr>
<td>CAD</td>
<td>Computer Aided Design</td>
</tr>
<tr>
<td>CPLD</td>
<td>Complex Programmable Logic Devices</td>
</tr>
<tr>
<td>DSP</td>
<td>Digital Signal Processing</td>
</tr>
<tr>
<td>FEC</td>
<td>Forward Error Correction</td>
</tr>
<tr>
<td>FIR</td>
<td>Finite Impulse Response</td>
</tr>
<tr>
<td>FPGA</td>
<td>Field Programmable Gate Array</td>
</tr>
<tr>
<td>IP</td>
<td>Intellectual Property</td>
</tr>
<tr>
<td>LCM</td>
<td>Least Common Multiple</td>
</tr>
<tr>
<td>LSI</td>
<td>Large Scale Integrated Circuit</td>
</tr>
<tr>
<td>MSI</td>
<td>Medium Scale Integrated Circuit</td>
</tr>
<tr>
<td>PAL</td>
<td>Programmable Array Logic</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed Circuit Board</td>
</tr>
<tr>
<td>PLD</td>
<td>Programmable Logic Device</td>
</tr>
<tr>
<td>RAM</td>
<td>Random Access Memory</td>
</tr>
<tr>
<td>RS</td>
<td>Reed-Solomon</td>
</tr>
<tr>
<td>SSI</td>
<td>Small Scale Integrated Circuit</td>
</tr>
<tr>
<td>UART</td>
<td>Universal Asynchronous Receiver/Transmitter</td>
</tr>
<tr>
<td>VHDL</td>
<td>VHSIC Hardware Description Language</td>
</tr>
<tr>
<td>VHSIC</td>
<td>Very High Speed Integrated Circuit</td>
</tr>
<tr>
<td>VLSI</td>
<td>Very Large Scale Integrated Circuit</td>
</tr>
</tbody>
</table>
1 Introduction

Reed-Solomon codes are error-correcting codes used in many applications, such as satellite communications, digital audio tape, and CD-ROMs. Such diverse applications call for many different Reed-Solomon codes. The modern digital systems designer is then faced with the problem of implementing these different error-correcting codes. The difficulty of this task has been greatly reduced through the use of Field Programmable Gate Arrays (FPGAs) or gate arrays, together with a hardware description language such as VHDL. The approach is to describe the functionality of the system top-down in VHDL, to map the code into hardware elements through synthesis, and finally to simulate the resulting netlist using a suitable testbench. The synthesized design can then be loaded into an FPGA and used, or a gate array can be produced.

This approach is facilitated by pre-written sections of design, called cores, which are simply “plugged” into the design. These cores provide the complete functionality of desired components such as FIR filters, UARTs, and interrupt controllers. They are typically available free of charge from FPGA vendors for many of the simpler functions, or at a cost from Intellectual Property (IP) vendors for more complex functions, such as Reed-Solomon encoders/decoders. This thesis describes a program that is a “core generator” for Reed-Solomon encoders and decoders. The user enters the desired parameters of the RS (Reed-Solomon) code, such as the codeword length, error correcting capability, initial root of the code generator polynomial, the size of the Galois field elements, and the field generator polynomial. The program then produces the VHDL code for either an RS encoder or decoder, as well as the VHDL code for the required testbench.

The rest of this chapter will provide an introduction to error-correcting codes, followed by a short history of digital design methodologies. Finally, the contents and contributions of this thesis will be presented.

1.1 Error-Correcting Codes

Consider the block diagram of a digital communication system as shown in Figure 1 [11]. In this system, a data source sends its digital data (bits) to the data sink over the communication channel. The channel can come in various forms, such as a pair of twisted wires, a fiber optic bundle, or space itself in the case of radio or satellite communications. It may even be a storage medium, such as a floppy disc, or a computer memory. In this case, the "reception" may happen at a much later time than the transmission. This is in fact communication in time as opposed to the usual communication that happens in the spatial domain. In any case, the desired result is to transmit this data as reliably as possible from source to destination (sink).

The source encoder block, also called the data compression unit, removes redundant information. Fewer bits then need to be transferred for the same source, which either requires less bandwidth or takes less time at a given transmission rate. At this point, the data stream may be encrypted, if necessary. The channel encoder adds redundant information to the bit stream, through the use of a carefully chosen "error correcting code", so that transmission errors can be detected and corrected. This digital bit stream is then sent to the modulator/transmitter, which converts it into a signal that can be transmitted over the "channel". The channel is the one block that is not under the control of the communications engineer. It is the medium over which the signal is sent from point A to point B and, as such, it is the block that introduces noise into the signal.

The actions of the blocks on the transmitter side must be reversed at the receiver side. First, the receiver/demodulator takes the analog signal from the channel and converts it into symbols that the channel decoder can understand. It is in the channel decoder that the decoding algorithm for the error correcting code is applied. This results in a corrected digital data stream if the number of errors is less than or equal to the error correcting capability of the code, or in a corrupted digital data stream if the number of errors exceeds that capability. The resultant data stream is then decrypted, if encryption was applied, and then decompressed.

Broadly speaking, error-correcting codes may be classified into block codes and convolutional codes. When block codes are used, the data stream is broken down into blocks of a predetermined, fixed size. A block code then adds a fixed number of additional bits to each block, resulting in a larger block, which is then transmitted. A convolutional code, on the other hand, adds one or more redundant streams, which are transmitted in a time sequential manner, one bit from each stream at a time. This is a continuous process. The next few sections discuss these two classifications in more detail, although the treatment is not exhaustive.

1.1.1 Linear Block Codes

Block codes take "k" bits from the data stream, and add "n-k" parity bits, resulting in "n" bits of encoded data. If the code is linear, the calculation of the codeword can be represented compactly by a matrix multiplication. Define the original message as a vector of k bits:

\[ m = \left( m_0, m_1, \ldots, m_{k-1} \right) \tag{1.1} \]

Multiplying the vector \( m \) by a generator matrix, \( G \), forms the codeword.

\[ c = (c_0, c_1, \ldots, c_{n-1}) = m \cdot G \tag{1.2} \]

This generator matrix depends on the code chosen, but is of the following form.

\[ G = [P \mid I_k] \tag{1.3} \]

$I_k$ is the $k$-dimensional identity matrix, while $P$ is a $k \times (n-k)$ matrix describing the equations of the parity bits. The use of the identity matrix ensures that a direct copy of the original message appears in the codeword; this is called systematic encoding. The resulting $2^k$ codewords of length $n$ form a subspace of the space of $2^n$ possible vectors, so the modulo-2 sum of any two codewords is also a codeword. A parity-check matrix, $H$, constructed in the following manner can be used to test whether a received vector is indeed a codeword.

$$H = [I_{n-k} \mid -P^T] \tag{1.4}$$

This consists of an $(n-k)$-dimensional identity matrix and the negated transpose of the submatrix $P$ from the generator matrix $G$ (over $GF(2)$, $-P^T = P^T$). $H$ is an $(n-k) \times n$ matrix. It can be shown [11] that any codeword $c$ multiplied by the transpose of $H$ results in an $(n-k)$-dimensional all-zero vector.

$$c \cdot H^T = 0 \tag{1.5}$$
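
The zero product can be checked directly from the block structure of the two matrices. The short derivation below is only a sketch; it assumes nothing beyond the systematic forms of $G$ and $H$ given above and the relation $c = m \cdot G$.

$$
\begin{aligned}
G \cdot H^T &= \begin{bmatrix} P & I_k \end{bmatrix}
\begin{bmatrix} I_{n-k} \\ -P \end{bmatrix}
= P \cdot I_{n-k} - I_k \cdot P = P - P = 0, \\
c \cdot H^T &= (m \cdot G) \cdot H^T = m \cdot (G \cdot H^T) = 0 .
\end{aligned}
$$

Here $H^T$ is the $n \times (n-k)$ matrix obtained by stacking $I_{n-k}$ on top of $-P$, so the product with the $k \times n$ matrix $G$ is a $k \times (n-k)$ all-zero matrix.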

If the received vector contains errors, then we can use the previous result to determine the error location. The received vector, $r$, is the sum of the original codeword, $c$, and an error vector, $e$. We calculate the syndrome, $s$, of the received vector to determine the error location.

$$s = r \cdot H^T = (c + e) \cdot H^T = c \cdot H^T + e \cdot H^T = 0 + e \cdot H^T = e \cdot H^T \tag{1.6}$$

This is the basis of syndrome decoding. Of particular interest in the choice of a block code are the code rate and the error correcting capability of the code. The code rate is the ratio of unencoded bits to encoded bits.

$$\text{code rate} = \frac{k}{n} \tag{1.7}$$

The error correcting capability of a code depends very much on the code chosen. Several examples of block codes will be shown in the following sections, including their decoding algorithms.

1.1.1.1 N-Repetition Codes

An "n" repetition code is the simplest of block codes. As the name suggests, it repeats every input bit a total of "n" times. Thus, this is an (n,1) block code. The algorithm that the channel decoder applies is called a majority vote: the numbers of 1's and 0's are counted, and the symbol with the higher count is chosen as the most likely transmitted symbol. So that no tie can occur, n is taken to be an odd number. If we let \( n = 2t + 1 \), then this code can correct up to \( t \) errors in transmission. The code rate for such a code is a dismal \( 1/n \), which precludes it from almost all practical use.
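
The majority vote can be sketched in a few lines of Python. This is an illustration only, not code from the thesis; the function names `repetition_encode` and `repetition_decode` and the choice n = 5 (which tolerates up to two errors per block) are assumptions made for the example.

```python
def repetition_encode(bits, n):
    """Repeat every message bit n times (one (n, 1) block per bit)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority vote cannot tie")
    return [b for b in bits for _ in range(n)]


def repetition_decode(received, n):
    """Majority-vote each block of n received bits."""
    decoded = []
    for i in range(0, len(received), n):
        block = received[i:i + n]
        # More than n // 2 ones means that 1 is the majority symbol.
        decoded.append(1 if sum(block) > n // 2 else 0)
    return decoded


# Hypothetical example: three message bits, n = 5.
message = [1, 0, 1]
codeword = repetition_encode(message, 5)

# Two errors in the first block and one in the second block.
corrupted = list(codeword)
corrupted[0] ^= 1
corrupted[3] ^= 1
corrupted[7] ^= 1

recovered = repetition_decode(corrupted, 5)
```

In this run every block contains at most two errors, so the vote recovers the original three bits; a third error in any one block would flip the decision for that bit.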

1.1.1.2 Single Error Detecting Code - Parity Codes

One of the first applications of error control codes was to detect whether an error was present: so was born the single error detecting code, also called a parity code. A parity code takes "k" bits of information and appends 1 parity bit. This is a \((k+1,k)\) code. Two types of parity code are used, even parity and odd parity. In an even parity code, the parity bit is chosen so that the total number of 1's is even; in an odd parity code, the parity bit is chosen so that the total number of 1's is odd. If one error occurs in the transmission, then the parity of the number of 1's will change. The decoding algorithm is to count the number of 1's. If it is even (for an even parity code), then no error is detected. However, if the number of 1's is odd, then an error has been detected. Note that this code can only detect an error; it cannot correct it. Instead, if an error is detected, a retransmission can be requested from the sender.
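
A minimal Python sketch of the parity encoder and checker follows. The function names and the example data are hypothetical; the second test case illustrates a limitation implied by the description above, namely that two errors restore the parity and therefore go unnoticed.

```python
def parity_encode(bits, even=True):
    """Append one parity bit so that the total number of 1s is even (or odd)."""
    ones = sum(bits)
    parity = ones % 2 if even else (ones + 1) % 2
    return bits + [parity]


def parity_check(word, even=True):
    """Return True when the received word has the expected parity."""
    return sum(word) % 2 == (0 if even else 1)


# Hypothetical example: a (5, 4) even parity code.
word = parity_encode([1, 0, 1, 1])

one_error = list(word)
one_error[2] ^= 1          # a single bit error changes the parity

two_errors = list(one_error)
two_errors[0] ^= 1         # a second error restores the parity
```

Here `parity_check(one_error)` reports a failure, while `parity_check(two_errors)` passes even though the word is corrupted.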

1.1.1.3 Single Error Correcting Code – Hamming Codes

The first practical error correcting codes were invented by Hamming in 1948, and were published in 1950 [12]. These were a class of linear block codes capable of single error correction. The parameters of these codes are as follows:

- Codeword length: \( n = 2^m - 1 \)
- Number of information bits: \( k = 2^m - m - 1 \)
- Number of parity bits: \( n - k = m \)
- Error-correcting capability: \( t = 1 \)

The parity-check matrix of these codes is as follows:

\[
H = [I_m \mid Q] \tag{1.8}
\]

\( I_m \) is the \( m \times m \) identity matrix, and \( Q \) is made up of \( k \) columns of \( m \)-tuples, each having at least two 1's. The (7,4) Hamming code (\( m=3 \)) is a simple example of a single error correcting code. Let the parity check matrix be

\[
H = \begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 1 \\
\end{bmatrix} \tag{1.9}
\]

The corresponding generator matrix is

\[
G = \begin{bmatrix}
0 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 \\
\end{bmatrix} \tag{1.10}
\]

Each separate single bit error corresponds to a unique syndrome value, as shown in the following table.

<table>
<thead>
<tr>
<th>error pattern</th>
<th>Syndrome value</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000000</td>
<td>000</td>
</tr>
<tr>
<td>0000001</td>
<td>111</td>
</tr>
<tr>
<td>0000010</td>
<td>110</td>
</tr>
<tr>
<td>0000100</td>
<td>101</td>
</tr>
<tr>
<td>0001000</td>
<td>011</td>
</tr>
<tr>
<td>0010000</td>
<td>001</td>
</tr>
<tr>
<td>0100000</td>
<td>010</td>
</tr>
<tr>
<td>1000000</td>
<td>100</td>
</tr>
</tbody>
</table>

Error correction is accomplished by calculating the syndrome and adding the corresponding error pattern to the received vector.
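
The syndrome decoding procedure for the (7,4) code can be sketched in Python using the matrices of (1.9) and (1.10). The function names and the test message are assumptions made for this illustration; the lookup of the syndrome among the columns of $H$ is one possible way to realise the table above, not necessarily the one used later in the thesis.

```python
# Parity-check and generator matrices copied from (1.9) and (1.10).
H = [
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1, 0, 1],
]
G = [
    [0, 1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1],
]


def encode(m):
    """c = m * G over GF(2)."""
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(r):
    """s = r * H^T over GF(2)."""
    return [sum(r[j] * H[i][j] for j in range(7)) % 2 for i in range(3)]


def correct(r):
    """Flip the bit whose column of H equals the syndrome, if any."""
    s = syndrome(r)
    if not any(s):
        return list(r)                 # zero syndrome: nothing to correct
    columns = [[H[i][j] for i in range(3)] for j in range(7)]
    position = columns.index(s)        # a single error at j gives column j
    fixed = list(r)
    fixed[position] ^= 1
    return fixed


# Hypothetical message; the last four codeword bits repeat it (systematic).
message = [1, 0, 1, 1]
codeword = encode(message)

received = list(codeword)
received[5] ^= 1                       # error pattern 0000010
```

For this error pattern the computed syndrome is 110, which agrees with the corresponding row of the table, and `correct(received)` returns the original codeword.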

1.1.1.4 Cyclic Codes

A very important class of block codes is the class of cyclic codes. These codes have a rich inherent algebraic structure that simplifies their encoding and decoding. More than 1 error can be corrected by the choice of a proper cyclic code. A code is defined as being cyclic if the cyclic shift of every codeword is also a codeword, as shown below, where all three vectors are codewords.

\[
\text{codeword}_1 = (c_{n-1}, c_{n-2}, \ldots, c_2, c_1, c_0)
\]
\[
\text{codeword}_2 = (c_{n-2}, \ldots, c_3, c_2, c_1, c_0, c_{n-1})
\]
\[
\text{codeword}_3 = (c_0, c_{n-1}, c_{n-2}, \ldots, c_2, c_1)
\]

If we consider the n-dimensional codeword as the coefficients of a polynomial c(x) of degree n-1 with coefficients from GF(2), and the message vector similarly as a polynomial m(x) of degree k-1, then their relationship can be expressed as follows:

\[ c(x) = m(x) \cdot g(x) \tag{1.12} \]

where g(x), called the generator polynomial, is a polynomial of degree n-k. There is a restriction placed on g(x): it must be a factor of \( x^n - 1 \) [11]. Once a generator polynomial is chosen, the parity polynomial h(x) can be determined from the equation:

\[ g(x) \cdot h(x) = x^n - 1 \tag{1.13} \]

The syndrome polynomial, \( s(x) \), is found as follows:

\[ s(x) = r(x) \cdot h(x) \pmod{ x^n - 1 } \tag{1.14} \]

If \( s(x) \) is equal to the 0 polynomial, the conclusion is that there is no error.
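
The polynomial relations above can be tried out with a small Python sketch. Everything specific in it is an assumption chosen for illustration: the (7,4) cyclic code with \( g(x) = x^3 + x + 1 \), the matching \( h(x) = x^4 + x^2 + x + 1 \), and the convention of storing a GF(2) polynomial as an integer whose bit i is the coefficient of \( x^i \).

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials held as integers (bit i = coeff of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a            # addition over GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, m):
    """Remainder of a(x) divided by m(x) over GF(2)."""
    deg_m = m.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a


N = 7
G_POLY = 0b1011            # assumed g(x) = x^3 + x + 1
H_POLY = 0b10111           # h(x) = x^4 + x^2 + x + 1
X_N_1 = (1 << N) | 1       # x^7 + 1, which equals x^7 - 1 over GF(2)


def encode(m):
    """c(x) = m(x) * g(x)."""
    return gf2_mul(m, G_POLY)


def syndrome(r):
    """s(x) = r(x) * h(x) mod (x^n - 1)."""
    return gf2_mod(gf2_mul(r, H_POLY), X_N_1)


message = 0b1001                       # m(x) = x^3 + 1
codeword = encode(message)             # x^6 + x^4 + x + 1
shifted = gf2_mod(codeword << 1, X_N_1)  # cyclic shift: x * c(x) mod (x^7 - 1)
received = codeword ^ (1 << 4)         # single error in the x^4 coefficient
```

Both `codeword` and its cyclic shift give a zero syndrome, while `received` does not. Note that this non-systematic encoding does not place a copy of the message in the codeword.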

Two important types of cyclic codes are BCH and Reed-Solomon codes, which are discussed below.

1.1.1.4.1 BCH Codes

BCH codes have the following parameters:

- Codeword length : \( n = 2^m - 1 \) (bits)
- Error-correcting capability : \( t \)
- Number of parity bits : \( n - k \leq m \cdot t \)

Central to the theory of BCH codes is the mathematics of Galois fields. Consider a set \( F \) of elements, which has 2 operations defined on it called addition “+” and multiplication “*”. The set \( F \) along with the two operations is called a field if [10]:

1. \( F \) is a commutative group under addition. The identity element for addition, called the zero element or the additive identity, is denoted by 0.
2. The set of non-zero elements in \( F \) is a commutative group under multiplication. The identity element for multiplication, called the unit element or the multiplicative identity, is denoted by 1.
3. Multiplication is distributive over addition; that is for any three elements \( a, b, \) and \( c \) in \( F \),

\[
a * (b + c) = a * b + a * c
\]

The number of elements of a field is called the order of the field, and is designated by the letter \( q \). The order of an element \( a \) in \( F \) is the smallest positive integer \( n \) such that \( a^n = 1 \). An element from \( F \) is called a primitive element if its order is equal to \( q-1 \). Powers of primitive elements generate all the nonzero elements of \( F \). The minimal polynomial for an element \( a \) is the polynomial of smallest degree over \( GF(2) \) such that \( a \) is a root.
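
These definitions can be made concrete with a short Python sketch for \( GF(2^4) \) with \( p(x) = x^4 + x + 1 \), the field used for examples elsewhere in this thesis. The integer representation of field elements (bit i holds the coefficient of \( x^i \)) and the function names are assumptions made for the illustration.

```python
M = 4
P_POLY = 0b10011       # p(x) = x^4 + x + 1
Q = 1 << M             # order of the field, q = 16


def gf_mul(a, b):
    """Multiply two elements of GF(2^4), reducing modulo p(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & Q:          # degree reached 4: subtract (XOR) p(x)
            a ^= P_POLY
    return result


def element_order(a):
    """Smallest positive n with a^n = 1, for a nonzero element a."""
    n, power = 1, a
    while power != 1:
        power = gf_mul(power, a)
        n += 1
    return n


# Order of every nonzero element, and the elements of order q - 1.
orders = {a: element_order(a) for a in range(1, Q)}
primitive = sorted(a for a, n in orders.items() if n == Q - 1)
```

Every order found divides \( q - 1 = 15 \), and eight of the fifteen nonzero elements turn out to be primitive, among them the element 2, which represents \( x \).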

Let \( \alpha \) be a primitive element from \( GF(2^m) \). Then the generator polynomial \( g(x) \) for a BCH code is the lowest-degree polynomial over \( GF(2) \), which has 2t roots which are consecutive powers of \( \alpha \). If we designate \( \Phi_i(x) \) as the minimal polynomial of \( \alpha^i \), then \( g(x) \) is

\[ g(x) = \mathrm{LCM}\{\Phi_1(x), \Phi_2(x), \ldots, \Phi_{2t}(x)\} \]  \hfill (1.15)

where LCM is the least common multiple [10]. Clearly, this is a “t” error correcting code. However, we cannot increase t indefinitely. The following table (taken from [10]) illustrates some BCH codes and their error correcting capabilities.
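Equation (1.15) can be illustrated with a short Python sketch for \( n = 15 \). It assumes \( GF(2^4) \) generated by \( x^4 + x + 1 \), computes each minimal polynomial from the conjugates of \( \alpha^i \), and forms the LCM as the product of the distinct minimal polynomials (valid because distinct minimal polynomials are distinct irreducible polynomials). All names are hypothetical.

```python
M = 4
PRIM_POLY = 0b10011      # x^4 + x + 1 (assumed)
Q = 1 << M
ALPHA = 0b0010


def gf_mul(a, b):
    """Multiply two elements of GF(2^4)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & Q:
            a ^= PRIM_POLY
    return result


def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def minimal_polynomial(beta):
    """Minimal polynomial of beta over GF(2), returned as a bit mask."""
    conjugates, c = [], beta
    while c not in conjugates:       # beta, beta^2, beta^4, ...
        conjugates.append(c)
        c = gf_mul(c, c)
    coeffs = [1]                     # lowest degree first
    for c in conjugates:             # multiply by (x + c)
        nxt = [0] * (len(coeffs) + 1)
        for d, coef in enumerate(coeffs):
            nxt[d] ^= gf_mul(coef, c)
            nxt[d + 1] ^= coef
        coeffs = nxt
    # All coefficients are 0 or 1 at this point, so pack them into bits.
    return sum(bit << d for d, bit in enumerate(coeffs))


def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def bch_generator(t):
    """g(x) = LCM of the minimal polynomials of alpha^1 .. alpha^(2t)."""
    factors = []
    for i in range(1, 2 * t + 1):
        phi = minimal_polynomial(gf_pow(ALPHA, i))
        if phi not in factors:
            factors.append(phi)
    g = 1
    for phi in factors:
        g = poly_mul_gf2(g, phi)
    return g


for t in (1, 2, 3):
    g = bch_generator(t)
    print(f"t={t}: g(x)={bin(g)}, k={15 - (g.bit_length() - 1)}")
```

Under these assumptions the resulting values of k for t = 1, 2, 3 agree with the n = 15 rows of Table 1.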
<table>
<thead>
<tr>
<th>n</th>
<th>k</th>
<th>t</th>
<th>n</th>
<th>k</th>
<th>t</th>
</tr>
</thead>
<tbody>
<tr>
<td>7</td>
<td>4</td>
<td>1</td>
<td>127</td>
<td>120</td>
<td>1</td>
</tr>
<tr>
<td>15</td>
<td>11</td>
<td>1</td>
<td></td>
<td>113</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>2</td>
<td></td>
<td>106</td>
<td>3</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>3</td>
<td></td>
<td>99</td>
<td>4</td>
</tr>
<tr>
<td>31</td>
<td>26</td>
<td>1</td>
<td></td>
<td>92</td>
<td>5</td>
</tr>
<tr>
<td></td>
<td>21</td>
<td>2</td>
<td></td>
<td>85</td>
<td>6</td>
</tr>
<tr>
<td></td>
<td>16</td>
<td>3</td>
<td></td>
<td>78</td>
<td>7</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>5</td>
<td></td>
<td>71</td>
<td>9</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>7</td>
<td></td>
<td>64</td>
<td>10</td>
</tr>
<tr>
<td>63</td>
<td>57</td>
<td>1</td>
<td></td>
<td>57</td>
<td>11</td>
</tr>
<tr>
<td></td>
<td>51</td>
<td>2</td>
<td></td>
<td>50</td>
<td>13</td>
</tr>
<tr>
<td></td>
<td>45</td>
<td>3</td>
<td></td>
<td>43</td>
<td>14</td>
</tr>
<tr>
<td></td>
<td>39</td>
<td>4</td>
<td></td>
<td>36</td>
<td>15</td>
</tr>
<tr>
<td></td>
<td>36</td>
<td>5</td>
<td></td>
<td>29</td>
<td>21</td>
</tr>
<tr>
<td></td>
<td>30</td>
<td>6</td>
<td></td>
<td>22</td>
<td>23</td>
</tr>
<tr>
<td></td>
<td>24</td>
<td>7</td>
<td></td>
<td>15</td>
<td>27</td>
</tr>
<tr>
<td></td>
<td>18</td>
<td>10</td>
<td></td>
<td>8</td>
<td>31</td>
</tr>
<tr>
<td></td>
<td>16</td>
<td>11</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>13</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>15</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table 1. BCH Codes Generated by Primitive Elements of Order Less Than \( 2^7 \).

The syndrome polynomial can be calculated by polynomial multiplication. For large values of t, the error correcting capability of the code, the degree of the syndrome polynomial becomes proportionally large. As a consequence, the syndrome lookup table approach to error correction becomes impractical.

The error locator polynomial, $\sigma(x)$, is a polynomial whose roots are the locations of the errors. An iterative procedure, called the Massey-Berlekamp algorithm [3], can be used to calculate the error locator polynomial from the syndrome polynomial. This algorithm requires $2t$ iterations to determine the coefficients of the error locator polynomial. Once this is done, each location in the received polynomial is tested, and if it is a root of the error locator polynomial, then the corresponding bit is inverted, thus correcting the error.

1.1.1.4.2 Reed-Solomon Codes

BCH codes are binary codes. Reed-Solomon codes are q-ary BCH codes. The parameters of these codes are:

- Codeword length: $n = 2^m - 1$ (symbols)
- Symbol length: $m$
- Error-correcting capability: $t$ (symbols)
- Number of parity symbols: $n - k = 2t$ (symbols)

Note that for every additional error that can be corrected, only 2 additional parity symbols are needed. This is an improvement over BCH codes. Let $\alpha$ be a primitive element from $\text{GF}(2^m)$. Then the generator polynomial $g(x)$ is the lowest-degree polynomial over $\text{GF}(2^m)$ which has $2t$ consecutive powers of $\alpha$ as roots:

$$g(x) = (x + \alpha)(x + \alpha^2)\cdots(x + \alpha^{2t})$$ (1.16)
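As a worked instance of (1.16), assume $\text{GF}(2^4)$ generated by the primitive polynomial $x^4 + x + 1$ (chosen here only for illustration), so that $\alpha^4 = \alpha + 1$, and take $t = 2$. Using $\alpha + \alpha^2 = \alpha^5$ and $\alpha^3 + \alpha^4 = \alpha^7$ in this field, the expansion proceeds as:

$$
\begin{aligned}
g(x) &= (x + \alpha)(x + \alpha^2)(x + \alpha^3)(x + \alpha^4) \\
     &= (x^2 + \alpha^5 x + \alpha^3)(x^2 + \alpha^7 x + \alpha^7) \\
     &= x^4 + \alpha^{13} x^3 + \alpha^6 x^2 + \alpha^3 x + \alpha^{10}
\end{aligned}
$$

The degree of $g(x)$ is $2t = 4$, which is the number of parity symbols of the resulting RS(15,11) code, and its coefficients are elements of $\text{GF}(2^4)$ rather than of $\text{GF}(2)$.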

The syndrome polynomial can be computed as before; however, we are now dealing with operations over $\text{GF}(2^m)$ instead of $\text{GF}(2)$. Efficient shift register circuit configurations have been found to calculate the coefficients of the syndrome polynomial, as discussed in section 7.2.

Once the syndrome polynomial has been found, the error locator polynomial, $\Lambda(x)$, and the error evaluator polynomial, $\Omega(x)$, (also called the error magnitude polynomial) must be computed. The error evaluator polynomial is needed, because knowing the error location is not sufficient to correct an error. Each symbol is $m$ bits wide, thus one of the $2^m - 1$ possible error patterns must be added to the symbols at the error locations.

Several algorithms for the error locator polynomial have been devised, such as the Peterson-Gorenstein-Zierler algorithm [1], Euclid’s algorithm [12], and the non-binary Massey-Berlekamp algorithm [12]. The error evaluator polynomial can be computed as follows:

$$\Lambda(x)S(x) \equiv \Omega(x) \pmod{x^{2t}}$$  \hspace{1cm} (1.17)

Improvements have been made to the Massey-Berlekamp algorithm where both the error locator polynomial and the error evaluator polynomial are computed at the same time. This algorithm requires performing Galois field inversion, which is the most difficult of the basic operations of addition, multiplication and inversion. Recently, an inversionless Massey-Berlekamp algorithm was discovered [6]; however, it only computes the error locator polynomial. An extension to this algorithm which calculates both polynomials at the same time without the use of inversion was discovered by this author [13], and is described in detail in section 2.6.3.

Once these two polynomials have been found, the roots of the error locator polynomial are found. An efficient method of performing this function is called the Chien search, and is described in section 2.3. The error value that must be added to the symbol at the error location is given by Forney’s algorithm as discussed in section 2.6.3.

1.1.2 Convolutional Codes

Convolutional codes work on a continuous stream of input bits, as opposed to a block of bits. Convolutional codes are described by three parameters \((n, k, m)\), where \(n\) is the number of outputs, \(k\) is the number of inputs, and \(m\) is the memory size. Refer to the following figure which shows the structure of a \((4,3,4)\) convolutional encoder.

![Block Diagram of a \((4,3,4)\) Convolutional Encoder.](image)

A convolutional encoder generates \(n\) encoded bits for every \(k\) information bits. From the figure we see that the top chain does not contain any memory elements. This is the systematic part of the code. The remaining outputs are functions of the input and the memory elements in the \((m-1)\) memory chains. Thus an additional piece of information that is required is which of the inputs and memory elements are added together (modulo 2) to form each output. Note that each input bit at time \(t\) can possibly affect the outputs up to \(m\) time units later.
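The behaviour described above can be sketched in Python for a much smaller encoder than the one in the figure. The sketch assumes a systematic encoder with one input, two outputs and two memory elements; the tap selection and the function name are hypothetical and are not taken from the \((4,3,4)\) encoder shown.

```python
def convolutional_encode(bits, taps=(1, 1, 1)):
    """Systematic encoder with 1 input, 2 outputs and 2 memory elements.

    taps selects which of (input, memory[0], memory[1]) are added
    modulo 2 to form the parity output. The default taps are an
    arbitrary choice for illustration.
    """
    memory = [0, 0]
    output = []
    for bit in bits:
        terms = (bit, memory[0], memory[1])
        parity = 0
        for tap, term in zip(taps, terms):
            parity ^= tap & term            # modulo-2 sum of the selected terms
        output.append((bit, parity))        # systematic bit, then parity bit
        memory = [bit, memory[0]]           # shift the input into the memory chain
    return output


# A single 1 followed by zeros shows how long one input bit keeps
# influencing the parity output.
impulse_response = convolutional_encode([1, 0, 0, 0])
print(impulse_response)
```

With two memory elements, the single input bit influences the parity output at its own time step and at the two following time steps, after which the encoder returns to the all-zero state.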

Decoding of convolutional codes is done by performing the Viterbi algorithm. This algorithm is a maximum likelihood decoding algorithm, and is equivalent to a dynamic programming solution to the problem [10]. The details of this algorithm are beyond the scope of this thesis, but the reader is referred to texts in error control coding such as [10] and [11] for further information.

1.2 Digital Design Implementation Technologies

The implementation of digital designs has changed greatly over the last 25 years. In this section, we shall explore the development of the methods of implementing a design. This ranges from the use of integrated circuits in individual packages, to large ASICs capable of handling an entire design, and finally to the recent innovations in programmable logic.

1.2.1 Integrated Circuits (SSI, MSI, LSI, VLSI)

At the beginning of the electronics age, after transistors were first crafted in silicon, designers had to create designs out of basic building blocks such as AND gates and flip-flops. These functions were found inside integrated circuits, which are grouped into 4 categories according to their equivalent gate count:

<table>
<thead>
<tr>
<th>Technology type</th>
<th>Equivalent gate count, C</th>
</tr>
</thead>
<tbody>
<tr>
<td>Small Scale Integrated (SSI) circuits</td>
<td>$C \leq 12$</td>
</tr>
<tr>
<td>Medium Scale Integrated (MSI) circuits</td>
<td>$12 &lt; C \leq 99$</td>
</tr>
<tr>
<td>Large Scale Integrated (LSI) circuits</td>
<td>$100 \leq C \leq 999$</td>
</tr>
<tr>
<td>Very Large Scale Integrated (VLSI) circuits</td>
<td>$C \geq 1000$</td>
</tr>
</tbody>
</table>

*Table 2. Equivalent Gate Count for SSI / MSI / LSI / VLSI.*

The basic functions such as AND gates, OR gates, inverters, and single or dual flip-flops are found in Small Scale Integrated (SSI) circuits. The line between MSI and LSI is more of a blur, but MSI was typically made up of shift registers, multipliers, adders, quad, hex and octal flip-flops, counters, multiplexers, demultiplexers and decoders. LSI, with its higher gate count and functionality, includes FIFOs, memory (SRAM and ROM), and bit-slice processors. VLSI is the realm of microprocessors and their support chips.

The interconnection of these elements was determined by hand and written onto schematics. Reduction of Boolean equations to a minimal set was done by hand. Each pin of every device had to be accounted for. Once a schematic was produced, a printed circuit board (PCB) was made. This was the physical implementation of the design interconnection. If a design change was required, the process started by modifying the schematic, and then making a new PCB. Needless to say, this was a costly and time consuming cycle. An example of a small portion of a schematic is shown in the following figure.

![Schematic Diagram Example](image)

**Figure 3. Schematic Diagram Example.**

### 1.2.2 Application Specific Integrated Circuits (ASICs)

The evolution of digital design implementation has culminated in Application Specific Integrated Circuits (ASICs). In these devices, a designer may implement an entire design consisting of many thousands, or even millions of gates. ASICs may be partitioned into 3 groups.

In order of increasing performance they are:
• gate arrays
• standard cell
• full custom

With full custom ASICs, the designer creates the geometries which make up transistors. Very high densities and speeds can be achieved, but full custom designs are very costly and require a very long design time. Non-recurring costs can easily run into the hundreds of thousands of dollars, or even a million dollars. With standard cell ASICs, a library of standard cells is used, and an automatic place and route tool performs the layout. Design time is usually shorter than for full custom for this reason. However, the manufacturing process is the same as for full custom, so the cost is still high. With gate arrays, a library of standard cells is mapped onto an array of transistors already present on a wafer, which can be produced in quantity ahead of time. A routing tool is used to create the masks for routing. Design time is short, and manufacturing costs are less than for full custom or standard cell. Performance can approach that of standard cell and full custom.

When ASICs were first introduced, the design was entered by schematics, and tools would make the necessary mapping from logic to transistors. For large designs, the schematics could be hundreds of pages long. While the clutter of a large design could be mitigated by adopting a top-down design approach, eventually, the burden of detail could overwhelm the designer. To accommodate the increasing complexity of designs, an alternative approach was invented.

Thus was born VHDL or VHSIC Hardware Description Language, where VHSIC means Very High Speed Integrated Circuit. Using this method, a descriptive language is used to specify the functionality of a digital circuit at a higher level. Synthesis and place and route tools are used to perform Boolean minimizations, state machine encoding and other low level tasks, and to map the design to the technology to be used. Actions to be performed are written in processes, with many levels of if-then-else possible. This makes a complicated behavior possible in just a few lines of code. Here is an example of a counter in VHDL. References [24] and [25] provide excellent background to the VHDL language.

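```vhdl
-- Assumes ieee.std_logic_1164 and a package that defines "+" for
-- std_logic_vector (e.g. ieee.std_logic_unsigned) are visible.
signal counter : std_logic_vector(7 downto 0);

-- 8-bit counter with synchronous reset; reset takes priority over enable.
process (clk)
begin
    if (clk'event and clk = '1') then                   -- rising clock edge
        if (reset_n = '0') or (count_reset = '1') then
            counter <= (others => '0');                 -- clear the count
        elsif (count_enable = '1') then
            counter <= counter + 1;                     -- increment when enabled
        end if;
    end if;
end process;
```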

Figure 4. VHDL Example For a Counter.

1.2.3 Programmable Logic

Programmable logic was invented to give the designer added flexibility, and the ability to integrate many small functions into one device. In addition, a company could stockpile one or two types of programmable logic, and entirely stop using many of the SSI and MSI functions, since they could be replicated within the programmable logic.

The first programmable devices were called PALs (programmable array logic) and PLDs (programmable logic devices). These could implement about 100 gates. They are arranged as an array of logic in sum of products form. The designer would make the design using a low level hardware language, such as ABEL or CUPL, which specifies the desired logic in sum of products form. The output of the logic could also be registered or not as needed.

As the complexity of designs increased, the need for larger programmable devices became apparent. This resulted in the development of Complex Programmable Logic Devices (CPLDs). These are essentially several PALs on one chip, with an interconnect matrix between them.

The next development was the introduction of Field Programmable Gate Arrays (FPGAs). In this technology, there are a large number of programmable gates, and a programmable interconnect between the gates. Some FPGAs include other features such as built-in hardware multipliers, embedded memory, and even microprocessors. FPGA performance now comes close to that of ASICs. For example, the largest Xilinx FPGA at present is the XC2V8000 which has the equivalent of 8 million system gates, 3024 Kbits of RAM (SelectRAM) and 168 (18 bit x 18 bit) hardware multipliers [14]. Internal clock speeds for register to register transfers can exceed 200 MHz.

1.3 Contributions and Contents of the Thesis

The structure of the thesis report is as follows. Chapter 2 will provide the necessary background into the theory of Reed-Solomon codes. This chapter will feature an improvement to the efficiency of the key equation solver by the use of an “extended inversionless Massey-Berlekamp algorithm”. Since Galois field inverters are very complex, several times more so than a Galois field multiplier, a key equation solver that eliminates the need for such an inverter can yield savings in decoder latency. Such an algorithm is the inversionless Massey-Berlekamp algorithm by Reed, Shih, and Truong [6]. However, the drawback to this algorithm is that it only computes the error-locator polynomial. The error-evaluator polynomial is then computed by multiplying the error-locator polynomial by the syndrome polynomial. This results in extra latency of the order of the error-locator polynomial, and as a result an extra number of gates and registers are needed, as will be shown. The extended inversionless Massey-Berlekamp algorithm overcomes this extra latency by computing both the error-locator polynomial and the error-evaluator polynomial at the same time. The description and proof of this improved algorithm are included in this thesis.

Chapter 3 discusses the structure of the program that generates the VHDL code for an encoder or a decoder. Chapter 4 discusses the basic operations of addition, multiplication and inversion in Galois fields, and how they are mapped into VHDL code. Chapter 5 examines in detail the structure and VHDL implementation for an arbitrary RS encoder. This is followed by a description of the synthesis results for the encoder in Chapter 6.

The design of an RS decoder is broken down into decoders for 3 or more errors, and decoders for 1 or 2 errors. The reason for such a breakdown is a simplification in the decoding process for 1 or 2 error correcting codes. Chapter 7 will describe the structure and VHDL implementation for an RS decoder for 1 or 2 errors. It will be shown that a simplification to the decoding process is possible, namely the removal of the key equation solver, at the cost of a small increase in complexity in the Chien search and error correction block.

Chapter 8 describes the structure and VHDL implementation for an RS decoder for 3 or more errors: the major blocks are the syndrome calculation, key equation solver, and the Chien search and error correction block. Chapter 9 will describe the synthesis results, in terms of area and speed, for all of the RS decoders described in Chapters 7 and 8 generated by the "RS core generator". Chapter 11 will present the conclusions. The target technology to be examined is Xilinx FPGAs, a very common class of devices used to implement DSP and communications related algorithms.

1.4 Previous Work

This section will examine previous work that has been done in the area of VHDL code generation and high-level synthesis for FEC circuits. Two of the sources are Master's theses [26, 27]. Reference 26, "High-level Synthesis and Its Application in the Design of Reed-Solomon Decoders", describes the use of high-level synthesis in the design of Reed-Solomon decoders. A Computer-Aided-Design (CAD) tool is implemented showing the advantages of high-level design techniques as applied to the versatile time-domain RS(n,k) decoder [32]. Techniques for optimizing performance are presented. An implementation of these methods to the versatile RS decoder is discussed and the area and speed metrics are shown. The basic Galois field operations are discussed in detail, since these functions form the basis of RS decoding.

Reference 27, "The Design of a VHDL Based Synthesis Tool For BCH Codecs", describes a VHDL code generator for Bose, Chaudhuri and Hocquenghem (BCH) codes. These codes are binary and are less complex in nature. Since they are binary, only the error location needs to be found. The decoding procedure is very close to that required to decode Reed-Solomon codes. First, the syndromes must be calculated from the incoming codeword. The Massey-Berlekamp algorithm can be used to find the error locator polynomial. Finally, the Chien search is used to find the roots of the error locator polynomial, and hence, the error locations. The corresponding bit is then inverted to correct the error.

In a sense, this document is closest to the topic of this thesis. Fast and efficient algorithms are developed for the basic operations, including a dual-polynomial basis multiplier, a parallel polynomial basis multiplier and a circuit for raising field elements to the third power. As in this thesis, the user enters the relevant parameters for the BCH code desired, and the core generator, written in C, produces VHDL code that is synthesizable. It presents different implementations for 1 error, 2 error, and multiple error correcting BCH codes, in order to optimize performance. Area and speed metrics are shown for BCH decoders of varying error-correcting capability.

Reference 28, "Very High-Level Synthesis of Control and Datapath Structures for Reconfigurable Logic Devices", describes an approach to the synthesis of inner loops that manipulate array variables using affine index access functions written in C directly to VHDL. The approach uses analysis techniques that allow a variety of design implementations for space or execution time reduction. The approach is automatic, and it implements the data path and the control path separately. This separation allows the generation of several control strategies. The design of the VHDL core generator for Reed-Solomon encoders and decoders also applies this methodology of separating the data path from the control path. This permits the generation of VHDL code that is clearer and easier to debug.

The paper also describes two approaches to control design, namely, a state table driven approach, where the control signals are defined by simple setting or clearing depending on the state, and secondly, a counter based approach, where control signals are set or cleared based on the value of a counter which counts up from 0 when a “start” signal is observed. These two approaches are used in the VHDL core generator for Reed-Solomon encoders and decoders. The state table approach is used in the design of the syndrome calculation for RS decoders. The counter based approach is used in the other modules of the RS decoder, and is used in the design of the RS encoder.

Reference 29, “Code Generation Tools For Hardware Implementation of FEC Circuits”, is a brief paper which describes a VHDL code generator for Reed-Solomon decoders using Euclid’s algorithm. The VHDL code is synthesizable, and the area and speed metrics for several decoder designs are presented, targeted to Xilinx Field Programmable Gate Arrays. The core generator also generates the vectors for the test bench. The code generator is capable of generating VHDL code for most binary FEC decoders, such as Hamming, Cyclic Redundancy Check (CRC), and BCH codes. In the design of the RS decoder, a bank of Galois field multipliers is used, as opposed to a dedicated multiplier where it is needed. This results in less area, but requires tristate buses and reliable scheduling of the steps to avoid collisions.

In Reference 30, “A Soft IP Compiler For Reed-Solomon Decoder”, a program that generates synthesizable VHDL code for Reed-Solomon decoders is presented. Erasure decoding is supported. The method of solving the key equation is the inversionless Massey-Berlekamp algorithm, or a modified version called the reformulated inversionless Massey-Berlekamp algorithm. The choice of the algorithm is user controlled. The area and speed metrics are provided for several designs. The average throughput achieved is 500 Mbits/second in an ASIC. This is comparable to the results achieved by the core generator of this thesis when applied to Xilinx FPGAs.

Reference 31, "Design of a Reed Solomon Decoder using Partial Dynamic Reconfiguration of XILINX Virtex FPGAs – A Case Study", describes a methodology and design flow for VIRTEX FPGAs which enable designers to efficiently use the VIRTEX pRTR (partial run-time reconfigurability) features. The methodology and results are demonstrated on the design of a Reed-Solomon Coder/Decoder.

2 Reed-Solomon Encoding and Decoding Algorithms

2.1 RS Encoding

A Reed-Solomon code is a $q^m$-ary BCH code of length $q^m-1$. Code symbols are elements of the corresponding Galois field $GF(q^m)$. In this thesis, we will be dealing with extensions of binary codes, i.e., $q=2$. This field is generated by a primitive irreducible polynomial of degree $m$. Let $\alpha$ be a primitive element from $GF(2^m)$. The generator polynomial for the code capable of correcting "t" errors is of degree $2t$, and has as its roots $2t$ consecutive powers of $\alpha$, that is:

$$g(x) = \prod_{i=0}^{2t-1} (x + \alpha^{m_0+i})$$  \hfill (2.1)

where $m_0$ is the log of the initial root. The message of $k*m$ bits is divided into $k$ symbols, each symbol being an $m$-bit element from $GF(2^m)$. Each symbol is regarded as a coefficient of a $k$-1 degree polynomial, $m(x)$. A total of $2t$ parity symbols are appended to the message, thus making a systematic encoding format. Such an RS code is denoted by 2 parameters, $n$ and $k$, and is written as RS($n,k$). The distance of RS codes is:

$$\text{distance} = n - k + 1 = 2t+1$$

Here there are $k$ message symbols, and $n-k = 2t$ parity symbols, for a total of $n$ symbols. RS codes may be shortened to RS($n',k'$), where $n'=n-l$ and $k'=k-l$ for some number $l$ of omitted message symbols. In this case, the distance property $d = 2t+1$ still holds. The encoded message, $c(x)$, is formed as follows [10]:

1. multiply the message $m(x)$ by $x^{2t}$
2. form the parity symbols, $b(x)$, as the remainder obtained by dividing the above result by the generator polynomial.
3. $c(x) = m(x)*x^{2t} + b(x)$

Thus, in order to specify an RS code completely, the following items must be described.

1. The degree of the field generator polynomial, m, and its coefficients.
2. The error-correcting capability, t, of the code.
3. The number of message symbols, k.
4. The log of the initial root of the code generator polynomial, m_0.

Note that there is a restriction placed on these parameters. First, we can calculate n, the total number of symbols, as n=k+2t. This number must be less than or equal to the maximum number of allowable symbols, namely (2^m-1). The higher order symbols are usually transmitted first.
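The three encoding steps and the parameter restriction can be sketched in Python as follows. This is an illustration under stated assumptions, not the generated VHDL: it assumes m = 4 with field generator polynomial $x^4 + x + 1$, stores polynomials as lists with the lowest-degree coefficient first, and uses hypothetical function names.

```python
M, PRIM_POLY = 4, 0b10011        # field GF(2^4), generator x^4 + x + 1 (assumed)
N = (1 << M) - 1                 # maximum number of symbols, 2^m - 1

# Antilog / log tables for GF(2^m).
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value > N:
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(t, m0):
    """g(x) = prod_{i=0}^{2t-1} (x + alpha^(m0+i)), lowest degree first."""
    g = [1]
    for i in range(2 * t):
        root = EXP[(m0 + i) % N]
        nxt = [0] * (len(g) + 1)
        for d, coef in enumerate(g):
            nxt[d] ^= gf_mul(coef, root)
            nxt[d + 1] ^= coef
        g = nxt
    return g


def rs_encode(message, t, m0):
    """Systematic encoding: c(x) = m(x) * x^(2t) + b(x)."""
    k = len(message)
    if k + 2 * t > N:
        raise ValueError("n = k + 2t must not exceed 2^m - 1")
    g = generator_poly(t, m0)
    remainder = [0] * (2 * t) + list(message)        # step 1: m(x) * x^(2t)
    for d in range(len(remainder) - 1, 2 * t - 1, -1):
        factor = remainder[d]                        # step 2: divide by monic g(x)
        if factor:
            for j, g_coef in enumerate(g):
                remainder[d - 2 * t + j] ^= gf_mul(factor, g_coef)
    parity = remainder[:2 * t]                       # b(x), the remainder
    return parity + list(message)                    # step 3


def poly_eval(poly, x):
    acc = 0
    for coef in reversed(poly):
        acc = gf_mul(acc, x) ^ coef
    return acc


message = list(range(1, 12))                         # k = 11 symbols
codeword = rs_encode(message, t=2, m0=1)             # RS(15,11)
print(codeword)
```

Because $c(x)$ is a multiple of $g(x)$, evaluating the codeword at each of the $2t$ roots $\alpha^{m_0+i}$ gives zero, which is the property the syndrome calculation relies on.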

2.2 Syndrome Calculation

The syndrome calculation is the first step in the decoding process. For a t-error correcting RS code, there are 2t syndromes that must be calculated. They are calculated as follows:

\[ S(j) = r(\alpha^{m_0+j}) \quad \text{where } j \in \{0..2t-1\} \]  \hfill (2.2)

In the equation above, r(x) is the received codeword polynomial, and m_0 is the log of the initial root of the code generator polynomial. Since the received polynomial is the sum of the transmitted codeword polynomial c(x) and the error polynomial e(x), and every codeword evaluates to zero at the roots of g(x), we can reduce this to

\[ S(j) = c(\alpha^{m_0+j}) + e(\alpha^{m_0+j}) = e(\alpha^{m_0+j}) = \sum_{i=1}^{\nu} Y_i \cdot X_i^{m_0+j} \quad \text{where } j \in \{0..2t-1\} \]  \hfill (2.3)

where the Y_i are the error values at the error locations X_i, and \( \nu \) is the number of errors. This information will be used later in the decoding algorithms for 1 and 2-error correcting codes.
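The relation between the syndromes and the error values and locations can be checked numerically. The following Python sketch assumes $GF(2^4)$ with field generator polynomial $x^4 + x + 1$, t = 2 and m_0 = 1, and uses the code generator polynomial itself as a convenient codeword; the error positions and values are arbitrary, and all names are hypothetical.

```python
M, PRIM_POLY = 4, 0b10011        # GF(2^4) with x^4 + x + 1 (assumed)
N = (1 << M) - 1

EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value > N:
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_pow(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % N]


def poly_eval(poly, x):
    """Horner evaluation; poly[i] is the coefficient of x^i."""
    acc = 0
    for coef in reversed(poly):
        acc = gf_mul(acc, x) ^ coef
    return acc


def syndromes(received, t, m0):
    """S(j) = r(alpha^(m0 + j)) for j = 0 .. 2t-1."""
    return [poly_eval(received, EXP[(m0 + j) % N]) for j in range(2 * t)]


T, M0 = 2, 1
# g(x) for t = 2, m0 = 1 (lowest degree first) is itself a valid codeword.
codeword = [7, 8, 12, 13, 1] + [0] * (N - 5)
errors = {3: 5, 10: 9}           # symbol position -> error value Y_i

received = list(codeword)
for position, y in errors.items():
    received[position] ^= y

# Sum of Y_i * X_i^(m0 + j), with X_i = alpha^position.
expected = []
for j in range(2 * T):
    total = 0
    for position, y in errors.items():
        total ^= gf_mul(y, gf_pow(EXP[position], M0 + j))
    expected.append(total)

print("syndromes of codeword:", syndromes(codeword, T, M0))
print("syndromes of received:", syndromes(received, T, M0))
```

The syndromes of the uncorrupted codeword are all zero, and the syndromes of the corrupted word depend only on the error values and locations, not on the transmitted codeword.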

2.3 Chien Search

Another crucial step in the decoding process is to find the location of the errors in the received codeword. An iterative process, called the Chien search, is the most efficient means of doing this. Consider the error locator polynomial, \( \sigma(x) \), which is a polynomial of degree t whose roots are the error locations, X_i.

\[
\sigma(x) = \prod_{i=1}^{t} (x + X_i) = x^t + \sigma_1 x^{t-1} + \sigma_2 x^{t-2} + \cdots + \sigma_{t-1} x + \sigma_t
\]  \hfill (2.4)

Now, it can be shown [5], that the coefficients of the error-locator polynomial are related to the syndrome values, \(S(j)\), through a set of relations called Newton’s identities:

\[
S(t + j) + \sigma_1 \cdot S(t + j - 1) + \cdots + \sigma_t \cdot S(j) = 0 \quad \forall j
\]  \hfill (2.5)

This fact will be called upon later in the next 2 sections when the equations for the error values for 1-error and 2-error correcting codes are derived. Any root of \(\sigma(x)\) will also satisfy the equation

\[
\sigma_1 x^{-1} + \sigma_2 x^{-2} + \cdots + \sigma_t x^{-t} = 1
\]  \hfill (2.6)

The circuit shown in Figure 5 implements such a search, testing one location per clock cycle. Initially, the values of the registers are loaded with the coefficients of the error locator polynomial. A sum of the registers is formed, resulting in the Chien sum, and if it is 0, then an error has been located.

---

**Figure 5. Chien Search Block Diagram**

Since the received message polynomial, \( r(x) \), is received one symbol at a time, starting with the symbol for the first position, a pipelined approach to the decoder is possible. This is the approach taken in this thesis.
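A software model of the register-based search may help to fix ideas. The sketch below is an assumption-laden illustration rather than the circuit of Figure 5: it works in $GF(2^4)$ with field generator polynomial $x^4 + x + 1$, multiplies register i by $\alpha^i$ on every clock, and tests the sum of the registers against 1 as in equation (2.6), which is equivalent to including a constant register holding 1 and testing for 0. Function names are hypothetical.

```python
M, PRIM_POLY = 4, 0b10011        # GF(2^4) with x^4 + x + 1 (assumed)
N = (1 << M) - 1

EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value > N:
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def locator_from_locations(locations):
    """Expand sigma(x) = prod (x + X_i) and return [sigma_1, ..., sigma_t]."""
    sigma = [1]                                           # highest degree first
    for x_i in locations:
        shifted = sigma + [0]                             # sigma(x) * x
        scaled = [0] + [gf_mul(c, x_i) for c in sigma]    # sigma(x) * X_i
        sigma = [a ^ b for a, b in zip(shifted, scaled)]
    return sigma[1:]


def chien_search(sigma):
    """Return the error positions, scanning from position N-1 down to 0."""
    registers = list(sigma)                               # loaded with sigma_1 .. sigma_t
    found = []
    for clock in range(1, N + 1):
        # Register i (1-based) is multiplied by alpha^i on every clock.
        registers = [gf_mul(r, EXP[(i + 1) % N]) for i, r in enumerate(registers)]
        chien_sum = 0
        for r in registers:
            chien_sum ^= r
        if chien_sum == 1:                                # equation (2.6) holds
            found.append(N - clock)                       # location alpha^(N - clock)
    return found


error_positions = [12, 3]
sigma = locator_from_locations([EXP[p] for p in error_positions])
located = chien_search(sigma)
print("sigma coefficients:", sigma)
print("located positions :", located)
```

After `clock` steps the registers hold $\sigma_i \alpha^{i \cdot clock}$, so the test evaluates (2.6) at $x = \alpha^{-clock}$; the positions are therefore visited from the highest order symbol downwards, matching the order in which the symbols are assumed to arrive.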

### 2.4 Decoding for 1 Error

When the error correcting capability of a Reed-Solomon code is limited to one error, that is, \( n - k = 2 \), the process of computing the error-locator polynomial and the error-evaluator polynomial is not necessary. Instead, one can proceed with the Peterson-Gorenstein-Zierler decoding algorithm [1]. The derivation for a general 1-error correcting RS code will now be shown.

The generator polynomial for a 1-error correcting code is given by

\[
g(x) = \prod_{i=0}^{1} (x + \alpha^{m_0 + i})
\]

Consider now the syndrome equations for such an RS code. There are \( 2t = 2 \) syndromes, given by:

\[
S(0) = r(\alpha^{m_0}) \quad \text{and} \quad S(1) = r(\alpha^{1+m_0})
\]  \hfill (2.7)

where \( m_0 \) is the log of the initial root of the RS code generator polynomial. From this, we proceed as follows. Let the error-locator polynomial be given by

\[
\sigma(x) = \prod_{i=1}^{1} (x + X_i) = x + \sigma_1
\]

where \( X_1 \) is the location of the single error. The syndrome equations can be related to the error-locator polynomial via some algebraic manipulations, to arrive at the so-called Newton’s identities [5]. In general, Newton’s identities are in the form of:

\[
S(t + j) + \sigma_1 \cdot S(t + j - 1) + \ldots + \sigma_t \cdot S(j) = 0, \forall j
\]  \hfill (2.8)

For the case of a 1-error correcting code, t=1, we get: \( S(1) + \sigma_1 \cdot S(0) = 0 \). Thus \( \sigma(x) = x + \sigma_1 \) is the error locator polynomial, where \( \sigma_1 = -\frac{S(1)}{S(0)} \). So far we have calculated the position of the error, using just the syndrome values. Now, we must also calculate the error value from the syndromes, as follows:

\[
S(0) = r(\alpha^{m_0}) = Y_1 \cdot X_1^{m_0} \Rightarrow Y_1 = S(0) \cdot X_1^{-m_0} \tag{2.9}
\]

where \( Y_1 \) is the error value, and \( X_1 \) is the error location.

The entire algorithm for a single error correcting code is then stated as follows:

1. Calculate the syndromes \( S(0) = r(\alpha^{m_0}) \) and \( S(1) = r(\alpha^{1+m_0}) \).

2. Let \( D_1 = S(0) \). If \( D_1 = 0 \), then there is no error, and STOP. If \( D_1 \neq 0 \) then there is an error.

3. Perform a Chien search on the entire received message. NOTE: the Chien search requires both syndrome values. If the element at position \( X_1 \) is found to be the location of an error, that is, when the Chien sum is 0, then
   a. calculate the error value: \( Y_1 = S(0) \cdot X_1^{-m_0} \)
   b. correct the error: corrected_element = received_element + \( Y_1 \)

4. STOP

Note that for Galois fields of the form GF(2^m), that is, those that are extensions of the binary field, -a = a, and so we may modify the equation for the error location to

\[
X_1 = \frac{S(1)}{S(0)} \cdot X_1^{-m_0} \cdot X_1^{m_0} = \frac{S(1)}{S(0)} \]

The equation for the error value remains the same.
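The single error algorithm stated above can be sketched in Python. The sketch assumes $GF(2^4)$ with field generator polynomial $x^4 + x + 1$, a full-length code with n = 15, and m_0 = 2 (chosen only to show that the formulae hold for an initial root other than 1). Function names are hypothetical, and the location test is written as a direct comparison rather than as the register circuit.

```python
M, PRIM_POLY = 4, 0b10011        # GF(2^4) with x^4 + x + 1 (assumed)
N = (1 << M) - 1

EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value > N:
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """a / b for nonzero b."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]


def gf_pow(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % N]


def poly_eval(poly, x):
    acc = 0
    for coef in reversed(poly):
        acc = gf_mul(acc, x) ^ coef
    return acc


def decode_single_error(received, m0):
    """Correct at most one symbol error; received[i] is the coefficient of x^i."""
    s0 = poly_eval(received, EXP[m0 % N])            # step 1
    s1 = poly_eval(received, EXP[(m0 + 1) % N])
    if s0 == 0:                                      # step 2: D_1 = S(0) = 0
        return list(received)
    x1 = gf_div(s1, s0)                              # X_1 = S(1) / S(0)
    y1 = gf_mul(s0, gf_pow(x1, -m0))                 # Y_1 = S(0) * X_1^(-m0)
    corrected = list(received)
    for position in range(N - 1, -1, -1):            # step 3: test every location
        if EXP[position] ^ x1 == 0:                  # Chien sum X + sigma_1 is 0
            corrected[position] ^= y1
    return corrected


M0 = 2
root0, root1 = EXP[M0], EXP[M0 + 1]
g = [gf_mul(root0, root1), root0 ^ root1, 1]         # (x + a^m0)(x + a^(m0+1))
codeword = [0, 0, 0] + g + [0] * (N - 6)             # x^3 * g(x) is a codeword

received = list(codeword)
received[11] ^= 13                                   # one symbol error
print(decode_single_error(received, M0) == codeword)
```

Since subtraction and addition coincide in $GF(2^m)$, the correction in step 3b is a bitwise XOR of the error value with the received symbol.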

It should be noted that equations have been derived to calculate the coefficients of the error locator polynomial and the error value directly from the syndrome values and the error locations, as can be found in [5]. These equations, however, are only valid for a single value of $m_0$. Moreover, the size of these equations increases exponentially with the error correcting capability, and they quickly become too cumbersome to use, since all of the error locations must be known before applying the error correction formulae. In addition, the Chien search is suited for use in a stream of data corresponding to a codeword, as has been assumed in this thesis. As an example, consider the use of the Chien search for the 1-error case. Referring to Figure 5, the 1-error correction case reduces to the following.

![Diagram](image)

**Figure 6. Reduced Form of Chien Search For Single Error Correcting Code**

In this figure, 2 registers are used. One is loaded with the value of “1”, and the other register is loaded with the value calculated for the error location, namely, $X_1 = \frac{S(1)}{S(0)}$.

Every clock cycle, this register is multiplied by $\alpha$, and added to the register containing “1”, resulting in the Chien sum. If the Chien sum is “0”, then the location is the error location. A further simplification can be done by simply removing the register containing “1”, and testing the remaining register for the value of “1”. These are equivalent systems, and the second method is that used in the design of the single error correcting decoders in this thesis.

Also note that this method is actually simpler than using a counter to determine the position of the error. This is due to the fact that the error location calculated is the Galois field representation of the position. Therefore, one would have to perform a logarithm operation to determine the correct error position in terms of integer values, and then use a counter to count to this value. Calculating the logarithm of a finite field element is more complex than multiplying by \(\alpha\).
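The equivalence between the single-register test and the logarithm-plus-counter alternative can be illustrated with a short Python model. It assumes $GF(2^4)$ with field generator polynomial $x^4 + x + 1$, a full-length code with n = 15 whose highest order symbol is processed first, and hypothetical function names; the logarithm is modelled by a lookup table, which hides the hardware cost discussed above.

```python
M, PRIM_POLY = 4, 0b10011        # GF(2^4) with x^4 + x + 1 (assumed)
N = (1 << M) - 1

EXP, LOG = [0] * (2 * N), [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = EXP[i + N] = value
    LOG[value] = i
    value <<= 1
    if value > N:
        value ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]


def locate_by_register(s0, s1):
    """Multiply the register by alpha each clock and test it for the value 1.

    Assumes exactly one error, so that s0 and s1 are both nonzero.
    """
    register = gf_div(s1, s0)                    # loaded with X_1 = S(1)/S(0)
    for clock in range(1, N + 1):
        register = gf_mul(register, EXP[1])      # multiply by alpha
        if register == 1:
            return N - clock                     # symbol position being output
    return None


def locate_by_logarithm(s0, s1):
    """Alternative: take the logarithm of X_1 to get the integer position."""
    return LOG[gf_div(s1, s0)]


M0 = 1
position, error_value = 9, 6
s0 = gf_mul(error_value, EXP[(position * M0) % N])          # Y_1 * X_1^m0
s1 = gf_mul(error_value, EXP[(position * (M0 + 1)) % N])    # Y_1 * X_1^(m0+1)
print(locate_by_register(s0, s1), locate_by_logarithm(s0, s1))
```

Both functions return the same position, but the register version needs only a constant multiplier and a comparison with 1 on each clock.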

A block diagram of a pipelined single error correcting RS decoder is shown in Figure 7 below. An input strobe, \(RS\_data\_in\_start\) is used to signify the beginning of a new input codeword. The codeword enters the decoder in bitwise parallel format, one symbol at a time. It first enters into the Calculate_Syndrome block, where the 2 syndromes are calculated. In addition, a control signal Error_Present is also shown, and is set to '0' (inactive) if both syndromes are equal to 0, otherwise it is set to '1' (active). Finally, a strobe indicating the completion of the syndrome calculation is sent to the Chien_Search block. A delay block for the incoming RS symbols is necessary to compensate for the delays in the Calculate_Syndrome block, and any startup delays in the Chien_Search block. The Chien_Search block performs the function of finding the error locations, and correcting the data when the Chien sum is 0. The corrected data and an output data strobe are the outputs of the RS decoder.

![Figure 7. Single Error Correcting RS Decoder Block Diagram](image)
2.5 Decoding for 2 Errors

When the error correcting capability of a Reed-Solomon code is limited to two errors, the process of computing the error-locator polynomial and the error-evaluator polynomial is also not necessary. As in the previous section, one can proceed with the Peterson-Gorenstein-Zierler decoding algorithm [1]. The derivation for a general 2-error correcting RS code will now be shown.

The generator polynomial for a 2-error correcting code is given by

\[ g(x) = \prod_{i=0}^{3} (x + \alpha^{m_0 + i}) \]

Consider now the syndrome equations for such an RS code.

There are 2t=4 syndromes, given by:

\[
\begin{aligned}
S(0) &= r(\alpha^{m_0}) \\
S(1) &= r(\alpha^{1+ m_0}) \\
S(2) &= r(\alpha^{2+ m_0}) \\
S(3) &= r(\alpha^{3+ m_0})
\end{aligned}
\]

(2.10)

where \( m_0 \) is the log of the initial root of the RS code generator polynomial. From this, we proceed as follows. Let the error-locator polynomial be given by

\[
\sigma(x) = \prod_{i=1}^{2} (x + X_i) = x^2 + \sigma_1 x + \sigma_2,
\]

where \( X_i \) is the location of the \( i \)th error. Once again, we use Newton's identities:

\[
S(t + j) + \sigma_1 \cdot S(t + j - 1) + \ldots + \sigma_t \cdot S(j) = 0, \quad \forall j.
\]

In matrix form, the 2 equations in 2 variables are:

\[
\begin{bmatrix}
S(0) & S(1) \\
S(1) & S(2)
\end{bmatrix}
\begin{bmatrix}
\sigma_2 \\
\sigma_1
\end{bmatrix}
= 
\begin{bmatrix}
-S(2) \\
-S(3)
\end{bmatrix}
\]  

(2.11)
Let \( D_2 = S(0) \cdot S(2) + S(1)^2 \). This is the determinant of the 2x2 matrix given above. If \( D_2 \) is not 0, then 2 errors are present, and we proceed by solving for the coefficients of the error locator polynomial, \( \sigma(x) = x^2 + \sigma_1 \cdot x + \sigma_2 \). This is accomplished by solving this system of 2 equations in 2 variables. Using Cramer’s rule, the result is:

\[
\sigma_1 = \frac{\begin{vmatrix} S(0) & S(2) \\ S(1) & S(3) \end{vmatrix}}{D_2} = \frac{S(0) \cdot S(3) + S(1) \cdot S(2)}{D_2} = \frac{S(0) \cdot S(3) + S(1) \cdot S(2)}{S(0) \cdot S(2) + S(1)^2}
\]

\[
\sigma_2 = \frac{\begin{vmatrix} S(2) & S(1) \\ S(3) & S(2) \end{vmatrix}}{D_2} = \frac{S(2)^2 + S(1) \cdot S(3)}{D_2} = \frac{S(2)^2 + S(1) \cdot S(3)}{S(0) \cdot S(2) + S(1)^2}
\]

(2.12)
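As a numerical check of (2.12), the following Python sketch builds the four syndromes for two known errors and confirms that the Cramer's rule expressions return σ_1 = X_1 + X_2 and σ_2 = X_1·X_2. The field GF(2^4) with p(x) = x^4 + x + 1, the choice m_0 = 0, the error values and positions, and the function name `locator_coefficients` are all assumptions made for illustration.

```python
# GF(2^4) built from p(x) = x^4 + x + 1 (field assumed for this sketch)
M, POLY = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]


def gf_pow(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % N]


def locator_coefficients(S):
    """Return (D2, sigma1, sigma2) from the four syndromes, following (2.12)."""
    d2 = mul(S[0], S[2]) ^ mul(S[1], S[1])
    sigma1 = div(mul(S[0], S[3]) ^ mul(S[1], S[2]), d2)
    sigma2 = div(mul(S[2], S[2]) ^ mul(S[1], S[3]), d2)
    return d2, sigma1, sigma2


# Hypothetical example: values Y1, Y2 at locations X1 = alpha^3, X2 = alpha^9
X1, X2, Y1, Y2 = EXP[3], EXP[9], 5, 11
S = [mul(Y1, gf_pow(X1, j)) ^ mul(Y2, gf_pow(X2, j)) for j in range(4)]
d2, sigma1, sigma2 = locator_coefficients(S)
```

Addition in GF(2^m) is a bitwise XOR, so the sum X_1 + X_2 is written `X1 ^ X2` when comparing against `sigma1`.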

So far we have calculated the coefficients of the error-locator polynomial, and hence the positions of the errors, using just the syndrome values. Now, we must also calculate the error values from the syndromes, as follows. The syndrome equations for the first 2 syndromes are:

\[
S(0) = r(\alpha^{m_0}) = Y_1 \cdot X_1^{m_0} + Y_2 \cdot X_2^{m_0}
\]

\[
S(1) = r(\alpha^{m_0+1}) = Y_1 \cdot X_1^{m_0+1} + Y_2 \cdot X_2^{m_0+1}
\]

(2.13)

where \( Y_1 \) and \( X_1 \) are the error value and error location of the first error, and \( Y_2 \) and \( X_2 \) are the error value and error location of the second error. Putting this into matrix form, we get

\[
\begin{bmatrix}
X_1^{m_0} & X_2^{m_0} \\
X_1^{m_0+1} & X_2^{m_0+1}
\end{bmatrix}
\begin{bmatrix}
Y_1 \\
Y_2
\end{bmatrix}
= \begin{bmatrix}
S(0) \\
S(1)
\end{bmatrix}
\]

(2.14)

We can now solve for the \( Y_i \)'s in terms of the syndrome values \( S(0) \) and \( S(1) \), and the error locations \( X_1 \) and \( X_2 \). First, we define the determinant.
Let $\Delta = X_1^{m_0} \cdot X_2^{m_0+1} + X_1^{m_0+1} \cdot X_2^{m_0} = X_1^{m_0} \cdot X_2^{m_0} \cdot (X_1 + X_2)$. Then the $Y_i$'s are found to be:

$$
\begin{bmatrix}
Y_1 \\
Y_2
\end{bmatrix} = \frac{\begin{bmatrix}
S(0) \cdot X_2^{m_0+1} + S(1) \cdot X_2^{m_0} \\
S(0) \cdot X_1^{m_0+1} + S(1) \cdot X_1^{m_0}
\end{bmatrix}}{\Delta} = \frac{\begin{bmatrix}
S(0) \cdot X_2^{m_0+1} + S(1) \cdot X_2^{m_0} \\
S(0) \cdot X_1^{m_0+1} + S(1) \cdot X_1^{m_0}
\end{bmatrix}}{X_1^{m_0} \cdot X_2^{m_0} \cdot (X_1 + X_2)}
$$

(2.15)

The values of the error locations are found during the Chien search. The Chien search goes through every location and tests whether it is an error location [5]. This gives us the value of $X_1$ at any one location. Since there are 2 errors, the error locator polynomial is a quadratic, and we may solve for the remaining $X_2$.

$$
\sigma(x) = x^2 + \sigma_1 \cdot x + \sigma_2 = (x + X_1) \cdot (x + X_2) = x^2 + (X_1 + X_2) \cdot x + X_1 \cdot X_2
$$

$$
\Rightarrow \sigma_1 = X_1 + X_2, \text{ or solving for } X_2, \quad X_2 = \sigma_1 - X_1
$$

(2.16)

Given this, the equation for the error at location $X_1$ during the Chien search is:

$$
Y_1 = \frac{S(0) \cdot (X_1 + \sigma_1) + S(1)}{X_1^{m_0} \cdot \sigma_1}
$$

(2.17)

where $X_1$ is the present location, $S(0)$ and $S(1)$ are the first 2 syndrome values, and $\sigma_1$ is the coefficient of the first power of $x$ in the error locator polynomial, which has been previously calculated as $\sigma_1 = \frac{S(0) \cdot S(3) + S(1) \cdot S(2)}{S(0) \cdot S(2) + S(1)^2}$.

This is the result for 2 errors. If the determinant $D_2$ is 0, then we proceed as in the case for a single error, as discussed in Section 2.4.
The entire algorithm for a double error correcting code is then stated as follows:

1. Calculate the 4 syndromes:
   \[ S(0) = r(\alpha^{m_0}) \]
   \[ S(1) = r(\alpha^{1+m_0}) \]
   \[ S(2) = r(\alpha^{2+m_0}) \]
   \[ S(3) = r(\alpha^{3+m_0}) \]

2. Based on the syndromes, calculate the determinant \( D_2 = S(0) \cdot S(2) + S(1)^2 \). If this value is 0, goto step 6.

3. Calculate the coefficients of the error locator polynomial from the syndromes:
   \[ \sigma_1 = \frac{S(0) \cdot S(3) + S(1) \cdot S(2)}{S(0) \cdot S(2) + S(1)^2}, \]
   \[ \sigma_2 = \frac{S(2)^2 + S(1) \cdot S(3)}{S(0) \cdot S(2) + S(1)^2} \]

4. Perform a Chien search on the entire received message. If the element at position \( (X_i) \) is found to be the location of an error, that is, when the Chien sum is 0, then
   a. calculate the error value:
   \[ Y_i = \frac{S(0) \cdot (X_i + \sigma_1) + S(1)}{X_i^{m_0} \cdot \sigma_1} \]
   b. correct the error: 
   \[ \text{corrected\_element} = \text{received\_element} + Y_i \]

5. STOP

6. Let \( D_1 = S(0) \), the value of the first syndrome. If \( D_1 = 0 \), then there is no error, and we must STOP. If \( (D_1 \neq 0) \) then there is an error, and we continue.

7. Perform a Chien search on the entire received message. If the element at position \( (X_i) \) is found to be the location of an error, that is, when the Chien sum is 0, then
   a. calculate the error value:
   \[ Y_i = S(0) \cdot X_i^{-m_0} \]
   b. correct the error: 
   \[ \text{corrected\_element} = \text{received\_element} + Y_i \]

8. STOP
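The eight steps above can be sketched in software as follows. This Python sketch is illustrative only and does not reproduce the generated VHDL: it assumes GF(2^4) with p(x) = x^4 + x + 1, evaluates the Chien sum by direct substitution rather than with registers, and uses the hypothetical function name `decode_t2`. The guard against σ_1 = 0 is an addition for robustness and is not part of the algorithm as stated.

```python
# GF(2^4) built from p(x) = x^4 + x + 1 (field assumed for this sketch)
M, POLY = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]


def gf_pow(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % N]


def poly_eval(p, point):
    acc = 0
    for c in reversed(p):             # Horner's rule, p[i] is the x^i coefficient
        acc = mul(acc, point) ^ c
    return acc


def decode_t2(r, m0=0):
    S = [poly_eval(r, EXP[(m0 + j) % N]) for j in range(4)]      # step 1
    out = list(r)
    d2 = mul(S[0], S[2]) ^ mul(S[1], S[1])                       # step 2
    if d2 != 0:
        s1 = div(mul(S[0], S[3]) ^ mul(S[1], S[2]), d2)          # step 3
        s2 = div(mul(S[2], S[2]) ^ mul(S[1], S[3]), d2)
        if s1 == 0:                   # guard (assumption): more than 2 errors
            return out
        for pos in range(len(r)):                                # step 4
            X = EXP[pos]
            if (mul(X, X) ^ mul(s1, X) ^ s2) == 0:               # Chien sum is 0
                num = mul(S[0], X ^ s1) ^ S[1]
                out[pos] ^= div(num, mul(gf_pow(X, m0), s1))     # steps 4a, 4b
    elif S[0] != 0:                                              # step 6
        X1 = div(S[1], S[0])
        for pos in range(len(r)):                                # step 7
            if EXP[pos] == X1:        # equivalent to X + X1 = 0
                out[pos] ^= div(S[0], gf_pow(X1, m0))            # steps 7a, 7b
    return out


received = [0] * 12                   # all-zero RS(12,8) codeword ...
received[2], received[10] = 7, 13     # ... with two corrupted symbols
corrected = decode_t2(received, m0=0)
```

The all-zero word is used because it is a codeword of every linear code, which avoids the need for an encoder in the example.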

Note that it was assumed that we are dealing with Galois fields of the form \( GF(2^m) \), that is, those that are extensions of the binary field. These are the fields of most practical use. A block diagram of a pipelined double error correcting RS decoder is shown in Figure 8 below. As before, an input strobe, \( \text{RS\_data\_in\_start} \), is used to signify the beginning of a new input codeword. The codeword enters the decoder in bitwise parallel format, one symbol at a time. It first enters into the Calculate_Syndrome block, where the 4 syndromes are calculated. The control signal Error_Present is also generated. Finally, a strobe indicating the completion of the syndrome calculation is sent to the Chien_Search block. A delay block for the incoming RS symbols is necessary to compensate for the delays in the Calculate_Syndrome block, and any startup delays in the Chien_Search block. The Chien_Search block performs the function of finding the error locations, and correcting the data when the Chien sum is 0. Note that the Chien_Search block in this case is more complex than that for the single error correcting case, as has been described in the paragraphs above. The corrected data and an output data strobe are the outputs of the RS decoder.

![Diagram](image)

**Figure 8. Double Error Correcting RS Decoder Block Diagram**
2.6 Decoding For 3 or More Errors

This section is devoted to the case of RS decoding for 3 or more errors. In this case, the methods outlined in Sections 2.4 and 2.5 are not appropriate for a pipelined design. For 3 or more errors, the resulting \( t \times t \) set of simultaneous equations becomes cumbersome and inefficient to solve. A more elegant and efficient way to solve the key equations is the Massey-Berlekamp algorithm.

In this algorithm, the error-locator and the error-evaluator polynomials can be computed simultaneously. Once this is done, the Chien search is performed, and the error values can be calculated using Forney’s algorithm [8]. One drawback of the Massey-Berlekamp algorithm is the need to perform a Galois field inversion in every iteration. There are two approaches to mitigate this. First, one may pipeline the algorithm, so that in any one clock cycle either a multiplication and addition is performed, or an inversion is performed, but not both. This shortens the longest combinational path between clock edges, but increases the total number of clock cycles, thus increasing latency. The other approach is to put the inversion in the same clock cycle as the other operations, resulting in a slower clock speed.

An improvement to the Massey-Berlekamp algorithm is the inversionless Massey-Berlekamp algorithm [6]. In this algorithm, the need for the Galois field inversion has been eliminated by a modification to the equations used in every iteration. This results in an algorithm that requires the same number of iterations as the original, yet, since it does not perform inversion, can run at a higher clock speed. Unfortunately, the output of this algorithm is only one of the required polynomials, namely, the error-locator polynomial. The error-evaluator polynomial is calculated afterwards by the following equation:

\[ \lambda(x) \cdot S(x) \equiv \Omega(x) \mod x^{2t} \]

(2.18)

Extra clock cycles are needed to perform such a polynomial multiplication, once again, resulting in a longer latency.

One of the important results of this thesis is an extension of the inversionless Massey-Berlekamp algorithm, which computes both the error-locator polynomial and the error-evaluator polynomial simultaneously. Thus latency is reduced, resulting in a faster decoder. The three algorithms will now be described, with a proof given for the extended inversionless Massey-Berlekamp algorithm.

### 2.6.1 Original Massey-Berlekamp Algorithm

Consider an RS code capable of correcting up to \( t \) errors, and without loss of generality, let \( m_0 = 0 \). The original Massey-Berlekamp algorithm is defined as follows [7]. First, the following initializations are made:

\[
\Lambda(x)^{(0)} = 1, \quad B(x)^{(0)} = 1, \quad \Gamma(x)^{(0)} = 0, \quad A(x)^{(0)} = x^{-1}, \quad L^{(0)} = 0
\]

(2.19)

In the equations listed above, \( \Lambda(x) \) is the error locator polynomial, \( B(x) \) is the error-locator support polynomial, \( \Gamma(x) \) is the error-evaluator polynomial, and \( A(x) \) is the error-evaluator support polynomial. \( L \) is an integer variable. The algorithm proceeds iteratively, and the superscripts define the iteration level. Let the syndromes be represented by the syndrome polynomial:

\[
S(x) = \sum_{j=0}^{2t-1} S_j \cdot x^j
\]  

(2.20)

The algorithm iterates for \( 2t \) steps. At the \( (k+1)^{st} \) step, calculate the following term:

\[
\Delta^{(k+1)} = \sum_{j=0}^{L^{(k)}} \Lambda_j^{(k)} \cdot S_{k-j}
\]

(2.21)
Then, let
\[
\Lambda(x)^{(k+1)} = \Lambda(x)^{(k)} - \Delta^{(k+1)} \cdot B(x)^{(k)} \cdot x
\] (2.22)
\[
\Gamma(x)^{(k+1)} = \Gamma(x)^{(k)} - \Delta^{(k+1)} \cdot A(x)^{(k)} \cdot x
\] (2.23)

If \( \Delta^{(k+1)} = 0 \) or \( 2L^{(k)} > k \) then,
\[
B(x)^{(k+1)} = x \cdot B(x)^{(k)}
\] (2.24)
\[
A(x)^{(k+1)} = x \cdot A(x)^{(k)}
\] (2.25)
\[
L^{(k+1)} = L^{(k)}
\] (2.26)

otherwise
\[
B(x)^{(k+1)} = \frac{\Lambda(x)^{(k)}}{\Delta^{(k+1)}}
\] (2.27)
\[
A(x)^{(k+1)} = \frac{\Gamma(x)^{(k)}}{\Delta^{(k+1)}}
\] (2.28)
\[
L^{(k+1)} = k + 1 - L^{(k)}
\] (2.29)

Galois field inversion is required in steps (2.27) and (2.28). However, both steps use the inverse of the same value, \( \Delta^{(k+1)} \), so only one inverter is needed.
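The iteration (2.19) to (2.29) can be sketched in software as shown below. This Python sketch is an illustration under stated assumptions, not the thesis's implementation: it uses GF(2^4) with p(x) = x^4 + x + 1 and m_0 = 0, stores polynomials as coefficient lists (index = power of x), and keeps the products x·B(x) and x·A(x) instead of B(x) and A(x) so that the initial value A(x) = x^{-1} never has to be represented. All function names are hypothetical.

```python
# GF(2^4) built from p(x) = x^4 + x + 1 (field assumed for this sketch)
M, POLY = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    return EXP[(N - LOG[a]) % N]


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a ^ b for a, b in zip(p, q)]


def pscale(p, c):
    return [mul(a, c) for a in p]


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def massey_berlekamp(S):
    lam, gam, L = [1], [0], 0         # Lambda, Gamma, L at iteration 0
    xB, xA = [0, 1], [1]              # x*B(x) and x*A(x), since A^(0) = x^-1
    for k in range(len(S)):           # 2t iterations
        delta = 0
        for j in range(min(L, len(lam) - 1) + 1):                # (2.21)
            delta ^= mul(lam[j], S[k - j])
        new_lam = padd(lam, pscale(xB, delta))                   # (2.22)
        new_gam = padd(gam, pscale(xA, delta))                   # (2.23)
        if delta == 0 or 2 * L > k:
            xB, xA = [0] + xB, [0] + xA                          # (2.24), (2.25)
        else:
            d_inv = inv(delta)        # the single inversion per iteration
            xB = [0] + pscale(lam, d_inv)                        # (2.27)
            xA = [0] + pscale(gam, d_inv)                        # (2.28)
            L = k + 1 - L                                        # (2.29)
        lam, gam = new_lam, new_gam
    return trim(lam), trim(gam), L


# Hypothetical example: values 5 and 11 at locations alpha^3 and alpha^9, t = 2
X1, X2, Y1, Y2 = EXP[3], EXP[9], 5, 11
S = [mul(Y1, EXP[(3 * j) % N]) ^ mul(Y2, EXP[(9 * j) % N]) for j in range(4)]
lam, gam, L = massey_berlekamp(S)
```

In this form the error-locator polynomial comes out as Λ(x) = (1 + X_1·x)(1 + X_2·x), so its roots are the inverses of the error locations.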

### 2.6.2 Inversionless Massey-Berlekamp Algorithm

An improvement to the Massey-Berlekamp algorithm was made in [6], by removing the need for performing the Galois field inversion. The original inversionless Massey-Berlekamp algorithm calculates only the error-locator polynomial. The error-evaluator polynomial is calculated afterwards. The algorithm can be stated as follows. Let:

\[
l^{(0)} = 0 \quad \text{and} \quad \gamma^{(k)} = 1 \quad \text{if} \quad k \leq 0
\]
\[
\mu(x)^{(0)} = 1, \lambda(x)^{(0)} = 1
\] (2.30)
This inversionless algorithm also iterates for 2t steps. At the \((k+1)\)th step, calculate the following term:

\[
\delta^{(k+1)} = \sum_{i=0}^{l^{(k)}} \mu_i^{(k)} \cdot S_{k-i}
\]  

(2.31)

Then, let

\[
\mu(x)^{(k+1)} = \gamma^{(k)} \cdot \mu(x)^{(k)} - \delta^{(k+1)} \cdot \lambda(x)^{(k)} \cdot x
\]

(2.32)

If \(\delta^{(k+1)} = 0\) or \(2l^{(k)} > k\) then,

\[
\lambda(x)^{(k+1)} = x \cdot \lambda(x)^{(k)}
\]

(2.33)

\[
l^{(k+1)} = l^{(k)}
\]

(2.34)

\[
\gamma^{(k+1)} = \gamma^{(k)}
\]

(2.35)

otherwise

\[
\lambda(x)^{(k+1)} = \mu(x)^{(k)}
\]

(2.36)

\[
l^{(k+1)} = k + 1 - l^{(k)}
\]

(2.37)

\[
\gamma^{(k+1)} = \delta^{(k+1)}
\]

(2.38)

As can be seen, in this algorithm it is not required to perform inversion. When the algorithm terminates, the error-locator polynomial is \(\mu(x)\). At this point, the error-evaluator polynomial, \(\Omega(x)\), can be calculated from the syndrome polynomial and the error-locator polynomial as shown below:

\[
S(x) \cdot \mu(x) \equiv \Omega(x) \mod x^{2t}
\]

(2.39)
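A software sketch of the inversionless iteration, followed by the separate polynomial multiplication of (2.39), is given below. It is illustrative only and rests on assumptions: GF(2^4) with p(x) = x^4 + x + 1, m_0 = 0, coefficient lists indexed by power of x, and the product x·λ(x) stored in place of λ(x). The function names are hypothetical.

```python
# GF(2^4) built from p(x) = x^4 + x + 1 (field assumed for this sketch)
M, POLY = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a ^ b for a, b in zip(p, q)]


def pscale(p, c):
    return [mul(a, c) for a in p]


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def inversionless_mb(S):
    mu, x_lam = [1], [0, 1]           # mu(x) and x*lambda(x) at iteration 0
    l, gamma = 0, 1
    for k in range(len(S)):           # 2t iterations, no inversion anywhere
        delta = 0
        for j in range(min(l, len(mu) - 1) + 1):                 # (2.31)
            delta ^= mul(mu[j], S[k - j])
        new_mu = padd(pscale(mu, gamma), pscale(x_lam, delta))   # (2.32)
        if delta == 0 or 2 * l > k:
            x_lam = [0] + x_lam                                  # (2.33)
        else:
            x_lam = [0] + mu                                     # (2.36)
            l, gamma = k + 1 - l, delta                          # (2.37), (2.38)
        mu = new_mu
    return trim(mu), l


def evaluator_from_locator(mu, S):
    """Omega(x) = S(x) * mu(x) mod x^(2t), the extra step of (2.39)."""
    two_t = len(S)
    omega = [0] * two_t
    for i, a in enumerate(mu):
        for j, b in enumerate(S):
            if i + j < two_t:
                omega[i + j] ^= mul(a, b)
    return trim(omega)


# Hypothetical example: values 5 and 11 at locations alpha^3 and alpha^9, t = 2
X1, X2, Y1, Y2 = EXP[3], EXP[9], 5, 11
S = [mul(Y1, EXP[(3 * j) % N]) ^ mul(Y2, EXP[(9 * j) % N]) for j in range(4)]
mu, l = inversionless_mb(S)
omega = evaluator_from_locator(mu, S)
```

The polynomial μ(x) returned here is a nonzero scalar multiple of (1 + X_1·x)(1 + X_2·x). The scalar does not move the roots, and it cancels in the ratio of evaluator to locator derivative used for error correction.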

This operation takes place after the error locator polynomial is found, and thus, it adds to the latency of the decoder. The next section describes a new result, the extended inversionless Massey-Berlekamp algorithm, which computes both polynomials in tandem, thus saving the final step.
### 2.6.3 Extended Inversionless Massey-Berlekamp Algorithm

The inversionless Massey-Berlekamp algorithm can be extended to yield both the error-locator and the error-evaluator polynomial at the same time, also without the use of Galois field inverters. This eliminates the need for performing the required polynomial multiplication to determine the error-evaluator polynomial. The extension to the inversionless Massey-Berlekamp algorithm consists of the addition of the following steps to the inversionless algorithm:

\[
\Omega(x)^{(k+1)} = \gamma^{(k)} \cdot \Omega(x)^{(k)} - \delta^{(k+1)} \cdot a(x)^{(k)} \cdot x
\]  

(2.40)

If \( \delta^{(k+1)} = 0 \) or \( 2l^{(k)} > k \) then,

\[
a(x)^{(k+1)} = x \cdot a(x)^{(k)}
\]  

(2.41)
otherwise

\[
a(x)^{(k+1)} = \Omega(x)^{(k)}
\]  

(2.42)

The initial conditions are the same as those in the original Massey-Berlekamp algorithm, that is:

\[
\Omega(x)^{(0)} = 0, \quad a(x)^{(0)} = x^{-1}
\]

(2.43)
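The extension can be sketched by adding the Ω(x) and a(x) updates to the inversionless loop. The Python sketch below is an illustration, not the VHDL or C code referred to in this thesis: it assumes GF(2^4) with p(x) = x^4 + x + 1 and m_0 = 0, stores x·λ(x) and x·a(x) so that a(x) = x^{-1} need not be represented, and uses hypothetical function names. The example then compares the Ω(x) produced in tandem with the product S(x)·μ(x) mod x^{2t}.

```python
# GF(2^4) built from p(x) = x^4 + x + 1 (field assumed for this sketch)
M, POLY = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def padd(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a ^ b for a, b in zip(p, q)]


def pscale(p, c):
    return [mul(a, c) for a in p]


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def poly_eval(p, point):
    acc = 0
    for c in reversed(p):
        acc = mul(acc, point) ^ c
    return acc


def extended_inversionless_mb(S):
    mu, omega = [1], [0]
    x_lam, x_a = [0, 1], [1]          # x*lambda(x) and x*a(x); a^(0) = x^-1
    l, gamma = 0, 1
    for k in range(len(S)):
        delta = 0
        for j in range(min(l, len(mu) - 1) + 1):                     # (2.31)
            delta ^= mul(mu[j], S[k - j])
        new_mu = padd(pscale(mu, gamma), pscale(x_lam, delta))       # (2.32)
        new_omega = padd(pscale(omega, gamma), pscale(x_a, delta))   # (2.40)
        if delta == 0 or 2 * l > k:
            x_lam, x_a = [0] + x_lam, [0] + x_a                      # (2.33), (2.41)
        else:
            x_lam, x_a = [0] + mu, [0] + omega                       # (2.36), (2.42)
            l, gamma = k + 1 - l, delta                              # (2.37), (2.38)
        mu, omega = new_mu, new_omega
    return trim(mu), trim(omega), l


# Hypothetical example: three errors (t = 3) in a length-15 word
positions, values = [1, 7, 13], [3, 8, 15]
S = [0] * 6
for j in range(6):
    for p, y in zip(positions, values):
        S[j] ^= mul(y, EXP[(p * j) % N])
mu, omega, l = extended_inversionless_mb(S)

# Reference value: S(x) * mu(x) mod x^(2t), computed by explicit multiplication
reference = [0] * 6
for i, a in enumerate(mu):
    for j, b in enumerate(S):
        if i + j < 6:
            reference[i + j] ^= mul(a, b)
reference = trim(reference)
```

If the sketch is faithful to the equations, `omega` and `reference` agree, and `mu` vanishes at the inverse of every error location.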

In the equations above, \( \Omega(x) \) is the error-evaluator polynomial, and \( a(x) \) is the error-evaluator support polynomial. These equations are similar to those shown in (2.32), (2.33), and (2.36) of the inversionless Massey-Berlekamp algorithm. The following theorem is postulated, and then proved.

**Theorem:** The polynomials and scalars computed in the extended inversionless Massey-Berlekamp algorithm and the original Massey-Berlekamp algorithm are related as follows:

\[
\mu(x)^{(k)} = \prod_{\ell=1}^{k-1} \gamma^{(\ell)} \cdot \Lambda(x)^{(k)}
\]  

(2.44)
\[ \Omega(x)^{(k)} = \prod_{i=1}^{k-1} \gamma^{(i)} \cdot \Gamma(x)^{(k)} \quad (2.45) \]

\[ \lambda(x)^{(k)} = \gamma^{(k)} \cdot B(x)^{(k)} \quad (2.46) \]

\[ a(x)^{(k)} = \gamma^{(k)} \cdot A(x)^{(k)} \quad (2.47) \]

\[ l^{(k)} = L^{(k)} \quad (2.48) \]

The proof for (2.44), (2.46), and (2.48) has been shown in the paper describing the inversionless Massey-Berlekamp algorithm, which determines the error-locator polynomial [6]. The proof for (2.45) and (2.47) is presented here. The proof proceeds along similar lines as in [6], and is by induction. Clearly, for \( k=0 \), equations (2.45) and (2.47) hold. Consider the \( (k+1)^{th} \) iteration, and the value calculated for \( \delta \):

\[ \delta^{(k+1)} = \sum_{j=0}^{l^{(k)}} \mu_j^{(k)} \cdot S_{k-j} \quad (2.49) \]

Substituting for \( \mu(x) \) from (2.44), and interchanging the summation and product, we find

\[ \delta^{(k+1)} = \prod_{i=1}^{k-1} \gamma^{(i)} \cdot \sum_{j=0}^{L^{(k)}} \Lambda_j^{(k)} \cdot S_{k-j} \quad (2.50) \]

However, by examining (2.21), this is equal to

\[ \delta^{(k+1)} = \prod_{i=1}^{k-1} \gamma^{(i)} \cdot \Delta^{(k+1)} \quad (2.51) \]

By definition (2.40), we have

\[ \Omega(x)^{(k+1)} = \gamma^{(k)} \cdot \Omega(x)^{(k)} - \delta^{(k+1)} \cdot a(x)^{(k)} \cdot x \quad (2.52) \]

Substituting for \( \Omega(x) \) from (2.45), \( \delta^{(k+1)} \) from (2.51), and \( a(x) \) from (2.47) we get:

\[ \Omega(x)^{(k+1)} = \prod_{i=1}^{k} \gamma^{(i)} \cdot \Gamma(x)^{(k)} - \left( \prod_{i=1}^{k} \gamma^{(i)} \cdot \Delta^{(k+1)} \cdot A(x)^{(k)} \cdot x \right) \]

\[ \Omega(x)^{(k+1)} = \prod_{i=1}^{k} \gamma^{(i)} \cdot \left[ \Gamma(x)^{(k)} - \Delta^{(k+1)} \cdot A(x)^{(k)} \cdot x \right] \quad (2.53) \]

but from (2.23) the term \( \left( \Gamma(x)^{(k)} - \Delta^{(k+1)} \cdot A(x)^{(k)} \cdot x \right) \) is equal to \( \Gamma(x)^{(k+1)} \), thus

\[ \Omega(x)^{(k+1)} = \prod_{i=1}^{k} \gamma^{(i)} \cdot \Gamma(x)^{(k+1)} \quad (2.54) \]

This concludes the proof for equation (2.45). We now proceed to the proof of equation (2.47). There are 2 cases to consider. Let us consider first the case where \( \delta^{(k+1)} = 0 \). From (2.51) we see that \( \delta^{(k+1)} = 0 \) if \( \Delta^{(k+1)} = 0 \), and that \( \delta^{(k+1)} \neq 0 \) if \( \Delta^{(k+1)} \neq 0 \), because \( \gamma^{(i)} \neq 0 \) for \( i = -1, 0, \ldots, k-1 \). Hence, if \( \delta^{(k+1)} = \Delta^{(k+1)} = 0 \) or if \( 2l^{(k)} = 2L^{(k)} > k \), then by equation (2.41)

\[ a(x)^{(k+1)} = x \cdot a(x)^{(k)} \quad (2.55) \]

Substituting for \( a(x)^{(k)} \) from (2.47), and noting that for this case \( \gamma^{(k+1)} = \gamma^{(k)} \) (see 2.35) and \( A(x)^{(k+1)} = x \cdot A(x)^{(k)} \) (see 2.25) we get

\[ a(x)^{(k+1)} = \gamma^{(k+1)} \cdot A(x)^{(k+1)} \quad (2.56) \]

This proves equation (2.47) for the case \( \delta^{(k+1)} = 0 \) or \( 2l^{(k)} > k \). Consider now the case where \( \delta^{(k+1)} \neq 0 \) and \( 2l^{(k)} \leq k \). From (2.42)

\[ a(x)^{(k+1)} = \Omega(x)^{(k)} \quad (2.57) \]

Substituting for \( \Omega(x) \) from (2.45)

\[ a(x)^{(k+1)} = \prod_{i=1}^{k-1} \gamma^{(i)} \cdot \Gamma(x)^{(k)} \quad (2.58) \]

Substituting for \( \prod_{i=1}^{k-1} \gamma^{(i)} \) from (2.51) we get

\[ a(x)^{(k+1)} = \frac{\delta^{(k+1)}}{\Delta^{(k+1)}} \cdot \Gamma(x)^{(k)} \quad (2.59) \]

For this case, from (2.38) we see that \( \delta^{(k+1)} = \gamma^{(k+1)} \). Also, from (2.28) we have

\[ A(x)^{(k+1)} = \frac{\Gamma(x)^{(k)}}{\Delta^{(k+1)}} \quad (2.60) \]

Thus (2.59) is reduced to:

\[ a(x)^{(k+1)} = \gamma^{(k+1)} \cdot A(x)^{(k+1)} \quad (2.61) \]

This proves equation (2.47) for the case \( \delta^{(k+1)} \neq 0 \) and \( 2l^{(k)} \leq k \). Thus the theorem is proved. This algorithm has been simulated and verified in the C language, and implemented successfully in VHDL.

At this point, both the error-locator and the error-evaluator polynomials have been computed. The Chien search will find the zeros of the error-locator polynomial, i.e., the error locations. The next step is to correct the errors. This is done using Forney’s algorithm [8], which, for extensions of binary fields is:

\[
e_{i} = \alpha^{-i \cdot (m_0 - 1)} \cdot \frac{\Omega(\alpha^{-i})}{\Lambda'(\alpha^{-i})}
\]

In this equation, \( \alpha^{-i} \) is the present location of the Chien search, \( X_i \). The entire algorithm for a multi-error (t>2) correcting code is then stated as follows:

1. Calculate the 2t syndromes:
   \[
   S(j) = r(\alpha^{m_0 + j}), \quad j \in \{0 \ldots (2t-1)\}
   \]
   (2.63)

2. If all of the syndromes are 0, then there is no error, and we STOP.
3. Calculate the error-locator and the error-evaluator polynomials using the extended inversionless Massey-Berlekamp algorithm.
4. Perform a Chien search on the entire received message. If the element at position \( (X_i) \) is found to be the location of an error, that is, when the Chien sum is 0, then
   a. calculate the error value \( Y_i \) using Forney’s algorithm
   b. correct the error: corrected_element = received_element + \( Y_i \)
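Step 4 of the list above (Chien search followed by Forney's algorithm) can be sketched as follows. This Python sketch is illustrative only. It assumes GF(2^4) with p(x) = x^4 + x + 1, and, to stay short, it builds the error-locator polynomial directly from the known error positions as a stand-in for the key equation solver; the error-evaluator polynomial is then obtained as Λ(x)·S(x) mod x^{2t}. The error value at position i is taken as α^{i(1-m_0)}·Ω(α^{-i})/Λ'(α^{-i}), the usual form of Forney's formula for fields of characteristic 2. The function names are hypothetical.

```python
# GF(2^4) built from p(x) = x^4 + x + 1 (field assumed for this sketch)
M, POLY = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]


def poly_eval(p, point):
    acc = 0
    for c in reversed(p):             # Horner's rule, p[i] is the x^i coefficient
        acc = mul(acc, point) ^ c
    return acc


def pmul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= mul(a, b)
    return out


def forney_correct(r, lam, omega, m0):
    """Chien search over all positions, correcting each located error."""
    # Formal derivative in characteristic 2: only odd-power terms survive
    dlam = [lam[j] if j % 2 == 1 else 0 for j in range(1, len(lam))]
    out = list(r)
    for i in range(len(r)):
        x_inv = EXP[(-i) % N]                     # alpha^(-i)
        if poly_eval(lam, x_inv) == 0:            # Chien sum is 0
            scale = EXP[(i * (1 - m0)) % N]       # alpha^(i * (1 - m0))
            y = div(mul(scale, poly_eval(omega, x_inv)), poly_eval(dlam, x_inv))
            out[i] ^= y                           # corrected = received + Y
    return out


def demo(m0, t=3, positions=(1, 7, 13), values=(3, 8, 15)):
    r = [0] * N                                   # all-zero codeword plus errors
    for p, y in zip(positions, values):
        r[p] = y
    S = [poly_eval(r, EXP[(m0 + j) % N]) for j in range(2 * t)]
    lam = [1]
    for p in positions:                           # stand-in for the key equation solver
        lam = pmul(lam, [1, EXP[p]])              # factor (1 + X*x), X = alpha^p
    omega = pmul(lam, S)[:2 * t]                  # Lambda(x) * S(x) mod x^(2t)
    return forney_correct(r, lam, omega, m0)


corrected = demo(m0=1)
```

Because Forney's formula uses a ratio of Ω to Λ', any common scalar factor in the two polynomials, such as the product of γ terms introduced by the inversionless algorithm, cancels.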

A block diagram of a pipelined multi-error (t>2) correcting RS decoder is shown in Figure 9 below. As before, an input strobe, RS_data_in_start, is used to signify the beginning of a new input codeword. The codeword enters the decoder in bitwise parallel format, one symbol at a time. It first enters into the Calculate_Syndrome block, where the 2t syndromes are calculated. The control signal Error_Present is also generated. Finally, a strobe indicating the completion of the syndrome calculation is sent to the Key_Equation_Solver block, which implements the extended inversionless Massey-Berlekamp algorithm. A delay block for the incoming RS symbols is necessary to compensate for the delays in the Calculate_Syndrome and Key_Equation_Solver blocks, and any startup delays in the Chien_Search block. The Key_Equation_Solver block calculates the error-locator and the error-evaluator polynomials. The Chien_Search block performs the function of finding the error locations, and correcting the data, using Forney's algorithm, when the Chien sum is 0. The evaluation of the polynomials needed for error correction can be pipelined. The corrected data and an output data strobe are the outputs of the RS decoder.

Figure 9. Multi-Error (>2) Correcting RS Decoder Block Diagram
3 Structure of the RS Encoder/Decoder Core Generator

This chapter describes the contents of the source files in terms of the procedures and functions within the files.

3.1 File Structure

The entire functionality of the RS Encoder/Decoder Core Generator is contained in 5 files, as shown in the table below. These files contain 9864 lines of code.

<table>
<thead>
<tr>
<th>filename</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>rs_enc.pas</td>
<td>Core generator for an RS encoder.</td>
</tr>
<tr>
<td>rs_dec_t1_t2.pas</td>
<td>Core generator for an RS decoder for 1 or 2 errors.</td>
</tr>
<tr>
<td>rs_dec.pas</td>
<td>Core generator for an RS decoder for more than 2 errors.</td>
</tr>
<tr>
<td>rs_tb.pas</td>
<td>Core generator for an RS encoder or RS decoder testbench.</td>
</tr>
<tr>
<td>rs_utils.pas</td>
<td>Contains functions and procedures used by the other 4 files.</td>
</tr>
</tbody>
</table>

Table 3. RS Encoder/Decoder Core Generator Files.

The parameters of the desired encoder/decoder are entered via an ASCII file. The structure of this input file is as follows. The first line contains the size, m, of the Galois field GF($2^m$) from which the basic field elements are taken. The second line contains the coefficients of the irreducible polynomial, p(x), used to generate the field. These coefficients are from GF(2). The remaining lines describe the specific RS(n,k) code to be designed. The third line contains the value of n, and the fourth line contains the error correcting capability of the code, t. Note that k = n - 2t. The fifth line contains the value of the log of the first root, m_0, of the generator polynomial, g(x), formed by:

$$ g(x) = \prod_{i=0}^{2t-1} (x + \alpha^{m_0 + i}) $$

An example of such an input file is shown in the table below.
<table>
<thead>
<tr>
<th>Line</th>
<th>Value</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Line 1:</td>
<td>4</td>
<td>Galois field is GF($2^4$)</td>
</tr>
<tr>
<td>Line 2:</td>
<td>10011</td>
<td>Field generator polynomial is: P(x) = $x^4 + x + 1$</td>
</tr>
<tr>
<td>Line 3:</td>
<td>12</td>
<td>N=12</td>
</tr>
<tr>
<td>Line 4:</td>
<td>2</td>
<td>T=2 =&gt; RS(12,8)</td>
</tr>
<tr>
<td>Line 5:</td>
<td>0</td>
<td>$m_0 = 0$, $g(x) = \prod_{i=0}^{3} (x + \alpha^i)$</td>
</tr>
</tbody>
</table>

Table 4. An Example of an RS Encoder/Decoder Core Generator Parameter File.
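To make the file layout concrete, the following Python sketch parses a five-line parameter description and forms the code generator polynomial from it. It is an illustration only, not the Pascal code of the generator. The function names are hypothetical, the parameter text is passed in as a string rather than read from a named file, and the second line is assumed to list the coefficients of p(x) from the highest power down, as the example in Table 4 suggests.

```python
def parse_parameters(text):
    """Parse the five parameter lines: m, p(x) bits, n, t, m0."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    m = int(lines[0])
    field_poly = int(lines[1], 2)     # "10011" -> x^4 + x + 1 (highest power first)
    n, t, m0 = int(lines[2]), int(lines[3]), int(lines[4])
    return {"m": m, "field_poly": field_poly, "n": n, "t": t,
            "k": n - 2 * t, "m0": m0}


def build_field(m, poly):
    """Return antilog (exp) and log tables for GF(2^m) generated by poly."""
    q = (1 << m) - 1
    exp, log = [0] * (2 * q), [0] * (q + 1)
    x = 1
    for i in range(q):
        exp[i], log[x] = x, i
        x <<= 1
        if x >> m:
            x ^= poly
    for i in range(q, 2 * q):
        exp[i] = exp[i - q]
    return exp, log


def generator_polynomial(t, m0, exp, log):
    """g(x) = product over i = 0 .. 2t-1 of (x + alpha^(m0 + i)); g[d] is the x^d coefficient."""
    q = len(log) - 1
    g = [1]
    for i in range(2 * t):
        root = exp[(m0 + i) % q]
        nxt = [0] * (len(g) + 1)
        for d, c in enumerate(g):     # multiply g(x) by (x + root)
            nxt[d + 1] ^= c
            if c:
                nxt[d] ^= exp[log[c] + log[root]]
        g = nxt
    return g


def poly_eval(p, point, exp, log):
    acc = 0
    for c in reversed(p):
        acc = (0 if acc == 0 or point == 0 else exp[log[acc] + log[point]]) ^ c
    return acc


params = parse_parameters("4\n10011\n12\n2\n0\n")   # the example of Table 4
exp, log = build_field(params["m"], params["field_poly"])
g = generator_polynomial(params["t"], params["m0"], exp, log)
```

For the example file, `g` has degree 2t = 4 and is monic, and it evaluates to zero at α^0 through α^3.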

3.2 VHDL Core Generator for a RS Encoder

FILE : rs_enc.pas

procedure EncoderProcess;

There is only one procedure in this file. This procedure writes VHDL code for an RS encoder, using the algorithms specified in Section 2.1. The design of the code in VHDL is discussed in Section 5.2.

3.3 VHDL Core Generator for a RS Decoder for 1 or 2 Errors

The following table describes the procedures in file rs_dec_t1_t2.pas. This file is used to generate the VHDL code for a 1-error or a 2-error correcting RS decoder. The design of the code in VHDL is discussed in Section 7.
<table>
<thead>
<tr>
<th>procedure</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>InputProcess</td>
<td>This procedure writes VHDL code for the syndrome calculation portion of an RS decoder, using the algorithms specified in Section 2.2.</td>
</tr>
<tr>
<td>ChienSearchProcess_T1</td>
<td>This procedure writes VHDL code for the Chien search portion of a 1-error correcting RS decoder, using the algorithms specified in Section 2.3.</td>
</tr>
<tr>
<td>RSTopProcess_T1</td>
<td>This procedure writes VHDL code which interconnects InputProcess and ChienSearchProcess_T1. In addition, internal signals from the InputProcess are brought out to be verified by the testbench, as well as the outputs of the ChienSearchProcess_T1.</td>
</tr>
<tr>
<td>RSTopProcess_T1_syn</td>
<td>This procedure writes VHDL code which interconnects InputProcess and ChienSearchProcess_T1. No other internal signals are brought out, making this the VHDL code used for synthesis for a 1-error correcting RS decoder.</td>
</tr>
<tr>
<td>ChienSearchProcess_T2</td>
<td>This procedure writes VHDL code for the Chien search portion of a 2-error correcting RS decoder, using the algorithms specified in Section 2.3.</td>
</tr>
<tr>
<td>RSTopProcess_T2</td>
<td>This procedure writes VHDL code which interconnects InputProcess and ChienSearchProcess_T2. In addition, internal signals from the InputProcess are brought out to be verified by the testbench, as well as the outputs of the ChienSearchProcess_T2.</td>
</tr>
<tr>
<td>RSTopProcess_T2_syn</td>
<td>This procedure writes VHDL code which interconnects InputProcess and ChienSearchProcess_T2. No other internal signals are brought out, making this the VHDL code used for synthesis for a 2-error correcting RS decoder.</td>
</tr>
</tbody>
</table>
3.4 VHDL Core Generator for a RS Decoder for More Than 2 Errors

The following table describes the procedures in file `rs_dec.pas`. This file is used to generate the VHDL code for a RS decoder capable of correcting more than 2 errors. The design of the code in VHDL is discussed in section 8.

<table>
<thead>
<tr>
<th>procedure</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>InputProcess</td>
<td>This procedure writes VHDL code for the syndrome calculation portion of an RS decoder, using the algorithms specified in Section 2.2.</td>
</tr>
<tr>
<td>MBSolverProcess</td>
<td>This procedure writes VHDL code for the extended inversionless Massey-Berlekamp portion of the RS decoder, using the algorithms specified in Section 2.6.3.</td>
</tr>
<tr>
<td>ChienSearchProcess</td>
<td>This procedure writes VHDL code for the Chien search portion of the RS decoder, using the algorithms specified in Section 2.3.</td>
</tr>
<tr>
<td>RenameProcess</td>
<td>This procedure renames internal signals for verification by the testbench.</td>
</tr>
<tr>
<td>RSTopProcess</td>
<td>This procedure writes VHDL code which interconnects InputProcess, MBSolverProcess and ChienSearchProcess. In addition, internal signals from the InputProcess and MBSolverProcess are brought out to be verified by the testbench, as well as the outputs of the ChienSearchProcess.</td>
</tr>
<tr>
<td>RSTopProcessSyn</td>
<td>This procedure writes VHDL code which interconnects InputProcess, MBSolverProcess and ChienSearchProcess. No other internal signals are brought out, making this the VHDL code used for synthesis for a greater than 2-error correcting RS decoder.</td>
</tr>
</tbody>
</table>
3.5 Test Bench

FILE : rs_tb.pas

There are no procedures within this file. It calls the appropriate testbench procedure within the utility file rs_utils.pas, depending on whether an RS encoder or an RS decoder has been selected. These procedures generate test vectors for the VHDL testbench. The testbench has been written in a generic manner: all that needs to be specified are the number of bits in the Galois field elements (GFPower), the size of the codeword (RS_N), and the error-correcting capability of the code (RS_T). These are written to an ASCII file, named generics.txt, which is read in by the VHDL simulator. An example of such a file is shown below:

```
assign 8 GFPower
assign 22 RS_N
assign 2 RS_T
```

Figure 10. Example of a Generics.txt File Used by the Testbench.

The structure of the testbenches is discussed in Section 6.4.

3.6 Utility Functions and Procedures

The file rs_utils.pas contains all of the utility functions and procedures used by the rest of the program. They range from basic low-level functions, such as those performing Galois field addition, multiplication and inversion, up to procedures which generate the Galois field or generate testbench vectors. The following section lists the functions and procedures in rs_utils.pas, along with a brief description.
function add(a, b : integer) : integer;  This function adds 2 Galois field elements. The Galois field must have been specified as described in Section 3.1.

function antilog(a : integer) : integer;  This function finds the antilog of an integer, a, returning the Galois field element which is \( \alpha^a \).

function bin(ix : integer; length : integer) : string;  This function returns a string of the binary representation of an integer value for a given length. This is used to generate std_logic_vector (standard logic vector) values in the VHDL code.

function gfdiv(a, b : integer) : integer;  This function returns the result of a Galois field division.

function hex(a : integer) : char;  This function returns a hex character for an integer in the range of 0 to 15.

function hex2(a : integer) : string;  This function returns 2 hex characters for an integer in the range of 0 to 255.

function inv(a : integer) : integer;  This function returns the inverse of a Galois field element.

function log(a : integer) : integer;  This function returns the logarithm of a Galois field element.

function mul(a, b : integer) : integer;  This function returns the result of a Galois field multiplication.

function rand_code : integer;  This function generates a random Galois field element.

function rand_error : integer;  This function generates a random error value.
function rand_index(N:integer) : integer;  This function generates a random index. Since an RS codeword is represented as an array of integers, this function returns an integer that points to one of the Galois field elements in the codeword.

function StrAlpha(a:integer) : string;  This function returns a string representing the power to a Galois field element. For example, for the value of \( \alpha^7 \), the function returns "a27".

procedure AddRoot(root : integer);  This procedure adds a root to an existing polynomial. This is used to generate the code generator polynomial.

procedure CalculateSyndrome(rx : codeword; var syndrome : syndrome_type);  This procedure calculates the syndrome of a codeword.

procedure ChienSearch;  This procedure performs the Chien search for a codeword. The algorithm used is discussed in section 2.3.

procedure decode (inp : codeword; var out : codeword);  This procedure decodes a codeword. First the syndrome is calculated. If there are errors, the key equation is solved using the extended inversionless Massey-Berlekamp algorithm. Finally, a Chien search is used to find the error locations, and Forney’s algorithm is used to correct the errors.

procedure encode (inp : codeword; var out : codeword);  This procedure encodes an incoming set of k Galois field elements into a (n,k) Reed-Solomon codeword. In the process, the 2t = n-k parity symbols are appended.

procedure ExtendedInversionlessMasseyBerlekamp(synd : syndrome_type);  This procedure solves the key equation from the syndromes of an incoming codeword using the extended inversionless Massey-Berlekamp algorithm, as discussed in section 2.6.3.

procedure GenerateField(GFP : integer; FP : field_polynomial_type); This procedure generates the Galois field based on the field generator polynomial. In the process, the log and antilog tables are generated. These tables are used to implement Galois field multiplication, inversion, and division.

procedure GetPolynomial; This procedure calculates the code generator polynomial using the power of the initial root, m₀, and the error correcting capability of the code, t.

procedure get_parameters(filename : string); This procedure reads in the parameters of the Reed-Solomon code, as discussed in section 3.1.

procedure inject_RS_errors(N:integer); This procedure injects a given number of errors in a codeword. This is used in generating test vectors for the RS decoder.

procedure tb_decoder; This procedure generates the input test vectors and expected values for a RS decoder.

procedure tb_encoder; This procedure generates the input test vectors and expected values for a RS encoder.

procedure VHDL_1_GFConstant(var f1 : text; power : integer); This procedure generates the VHDL code corresponding to the constant definition of a particular Galois field value.

procedure VHDL_GFConstants(var f1 : text; min_element, max_element:integer); This procedure generates the VHDL code corresponding to the constant definition of a series of sequential Galois field values.

procedure VHDL_GFConstants_0_1(var f1 : text); This procedure generates the VHDL code corresponding to the constant definition of 0 and 1.

procedure VHDL_GFConstants_from_generator_polynomial(var f1 : text);  This procedure writes VHDL comments describing the RS code generator polynomial. Each coefficient is printed out in power form.

procedure VHDL_add(var f1 : text);  This procedure generates the VHDL code for the Galois field addition of 2 elements, as discussed in Section 4.1.

procedure VHDL_mul(var f1 : text);  This procedure generates the VHDL code for the Galois field multiplication of 2 elements, as discussed in Section 4.2.

procedure VHDL_inv(var f1 : text);  This procedure generates the VHDL code for the Galois field inversion of an element, as discussed in Section 4.3.
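
As an illustration of how the table-based routines listed above fit together, the following Python sketch builds the log and antilog tables and derives multiplication, inversion and division from them. It is not the Pascal source of the program: the function bodies, the integer bit-mask encoding of p(x), and the names generate_field, ANTILOG, LOG and ORDER are assumptions made for this example, which uses GF(2^4) with p(x)=x^4+x+1.

```python
# Sketch of table-based GF(2^m) arithmetic. The names mirror the utility
# routines listed above, but the bodies are assumptions, not the thesis source.

def generate_field(m, poly):
    """Build the antilog and log tables for GF(2^m).

    poly is the field generator polynomial as an integer bit mask,
    e.g. x^4 + x + 1 -> 0b10011.
    """
    size = 1 << m
    antilog = [0] * (size - 1)
    log = [None] * size          # log[0] stays undefined
    value = 1
    for power in range(size - 1):
        antilog[power] = value
        log[value] = power
        value <<= 1              # multiply by alpha (i.e. by x)
        if value & size:         # degree reached m: reduce modulo p(x)
            value ^= poly
    return antilog, log


ANTILOG, LOG = generate_field(4, 0b10011)
ORDER = len(ANTILOG)             # 2^m - 1 nonzero elements


def add(a, b):
    return a ^ b                 # addition over GF(2) is a bitwise XOR


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return ANTILOG[(LOG[a] + LOG[b]) % ORDER]


def inv(a):
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return ANTILOG[(ORDER - LOG[a]) % ORDER]


def gfdiv(a, b):
    return mul(a, inv(b))
```

With these tables, multiplication reduces to an addition of logarithms modulo 2^m - 1, which is why the field only has to be generated once per run.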
4 VHDL Implementation of Galois Field Operations

This chapter discusses the VHDL implementation of the basic Galois field operations that are needed in either a Reed-Solomon encoder or decoder. These basic operations consist of addition, multiplication, and inversion over GF(2^m). The information required to implement these operations is the value of m, since the operations are performed over the Galois field GF(2^m), and the field generator polynomial p(x). The field generator polynomial p(x) is an irreducible polynomial of degree m, with coefficients from GF(2). The equations for each of these operations will be discussed for a general value of m and p(x), and VHDL code examples will be given for specific cases. The discussions of the VHDL code, for all of the code presented in this thesis, will feature code walkthroughs and block diagrams.

4.1 VHDL Implementation of a Galois Field Adder

Addition and subtraction are the simplest of the Galois field operations over GF(2^m). Let a and b be two elements from GF(2^m), each represented by a polynomial with m binary coefficients, that is,

\[ a = a_{m-1}x^{m-1} + \cdots + a_1x + a_0 = \sum_{i=0}^{m-1} a_i x^i, \quad \text{and} \quad b = b_{m-1}x^{m-1} + \cdots + b_1x + b_0 = \sum_{i=0}^{m-1} b_i x^i \]

Let c be the result of the addition of a and b; c is just the polynomial addition of the polynomial representations of a and b on a term-by-term basis.

\[ c = a + b = c_{m-1}x^{m-1} + \cdots + c_1x + c_0 = \sum_{i=0}^{m-1} c_i x^i = \sum_{i=0}^{m-1} (a_i + b_i) x^i \quad (3.1) \]

where the addition is performed over GF(2). The addition table for 2 elements A and B from GF(2) is shown in Table 5 below:
<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>C=A+B</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 5. Binary addition.

In terms of a Boolean equation, we see that C = A xor B, where xor is the binary exclusive-or operation. This is performed on a bitwise basis. An example of an addition function over GF(2^8) written in VHDL is shown below.

```vhdl
-- m = 8: each field element is an 8-bit vector
constant GFPower : integer := 8;
subtype Galois_Field_element is std_logic_vector(GFPower-1 downto 0);

function add (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := b(0) xor c(0);
  d(1) := b(1) xor c(1);
  d(2) := b(2) xor c(2);
  d(3) := b(3) xor c(3);
  d(4) := b(4) xor c(4);
  d(5) := b(5) xor c(5);
  d(6) := b(6) xor c(6);
  d(7) := b(7) xor c(7);
  return d;
end add;
```

Figure 11. VHDL Code for Galois Field Addition, GF(2^8)

Since we are dealing with elements from GF(2^m), subtraction and addition are equivalent, and hence the same VHDL code can be used for both addition and subtraction.

4.2 VHDL Implementation of a Galois Field Multiplier

The generation of VHDL code for the multiplication of 2 elements over GF(2^m) is accomplished by first forming the product of the polynomial representations of a and b modulo the primitive irreducible polynomial, and then converting the expression into combinatorial equations for binary addition and multiplication. Let a and b be two elements from GF(2^m), that is,

\[ a = a_{m-1}x^{m-1} + \cdots + a_1x + a_0 = \sum_{i=0}^{m-1} a_i x^i, \]
\[ b = b_{m-1}x^{m-1} + \cdots + b_1x + b_0 = \sum_{i=0}^{m-1} b_i x^i \]

Let c be the result of the multiplication of a and b, before any reduction:

\[ c = a\cdot b = c_{2m-2}x^{2m-2} + \cdots + c_1x + c_0 = \sum_{i=0}^{2m-2} c_i x^i, \quad \text{with} \quad c_i = \sum_{j+k=i,\; 0 \leq j,k \leq m-1} a_j b_k \quad (3.2) \]

where the additions and multiplications are performed over GF(2). The resulting polynomial is then reduced modulo p(x). The addition table for 2 elements A and B from GF(2) was shown previously. The multiplication table for 2 elements A and B from GF(2) is shown in Table 6 below:

<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
<th>C=AB</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table 6. Binary multiplication.

In terms of a Boolean equation, C = A and B. An example of this process will clarify the method. Consider a field GF(2^4) formed by the primitive irreducible polynomial \( p(x) = x^4 + x + 1 \). The elements of this field are shown in the following table in vector form, and both the exponential (power) and polynomial forms.
Elements from $GF(2^4)$, $p(x)=x^4+x+1$

<table>
<thead>
<tr>
<th>power</th>
<th>Polynomial</th>
<th>vector = $(a_3, a_2, a_1, a_0)$</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0 0 0 0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0 0 0 1</td>
</tr>
<tr>
<td>$\alpha$</td>
<td>$\alpha$</td>
<td>0 0 1 0</td>
</tr>
<tr>
<td>$\alpha^2$</td>
<td>$\alpha^2$</td>
<td>0 1 0 0</td>
</tr>
<tr>
<td>$\alpha^3$</td>
<td>$\alpha^3$</td>
<td>1 0 0 0</td>
</tr>
<tr>
<td>$\alpha^4$</td>
<td>$\alpha + 1$</td>
<td>0 0 1 1</td>
</tr>
<tr>
<td>$\alpha^5$</td>
<td>$\alpha^2 + \alpha$</td>
<td>0 1 1 0</td>
</tr>
<tr>
<td>$\alpha^6$</td>
<td>$\alpha^3 + \alpha^2$</td>
<td>1 1 0 0</td>
</tr>
<tr>
<td>$\alpha^7$</td>
<td>$\alpha^3 + \alpha + 1$</td>
<td>1 0 1 1</td>
</tr>
<tr>
<td>$\alpha^8$</td>
<td>$\alpha^2 + 1$</td>
<td>0 1 0 1</td>
</tr>
<tr>
<td>$\alpha^9$</td>
<td>$\alpha^3 + \alpha$</td>
<td>1 0 1 0</td>
</tr>
<tr>
<td>$\alpha^{10}$</td>
<td>$\alpha^2 + \alpha + 1$</td>
<td>0 1 1 1</td>
</tr>
<tr>
<td>$\alpha^{11}$</td>
<td>$\alpha^3 + \alpha^2 + \alpha$</td>
<td>1 1 1 0</td>
</tr>
<tr>
<td>$\alpha^{12}$</td>
<td>$\alpha^3 + \alpha^2 + \alpha + 1$</td>
<td>1 1 1 1</td>
</tr>
<tr>
<td>$\alpha^{13}$</td>
<td>$\alpha^3 + \alpha^2 + 1$</td>
<td>1 1 0 1</td>
</tr>
<tr>
<td>$\alpha^{14}$</td>
<td>$\alpha^3 + 1$</td>
<td>1 0 0 1</td>
</tr>
</tbody>
</table>

Table 7. Elements of $GF(2^4)$, $p(x)=x^4+x+1$

Let $a$ and $b$ be two elements from $GF(2^4)$, that is, $a = a_3 \cdot x^3 + a_2 \cdot x^2 + a_1 \cdot x + a_0$, and $b = b_3 \cdot x^3 + b_2 \cdot x^2 + b_1 \cdot x + b_0$. Let $c$ be the result of the multiplication of $a$ and $b$.

$$c = a \cdot b = c_6 \cdot x^6 + c_5 \cdot x^5 + c_4 \cdot x^4 + c_3 \cdot x^3 + c_2 \cdot x^2 + c_1 \cdot x + c_0,$$

with $c_i = \sum_{j+k=i,\; 0 \leq j,k \leq 3} a_j b_k$, that is,

\[
\begin{aligned}
    c_0 &= a_0 \cdot b_0 \\
    c_1 &= a_0 \cdot b_1 + a_1 \cdot b_0 \\
    c_2 &= a_0 \cdot b_2 + a_1 \cdot b_1 + a_2 \cdot b_0 \\
    c_3 &= a_0 \cdot b_3 + a_1 \cdot b_2 + a_2 \cdot b_1 + a_3 \cdot b_0 \\
    c_4 &= a_1 \cdot b_3 + a_2 \cdot b_2 + a_3 \cdot b_1 \\
    c_5 &= a_2 \cdot b_3 + a_3 \cdot b_2 \\
    c_6 &= a_3 \cdot b_3
\end{aligned}
\]

In order to reduce the polynomial \( c = a \cdot b = c_6 \cdot x^6 + c_5 \cdot x^5 + c_4 \cdot x^4 + c_3 \cdot x^3 + c_2 \cdot x^2 + c_1 \cdot x + c_0 \) modulo the primitive irreducible polynomial \( p(x) = x^4 + x + 1 \), we note the following:

1. \( x^4 = x + 1 \)
2. \( x^5 = x^2 + x \)
3. \( x^6 = x^3 + x^2 \)

Thus, the polynomial \( c \) can be reduced to \( c = (c_3 + c_6) \cdot x^3 + (c_2 + c_5 + c_6) \cdot x^2 + (c_1 + c_4 + c_5) \cdot x + (c_0 + c_4) \), or

\[
\begin{aligned}
c = {} & (a_0 \cdot b_3 + a_1 \cdot b_2 + a_2 \cdot b_1 + a_3 \cdot b_0 + a_3 \cdot b_3) \cdot x^3 \\
& + (a_0 \cdot b_2 + a_1 \cdot b_1 + a_2 \cdot b_0 + a_2 \cdot b_3 + a_3 \cdot b_2 + a_3 \cdot b_3) \cdot x^2 \\
& + (a_0 \cdot b_1 + a_1 \cdot b_0 + a_1 \cdot b_3 + a_2 \cdot b_2 + a_3 \cdot b_1 + a_2 \cdot b_3 + a_3 \cdot b_2) \cdot x \\
& + (a_0 \cdot b_0 + a_1 \cdot b_3 + a_2 \cdot b_2 + a_3 \cdot b_1)
\end{aligned}
\]

These are the bitwise combinatorial equations for multiplication over \( \text{GF}(2^4) \), with the primitive irreducible polynomial \( p(x) = x^4 + x + 1 \). The resulting VHDL code is shown below, with the binary "xor" used for bitwise addition, and the binary "and" used for bitwise multiplication:
```vhdl
constant GFPower : integer := 4;
subtype Galois_Field_element is std_logic_vector(GFPower-1 downto 0);

function mul (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- constant term: c0 + c4
  d(0) := (b(0) and c(0)) xor (b(1) and c(3)) xor (b(2) and c(2))
           xor (b(3) and c(1));
  -- coefficient of x: c1 + c4 + c5
  d(1) := (b(0) and c(1)) xor (b(1) and c(0)) xor (b(1) and c(3))
           xor (b(2) and c(2)) xor (b(3) and c(1)) xor (b(2) and c(3))
           xor (b(3) and c(2));
  -- coefficient of x^2: c2 + c5 + c6
  d(2) := (b(0) and c(2)) xor (b(1) and c(1)) xor (b(2) and c(0)) xor
           (b(2) and c(3)) xor (b(3) and c(2)) xor (b(3) and c(3));
  -- coefficient of x^3: c3 + c6
  d(3) := (b(0) and c(3)) xor (b(1) and c(2)) xor (b(2) and c(1)) xor
           (b(3) and c(0)) xor (b(3) and c(3));
  return d;
end mul;
```

Figure 12. VHDL Code for Galois Field Multiplication, GF(2^4), p(x)=x^4+x+1
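
The derivation above is mechanical, so it can be carried out by a program for any m and p(x). The following Python sketch shows one way to do this: it reduces each power x^e (e = 0 to 2m-2) modulo p(x), collects for every output bit the products a_i b_j that contribute to it, and renders the result as VHDL assignments. The function names, the bit-mask encoding of p(x), and the output formatting are assumptions made for this illustration and are not taken from the code generator of this thesis.

```python
def reduction_rows(m, poly):
    """rows[e] = bit mask of x^e reduced modulo p(x), for e = 0 .. 2m-2."""
    rows = []
    value = 1
    for _ in range(2 * m - 1):
        rows.append(value)
        value <<= 1                  # multiply by x
        if value >> m & 1:           # degree m reached: subtract p(x)
            value ^= poly
    return rows


def multiplier_terms(m, poly):
    """terms[k] = list of (i, j) such that a_i AND b_j feeds output bit k."""
    rows = reduction_rows(m, poly)
    terms = [[] for _ in range(m)]
    for i in range(m):
        for j in range(m):
            for k in range(m):
                if rows[i + j] >> k & 1:
                    terms[k].append((i, j))
    return terms


def vhdl_lines(terms):
    """Render each output bit as a VHDL assignment (operands b, c; result d)."""
    lines = []
    for k, pairs in enumerate(terms):
        body = " xor ".join(f"(b({i}) and c({j}))" for i, j in pairs)
        lines.append(f"d({k}) := {body};")
    return lines


def evaluate(terms, a, b):
    """Evaluate the bit equations on two field elements given as integers."""
    result = 0
    for k, pairs in enumerate(terms):
        bit = 0
        for i, j in pairs:
            bit ^= (a >> i & 1) & (b >> j & 1)
        result |= bit << k
    return result


TERMS = multiplier_terms(4, 0b10011)     # GF(2^4), p(x) = x^4 + x + 1
for line in vhdl_lines(TERMS):
    print(line)
```

Evaluating the generated equations on all pairs of field elements and comparing against a table-based multiplication is a simple way to cross-check a listing such as Figure 12.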

4.3 VHDL Implementation of a Galois Field Inverter

Unlike Galois field multiplication, Galois field inversion is generally difficult, if not impossible, to reduce to reasonably sized Boolean equations. The easiest approach is to list the inversion table, and let the synthesis tool generate the required equations. This results in a lookup table of 2^m words with m bits per word, where m is the degree of the field generator polynomial. Since 0 has no inverse, the table maps 0 to 0. For inversion over GF(2^4), with p(x)=x^4+x+1, the VHDL code using “case” statements is shown in Figure 13. An alternate implementation using “if-then-else” statements is shown in Figure 14. Both methods yield identical simulation results, but the “if-then-else” method yielded better synthesis results for Xilinx FPGAs.
```vhdl
-- Inversion lookup table: the inverse of alpha^i is alpha^(15-i); 0 maps to 0
function inv (b : in std_logic_vector (3 downto 0)) return std_logic_vector is
  variable d : std_logic_vector (3 downto 0);
begin
  d := "0000";
  case b is
    when "0000" => d := "0000";
    when "0001" => d := "0001";
    when "0010" => d := "1001";
    when "0011" => d := "1110";
    when "0100" => d := "1101";
    when "0101" => d := "1011";
    when "0110" => d := "0111";
    when "0111" => d := "0110";
    when "1000" => d := "1111";
    when "1001" => d := "0010";
    when "1010" => d := "1100";
    when "1011" => d := "0101";
    when "1100" => d := "1010";
    when "1101" => d := "0100";
    when "1110" => d := "0011";
    when "1111" => d := "1000";
    when others => d := "0000";
  end case;
  return d;
end inv;
```

Figure 13. VHDL Code for Galois Field Inversion, GF(2^4), p(x)=x^4+x+1

```vhdl
-- Same table as Figure 13, written as an if-then-else chain; 0 maps to 0
function inv (b : in std_logic_vector (3 downto 0)) return std_logic_vector is
  variable d : std_logic_vector (3 downto 0);
begin
  d := "0000";
  if b="0001" then d := "0001";
  elsif b="0010" then d := "1001";
  elsif b="0011" then d := "1110";
  elsif b="0100" then d := "1101";
  elsif b="0101" then d := "1011";
  elsif b="0110" then d := "0111";
  elsif b="0111" then d := "0110";
  elsif b="1000" then d := "1111";
  elsif b="1001" then d := "0010";
  elsif b="1010" then d := "1100";
  elsif b="1011" then d := "0101";
  elsif b="1100" then d := "1010";
  elsif b="1101" then d := "0100";
  elsif b="1110" then d := "0011";
  elsif b="1111" then d := "1000";
  end if;
  return d;
end inv;
```

Figure 14. Alternate VHDL Code for Galois Field Inversion, GF(2^4), p(x)=x^4+x+1
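
Because the inversion table follows directly from the field (the inverse of α^i is α^(2^m-1-i)), its entries can be produced by a short program rather than by hand. The Python sketch below computes the table for a given m and p(x) and formats each entry as a “case” alternative. The function names and the formatting are assumptions made for this illustration, and mapping 0 to 0 simply follows the convention used in Figure 13.

```python
def inverse_table(m, poly):
    """table[b] = multiplicative inverse of b in GF(2^m); table[0] = 0."""
    size = 1 << m
    antilog, value = [], 1
    for _ in range(size - 1):            # antilog[i] = alpha^i
        antilog.append(value)
        value <<= 1
        if value & size:
            value ^= poly
    table = [0] * size                   # 0 has no inverse; it is mapped to 0
    for power, element in enumerate(antilog):
        table[element] = antilog[(size - 1 - power) % (size - 1)]
    return table


def vhdl_case_lines(table, m):
    """Format every table entry as one alternative of a VHDL case statement."""
    return [f'when "{b:0{m}b}" => d := "{d:0{m}b}";' for b, d in enumerate(table)]


TABLE = inverse_table(4, 0b10011)        # GF(2^4), p(x) = x^4 + x + 1
for line in vhdl_case_lines(TABLE, 4):
    print(line)
```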
5 VHDL Design of a RS Encoder

The topic of this thesis was to write a program that would generate synthesizable VHDL code for any arbitrary Reed-Solomon encoder or decoder. The previous sections have concentrated on the theoretical aspects of encoding and decoding. One of the goals of the program was also to generate "generic" VHDL code as much as possible, and rely on the definition of constants and signals as the primary method of creating a specific encoder or decoder. This section will discuss how the different parts of a RS encoder are implemented in VHDL.

The parameters that must be entered, in order to specify an RS encoder, are:
1. the size of the extension of the Galois field GF(2) into GF(2^m), i.e., the value m.
2. the irreducible primitive polynomial used to generate the field, p(x), specified by its coefficients from GF(2).
3. the log of the initial root of the code generator polynomial, m₀.
4. the error-correcting capability of the code, t.

The number of message symbols, k, will be made programmable. In the following sections, a brief discussion of the encoder structure will be followed by a detailed exposure of the VHDL code for the various parts of the encoder.
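
To make the role of these four parameters concrete, the following Python sketch computes a code generator polynomial by multiplying in one root at a time. It assumes the usual definition g(x) = (x + α^m₀)(x + α^(m₀+1)) ... (x + α^(m₀+2t-1)); the function names and the bit-mask encoding of p(x) are choices made for this example and are not taken from the program described in this thesis.

```python
def gf_mul(a, b, m, poly):
    """Shift-and-add multiplication in GF(2^m), reducing modulo p(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m & 1:
            a ^= poly
    return result


def alpha_power(e, m, poly):
    """Return alpha^e, where alpha is the element x (bit pattern 0b10)."""
    value = 1
    for _ in range(e):
        value = gf_mul(value, 2, m, poly)
    return value


def generator_polynomial(m, poly, m0, t):
    """Coefficients of g(x), lowest degree first (the last entry is 1)."""
    g = [1]
    for i in range(2 * t):
        root = alpha_power(m0 + i, m, poly)
        nxt = [0] * (len(g) + 1)
        for d, coeff in enumerate(g):    # multiply g(x) by (x + root)
            nxt[d + 1] ^= coeff
            nxt[d] ^= gf_mul(coeff, root, m, poly)
        g = nxt
    return g


# m = 4, p(x) = x^4 + x + 1, m0 = 0 (assumed), t = 1
print(generator_polynomial(4, 0b10011, 0, 1))
```

For m = 4, p(x) = x^4 + x + 1, t = 1 and an assumed initial root m₀ = 0, the sketch gives g(x) = x^2 + α^4 x + α, whose coefficients are the field elements α and α^4.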

5.1 Encoder Overview

Figure 15 illustrates a block diagram of an RS encoder. The 2t storage elements are m-bit registers, labeled b₀ through b_{2t-1}, where again m is the degree of the field generator polynomial, and t is the error-correcting capability of the code. These are also called the parity registers. The circuit performs polynomial division of the message polynomial, m(x), shifted up by 2t positions, by the code generator polynomial, g(x). The remainder of the division, b(x), is stored in the 2t parity registers. The codeword polynomial, c(x), is the concatenation of m(x) followed by b(x).

Figure 15. Block Diagram of a Generic Reed-Solomon Encoder

The operation of the circuit, as can be found in any textbook in error control coding [10], is described as follows:

1. The initial state of the registers is 0 for each register.
2. Without loss of generality, we assume that the message is divided equally into m-bit words, where each word is now considered a Galois field element. Each m-bit word is then associated with an increasing power of x, starting with x^0=1, and ending in x^{k-1}, thus forming a polynomial, m(x), over GF(2^m). The message polynomial is of degree k-1, that is, there are k coefficients. The coefficients enter the circuit one per clock cycle, with the most significant coefficient entering first.
3. For the first k clock cycles, corresponding to when the message is entering and the remainder is being calculated, switches 1 and 2 are in position B. The encoded data thus corresponds with the message polynomial for the first k clock cycles. During these first k clock cycles, the remainder is being calculated in the registers.
4. When the message has finished entering into the encoder, switches 1 and 2 are set to position A. Since the output of switch 1 is 0, the resulting multiplications are 0, and the resulting additions, and consequently the inputs to the registers, are just the values of the preceding register. In the case of the first register, its input is 0, since it sees the output of the switch. The output of switch 2 is now the output of the last register. Thus the entire remainder polynomial, \( b(x) \), is shifted out one element at a time. Once all the register contents have been shifted out, the codeword generation is complete.
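
The four steps above can be sketched as a small software model of the shift register. The Python code below is a behavioural illustration only, not the VHDL of this thesis; the function names are hypothetical, and the example generator polynomial g(x) = x^2 + α^4 x + α over GF(2^4) assumes t = 1 and an initial root of α^0.

```python
def gf_mul(a, b, m=4, poly=0b10011):
    """Shift-and-add multiplication in GF(2^m), reducing modulo p(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m & 1:
            a ^= poly
    return result


def rs_encode(message, g):
    """Systematic encoding with a 2t-stage shift register.

    message: k symbols, most significant coefficient first.
    g: generator coefficients g_0 .. g_{2t-1} (the monic leading term is implied).
    """
    parity = [0] * len(g)                        # step 1: registers start at 0
    for symbol in message:                       # step 3: switches in position B
        feedback = symbol ^ parity[-1]           # input plus last parity register
        for i in range(len(parity) - 1, 0, -1):
            parity[i] = parity[i - 1] ^ gf_mul(feedback, g[i])
        parity[0] = gf_mul(feedback, g[0])
    return list(message) + parity[::-1]          # step 4: shift out, last register first


def poly_eval(coeffs, x):
    """Horner evaluation; coeffs are listed most significant first."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


G = [0b0010, 0b0011]      # g_0 = alpha, g_1 = alpha^4 (assumed: m0 = 0, t = 1)
codeword = rs_encode([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], G)
print(codeword)
```

A valid codeword is a multiple of g(x), so evaluating it with poly_eval at each root of g(x) (here 1 and α) gives 0, which is a convenient self-check for the model.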

5.2 VHDL Implementation of an RS Encoder

5.2.1 Encoder Timing Diagram.

Consider now the implementation of the above algorithm in VHDL. The design is partitioned into data flow and control signals. The timing diagram showing the input/output interface of the RS encoder is shown in Figure 16. A signal called input_strobe is used to start the encoding process. It is a single clock cycle pulse that precedes the message data, called data_in, by one clock cycle. The data consists of the \( k \) coefficients of the message polynomial. Another requirement for the encoder is the size of the message: this encoder can accommodate different sized messages. The message size is conveyed through the signal data_size. The timing diagram shows two control signals that are used internally, namely, do_calc and count. These will be discussed in more detail in the next section. Finally, there are two output signals, output_strobe, and data_out, whose names clearly describe their function. The signal output_strobe is active for 1 clock cycle, and it occurs 1 clock cycle before the output data.
5.2.2 Encoder Control Logic

Control of the encoder is accomplished through the `do_calc` control signal (named `DoCalc` in the VHDL code) and the counter `count`. These signals, along with the registers, are used to perform the polynomial division. The signal `input_strobe` starts the encoding process. When it arrives, the counter `count` is set to the value 1; it is then incremented every clock cycle as long as `do_calc` is '1', and otherwise it is set to the value 0. In addition, `input_strobe` causes the signal `do_calc` to be set to '1'. The signal `do_calc` is then set to '0' only when the counter `count` reaches the value prescribed by `data_size`. A separate process is used for each of these signals. The VHDL code for these two processes is shown in Figure 17 for the specific case of m=4, p(x)=x^4+x+1. Notice that the code for these two processes is written in such a manner that it can be reused for any size of Galois field element; only the initial definitions require adjustment for specific codes. Also notice that all sequential signals are coded to perform a synchronous reset when the signal `reset_n` is '0'. This is true for all sequential signals in either the encoder or decoder design. The initialization section defines the signals needed, as well as constants, types and subtypes. In particular, the size of the Galois field elements, m, is denoted by the constant `GFPower`, indicating the power of the Galois field. The subtype `Galois_Field_element` is then defined using the constant `GFPower`. The constant `RS_T` defines the error-correcting capability of the code. Other constants include the Galois field equivalents of 0 and 1, and the code generator polynomial coefficients. Finally, the parity registers are defined as an array of Galois field elements.

```vhdl
constant GFPower : integer := 4;
subtype Galois_Field_element is std_logic_vector((GFPower-1) downto 0);
constant RS_T : integer := 1;
constant zero : Galois_Field_element := "0000";
constant one : Galois_Field_element := "0001";
constant alpha : Galois_Field_element := "0010";   -- alpha, vector form as in Table 7
constant alpha4 : Galois_Field_element := "0011";  -- alpha^4 = alpha + 1
signal DoCalc : std_logic;
signal count : std_logic_vector ((GFPower-1) downto 0);
signal sum : Galois_Field_element;
type parity_reg_type is array(0 to (2*RS_T-1)) of Galois_Field_element;
signal parity_reg : parity_reg_type;

-- DoCalc: set by input_strobe, cleared when count reaches data_size
process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      DoCalc <= '0';
    elsif (input_strobe = '1') then
      DoCalc <= '1';
    elsif (count = data_size) then
      DoCalc <= '0';
    end if;
  end if;
end process;

-- count: loaded with 1 by input_strobe, incremented while DoCalc is '1'
-- (the "+" on std_logic_vector assumes an arithmetic package is in use)
process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      count <= (others => '0');
    elsif (input_strobe = '1') then
      count <= one;
    elsif (DoCalc = '1') then
      count <= count + 1;
    else
      count <= (others => '0');
    end if;
  end if;
end process;
```

Figure 17. VHDL Code of Encoder Control Signals
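
The interaction of the two control processes can be checked with a cycle-by-cycle software model. The Python sketch below mirrors the priority of the conditions in Figure 17 (reset is omitted); it assumes that `input_strobe` is asserted in cycle 0 only and that `data_size` fits in the counter without wrapping. The function name and the trace format are hypothetical.

```python
def simulate_control(data_size, cycles):
    """Return a list of (cycle, input_strobe, do_calc, count) tuples.

    Each tuple shows the register values visible during that clock cycle;
    the next-state values are computed from them, as in a clocked process.
    """
    do_calc, count = 0, 0
    trace = []
    for cycle in range(cycles):
        input_strobe = 1 if cycle == 0 else 0
        trace.append((cycle, input_strobe, do_calc, count))

        # DoCalc process: set by the strobe, cleared when count = data_size
        if input_strobe:
            next_do_calc = 1
        elif count == data_size:
            next_do_calc = 0
        else:
            next_do_calc = do_calc

        # count process: load 1 on the strobe, increment while DoCalc is 1
        if input_strobe:
            next_count = 1
        elif do_calc:
            next_count = count + 1
        else:
            next_count = 0

        do_calc, count = next_do_calc, next_count
    return trace


for row in simulate_control(data_size=3, cycles=7):
    print(row)
```

In this model the control signal is high for exactly `data_size` clock cycles, starting one cycle after the strobe, which matches the requirement that the strobe precede the message data by one clock cycle.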
5.2.3 Parity Registers and Output Signals

The encoder consists of a set of \(2t\) parity registers, each being \(m\) bits wide. When the signal `input_strobe` arrives, it sets the parity registers to 0. Then, while the `do_calc` signal is '1', the parity registers perform the required multiplications and additions using the combinatorial signal `sum`, which is the sum of the input data and the last parity register, as shown in Figure 15. This action is the heart of the polynomial division process. When the division is complete, the parity registers are shifted out one at a time. The VHDL code for the single error correcting code is shown in Figure 18.

```vhdl
-- Feedback term: last parity register plus the incoming symbol
process (parity_reg, data_in)
begin
  sum <= add(parity_reg(2*RS_T-1), data_in);
end process;

process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      for i in 0 to (2*RS_T-1) loop
        parity_reg(i) <= (others => '0');
      end loop;
    elsif (input_strobe = '1') then
      for i in 0 to (2*RS_T-1) loop
        parity_reg(i) <= (others => '0');
      end loop;
    elsif (DoCalc = '1') then
      -- division step; coefficient assignment inferred from the constant names
      parity_reg(1) <= add(parity_reg(0), mul(sum, alpha4));
      parity_reg(0) <= mul(sum, alpha);
    else
      -- division complete: shift the remainder out
      for i in (2*RS_T-1) downto 1 loop
        parity_reg(i) <= parity_reg(i-1);
      end loop;
      parity_reg(0) <= (others => '0');
    end if;
  end if;
end process;
```

Figure 18. VHDL Code of Encoder Parity Registers

Figure 19 shows the VHDL code for the output data (`data_out`), and the signal `output_strobe`. The output data is set to the input data when the signal `do_calc` is '1', otherwise it is set to the output of the last parity register. This has the effect of concatenating the message polynomial with the parity elements, thus forming the complete codeword. The signal `output_strobe` is simply a delayed version of the input strobe, thus satisfying the requirement of the output interface.
```vhdl
-- data_out: message symbols while DoCalc is '1', parity symbols afterwards
process(clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      data_out <= (others => '0');
    elsif (DoCalc='1') then
      data_out <= data_in;
    else
      data_out <= parity_reg((2*RS_T-1));
    end if;
  end if;
end process;

-- output_strobe: input_strobe delayed by one clock cycle
process(clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      output_strobe <= '0';
    else
      output_strobe <= input_strobe;
    end if;
  end if;
end process;
```

Figure 19. VHDL Code of Encoder Output Signals

The complete VHDL code for one RS encoder is shown in Appendix A. This includes the entity-architecture definitions as well as the functional code.
6 Synthesis and Test Results for RS Encoders

6.1 General Remarks

This chapter discusses the synthesis results for RS encoders in terms of speed and area. The technology that was chosen for synthesis is that of the Xilinx Virtex series of Field Programmable Gate Arrays (FPGAs), specifically the XCV1000E. This series of FPGAs is a good candidate for the implementation of communications-related algorithms. The parts have an equivalent gate count of 1 million gates, and register-to-register speeds of well over 150 MHz. The speed is given by the maximum clock frequency in MHz, while the area is measured by the number of slices. Slices are the basic building blocks of the FPGA, each consisting of 2 flip-flops and 2 4-input LUTs for implementing combinatorial logic. The choice of synthesis target is arbitrary; in fact, ASICs or other FPGAs, such as Actel FPGAs, are also valid targets. Thus, it is not the absolute performance that is to be considered, although the suggested target technology must be capable of implementing a typical RS encoder or decoder. Rather, it is the relative performance that will be discussed, taking into perspective the parameters of the particular encoder/decoder.

RS encoder designs were synthesized for different error-correcting ability codes across 3 values of the degree of the field generator polynomial, m, namely, 4, 6 and 8. The results are discussed in the next 2 sections.

6.2 RS Encoder Synthesis Speed Results

Several RS encoder designs were chosen as candidate designs. The error-correcting capability ranged from 1 to 7. This was applied to codes from GF(2^4), GF(2^6), and GF(2^8). The synthesis flow consisted of the Synplify synthesis tool (from Synplicity) and Xilinx software for the final place-and-route and timing analysis. The speed results are tabulated in Table 8 and shown graphically in Figure 20.
Table 8. Maximum Encoder Speed (MHz) vs. Error Correcting Ability

<table>
<thead>
<tr>
<th rowspan="2">Error-correcting ability</th>
<th colspan="2">$GF(2^4)$</th>
<th colspan="2">$GF(2^6)$</th>
<th colspan="2">$GF(2^8)$</th>
</tr>
<tr>
<th>Clock speed (MHz)</th>
<th>Bit rate (Mbps)</th>
<th>Clock speed (MHz)</th>
<th>Bit rate (Mbps)</th>
<th>Clock speed (MHz)</th>
<th>Bit rate (Mbps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>192.5</td>
<td>770.0</td>
<td>188.4</td>
<td>1130.4</td>
<td>169.2</td>
<td>1353.6</td>
</tr>
<tr>
<td>2</td>
<td>168.1</td>
<td>672.5</td>
<td>168.1</td>
<td>1008.7</td>
<td>140.4</td>
<td>1123.1</td>
</tr>
<tr>
<td>3</td>
<td>211.1</td>
<td>844.6</td>
<td>168.1</td>
<td>1008.7</td>
<td>140.4</td>
<td>1123.1</td>
</tr>
<tr>
<td>4</td>
<td>196.4</td>
<td>785.5</td>
<td>140.4</td>
<td>842.3</td>
<td>140.4</td>
<td>1123.1</td>
</tr>
<tr>
<td>5</td>
<td>185.8</td>
<td>743.2</td>
<td>157.0</td>
<td>941.9</td>
<td>159.6</td>
<td>1276.5</td>
</tr>
<tr>
<td>6</td>
<td>168.1</td>
<td>672.5</td>
<td>159.6</td>
<td>957.4</td>
<td>159.6</td>
<td>1276.5</td>
</tr>
<tr>
<td>7</td>
<td>180.3</td>
<td>721.1</td>
<td>159.6</td>
<td>957.4</td>
<td>149.4</td>
<td>1194.9</td>
</tr>
</tbody>
</table>

Figure 20. Reed-Solomon Encoder Maximum Speed

The results show that, for a given field generator polynomial, the maximum speed does not vary systematically with the error-correcting capability: the clock speeds in Table 8 stay between roughly 140 and 211 MHz with no consistent trend. There is also very little variation with respect to the field generator polynomial, although it can be said that, on average, a code with a larger field generator polynomial runs slightly slower than one with a smaller field generator polynomial. This can be attributed to the fact that the Galois field multipliers become increasingly complex and, as a result, slower with larger field generator polynomials. It can be seen that encoder bit rates of $10^9$ bits per second are achievable.
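
The bit rates in Table 8 are consistent with one m-bit symbol being processed per clock cycle, that is, bit rate = clock speed × m. The short Python check below illustrates this relation for the clock speeds of the t = 1 row, using m = 4, 6 and 8 as stated in Section 6.1. The relation is inferred from the tabulated values, and the function name is hypothetical.

```python
def bit_rate_mbps(clock_mhz, m):
    """Bit rate in Mbps when one m-bit symbol is processed per clock cycle."""
    return clock_mhz * m


# (clock speed in MHz, m) pairs taken from the t = 1 row of Table 8
for clock, m in [(192.5, 4), (188.4, 6), (169.2, 8)]:
    print(f"GF(2^{m}): {clock} MHz -> {bit_rate_mbps(clock, m):.1f} Mbps")
```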

### 6.3 RS Encoder Synthesis Area Results

The area results are tabulated in Table 9 and shown graphically in Figure 21. The results show that area grows roughly linearly with the error-correcting capability. This is to be expected, since all that is being added for increasing error-correcting capability is the number of parity registers. The encoder area also increases with the degree of the field generator polynomial, again as a consequence of ever more complex Galois field multipliers.

<table>
<thead>
<tr>
<th>Error-correcting ability</th>
<th>$\text{GF}(2^4)$</th>
<th>$\text{GF}(2^6)$</th>
<th>$\text{GF}(2^8)$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>15</td>
<td>24</td>
<td>32</td>
</tr>
<tr>
<td>2</td>
<td>25</td>
<td>47</td>
<td>67</td>
</tr>
<tr>
<td>3</td>
<td>31</td>
<td>59</td>
<td>84</td>
</tr>
<tr>
<td>4</td>
<td>36</td>
<td>64</td>
<td>102</td>
</tr>
<tr>
<td>5</td>
<td>41</td>
<td>78</td>
<td>112</td>
</tr>
<tr>
<td>6</td>
<td>46</td>
<td>82</td>
<td>132</td>
</tr>
<tr>
<td>7</td>
<td>74</td>
<td>97</td>
<td>149</td>
</tr>
</tbody>
</table>

Table 9. Encoder Area (in slices) vs. Error Correcting Ability
6.4 Encoder Testbench

The performance of the encoders described above was verified with a VHDL testbench. The testbench has the structure shown in Figure 22. The clock and stimulus generator provides the 4 input signals to the encoder. The clock clk is just a square wave of 10 MHz. The other 3 signals are generated by reading an ASCII text file containing the logic levels for the signals. The encoding algorithm was written in a high level language, which was then used to generate the stimulus test vectors (input_strobe, data_size, and data_in), and the expected vectors. The response verifier reads the ASCII text file corresponding to the expected values of the output signal data_out, and compares it to actual output data. Any discrepancy is flagged, and an error message is displayed on the simulator. Figure 23 shows a typical encoder simulation, showing inputs, outputs, and internal control signals. The VHDL code for the encoder testbench is shown in Appendix E.
**Figure 22. RS Encoder Testbench Structure**

![Testbench Structure Diagram]

**Figure 23. RS Encoder Simulation Waveforms**

![RS Encoder Simulation Waveforms]

- clk: Clock signal
- reset_n: Reset signal
- input_strobe: Strobe signal for input
- data_size: Size of data being input
- data_in: Input data
- output_strobe: Strobe signal for output
- data_out: Output data
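
The comparison performed by the response verifier can be illustrated with a short Python sketch. The one-vector-per-line text format and the name `compare_vectors` are assumptions made for illustration only; the actual testbench is the VHDL code in Appendix E.

```python
def compare_vectors(expected_lines, actual_lines):
    """Return (index, expected, actual) for every vector that differs."""
    mismatches = []
    for index, (expected, actual) in enumerate(zip(expected_lines, actual_lines)):
        if expected.strip() != actual.strip():
            mismatches.append((index, expected.strip(), actual.strip()))
    return mismatches


# Hypothetical expected and simulated data_out vectors (7-bit symbols).
expected = ["0000001", "0101100", "1110000"]
actual = ["0000001", "0101110", "1110000"]

for index, exp, act in compare_vectors(expected, actual):
    print(f"ERROR at vector {index}: expected {exp}, got {act}")
```

Each mismatch is reported with its vector index, which mirrors the way the VHDL response verifier flags a discrepancy during simulation.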
7 VHDL Design of a General RS Decoder (1 or 2 errors)

7.1 Decoder Overview

As discussed in Section 2.4, a major simplification is possible if the error-correcting capability of the code is either 1 or 2: the key equation solver is removed, and the Chien search block becomes simpler. The syndrome calculation is still required. The syndrome calculation and Chien search blocks are discussed below.

7.2 Syndrome Calculation

The syndrome calculation for an RS decoder capable of correcting 1 or 2 errors is identical to that for 3 or more errors, except that fewer syndrome values are needed, since the error-correcting capability is correspondingly lower. A 1-error-correcting code requires 2 syndrome values, while a 2-error-correcting code requires 4. A block for a generic syndrome value, $S(j)$, is shown in Figure 24; it implements the syndrome equations of Section 2.2.

![Figure 24. Syndrome Calculation Block Diagram](image-url)
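
As an illustration of the syndrome block, the following Python sketch evaluates the received word at the roots alpha^(m0+j) by Horner's rule, one symbol per step, which is what a multiply-accumulate register does in hardware. It is a behavioural model only, not the thesis code. The field polynomial x^7 + x^3 + 1 is an assumption; it was chosen because it reproduces the constants alpha33 to alpha36 of the RS(22,18) example. The helper names `gf_mul`, `alpha_pow` and `syndromes` are hypothetical.

```python
M = 7                      # symbols are elements of GF(2^7)
PRIM_POLY = 0b10001001     # x^7 + x^3 + 1 (assumed field polynomial)
ORDER = (1 << M) - 1       # 127 non-zero field elements


def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def alpha_pow(exponent):
    """Return alpha^exponent, where alpha is the element 0000010."""
    value = 1
    for _ in range(exponent % ORDER):
        value = gf_mul(value, 2)
    return value


def syndromes(received, m0, t):
    """Evaluate the received word at alpha^(m0+j) for j = 0 .. 2t-1."""
    values = []
    for j in range(2 * t):
        root = alpha_pow(m0 + j)
        s = 0
        for symbol in received:      # first symbol is the highest-degree term
            s = gf_mul(s, root) ^ symbol
        values.append(s)
    return values


# All-zero codeword of length N = 22 with one error injected at x^4.
received = [0] * 22
received[21 - 4] = 0b0010110
s = syndromes(received, m0=33, t=2)
errors_present = any(value != 0 for value in s)
```

For a single error of value e at x^p, each syndrome reduces to e multiplied by alpha^((m0+j)p), which gives a simple way to check the model.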
7.2.1 Syndrome Calculation Timing Diagram

Figure 25 shows the timing diagram for a syndrome calculation block. An input strobe called rs_data_in_start occurs at the start of the N symbols of rs_data. When all N rs_data symbols have been received, the syndrome calculation is complete. An output strobe called syndrome_calc_done occurs immediately after the last data symbol has entered. One clock cycle later, the values of the 2t syndromes and an active-high status signal called errors_present are ready.

![Syndrome Calculation Timing Diagram]

Figure 25. Syndrome Calculation Timing Diagram

7.2.2 Syndrome Calculation Constants, Signals, and Control Logic

A number of constants and signals are needed in the syndrome calculation block. Figure 26 shows the VHDL code for an RS(22,18) code using GF(2^7). This code is specific to that example, but the generic case is discussed as needed. First, the value m=7 is defined as the constant GFPower. This is then used to define a subtype called Galois_Field_element, which is used to define all of the data-related signals, namely the syndrome values IntS. The name IntS is used because these are the internal syndrome values. The constants for N, t, and m0 are defined next. A constant called Two_T_minus_1 is defined and will be used in the functional code later. The constants for 0 and 1 (zero and one) are defined next, followed by the constants that are specific to the RS code. In this case, the error-correcting capability is t=2. Thus 4 constants are required in the syndrome calculation: the powers of alpha from 33 to 36, since m0=33. A counter IntCount is required to control the syndrome calculation. Its size is based on the largest possible count value, which is proportional to N. All of the other signals listed are needed in the syndrome calculation block, and can be written in the manner shown for any particular RS code.

\begin{verbatim}
-- Field size: symbols are elements of GF(2^GFPower)
constant GFPower : integer := 7;
subtype Galois_Field_element is std_logic_vector((GFPower-1) downto 0);
-- Code parameters for RS(22,18): t, N and m0
constant RS_T : integer := 2;
constant RS_N : integer := 22;
constant RS_M0 : integer := 33;
constant Two_T_minus_1 : integer := 2*RS_T-1;
constant zero : Galois_Field_element := "0000000";
constant one : Galois_Field_element := "0000001";
-- Roots of the generator polynomial: alpha^m0 to alpha^(m0+2t-1)
constant alpha33 : Galois_Field_element := "0001100";
constant alpha34 : Galois_Field_element := "0011000";
constant alpha35 : Galois_Field_element := "0110000";
constant alpha36 : Galois_Field_element := "1100000";
type IntS_type is array(0 to Two_T_minus_1) of Galois_Field_element;
type IntEP_type is array(0 to Two_T_minus_1) of std_logic;
type state_type is (Idle, RxData, XferData);
signal present_state : state_type;
signal IntCount : std_logic_vector (4 downto 0);
signal internal_errors_present : std_logic;
signal CountEnable : std_logic;
signal CountReset : std_logic;
signal StartXfer : std_logic;
signal XferSyndrome : std_logic;
signal DoCalc : std_logic;
signal IntS : IntS_type;
signal IntEP : IntEP_type;
\end{verbatim}

Figure 26. Syndrome Calculation - VHDL Constant and Signal Definition

Figure 27 shows the VHDL code for the control logic of the syndrome calculation block for the example given above. A state machine controls the data flow; its state is held in present_state. The state machine also controls a counter, IntCount, using the control signals CountEnable and CountReset. The data flow is controlled through the signals DoCalc and XferSyndrome. Initially, the state machine is in the Idle state, and all control signals are in their inactive (low) state, except for CountReset, which holds the counter in reset.

When the input strobe rs_data_in_start arrives, the state machine switches to the RxData state. In this state, the DoCalc signal is active (high), enabling the syndrome calculation, and CountEnable is active, incrementing the counter every clock cycle. All other control signals are inactive. When the counter reaches the value 19 (decimal), which equals N - 3, the StartXfer signal is set to 1. On the next clock cycle, the state machine changes to the XferData state. During this state, the output strobe syndrome_calc_done is active for 1 clock cycle. The state machine then returns to the Idle state immediately thereafter. Note that the VHDL code is written so as to minimize the changes needed for specific RS codes. Indeed, the only line of code that needs changing is the check for the maximum counter value.
InputControlSD_Idle : process (clk)
begin
if (clk'event and clk = '1') then
if (reset_n = '0') then
  -- Synchronous reset: hold the counter in reset and wait in Idle
  XferSyndrome <= '0'; DoCalc <= '0';
  CountReset <= '1'; CountEnable <= '0';
  present_state <= Idle;
else
  case present_state is
  when Idle =>
    -- Start accumulating when the first symbol arrives
    if (rs_data_in_start = '1') then
      DoCalc <= '1'; CountReset <= '0';
      CountEnable <= '1'; present_state <= RxData;
    else
      present_state <= Idle;
    end if;
  when RxData =>
    -- Stay here until the counter comparison raises StartXfer
    if (StartXfer = '1') then
      XferSyndrome <= '1'; DoCalc <= '0';
      CountReset <= '1'; CountEnable <= '0';
      present_state <= XferData;
    else
      present_state <= RxData;
    end if;
  when XferData =>
    -- Output strobe lasts one clock, then return to Idle
    XferSyndrome <= '0'; DoCalc <= '0';
    CountReset <= '1'; CountEnable <= '0';
    present_state <= Idle;
  when others =>
    XferSyndrome <= '0'; DoCalc <= '0';
    CountReset <= '1'; CountEnable <= '0';
    present_state <= Idle;
  end case;
end if;
end if;
end process;

process (clk)
begin
if (clk'event and clk = '1') then
  if (((reset_n = '0') or (CountReset = '1'))) then
    IntCount <= (others => '0');
  elsif (CountEnable = '1') then
    IntCount <= IntCount + 1;
  end if;
end if;
end process;

process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    StartXfer <= '0';
  elsif (IntCount = "10011") then -- RS(22,18) : max_count = 19
    StartXfer <= '1';
  else
    StartXfer <= '0';
  end if;
end if;
end process;
syndrome_calc_done <= XferSyndrome;

Figure 27. Syndrome Calculation - VHDL Code For Control Logic
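
The choice of N - 3 as the maximum count can be checked with a cycle-level Python model of the three registered processes above. This is a sketch, not part of the design: the function name `simulate_control` and the convention that cycle 0 carries the first data symbol are assumptions made for illustration.

```python
def simulate_control(n_cycles=30, start_cycle=0, max_count=19):
    """Return the cycles in which syndrome_calc_done is high."""
    state = "Idle"
    xfer = do_calc = count_enable = start_xfer = 0
    count_reset = 1
    count = 0
    done_cycles = []
    for cycle in range(n_cycles):
        if xfer:
            done_cycles.append(cycle)
        start = 1 if cycle == start_cycle else 0

        # Next-state values, all computed from the current register values.
        n_state, n_xfer, n_do = state, xfer, do_calc
        n_reset, n_enable = count_reset, count_enable
        if state == "Idle":
            if start:
                n_do, n_reset, n_enable, n_state = 1, 0, 1, "RxData"
        elif state == "RxData":
            if start_xfer:
                n_xfer, n_do, n_reset, n_enable = 1, 0, 1, 0
                n_state = "XferData"
        else:  # XferData
            n_xfer, n_do, n_reset, n_enable = 0, 0, 1, 0
            n_state = "Idle"

        if count_reset:
            n_count = 0
        elif count_enable:
            n_count = (count + 1) % 32      # 5-bit counter
        else:
            n_count = count
        n_start_xfer = 1 if count == max_count else 0

        # Clock edge: every register updates together.
        state, xfer, do_calc = n_state, n_xfer, n_do
        count_reset, count_enable = n_reset, n_enable
        count, start_xfer = n_count, n_start_xfer
    return done_cycles


done = simulate_control()
```

With the first symbol in cycle 0 and N = 22, the strobe appears in cycle 22, that is, in the cycle immediately after the last symbol. The three extra cycles come from the registered counter comparison, the registered StartXfer flag, and the registered state transition.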
7.2.3 Syndrome Registers

Figure 29 shows the VHDL code for the syndrome registers and for determining whether an error is present. The intermediate syndrome values are stored in the IntS register array. On reset, they are set to zero. When the input strobe `rs_data_in_start` arrives, these registers are set to the input data. Thereafter, while the control signal `DoCalc` is 1, the syndromes are calculated as described previously. When the control signal `XferSyndrome` is 1, i.e., at the end of the input data, the output syndrome registers and the `errors_present` control bit are set to their correct values. The signal `errors_present` is not used by the Chien search block, but is an output for the testbench. Unused outputs are removed during synthesis, so that no resources are spent on unwanted signals. Once again, note that only a few lines differ depending on the RS code.

Figure 28 shows the simulation waveforms for the example cited above.

![Simulation Waveforms](image)

**Figure 28. Syndrome Calculation Simulation Waveforms**
Figure 29. Syndrome Calculation - VHDL Code for Syndrome Registers
7.3 Chien Search and Error Correction for 1 Error

This section will describe the details of the VHDL implementation of the Chien search and the error correction for an RS decoder capable of correcting 1 error. The relevant theory has been described previously in Section 2.4. The input/output timing diagram of the Chien search block is shown in Figure 30. The internal signals are not shown here. Refer to the simulation waveforms below to see their relationship with the input and output signals.

![Figure 30. Chien Search Timing Diagram – 1 error](image)

7.3.1 Constants and Signal Definition

The value of m in GF(2^m) is defined as the integer GFPower. The subtype Galois_Field_element is then defined from GFPower. The constants for the error-correcting capability, RS_T, the encoded message size in symbols, RS_N, and the log of the initial root of the code generator polynomial, RS_M0, are defined as integers. These are then used to define the remaining constants. The delay through the Chien search pipeline is defined next. It has a value of 3, since the delay starts from 0 and ends at 3, giving a 4-clock delay, as shown in Figure 30. This delay, together with RS_N, defines the size SR_max of the delay block shift_register_delay. A counter, Count, is used to aid the generation of control signals. The maximum count values for the control signals are defined as constants. The signals Count, CountEnable, InitChien, InitChien_d1, DoChien, and OutputEnable are control signals, while the remaining signals are related to the data path.

```
constant GFPower : integer := 6;
subtype Galois_Field_element is std_logic_vector(GFPower-1 downto 0);
constant RS_T : integer := 1;
constant RS_N : integer := 22;
constant RS_M0 : integer := 55;
-- Pipeline stages 0 to 3, i.e. a 4-clock delay
constant ChienSearch_pipeline_delay : integer := 3;
-- Last index of the data delay line
constant SR_max : integer := RS_N + ChienSearch_pipeline_delay;
constant OutputEnableStartCount : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(ChienSearch_pipeline_delay-1, GFPower+1);
constant RS_Data_out_StartCount : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(ChienSearch_pipeline_delay, GFPower+1);
-- The output stays enabled for RS_N symbols after the start count
constant OutputEnableStopCount : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay-1, GFPower+1);
constant DoChienCountMax : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay-2, GFPower+1);
constant CountMax : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay, GFPower+1);
constant zero : Galois_Field_element := "000000";
constant one : Galois_Field_element := "000001";
constant Temp1_initial_value : Galois_Field_element := "111010";
constant x0a_mul : Galois_Field_element := "101110";
constant alpha1 : Galois_Field_element := "000010";
constant x0a_initial_value : Galois_Field_element := "111010";

type shift_register_delay_type is array(0 to SR_max) of Galois_Field_element;

signal shift_register_delay : shift_register_delay_type;
signal RS_data_delayed : Galois_Field_element;
signal CORR_FACTOR : Galois_Field_element;
signal Count : std_logic_vector(GFPower downto 0);
signal CountEnable : std_logic;
signal InitChien : std_logic;
signal InitChien_d1 : std_logic;
signal DoChien : std_logic;
signal OutputEnable : std_logic;
signal Inv_S0 : Galois_Field_element;
signal x0a : Galois_Field_element;
signal Temp1 : Galois_Field_element;
type syndrome_type is array(0 to 2*RS_T-1) of Galois_Field_element;
signal S : syndrome_type;
```

Figure 31. Chien Search For 1 Error – VHDL Constants and Signals

7.3.2 Control Signals

Figure 33 shows the code for the control signals. A counter, Count, is set to 0 when the input strobe StartChien is active. Thereafter, it increments every clock cycle. When it reaches the constant CountMax, the enable to the counter is removed, and the counter stops counting. The signal StartChien is a version of the input strobe delayed by 1 clock, and is only used to control the counter. The signal InitChien is a version of StartChien delayed by 1 clock, and is used to initialize the data path. The signal OutputEnable enables the output data while the data stream is leaving; otherwise the output data is set to 0. The signal DoChien is used to control the data path as well. As can be seen, these 2 control signals are controlled by the input strobe and the counter value. Note that the code is written so that no change is required for different RS codes; the parameterization is accomplished through the use of constants.

The code for the output strobe is shown in Figure 32. The output strobe is controlled by the counter, and is set to 1 for only one clock period. It is timed to coincide with the first output data symbol.

```vhdl
RS_Data_out_Start_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out_Start <= '0';
    elsif (Count = RS_Data_out_StartCount) then
      RS_Data_out_Start <= '1';
    else
      RS_Data_out_Start <= '0';
    end if;
  end if;
end process;
```

Figure 32. Chien Search For 1 Error - Output Strobe
System_Counter : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      Count <= (others => '0');
    elsif (CountEnable = '1') then
      Count <= Count + 1;
    end if;
  end if;
end process;

System_Count_Enable : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CountEnable <= '0';
    elsif (StartChien = '1') then
      CountEnable <= '1';
    elsif (Count = CountMax) then
      CountEnable <= '0';
    end if;
  end if;
end process;

InitChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      InitChien <= '0';
      InitChien_d1 <= '0';
    else
      InitChien <= StartChien;
      InitChien_d1 <= InitChien;
    end if;
  end if;
end process;

OutputEnable_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      OutputEnable <= '0';
    elsif (Count = OutputEnableStartCount) then
      OutputEnable <= '1';
    elsif (Count = OutputEnableStopCount) then
      OutputEnable <= '0';
    end if;
  end if;
end process;

DoChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (Count = DoChienCountMax)) then
      DoChien <= '0';
    elsif (InitChien_d1 = '1') then
      DoChien <= '1';
    end if;
  end if;
end process;

Figure 33. Chien Search For 1 Error - Internal Control Signals
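
The relationship between the counter constants and the output window can be checked with a small cycle-level Python model of the counter, OutputEnable and output strobe processes. This is a sketch under stated assumptions: a pipeline delay of 3, N = 22, and a stop count of N + delay - 1 (the form used for the 2-error constants in Figure 38). The function name `simulate_output_control` is hypothetical.

```python
def simulate_output_control(n=22, delay=3, n_cycles=40):
    """Return (cycles with OutputEnable high, cycles with the strobe high)."""
    start_count = delay - 1          # OutputEnableStartCount
    stop_count = n + delay - 1       # OutputEnableStopCount (assumed form)
    strobe_count = delay             # RS_Data_out_StartCount
    count_max = n + delay            # CountMax

    count = count_enable = output_enable = strobe = 0
    enable_cycles, strobe_cycles = [], []
    for cycle in range(n_cycles):
        if output_enable:
            enable_cycles.append(cycle)
        if strobe:
            strobe_cycles.append(cycle)
        start_chien = 1 if cycle == 0 else 0

        # Next values, computed from the current register contents.
        if start_chien:
            n_count = 0
        elif count_enable:
            n_count = (count + 1) % 128     # (GFPower+1)-bit counter
        else:
            n_count = count
        if start_chien:
            n_enable = 1
        elif count == count_max:
            n_enable = 0
        else:
            n_enable = count_enable
        if count == start_count:
            n_output = 1
        elif count == stop_count:
            n_output = 0
        else:
            n_output = output_enable
        n_strobe = 1 if count == strobe_count else 0

        # Clock edge.
        count, count_enable = n_count, n_enable
        output_enable, strobe = n_output, n_strobe
    return enable_cycles, strobe_cycles


enable_cycles, strobe_cycles = simulate_output_control()
```

Under these assumptions OutputEnable is high for exactly N = 22 cycles. Because the output data register is loaded one clock after OutputEnable goes high, the strobe, which rises one cycle after OutputEnable, coincides with the first output symbol.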
7.3.3 Delay Block

Figure 34 shows the VHDL code for the data delay. The required delay is equal to the sum of the delay through the Syndrome_Calculation block (N) and the Chien search pipeline delay. This length determines the constant SR_max. The delay is implemented as an array of back-to-back registers, indexed from 0 to SR_max. Input data is sent to array element 0, and the delayed data is taken from array element SR_max. Note that the code is written so that no change is required for different RS codes; the parameterization is accomplished through the use of constants.

```
Delay_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      for i in 0 to SR_max loop
        shift_register_delay(i) <= zero;
      end loop;
    else
      for i in SR_max downto 1 loop  -- element 0 is loaded from RS_Data_In below
        shift_register_delay(i) <= shift_register_delay(i-1);
      end loop;
      shift_register_delay(0) <= RS_Data_In;
    end if;
  end if;
end process;
RS_data_delayed <= shift_register_delay(SR_max);
```

Figure 34. Chien Search For 1 Error - Data Delay
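
The behaviour of the delay block can be sketched in Python with a fixed-length queue standing in for the register array. The function name `delay_line` and the sample values are assumptions for illustration; the point is that SR_max + 1 registers delay the data by SR_max + 1 clocks.

```python
from collections import deque


def delay_line(samples, sr_max, fill=0):
    """Model registers 0..sr_max clocked together; return the delayed stream."""
    registers = deque([fill] * (sr_max + 1), maxlen=sr_max + 1)
    delayed = []
    for sample in samples:
        delayed.append(registers[-1])   # element SR_max drives the output
        registers.appendleft(sample)    # clock edge: shift, then load element 0
    return delayed


# N = 22 and a pipeline delay of 3 give SR_max = 25.
samples = list(range(1, 41))
out = delay_line(samples, sr_max=25)
```

After reset the output is zero for SR_max + 1 = 26 cycles, and the input stream then reappears unchanged.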

7.3.4 Chien Search, Error Calculation and Error Correction

A block diagram of the data flow of the Chien search block is shown in Figure 35. The corresponding VHDL code is shown in Figure 36. When StartChien arrives, the Temp1 register is loaded with the initial value \( \alpha^{-(N-1)} \), the inverse of the present location. It is then multiplied by the syndrome S(1) on the next clock cycle (InitChien = 1), and by the inverse of the syndrome S(0) on the following clock cycle, when InitChien_d1 is 1. Every clock cycle thereafter, the Temp1 register is multiplied by the constant \( \alpha \). In this manner, every location is stepped through. When the location X = S(1)/S(0) is processed, the value of Temp1 is the symbol corresponding to 1. In this manner, the location of the error is found.
The register x0a is part of the error correction calculation. When the InitChien_d1 signal is 1, it is set to its initial value of $\alpha^{-(N-1) \cdot m_0}$. Every clock cycle thereafter, its value is multiplied by $\alpha^{m_0}$. The register x0a is multiplied by the syndrome $S(0)$ to yield a stream of correction factors of the form $Y_i = S(0) \cdot X_i^{-m_0}$, as specified in Section 2.4. The correction factor is applied to the delayed data only when the Chien search has found the error, i.e., when Temp1 is 1. This results in corrected data being sent out of the decoder.

Figure 35. Chien Search For 1 Error - Data Flow Block Diagram
S(1) <= syndromes(11 downto 6);
S(0) <= syndromes(5 downto 0);

inv_S0_process : process (S)
begin
  inv_S0 <= inv(S(0));
end process;

-- Temp1 steps through the locations; it equals one at the error location
Calc_Temp_Registers : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Temp1 <= zero;
    elsif (StartChien = '1') then
      Temp1 <= Temp1_initial_value;
    elsif (InitChien = '1') then
      Temp1 <= mul(S(1), Temp1);
    elsif (InitChien_d1 = '1') then
      Temp1 <= mul(inv_S0, Temp1);
    elsif (DoChien = '1') then
      Temp1 <= mul(alpha1, Temp1);
    end if;
  end if;
end process;

-- x0a tracks X^(-m0) for the location currently being tested
x0a_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      x0a <= zero;
    elsif (InitChien_d1 = '1') then
      x0a <= x0a_initial_value;
    else
      x0a <= mul(x0a, x0a_mul);
    end if;
  end if;
end process;

Calc_Correction_Factor : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CORR_FACTOR <= zero;
    elsif (Rs_enable = '1') and (Temp1 = one) then
      CORR_FACTOR <= mul(S(0), x0a);
    else
      CORR_FACTOR <= zero;
    end if;
  end if;
end process;

Correct_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Rs_data_out <= zero;
    elsif (OutputEnable = '1') then
      Rs_data_out <= add(CORR_FACTOR, RS_data_delayed);
    else
      Rs_data_out <= zero;
    end if;
  end if;
end process;

Figure 36. Chien Search For 1 Error – Chien Search, and Correction
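
The single-error data path can be checked against a purely behavioural Python model. The sketch below ignores pipelining and control timing and only reproduces the arithmetic: Temp1 starts at alpha^-(N-1) * S(1)/S(0) and is multiplied by alpha per symbol, while x0a starts at alpha^(-(N-1)*m0) and is multiplied by alpha^m0. The field polynomial x^6 + x + 1 is an assumption; it reproduces the bit patterns "101110" (x0a_mul) and "111010" (Temp1 and x0a initial values) for N = 22 and m0 = 55. All function names are hypothetical.

```python
M = 6
PRIM_POLY = 0b1000011      # x^6 + x + 1 (assumed field polynomial)
ORDER = (1 << M) - 1       # 63


def gf_mul(a, b):
    """Multiply two GF(2^6) elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def gf_pow(a, exponent):
    """Raise a non-zero element to an integer power (negative allowed)."""
    value = 1
    for _ in range(exponent % ORDER):
        value = gf_mul(value, a)
    return value


def syndromes(received, m0, count):
    """Horner evaluation at alpha^(m0+j); first symbol is the highest degree."""
    values = []
    for j in range(count):
        root = gf_pow(2, m0 + j)
        s = 0
        for symbol in received:
            s = gf_mul(s, root) ^ symbol
        values.append(s)
    return values


def correct_single_error(received, m0=55):
    """Step through every location and correct at most one symbol."""
    n = len(received)
    s0, s1 = syndromes(received, m0, 2)
    if s0 == 0:
        return list(received)                       # nothing to correct
    temp1 = gf_mul(gf_pow(2, -(n - 1)), gf_mul(s1, gf_pow(s0, -1)))
    x0a = gf_pow(2, -(n - 1) * m0)
    x0a_mul = gf_pow(2, m0)
    corrected = []
    for symbol in received:
        correction = gf_mul(s0, x0a) if temp1 == 1 else 0
        corrected.append(symbol ^ correction)
        temp1 = gf_mul(temp1, 2)                    # next location
        x0a = gf_mul(x0a, x0a_mul)
    return corrected


# All-zero codeword of length 22 with one corrupted symbol.
received = [0] * 22
received[5] = 0b101011
corrected = correct_single_error(received)
```

The all-zero word is used because it is a codeword of every linear code, so no encoder is needed to exercise the model.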
7.4 Chien Search and Error Correction for 2 Errors

This section will describe the details of the VHDL implementation of the Chien search and the error correction for an RS decoder capable of correcting 2 errors. The input/output timing diagram of the Chien search block is shown in Figure 37. The internal signals are not shown here. Refer to the simulation waveforms below to see their relationship with the input and output signals.

![Timing Diagram](image)

**Figure 37. Chien Search Timing Diagram – 2 errors**

7.4.1 Constants and Signal Definition

The code for the VHDL constants is shown in Figure 38, while the signal definitions are shown in Figure 39. As before, the constant GFPower is defined, along with constants for \(m_0\) (RS_M0), N (RS_N) and t (RS_T). The ChienSearch_pipeline_delay is defined next. These basic constants are then used to define the constants for the control signals (CountMax to RS_Data_out_StartCount). The remaining constants, Temp1_mul_factor to zero, are related to the data flow logic. This is where the characteristics specific to an RS code required in the Chien search are defined.
constant GFPower : integer := 7;
-- RS(22,18) over GF(2^7); the bit patterns below correspond to these values
constant RS_M0 : integer := 33;
constant RS_N : integer := 22;
constant RS_T : integer := 2;
constant ChienSearch_pipeline_delay : integer := 7;
constant SR_max : integer := RS_N + ChienSearch_pipeline_delay;
subtype Galois_Field_element is std_logic_vector(GFPower-1 downto 0);
type shift_register_delay_type is array (0 to SR_max) of Galois_Field_element;
type syndrome_type is array (0 to 2*RS_T-1) of Galois_Field_element;
constant CountMax : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N + ChienSearch_pipeline_delay, GFPower+1);
constant DoChienCountMax : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N + ChienSearch_pipeline_delay-1, GFPower+1);
constant OutputEnableStartCount : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(ChienSearch_pipeline_delay-1, GFPower+1);
constant OutputEnableStopCount : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N + ChienSearch_pipeline_delay-1, GFPower+1);
constant RS_Data_out_StartCount : std_logic_vector(GFPower downto 0) :=
  CONV_STD_LOGIC_VECTOR(ChienSearch_pipeline_delay, GFPower+1);
constant Temp1_mul_factor : Galois_Field_element := "0111000"; -- alpha^(-(RS_N-1))
constant Temp2_mul_factor : Galois_Field_element := "0011010"; -- alpha^(-2*(RS_N-1))
constant alpha1 : Galois_Field_element := "0000010";
constant alpha2 : Galois_Field_element := "0000100";
constant one : Galois_Field_element := "0000001";
constant x0a_initial_value : Galois_Field_element := "0101101"; -- alpha^(-(RS_N-1)*RS_M0)
constant x0a_mul : Galois_Field_element := "0001100"; -- alpha^(RS_M0)
constant x1_initial_value : Galois_Field_element := "1101101"; -- alpha^(RS_N-1)
constant x1_mul : Galois_Field_element := "1000100"; -- alpha^(-1)
constant x1m_initial_value : Galois_Field_element := "1101110"; -- alpha^((RS_N-1)*RS_M0)
constant x1m_mul : Galois_Field_element := "0011110"; -- alpha^(-RS_M0)
constant zero : Galois_Field_element := "0000000";

Figure 38. Chien Search For 2 Errors – VHDL Constants
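
The code-specific bit patterns in Figure 38 are powers of alpha, so they can be regenerated by a few lines of Python. The sketch below is a cross-check rather than the generator used in the thesis. It assumes the field polynomial x^7 + x^3 + 1 and the RS(22,18) parameters of Section 7.2.2 (N = 22, m0 = 33); under these assumptions it reproduces the patterns listed for the Temp1 and Temp2 multiplication factors, x0a, x1_initial_value and x1m. The helper names are hypothetical.

```python
M = 7
PRIM_POLY = 0b10001001     # x^7 + x^3 + 1 (assumed field polynomial)
ORDER = (1 << M) - 1       # 127


def gf_mul(a, b):
    """Multiply two GF(2^7) elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def alpha_pow(exponent):
    """Return alpha^exponent; negative exponents wrap modulo 127."""
    value = 1
    for _ in range(exponent % ORDER):
        value = gf_mul(value, 2)
    return value


def bits(value):
    """Format a field element as a 7-character VHDL bit string."""
    return format(value, "07b")


N, M0 = 22, 33             # assumed code parameters
constants = {
    "Temp1_mul_factor": alpha_pow(-(N - 1)),
    "Temp2_mul_factor": alpha_pow(-2 * (N - 1)),
    "x0a_initial_value": alpha_pow(-(N - 1) * M0),
    "x0a_mul": alpha_pow(M0),
    "x1_initial_value": alpha_pow(N - 1),
    "x1m_initial_value": alpha_pow((N - 1) * M0),
    "x1m_mul": alpha_pow(-M0),
}

for name, value in constants.items():
    print(f'constant {name} : Galois_Field_element := "{bits(value)}";')
```

A generator of this kind only needs N, m0 and the field polynomial, which is why these constants are the only part of the Chien search code that changes from one RS code to another.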

The signals shown in Figure 39 can be separated into two categories. The control logic signals are those from Count to there_is_one_error. The data flow signals are those from RS_data_delayed to x1m.
signal Count : std_logic_vector(GFPower downto 0);
signal CountEnable : std_logic;
signal DoChien : std_logic;
signal InitChien : std_logic;
signal InitChien_d1 : std_logic;
signal InitChien_d2 : std_logic;
signal InitChien_d3 : std_logic;
signal InitChien_d4 : std_logic;
signal InitChien_d5 : std_logic;
signal OutputEnable : std_logic;
signal there_are_two_errors : std_logic;
signal there_is_one_error : std_logic;
signal RS_data_delayed : Galois_Field_element;
signal CORR_FACTOR : Galois_Field_element;
signal CORR_FACTOR_1_error : Galois_Field_element;
signal CORR_FACTOR_2_errors : Galois_Field_element;
signal ChienSum : Galois_Field_element;
signal D1 : Galois_Field_element;
signal D2 : Galois_Field_element;
signal Temp1 : Galois_Field_element;
signal Temp1_num_1_error : Galois_Field_element;
signal Temp1_num_2_errors : Galois_Field_element;
signal Temp2 : Galois_Field_element;
signal Temp2_num_2_errors : Galois_Field_element;
signal denom : Galois_Field_element;
signal inv_D2 : Galois_Field_element;
signal inv_denom : Galois_Field_element;
signal inv_s0 : Galois_Field_element;
signal num : Galois_Field_element;
signal num_d1 : Galois_Field_element;

Figure 39. Chien Search For 2 Errors – VHDL Signals

7.4.2 Control Signals

Figure 40 shows the VHDL code for some internal control signals. A counter, Count, is reset when the signal StartChien is 1, i.e., at the beginning. The enable for the counter is also set to 1 (active) when StartChien is 1. The counter increments every clock cycle as long as the enable is active. The enable is deactivated when the counter has reached the CountMax value defined in the constants section. Figure 41 shows the VHDL code that determines whether there is a single error, a double error, or no error. The result follows the discussion in Section 2.5. Figure 42 shows the code for the control signal OutputEnable and the output strobe RS_Data_out_Start. Both of these signals are controlled by the counter and the constants defined previously. Note that the code for all of these control signals is completely general through the use of constants.
System_Counter : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      Count <= (others => '0');
    elsif (CountEnable = '1') then
      Count <= Count + 1;
    end if;
  end if;
end process;

System_Count_Enable : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CountEnable <= '0';
    elsif (StartChien = '1') then
      CountEnable <= '1';
    elsif (Count = CountMax) then
      CountEnable <= '0';
    end if;
  end if;
end process;

InitChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      InitChien <= '0';
      InitChien_d1 <= '0';
      InitChien_d2 <= '0';
      InitChien_d3 <= '0';
      InitChien_d4 <= '0';
      InitChien_d5 <= '0';
    else
      InitChien <= StartChien;
      InitChien_d1 <= InitChien;
      InitChien_d2 <= InitChien_d1;
      InitChien_d3 <= InitChien_d2;
      InitChien_d4 <= InitChien_d3;
      InitChien_d5 <= InitChien_d4;
    end if;
  end if;
end process;

DoChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (Count = DoChienCountMax)) then
      DoChien <= '0';
    elsif (InitChien_d4 = '1') then
      DoChien <= '1';
    end if;
  end if;
end process;

Figure 40. Chien Search Internal Control Signals
```vhdl
Error_Count_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      there_is_one_error <= '0';
      there_are_two_errors <= '0';
    elsif ((InitChien = '1') and (IsNotZero(D1)='1')) then
      there_is_one_error <= '1';
      there_are_two_errors <= '0';
    elsif ((InitChien_D1 = '1') and (IsNotZero(D2)='1')) then
      there_is_one_error <= '0';
      there_are_two_errors <= '1';
    end if;
  end if;
end process;
```

Figure 41. Single/Double Error Determination

```vhdl
OutputEnable_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      OutputEnable <= '0';
    elsif (Count = OutputEnableStartCount) then
      OutputEnable <= '1';
    elsif (Count = OutputEnableStopCount) then
      OutputEnable <= '0';
    end if;
  end if;
end process;
```

```vhdl
RS_Data_out_Start_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out_Start <= '0';
    elsif (Count = RS_Data_out_StartCount) then
      RS_Data_out_Start <= '1';
    else
      RS_Data_out_Start <= '0';
    end if;
  end if;
end process;
```

Figure 42. Output Control Signals

7.4.3 Delay Block

The VHDL code for the delay block for a 2-error correcting RS code is identical to that of the 1-error correcting code, except that the Chien search pipeline delay is larger (7). This makes the constant SR_max correspondingly larger. The VHDL code is shown in Figure 34.
7.4.4 Chien Search, Error Calculation and Error Correction

Figure 43 shows the code for the calculation of the determinants $D_1$ and $D_2$, as well as the inverse of $D_2$, as defined in section 2.5. Only the code for the remapping of the syndromes bus and the determinant $D_1$ require modification for other RS codes. However, this change is relatively trivial. The calculation of these values is predicated by their position in the overall pipeline, using the control signals defined above.

```vhdl
syndrome(3) <= syndromes(27 downto 21);
syndrome(2) <= syndromes(20 downto 14);
syndrome(1) <= syndromes(13 downto 7);
syndrome(0) <= syndromes(6 downto 0);

D1 <= syndromes(6 downto 0);

Calculate_D2 : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      D2 <= zero;
    elsif (InitChien = '1') then
      D2 <= add(mul(syndrome(0), syndrome(2)), mul(syndrome(1), syndrome(1)));
    end if;
  end if;
end process;

Calculate_inv_D2 : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      inv_D2 <= zero;
    elsif (InitChien_d1 = '1') then
      inv_D2 <= inv(D2);
    end if;
  end if;
end process;
```

**Figure 43. Determinant Calculation**
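
The way the determinants separate the no-error, single-error and double-error cases can be illustrated with a short Python model. It is a sketch under assumptions: the field polynomial x^7 + x^3 + 1, m0 = 33, and the hypothetical helpers `syndromes_of` (which builds syndromes directly from a list of error positions and values) and `count_errors`. D1 = S(0) and D2 = S(0)S(2) + S(1)^2 follow the VHDL code above.

```python
M = 7
PRIM_POLY = 0b10001001     # x^7 + x^3 + 1 (assumed field polynomial)
ORDER = (1 << M) - 1


def gf_mul(a, b):
    """Multiply two GF(2^7) elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def alpha_pow(exponent):
    """Return alpha^exponent."""
    value = 1
    for _ in range(exponent % ORDER):
        value = gf_mul(value, 2)
    return value


def syndromes_of(errors, m0=33, count=4):
    """Syndromes of an all-zero codeword hit by {degree: error value}."""
    values = []
    for j in range(count):
        s = 0
        for degree, value in errors.items():
            s ^= gf_mul(value, alpha_pow((m0 + j) * degree))
        values.append(s)
    return values


def count_errors(s0, s1, s2):
    """Classify the error pattern from the determinants D1 and D2."""
    d1 = s0
    d2 = gf_mul(s0, s2) ^ gf_mul(s1, s1)
    if d2 != 0:
        return 2           # the later D2 test overrides the D1 test
    if d1 != 0:
        return 1
    return 0


none = count_errors(*syndromes_of({})[:3])
single = count_errors(*syndromes_of({4: 0b0010110})[:3])
double = count_errors(*syndromes_of({4: 0b0010110, 17: 0b1000001})[:3])
```

For a single error D2 vanishes identically, while for two errors at distinct locations it is non-zero, which is why the D2 test is given priority.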

Figure 44 shows the VHDL code for the calculation of the error correction factors, according to the equations defined in Section 2.5. The calculations for both the single error and the double error cases are shown. The x0a_process calculates the single error correction factor $Y_1 = S(0) \cdot X_1^{-m_0}$. This is done by first initializing the x0a register to $S(0) \cdot \alpha^{-(N-1) \cdot m_0}$, which corresponds to the single error correction factor for the first position of the Chien search. Every clock cycle thereafter, the register is multiplied by $\alpha^{m_0}$, thus generating a sequence of single error correction factors, one for each location.
The x1_process, x1m_process and CORR_FACTOR_2_errors_process show the code for calculating the correction factors for double errors, as described in Section 2.5, namely

\[ Y_1 = \frac{S(0) \cdot (X_1 + \sigma_1) + S(1)}{X_1^{m_0} \cdot \sigma_1} \quad (6.1) \]

The x1_process generates successive values of \( X_1 \). This is done by initializing the x1 register to \( \alpha^{N-1} \), and then multiplying the register by \( \alpha^{-1} \) every clock cycle thereafter. This produces the sequence \( \alpha^{N-1}, \alpha^{N-2}, \alpha^{N-3}, \ldots, \alpha^{1}, \alpha^{0} \). In a similar manner, the x1m_process generates successive values of \( X_1^{m_0} \). The register x1m is initialized to \( \alpha^{(N-1) m_0} \), and is then multiplied by \( \alpha^{-m_0} \) every clock cycle.

The CORR_FACTOR_2_errors_process completes the calculation of the double error correction factor. In parallel, it calculates the numerator \( S(0) \cdot (X_1 + \sigma_1) + S(1) \) and the denominator \( X_1^{m_0} \cdot \sigma_1 \). The denominator is then inverted and multiplied by the numerator to yield the value \( Y_1 \), as defined above.

Figure 45 shows the code for choosing which error correction factor to use, single or double. This is done by examining the two control signals there_is_one_error and there_are_two_errors, whose names describe their function. The generation of these two control signals is based on the values of the determinants \( D_1 \) and \( D_2 \), and is shown in Figure 41. The correction factor is then applied when the Chien sum is 0, that is, when an error location has been found. As can be seen, the VHDL code written for these processes is completely generic, and does not require modification for other RS codes.
```vhdl
x0a_process : process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    x0a <= zero;
  elsif (InitChien_d5 = '1') then
    x0a <= mul(x0a_initial_value, syndrome(0));
  else
    x0a <= mul(x0a, x0a_mul);
  end if;
end if;
end process;
CORR_FACTOR_1_error <= x0a;

x1_process : process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    x1 <= zero;
  elsif (InitChien_d2 = '1') then
    x1 <= x1_initial_value;
  else
    x1 <= mul(x1, x1_mul);
  end if;
end if;
end process;

x1m_process : process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    x1m <= zero;
  elsif (InitChien_d2 = '1') then
    x1m <= x1m_initial_value;
  else
    x1m <= mul(x1m, x1m_mul);
  end if;
end if;
end process;

CORR_FACTOR_2_errors_process : process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    num <= zero;
    num_d1 <= zero;
    denom <= zero;
    inv_denom <= zero;
    CORR_FACTOR_2_errors <= zero;
  else
    num <= add(mul(add(x1, sigma1), syndrome(0)), syndrome(1));
    num_d1 <= num;
    denom <= mul(x1m, sigma1);
    inv_denom <= inv(denom);
    CORR_FACTOR_2_errors <= mul(num_d1, inv_denom);
  end if;
end if;
end process;
```

Figure 44. Correction Factor Calculation
Figure 45. Error Correction

Figure 46 and Figure 47 show the VHDL code that implements the actual Chien search. In Figure 46, the calculation of the coefficients of the error-locator polynomial is shown. The Chien search uses two registers, Temp1 and Temp2, to perform the root test. For the single error case, Temp2 is set to zero, while Temp1 is set to \( \frac{S(1)}{S(0)} \alpha^{-(N-1)} \), since the error-locator polynomial is linear. For the double error case, Temp2 is set to \( \frac{S(2)^2 + S(1) \cdot S(3)}{S(0) \cdot S(2) + S(1)^2} \alpha^{-2(N-1)} \), while Temp1 is set to \( \frac{S(0) \cdot S(3) + S(1) \cdot S(2)}{S(0) \cdot S(2) + S(1)^2} \alpha^{-(N-1)} \), since the error-locator polynomial is quadratic. These equations are those derived in Section 2.5. The Temp1 register is then multiplied by \( \alpha \), and Temp2 by \( \alpha^2 \), every clock cycle. Finally, the Chien sum is formed by adding the registers Temp1, Temp2 and the constant one. An error location is found when the Chien sum is 0.
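The Temp1/Temp2 recurrence can be modelled in software as follows. This is a sketch under stated assumptions rather than the thesis implementation: GF(2^8) with field polynomial 0x11D, the first tested location being \( \alpha^{N-1} \), and a simplified one-error/two-error decision based on whether S(0)·S(2) + S(1)^2 is zero (the hardware uses the determinants \( D_1 \) and \( D_2 \)). All function names are hypothetical.

```python
# Behavioural sketch of the one/two-error Chien search in GF(2^8).
# Assumption: field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[255 - LOG[a]]          # a must be non-zero


def alpha(e):
    return EXP[e % 255]               # also handles negative exponents


def syndromes(errors, m0, count):
    """S(j) = sum over errors of Y * X^(m0 + j), with X = alpha^position."""
    out = []
    for j in range(count):
        s = 0
        for position, y in errors:
            s ^= mul(y, alpha(position * (m0 + j)))
        out.append(s)
    return out


def chien_search_two_errors(S, n):
    """Return error positions, assuming exactly one or two errors occurred."""
    det = mul(S[0], S[2]) ^ mul(S[1], S[1])
    if det == 0:                      # single error: linear locator
        sigma1 = mul(S[1], inv(S[0]))
        sigma2 = 0
    else:                             # double error: quadratic locator
        d = inv(det)
        sigma1 = mul(mul(S[0], S[3]) ^ mul(S[1], S[2]), d)
        sigma2 = mul(mul(S[2], S[2]) ^ mul(S[1], S[3]), d)
    temp1 = mul(sigma1, alpha(-(n - 1)))
    temp2 = mul(sigma2, alpha(-2 * (n - 1)))
    found = []
    for clock in range(n):
        chien_sum = 1 ^ temp1 ^ temp2     # one + Temp1 + Temp2
        if chien_sum == 0:
            found.append(n - 1 - clock)   # locations are visited from N-1 down
        temp1 = mul(temp1, alpha(1))      # Temp1 * alpha every clock
        temp2 = mul(temp2, alpha(2))      # Temp2 * alpha^2 every clock
    return found


# Hypothetical patterns for a code of length N = 22 with m0 = 1.
TWO = chien_search_two_errors(syndromes([(17, 0x5A), (3, 0xC3)], 1, 4), 22)
ONE = chien_search_two_errors(syndromes([(9, 0x21)], 1, 4), 22)
print(TWO, ONE)
```

Because the registers start at the highest location and are advanced by \( \alpha \) and \( \alpha^2 \), the positions are reported in descending order, matching the order in which the delayed data symbols leave the decoder.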

Figure 46. Chien Search Initialization Values Calculation
Figure 47. Chien Search Process

Figure 48 shows the simulation waveforms of a Chien search for 1 error, while Figure 49 shows the waveforms for 2 errors.
Figure 49. Simulation Waveforms – 2 errors
8 VHDL Design of a General RS Decoder (3 or more errors)

8.1 Decoder Overview
A generic RS decoder for 3 or more errors can be broken down into three major sections, namely the syndrome calculation block, the key equation solver, for which the extended inversionless Massey-Berlekamp algorithm is chosen, and finally, the Chien search. The exact algorithmic description has been given in Sections 2.2, 2.3, and 2.6. The implementation of the syndrome calculation block in VHDL has been discussed in detail in Section 7.2 and will not be repeated here.

8.2 Key Equation Solver
The VHDL implementation of the extended inversionless Massey-Berlekamp algorithm is described in this section. Figure 50 shows the timing diagram. The algorithm starts when the input strobe XferSyndrome arrives. The outputs of this block are the coefficients of the error-locator polynomial (lambda_poly) and the error-evaluator polynomial (omega_poly). The latency is (2t+1) clock cycles, where t is the error correcting capability of the RS code.

Figure 50. Extended Inversionless Massey-Berlekamp Timing Diagram

Figure 51 shows the VHDL code for the constants, the signal definitions, and three functions: convolution_term_mul, is_not_0 and is_0. The value of m, for GF(2^m), is defined as the integer GFPower. The subtype Galois_Field_element is then defined from GFPower. The constants for the error correcting capability, RS_T, the encoded message size in symbols, RS_N, and the log of the initial root of the code generator polynomial, m0, are defined as integers. These are then used to define the remaining constants. The control logic related signals are those listed from current_state to StoreNewPolys. The data flow related signals are those listed from N to Convolution_Term_Multiplier.

The function convolution_term_mul is used to help in the calculation of the signal delta, as will be described below. The function is_not_0 is used to determine if a Galois field element is not 0. This is done by OR'ing all of the bits of the element. The function is_0 is simply the negation of is_not_0.

Figure 52 shows the state machine that generates the control signals to perform the iterations of the extended inversionless Massey-Berlekamp algorithm. It generates four control logic signals: Initialize, StoreNewPolys, StartChien, and CountEnable. This state machine implements the looping from 0 to N-1 for the extended inversionless Massey-Berlekamp algorithm through the use of the counter N. Also shown is the mapping of the internal array Mu onto a single bus called lambda_poly for the error-locator polynomial coefficients, along with the mapping of the internal array Zomega onto a single bus called omega_poly for the error-evaluator polynomial coefficients.
Figure 51. Constant, Signal Definitions, and Functions
process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Initialize <= '0'; StoreNewPolys <= '0';
      StartChien <= '0'; CountEnable <= '0';
      current_state <= Idle;
    else
      case current_state is
        when Idle =>
          if (XferSyndrome = '1') then
            Initialize <= '1'; StoreNewPolys <= '0';
            StartChien <= '0'; CountEnable <= '0';
            current_state <= Init;
          else
            current_state <= Idle;
          end if;
        when ChienSearchStart =>
          Initialize <= '0'; StoreNewPolys <= '0';
          StartChien <= '0'; CountEnable <= '0';
          current_state <= Idle;
        when Init =>
          if (ErrorsPresent = '0') then
            Initialize <= '0'; StoreNewPolys <= '0';
            StartChien <= '0'; CountEnable <= '1';
            current_state <= Synchronize;
          else
            Initialize <= '0'; StoreNewPolys <= '1';
            StartChien <= '0'; CountEnable <= '1';
            current_state <= Update_Polys;
          end if;
        when Synchronize =>
          if (N = "0111") then
            Initialize <= '0'; StoreNewPolys <= '0';
            StartChien <= '1'; CountEnable <= '0';
            current_state <= ChienSearchStart;
          else
            current_state <= Synchronize;
          end if;
        when Update_Polys =>
          if (N = "0111") then
            Initialize <= '0'; StoreNewPolys <= '0';
            StartChien <= '1'; CountEnable <= '0';
            current_state <= ChienSearchStart;
          else
            current_state <= Update_Polys;
          end if;
        when others =>
          Initialize <= '0'; StoreNewPolys <= '0';
          StartChien <= '0'; CountEnable <= '0';
          current_state <= Idle;
      end case;
    end if;
  end if;
end process;

omega_poly <= ZOmega(4) & ZOmega(3) & ZOmega(2) & ZOmega(1);
lambda_poly <= Mu(4) & Mu(3) & Mu(2) & Mu(1) & Mu(0);

Figure 52. Control State Machine
Figure 53 shows the VHDL code for the calculation of the signal delta. This corresponds to equation 2.31, and is accomplished in four processes. The RS code chosen in this example is a 4-error correcting code; hence, the signal L, which controls the length of the summation, can range from 0 to 4. First, the SReg array, which is 3t elements long, is loaded with 0 for the first t elements, and with the 2t syndrome elements for the remainder. All 5 convolution terms are formed from the Mu and the SReg arrays. These terms are then multiplied by the 5 convolution term multipliers, which depend on the value of the signal L, as shown in the third process, called Convolution_Term_Multiplier_process. This results in the Post_Convolution_Term array, which is 5 elements long; some of its elements are 0, and the others correspond to the convolution terms. These 5 terms are then added to yield the delta signal. The SReg array is shifted by one position every iteration of the algorithm, thus forming the convolution from a different starting point.
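The masked convolution described above can be sketched in software as follows. This is an illustrative model only: it assumes GF(2^8) with field polynomial 0x11D, assumes that SReg shifts towards index 0 on every iteration, and assumes that the multiplier for term i is 1 exactly when i ≤ L. The function names are hypothetical.

```python
# Behavioural sketch of the delta (discrepancy) data path for t = 4.
# Assumption: GF(2^8) with field polynomial 0x11D.
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


T = 4  # error correcting capability (RS_T)


def load_sreg(syndrome_list):
    """Initialize: t zeros followed by the 2t syndromes (3t elements)."""
    return [0] * T + list(syndrome_list)


def shift_sreg(sreg):
    """StoreNewPolys: shift towards index 0 and fill the top with zero."""
    return sreg[1:] + [0]


def delta(sreg, mu, length):
    """Sum of the convolution terms enabled by the multiplier mask."""
    mask = [1 if i <= length else 0 for i in range(T + 1)]
    terms = [mul(sreg[T - i], mu[i]) for i in range(T + 1)]
    total = 0
    for term, enable in zip(terms, mask):
        if enable:
            total ^= term             # Galois field addition is XOR
    return total


# Hypothetical syndrome and Mu values, used only to exercise the data path.
S = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
MU = [0x01, 0x0B, 0x2C, 0x4D, 0x6E]
sreg = load_sreg(S)
deltas = []
for k in range(2 * T):
    deltas.append(delta(sreg, MU, min(k, T)))
    sreg = shift_sreg(sreg)
print([hex(d) for d in deltas])
```

After k shifts, SReg(t-i) holds S(k-i), or zero when k-i is negative, so the enabled terms add up to the partial convolution of Mu with the syndromes ending at S(k).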

Figure 54 shows the VHDL code for the variables L and N and for the Gamma array. The process L_process implements equations 2.34 and 2.37. The process TwoL_process generates the value of 2L that is needed in the L_process; it is simply a multiply-by-2 function. The Gamma_process implements equations 2.35 and 2.38. The signal KeepOldL is used by both the Gamma_process and the L_process. It is set to '1' if \( \delta^{(k)} = 0 \) or \( 2L^{(k)} > k \); otherwise it is set to '0'. The N_Process implements the counter from 0 to (2t-1) = 7 via the CountEnable control signal from the state machine described earlier.

Figure 55 shows the code for the Lambda_Process and the B_process. The Lambda_process implements equations 2.33 and 2.36, while the B_process implements equations 2.41 and 2.42, where the variable has been renamed. Figure 56 shows the VHDL code for the Zomega_process, which implements equation 2.40. The GammaMu_process, DeltaLambda_process and the Mu_process implement equation 2.32. This corresponds to the error-locator polynomial. When there is no error, as indicated by the input signal ErrorsPresent, all of the coefficients of the error-locator polynomial are set to 0 except the first coefficient, which is set to 1. This prevents the Chien search from finding a root, and thus the data is never changed. Figure 57 shows the simulation waveforms for the extended inversionless Massey-Berlekamp algorithm.

```
SReg_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to (3*RS_T-1) loop
        SReg(i) <= zero;
      end loop;
    elsif (Initialize = '1') then
      for i in 0 to (RS_T-1) loop
        SReg(i) <= zero;
      end loop;
      for i in 0 to (2*RS_T-1) loop
        SReg(i+RS_T) <= syndrome_poly((((i+1)*GFPower)-1) downto (i*GFPower));
      end loop;
    elsif (StoreNewPolys = '1') then
      for i in 0 to (3*RS_T-2) loop
        SReg(i) <= SReg(i+1);  -- shift towards index 0; SReg(i-1) would be out of range at i = 0
      end loop;
      SReg((3*RS_T-1)) <= zero;
    end if;
  end if;
end process SReg_Process;

Convolution_Term_Process : process (SReg, Mu)
begin
  for i in 0 to RS_T loop
    Convolution_Term(i) <= mul(SReg(RS_T-i), Mu(i));
  end loop;
end process Convolution_Term_Process;

Convolution_Term_Multiplier_Process : process (L)
begin
  case L is
    -- One enable bit per convolution term: terms 0 to L are kept
    when "0000" => Convolution_Term_Multiplier <= "00001";
    when "0001" => Convolution_Term_Multiplier <= "00011";
    when "0010" => Convolution_Term_Multiplier <= "00111";
    when "0011" => Convolution_Term_Multiplier <= "01111";
    when others => Convolution_Term_Multiplier <= "11111";
  end case;
end process Convolution_Term_Multiplier_Process;

Post_Convolution_Term_Process : process (Convolution_Term_Multiplier, Convolution_Term)
begin
  for i in 0 to RS_T loop
    Post_Convolution_Term(i) <= convolution_term_mul(Convolution_Term(i), Convolution_Term_Multiplier(i));
  end loop;
end process Post_Convolution_Term_Process;

Delta_Process : process (Post_Convolution_Term)
begin
  delta <= add(Post_Convolution_Term(0),
               add(Post_Convolution_Term(1),
                add(Post_Convolution_Term(2),
                 add(Post_Convolution_Term(3), Post_Convolution_Term(4)))));
end process Delta_Process;
```

Figure 53. Delta Process
Figure 54. Gamma and Variables L and N Process
Lambda_Process : process (clk) begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to RS_T loop
        Lambda(i) <= (others=>'0');
      end loop;
    elsif (Initialize = '1') then
      Lambda(0) <= one;
      for i in 1 to RS_T loop
        Lambda(i) <= zero;
      end loop;
    else
      for i in 0 to RS_T loop
        Lambda(i) <= Mu(i);
      end loop;
    end if;
  end if;
end process Lambda_Process;

B_Process : process (clk) begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to RS_T loop
        B(i) <= (others=>'0');
      end loop;
    elsif (Initialize = '1') then
      B(0) <= one;
      for i in 1 to RS_T loop
        B(i) <= zero;
      end loop;
    else
      for i in 0 to RS_T loop
        B(i) <= ZOmega(i);
      end loop;
    end if;
  end if;
end process B_Process;

Figure 55. Lambda and Polynomial B Process
Figure 56. Z Omega and Polynomial Mu Process
Figure 57. Extended Inversionless Massey-Berlekamp Simulation Waveforms
8.3 Chien Search and Error Correction

When the error correcting capability of an RS code is greater than two, the calculation of the correction factor requires evaluation of the derivative of the error-locator polynomial and of the error-evaluator polynomial. The Chien search requires the evaluation of the error-locator polynomial itself. These evaluations are best handled by polynomial evaluation pipelines. This section discusses the pipelines that are needed and their sizes. The pipelines for the derivative of the error-locator polynomial and for the error-evaluator polynomial are called the EVAL_LAMBDA_PRIME and EVAL_OMEGA pipelines in the VHDL code. The Chien search pipeline is called the EVAL_CHIENSUM pipeline in the VHDL code.

Given a t-error correcting RS code, the error-locator polynomial, \( \Lambda(x) \), and the error-evaluator polynomial, \( \Omega(x) \), obtained from the extended inversionless Massey-Berlekamp algorithm are defined as follows:

\[
\Lambda(x) = \sum_{i=0}^{t} \Lambda_i \cdot x^i \tag{7.1}
\]

and

\[
\Omega(x) = \sum_{i=0}^{t-1} \Omega_i \cdot x^i \tag{7.2}
\]

The derivative of the error-locator polynomial, \( \Lambda'(x) \), is used in Forney's algorithm to compute the error correction factor. Because the field has characteristic 2, only the odd-indexed coefficients survive differentiation:

\[
\Lambda'(x) = \sum_{i=0}^{k-1} \Lambda_{2i+1} \cdot x^{2i} \tag{7.3}
\]

for \( t = 2k \) (that is, \( t \) is even), and

\[
\Lambda'(x) = \sum_{i=0}^{k} \Lambda_{2i+1} \cdot x^{2i} \tag{7.4}
\]

for \( t = 2k + 1 \) (that is, \( t \) is odd).

The EVAL_LAMBDA_PRIME and EVAL_OMEGA pipeline stages used to compute the error term must therefore have a different relationship to one another depending on whether t is even or odd.

Specifically, for t odd, \( t = 2k + 1 \), the EVAL_OMEGA pipeline is of length \( t-1 = 2k \), while the EVAL_LAMBDA_PRIME pipeline is of length \( t-3 \); with one stage for the K factor multiplication and one stage for the inversion, this brings the total to \( t-1 \) stages for the denominator. However, since the EVAL_LAMBDA_PRIME pipeline starts one pipeline stage after the EVAL_OMEGA pipeline, a one-stage delay is added to the EVAL_OMEGA pipeline to compensate. Note that, if no K factor multiplication stage is needed, it is replaced by a pure delay stage in order to match the pipeline delay of the EVAL_OMEGA pipeline.

For t even, \( t = 2k \), the EVAL_OMEGA pipeline is of length \( t-1 \), as before, while the EVAL_LAMBDA_PRIME pipeline is of length \( t-2 \); with one stage for the K factor multiplication and one stage for the inversion, this brings the total to \( t \) stages for the denominator. As an example, for \( t=8 \), we have:

\[
\Lambda(x) = \Lambda_8 x^8 + \Lambda_7 x^7 + \Lambda_6 x^6 + \Lambda_5 x^5 + \Lambda_4 x^4 + \Lambda_3 x^3 + \Lambda_2 x^2 + \Lambda_1 x + \Lambda_0 \tag{7.5}
\]

\[
\Lambda'(x) = \Lambda_7 x^6 + \Lambda_5 x^4 + \Lambda_3 x^2 + \Lambda_1 \tag{7.6}
\]

\[
\Omega(x) = \Omega_7 x^7 + \Omega_6 x^6 + \Omega_5 x^5 + \Omega_4 x^4 + \Omega_3 x^3 + \Omega_2 x^2 + \Omega_1 x + \Omega_0 \tag{7.7}
\]

whereas for \( t=7 \), we have:

\[
\Lambda(x) = \Lambda_7 x^7 + \Lambda_6 x^6 + \Lambda_5 x^5 + \Lambda_4 x^4 + \Lambda_3 x^3 + \Lambda_2 x^2 + \Lambda_1 x + \Lambda_0 \tag{7.8}
\]

\[
\Lambda'(x) = \Lambda_7 x^6 + \Lambda_5 x^4 + \Lambda_3 x^2 + \Lambda_1 \tag{7.9}
\]

\[
\Omega(x) = \Omega_6 x^6 + \Omega_5 x^5 + \Omega_4 x^4 + \Omega_3 x^3 + \Omega_2 x^2 + \Omega_1 x + \Omega_0 \tag{7.10}
\]

Note that the degree of \( \Lambda'(x) \) is the same in both cases.
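
The observation about the degree of the derivative can be checked with a few lines of software. The sketch below is illustrative only; the function name is hypothetical and the coefficient values are placeholders. It relies on the fact that, in a field of characteristic 2, the formal derivative keeps the odd-indexed coefficients and drops the even-indexed ones.

```python
def formal_derivative_char2(coeffs):
    """Formal derivative over a field of characteristic 2.

    coeffs[i] is the coefficient of x^i. The term i*c_i*x^(i-1) equals
    c_i*x^(i-1) when i is odd and vanishes when i is even.
    """
    deriv = [coeffs[i] if i % 2 == 1 else 0 for i in range(1, len(coeffs))]
    while len(deriv) > 1 and deriv[-1] == 0:
        deriv.pop()                   # drop vanished leading terms
    return deriv


def degree(poly):
    return len(poly) - 1


# Placeholder non-zero coefficients Lambda_0 .. Lambda_t for t = 8 and t = 7.
LAMBDA_T8 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
LAMBDA_T7 = [1, 2, 3, 4, 5, 6, 7, 8]
DERIV_T8 = formal_derivative_char2(LAMBDA_T8)
DERIV_T7 = formal_derivative_char2(LAMBDA_T7)
print(degree(DERIV_T8), degree(DERIV_T7))
```

Both derivatives have degree 6, which is why the number of evaluation stages needed for the derivative does not grow when t goes from 7 to 8.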
The following table summarizes the pipeline lengths and any delays needed for several values of \( t \), and for general \( t \).

<table>
<thead>
<tr>
<th>Pipeline</th>
<th>( t=8 )</th>
<th>( t=7 )</th>
<th>( t=6 )</th>
<th>( t ) even ( t=2k )</th>
<th>( t ) odd ( t=2k+1 )</th>
</tr>
</thead>
<tbody>
<tr>
<td>X delayed (X1, X1_D1, X1_D2, ..., X1_D5)</td>
<td>pipeline starts at A</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>length of pipeline B</td>
<td>5</td>
<td>4</td>
<td>3</td>
<td>t-3</td>
</tr>
<tr>
<td></td>
<td>pipeline ends at ( A+B-1 )</td>
<td>6</td>
<td>5</td>
<td>4</td>
<td>t-2</td>
</tr>
<tr>
<td>Powers of X (X2, X3, ..., X7)</td>
<td>pipeline starts at A</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>length of pipeline B</td>
<td>7</td>
<td>6</td>
<td>5</td>
<td>t-1</td>
</tr>
<tr>
<td></td>
<td>pipeline ends at ( A+B-1 )</td>
<td>7</td>
<td>6</td>
<td>5</td>
<td>t-1</td>
</tr>
<tr>
<td>EVAL_CHIENSUM</td>
<td>pipeline starts at A</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>length of pipeline B</td>
<td>9</td>
<td>8</td>
<td>7</td>
<td>t+1</td>
</tr>
<tr>
<td></td>
<td>delay stage C</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>pipeline ends at ( \text{Sum}(A, C)-1 )</td>
<td>9</td>
<td>9</td>
<td>8</td>
<td>7</td>
</tr>
<tr>
<td>EVAL_OMEGA</td>
<td>pipeline starts at A</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>length of pipeline B</td>
<td>8</td>
<td>7</td>
<td>6</td>
<td>t</td>
</tr>
<tr>
<td></td>
<td>delay stage C</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>pipeline ends at ( \text{Sum}(A, C)-1 )</td>
<td>9</td>
<td>9</td>
<td>8</td>
<td>7</td>
</tr>
<tr>
<td>EVAL_LAMBDA_PRIME</td>
<td>pipeline starts at A</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>length of pipeline B</td>
<td>6</td>
<td>6</td>
<td>4</td>
<td>t-2</td>
</tr>
<tr>
<td></td>
<td>delay stage C</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>inversion stage E</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>pipeline ends at ( \text{Sum}(A, E)-1 )</td>
<td>9</td>
<td>9</td>
<td>8</td>
<td>7</td>
</tr>
</tbody>
</table>

**Table 10. Chien Search Pipeline Lengths.**

**NOTES:**

1) In all cases, the three pipelines, EVAL_CHIENSUM, EVAL_OMEGA, and EVAL_LAMBDA_PRIME, must end at the same time. See the rows labeled "pipeline ends at". The rules for the delay stages are as follows:

a. EVAL_CHIENSUM delay stage = 1 if \( t \) is odd and a K_val stage is needed; otherwise it is 0.
b. EVAL_OMEGA delay stage = 2 if \( t \) is odd and a K_val stage is needed; otherwise it is 1.
c. EVAL_LAMBDA_PRIME delay stage = 1 if \( t \) is even and a K_val stage is not needed; otherwise it is 0.

2) The K_val stage corresponds to the multiplication of the EVAL_LAMBDA_PRIME pipeline by \( \alpha^{-i(1-m_0)} \). It is needed if \( m_0 \), the log of the initial root, is not 1. Forney's algorithm for the error, \( e_i \), at position \( \alpha^i \) is:

\[
e_i = \frac{\alpha^{i(1-m_0)} \, \Omega(\alpha^{-i})}{\Lambda'(\alpha^{-i})} = \frac{\Omega(\alpha^{-i})}{\alpha^{-i(1-m_0)} \, \Lambda'(\alpha^{-i})}
\]

Clearly, when \( m_0 \) is 1, the first term in the denominator becomes \( \alpha^0 = 1 \), and so in this case:

\[
e_i = \frac{\Omega(\alpha^{-i})}{\Lambda'(\alpha^{-i})}
\]

and thus the K_val stage is not needed.
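
The role of the K factor can be illustrated with a small software model. This is a sketch under assumptions, not the thesis code: it uses GF(2^8) with field polynomial 0x11D, the standard form of Forney's algorithm with \( \Omega(x) = S(x)\Lambda(x) \bmod x^{2t} \), and it builds \( \Lambda(x) \) directly from the known error positions instead of running the key equation solver. All names are hypothetical.

```python
# Behavioural sketch of Forney's algorithm with the K factor.
# Assumption: GF(2^8) with field polynomial 0x11D.
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[255 - LOG[a]]          # a must be non-zero


def alpha(e):
    return EXP[e % 255]


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= mul(ai, bj)
    return out


def poly_eval(poly, x):
    acc = 0
    for coeff in reversed(poly):      # Horner's rule, poly[i] multiplies x^i
        acc = mul(acc, x) ^ coeff
    return acc


def forney_values(errors, t, m0):
    """Return {position: error value} recovered from syndromes alone."""
    S = [0] * (2 * t)
    for j in range(2 * t):
        for position, y in errors:
            S[j] ^= mul(y, alpha(position * (m0 + j)))
    lam = [1]
    for position, _ in errors:        # Lambda(x) = product of (1 + X_k x)
        lam = poly_mul(lam, [1, alpha(position)])
    omega = poly_mul(S, lam)[:2 * t]  # Omega = S * Lambda mod x^(2t)
    lam_prime = [lam[i] if i % 2 else 0 for i in range(1, len(lam))]
    result = {}
    for position, _ in errors:
        x_inv = alpha(-position)
        k_val = alpha(-position * (1 - m0))   # equals 1 when m0 = 1
        denom = mul(k_val, poly_eval(lam_prime, x_inv))
        result[position] = mul(poly_eval(omega, x_inv), inv(denom))
    return result


# Hypothetical pattern: three errors in a t = 4 code.
ERRORS = [(20, 0x11), (7, 0xA5), (0, 0xFF)]
RESULTS = {m0: forney_values(ERRORS, 4, m0) for m0 in (0, 1, 2)}
print(RESULTS[1])
```

For m0 = 1 the factor k_val is always 1, so the multiplication can be replaced by a plain delay register, which is the case the notes above describe.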

The initial value of K_val is determined as follows. The initial value would be \( \alpha^{*(N-1)+1-m_0} \), if there were no pipeline delay. The register for K_val is at the same pipeline stage as the last register of the EVAL_LAMBDA_PRIME pipeline. For \( t \) even, the final register is at stage \( t-1 \), and so the initial value of K_val is

\[
\alpha^{*(N-1)+1-m_0} = \alpha^{*(N-1)+1-m_0}
\]

(7.12)

For \( t \) odd, the final register is at stage \( t \), and so the initial value of K_val is

\[
\alpha^{*(N-1)+1-m_0} = \alpha^{*(N-2)+1-m_0}
\]

(7.13)

The coefficients of the error-locator polynomial, \( \Lambda(x) \), are used in the Chien search. For a given value of \( t \), the EVAL_CHIENSUM pipeline is of length \( t \). To illustrate the various pipeline lengths, Figure 58, Figure 59, Figure 60, and Figure 61 show the Chien search pipelines for \( t=4 \) and 5, with \( m_0=0 \) and 1. Figure 62 shows the block diagram for the error correction.

Figure 58. Chien Search Pipeline Block Diagram, \( t=4, m_0=1 \)

Figure 59. Chien Search Pipeline Block Diagram, \( t=4, m_0 \) not 1
Figure 60. Chien Search Pipeline Block Diagram, t=5, m_0 = 1

Figure 61. Chien Search Pipeline Block Diagram, t=5, m_0 not 1

Figure 62. Error Correction
Figure 63 shows the VHDL constants and signal definitions for the Chien search block. In this case, \( m = 8 \), \( N = 22 \), \( t = 4 \), and \( m_0 = 1 \). The control signals are those listed from Count to OutputEnable. The other signals are data flow related. Note that the only changes needed for other RS codes are the constants mentioned above, and the \( 2t \) powers of \( \alpha \) that are needed in the actual Chien search. The remainder of the code can be reused as is.

Figure 64 shows the VHDL code for the initialization of the local signals IntLambda and IntOmega. These signals are set to the incoming Lambda and Omega polynomial coefficients when the input strobe StartChien arrives. Also shown is the code for the system counter and the counter enable. As can be seen, the system counter, Count, is set to 0 when the input strobe StartChien arrives, and increments every clock cycle as long as CountEnable is active. The control signal CountEnable is set to '1' (active) when StartChien arrives, and is set to '0' (inactive) when the counter reaches the maximum value defined by the constant CountMax, which is equal to the sum of the total number of symbols, \( N \), and the Chien search pipeline delay, \( t+1 \).

Figure 65 shows the code for two control signals, InitChien and DoChien, and the output strobe, RS_Data_out_Start. The signal InitChien is a one clock cycle delayed version of StartChien, and DoChien is a one clock strobe that happens after InitChien. The output strobe is a one clock period pulse that occurs when the counter reaches the value defined by RS_Data_out_StartCount, namely, the Chien search pipeline delay, \( t+1 \).

constant GFpower : integer := 8;
subtype Galois_Field_element is std_logic_vector((GFpower-1) downto 0);
constant RS_T : integer := 4;
constant RS_N : integer := 22;
constant RS_M0 : integer := 1;
constant key_equation_solver_delay : integer := 2*RS_T+2;
constant ChienSearch_pipeline_delay : integer := RS_T+1;
constant SR_max : integer := RS_N + key_equation_solver_delay +
  ChienSearch_pipeline_delay;
constant OutputEnableStartCount : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay-1, GFpower+1);
constant OutputEnableStopCount : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay-1, GFpower+1);
constant DoChienCountMax : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay-1, GFpower+1);
constant CountMax : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N+ChienSearch_pipeline_delay, GFpower+1);
constant zero : Galois_Field_element := "00000000";
constant alpha1 : Galois_Field_element := "00000010";
constant alpha2 : Galois_Field_element := "00000100";
constant alpha3 : Galois_Field_element := "00001000";
constant alpha4 : Galois_Field_element := "00010000";
constant alpha234 : Galois_Field_element := "11111011";
constant alpha211 : Galois_Field_element := "10110010";
constant alpha186 : Galois_Field_element := "01101110";
constant alpha159 : Galois_Field_element := "01110011";

type Poly1Type is array(0 to RS_T) of Galois_Field_element;
type Poly2Type is array(0 to RS_T-1) of Galois_Field_element;
-- Assumed from its use in the Delay_RS_Data process (indices 0 to SR_max)
type shift_register_delay_type is array(0 to SR_max) of Galois_Field_element;

signal Count : std_logic_vector(GFpower downto 0);
signal CountEnable : std_logic;
signal InitChien : std_logic;
signal DoChien : std_logic;
signal OutputEnable : std_logic;
signal shift_register_delay : shift_register_delay_type;
signal RS_data_delayed : Galois_Field_element;
signal Lambda : Poly1Type;
signal Omega : Poly2Type;
signal IntLambda_0 : Galois_Field_element;
signal IntLambda_1 : Galois_Field_element;
signal IntLambda_3 : Galois_Field_element;
signal IntOmega : Poly2Type;
signal X1 : Galois_Field_element;
signal X2 : Galois_Field_element;
signal X3 : Galois_Field_element;
signal X1_D1 : Galois_Field_element;
signal EVAL_OM_1 : Galois_Field_element;
signal EVAL_OM_2 : Galois_Field_element;
signal EVAL_OM : Galois_Field_element;
signal EVAL_OM_D1 : Galois_Field_element;
signal EVAL_LP : Galois_Field_element;
signal EVAL_DEN : Galois_Field_element;
signal DEN_INV : Galois_Field_element;
signal CORR_FACTOR : Galois_Field_element;
signal Temp1 : Galois_Field_element;
signal Temp2 : Galois_Field_element;
signal Temp3 : Galois_Field_element;
signal Temp4 : Galois_Field_element;
signal ChienSum : Galois_Field_element;
signal EVAL_CS_1 : Galois_Field_element;
signal EVAL_CS_2 : Galois_Field_element;
signal EVAL_CS_3 : Galois_Field_element;

Figure 63. Chien Search Constants and Signals
Lambda(0) <= lambda_poly (7 downto 0);
Lambda(1) <= lambda_poly (15 downto 8);
Lambda(2) <= lambda_poly (23 downto 16);
Lambda(3) <= lambda_poly (31 downto 24);
Omega(0) <= omega_poly (7 downto 0);
Omega(1) <= omega_poly (15 downto 8);
Omega(2) <= omega_poly (23 downto 16);
Omega(3) <= omega_poly (31 downto 24);

Init_IntLambda_and_IntOmega : process (clk) begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      IntLambda_0 <= zero;
      IntLambda_1 <= zero;
      IntLambda_3 <= zero;  -- assumed: IntLambda_3 is declared and feeds EVAL_LP
      for i in 0 to RS_T-1 loop
        IntOmega(i) <= zero;
      end loop;
    elsif (StartChien = '1') then
      -- Latch the incoming polynomial coefficients
      IntLambda_0 <= Lambda(0);
      IntLambda_1 <= Lambda(1);
      IntLambda_3 <= Lambda(3);
      for i in 0 to RS_T-1 loop
        IntOmega(i) <= Omega(i);
      end loop;
    end if;
  end if;
end process;

System_Counter : process (clk) begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      Count <= (others => '0');
    elsif (CountEnable = '1') then
      Count <= Count + 1;
    end if;
  end if;
end process;

System_Count_Enable : process (clk) begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CountEnable <= '0';
    elsif (StartChien = '1') then
      CountEnable <= '1';
    elsif (Count = CountMax) then
      CountEnable <= '0';
    end if;
  end if;
end process;

Figure 64. Internal Lambda and Omega Process and System Counter
Figure 65. Internal Control Signal Processes

Figure 66 shows the code for the X1 pipeline and the powers of X1 pipeline. The signal X1 represents the inverse of the present location in the Chien search. Its initial value is equal to $\alpha^{-N}$. Every clock cycle after initialization, it is multiplied by $\alpha$. The X1_D signals are delayed copies of X1, and are used to generate the powers of X1. The powers_of_X1 pipeline generates all of the powers up to t-1. These values are used in the polynomial evaluation of the derivative of the error-locator polynomial and of the error-evaluator polynomial. The code is generic, except for the reference to the initial value of X1.
Figure 66. Present Location X1 and Powers of X1 Pipelines

Figure 67 shows the code for the evaluation of the derivative of the error-locator polynomial (EVAL_LP_Pipeline) and the error-evaluator polynomial (EVAL_OM_Pipeline). The EVAL_LP_Pipeline includes the required inversion. Also included is the code for the correction factor calculation. This portion of the VHDL code is not generic, and must be tailored for each RS code as per Table 10.

Figure 68 shows the code for the Chien search pipeline. Since the Chien sum is formed by a pipeline, each Temp register is initialized one location step earlier than the register before it: as listed in the code comments, Temp_i is loaded with \( \Lambda_i \cdot \alpha^{((256-(N+i-1)) \cdot i) \bmod 255} \). This ensures that the Chien sum at the end of the pipeline is correct. This portion of the VHDL code is also not generic, and must be tailored for each RS code as per Table 10.
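
The exponents in the comments of Figure 68 can be reproduced with a short script, which may help when the constants have to be regenerated for another code. The sketch follows the expression given in those comments, (256 - (N + i - 1))·i mod 255. The bit patterns additionally assume the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), which is an assumption of this sketch; the function names are hypothetical.

```python
# Sketch: initial multipliers for the Temp registers of the Chien sum pipeline.
# Assumption: GF(2^8) with field polynomial 0x11D for the bit patterns.
PRIM_POLY = 0x11D
EXP = [0] * 255                      # EXP[e] = alpha^e
value = 1
for power in range(255):
    EXP[power] = value
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY


def temp_init_exponents(n, t):
    """Exponent of alpha multiplying Lambda(i) in Temp_i, for i = 1..t."""
    return [((256 - (n + i - 1)) * i) % 255 for i in range(1, t + 1)]


def as_bits(element):
    """Format a field element as an 8-bit std_logic_vector literal body."""
    return format(element, "08b")


EXPONENTS = temp_init_exponents(22, 4)          # N = 22, t = 4
PATTERNS = [as_bits(EXP[e]) for e in EXPONENTS]
for e, bits in zip(EXPONENTS, PATTERNS):
    print(f'alpha{e} := "{bits}"')
```

For N = 22 and t = 4 the exponents are 234, 211, 186 and 159, the values that appear in the comments of Figure 68.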

Figure 69 shows the code for the data delay, and the error correction. The incoming delayed data is corrected by the correction factor when the Chien sum is 0. The length of the data delay, SR_max, is equal to RS_N + key_equation_solver_delay + ChienSearch_pipeline_delay. The code for these two processes is generic and can be reused for other RS codes without modification.
EVAL_OM_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      EVAL_OM_1 <= zero;
      EVAL_OM_2 <= zero;
      EVAL_OM <= zero;
      EVAL_OM_D1 <= zero;
    else
      EVAL_OM_1 <= add(IntOmega(0), mul(IntOmega(1), X1));
      EVAL_OM_2 <= add(EVAL_OM_1, mul(IntOmega(2), X2));
      EVAL_OM <= add(EVAL_OM_2, mul(IntOmega(3), X3));
      EVAL_OM_D1 <= EVAL_OM;
    end if;
  end if;
end process;

EVAL_LP_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      EVAL_LP <= zero;
      EVAL_DEN <= zero;
      DEN_INV <= zero;
    else
      -- Lambda'(x) = Lambda_1 + Lambda_3 * x^2 for t = 4
      EVAL_LP <= add(IntLambda_1, mul(IntLambda_3, X2));
      -- RS_T is even, K_mul stage is not needed, EVAL_LAMBDA_PRIME delay stage = 1
      EVAL_DEN <= EVAL_LP;
      -- K_mul stage is needed or EVAL_LAMBDA_PRIME delay stage = 1
      DEN_INV <= inv(EVAL_DEN);
    end if;
  end if;
end process;

Calc_Correction_Factor : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CORR_FACTOR <= (others=>'0');
    elsif (rs_enable = '1') then
      CORR_FACTOR <= mul(EVAL_OM_D1, DEN_INV);
    else
      CORR_FACTOR <= (others=>'0');
    end if;
  end if;
end process;

Figure 67. Evaluation of Omega and Lambda Prime Pipelines, and Calculation of the Correction Factor
Calc_Temp_Registers : process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    Temp1 <= zero;
    Temp2 <= zero;
    Temp3 <= zero;
    Temp4 <= zero;
  elsif (InitChien = '1') then
    Temp1 <= mul(alpha234, Lambda(1)); -- 234 = (256 - 22)* 1 mod 255 = 234 mod 255
    Temp2 <= mul(alpha211, Lambda(2)); -- 211 = (256 - 23)* 2 mod 255 = 211 mod 255
    Temp3 <= mul(alpha186, Lambda(3)); -- 186 = (256 - 24)* 3 mod 255 = 186 mod 255
    Temp4 <= mul(alpha159, Lambda(4)); -- 159 = (256 - 25)* 4 mod 255 = 159 mod 255
  elsif (DoChien = '1') then
    Temp1 <= mul(alpha1, Temp1);
    Temp2 <= mul(alpha2, Temp2);
    Temp3 <= mul(alpha3, Temp3);
    Temp4 <= mul(alpha4, Temp4);
  end if;
end if;
end process;

Chien_Sum_Pipeline : process (clk)
begin
if (clk'event and clk = '1') then
  if ((reset_n = '0') or (StartChien = '1')) then
    EVAL_CS_1 <= zero;
    EVAL_CS_2 <= zero;
    EVAL_CS_3 <= zero;
    ChienSum <= zero;
  elsif (DoChien = '1') then
    EVAL_CS_1 <= add(IntLambda_0, Temp1);
    EVAL_CS_2 <= add(EVAL_CS_1, Temp2);
    EVAL_CS_3 <= add(EVAL_CS_2, Temp3);
    ChienSum <= add(EVAL_CS_3, Temp4);
  end if;
end if;
end process;

Figure 68. Chien Sum Pipeline
Delay_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      for i in 0 to SR_max loop
        shift_register_delay(i) <= (others=>'0');
      end loop;
    else
      for i in SR_max downto 1 loop
        shift_register_delay(i) <= shift_register_delay(i-1);
      end loop;
      shift_register_delay(0) <= RS_Data_In;
    end if;
  end if;
end process;
RS_data_delayed <= shift_register_delay(SR_max);

Correct_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out <= zero;
    elsif ((ChienSum = zero) and (OutputEnable = '1')) then
      RS_Data_out <= add(CORR_FACTOR,RS_data_delayed);
    elsif (OutputEnable = '1') then
      RS_Data_out <= RS_data_delayed;
    else
      RS_Data_out <= zero;
    end if;
  end if;
end process;

Figure 69. Data Delay, and Error Correction Processes
9 Synthesis and Test Results for RS Decoders

RS decoder designs were synthesized for codes with different error-correcting capabilities across 3 values of the degree of the field generator polynomial, m, namely, 4, 6 and 8. The results are discussed in the next 2 sections.

9.1 RS Decoder Synthesis Speed Results

Several RS decoder designs were chosen as candidates, with the error-correcting capability t ranging from 1 to 7 for codes over GF(2^4), GF(2^6), and GF(2^8). The synthesis flow consisted of the Synplify synthesis tool (from Synplicity) followed by Xilinx software for the final place-and-route and timing analysis. The speed results are tabulated in Table 11 and shown graphically in Figure 70. For t ≥ 2, the maximum frequency decreases as m increases, due to the increasing complexity of the Galois field multipliers and inverters; the only exception in Table 11 is t = 1, where the GF(2^6) decoder is slightly faster than the GF(2^4) decoder. Also, when t = 1 or t = 2, the maximum frequency is higher than when t > 2, because the decoding process is simpler: no key equation solver is required. For t > 2, the maximum frequency is relatively flat as t increases. The input bit rate is shown along with the clock speed. Since the decoder processes one Galois field symbol per clock cycle, the equivalent input bit rate is the clock speed multiplied by the Galois field symbol width (4, 6, or 8 bits, as the case may be).
<table>
<thead>
<tr>
<th rowspan="2">Error-correcting ability (t)</th>
<th colspan="2">GF(2^4)</th>
<th colspan="2">GF(2^6)</th>
<th colspan="2">GF(2^8)</th>
</tr>
<tr>
<th>Clock Speed (MHz)</th>
<th>Input Bit Rate (Mbits/sec)</th>
<th>Clock Speed (MHz)</th>
<th>Input Bit Rate (Mbits/sec)</th>
<th>Clock Speed (MHz)</th>
<th>Input Bit Rate (Mbits/sec)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>97.94</td>
<td>391.8</td>
<td>105.33</td>
<td>632.0</td>
<td>89.53</td>
<td>716.2</td>
</tr>
<tr>
<td>2</td>
<td>134.37</td>
<td>537.5</td>
<td>129.89</td>
<td>779.3</td>
<td>98.14</td>
<td>785.1</td>
</tr>
<tr>
<td>3</td>
<td>90.33</td>
<td>361.3</td>
<td>82.37</td>
<td>494.2</td>
<td>60.53</td>
<td>484.3</td>
</tr>
<tr>
<td>4</td>
<td>84.60</td>
<td>338.4</td>
<td>76.98</td>
<td>461.9</td>
<td>59.03</td>
<td>472.3</td>
</tr>
<tr>
<td>5</td>
<td>84.60</td>
<td>338.4</td>
<td>74.52</td>
<td>447.1</td>
<td>59.38</td>
<td>475.1</td>
</tr>
<tr>
<td>6</td>
<td>75.47</td>
<td>301.9</td>
<td>74.52</td>
<td>447.1</td>
<td>59.38</td>
<td>475.1</td>
</tr>
<tr>
<td>7</td>
<td>82.37</td>
<td>329.5</td>
<td>70.57</td>
<td>423.4</td>
<td>59.59</td>
<td>476.8</td>
</tr>
</tbody>
</table>

Table 11. Maximum Decoder Speed (MHz) vs. Error Correcting Ability
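
The relation between clock speed and input bit rate used in Table 11 can be reproduced directly. The following is a minimal sketch, assuming one m-bit symbol is consumed per clock cycle; the names `CLOCK_MHZ`, `input_bit_rate`, and `bit_rates` are illustrative and do not come from the generated code. Only the first three rows of the table are copied in.

```python
# Clock speeds (MHz) copied from Table 11, keyed by error-correcting ability t.
# Each tuple holds the values for symbol widths m = 4, 6, 8.
SYMBOL_WIDTHS = (4, 6, 8)
CLOCK_MHZ = {
    1: (97.94, 105.33, 89.53),
    2: (134.37, 129.89, 98.14),
    3: (90.33, 82.37, 60.53),
}


def input_bit_rate(clock_mhz: float, m: int) -> float:
    """Input bit rate in Mbits/sec, assuming one m-bit symbol per clock cycle."""
    return clock_mhz * m


bit_rates = {
    (t, m): input_bit_rate(f, m)
    for t, clocks in CLOCK_MHZ.items()
    for f, m in zip(clocks, SYMBOL_WIDTHS)
}

for (t, m), rate in sorted(bit_rates.items()):
    print(f"t={t} m={m}: {rate:.1f} Mbits/sec")
```

Because the tabulated clock speeds are themselves rounded, the recomputed rates can differ from the table in the last digit.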

![RS Decoder Speed](image)

**Figure 70. Reed-Solomon Decoder Maximum Speed**
9.2 RS Decoder Synthesis Area Results

The area results are tabulated in Table 12 and shown graphically in Figure 71. The results show that area is an approximately linear function of the error-correcting capability. There is a small kink in the curve at t=2 due to the changeover in the decoding algorithm used. The decoder area increases with both the error-correcting capability and the field size; the latter is, again, a consequence of ever more complex Galois field multipliers and inverters. Compared with the encoders, the RS decoders are bigger by a factor of 6 to 20, or roughly an order of magnitude.

<table>
<thead>
<tr>
<th>Error-correcting ability</th>
<th>GF($2^4$)</th>
<th>GF($2^6$)</th>
<th>GF($2^8$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>124</td>
<td>159</td>
<td>279</td>
</tr>
<tr>
<td>2</td>
<td>200</td>
<td>446</td>
<td>941</td>
</tr>
<tr>
<td>3</td>
<td>448</td>
<td>719</td>
<td>1336</td>
</tr>
<tr>
<td>4</td>
<td>535</td>
<td>929</td>
<td>1656</td>
</tr>
<tr>
<td>5</td>
<td>653</td>
<td>1110</td>
<td>2026</td>
</tr>
<tr>
<td>6</td>
<td>759</td>
<td>1352</td>
<td>2358</td>
</tr>
<tr>
<td>7</td>
<td>963</td>
<td>1473</td>
<td>2840</td>
</tr>
</tbody>
</table>

Table 12. Decoder Area (in slices) vs. Error Correcting Ability
Figure 71. Reed-Solomon Decoder Area (in slices)

9.3 Decoder Testbench

As for the encoders, the performance of the decoders described above was verified with a VHDL testbench. Fixed and random error patterns were injected, and the output of the syndrome calculation block, the key equation solver, and the Chien search were verified. The VHDL code for the decoder testbench is shown in Appendix F.
10 Comparison of Generated Cores to Available Cores

10.1 Encoder Cores

This section compares the area and speed metrics obtained for the generated encoder cores with those of commercially available encoder cores. Table 13 summarizes the area and speed metrics for several encoder cores. The encoders listed use GF($2^8$) and are 8-error-correcting RS codes. The information regarding these cores is limited to what is given in the applicable data sheets. As such, the target technology may not be exactly the same as that used for the encoder cores generated in this thesis. Two of the commercial cores target reconfigurable hardware (Actel and Xilinx); the other two target ASIC technology. It should be noted that the target technology used in this thesis, namely the Xilinx XCV1000E Field Programmable Gate Array, is not the fastest series of FPGA from Xilinx. The XCV1000E is part of the Virtex I series. Over the last 2 years, this has been superseded by the Virtex II and the Virtex II Pro series, both of which include parts with higher density and better performance. Even so, the observed clock frequency is comparable with that of the commercial cores.

<table>
<thead>
<tr>
<th>Company</th>
<th>Target Technology</th>
<th>Speed</th>
<th>Area</th>
<th>Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td>Machine Learning Laboratory Inc.</td>
<td>Actel AXcelerator AX500-3</td>
<td>119 MHz</td>
<td>7%</td>
<td>7%</td>
</tr>
<tr>
<td>Memec Design</td>
<td>Xilinx V50-6</td>
<td>113 MHz</td>
<td>94 CLBs</td>
<td>[20]</td>
</tr>
<tr>
<td>4i2i Communications</td>
<td>ASIC 0.35 micron</td>
<td>318 MHz</td>
<td>2260 gates</td>
<td>[21]</td>
</tr>
<tr>
<td>Telecommunications and Information Technologies</td>
<td>ASIC 0.6 micron</td>
<td>55 MHz</td>
<td>2464 gates</td>
<td>[22]</td>
</tr>
</tbody>
</table>

Table 13. Device Utilization and Performance of Existing Encoder Cores
Table 14 shows the characteristics of the Xilinx Virtex FPGA family, of which the XCV1000E is a member. The quoted [14] system performance is 200 MHz, with an equivalent 1,124,022 system gates. Table 15 shows the characteristics of the Actel Axcelerator family, of which the AX500-3 is a member. For a direct comparison, the equivalent system gates are used. For an encoder using $\text{GF}(2^8)$ and correcting up to 8 errors, the generated core uses 3080 system gates and runs at 140 MHz. The Actel core uses 35,000 system gates (7% of the 500,000 equivalent system gates of the AX500) and runs at 119 MHz. While the frequencies of operation are comparable, the equivalent system gate counts for the two technologies differ considerably.

The most direct comparison can be made with the Memec core, which is targeted to the Xilinx Virtex V50-6 FPGA. This core runs at 113 MHz and uses 94 CLBs, whereas the generated core runs at 140 MHz and uses 80 CLBs. The next 2 cores are targeted to ASIC technology. The core from 4i2i Communications runs at over 2 times the speed of the FPGA core, which is to be expected: ASICs are still faster than the best FPGAs. A comparison of area is not possible since the target technologies are too different.
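
The figures quoted in this comparison follow from simple arithmetic on the data-sheet values. The sketch below is only an illustration using the numbers given in the text and in Tables 13 and 15; all variable names are hypothetical.

```python
# Values taken from the text and from Tables 13 and 15; names are illustrative.
AX500_SYSTEM_GATES = 500_000       # equivalent system gates of the Actel AX500
ACTEL_UTILIZATION_PERCENT = 7      # area reported for the Actel-targeted core

GENERATED_CLOCK_MHZ = 140          # generated GF(2^8), t = 8 encoder
GENERATED_CLBS = 80
MEMEC_CLOCK_MHZ = 113
MEMEC_CLBS = 94
ASIC_4I2I_CLOCK_MHZ = 318

# 7% of 500,000 equivalent system gates
actel_gates = AX500_SYSTEM_GATES * ACTEL_UTILIZATION_PERCENT // 100

# Speed of the 4i2i ASIC core relative to the generated FPGA core
asic_speed_ratio = ASIC_4I2I_CLOCK_MHZ / GENERATED_CLOCK_MHZ

# Generated core relative to the Memec core on comparable Xilinx parts
clb_saving = MEMEC_CLBS - GENERATED_CLBS
clock_gain_mhz = GENERATED_CLOCK_MHZ - MEMEC_CLOCK_MHZ

print(actel_gates, round(asic_speed_ratio, 2), clb_saving, clock_gain_mhz)
```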
<table>
<thead>
<tr>
<th>Device</th>
<th>System Gates</th>
<th>CLB Array</th>
<th>Logic Cells</th>
<th>Maximum Available I/O</th>
<th>Block RAM Bits</th>
<th>Maximum SelectRAM™ Bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>XCV50</td>
<td>57,906</td>
<td>16x24</td>
<td>1,728</td>
<td>180</td>
<td>32,768</td>
<td>24,576</td>
</tr>
<tr>
<td>XCV100</td>
<td>108,904</td>
<td>20x30</td>
<td>2,700</td>
<td>180</td>
<td>40,960</td>
<td>38,400</td>
</tr>
<tr>
<td>XCV150</td>
<td>164,674</td>
<td>24x36</td>
<td>3,888</td>
<td>260</td>
<td>49,152</td>
<td>55,296</td>
</tr>
<tr>
<td>XCV200</td>
<td>236,666</td>
<td>28x42</td>
<td>5,292</td>
<td>284</td>
<td>57,344</td>
<td>75,264</td>
</tr>
<tr>
<td>XCV300</td>
<td>322,970</td>
<td>32x48</td>
<td>6,912</td>
<td>316</td>
<td>65,536</td>
<td>98,304</td>
</tr>
<tr>
<td>XCV400</td>
<td>468,252</td>
<td>40x60</td>
<td>10,800</td>
<td>404</td>
<td>81,920</td>
<td>153,600</td>
</tr>
<tr>
<td>XCV600</td>
<td>661,111</td>
<td>48x72</td>
<td>15,552</td>
<td>512</td>
<td>98,304</td>
<td>221,184</td>
</tr>
<tr>
<td>XCV800</td>
<td>888,439</td>
<td>56x84</td>
<td>21,168</td>
<td>512</td>
<td>114,688</td>
<td>301,056</td>
</tr>
<tr>
<td>XCV1000</td>
<td>1,124,022</td>
<td>64x96</td>
<td>27,648</td>
<td>512</td>
<td>131,072</td>
<td>393,216</td>
</tr>
</tbody>
</table>

**Table 14. Xilinx Virtex FPGA Family Members**

<table>
<thead>
<tr>
<th></th>
<th>AX125</th>
<th>AX250</th>
<th>AX500</th>
<th>AX1000</th>
<th>AX2000</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Equiv. System Gates</strong></td>
<td>125,000</td>
<td>250,000</td>
<td>500,000</td>
<td>1,000,000</td>
<td>2,000,000</td>
</tr>
<tr>
<td><strong>Typical Gates</strong></td>
<td>82,000</td>
<td>154,000</td>
<td>286,000</td>
<td>612,000</td>
<td>1,060,000</td>
</tr>
<tr>
<td><strong>Total RAM Bits</strong></td>
<td>29,184</td>
<td>71,680</td>
<td>95,332</td>
<td>198,912</td>
<td>338,688</td>
</tr>
<tr>
<td><strong>Max Registers</strong></td>
<td>1,344</td>
<td>2,816</td>
<td>5,376</td>
<td>12,096</td>
<td>21,504</td>
</tr>
<tr>
<td><strong>Total Modules</strong></td>
<td>2,016</td>
<td>4,224</td>
<td>8,064</td>
<td>18,144</td>
<td>32,256</td>
</tr>
<tr>
<td><strong>Dedicated Registers</strong></td>
<td>672</td>
<td>1,408</td>
<td>2,688</td>
<td>6,048</td>
<td>10,752</td>
</tr>
<tr>
<td><strong>RAM Blocks</strong></td>
<td>4</td>
<td>12</td>
<td>15</td>
<td>36</td>
<td>64</td>
</tr>
<tr>
<td><strong>PerPin FIFOs</strong></td>
<td>168</td>
<td>248</td>
<td>336</td>
<td>516</td>
<td>684</td>
</tr>
<tr>
<td><strong>Max No. of LVDS Pairs</strong></td>
<td>84</td>
<td>124</td>
<td>168</td>
<td>258</td>
<td>342</td>
</tr>
<tr>
<td><strong>PLL's</strong></td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
</tr>
<tr>
<td><strong>I/O Registers</strong></td>
<td>504</td>
<td>744</td>
<td>1,008</td>
<td>1,548</td>
<td>2,052</td>
</tr>
<tr>
<td><strong>User I/O's</strong></td>
<td>168</td>
<td>248</td>
<td>336</td>
<td>516</td>
<td>684</td>
</tr>
<tr>
<td><strong>Packages</strong></td>
<td>CS180</td>
<td>FG256</td>
<td>FG484</td>
<td>FG484</td>
<td>FG676</td>
</tr>
<tr>
<td></td>
<td>FG256</td>
<td>FG484</td>
<td>FG676</td>
<td>FG676</td>
<td>FG895</td>
</tr>
<tr>
<td></td>
<td>FG324</td>
<td>FG484</td>
<td>FG968</td>
<td>FG968</td>
<td>FG1152</td>
</tr>
</tbody>
</table>

**Table 15. Axcelerator Family Selection Guide**

10.2 Decoder Cores

This section compares the area and speed metrics obtained for the generated decoder cores with those of commercially available decoder cores. Table 16 summarizes the area and speed metrics for several decoder cores. The decoders listed use GF($2^8$) and are 8-error-correcting RS codes. Four of the commercial cores target reconfigurable hardware (Altera and Xilinx); the other core targets ASIC technology.

The equivalent core produced by the VHDL core generator uses 3070 slices and runs at 77.6 MHz in a Xilinx XC2V500-4. This is more area than the average of the 3 cores that are specified for use with Xilinx FPGAs. In terms of frequency of operation, it is comparable with 4 of the cores. The first core, from Machine Learning Laboratory Inc., is considerably faster, running at 143 MHz. In general, the core produced by the VHDL core generator is close to the average speed of the cores listed in Table 16, while its area is approximately 50% more than the average.

<table>
<thead>
<tr>
<th>Company</th>
<th>Target Technology</th>
<th>Speed</th>
<th>Area</th>
<th>Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td>Machine Learning Laboratory Inc.</td>
<td>Xilinx XC2V1000-6</td>
<td>143 MHz</td>
<td>1950 slices</td>
<td>[17]</td>
</tr>
<tr>
<td>Memec Design</td>
<td>Xilinx XC2V250-6</td>
<td>77 MHz</td>
<td>1201 slices</td>
<td>[18]</td>
</tr>
<tr>
<td>Paxonet Communications</td>
<td>Xilinx XC2V500-4</td>
<td>97.9 MHz</td>
<td>2545 slices</td>
<td>[19]</td>
</tr>
<tr>
<td>4i2i Communications</td>
<td>ASIC 0.35 micron</td>
<td>75 MHz</td>
<td>37000 gates</td>
<td>[21]</td>
</tr>
<tr>
<td>Altera</td>
<td>Altera FLEX 10KE</td>
<td>61 MHz</td>
<td>2431 logic elements</td>
<td>[23]</td>
</tr>
</tbody>
</table>

Table 16. Device Utilization and Performance of Existing Decoder Cores
11 Conclusion

This thesis has introduced and proved the extended inversionless Massey-Berlekamp algorithm, which is a modest improvement over the inversionless Massey-Berlekamp algorithm for solving the key equation in a Reed-Solomon decoder. The error location and correction equations for an arbitrary RS code with an error-correcting capability of at most 2 errors were also presented. Using this information, VHDL-based designs for specific RS codes were implemented for both the encoder and the decoder. Once this was done, a program was written to automatically generate the VHDL code for either an RS encoder or an RS decoder for an arbitrary RS code, based on user inputs. It was shown that much of the VHDL code could be reused as is for any RS code, be it encoder or decoder. The rest of the code needed only minor modifications, or the specification of different constants, to implement the new functionality. The designs of the RS encoder and decoder were pipelined, enabling high bit rates. For the larger decoders, typical of those used in modern digital communication systems, bit rates over 450 Mbits/sec were achieved on FPGAs. For the encoders, bit rates of over 1 Gbits/sec were achieved on FPGAs. The end result is thus a VHDL core generator for Reed-Solomon encoders and decoders.

The following items may be considered for future work.

1. The VHDL code generated was written to be as generic as possible; that is, no FPGA-specific attributes were coded for. As such, delay blocks were implemented with shift register arrays. One could, however, code specifically for Xilinx FPGAs by using BlockRAM, a large amount of internal dual-port RAM that is available within an FPGA whether or not it is used. These RAMs could be used to implement the delay blocks and also the Galois field inverter. This would result in fewer CLBs (logic blocks) being used.

2. The representation of the Galois field elements is the standard polynomial form. Other representations exist, such as the dual basis representation used in the Consultative Committee for Space Data Systems (CCSDS) standard [15], or the composite field representation suggested by C. Paar [16]. The composite field representation is of particular interest since it has been shown that faster and smaller Galois field multipliers and inverters are possible using this technique.

3. There exist several other algorithms for solving the key equation, namely, Euclid’s algorithm, and the original Massey-Berlekamp algorithm. These could be substituted for the extended inversionless Massey-Berlekamp algorithm presented here.
12 References


16. C. Paar and M. Rosner, "Comparison of Arithmetic Architectures for Reed-Solomon Decoders in Reconfigurable Hardware", FCCM '97


21. 4i2i Communications Ltd., “Reed-Solomon Core Data Sheet”. http://www.4i2i.com/4i2iRS_V2_220402.PDF


13 Appendix A - RS Encoder VHDL Code

This appendix contains the VHDL code generated for a RS encoder for a 3-error correcting code over GF(2^4), with \( p(x) = x^4 + x + 1 \).
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity rs_encoder is
port (reset_n : in std_logic;
      data_out : out std_logic_vector (3 downto 0);
      data_in : in std_logic_vector (3 downto 0);
      input_strobe : in std_logic;
      clk : in std_logic;
      output_strobe : out std_logic;
      data_size : in std_logic_vector (3 downto 0));
end;

architecture RTL of rs_encoder is
constant GFpower : integer := 4;
subtype Galois_Field_element is std_logic_vector((GFpower-1) downto 0);
constant RS_T : integer := 3;
constant Zero : Galois_Field_element := "0000";
constant one : Galois_Field_element := "0001";
constant alpha4 : Galois_Field_element := "0011";
constant alpha6 : Galois_Field_element := "1100";
constant alpha9 : Galois_Field_element := "1010";
constant alpha10 : Galois_Field_element := "0111";
constant alpha14 : Galois_Field_element := "1001";
signal DoCalc : std_logic;
signal count : std_logic_vector ((GFpower-1) downto 0);
signal sum : Galois_Field_element;
type parity_reg_type is array(0 to (2*RS_T-1)) of Galois_Field_element;
signal parity_reg : parity_reg_type;

function mul (b : in Galois_Field_element; c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- Polynomial-basis product of b and c, reduced modulo p(x) = x^4 + x + 1
  d(0) := (b(0) and c(0)) xor (b(1) and c(3)) xor (b(2) and c(2)) xor (b(3) and c(1));
  d(1) := (b(0) and c(1)) xor (b(1) and (c(0) xor c(3))) xor (b(2) and (c(2) xor c(3))) xor (b(3) and (c(1) xor c(2)));
  d(2) := (b(0) and c(2)) xor (b(1) and c(1)) xor (b(2) and (c(0) xor c(3))) xor (b(3) and (c(2) xor c(3)));
  d(3) := (b(0) and c(3)) xor (b(1) and c(2)) xor (b(2) and c(1)) xor (b(3) and (c(0) xor c(3)));
  return d;
end mul;

function add (b : in Galois_Field_element; c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- Addition in GF(2^4) is a bitwise exclusive-or
  d(0) := (b(0) xor c(0));
  d(1) := (b(1) xor c(1));
  d(2) := (b(2) xor c(2));
  d(3) := (b(3) xor c(3));
  return d;
end add;

begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (reset_n = '0') then
        count <= (others => '0');
      elsif (input_strobe = '1') then
        count <= one;
      elsif (DoCalc = '1') then
        count <= count + 1;
      else
        count <= (others => '0');
      end if;
    end if;
  end process;

process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      DoCalc <= '0';
    elsif (input_strobe = '1') then
      DoCalc <= '1';
    elsif (count = data_size) then
      DoCalc <= '0';
    end if;
  end if;
end process;

process (parity_reg, data_in)
begin
  sum <= add(parity_reg((2*RS_T-1)),data_in);
end process;

process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      for i in 0 to (2*RS_T-1) loop
        parity_reg(i) <= (others=>'0');
      end loop;
    elsif (input_strobe = '1') then
      for i in 0 to (2*RS_T-1) loop
        parity_reg(i) <= (others=>'0');
      end loop;
    elsif (DoCalc = '1') then
      parity_reg( 5) <= add(parity_reg( 4),mul(sum,alpha10));
      parity_reg( 4) <= add(parity_reg( 3),mul(sum,alpha14));
      parity_reg( 3) <= add(parity_reg( 2),mul(sum,alpha4));
      parity_reg( 2) <= add(parity_reg( 1),mul(sum,alpha6));
      parity_reg( 1) <= add(parity_reg( 0),mul(sum,alpha9));
      parity_reg( 0) <= mul(sum,alpha6);
    else
      for i in (2*RS_T-1) downto 1 loop
        parity_reg(i) <= parity_reg(i-1);
      end loop;
      parity_reg(0) <= (others=>'0');
    end if;
  end if;
end process;

process(clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      data_out <= (others=>'0');
    elsif (DoCalc='1') then
      data_out <= data_in;
    else
      data_out <= parity_reg((2*RS_T-1));
    end if;
  end if;
end process;

process(clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      output_strobe <= '0';
    else
      output_strobe <= input_strobe;
    end if;
  end if;
end process;
end;
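
The multiplier constants in the parity register update can be cross-checked against the code generator polynomial. The following is a minimal software sketch, assuming that the generator polynomial has the 2t = 6 consecutive roots alpha^1 through alpha^6 (the initial root is an assumption, since it is not stated in this appendix); the function names are hypothetical and the model is not part of the generated VHDL.

```python
# Software model of GF(2^4) with p(x) = x^4 + x + 1 (polynomial basis).
# Assumption: g(x) has roots alpha^1 .. alpha^(2t); names are illustrative.
PRIM_POLY = 0b10011
M = 4
ORDER = (1 << M) - 1  # 15


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication, reducing modulo PRIM_POLY after each shift."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """Return alpha^e, where alpha is the element x (binary 0010)."""
    x = 1
    for _ in range(e % ORDER):
        x = gf_mul(x, 0b0010)
    return x


def generator_poly(t: int, first_root: int = 1) -> list:
    """Coefficients of g(x) = prod (x + alpha^(first_root + i)), lowest degree first."""
    g = [1]
    for i in range(2 * t):
        root = alpha_pow(first_root + i)
        product = [0] * (len(g) + 1)
        for k, coeff in enumerate(g):
            product[k] ^= gf_mul(coeff, root)  # contribution of the constant term
            product[k + 1] ^= coeff            # contribution of the x term
        g = product
    return g


g = generator_poly(3)
print([format(c, "04b") for c in g])
```

Under the stated assumption, the coefficients from degree 0 to degree 5 are alpha^6, alpha^9, alpha^6, alpha^4, alpha^14, and alpha^10, which are the elements declared as constants in the listing.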
14 Appendix B - RS Decoder VHDL Code (1 error)

This appendix contains the VHDL code generated for an RS decoder for a 1-error correcting code over GF($2^6$), with $p(x) = x^6 + x + 1$. The value of $N$ is 14 and the value of the log of the initial root of the code generator polynomial is 37. This is an RS(14,12) code.
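
The Galois field constants that appear in the listing below can be regenerated from the field polynomial. The sketch that follows is a software model only, with hypothetical helper names; it takes RS_N = 14 and RS_M0 = 37 from the constant declarations in the listing and prints the field elements for alpha37, alpha38, and the two Chien search initial values, whose exponents are reduced modulo 63.

```python
# Software model of GF(2^6) with p(x) = x^6 + x + 1 (polynomial basis).
# Helper names are illustrative; RS_N and RS_M0 are taken from the listing.
PRIM_POLY = 0b1000011
ORDER = 63  # multiplicative order of alpha in GF(2^6)


def alpha_pow(e: int) -> int:
    """Return alpha^e as a 6-bit integer; e may be negative."""
    x = 1
    for _ in range(e % ORDER):
        x <<= 1
        if x & 0b1000000:
            x ^= PRIM_POLY
    return x


def gf_inv(v: int) -> int:
    """Inverse of a non-zero element: if v = alpha^e then 1/v = alpha^(ORDER - e)."""
    log = next(e for e in range(ORDER) if alpha_pow(e) == v)
    return alpha_pow(ORDER - log)


RS_N, RS_M0 = 14, 37
for name, e in [
    ("alpha37", RS_M0),
    ("alpha38", RS_M0 + 1),
    ("Temp1_initial_value", -(RS_N - 1)),
    ("x0a_initial_value", -(RS_N - 1) * RS_M0),
]:
    print(f"{name:20s} a{e % ORDER:<2d} = {alpha_pow(e):06b}")
```

The same model can be used to spot-check individual entries of the inverse lookup function, for example `gf_inv(0b000010)`.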
library ieee; use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity InputProcess is
port (
  reset_n : in std_logic;
  clk : in std_logic;
  rs_data_in : in std_logic_vector (5 downto 0);
  rs_data_in_start : in std_logic;
  syndrome_calc_done : out std_logic;
  errors_present : out std_logic;
  syndrome : out std_logic_vector (11 downto 0)
);
end;

architecture RTL of InputProcess is
constant GFpower : integer := 6;
subtype Galois_Field_element is std_logic_vector((GFpower-1) downto 0);
constant RS_T : integer := 1;
constant RS_N : integer := 14;
constant RS_M0 : integer := 37;
constant Two_T_minus_1 : integer := 2*RS_T-1;
constant zero : Galois_Field_element := "000000";
constant one : Galois_Field_element := "000001";
constant alpha37 : Galois_Field_element := "101100";
constant alpha38 : Galois_Field_element := "011011";
type IntS_type is array(0 to Two_T_minus_1) of Galois_Field_element;
type state_type is (Idle, RxData, XferData);
signal present_state : state_type;
signal IntCount : std_logic_vector (3 downto 0);
signal internal_errors_present : std_logic;
signal CountEnable : std_logic;
signal CountReset : std_logic;
signal StartXfer : std_logic;
signal XferSyndrome : std_logic;
signal DoCalc : std_logic;
signal IntS : IntS_type;
type IntEP_type is array(0 to Two_T_minus_1) of std_logic; -- one non-zero flag per syndrome
signal IntEP : IntEP_type;
function add (b, c : Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := (b(0) xor c(0));
  d(1) := (b(1) xor c(1));
  d(2) := (b(2) xor c(2));
  d(3) := (b(3) xor c(3));
  d(4) := (b(4) xor c(4));
  d(5) := (b(5) xor c(5));
  return d;
end add;

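function mul (b, c : Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- Polynomial-basis product of b and c, reduced modulo p(x) = x^6 + x + 1
  d(0) := (b(0) and c(0)) xor (b(1) and c(5)) xor (b(2) and c(4)) xor (b(3) and c(3)) xor (b(4) and c(2)) xor (b(5) and c(1));
  d(1) := (b(0) and c(1)) xor (b(1) and (c(0) xor c(5))) xor (b(2) and (c(4) xor c(5))) xor (b(3) and (c(3) xor c(4))) xor (b(4) and (c(2) xor c(3))) xor (b(5) and (c(1) xor c(2)));
  d(2) := (b(0) and c(2)) xor (b(1) and c(1)) xor (b(2) and (c(0) xor c(5))) xor (b(3) and (c(4) xor c(5))) xor (b(4) and (c(3) xor c(4))) xor (b(5) and (c(2) xor c(3)));
  d(3) := (b(0) and c(3)) xor (b(1) and c(2)) xor (b(2) and c(1)) xor (b(3) and (c(0) xor c(5))) xor (b(4) and (c(4) xor c(5))) xor (b(5) and (c(3) xor c(4)));
  d(4) := (b(0) and c(4)) xor (b(1) and c(3)) xor (b(2) and c(2)) xor (b(3) and c(1)) xor (b(4) and (c(0) xor c(5))) xor (b(5) and (c(4) xor c(5)));
  d(5) := (b(0) and c(5)) xor (b(1) and c(4)) xor (b(2) and c(3)) xor (b(3) and c(2)) xor (b(4) and c(1)) xor (b(5) and (c(0) xor c(5)));
  return d;
end mul;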

begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (reset_n = '0') or (CountReset = '1') then
        IntCount <= (others => '0');
      elsif (CountEnable = '1') then
        IntCount <= IntCount + 1;
      end if;
    end if;
  end process;

  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (reset_n = '0') then
        StartXfer <= '0';
      elsif (IntCount="1011") then -- RS(14,12) : max_count = 11
        StartXfer <= '1';
      else
        StartXfer <= '0';
      end if;
    end if;
  end process;

  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (reset_n = '0') then
        for i in 0 to Two_T_minus_1 loop
          IntS(i) <= (others => '0');
        end loop;
      elsif (rs_data_in_start = '1') then
        for i in 0 to Two_T_minus_1 loop
          IntS(i) <= rs_data_in;
        end loop;
      elsif (DoCalc = '1') then
        -- Horner's rule: S(j) <= alpha^(RS_M0+j) * S(j) + received symbol
        IntS(0) <= add(mul(alpha37, IntS(0)), rs_data_in);
        IntS(1) <= add(mul(alpha38, IntS(1)), rs_data_in);
      end if;
    end if;
  end process;

process (clk)
begin
if (clk'event and clk='1') then
if (reset_n = '0') then
  syndrome <= (others=>'0');
  internal_errors_present <= '0';
elsif (XferSyndrome='1') then
  syndrome <=
  IntS(1) & IntS(0);
  internal_errors_present <=
  IntEP(0) or IntEP(1);
end if;
end if;
end process;

process (IntS)
begin
  for i in 0 to Two_T_minus_1 loop
    IntEP(i)<='1';
    if (IntS(i)=zero) then IntEP(i)<='0'; end if;
  end loop;
end process;

InputControlSD_Idle : process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    XferSyndrome<='0';
    DoCalc<='0';
    CountReset<='1';
    CountEnable<='0';
    present_state <= Idle;
  else
    case present_state is
      when Idle =>
        if (rs_data_in_start = '1') then
          DoCalc<='1';
          CountReset<='0';
          CountEnable<='1';
          present_state <= RxData;
        else
          present_state <= Idle;
        end if;
      when RxData =>
        if (StartXfer = '1') then
          XferSyndrome<='1';
          DoCalc<='0';
          CountReset<='1';
          CountEnable<='0';
          present_state <= XferData;
        else
          present_state <= RxData;
        end if;
      when XferData =>
        XferSyndrome<='0';
        DoCalc<='0';
        CountReset<='1';
        CountEnable<='0';
        present_state <= Idle;
      when others =>
        XferSyndrome<='0';
        DoCalc<='0';
        CountReset<='1';
        CountEnable<='0';
        present_state <= Idle;
    end case;
  end if;
end if;
end process;

syndrome_calc_done <= XferSyndrome;
errors_present <= internal_errors_present;
end;
----------------------------------------------------------------------------------
-- End of InputProcess
----------------------------------------------------------------------------------
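
The syndrome registers of InputProcess accumulate each received symbol with a multiply-and-add step, which amounts to evaluating the received polynomial by Horner's rule. The following is a small software sketch of that recurrence, not the generated hardware; it assumes that the first symbol received is the highest-degree coefficient, and the names `gf_mul`, `alpha_pow`, and `syndromes` are hypothetical helpers. The values of RS_M0 = 37 and 2t = 2 are taken from the listing.

```python
# Software model of the syndrome recurrence S(j) <- alpha^(m0+j) * S(j) + r_k
# over GF(2^6) with p(x) = x^6 + x + 1. Helper names are illustrative.
PRIM_POLY = 0b1000011
ORDER = 63


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication, reducing modulo PRIM_POLY after each shift."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000000:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """Return alpha^e, where alpha is the element x (binary 000010)."""
    x = 1
    for _ in range(e % ORDER):
        x = gf_mul(x, 0b000010)
    return x


def syndromes(received, m0: int = 37, two_t: int = 2):
    """Evaluate the received word at alpha^(m0), ..., alpha^(m0 + two_t - 1)."""
    s = [0] * two_t
    for symbol in received:  # assumed order: highest-degree coefficient first
        for j in range(two_t):
            s[j] = gf_mul(alpha_pow(m0 + j), s[j]) ^ symbol
    return s


# All-zero codeword of length 14 with one error of value 000111 at degree 5.
N, error_degree, error_value = 14, 5, 0b000111
word = [0] * N
word[N - 1 - error_degree] = error_value
print([format(s, "06b") for s in syndromes(word)])
```

For a single error of value e at degree p, each syndrome equals e multiplied by alpha^((m0 + j) p), and an error-free codeword gives all-zero syndromes; the errors_present output is derived from exactly this property.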

----------------------------------------------------------------------------------
-- Start of ChienSearchProcess
----------------------------------------------------------------------------------

library ieee; use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;

entity ChienSearchProcess is
  port ( 
    clk : in std_logic;
    reset_n : in std_logic;
    syndromes : in std_logic_vector (11 downto 0);
    StartChien : in std_logic;
    rs_enable : in std_logic;
    RS_Data_In : in std_logic_vector (5 downto 0);
    RS_Data_out : out std_logic_vector (5 downto 0);
    RS_Data_out_Start : out std_logic
  );
end;

architecture RTL of ChienSearchProcess is
constant GFpower : integer := 6;
subtype Galois_Field_element is std_logic_vector((GFpower-1) downto 0);
constant RS_T : integer := 1;
constant RS_N : integer := 14;
constant RS_M0 : integer := 37;
constant ChienSearch_pipeline_delay : integer := 1;
constant SR_max : integer := RS_N + ChienSearch_pipeline_delay;
constant OutputEnableStartCount : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(ChienSearch_pipeline_delay-1, GFpower+1);
constant RS_Data_out_StartCount : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(ChienSearch_pipeline_delay, GFpower+1);
constant OutputEnableStopCount : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N-ChienSearch_pipeline_delay-1, GFpower+1);
constant DoChienCountMax : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N-ChienSearch_pipeline_delay-2, GFpower+1);
constant CountMax : std_logic_vector(GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR(RS_N-ChienSearch_pipeline_delay, GFpower+1);
constant zero : Galois_Field_element := "000000";
constant one : Galois_Field_element := "000001";
constant Temp1_initial_value : Galois_Field_element := "110100"; -- alpha(-(RS_N-1)) = a50
constant x0a_m1 : Galois_Field_element := "101100"; -- alpha(RS_M0) = a37
constant alpha : Galois_Field_element := "000010";
constant x0a_initial_value : Galois_Field_element := "101001"; -- alpha(-(RS_N-1)*RS_M0) = a23

type shift_register_delay_type is array(0 to SR_max) of Galois_Field_element;
signal shift_register_delay : shift_register_delay_type;
signal RS_data_delayed : Galois_Field_element;
signal CORR_FACTOR : Galois_Field_element;
signal Count : std_logic_vector(GFpower downto 0);
signal CountEnable : std_logic;
signal InitChien : std_logic;
signal InitChien_d1 : std_logic;
signal DoChien : std_logic;
signal OutputEnable : std_logic;
signal inv_s0 : Galois_Field_element;
signal x0a : Galois_Field_element;
signal Temp1 : Galois_Field_element;
type syndrome_type is array(0 to 2*RS_T-1) of Galois_Field_element;
signal S : syndrome_type;

function inv (b : in std_logic_vector) return std_logic_vector is
  variable d : std_logic_vector (5 downto 0);
begin
  -- Lookup table of multiplicative inverses in GF(2^6), p(x) = x^6 + x + 1
  d := "000000";
  if (b="000001") then d := "000001";
  elsif (b="000010") then d := "100001";
  elsif (b="000011") then d := "111110";
  elsif (b="000100") then d := "110001";
  elsif (b="000101") then d := "101011";
  elsif (b="000110") then d := "011111";
  elsif (b="000111") then d := "101100";
  elsif (b="001000") then d := "111001";
  elsif (b="001001") then d := "100101";
  elsif (b="001010") then d := "110100";
  elsif (b="001011") then d := "011100";
  elsif (b="001100") then d := "101110";
  elsif (b="001101") then d := "101000";
  elsif (b="001110") then d := "010110";
  elsif (b="001111") then d := "011001";
  elsif (b="010000") then d := "111101";
  elsif (b="010001") then d := "110110";
  elsif (b="010010") then d := "110011";
  elsif (b="010011") then d := "100111";
  elsif (b="010100") then d := "011010";
  elsif (b="010101") then d := "100011";
  elsif (b="010110") then d := "001110";
  elsif (b="010111") then d := "011000";
  elsif (b="011000") then d := "010111";
  elsif (b="011001") then d := "001111";
  elsif (b="011010") then d := "010100";
  elsif (b="011011") then d := "100010";
  elsif (b="011100") then d := "001011";
  elsif (b="011101") then d := "110101";
  elsif (b="011110") then d := "101101";
  elsif (b="011111") then d := "000110";
  elsif (b="100000") then d := "111111";
  elsif (b="100001") then d := "000010";
  elsif (b="100010") then d := "011011";
  elsif (b="100011") then d := "010101";
  elsif (b="100100") then d := "111000";
  elsif (b="100101") then d := "001001";
  elsif (b="100110") then d := "110010";
  elsif (b="100111") then d := "010011";
  elsif (b="101000") then d := "001101";
  elsif (b="101001") then d := "101111";
  elsif (b="101010") then d := "110000";
  elsif (b="101011") then d := "000101";
  elsif (b="101100") then d := "000111";
  elsif (b="101101") then d := "011110";
  elsif (b="101110") then d := "001100";
  elsif (b="101111") then d := "101001";
  elsif (b="110000") then d := "101010";
  elsif (b="110001") then d := "000100";
  elsif (b="110010") then d := "100110";
  elsif (b="110011") then d := "010010";
  elsif (b="110100") then d := "001010";
  elsif (b="110101") then d := "011101";
  elsif (b="110110") then d := "010001";
  elsif (b="110111") then d := "111100";
  elsif (b="111000") then d := "100100";
  elsif (b="111001") then d := "001000";
  elsif (b="111010") then d := "111011";
  elsif (b="111011") then d := "111010";
  elsif (b="111100") then d := "110111";
  elsif (b="111101") then d := "010000";
  elsif (b="111110") then d := "000011";
  elsif (b="111111") then d := "100000";
  end if;
  return d;
end inv;

function add (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := (b(0) xor c(0));
  d(1) := (b(1) xor c(1));
  d(2) := (b(2) xor c(2));
  d(3) := (b(3) xor c(3));
  d(4) := (b(4) xor c(4));
  d(5) := (b(5) xor c(5));
  return d;
end add;

function mul (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- Polynomial-basis product of b and c, reduced modulo p(x) = x^6 + x + 1
  d(0) := (b(0) and c(0)) xor (b(1) and c(5)) xor (b(2) and c(4)) xor (b(3) and c(3)) xor (b(4) and c(2)) xor (b(5) and c(1));
  d(1) := (b(0) and c(1)) xor (b(1) and (c(0) xor c(5))) xor (b(2) and (c(4) xor c(5))) xor (b(3) and (c(3) xor c(4))) xor (b(4) and (c(2) xor c(3))) xor (b(5) and (c(1) xor c(2)));
  d(2) := (b(0) and c(2)) xor (b(1) and c(1)) xor (b(2) and (c(0) xor c(5))) xor (b(3) and (c(4) xor c(5))) xor (b(4) and (c(3) xor c(4))) xor (b(5) and (c(2) xor c(3)));
  d(3) := (b(0) and c(3)) xor (b(1) and c(2)) xor (b(2) and c(1)) xor (b(3) and (c(0) xor c(5))) xor (b(4) and (c(4) xor c(5))) xor (b(5) and (c(3) xor c(4)));
  d(4) := (b(0) and c(4)) xor (b(1) and c(3)) xor (b(2) and c(2)) xor (b(3) and c(1)) xor (b(4) and (c(0) xor c(5))) xor (b(5) and (c(4) xor c(5)));
  d(5) := (b(0) and c(5)) xor (b(1) and c(4)) xor (b(2) and c(3)) xor (b(3) and c(2)) xor (b(4) and c(1)) xor (b(5) and (c(0) xor c(5)));
  return d;
end mul;

function IsNotZero(b : in std_logic_vector) return std_logic is
  variable d : std_logic;
begin
  d :=
    b(0) or
    b(1) or
    b(2) or
    b(3) or
    b(4) or
    b(5);
  return d;
end IsNotZero;

begin
  S(1) <= syndromes(11 downto 6);
  S(0) <= syndromes(5 downto 0);

  System_Counter : process (clk)
  begin
    if (clk'event and clk = '1') then
      if ((reset_n = '0') or (StartChien = '1')) then
        Count <= (others => '0');
      elsif (CountEnable = '1') then
        Count <= Count + 1;
      end if;
    end if;
  end process;

System_Count_Enable : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CountEnable <= '0';
    elsif (StartChien = '1') then
      CountEnable <= '1';
    elsif (Count = CountMax) then
      CountEnable <= '0';
    end if;
  end if;
end process;

InitChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      InitChien <= '0';
      InitChien_d1 <= '0';
    else
      InitChien <= StartChien;
      InitChien_d1 <= InitChien;
    end if;
  end if;
end process;

OutputEnable_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      OutputEnable <= '0';
    elsif (Count = OutputEnableStartCount) then
      OutputEnable <= '1';
    elsif (Count = OutputEnableStopCount) then
      OutputEnable <= '0';
    end if;
  end if;
end process;

RS_Data_out_Start_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_Out_Start <= '0';
    elsif (Count = RS_Data_out_StartCount) then
      RS_Data_Out_Start <= '1';
    else
      RS_Data_Out_Start <= '0';
    end if;
  end if;
end process;

DoChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (Count = DoChienCountMax)) then
      DoChien <= '0';
    elsif (InitChien_d1 = '1') then
      DoChien <= '1';
    end if;
  end if;
end process;

Calc_Correction_Factor : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CORR_FACTOR <= zero;
    elsif (rs_enable = '1') and (Temp1=one) then
      CORR_FACTOR <= mul(S(0), x0a);
    else
      CORR_FACTOR <= zero;
    end if;
  end if;
end process;

x0a_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      x0a <= zero;
    elsif (InitChien_d1 = '1') then
      x0a <= x0a_initial_value;
    else
      x0a <= mul(x0a, x0a_mul);
    end if;
  end if;
end process;

Delay_RS_Data : process (clk)
begin
if (clk'event and clk = '1') then
    if (reset_n = '0') then
        for i in 0 to SR_max loop
            shift_register_delay(i) <= zero;
        end loop;
    else
        for i in SR_max downto 1 loop
            shift_register_delay(i) <= shift_register_delay(i-1);
        end loop;
        shift_register_delay(0) <= RS_Data_In;
    end if;
end if;
end process;

RS_data_delayed <= shift_register_delay(SR_max);

Correct_RS_Data : process (clk)
begin
if (clk'event and clk = '1') then
    if (reset_n = '0') then
        RS_Data_out <= zero;
    elsif (OutputEnable = '1') then
        RS_Data_out <= add(CORR_FACTOR, RS_data_delayed);
    else
        RS_Data_out <= zero;
    end if;
end if;
end process;

inv_x0_process : process(S)
begin
    inv_x0 <= inv(S(0));
end process;

Calc_Temp_Registers : process (clk)
begin
if (clk'event and clk = '1') then
    if (reset_n = '0') then
        Temp1 <= zero;
    elsif (StartChien = '1') then
        Temp1 <= Temp1_initial_value;
    elsif (InitChien = '1') then
        Temp1 <= mul(S(1), Temp1);
    elsif (InitChien_d1 = '1') then
        Temp1 <= mul(inv_x0, Temp1);
    elsif (DoChien = '1') then
        Temp1 <= mul(alphain1, Temp1);
    else
        Temp1 <= mul(x0a, Temp1);
    end if;
end if;
end process;
-- End of ChienSearchProcess

-- Start of RSTopProcess

library ieee;
use ieee.std_logic_1164.all;
entity rs_dec_top is
  generic (
    GFPower : INTEGER := 6;
    RS_T : INTEGER := 1
  );
  port (
    clk : in std_logic;
    reset_n : in std_logic;
    RS_Data_In : in std_logic_vector(GFPower - 1 downto 0);
    rs_data_in_start : in std_logic;
    RS_Data_out : out std_logic_vector(GFPower - 1 downto 0);
    RS_Data_out_Start : out std_logic;
    rs_enable : in std_logic
  );
end rs_dec_top;

use work.all;
architecture RTL of rs_dec_top is
  signal errors_present : std_logic;
  signal syndrome : std_logic_vector(2 * GFPower * RS_T - 1 downto 0);
  signal syndrome_calc_done : std_logic;
  component InputProcess
    port (
      reset_n : in std_logic;
      clk : in std_logic;
      rs_data_in : in std_logic_vector(5 downto 0);
      rs_data_in_start : in std_logic;
      syndrome_calc_done : out std_logic;
      errors_present : out std_logic;
      syndrome : out std_logic_vector(11 downto 0)
    );
  end component;
  component ChienSearchProcess
    port (
      clk : in std_logic;
      reset_n : in std_logic;
      syndromes : in std_logic_vector(11 downto 0);
      StartChien : in std_logic;
      rs_enable : in std_logic;
      RS_Data_In : in std_logic_vector(5 downto 0);
      RS_Data_out : out std_logic_vector(5 downto 0);
      RS_Data_out_Start : out std_logic
    );
  end component;
begin

inst_InputProcess : InputProcess
  port map (
    reset_n => reset_n,
    clk => clk,
    rs_data_in => RS_Data_In(GFPower - 1 downto 0),
    rs_data_in_start => rs_data_in_start,
    syndrome_calc_done => syndrome_calc_done,
    errors_present => errors_present,
    syndrome => syndrome(2 * GFPower * RS_T - 1 downto 0));

inst_ChienSearchProcess : ChienSearchProcess
  port map (
    clk => clk,
    reset_n => reset_n,
    syndromes => syndrome(2 * GFPower * RS_T - 1 downto 0),
    StartChien => syndrome_calc_done,
    rs_enable => rs_enable,
    RS_Data_In => RS_Data_In(GFPower - 1 downto 0),
    RS_Data_out => RS_Data_out(GFPower - 1 downto 0),
    RS_Data_out_Start => RS_Data_out_Start);

end RTL;

------------------------------------------
-- End of RSTopProcess
------------------------------------------
15 Appendix C - RS Decoder VHDL Code (2 errors)

This appendix contains the VHDL code generated for an RS decoder for a 2-error correcting code over GF($2^5$), with $p(x) = x^5 + x^2 + 1$. The codeword length is $N = 23$ and the log of the initial root of the code generator polynomial is 17, so the four roots are $\alpha^{17}$ to $\alpha^{20}$. This is an RS(23,19) code.
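
The field constants used in the listing can be cross-checked independently of the generator. The following Python sketch is not part of the generated code. It assumes the usual polynomial-basis representation with the most significant bit written first, and every name in it (`build_tables`, `gf_mul`, `gf_inv`, `bits`) is hypothetical.

```python
# Reference model of GF(2^5) with p(x) = x^5 + x^2 + 1 (illustrative only).
P = 0b100101           # field generator polynomial, bit i = coefficient of x^i
M = 5                  # symbol width in bits
ORDER = (1 << M) - 1   # number of nonzero field elements (31)

def build_tables():
    """Return (exp, log): exp[k] = alpha^k, log[alpha^k] = k."""
    exp, log = [0] * ORDER, {}
    x = 1
    for k in range(ORDER):
        exp[k] = x
        log[x] = k
        x <<= 1                 # multiply by alpha
        if x & (1 << M):        # degree reached 5: reduce modulo p(x)
            x ^= P
    return exp, log

EXP, LOG = build_tables()

def gf_mul(a, b):
    """Multiply two field elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]

def gf_inv(a):
    """Multiplicative inverse of a nonzero element (zero is not handled)."""
    return EXP[(ORDER - LOG[a]) % ORDER]

def bits(v):
    """Format an element as the 5-character bit string used in the VHDL."""
    return format(v, "05b")

# Roots of the code generator polynomial: alpha^17 .. alpha^20.
for k in (17, 18, 19, 20):
    print(f"alpha{k} = {bits(EXP[k])}")
```

Under these assumptions the printed strings can be compared directly with the `alpha17` to `alpha20` constants declared in `InputProcess`, and `gf_inv` gives a reference for the entries of the `inv` lookup function.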
library ieee; use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity InputProcess is
port (
  reset_n : in std_logic;
  clk : in std_logic;
  rs_data_in : in std_logic_vector (4 downto 0);
  rs_data_in_start : in std_logic;
  syndrome_calc_done : out std_logic;
  errors_present : out std_logic;
  syndrome : out std_logic_vector (19 downto 0)
);
end;

architecture RTL of InputProcess is
constant GFPower : integer := 5;
subtype Galois_Field_element is std_logic_vector((GFPower-1) downto 0);
constant RS_T : integer := 2;
constant RS_N : integer := 23;
constant RS_M0 : integer := 17;
constant Two_T_minus_1 : integer := 2*RS_T-1;
constant zero : Galois_Field_element := "00000";
constant one : Galois_Field_element := "00001";
constant alpha17 : Galois_Field_element := "10011";
constant alpha18 : Galois_Field_element := "00011";
constant alpha19 : Galois_Field_element := "00110";
constant alpha20 : Galois_Field_element := "01100";
type IntS_type is array(0 to Two_T_minus_1) of Galois_Field_element;
type IntEP_type is array(0 to Two_T_minus_1) of std_logic;
type state_type is (Idle, RxData, XferData);
signal present_state : state_type;
signal IntCount : std_logic_vector (4 downto 0);
signal internal_errors_present : std_logic;
signal CountEnable : std_logic;
signal CountReset : std_logic;
signal StartXfer : std_logic;
signal XferSyndrome : std_logic;
signal DoCalc : std_logic;
signal IntS : IntS_type;
signal IntEP : IntEP_type;

function add (b : Galois_Field_element; c : Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := (b(0) xor c(0));
  d(1) := (b(1) xor c(1));
  d(2) := (b(2) xor c(2));
  d(3) := (b(3) xor c(3));
  d(4) := (b(4) xor c(4));
  return d;
end add;

function mul (b : Galois_Field_element; c : Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- product of b(x) and c(x) reduced modulo p(x) = x^5 + x^2 + 1
  d(0) := (b(0) and c(0)) xor (b(1) and c(4)) xor (b(2) and c(3)) xor (b(3) and c(2)) xor (b(4) and c(1)) xor (b(4) and c(4));
  d(1) := (b(0) and c(1)) xor (b(1) and c(0)) xor (b(2) and c(4)) xor (b(3) and c(3)) xor (b(4) and c(2));
  d(2) := (b(0) and c(2)) xor (b(1) and c(1)) xor (b(2) and c(0)) xor (b(1) and c(4)) xor (b(2) and c(3)) xor (b(3) and c(2)) xor (b(4) and c(1)) xor (b(3) and c(4)) xor (b(4) and c(3)) xor (b(4) and c(4));
  d(3) := (b(0) and c(3)) xor (b(1) and c(2)) xor (b(2) and c(1)) xor (b(3) and c(0)) xor (b(2) and c(4)) xor (b(3) and c(3)) xor (b(4) and c(2)) xor (b(4) and c(4));
  d(4) := (b(0) and c(4)) xor (b(1) and c(3)) xor (b(2) and c(2)) xor (b(3) and c(1)) xor (b(4) and c(0)) xor (b(3) and c(4)) xor (b(4) and c(3));
  return d;
end mul;

begin
process (clk) begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') or (CountReset = '1') then
      IntCount <= (others=>'0');
    elsif (CountEnable = '1') then
      IntCount <= IntCount + 1;
    end if;
  end if;
end process;

process (clk) begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      StartXfer <= '0';
    elsif (IntCount="10100") then
      -- RS(23,19); max_count = 20
      StartXfer <= '1';
    else
      StartXfer <= '0';
    end if;
  end if;
end process;

process (clk) begin
  if (clk'event and clk='1') then
    if (reset_n = '0') then
      for i in 0 to Two_T_minus_1 loop
        IntS(i) <= zero;
      end loop;
    elsif (rs_data_in_start = '1') then
      for i in 0 to Two_T_minus_1 loop
        IntS(i) <= rs_data_in;
      end loop;
    elsif (DoCalc='1') then
      IntS(0) <= add(mul(alpha17,IntS(0)),rs_data_in);
      IntS(1) <= add(mul(alpha18,IntS(1)),rs_data_in);
      IntS(2) <= add(mul(alpha19,IntS(2)),rs_data_in);
      IntS(3) <= add(mul(alpha20,IntS(3)),rs_data_in);
    end if;
  end if;
end process;
process (clk)
begin
if (clk'event and clk='1') then
    if (reset_n = '0') then
        syndrome <= (others=>'0');
        internal_errors_present <= '0';
    elsif (XferSyndrome='1') then
        syndrome <=
            IntS(3) & IntS(2) & IntS(1) & IntS(0);
        internal_errors_present <=
            IntEP(0) or IntEP(1) or IntEP(2) or IntEP(3);
    end if;
end if;
end process;

process (IntS)
begin
    for i in 0 to Two_T_minus_1 loop
        IntEP(i)<= '1';
        if (IntS(i)=zero) then IntEP(i)<= '0'; end if;
    end loop;
end process;

InputControlSD_Idle : process (clk)
begin
    if (clk'event and clk = '1') then
        if (reset_n = '0') then
            XferSyndrome<='0';
            DoCalc<='0';
            CountReset<='1';
            CountEnable<='0';
            present_state <= Idle;
        else
            case present_state is
            when Idle =>
                if (rs_data_in_start = '1') then
                    DoCalc<='1';
                    CountReset<='0';
                    CountEnable<='1';
                    present_state <= RxData;
                else
                    present_state <= Idle;
                end if;
            when RxData =>
                if (StartXfer = '1') then
                    XferSyndrome<='1';
                    DoCalc<='0';
                    CountReset<='1';
                    CountEnable<='0';
                    present_state <= XferData;
                else
                    present_state <= RxData;
                end if;
            when XferData =>
                XferSyndrome<='0';
                DoCalc<='0';
                CountReset<='1';
                CountEnable<='0';
                present_state <= Idle;
            when others =>
                XferSyndrome<='0';
                DoCalc<='0';
                CountReset<='1';
                CountEnable<='0';
                present_state <= Idle;
            end case;
        end if;
    end if;
end process;
syndrome_calc_done <= XferSyndrome;
errors_present <= internal_errors_present;
end;

-------------------------------------------------------------
-- End of InputProcess
-------------------------------------------------------------
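
The `IntS(i)` registers of `InputProcess` evaluate the received word at the four code roots with Horner's rule: on each clock, `IntS(i)` is multiplied by $\alpha^{17+i}$ and the incoming symbol is added. The Python sketch below models that recurrence in software as a reference. It is not the generator's own code, it assumes symbols arrive highest-degree coefficient first, and the helper names (`gf_mul`, `poly_mul`, `generator_poly`, `syndromes`) are hypothetical.

```python
# Software model of the syndrome recurrence for RS(23,19) over GF(2^5).
P, M, N, T, M0 = 0b100101, 5, 23, 2, 17
ORDER = (1 << M) - 1

EXP, LOG = [], {}
x = 1
for k in range(ORDER):
    EXP.append(x)
    LOG[x] = k
    x <<= 1
    if x >> M:          # reduce modulo p(x) = x^5 + x^2 + 1
        x ^= P

def gf_mul(a, b):
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % ORDER]

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def generator_poly():
    """g(x) = (x + alpha^17)(x + alpha^18)(x + alpha^19)(x + alpha^20)."""
    g = [1]
    for i in range(2 * T):
        g = poly_mul(g, [1, EXP[(M0 + i) % ORDER]])
    return g

def syndromes(received):
    """One loop iteration per received symbol, mirroring IntS(i) updates."""
    s = [0] * (2 * T)
    for sym in received:
        for i in range(2 * T):
            s[i] = gf_mul(EXP[(M0 + i) % ORDER], s[i]) ^ sym
    return s

# Any multiple of g(x) of length N is a codeword (non-systematic here).
message = list(range(1, N - 2 * T + 1))      # 19 arbitrary symbols
codeword = poly_mul(message, generator_poly())
print(syndromes(codeword))                   # all zero for a valid codeword

corrupted = list(codeword)
corrupted[3] ^= 0b00101                      # inject one symbol error
print(syndromes(corrupted))                  # nonzero: errors_present case
```

A nonzero entry in the returned list corresponds to the case in which the hardware raises `errors_present`.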

-------------------------------------------------------------
-- Start of ChienSearchProcess
-------------------------------------------------------------

library ieee; use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;

entity ChienSearchProcess is
port (  
  clk : in std_logic;
  reset_n : in std_logic;
  syndromes : in std_logic_vector (19 downto 0);
  StartChien : in std_logic;
  rs_enable : in std_logic;
  RS_Data_In : in std_logic_vector (4 downto 0);
  RS_Data_out : out std_logic_vector (4 downto 0);
  RS_Data_out_start : out std_logic
);
end;

architecture RTL of ChienSearchProcess is
constant GFpower : integer := 5;
constant RS_M0 : integer := 17;
constant RS_N : integer := 23;
constant RS_T : integer := 2;
constant ChienSearch_pipeline_delay : integer := 7;
constant SR_max : integer := RS_N + ChienSearch_pipeline_delay;
subtype Galois_Field_element is std_logic_vector (GFpower-1 downto 0);
type shift_register_delay_type is array (0 to SR_max) of Galois_Field_element;
type syndrome_type is array (0 to 2*RS_T-1) of Galois_Field_element;
constant CountMax : std_logic_vector (GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR (RS_N+ChienSearch_pipeline_delay, GFpower+1);
constant DoChienCountMax : std_logic_vector (GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR (RS_N+ChienSearch_pipeline_delay-1, GFpower+1);
constant OutputEnableStartCount : std_logic_vector (GFpower downto 0) :=
  CONV_STD_LOGIC_VECTOR (ChienSearch_pipeline_delay-1, GFpower+1);
constant Temp1_mul_factor : Galois_Field_element := "11010" ;  -- alpha-1*(RS_N-1)*1 = a9
constant Temp2_mul_factor : Galois_Field_element := "00011" ;  -- alpha-1*(RS_N-1)*2 = a18
constant alpha1 : Galois_Field_element := "00010" ;
constant alpha2 : Galois_Field_element := "00100" ;
constant one : Galois_Field_element := "00001" ;
constant x0a_initial_value : Galois_Field_element := "01001" ;  -- alpha-1*(RS_N-1)*RS_M0 = a29
constant x0a_mul : Galois_Field_element := "10011" ;  -- alpha(RS_M0) = a17
constant x1_initial_value : Galois_Field_element := "10101" ;  -- alpha(RS_N-1) = a22
constant x1_mul : Galois_Field_element := "10010" ;  -- alpha-1 = a30
constant x1m_initial_value : Galois_Field_element := "00100" ;  -- alpha(RS_N-1)*RS_M0 = a2
constant x1m_mul : Galois_Field_element := "11101" ;  -- alpha(-RS_M0) = a14
constant zero : Galois_Field_element := "00000" ;
signal CORR_FACTOR : Galois_Field_element;
signal CORR_FACTOR_1_error : Galois_Field_element;
signal CORR_FACTOR_2_errors : Galois_Field_element;
signal ChienSum : Galois_Field_element;
signal Count : std_logic_vector (GFpower downto 0);
signal CountEnable : std_logic;
signal D1 : Galois_Field_element;
signal D2 : Galois_Field_element;
signal DoChien : std_logic;
signal InitChien : std_logic;
signal InitChien_d1 : std_logic;
signal InitChien_d2 : std_logic;
signal InitChien_d3 : std_logic;
signal InitChien_d4 : std_logic;
signal InitChien_d5 : std_logic;
signal InitChien_d6 : std_logic;
signal OutputEnable : std_logic;
signal RS_data_delayed : Galois_Field_element;
signal Temp1 : Galois_Field_element;
signal Temp1_num_1_error : Galois_Field_element;
signal Temp1_num_2_errors : Galois_Field_element;
signal Temp2 : Galois_Field_element;
signal Temp2_num_2_errors : Galois_Field_element;
signal denom : Galois_Field_element;
signal inv_D2 : Galois_Field_element;
signal inv_denom : Galois_Field_element;
signal inv_s0 : Galois_Field_element;
signal num : Galois_Field_element;
signal num_d1 : Galois_Field_element;
signal shift_register_delay : shift_register_delay_type;
signal sigma1 : Galois_Field_element;
signal syndrome : syndrome_type;
signal there_are_two_errors : std_logic;
signal there_is_one_error : std_logic;
signal x0a : Galois_Field_element;
signal x1 : Galois_Field_element;
signal x1m : Galois_Field_element;

function inv (b : in std_logic_vector) return std_logic_vector is
  variable d : std_logic_vector (4 downto 0);
begin
  d := "00000";
  if (b="00001") then d := "00001";
  elsif (b="00010") then d := "10010";
  elsif (b="00011") then d := "11100";
  elsif (b="00100") then d := "01001";
  elsif (b="00101") then d := "10111";
  elsif (b="00110") then d := "01110";
  elsif (b="00111") then d := "01100";
  elsif (b="01000") then d := "10110";
  elsif (b="01001") then d := "00100";
  elsif (b="01010") then d := "11001";
  elsif (b="01011") then d := "10000";
  elsif (b="01100") then d := "00111";
  elsif (b="01101") then d := "01111";
  elsif (b="01110") then d := "00110";
  elsif (b="01111") then d := "01101";
  elsif (b="10000") then d := "01011";
  elsif (b="10001") then d := "11000";
  elsif (b="10010") then d := "00010";
  elsif (b="10011") then d := "11101";
  elsif (b="10100") then d := "11110";
  elsif (b="10101") then d := "11010";
  elsif (b="10110") then d := "01000";
  elsif (b="10111") then d := "00101";
  elsif (b="11000") then d := "10001";
  elsif (b="11001") then d := "01010";
  elsif (b="11010") then d := "10101";
  elsif (b="11011") then d := "11111";
  elsif (b="11100") then d := "00011";
  elsif (b="11101") then d := "10011";
  elsif (b="11110") then d := "10100";
  elsif (b="11111") then d := "11011";
  end if;
  return d;
end inv;

function add (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := (b(0) xor c(0));
  d(1) := (b(1) xor c(1));
  d(2) := (b(2) xor c(2));
  d(3) := (b(3) xor c(3));
  d(4) := (b(4) xor c(4));
  return d;
end add;

function mul (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  -- product of b(x) and c(x) reduced modulo p(x) = x^5 + x^2 + 1
  d(0) := (b(0) and c(0)) xor (b(1) and c(4)) xor (b(2) and c(3)) xor (b(3) and c(2)) xor (b(4) and c(1)) xor (b(4) and c(4));
  d(1) := (b(0) and c(1)) xor (b(1) and c(0)) xor (b(2) and c(4)) xor (b(3) and c(3)) xor (b(4) and c(2));
  d(2) := (b(0) and c(2)) xor (b(1) and c(1)) xor (b(2) and c(0)) xor (b(1) and c(4)) xor (b(2) and c(3)) xor (b(3) and c(2)) xor (b(4) and c(1)) xor (b(3) and c(4)) xor (b(4) and c(3)) xor (b(4) and c(4));
  d(3) := (b(0) and c(3)) xor (b(1) and c(2)) xor (b(2) and c(1)) xor (b(3) and c(0)) xor (b(2) and c(4)) xor (b(3) and c(3)) xor (b(4) and c(2)) xor (b(4) and c(4));
  d(4) := (b(0) and c(4)) xor (b(1) and c(3)) xor (b(2) and c(2)) xor (b(3) and c(1)) xor (b(4) and c(0)) xor (b(3) and c(4)) xor (b(4) and c(3));
  return d;
end mul;

function IsNotZero (b : in std_logic_vector) return std_logic is
  variable d : std_logic;
begin
  d := b(0) or b(1) or b(2) or b(3) or b(4);
  return d;
end IsNotZero;

begin
  syndrome(3) <= syndromes(19 downto 15);
  syndrome(2) <= syndromes(14 downto 10);
  syndrome(1) <= syndromes(9 downto 5);
  syndrome(0) <= syndromes(4 downto 0);

System_Counter : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      Count <= (others => '0');
    elsif (CountEnable = '1') then
      Count <= Count + 1;
    end if;
  end if;
end process;

System_Count_Enable : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CountEnable <= '0';
    elsif (StartChien = '1') then
      CountEnable <= '1';
    elsif (Count = CountMax) then
      CountEnable <= '0';
    end if;
  end if;
end process;

InitChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      InitChien <= '0';
      InitChien_d1 <= '0';
      InitChien_d2 <= '0';
      InitChien_d3 <= '0';
      InitChien_d4 <= '0';
      InitChien_d5 <= '0';
      InitChien_d6 <= '0';
    else
      InitChien <= StartChien;
      InitChien_d1 <= InitChien;
      InitChien_d2 <= InitChien_d1;
      InitChien_d3 <= InitChien_d2;
      InitChien_d4 <= InitChien_d3;
      InitChien_d5 <= InitChien_d4;
      InitChien_d6 <= InitChien_d5;
    end if;
  end if;
end process;

OutputEnable_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      OutputEnable <= '0';
    elsif (Count = OutputEnableStartCount) then
      OutputEnable <= '1';
    elsif (Count = OutputEnableStopCount) then
      OutputEnable <= '0';
    end if;
  end if;
end process;

RS_Data_out_Start_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out_Start <= '0';
    elsif (Count = RS_Data_out_StartCount) then
      RS_Data_out_Start <= '1';
    else
      RS_Data_out_Start <= '0';
    end if;
  end if;
end process;

DoChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (Count = DoChienCountMax)) then
      DoChien <= '0';
    elsif (InitChien_d4 = '1') then
      DoChien <= '1';
    end if;
  end if;
end process;

Calc_Correction_Factor : process (clk)
begin

if (clk'event and clk = '1') then
  if (reset_n = '0') then
    CORR_FACTOR <= zero;
  elsif (rs_enable = '1') and (ChienSum=zero) then
    if (there_is_one_error='1') then
      CORR_FACTOR <= CORR_FACTOR_1_error;
    elsif (there_are_two_errors='1') then
      CORR_FACTOR <= CORR_FACTOR_2_errors;
    end if;
  else
    CORR_FACTOR <= zero;
  end if;
end if;
end process;

D1 <= syndromes(4 downto 0);

Calculate_D2 : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      D2 <= zero;
    elsif (InitChien = '1') then
      D2 <= add(mul(syndrome(0),syndrome(2)),mul(syndrome(1),syndrome(1)));
    end if;
  end if;
end process;

Calculate_invD2 : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      inv_D2 <= zero;
    elsif (InitChien_d1 = '1') then
      inv_D2 <= inv(D2);
    end if;
  end if;
end process;

Error_Count_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      there_is_one_error <= '0';
      there_are_two_errors <= '0';
    elsif ((InitChien = '1') and (IsNotZero(D1)='1')) then
      there_is_one_error <= '1';
      there_are_two_errors <= '0';
    elsif ((InitChien_d1 = '1') and (IsNotZero(D2)='1')) then
      there_is_one_error <= '0';
      there_are_two_errors <= '1';
    end if;
  end if;
end process;

x0a_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      x0a <= zero;
    elsif (InitChien_d5 = '1') then
      x0a <= mul(x0a_initial_value,syndrome(0));
    else
      x0a <= mul(x0a,x0a_mul);
    end if;
  end if;
end process;
CORR_FACTOR_1_error <= x0a;

x1_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      x1 <= zero;
    elsif (InitChien_d2 = '1') then
      x1 <= x1_initial_value;
    else
      x1 <= mul(x1, x1_mul);
    end if;
  end if;
end process;

x1m_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      x1m <= zero;
    elsif (InitChien_d2 = '1') then
      x1m <= x1m_initial_value;
    else
      x1m <= mul(x1m, x1m_mul);
    end if;
  end if;
end process;

CORR_FACTOR_2_errors_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      num <= zero;
      num_d1 <= zero;
      denom <= zero;
      inv_denom <= zero;
      CORR_FACTOR_2_errors <= zero;
    else
      num <= add(mul(add(x1, sigma1), syndrome(0)), syndrome(1));
      num_d1 <= num;
      denom <= mul(x1m, sigma1);
      inv_denom <= inv(denom);
      CORR_FACTOR_2_errors <= mul(num_d1, inv_denom);
    end if;
  end if;
end process;

Delay_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      for i in 0 to SR_max loop
        shift_register_delay(i) <= zero;
      end loop;
    else
      for i in SR_max downto 1 loop
        shift_register_delay(i) <= shift_register_delay(i-1);
      end loop;
      shift_register_delay(0) <= RS_Data_In;
    end if;
  end if;
end process;

RS_data_delayed <= shift_register_delay(SR_max);

Correct_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out <= zero;
    elsif (OutputEnable = '1') then
      RS_Data_out <= add(CORR_FACTOR, RS_data_delayed);
    else
      RS_Data_out <= zero;
    end if;
  end if;
end process;

inv_s0_process: process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      inv_s0 <= zero;
    elsif (InitChien = '1') then
      inv_s0 <= inv(syndrome(0));
    end if;
  end if;
end process;

Temp1_num_1_error_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Temp1_num_1_error <= zero;
    elsif (InitChien = '1') then
      Temp1_num_1_error <= mul(inv_s0, syndrome(1));
    end if;
  end if;
end process;

Temp1_num_2_errors_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Temp1_num_2_errors <= zero;
    elsif (InitChien = '1') then
      Temp1_num_2_errors <=
        add(mul(syndrome(0), syndrome(3)), mul(syndrome(1), syndrome(2)));
    end if;
  end if;
end process;

 Temp2_num_process: process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Temp2_num_2_errors <= zero;
    elsif (InitChien = '1') then
      Temp2_num_2_errors <=
                           add(mul(syndrome(1), syndrome(3)), mul(syndrome(2), syndrome(2)));
    end if;
  end if;
end process;

Temp2_2_error_value_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      sigma1 <= zero;
    elsif (InitChien_d2 = '1') then
      sigma1 <= mul(Temp1_num_2_errors, inv_D2);
    end if;
  end if;
end process;

Calc_Temp_Registers_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') or (StartChien='1') then
      Temp1 <= zero;
      Temp2 <= zero;
    elsif (InitChien_d4 = '1') then
      if (IsNotZero(D2) = '1') then
        Temp1 <= mul(Temp1_mul_factor, sigma1);
        Temp2 <= mul(Temp2_mul_factor, Temp2_num_2_errors);
      elsif (IsNotZero(D1) = '1') then
        Temp1 <= mul(Temp1_mul_factor, Temp1_num_1_error);
        Temp2 <= zero;
      else
        Temp1 <= zero;
        Temp2 <= zero;
      end if;
    elsif (DoChien = '1') then
      Temp1 <= mul(alpha1, Temp1);
      Temp2 <= mul(alpha2, Temp2);
    end if;
  end if;
end process;

ChienSum_process : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      ChienSum <= zero;
    elsif (DoChien = '1') then
      ChienSum <= add(one,add(Temp1,Temp2));
    end if;
  end if;
end process;
end;

------------------------------------------------------------------------
-- End of ChienSearchProcess
------------------------------------------------------------------------
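
The arithmetic behind `ChienSearchProcess` can be exercised in software. The Python sketch below is one reading of the listing rather than the generator's own algorithm: it assumes the closed-form two-error solution $D_2 = S_0 S_2 + S_1^2$, $\sigma_1 = (S_0 S_3 + S_1 S_2)/D_2$, $\sigma_2 = (S_1 S_3 + S_2^2)/D_2$, a Chien test $1 + \sigma_1 X^{-1} + \sigma_2 X^{-2} = 0$, and the error value $((X + \sigma_1) S_0 + S_1) / (X^{m_0} \sigma_1)$. All function names are hypothetical, and uncorrectable patterns (three or more errors) are not handled.

```python
# Closed-form decoder model for RS(23,19) over GF(2^5), at most 2 errors.
P, M, N, M0 = 0b100101, 5, 23, 17
ORDER = (1 << M) - 1

EXP, LOG = [], {}
x = 1
for k in range(ORDER):
    EXP.append(x)
    LOG[x] = k
    x <<= 1
    if x >> M:          # reduce modulo p(x) = x^5 + x^2 + 1
        x ^= P

def mul(a, b):
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % ORDER]

def inv(a):
    return EXP[(-LOG[a]) % ORDER]    # a must be nonzero

def alpha(e):
    return EXP[e % ORDER]

def syndromes(word):
    """S_i = r(alpha^(M0+i)), first symbol = highest-degree coefficient."""
    s = [0] * 4
    for sym in word:
        s = [mul(alpha(M0 + i), s[i]) ^ sym for i in range(4)]
    return s

def correct(word):
    s0, s1, s2, s3 = syndromes(word)
    det = mul(s0, s2) ^ mul(s1, s1)              # D2 in the listing
    if det:                                      # two errors
        sigma1 = mul(mul(s0, s3) ^ mul(s1, s2), inv(det))
        sigma2 = mul(mul(s1, s3) ^ mul(s2, s2), inv(det))
    elif s0:                                     # one error (D1 = S0)
        sigma1, sigma2 = mul(s1, inv(s0)), 0
    else:                                        # no error detected
        return list(word)
    out = list(word)
    for idx in range(N):
        deg = N - 1 - idx                        # degree of this symbol
        xi = alpha(-deg)                         # X^-1 for this position
        if (1 ^ mul(sigma1, xi) ^ mul(sigma2, mul(xi, xi))) == 0:
            if det:
                num = mul(alpha(deg) ^ sigma1, s0) ^ s1
                den = mul(alpha(M0 * deg), sigma1)
            else:
                num, den = s0, alpha(M0 * deg)
            out[idx] ^= mul(num, inv(den))       # add the correction factor
    return out

# The all-zero word is a codeword of any linear code.
received = [0] * N
received[4] ^= 0b00111
received[15] ^= 0b10011
print(correct(received) == [0] * N)
```

The two branches of `correct` correspond to the `there_are_two_errors` and `there_is_one_error` cases selected in `Calc_Correction_Factor`.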

------------------------------------------------------------------------
-- Start of RSTopProcess
------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
entity rs_dec_top is
  generic (
    GFPower : INTEGER := 5;
    RS_T : INTEGER := 2
  );
  port (
    clk : in std_logic;
    reset_n : in std_logic;
    RS_Data_In : in std_logic_vector(GFPower - 1 downto 0);
    rs_data_in_start : in std_logic;
    RS_Data_out : out std_logic_vector(GFPower - 1 downto 0);
    RS_Data_out_Start : out std_logic;
    rs_enable : in std_logic
  );
end rs_dec_top;

use work.all;
architecture RTL of rs_dec_top is
  signal errors_present : std_logic;
  signal syndrome : std_logic_vector(2 * GFPower * RS_T - 1 downto 0);
  signal syndrome_calc_done : std_logic;

  component InputProcess
    port (
      reset_n : in std_logic;
      clk : in std_logic;
      rs_data_in : in std_logic_vector(4 downto 0);
      rs_data_in_start : in std_logic;
      syndrome_calc_done : out std_logic;
      errors_present : out std_logic;
      syndrome : out std_logic_vector(19 downto 0)
    );
  end component;
  component ChienSearchProcess
    port (
      clk : in std_logic;
      reset_n : in std_logic;
      syndromes : in std_logic_vector(19 downto 0);
      StartChien : in std_logic;
      rs_enable : in std_logic;
      RS_Data_In : in std_logic_vector(4 downto 0);
      RS_Data_out : out std_logic_vector(4 downto 0);
      RS_Data_out_start : out std_logic
    );
  end component;
begin

inst_InputProcess : InputProcess
  port map (
    reset_n => reset_n,
    clk => clk,
    rs_data_in => RS_Data_In(GFPower - 1 downto 0),
    rs_data_in_start => rs_data_in_start,
    syndrome_calc_done => syndrome_calc_done,
    errors_present => errors_present,
    syndrome => syndrome(2 * GFPower * RS_T - 1 downto 0));

inst_ChienSearchProcess : ChienSearchProcess
  port map (
    clk => clk,
    reset_n => reset_n,
    syndromes => syndrome(2 * GFPower * RS_T - 1 downto 0),
    StartChien => syndrome_calc_done,
    rs_enable => rs_enable,
    RS_Data_In => RS_Data_In(GFPower - 1 downto 0),
    RS_Data_out => RS_Data_out(GFPower - 1 downto 0),
    RS_Data_out_start => RS_Data_out_Start);

end RTL;

-- End of RSTopProcess

16 Appendix D - RS Decoder VHDL Code (8 errors)

This appendix contains the VHDL code generated for an RS decoder for an 8-error correcting code over GF(2^8), with \( p(x) = x^8 + x^4 + x^3 + x^2 + 1 \). This is an RS(179,163) code.
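
The code generator polynomial listed in the header comment that follows is the product of the sixteen factors $(x + \alpha^{212+i})$, $i = 0, \ldots, 15$. The Python sketch below recomputes it so that the listed coefficients can be checked. It is only an illustration under the stated field polynomial, not the generator's implementation, and the names `gf_mul`, `generator_poly` and `poly_eval` are hypothetical.

```python
# Recompute g(x) for RS(179,163) over GF(2^8), p(x) = x^8 + x^4 + x^3 + x^2 + 1.
P, M = 0b100011101, 8
T, M0 = 8, 212
ORDER = (1 << M) - 1

EXP, LOG = [], {}
x = 1
for k in range(ORDER):
    EXP.append(x)
    LOG[x] = k
    x <<= 1
    if x >> M:          # reduce modulo p(x)
        x ^= P

def gf_mul(a, b):
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % ORDER]

def generator_poly():
    """Coefficients of g(x), highest degree first."""
    g = [1]
    for i in range(2 * T):
        root = EXP[(M0 + i) % ORDER]
        # g(x) * (x + root) = x*g(x) + root*g(x)
        g = [hi ^ gf_mul(lo, root) for hi, lo in zip(g + [0], [0] + g)]
    return g

def poly_eval(p, point):
    """Horner evaluation of a highest-degree-first coefficient list."""
    acc = 0
    for coeff in p:
        acc = gf_mul(acc, point) ^ coeff
    return acc

G = generator_poly()
for degree, coeff in zip(range(2 * T, -1, -1), G):
    label = f"a{LOG[coeff]}" if coeff else "0"
    print(f"{label} * x^{degree}")
```

Each printed line gives one coefficient as a power of $\alpha$, in the same order as the header comment. As a quick sanity check, the constant term is the product of all sixteen roots, so its exponent is $16 \cdot 212 + 120 = 3512 \equiv 197 \pmod{255}$, which agrees with the `a197` term in the header.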
-- RS Decoder Core Generator
-- Version 1.0
-- written by Vladimir Glavac 4182200
--
-- Reed-Solomon Decoder Parameters
--
-- RS_N = 179 = length of codeword
--
-- RS_T = 8 = error correcting capability
--
-- RS_M0 = 212 = power of initial root of code generator polynomial
--
-- code generator polynomial g(x) =
--
--   a0 * x^16
-- + a7 * x^15
-- + a8 * x^14
-- + a233 * x^13
-- + a192 * x^12
-- + a142 * x^11
-- + a158 * x^10
-- + a30 * x^9
-- + a169 * x^8
-- + a214 * x^7
-- + a16 * x^6
-- + a184 * x^5
-- + a163 * x^4
-- + a133 * x^3
-- + a102 * x^2
-- + a90 * x^1
-- + a197
--
-- field generator polynomial p(x) = [100011101] = x^8 + x^4 + x^3 + x^2 + 1
--
-- Start of InputProcess
--
library ieee; use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity InputProcess is
port (reset_n : in std_logic;
clk : in std_logic;
rs_data_in : in std_logic_vector (7 downto 0);
rs_data_in_start : in std_logic;
syndrome_calc_done : out std_logic;
errors_present : out std_logic;
syndrome : out std_logic_vector (127 downto 0)
);
end;

architecture RTL of InputProcess is
constant GFpower : integer := 8;
subtype Galois_Field_element is std_logic_vector((GFpower-1) downto 0);
constant RS_T : integer := 8;
constant RS_N : integer := 179;
constant RS_M0 : integer := 212;
constant Two_T_minus_1 : integer := 2*RS_T-1;
constant zero : Galois_Field_element := "00000000";
constant one : Galois_Field_element := "00000001";
constant alpha212 : Galois_Field_element := "01111001";
constant alpha213 : Galois_Field_element := "11110010";
constant alpha214 : Galois_Field_element := "11111001";
constant alpha215 : Galois_Field_element := "11101111";
constant alpha216 : Galois_Field_element := "11000011";
constant alpha217 : Galois_Field_element := "10011011";
constant alpha218 : Galois_Field_element := "00101011";
constant alpha219 : Galois_Field_element := "01010110";
constant alpha220 : Galois_Field_element := "10101100";
constant alpha221 : Galois_Field_element := "01000101";
constant alpha222 : Galois_Field_element := "10001010";
constant alpha223 : Galois_Field_element := "00001001";
constant alpha224 : Galois_Field_element := "00010010";
constant alpha225 : Galois_Field_element := "00100100";
constant alpha226 : Galois_Field_element := "01001000";
constant alpha227 : Galois_Field_element := "10010000";

type IntS_type is array (0 to Two_T_minus_1) of Galois_Field_element;

type IntEP_type is array (0 to Two_T_minus_1) of std_logic;

type state_type is (Idle, RxData, XferData);

signal present_state : state_type;

signal IntCount : std_logic_vector (7 downto 0);

signal internal_errors_present : std_logic;

signal CountEnable : std_logic;

signal CountReset : std_logic;

signal StartXfer : std_logic;

signal XferSyndrome : std_logic;

signal DoCalc : std_logic;

signal IntS : IntS_type;

signal IntEP : IntEP_type;

function add (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin

  d(0) := (b(0) xor c(0));
  d(1) := (b(1) xor c(1));
  d(2) := (b(2) xor c(2));
  d(3) := (b(3) xor c(3));
  d(4) := (b(4) xor c(4));
  d(5) := (b(5) xor c(5));
  d(6) := (b(6) xor c(6));
  d(7) := (b(7) xor c(7));

  return d;
end add;

function mul (b, c : in Galois_Field_element) return Galois_Field_element is
  variable s : std_logic_vector(14 downto 0) := (others => '0');
begin
  -- GF(2^8) product in loop form: s(k) collects the partial products b(i) and c(j) with i+j = k.
  for i in 0 to 7 loop
    for j in 0 to 7 loop
      s(i+j) := s(i+j) xor (b(i) and c(j));
    end loop;
  end loop;
  -- Reduce modulo p(x): x^k = x^(k-4) + x^(k-5) + x^(k-6) + x^(k-8) for k >= 8.
  for k in 14 downto 8 loop
    s(k-4) := s(k-4) xor s(k);  s(k-5) := s(k-5) xor s(k);
    s(k-6) := s(k-6) xor s(k);  s(k-8) := s(k-8) xor s(k);
  end loop;
  return s(7 downto 0);
end mul;
begin

process (clk)
begin
if (clk'event and clk = '1') then
  if ((reset_n = '0') or (CountReset = '1')) then
    IntCount <= (others => '0');
  elsif (CountEnable = '1') then
    IntCount <= IntCount + 1;
  end if;
end if;
end process;

process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    StartXfer <= '0';
  elsif (IntCount = "10110000") then
    StartXfer <= '1';
  else
    StartXfer <= '0';
  end if;
end if;
end process;

process (clk)
begin
if (clk'event and clk = '1') then
  if (reset_n = '0') then
    for i in 0 to Two_T_minus_1 loop
      IntS(i) <= zero;
    end loop;
  elsif (rs_data_in_start = '1') then
    for i in 0 to Two_T_minus_1 loop
      IntS(i) <= rs_data_in;
    end loop;
  elsif (DoCalc = '1') then
    IntS(0) <= add(mul(alpha212, IntS(0)), rs_data_in);
    IntS(1) <= add(mul(alpha213, IntS(1)), rs_data_in);
    IntS(2) <= add(mul(alpha214, IntS(2)), rs_data_in);
    IntS(3) <= add(mul(alpha215, IntS(3)), rs_data_in);
    IntS(4) <= add(mul(alpha216, IntS(4)), rs_data_in);
    IntS(5) <= add(mul(alpha217, IntS(5)), rs_data_in);
    IntS(6) <= add(mul(alpha218, IntS(6)), rs_data_in);
    IntS(7) <= add(mul(alpha219, IntS(7)), rs_data_in);
    IntS(8) <= add(mul(alpha220, IntS(8)), rs_data_in);
    IntS(9) <= add(mul(alpha221, IntS(9)), rs_data_in);
    IntS(10) <= add(mul(alpha222, IntS(10)), rs_data_in);
    IntS(11) <= add(mul(alpha223, IntS(11)), rs_data_in);
    IntS(12) <= add(mul(alpha224, IntS(12)), rs_data_in);
    IntS(13) <= add(mul(alpha225, IntS(13)), rs_data_in);
    IntS(14) <= add(mul(alpha226, IntS(14)), rs_data_in);
    IntS(15) <= add(mul(alpha227, IntS(15)), rs_data_in);
  end if;
end if;
end process;

process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n = '0') then
      syndrome <= (others=>'0');
      internal_errors_present <= '0';
    elsif (XferSyndrome='1') then
      syndrome <=
      IntS(15) & IntS(14) & IntS(13) & IntS(12) & IntS(11) & IntS(10) &
      IntS(9) & IntS(8) & IntS(7) & IntS(6) & IntS(5) & IntS(4) &
      IntS(3) & IntS(2) & IntS(1) & IntS(0);
      internal_errors_present <=
      IntEP(0) or IntEP(1) or IntEP(2) or IntEP(3) or IntEP(4) or IntEP(5) or
      IntEP(6) or IntEP(7) or IntEP(8) or IntEP(9) or IntEP(10) or IntEP(11) or
      IntEP(12) or IntEP(13) or IntEP(14) or IntEP(15);
    end if;
  end if;
end process;

process (IntS)
begin
  for i in 0 to Two_T_minus_1 loop
    IntEP(i) <= '1';
    if (IntS(i) = zero) then IntEP(i) <= '0'; end if;
  end loop;
end process;

InputControlSD_idle : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      XferSyndrome<='0';
      DoCalc<='0';
      CountReset<='1';
      CountEnable<='0';
      present_state <= Idle;
    else
      case present_state is
        when Idle =>
          if (rs_data_in_start = '1') then
            DoCalc<='1';
            CountReset<='0';
            CountEnable<='1';
            present_state <= RxData;
          else
            present_state <= Idle;
          end if;
        when RxData =>
          if (StartXfer = '1') then
            XferSyndrome<='1';
            DoCalc<='0';
            CountReset<='1';
            CountEnable<='0';
            present_state <= XferData;
          else
            present_state <= RxData;
          end if;
        when XferData =>
          XferSyndrome<='0';
          DoCalc<='0';
          CountReset<='1';
          CountEnable<='0';
          present_state <= Idle;
        when others =>
          XferSyndrome<='0';
          DoCalc<='0';
          CountReset<='1';
          CountEnable<='0';
          present_state <= Idle;
      end case;
    end if;
  end if;
end process;

syndrome_calc_done <= XferSyndrome;
errors_present <= internal_errors_present;
end;

-- End of InputProcess

-- Start of MBSolverProcess

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity MBSolver is
port (reset_n : in std_logic;
        clk : in std_logic;
        ErrorsPresent : in std_logic;
        XferSyndrome : in std_logic;
        StartChien : out std_logic;
        syndrome_poly : in std_logic_vector (127 downto 0);
        omega_poly : out std_logic_vector (63 downto 0);
        lambda_poly : out std_logic_vector (71 downto 0)
);
end;

architecture RTL of MBSolver is

constant GFpower : integer := 8;
subtype Galois_Field_element is std_logic_vector((GFpower-1) downto 0);
constant RS_T : integer := 8;
constant RS_N : integer := 179;
constant RS_M0 : integer := 212;
constant zero : Galois_Field_element := "00000000";
constant one : Galois_Field_element := "00000001";
constant size_of_N : integer := 5;

type SRegType is array(0 to 3*RS_T-1) of Galois_Field_element;
type Poly1Type is array(0 to RS_T) of Galois_Field_element;
type Poly2Type is array(0 to 2*RS_T) of Galois_Field_element;
type Poly3Type is array(1 to RS_T) of Galois_Field_element;
type state_type is (Idle, ChienSearchStart, Init, Synchronize, Update_Polys);

signal current_state : state_type;
signal Initialize : std_logic;
signal CountEnable : std_logic;
signal StoreNewPolys : std_logic;
signal N : std_logic_vector((size_of_N-1) downto 0);
signal SReg : SRegType;
signal Delta : Galois_Field_element;
signal Convolution_Term : Poly1Type;
signal Post_Convolution_Term : Poly1Type;
signal Gamma : Galois_Field_element;
signal L : std_logic_vector((size_of_N-1) downto 0);
signal TwoL : std_logic_vector((size_of_N-1) downto 0);
signal Lambda : Poly1Type;
signal ZOmega : Poly1Type;
signal Mu, GammaMu : Poly1Type;
signal DeltaLambda : Poly3Type;
signal B : Poly1Type;
signal KeepOldL : std_logic;
signal Convolution_Term_Multiplier : std_logic_vector(RS_T downto 0);

function convolution_term_mul (b : in Galois_Field_element; c : in std_logic) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := b(0) and c;
  d(1) := b(1) and c;
  d(2) := b(2) and c;
  d(3) := b(3) and c;
  d(4) := b(4) and c;
  d(5) := b(5) and c;
  d(6) := b(6) and c;
  d(7) := b(7) and c;
  return d;
end convolution_term_mul;

function add (b, c : in Galois_Field_element) return Galois_Field_element is
  variable d : Galois_Field_element;
begin
  d(0) := b(0) xor c(0);
  d(1) := b(1) xor c(1);
  d(2) := b(2) xor c(2);
  d(3) := b(3) xor c(3);
  d(4) := b(4) xor c(4);
  d(5) := b(5) xor c(5);
  d(6) := b(6) xor c(6);
  d(7) := b(7) xor c(7);
  return d;
end add;

function is_not_0 (b : in Galois_Field_element) return std_logic is
  variable d : std_logic;
begin
  d := b(0) or b(1) or b(2) or b(3) or b(4) or b(5) or b(6) or b(7);
  return d;
end is_not_0;

function is_0 (b : in Galois_Field_element) return std_logic is
  variable d : std_logic;
begin
  d := not is_not_0(b);
  return d;
end is_0;

function mul (b, c : in Galois_Field_element) return Galois_Field_element is
  variable s : std_logic_vector(14 downto 0) := (others => '0');
begin
  -- GF(2^8) product in loop form: s(k) collects the partial products b(i) and c(j) with i+j = k.
  for i in 0 to 7 loop
    for j in 0 to 7 loop
      s(i+j) := s(i+j) xor (b(i) and c(j));
    end loop;
  end loop;
  -- Reduce modulo p(x): x^k = x^(k-4) + x^(k-5) + x^(k-6) + x^(k-8) for k >= 8.
  for k in 14 downto 8 loop
    s(k-4) := s(k-4) xor s(k);  s(k-5) := s(k-5) xor s(k);
    s(k-6) := s(k-6) xor s(k);  s(k-8) := s(k-8) xor s(k);
  end loop;
  return s(7 downto 0);
end mul;

begin

sReg_Process : process (clk)
begin
if (clk'event and clk='1') then
  if (reset_n='0') then
    for i in 0 to (3*RS_T-1) loop
      SReg(i) <= zero;
    end loop;
  elsif (Initialize = '1') then
    for i in 0 to (RS_T-1) loop
      SReg(i) <= zero;
    end loop;
    for i in 0 to (2*RS_T-1) loop
      SReg(i+RS_T) <= syndrome_poly(((i+1)*GFpower-1) downto (i*GFpower));
    end loop;
  elsif (StoreNewPolys = '1') then
    for i in 0 to (3*RS_T-2) loop
      SReg(i) <= SReg(i+1);
    end loop;
    SReg(3*RS_T-1) <= zero;
  end if;
end if;
end process SReg_Process;

L_Process : process (clk)
begin
if (clk'event and clk='1') then
  if (reset_n='0') then
    L <= (others=>'0');
  elsif (Initialize = '1') then
    L <= (others=>'0');
  elsif (StoreNewPolys = '1') then
    if (KeepOldL='0') then
      L <= N + 1 - L;
    end if;
  end if;
end if;
end process L_Process;

TwoL_Process : process(L)
begin
  TwoL <= L(size_of_N-2 downto 0)&'0';
end process TwoL_Process;

KeepOldL_Process : process(TwoL, N, Delta)
begin
  if ((TwoL=N) or (is_0(Delta)='1')) then
    KeepOldL <= '1';
  else
    KeepOldL <= '0';
  end if;
end process KeepOldL_Process;

Gamma_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      Gamma <= (others=>'0');
    elsif (Initialize = '1') then
      Gamma <= one;
    elsif (StoreNewPolys = '1') then
      if (KeepOldL='0') then
        Gamma <= Delta;
      end if;
    end if;
  end if;
end process Gamma_Process;

N_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      N <= (others=>'0');
    elsif (Initialize = '1') then
      N <= (others=>'0');
    elsif (CountEnable = '1') then
      N <= N + 1;
    end if;
  end if;
end process N_Process;

Convolution_Term_Process : process (SReg, Mu)
begin
  for i in 0 to RS_T loop
    Convolution_Term(i) <= mul(SReg(RS_T-i), Mu(i));
  end loop;
end process Convolution_Term_Process;

Convolution_Term_Multiplier_Process : process (L)
begin
  case L is
  when "00000" => Convolution_Term_Multiplier <= "000000001";
  when "00001" => Convolution_Term_Multiplier <= "000000011";
  when "00010" => Convolution_Term_Multiplier <= "000000111";
  when "00011" => Convolution_Term_Multiplier <= "000001111";
  when "00100" => Convolution_Term_Multiplier <= "000011111";
  when "00101" => Convolution_Term_Multiplier <= "000111111";
  when "00110" => Convolution_Term_Multiplier <= "001111111";
  when "00111" => Convolution_Term_Multiplier <= "011111111";
  when "01000" => Convolution_Term_Multiplier <= "111111111";
  when others => Convolution_Term_Multiplier <= "111111111";
  end case;
end process Convolution_Term_Multiplier_Process;

Post_Convolution_Term_Process : process (Convolution_Term_Multiplier, Convolution_Term)
begin
  for i in 0 to RS_T loop
    Post_Convolution_Term(i) <= convolution_term_mul(Convolution_Term(i), Convolution_Term_Multiplier(i));
  end loop;
end process Post_Convolution_Term_Process;

Delta_Process : process (Post_Convolution_Term)
begin
  Delta <= add(Post_Convolution_Term(0),
               add(Post_Convolution_Term(1),
               add(Post_Convolution_Term(2),
               add(Post_Convolution_Term(3),
               add(Post_Convolution_Term(4),
               add(Post_Convolution_Term(5),
               add(Post_Convolution_Term(6),
               add(Post_Convolution_Term(7), Post_Convolution_Term(8)))))))));
end process Delta_Process;

Lambda_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to RS_T loop
        Lambda(i) <= (others=>'0');
      end loop;
    elsif (Initialize = '1') then
      Lambda(0) <= one;
      for i in 1 to RS_T loop
        Lambda(i) <= zero;
      end loop;
    elsif (StoreNewPolys = '1') then
      if (KeepOldL='1') then
        for i in 1 to RS_T loop
          Lambda(i) <= Lambda(i-1);
        end loop;
      else
        for i in 0 to RS_T loop
          Lambda(i) <= Mu(i);
        end loop;
      end if;
    end if;
  end if;
end process Lambda_Process;

B_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to RS_T loop
        B(i) <= (others=>'0');
      end loop;
    elsif (Initialize = '1') then
      B(0) <= one;
      for i in 1 to RS_T loop
        B(i) <= zero;
      end loop;
    elsif (StoreNewPolys = '1') then
      if (KeepOldL='1') then
        for i in 1 to RS_T loop
          B(i) <= B(i-1);
        end loop;
      else
        for i in 0 to RS_T loop
          B(i) <= ZOmega(i);
        end loop;
      end if;
    end if;
  end if;
end process B_Process;

ZOmega_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to RS_T loop
        ZOmega(i) <= (others=>'0');
      end loop;
    elsif (Initialize = '1') then
      for i in 0 to RS_T loop
        ZOmega(i) <= (others=>'0');
      end loop;
    elsif (StoreNewPolys = '1') then
      for i in 1 to RS_T loop
        ZOmega(i) <= add(mul(Gamma,ZOmega(i)),mul(Delta,B(i-1)));
      end loop;
      ZOmega(0) <= mul(Gamma, ZOmega(0));
    end if;
  end if;
end process ZOmega_Process;

GammaMu_Process : process (Gamma, Mu)
begin
  for i in 0 to RS_T loop
    GammaMu(i) <= mul(Gamma, Mu(i));
  end loop;
end process GammaMu_Process;

DeltaLambda_Process : process (Delta, Lambda)
begin
  for i in 1 to RS_T loop
    DeltaLambda(i) <= mul(Delta, Lambda(i-1));
  end loop;
end process DeltaLambda_Process;

omega_poly <=
  ZOmega(8) & ZOmega(7) & ZOmega(6) & ZOmega(5) & ZOmega(4) & ZOmega(3) & ZOmega(2) & ZOmega(1);

lambda_poly <=
  Mu(8) & Mu(7) & Mu(6) & Mu(5) & Mu(4) & Mu(3) & Mu(2) & Mu(1) &
  Mu(0);

Mu_Process : process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      for i in 0 to RS_T loop
        Mu(i) <= (others=>'0');
      end loop;
    elsif ((Initialize='1') or (ErrorsPresent='0')) then
      for i in 1 to RS_T loop
        Mu(i) <= (others=>'0');
      end loop;
      Mu(0) <= one;
    elsif (StoreNewPolys='1') then
      for i in 1 to RS_T loop
        Mu(i) <= add(GammaMu(i), DeltaLambda(i));
      end loop;
      Mu(0) <= GammaMu(0);
    end if;
  end if;
end process Mu_Process;

process (clk)
begin
  if (clk'event and clk='1') then
    if (reset_n='0') then
      Initialize <= '0';
      StoreNewPolys <= '0';
      StartChien <= '0';
      CountEnable <= '0';
      current_state <= Idle;
    else
      case current_state is
        when Idle =>
          if (XferSyndrome='1') then
            Initialize <= '1';
            StoreNewPolys <= '0';
            StartChien <= '0';
            CountEnable <= '0';
            current_state <= Init;
          else
            current_state <= Idle;
          end if;
        when ChienSearchStart =>
    Initialize <= '0';
    StoreNewPolys <= '0';
    StartChien <= '0';
    CountEnable <= '0';
    current_state <= Idle;

when Init =>
    if (ErrorsPresent = '0') then
        Initialize <= '0';
        StoreNewPolys <= '0';
        StartChien <= '0';
        CountEnable <= '1';
        current_state <= Synchronize;
    else
        Initialize <= '0';
        StoreNewPolys <= '1';
        StartChien <= '0';
        CountEnable <= '1';
        current_state <= Update_Polys;
    end if;

when Synchronize =>
    if (N = "01111") then
        Initialize <= '0';
        StoreNewPolys <= '0';
        StartChien <= '1';
        CountEnable <= '0';
        current_state <= ChienSearchStart;
    else
        current_state <= Synchronize;
    end if;

when Update_Polys =>
    if (N = "01111") then
        Initialize <= '0';
        StoreNewPolys <= '0';
        StartChien <= '1';
        CountEnable <= '0';
        current_state <= ChienSearchStart;
    else
        current_state <= Update_Polys;
    end if;

when others =>
    Initialize <= '0';
    StoreNewPolys <= '0';
    StartChien <= '0';
    CountEnable <= '0';
    current_state <= Idle;
end case;
end if;
end if;
end process;
end;

-- End of MBSolverProcess
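
The Chien search stage that follows relies on a lookup function, inv, that returns the multiplicative inverse of a field element. The testbench below is a minimal sketch of one way such table entries can be cross-checked by computation; it is not part of the generated design. It assumes the same field generator polynomial p(x) = x^8 + x^4 + x^3 + x^2 + 1 as the rest of the listing, and the names gf_inv_check and inv_by_power are hypothetical. The inverse is obtained as b^254, which equals b^(-1) for every nonzero b because b^255 = 1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gf_inv_check is
end entity gf_inv_check;

architecture sim of gf_inv_check is
  subtype Galois_Field_element is std_logic_vector(7 downto 0);

  -- GF(2^8) product modulo p(x) = x^8 + x^4 + x^3 + x^2 + 1.
  function mul (b, c : Galois_Field_element) return Galois_Field_element is
    variable s : std_logic_vector(14 downto 0) := (others => '0');
  begin
    -- Polynomial product over GF(2).
    for i in 0 to 7 loop
      for j in 0 to 7 loop
        s(i+j) := s(i+j) xor (b(i) and c(j));
      end loop;
    end loop;
    -- Reduction: x^k = x^(k-4) + x^(k-5) + x^(k-6) + x^(k-8) for k >= 8.
    for k in 14 downto 8 loop
      s(k-4) := s(k-4) xor s(k);
      s(k-5) := s(k-5) xor s(k);
      s(k-6) := s(k-6) xor s(k);
      s(k-8) := s(k-8) xor s(k);
    end loop;
    return s(7 downto 0);
  end function mul;

  -- b^254 = b^(2+4+8+16+32+64+128), built by repeated squaring.
  function inv_by_power (b : Galois_Field_element) return Galois_Field_element is
    variable p : Galois_Field_element;
    variable d : Galois_Field_element;
  begin
    p := mul(b, b);      -- b^2
    d := p;
    for i in 1 to 6 loop
      p := mul(p, p);    -- b^4, b^8, ..., b^128
      d := mul(d, p);
    end loop;
    return d;
  end function inv_by_power;
begin
  check : process
    variable b : Galois_Field_element;
  begin
    -- Spot checks against small elements.
    assert inv_by_power("00000010") = "10001110" report "inv(2) mismatch" severity failure;
    assert inv_by_power("00000011") = "11110100" report "inv(3) mismatch" severity failure;
    assert inv_by_power("00000100") = "01000111" report "inv(4) mismatch" severity failure;
    -- Exhaustive check: b * inv(b) = 1 for every nonzero element.
    for n in 1 to 255 loop
      b := std_logic_vector(to_unsigned(n, 8));
      assert mul(b, inv_by_power(b)) = "00000001"
        report "inverse check failed" severity failure;
    end loop;
    report "GF(2^8) inverse check passed" severity note;
    wait;
  end process check;
end architecture sim;
```

The exponentiation is used here only as a simulation-time reference; a lookup table, as in the listing, avoids the chain of thirteen multipliers that a direct hardware mapping of inv_by_power would imply.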

--
-- Chien search parameters
--
-- RS_N = 179 ; RS_T = 8 ; RS_M0 = 212
-- RS_T_is_odd  = FALSE
-- RS_T_is_even = TRUE
-- K_mul_stage_is_needed = TRUE
-- K_VAL_is_needed  = TRUE
function inv (b : in std_logic_vector) return std_logic_vector is
variable d : std_logic_vector (7 downto 0);
begin
  d := "00000000";
  if (b="00000001") then d := "00000001";
  elsif (b="00000010") then d := "10001110";
  elsif (b="00000011") then d := "11110100";
  elsif (b="00000100") then d := "01000111";
  elsif (b="00000101") then d := "10100111";
  elsif (b="00000110") then d := "01111010";
  -- ... one elsif branch per remaining nonzero field element, up to b="11111111";
  -- these entries are not legible in this reproduction ...
  end if;
  return d;
end inv;

function add (b, c : in Galois_Field_element) return Galois_Field_element is
variable d : Galois_Field_element;
begin

d(0) := (b(0) xor c(0));
d(1) := (b(1) xor c(1));
d(2) := (b(2) xor c(2));
d(3) := (b(3) xor c(3));
d(4) := (b(4) xor c(4));
d(5) := (b(5) xor c(5));
d(6) := (b(6) xor c(6));
d(7) := (b(7) xor c(7));
return d;
end add;

function mul (b, c : in Galois_Field_element) return Galois_Field_element is
  variable s : std_logic_vector(14 downto 0) := (others => '0');
begin
  -- GF(2^8) product in loop form: s(k) collects the partial products b(i) and c(j) with i+j = k.
  for i in 0 to 7 loop
    for j in 0 to 7 loop
      s(i+j) := s(i+j) xor (b(i) and c(j));
    end loop;
  end loop;
  -- Reduce modulo p(x): x^k = x^(k-4) + x^(k-5) + x^(k-6) + x^(k-8) for k >= 8.
  for k in 14 downto 8 loop
    s(k-4) := s(k-4) xor s(k);  s(k-5) := s(k-5) xor s(k);
    s(k-6) := s(k-6) xor s(k);  s(k-8) := s(k-8) xor s(k);
  end loop;
  return s(7 downto 0);
end mul;

function IsNotZero(b : in std_logic_vector) return std_logic is
  variable d : std_logic;
begin
  d := b(0) or b(1) or b(2) or b(3) or b(4) or b(5) or b(6) or b(7);
  return d;
end IsNotZero;

begin
Lambda(0) <= lambda_poly ( 7 downto 0);
Lambda(1) <= lambda_poly ( 15 downto 8);
Lambda(2) <= lambda_poly ( 23 downto 16);
Lambda(3) <= lambda_poly ( 31 downto 24);
Lambda(4) <= lambda_poly ( 39 downto 32);
Lambda(5) <= lambda_poly ( 47 downto 40);
Lambda(6) <= lambda_poly ( 55 downto 48);
Lambda(7) <= lambda_poly ( 63 downto 56);
Lambda(8) <= lambda_poly ( 71 downto 64);
Omega(0) <= omega_poly ( 7 downto 0);
Omega(1) <= omega_poly ( 15 downto 8);
Omega(2) <= omega_poly ( 23 downto 16);
Omega(3) <= omega_poly ( 31 downto 24);
Omega(4) <= omega_poly ( 39 downto 32);
Omega(5) <= omega_poly ( 47 downto 40);
Omega(6) <= omega_poly ( 55 downto 48);
Omega(7) <= omega_poly ( 63 downto 56);

Init_IntLambda_and_IntOmega : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      IntLambda_0 <= zero;
      IntLambda_1 <= zero;
      IntLambda_2 <= zero;
      IntLambda_3 <= zero;
      IntLambda_4 <= zero;
      for i in 0 to RS_T-1 loop
        IntOmega(i) <= zero;
      end loop;
    elsif (StartChien = '1') then
      IntLambda_0 <= Lambda(0);
      IntLambda_1 <= Lambda(1);
      IntLambda_2 <= Lambda(2);
      IntLambda_3 <= Lambda(3);
      IntLambda_4 <= Lambda(4);
      for i in 0 to RS_T-1 loop
        IntOmega(i) <= Omega(i);
      end loop;
    end if;
  end if;
end process;

System_Counter : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      Count <= (others => '0');
    elsif (CountEnable = '1') then
      Count <= Count + 1;
    end if;
  end if;
end process;

System_Count_Enable : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CountEnable <= '0';
    elsif (StartChien = '1') then
      CountEnable <= '1';
    elsif (Count = Count_Max) then
      CountEnable <= '0';
    end if;
  end if;
end process;

InitChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      InitChien <= '0';
    else
      InitChien <= StartChien;
    end if;
  end if;
end process;

OutputEnable_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      OutputEnable <= '0';
    elsif (Count = OutputEnable_StartCount) then
      OutputEnable <= '1';
    elsif (Count = OutputEnable_StopCount) then
      OutputEnable <= '0';
    end if;
  end if;
end process;

RS_Data_out_Start_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out_Start <= '0';
    elsif (Count = RS_Data_out_StartCount) then
      RS_Data_out_Start <= '1';
    else
      RS_Data_out_Start <= '0';
    end if;
  end if;
end process;

DoChien_Control : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (Count = DoChienCountMax)) then
      DoChien <= '0';
    elsif (InitChien = '1') then
      DoChien <= '1';
    end if;
  end if;
end process;

X1_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      X1 <= alpha77;
      X1_D1 <= zero;
      X1_D2 <= zero;
      X1_D3 <= zero;
      X1_D4 <= zero;
      X1_D5 <= zero;
    else
      X1 <= mul(X1,alpha1);
      X1_D1 <= X1;
      X1_D2 <= X1_D1;
      X1_D3 <= X1_D2;
      X1_D4 <= X1_D3;
      X1_D5 <= X1_D4;
    end if;
  end if;
end process;

Powers_of_X_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      X2 <= zero;
      X3 <= zero;
      X4 <= zero;
      X5 <= zero;
      X6 <= zero;
      X7 <= zero;
    else
      X2 <= mul(X1,X1);
      X3 <= mul(X2,X1_D1);
      X4 <= mul(X3,X1_D2);
      X5 <= mul(X4,X1_D3);
      X6 <= mul(X5,X1_D4);
      X7 <= mul(X6,X1_D5);
    end if;
  end if;
end process;

EVAL_OM_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      EVAL_OM_1 <= zero;
      EVAL_OM_2 <= zero;
      EVAL_OM_3 <= zero;
      EVAL_OM_4 <= zero;
      EVAL_OM_5 <= zero;
      EVAL_OM_6 <= zero;
      EVAL_OM <= zero;
      EVAL_OM_D1 <= zero;
    else
      EVAL_OM_1 <= add(IntOmega(0), mul(IntOmega(1), X1));
      EVAL_OM_2 <= add(EVAL_OM_1, mul(IntOmega(2), X2));
      EVAL_OM_3 <= add(EVAL_OM_2, mul(IntOmega(3), X3));
      EVAL_OM_4 <= add(EVAL_OM_3, mul(IntOmega(4), X4));
      EVAL_OM_5 <= add(EVAL_OM_4, mul(IntOmega(5), X5));
      EVAL_OM_6 <= add(EVAL_OM_5, mul(IntOmega(6), X6));
      EVAL_OM <= add(EVAL_OM_6, mul(IntOmega(7), X7));
      EVAL_OM_D1 <= EVAL_OM;
    end if;
  end if;
end process;

EVAL_LP_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      EVAL_LP_1 <= zero;
      EVAL_LP_2 <= zero;
      EVAL_LP_3 <= zero;
      EVAL_LP_4 <= zero;
      EVAL_LP <= zero;
      EVAL_DEN <= zero;
      DEN_INV <= zero;
    else
      EVAL_LP_1 <= add(IntLambda_1, mul(IntLambda_3, X2));
      EVAL_LP_2 <= EVAL_LP_1;
      EVAL_LP_3 <= add(EVAL_LP_2, mul(IntLambda_5, X4));
      EVAL_LP_4 <= EVAL_LP_3;
      EVAL_LP <= add(EVAL_LP_4, mul(IntLambda_7, X6));
      EVAL_DEN <= mul(EVAL_LP, K_VAL); -- K_mul stage is needed
      DEN_INV <= inv(EVAL_DEN); -- K_mul stage is needed or EVAL_LAMBDA PRIME delay stage = '1'
    end if;
  end if;
end process;

K_VAL_Process : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      K_VAL <= K_VAL_initial_value;
    else
      K_VAL <= mul(K_VAL, K_MUL);
    end if;
  end if;
end process;

Calc_Correction_Factor : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      CORR_FACTOR <= (others => '0');
    elsif (remark = '1') then
      CORR_FACTOR <= mul(EVAL_OM_D1, DEN_INV);
    else
      CORR_FACTOR <= (others => '0');
    end if;
  end if;
end process;

Delay_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      for i in 0 to SR_max loop
        shift_register_delay(i) <= (others => '0');
      end loop;
    else
      for i in SR_max downto 1 loop
        shift_register_delay(i) <= shift_register_delay(i-1);
      end loop;
      shift_register_delay(0) <= RS_Data_In;
    end if;
  end if;
end process;
RS_data_delayed <= shift_register_delay(SR_max);
Correct_RS_Data : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      RS_Data_out <= (others => '0');
    elsif ((ChienSum = zero) and (OutputEnable = '1')) then
      RS_Data_out <= add(CORR_FACTOR,RS_data_delayed);
    elsif (OutputEnable = '1') then
      RS_Data_out <= RS_data_delayed;
    else
      RS_Data_out <= (others => '0');
    end if;
  end if;
end process;
Calc_Temp_Registers : process (clk)
begin
  if (clk'event and clk = '1') then
    if (reset_n = '0') then
      Temp1 <= zero;
      Temp2 <= zero;
      Temp3 <= zero;
      Temp4 <= zero;
      Temp5 <= zero;
      Temp6 <= zero;
      Temp7 <= zero;
      Temp8 <= zero;
    elsif (InitChien = '1') then
      Temp1 <= mul(alpha77, Lambda(1));  -- 77 = (256 - 179) * 1 mod 255 = 77 mod 255
      Temp2 <= mul(alpha152, Lambda(2)); -- 152 = (256 - 180) * 2 mod 255 = 152 mod 255
      Temp3 <= mul(alpha225, Lambda(3)); -- 225 = (256 - 181) * 3 mod 255 = 225 mod 255
      Temp4 <= mul(alpha41, Lambda(4));  -- 41 = (256 - 182) * 4 mod 255 = 296 mod 255
      Temp5 <= mul(alpha110, Lambda(5)); -- 110 = (256 - 183) * 5 mod 255 = 365 mod 255
      Temp6 <= mul(alpha177, Lambda(6)); -- 177 = (256 - 184) * 6 mod 255 = 432 mod 255
      Temp7 <= mul(alpha242, Lambda(7)); -- 242 = (256 - 185) * 7 mod 255 = 497 mod 255
      Temp8 <= mul(alpha50, Lambda(8));  -- 50 = (256 - 186) * 8 mod 255 = 560 mod 255
    elsif (DoChien = '1') then
      Temp1 <= mul(alpha1, Temp1);
      Temp2 <= mul(alpha2, Temp2);
      Temp3 <= mul(alpha3, Temp3);
      Temp4 <= mul(alpha4, Temp4);
      Temp5 <= mul(alpha5, Temp5);
      Temp6 <= mul(alpha6, Temp6);
      Temp7 <= mul(alpha7, Temp7);
      Temp8 <= mul(alpha8, Temp8);
    end if;
  end if;
end process;
Chien_Sum_Pipeline : process (clk)
begin
  if (clk'event and clk = '1') then
    if ((reset_n = '0') or (StartChien = '1')) then
      EVAL_CS_1 <= zero;
      EVAL_CS_2 <= zero;
      EVAL_CS_3 <= zero;
      EVAL_CS_4 <= zero;
      EVAL_CS_5 <= zero;
      EVAL_CS_6 <= zero;
      EVAL_CS_7 <= zero;
      ChienSum <= zero;
    elsif (DoChien = '1') then
      EVAL_CS_1 <= add(IntLambda_0, Temp1);
      EVAL_CS_2 <= add(EVAL_CS_1, Temp2);
      EVAL_CS_3 <= add(EVAL_CS_2, Temp3);
      EVAL_CS_4 <= add(EVAL_CS_3, Temp4);
      EVAL_CS_5 <= add(EVAL_CS_4, Temp5);
      EVAL_CS_6 <= add(EVAL_CS_5, Temp6);
      EVAL_CS_7 <= add(EVAL_CS_6, Temp7);
      ChienSum <= add(EVAL_CS_7, Temp8);
    end if;
  end if;
end process;
end;

------------------------------------------------------------------------------------------------------------------------
-- End of ChienSearchProcess
------------------------------------------------------------------------------------------------------------------------
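The comments in Calc_Temp_Registers give the exponent of each initial multiplier as (256 - (179 + i - 1)) * i mod 255 for i = 1 to 8. The short Python sketch below is not part of the generated VHDL; it only re-evaluates that arithmetic so the alpha constants can be cross-checked. The function name and its parameters are hypothetical, and treating 179 as the RS_N generic is an assumption based on the default value in RS_Decoder_top.

```python
def chien_init_exponents(rs_n=179, rs_t=8, field_order=255):
    """Evaluate (256 - (rs_n + i - 1)) * i mod 255 for i = 1..rs_t.

    rs_n, rs_t and field_order are assumed values taken from the listing's
    comments and generics (GF(2^8) has 255 nonzero elements).
    """
    exponents = []
    for i in range(1, rs_t + 1):
        base = (field_order + 1) - (rs_n + i - 1)  # 256 - 179, 256 - 180, ...
        exponents.append((base * i) % field_order)
    return exponents


if __name__ == "__main__":
    # One exponent per Temp register, Temp1 first.
    print(chien_init_exponents())
```

Each returned value is the power of alpha that the corresponding Temp register is multiplied by when InitChien is asserted.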

------------------------------------------------------------------------------------------------------------------------
-- Start of RSTOPProcess
------------------------------------------------------------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
entity RS_Decoder_top is
  generic (GFpower : INTEGER := 8; RS_N : INTEGER := 179; RS_T : INTEGER := 8);
  port ( 
    clk : in std_logic;
    reset_n : in std_logic;
    rs_data_in : in std_logic_vector(GFpower - 1 downto 0);
    rs_data_in_start : in std_logic;
    RS_Data_out : out std_logic_vector(GFpower - 1 downto 0);
    RS_Data_out_start : out std_logic;
    rs_enable : in std_logic );
end RS_Decoder_top;

use work.all;
architecture RS_Decoder_top of RS_Decoder_top is
  signal errors_present : std_logic;
  signal lambda_poly : std_logic_vector((RS_T + 1) * GFpower - 1 downto 0);
  signal omega_poly : std_logic_vector(RS_T * GFpower - 1 downto 0);
  signal StartChien : std_logic;
  signal syndrome : std_logic_vector(2 * GFpower * RS_T - 1 downto 0);
  signal syndrome_calc_done : std_logic;
  component MBSolver
  port ( 
    reset_n : in std_logic;
    clk : in std_logic;
    ErrorsPresent : in std_logic;
    XferSyndrome : in std_logic;
    StartChien : out std_logic;
    syndrome_poly : in std_logic_vector(2 * GFpower * RS_T - 1 downto 0);
    omega_poly : out std_logic_vector(RS_T * GFpower - 1 downto 0);
    lambda_poly : out std_logic_vector((RS_T + 1) * GFpower - 1 downto 0)
  );
end component;
component InputProcess
  port (
    reset_n : in std_logic;
    clk : in std_logic;
    rs_data_in : in std_logic_vector(GFpower - 1 downto 0);
    rs_data_in_start : in std_logic;
    syndrome_calc_done : out std_logic;
    errors_present : out std_logic;
    syndrome : out std_logic_vector(2 * GFpower * RS_T - 1 downto 0)
  );
end component;

component ChienSearchProcess
  port (
    clk : in std_logic;
    reset_n : in std_logic;
    lambda_poly : in std_logic_vector((RS_T + 1) * GFpower - 1 downto 0);
    omega_poly : in std_logic_vector(RS_T * GFpower - 1 downto 0);
    StartChien : in std_logic;
    rs_enable : in std_logic;
    RS_Data_In : in std_logic_vector(GFpower - 1 downto 0);
    RS_Data_out : out std_logic_vector(GFpower - 1 downto 0);
    RS_Data_out_start : out std_logic
  );
end component;

begin
inst_MBSolver: MBSolver
  port map (
    reset_n => reset_n,
    clk => clk,
    ErrorsPresent => errors_present,
    XferSyndrome => syndrome_calc_done,
    StartChien => StartChien,
    syndrome_poly => syndrome(2 * GFpower * RS_T - 1 downto 0),
    omega_poly => omega_poly(RS_T * GFpower - 1 downto 0),
    lambda_poly => lambda_poly((RS_T + 1) * GFpower - 1 downto 0));

inst_InputProcess: InputProcess
  port map (
    reset_n => reset_n,
    clk => clk,
    rs_data_in => rs_data_in(GFpower - 1 downto 0),
    rs_data_in_start => rs_data_in_start,
    syndrome_calc_done => syndrome_calc_done,
    errors_present => errors_present,
    syndrome => syndrome(2 * GFpower * RS_T - 1 downto 0));

inst_ChienSearch: ChienSearchProcess
  port map (
    clk => clk,
    reset_n => reset_n,
    lambda_poly => lambda_poly((RS_T + 1) * GFpower - 1 downto 0),
    omega_poly => omega_poly(RS_T * GFpower - 1 downto 0),
    StartChien => StartChien,
    rs_enable => rs_enable,
    RS_Data_In => rs_data_in(GFpower - 1 downto 0),
    RS_Data_out => RS_Data_out(GFpower - 1 downto 0),
    RS_Data_out_start => RS_Data_out_start);
end RS_Decoder_top;

---------------------------------------------
-- End of RSTopProcess
17 Appendix E - RS Encoder Testbench VHDL Code

This appendix contains the VHDL code generated for the RS encoder testbench.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_signed.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_textio.all;
use std.textio.all;

entity tb_validate_encoder is
generic (
  encoder_expected_filename : string := "enc_exp.txt";
  report_filename : string := "enc_report.txt";
  GFpower : integer := 4;
  RS_N : integer := 15;
  RS_T : integer := 3
);

port (encoder_data_out : in std_logic_vector (7 downto 0);
      encoder_output_strobe : in std_logic;
      clk : in std_logic;
      reset_n : in std_logic);
end;

architecture RTL of tb_validate_encoder is

file encoder_expected_file : text is in encoder_expected_filename;
file report_file : text is out report_filename;
type pass_fail_type is (pass, fail);
signal sampling_encoder : std_logic;
signal overall_pass_fail_status : pass_fail_type := pass;

procedure write_output_separator is
  variable ll : line;
begin
  write(ll, string'("|------------------------------------------------------------------------------------------------|"));
  writeln(output, ll);
end write_output_separator;

procedure write_report_separator is
  variable ll : line;
begin
  write(ll, string'("|------------------------------------------------------------------------------------------------|"));
  writeln(report_file, ll);
end write_report_separator;

procedure make_pass_fail_line (RS_check_name : in string; cell_count : in integer;
pass_fail_status : in pass_fail_type; ll : inout line) is
begin
  write(ll, string'("|"));
  write(ll, RS_check_name);
  write(ll, string'(" codeword "));
  write(ll, cell_count, justified=>right, field=>4);
  if (pass_fail_status=pass) then
    write(ll, string'(" | PASS |"));
  else
    write(ll, string'(" | FAIL |"));
  end if;
end make_pass_fail_line;

procedure data_compare (variable_name : in string; actual_data : in integer; exp_data : in integer;
status : inout pass_fail_type) is
  variable ll : line;
begin
  write(ll, string'("|"));
  write(ll, variable_name);
  write(ll, string'(" : expected = "));
  write(ll, exp_data, justified=>right, field=>4);
  write(ll, string'(" : actual = "));
  write(ll, actual_data, justified=>right, field=>4);
  if (actual_data=exp_data) then
    write(ll, string'(" | PASS | "));
  else
    write(ll, string'(" | FAIL | "));
    status := fail;
  end if;
  writeln(report_file, ll);
end data_compare;

begin

verify_encoder_output : process

  variable ll, l2 : line;
  variable cell_count : integer;
  variable actual_encoder_data_int : integer;
  variable expected_encoder_data_int : integer;
  variable pass_fail_status : pass_fail_type;

begin

  pass_fail_status := pass;
  overall_pass_fail_status <= pass;
  cell_count := 0;
  sampling_encoder <= '0';
  write_output_separator;

  while (true) loop

    wait until encoder_output_strobe'event and encoder_output_strobe='1';
    wait until clk'event and clk='0';
    wait until clk'event and clk='0';
    cell_count := cell_count + 1;
    pass_fail_status := pass;
    sampling_encoder <= '0';
    write_report_separator;
    for i in RS_N-1 downto 0 loop
      readline(encoder_expected_file, ll);
      read(ll, expected_encoder_data_int);
      sampling_encoder <= '1';
      -- prepend '0' so the symbol is read as a non-negative integer
      actual_encoder_data_int := conv_integer('0' & encoder_data_out);
      data_compare(string'("encoder_data_out "), actual_encoder_data_int, expected_encoder_data_int, pass_fail_status);
      if (pass_fail_status=fail) then overall_pass_fail_status <= fail; end if;
      wait for 1 ns;
      sampling_encoder <= '0';
      wait until clk'event and clk='0';
    end loop;
    write_report_separator;
    make_pass_fail_line(string'("encoder"),cell_count,pass_fail_status,l2);
    writeln(report_file,l2);
    write_report_separator;
    write(l2,string'(" "));
    writeln(report_file,l2); -- this writes a blank line
    make_pass_fail_line(string'("encoder"),cell_count,pass_fail_status,l2);
    writeln(output,l2);

    wait until clk'event and clk='0';
    sampling_encoder <= '0';
    if (endfile(encoder_expected_file)) then
      write_output_separator;
      write(ll,string'(" "));
      writeln(report_file,ll);
      write(ll,string'("End of processing at "));
      write(ll, now);
      writeln(report_file, ll);
      write(ll, string'("End of processing at "));
      write(ll, now);
      writeln(output, ll);
      if (overall_pass_fail_status = pass) then
        assert (false) report "Simulation Done. All tests have PASSED." severity error;
      else
        assert (false) report "Simulation Done. ATTENTION: There were some failures. -- FAIL FAIL FAIL" severity error;
      end if;
      wait;
    end if;
  end loop; -- while (TRUE) loop
end process verify_encoder_output;
end;
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_textio.all;
use ieee.std_logic_arith.all;
use std.textio.ALL;

entity tb_rs_encoder_stim is
generic (
  GFPower : integer := 8; RS_N : integer := 20; RS_T : integer := 1
);
port (reset_n : out std_logic;
  clk : out std_logic;
  data_size : out std_logic_vector ((GFPower-1) downto 0);
  rs_data_in : out std_logic_vector ((GFPower-1) downto 0);
  rs_data_in_start : out std_logic
);
end;
architecture RTL of tb_rs_encoder_stim is
signal int_clk: std_logic;
constant in_filename: string := "enc_stim.txt";
begin
generate_Intsys_clk: process
begin
  int_clk <= '1';
  while true loop
    wait for 50 ns;
    int_clk <= not int_clk;
    wait for 50 ns;
    int_clk <= not int_clk;
  end loop;
end process;
clk <= int_clk;
process
begin
  reset_n <= '0';
  for i in 1 to 40 loop
    wait until (int_clk'event and int_clk='1');
  end loop;
  reset_n <= '1';
  wait;
end process;
generate_stimulus : process
variable stimulus_var : std_logic_vector(GFPower downto 0);
file infile: text is in in_filename;
variable stimulus_line : line;
variable RS_N_int : integer;
variable RS_T_int : integer;
begin
    rs_data_in_start <= '0';
    rs_data_in <= (others => '0');
data_size <= conv_std_logic_vector(RS_N-2*RS_T,GFPower);
for i in 1 to 50 loop
    wait until int_clk'event and int_clk='1';
end loop;
while not endfile(infile) loop
    wait for 5 ns;
    readline(infile, stimulus_line);
    read(stimulus_line, stimulus_var);
    rs_data_in_start <= stimulus_var(GFPower);
    rs_data_in <= stimulus_var((GFPower-1) downto 0);
    wait until int_clk'event and int_clk='1';
end loop; -- while not endfile(infile) loop
wait; -- forever
end process generate_stimulus;
end;
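The generate_stimulus process reads each stimulus-file line into a vector of GFPower + 1 bits, takes the leftmost bit as rs_data_in_start and the remaining GFPower bits as rs_data_in. As a rough illustration of that line layout, the Python sketch below formats one such line. It is not part of the thesis tool flow; the helper name and arguments are hypothetical, and the assumption that each line is a plain string of '0'/'1' characters, most significant bit first, follows from the std_logic_textio read of a "downto" vector.

```python
def stimulus_line(symbol, start, gf_power=8):
    """Format one stimulus line: start flag bit, then gf_power data bits (MSB first).

    Hypothetical helper for illustration; the real stimulus files are produced
    by the code generator described in the thesis.
    """
    if not 0 <= symbol < (1 << gf_power):
        raise ValueError("symbol does not fit in gf_power bits")
    flag = "1" if start else "0"
    return flag + format(symbol, "0{}b".format(gf_power))


if __name__ == "__main__":
    # First symbol of a block carries the start flag; later symbols do not.
    print(stimulus_line(5, True))
    print(stimulus_line(255, False))
```

Writing one such line per clock cycle would match the loop in generate_stimulus, which reads a line and then waits for the next rising edge of int_clk.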
library ieee;
use ieee.std_logic_1164.all;
entity top_tb_RS_encoder is
generic (GFPower : INTEGER := 8;
        RS_N : INTEGER := 15;
        RS_T : INTEGER := 1
    );
end top_tb_RS_encoder;
use work.all;
architecture RTL of top_tb_RS_encoder is
signal clk : std_logic;
signal data_size : std_logic_vector(GFPower - 1 downto 0);
signal encoder_data_out : std_logic_vector(GFPower - 1 downto 0);
signal encoder_output_strobe : std_logic;
signal reset_n : std_logic;
signal rs_data_in : std_logic_vector(GFPower - 1 downto 0);
signal rs_data_in_start : std_logic;
component tb_rs_encoder_stim
    generic (GFPower : INTEGER := 8;
        RS_N : INTEGER := 15;
        RS_T : INTEGER := 1
    );
    port (reset_n : out std_logic;
        clk : out std_logic;
        data_size : out std_logic_vector((GFPower - 1) downto 0);
        rs_data_in : out std_logic_vector((GFPower - 1) downto 0);
        rs_data_in_start : out std_logic
    );
end component;
component rs_encoder
    port (reset_n : in std_logic;
        data_out : out std_logic_vector(GFPower - 1 downto 0);
        data_in : in std_logic_vector(GFPower - 1 downto 0);
        input_strobe : in std_logic;
        clk : in std_logic;
        output_strobe : out std_logic;
        data_size : in std_logic_vector(GFPower - 1 downto 0)
    );
end component;

component tb_validate_encoder
    generic (
        encoder_expected_filename : STRING := "enc_exp.txt";
        report_filename : STRING := "enc_report.txt";
        GFPower : INTEGER := 4;
        RS_N : INTEGER := 15;
        RS_T : INTEGER := 3
    );
    port (encoder_data_out : in std_logic_vector(7 downto 0);
        encoder_output_strobe : in std_logic;
        clk : in std_logic;
        reset_n : in std_logic
    );
end component;

begin

inst_tb_rs_encoder_stim: tb_rs_encoder_stim
  generic map (GFPower, RS_N, RS_T)
  port map (
    reset_n => reset_n,
    clk => clk,
    data_size => data_size(GFPower - 1 downto 0),
    rs_data_in => rs_data_in(GFPower - 1 downto 0),
    rs_data_in_start => rs_data_in_start);

inst_rs_encoder: rs_encoder
  port map (
    reset_n => reset_n,
    data_out => encoder_data_out(GFPower - 1 downto 0),
    data_in => rs_data_in(GFPower - 1 downto 0),
    input_strobe => rs_data_in_start,
    clk => clk,
    output_strobe => encoder_output_strobe,
    data_size => data_size(GFPower - 1 downto 0));

inst_tb_validate_encoder: tb_validate_encoder
  generic map ("enc_exp.txt",
               "enc_report.txt",
               GFPower,
               RS_N,
               RS_T)
  port map (
    encoder_data_out => encoder_data_out(GFPower - 1 downto 0),
    encoder_output_strobe => encoder_output_strobe,
    clk => clk,
    reset_n => reset_n);
end RTL;
Appendix F - RS Decoder Testbench VHDL Code

This appendix contains the VHDL code generated for the RS decoder testbench.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_signed.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_textio.all;
use std.textio.all;

entity tb_validate_RS is

generic (
  InputProcess_expected_filename : string := "ip_exp.txt";
  MBSolver_expected_filename : string := "mb_exp.txt";
  ChienSearch_expected_filename : string := "cs_exp.txt";
  stimulus_filename : string := "ip_stim.txt";
  report_filename : string := "ip_report.txt";
  GFPower : integer := 4;
  RS_N : integer := 15;
  RS_T : integer := 3
);

port (syndrome_calc_done : in std_logic;
  errors_present : in std_logic;
  StartChien : in std_logic;
  lambda_poly : in std_logic_vector ((GFPower * (RS_T + 1) - 1) downto 0);
  omega_poly : in std_logic_vector ((GFPower * RS_T - 1) downto 0);
  syndrome : in std_logic_vector ((GFPower * RS_T * 2 - 1) downto 0);
  reset_n : in std_logic;
  clk : in std_logic;
  blk_out : in std_logic_vector (GFPower-1 downto 0);
  blk_sync_out : in std_logic );

end;

architecture RTL of tb_validate_RS is

file InputProcess_expected_file : text is in InputProcess_expected_filename;
file MBSolver_expected_file : text is in MBSolver_expected_filename;
file ChienSearch_expected_file : text is in ChienSearch_expected_filename;
file report_file : text is out report_filename;
type pass_fail_type is (pass, fail);
signal sampling_IP : std_logic;
signal sampling_MB : std_logic;
signal sampling_CS : std_logic;
signal syndrome_element : std_logic_vector(GFPower-1 downto 0);
signal overall_IP_pass_fail_status : pass_fail_type := pass;
signal overall_MB_pass_fail_status : pass_fail_type := pass;
signal overall_CS_pass_fail_status : pass_fail_type := pass;

procedure write_output_separator is
variable ll : line;
begin
  write(ll,
      string'("|----------------------------------------------------------|"));
  writeln(output, ll);
end write_output_separator;

procedure write_report_separator is
variable ll : line;
begin
  write(ll,
      string'("|----------------------------------------------------------|"));
  writeln(report_file, ll);
end write_report_separator;

procedure make_pass_fail_line (RS_check_name : in string; cell_count : in integer;
  pass_fail_status : in pass_fail_type; ll : inout line ) is
begin
  write(ll,string'(" | "));
  write(ll,RS_check_name);
  write(ll,string'(" codeword "));
  write(ll,cell_count,justified=>right,field=>4);
  if (pass_fail_status=pass) then
    write(ll,string'(" ; PASS/FAIL status = | PASS |"));
  else
    write(ll,string'(" ; PASS/FAIL status = | FAIL | ******* FAIL"));
  end if;
end make_pass_fail_line;

procedure data_compare (variable_name : in string; actual_data : in integer; exp_data : in integer; status : inout pass_fail_type) is
variable ll: line;
begin
  write(ll,string'("| "));
  write(ll,variable_name);
  write(ll,string'(" ; expected = "));
  write(ll,exp_data,justified=>right,field=>4);
  write(ll,string'(" ; actual = "));
  write(ll,actual_data,justified=>right,field=>4);
  if (actual_data=exp_data) then
    write(ll, string'(" | PASS |"));
  else
    write(ll, string'(" | FAIL |"));
    status := fail;
  end if;
  writeln(report_file,ll);
end data_compare;

begin -- architecture RTL of tb_validate_IP

verify_InputProcess_output : process

variable l1, l2 : line;
variable cell_count : integer;
variable expected_syndrome_int : integer;
variable actual_syndrome_int : integer;
variable pass_fail_status : pass_fail_type;
variable first_bit : integer;
variable last_bit : integer;
begin
  syndrome_element <= (others=>'0');
  write(l1,string'("stimulus file : "));
  write(l1,stimulus_filename);
  writeln(output,l1);
  write(l1,string'("InputProcess expected file : "));
  write(l1,InputProcess_expected_filename);
  writeln(output,l1);
  write(l1,string'(" MBSolver expected file : "));
  write(l1,MBSolver_expected_filename);
  writeln(output,l1);
  write(l1,string'(" ChienSearch expected file : "));
  write(l1,ChienSearch_expected_filename);
  writeln(output,l1);
  write(l1,string'(" report file : "));
  write(l1,report_filename);
  writeln(output,l1);
  writeln(output,l1);
  write(l1,string'("stimulus file : "));
  write(l1,stimulus_filename);
  writeln(report_file,l1);
  write(l1,string'("InputProcess expected file : "));
  write(l1,InputProcess_expected_filename);
  writeln(report_file,l1);
  write(l1,string'(" MBSolver expected file : "));
  write(l1,MBSolver_expected_filename);
  writeln(report_file,l1);
  write(l1,string'(" ChienSearch expected file : "));
  write(l1,ChienSearch_expected_filename);
  writeln(report_file,l1);
  write(l1,string'(" report file : "));
  write(l1,report_filename);
  writeln(report_file,l1);
  writeln(report_file,l1);

  pass_fail_status := pass;
  overall_IP_pass_fail_status <= pass;
  cell_count := 0;
  sampling_IP <= '0';
  write_output_separator;

  while (true) loop

    wait until syndrome_calc_done'event and syndrome_calc_done='1';
    wait until clk'event and clk='0';
    wait until clk'event and clk='0';
    cell_count := cell_count + 1;
    pass_fail_status := pass;

    write_report_separator;
    sampling_IP <= '1';
    for i in 0 to 2*RS_T-1 loop
      readline(InputProcess_expected_file, l1);
      read(l1, expected_syndrome_int);
      first_bit := GFPower*(i+1) - 1;
      last_bit := GFPower*i;
      syndrome_element <= syndrome(first_bit downto last_bit);
      actual_syndrome_int := conv_integer('0'&syndrome(first_bit downto last_bit));
      write(l1, string'(" actual syndrome : "));
      write(l1, syndrome(first_bit downto last_bit));
      writeln(report_file, l1);
      data_compare(string'("syndrome "), actual_syndrome_int, expected_syndrome_int, pass_fail_status);
      if (pass_fail_status=fail) then overall_IP_pass_fail_status <= fail; end if;
      first_bit := first_bit - GFPower;
    end loop;
    write_report_separator;
    make_pass_fail_line(string'("InputProcess "), cell_count, pass_fail_status, l1);
    writeln(report_file, l1);
    make_pass_fail_line(string'("InputProcess "), cell_count, pass_fail_status, l1);
    writeln(output, l1);
    wait for 1 ns;
    sampling_IP <= '0';
    wait until clk'event and clk='0';
  end loop; -- while (TRUE) loop

end process verify_InputProcess_output;

verify_MBSolver_output : process

variable l1, l2 : line;
variable cell_count : integer;
variable expected_lambda_int : integer;
variable actual_lambda_int : integer;
variable expected_omega_int : integer;
variable skip_MB_int : integer;
variable actual_omega_int : integer;
variable pass_fail_status : pass_fail_type;
variable first_bit : integer;
variable last_bit : integer;

begin
  overall_MB_pass_fail_status <= pass;
  pass_fail_status := pass;
  cell_count := 0;
  while (true) loop

    wait until StartChien'event and StartChien='1';
    wait until clk'event and clk='0';
    cell_count := cell_count + 1;
    pass_fail_status := pass;
    sampling_MB <= '0';

    write_report_separator;
    for i in 0 to RS_T loop
      readline(MBSolver_expected_file, l1);
      read(l1, expected_lambda_int);
      if (i=0) then read(l1, skip_MB_int); end if;
      first_bit := GFPower*(i+1) - 1;
      last_bit := GFPower*i;
      actual_lambda_int := conv_integer('0'&lambda_poly(first_bit downto last_bit));
      if (skip_MB_int=0) then
        data_compare(string'("lambda "),actual_lambda_int,expected_lambda_int,pass_fail_status);
      end if;
      if (pass_fail_status=fail) then overall_MB_pass_fail_status <= fail; end if;
      wait for 1 ns;
      sampling_MB <= '0';
      wait for 1 ns;
      first_bit := first_bit - GFPower;
    end loop;
    write_report_separator;
    for i in 0 to RS_T-1 loop
      readline(MBSolver_expected_file, l1);
      read(l1, expected_omega_int);
      first_bit := GFPower*(i+1) - 1;
      last_bit := GFPower*i;
      actual_omega_int := conv_integer('0'&omega_poly(first_bit downto last_bit));
      if (skip_MB_int=0) then
        data_compare(string'("omega "),actual_omega_int,expected_omega_int,pass_fail_status);
      end if;
      if (pass_fail_status=fail) then overall_MB_pass_fail_status <= fail; end if;
      wait for 1 ns;
      sampling_MB <= '0';
      wait for 1 ns;
      first_bit := first_bit - GFPower;
    end loop;
    write_report_separator;
    make_pass_fail_line(string'("MBSolver "),cell_count,pass_fail_status,l1);
    writeln(report_file,l1);
    make_pass_fail_line(string'("MBSolver "),cell_count,pass_fail_status,l1);
    writeln(output,l1);
    wait until clk'event and clk='0';
    sampling_MB <= '0';
  end loop; -- while (TRUE) loop

end process verify_MBSolver_output;

verify_ChienSearch_output : process

variable l1, l2 : line;
variable cell_count : integer;
variable actual_rs_data_int : integer;
variable expected_rs_data_int : integer;
variable pass_fail_status : pass_fail_type;
begin
  pass_fail_status := pass;
  overall_CS_pass_fail_status <= pass;
  cell_count := 0;
  while (true) loop

    wait until blk_sync_out'event and blk_sync_out='1';
    wait until clk'event and clk='0';
    cell_count := cell_count + 1;
    pass_fail_status := pass;
    sampling_CS <= '0';

    write_report_separator;
    for i in RS_N-1 downto 0 loop
      readline(ChienSearch_expected_file, l1);
      read(l1,expected_rs_data_int);
      sampling_CS <= '1';
      actual_rs_data_int := conv_integer('0'&blk_out);
      data_compare(string'("rs_data_out "),actual_rs_data_int,expected_rs_data_int,pass_fail_status);
      if (pass_fail_status=fail) then overall_CS_pass_fail_status <= fail; end if;
      wait for 1 ns;
      sampling_CS <= '0';
      wait until clk'event and clk='0';
    end loop;
    write_report_separator;

    make_pass_fail_line(string'("ChienSearch "),cell_count,pass_fail_status,l2);
    writeln(report_file,l2);
    write_report_separator;
    write(l2,string'(" "));
    write_report_separator;
    writeln(report_file,l2);
    make_pass_fail_line(string'("ChienSearch "),cell_count,pass_fail_status,l2);
    writeln(output,l2);
    write_report_separator;

    wait until clk'event and clk='0';
    sampling_CS <= '0';

    if (endfile(ChienSearch_expected_file)) then
      write(l1,string'(" "));
      write_report_separator;
      writeln(report_file,l1);
      write(l1,string'("End of processing at "));
      write(l1,now);
      write_report_separator;
      writeln(report_file,l1);
      write(l2,string'(" "));
      writeln(report_file,l2);
      write(l2,string'("End of processing at "));
      write(l2,now);
      write_report_separator;
      writeln(output,l2);

      if ((overall_IP_pass_fail_status=pass) and (overall_MB_pass_fail_status=pass) and
        (overall_CS_pass_fail_status=pass)) then
        assert (false) report "Simulation Done. All tests have PASSED."
          severity error;
      else
        assert (false) report "Simulation Done. ATTENTION : There were some failures. -- FAIL FAIL FAIL" severity error;
      end if;
      write(l1,string'("overall_IP_pass_fail_status = "));
      if (overall_IP_pass_fail_status=pass) then write(l1,string'("PASS")); else
        write(l1,string'("FAIL")); end if;
      writeln(output,l1);
      write(l2,string'("overall_MB_pass_fail_status = "));
      if (overall_MB_pass_fail_status=pass) then write(l2,string'("PASS")); else
        write(l2,string'("FAIL")); end if;
      writeln(output,l2);
      write(l1,string'("overall_CS_pass_fail_status = "));
      if (overall_CS_pass_fail_status=pass) then write(l1,string'("PASS")); else
        write(l1,string'("FAIL")); end if;
      writeln(output,l1);
      assert (false) report "Simulation Done. End of Report." severity error;
      wait;
    end if;
  end loop; -- while (TRUE) loop

end process verify_ChienSearch_output;

end;

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_textio.all;
use ieee.std_logic_arith.all;
use std.textio.ALL;

entity tb_rs_stim is
  generic (
    GFPower : integer := 8
  );

port (reset_n : out std_logic;
      clk : out std_logic;
      rs_enable : out std_logic;
      rs_data_in : out std_logic_vector ((GFPower-1) downto 0);
      rs_data_in_start : out std_logic);
end;

architecture RTL of tb_rs_stim is
signal int_clk: std_logic;

constant in_filename: string := "ip_stim.txt";

begin

generate_intsys_clk : process
begin
  int_clk <= '1';
  while true loop
    wait for 50 ns;
    int_clk <= not int_clk;
    wait for 50 ns;
    int_clk <= not int_clk;
  end loop;
end process;
clk <= int_clk;

process
begin
  reset_n <= '0';
  rs_enable <= '1';
  for i in 1 to 40 loop
    wait until (int_clk'event and int_clk='1');
  end loop;
  reset_n <= '1';
  wait;
end process;

generate_stimulus : process
variable stimulus_var : std_logic_vector(GFPower downto 0);
file infile: text is in in_filename;
variable stimulus_line : line;
variable RS_N_int : integer;
variable RS_T_int : integer;

begin
  rs_data_in_start <= '0';
  rs_data_in <= (others => '0');
  for i in 1 to 50 loop
    wait until int_clk'event and int_clk='1';
  end loop;

  while not endfile(infile) loop
    readline(infile, stimulus_line);
    read(stimulus_line, stimulus_var);
    rs_data_in_start <= stimulus_var(GFPower);
    rs_data_in <= stimulus_var((GFPower-1) downto 0);
    wait until int_clk'event and int_clk='1';
  end loop; -- while not endfile(infile) loop
  wait; -- forever
end process generate_stimulus;
end;

library ieee;
use ieee.std_logic_1164.all;
entity top_tb_RS_decoder is
generic (
    GFPower : INTEGER := 8;
    RS_N : INTEGER := 22;
    RS_T : INTEGER := 8
);
end top_tb_RS_decoder;

use work.all;
architecture RTL of top_tb_RS_decoder is
signal clk : std_logic;
signal reset_n : std_logic;
signal rs_data_in : std_logic_vector(GFPower - 1 downto 0);
signal rs_data_in_start : std_logic;
signal RS_Data_out : std_logic_vector(7 downto 0);
signal RS_Data_out_Start : std_logic;
signal rs_enable : std_logic;
signal tb_errors_present : std_logic;
signal tb_lambda_poly : std_logic_vector((RS_T + 1) * GFPower - 1 downto 0);
signal tb_omega_poly : std_logic_vector(RS_T * GFPower - 1 downto 0);
signal tb_StartChien : std_logic;
signal tb_syndrome : std_logic_vector(2 * GFPower * RS_T - 1 downto 0);
signal tb_syndrome_calc_done : std_logic;
component tb_rs_stim
generic (
    GFPower : INTEGER := 8
);
port (
    reset_n : out std_logic;
    clk : out std_logic;
    rs_enable : out std_logic;
    rs_data_in : out std_logic_vector((GFPower - 1) downto 0);
    rs_data_in_start : out std_logic
);
end component;
component tb_validate_RS
generic (
    InputProcess_expected_filename : STRING := "ip_exp.txt";
    MBSolver_expected_filename : STRING := "mb_exp.txt";
    ChienSearch_expected_filename : STRING := "cs_exp.txt";
    stimulus_filename : STRING := "ip_stim.txt";
    report_filename : STRING := "ip_report.txt";
    GFPower : INTEGER := 4;
    RS_N : INTEGER := 15;
    RS_T : INTEGER := 3
);
port (
    syndrome_calc_done : in std_logic;
    errors_present : in std_logic;
    StartChien : in std_logic;
    lambda_poly : in std_logic_vector((GFPower * (RS_T + 1) - 1) downto 0);
    omega_poly : in std_logic_vector((GFPower * RS_T - 1) downto 0);
    syndrome : in std_logic_vector((GFPower * RS_T * 2 - 1) downto 0);
    reset_n : in std_logic;
    clk : in std_logic;
    blk_out : in std_logic_vector(GFPower - 1 downto 0);
    blk_sync_out : in std_logic
);
end component;

component RS_Decoder_top
generic (
  GFPower : INTEGER := 8;
  RS_N : INTEGER := 71;
  RS_T : INTEGER := 8
);

port (
  clk : in std_logic;
  reset_n : in std_logic;
  rs_data_in : in std_logic_vector(GFPower - 1 downto 0);
  rs_data_in_start : in std_logic;
  RS_Data_out : out std_logic_vector(7 downto 0);
  RS_Data_out_Start : out std_logic;
  rs_enable : in std_logic;
  tb_errors_present : out std_logic;
  tb_lambda_poly : out std_logic_vector((RS_T + 1) * GFPower - 1 downto 0);
  tb_omega_poly : out std_logic_vector(RS_T * GFPower - 1 downto 0);
  tb_StartChien : out std_logic;
  tb_syndrome : out std_logic_vector(2 * GFPower * RS_T - 1 downto 0);
  tb_syndrome_calc_done : out std_logic
);
end component;

begin

inst_tb_rs_stim: tb_rs_stim
  generic map (GFPower)
  port map (
    reset_n => reset_n,
    clk => clk,
    rs_enable => rs_enable,
    rs_data_in => rs_data_in(GFPower - 1 downto 0),
    rs_data_in_start => rs_data_in_start
  );

inst_tb_validate_RS: tb_validate_RS
  generic map ("ip_exp.txt",
              "mb_exp.txt",
              "cs_exp.txt",
              "ip_stim.txt",
              "ip_report.txt",
              GFPower,
              RS_N,
              RS_T)
  port map (
    syndrome_calc_done => tb_syndrome_calc_done,
    errors_present => tb_errors_present,
    StartChien => tb_StartChien,
    lambda_poly => tb_lambda_poly((RS_T + 1) * GFPower - 1 downto 0),
    omega_poly => tb_omega_poly(RS_T * GFPower - 1 downto 0),
    syndrome => tb_syndrome(2 * GFPower * RS_T - 1 downto 0),
    reset_n => reset_n,
    clk => clk,
    blk_out => RS_Data_out(7 downto 0),
    blk_sync_out => RS_Data_out_Start
  );

inst_RS_Decoder_top: RS_Decoder_top
  generic map (GFPower,
              RS_N,
              RS_T)
  port map (
    clk => clk,
    reset_n => reset_n,
    rs_data_in => rs_data_in(GFPower - 1 downto 0),
    rs_data_in_start => rs_data_in_start,
    RS_Data_out => RS_Data_out(7 downto 0),
    RS_Data_out_Start => RS_Data_out_Start,
    rs_enable => rs_enable,
    tb_errors_present => tb_errors_present,
    tb_lambda_poly => tb_lambda_poly((RS_T + 1) * GFPower - 1 downto 0),
    tb_omega_poly => tb_omega_poly(RS_T * GFPower - 1 downto 0),
    tb_StartChien => tb_StartChien,
    tb_syndrome => tb_syndrome(2 * GFPower * RS_T - 1 downto 0),
    tb_syndrome_calc_done => tb_syndrome_calc_done);
end RTL;