Masadeh, Mahmoud (ORCID: https://orcid.org/0000-0001-7447-1276), Hasan, Osman and Tahar, Sofiène (2019) Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency. IEEE Access, 7. pp. 147129-147142. ISSN 2169-3536
Text (application/pdf): Masadeh-IEEE Access-2019.pdf (3MB) - Published Version. Available under License Creative Commons Attribution.
Official URL: http://dx.doi.org/10.1109/ACCESS.2019.2946513
Abstract
The Multiply-Accumulate (MAC) unit is an integral computational component of all digital signal processing (DSP) architectures and thus has a significant impact on their speed and power dissipation. With the rapid growth in the number of battery-powered “Internet of Things” (IoT) devices, the need to reduce the power consumption of DSP architectures has increased considerably. Approximate computing (AxC) has been proposed as a potential solution to this problem for error-resilient applications. In this paper, we present a novel FPGA implementation of an input-aware, energy-efficient 8-bit approximate MAC (AxMAC) unit that reduces power consumption either by performing the multiplication approximately, or by approximating the input operands and then replacing the multiplication with a simple shift operation. We propose an input-aware conditional block that bypasses operand multiplication by (1) zero forwarding for zero-value operands, (2) judiciously approximating 43.8% of inputs into power-of-2 values, and (3) replacing the multiplication of power-of-2 operands with a simple shift operation. Experimental results show that these simplification techniques reduce delay, power and energy consumption with an acceptable quality degradation. We evaluate the effectiveness of the proposed AxMAC units on two image processing applications, i.e., image blending and filtering, and a logistic regression classification application. These applications demonstrate a negligible quality loss, with 66.6% energy reduction and 5% area overhead.
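The three bypass steps named in the abstract (zero forwarding, power-of-2 approximation, shift in place of multiply) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' FPGA design: the function names, the unsigned 8-bit operand range, and especially the `tolerance` rule that decides which operands are snapped to a power of 2 are hypothetical. The abstract does not say how the 43.8% of inputs are selected, and this sketch does not attempt to reproduce that figure.

```python
MAX_OPERAND = 255  # assumption: unsigned 8-bit operands


def is_power_of_two(x: int) -> bool:
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return x > 0 and (x & (x - 1)) == 0


def nearest_power_of_two(x: int) -> int:
    """Nearest power of two to a positive integer x; ties round up."""
    lower = 1 << (x.bit_length() - 1)
    upper = lower << 1
    return lower if (x - lower) < (upper - x) else upper


def approximate_operand(x: int, tolerance: int = 2) -> int:
    """Snap x to a nearby power of two.

    Hypothetical selection rule: snap only when the nearest power of two
    is within `tolerance` of x and still fits in the 8-bit range.
    """
    if x == 0:
        return 0
    p = nearest_power_of_two(x)
    if p <= MAX_OPERAND and abs(x - p) <= tolerance:
        return p
    return x


def ax_mac(acc: int, a: int, b: int, tolerance: int = 2) -> int:
    """One approximate multiply-accumulate step: returns acc + (a * b, approx.)."""
    # (1) Zero forwarding: the product is zero, so skip the multiplier.
    if a == 0 or b == 0:
        return acc
    # (2) Approximate operands into power-of-2 values where the rule allows.
    a2 = approximate_operand(a, tolerance)
    b2 = approximate_operand(b, tolerance)
    # (3) A power-of-2 operand turns the multiplication into a left shift.
    if is_power_of_two(b2):
        return acc + (a2 << (b2.bit_length() - 1))
    if is_power_of_two(a2):
        return acc + (b2 << (a2.bit_length() - 1))
    # Fallback: ordinary multiplication (the paper may use an approximate
    # multiplier here; this sketch uses an exact one).
    return acc + a2 * b2
```

For instance, with the assumed tolerance of 2, `ax_mac(0, 100, 63)` treats 63 as 64 and computes `100 << 6`, giving 6400 where the exact product is 6300. Setting `tolerance=0` disables the snapping, so only operands that are already powers of 2 take the shift path.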
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Article |
Refereed: | Yes |
Authors: | Masadeh, Mahmoud and Hasan, Osman and Tahar, Sofiène |
Journal or Publication: | IEEE Access |
Date: | 2019 |
Funders: | |
Digital Object Identifier (DOI): | 10.1109/ACCESS.2019.2946513 |
Keywords: | Approximate computing, approximate multiplier, approximate multiply-accumulate unit (AxMAC), input-aware approximation, image processing, FPGA. |
ID Code: | 986093 |
Deposited By: | Krista Alexander |
Deposited On: | 13 Nov 2019 22:09 |
Last Modified: | 13 Nov 2019 22:09 |