Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency

Masadeh, Mahmoud (ORCID: https://orcid.org/0000-0001-7447-1276), Hasan, Osman and Tahar, Sofiène (2019) Input-Conscious Approximate Multiply-Accumulate (MAC) Unit for Energy-Efficiency. IEEE Access, 7. pp. 147129-147142. ISSN 2169-3536

Text (application/pdf)
Masadeh-IEEE Access-2019.pdf - Published Version
Available under License Creative Commons Attribution.

3MB

Official URL: http://dx.doi.org/10.1109/ACCESS.2019.2946513

Abstract

The multiply-accumulate (MAC) unit is an integral computational component of all digital signal processing (DSP) architectures and thus has a significant impact on their speed and power dissipation. With the rapid growth in the number of battery-powered “Internet of Things” (IoT) devices, the need to reduce the power consumption of DSP architectures has increased considerably. Approximate computing (AxC) has been proposed as a potential solution to this problem for error-resilient applications. In this paper, we present a novel FPGA implementation of an input-aware, energy-efficient 8-bit approximate MAC (AxMAC) unit that reduces power consumption either by performing the multiplication approximately or by approximating the input operands and then replacing the multiplication with a simple shift operation. We propose an input-aware conditional block that bypasses operand multiplication by (1) zero forwarding for zero-valued operands, (2) judiciously approximating 43.8% of inputs to power-of-2 values, and (3) replacing the multiplication of power-of-2 operands with a simple shift operation. Experimental results show that these simplification techniques reduce delay, power, and energy consumption with an acceptable quality degradation. We evaluate the effectiveness of the proposed AxMAC units on two image processing applications, namely image blending and filtering, and on a logistic regression classification application. These applications show a negligible quality loss, with a 66.6% energy reduction and a 5% area overhead.
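
The three bypass steps named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' FPGA design: the abstract does not give the rule used to decide which inputs are approximated, so the `tolerance` parameter and the snap-to-nearest rule below are hypothetical, as are all function names. Exact multiplication stands in for the approximate multiplier used on the remaining inputs.

```python
def nearest_power_of_two(x: int) -> int:
    """Return the power of two closest to x (x >= 1); ties round up."""
    lower = 1 << (x.bit_length() - 1)  # largest power of two <= x
    upper = lower << 1                 # smallest power of two > x
    return lower if (x - lower) < (upper - x) else upper


def approximate_operand(x: int, tolerance: int) -> int:
    """Snap x to the nearest power of two when it lies within `tolerance`.

    Hypothetical selection rule; the paper's actual criterion is not
    stated in the abstract.
    """
    p = nearest_power_of_two(x)
    # Keep the result inside the unsigned 8-bit range.
    if p <= 255 and abs(x - p) <= tolerance:
        return p
    return x


def is_power_of_two(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def ax_mac(acc: int, a: int, b: int, tolerance: int = 2) -> int:
    """One MAC step (acc + a * b) for unsigned 8-bit operands with bypassing."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must be unsigned 8-bit values")

    # (1) Zero forwarding: the product is zero, so skip the multiplier.
    if a == 0 or b == 0:
        return acc

    # (2) Approximate operands that are close to a power of two.
    a = approximate_operand(a, tolerance)
    b = approximate_operand(b, tolerance)

    # (3) Multiplication by a power of two becomes a left shift.
    if is_power_of_two(a):
        return acc + (b << (a.bit_length() - 1))
    if is_power_of_two(b):
        return acc + (a << (b.bit_length() - 1))

    # Fallback: exact product as a stand-in for the approximate multiplier.
    return acc + a * b
```

With `tolerance=0` the model snaps nothing and reduces to an exact MAC that still skips the multiplier for zero and power-of-2 operands. Larger tolerances route more inputs to the shift path at the cost of a larger error.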

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type:	Article
Refereed:	Yes
Authors:	Masadeh, Mahmoud and Hasan, Osman and Tahar, Sofiène
Journal or Publication:	IEEE Access
Date:	2019
Funders:	Concordia Open Access Author Fund
Digital Object Identifier (DOI):	10.1109/ACCESS.2019.2946513
Keywords:	Approximate computing, approximate multiplier, approximate multiply-accumulate unit (AxMAC), input-aware approximation, image processing, FPGA.
ID Code:	986093
Deposited By:	Krista Alexander
Deposited On:	13 Nov 2019 22:09
Last Modified:	13 Nov 2019 22:09

