Modeling DRAM Access Based on Efficient Tiling in CNN Hardware Accelerators

Seyyedi, Sakineh; Ersali Salehi Nasab, Mostafa

doi:10.61882/jiaeee.22.2.160

Journal of Iranian Association of Electrical and Electronics Engineers, Volume 22, Issue 2 (2025), pp. 160-171

Seyyedi S, Ersali Salehi Nasab M. Modeling DRAM Access Based on Efficient Tiling in CNN Hardware Accelerators. Journal of Iranian Association of Electrical and Electronics Engineers 2025; 22(2): 160-171
URL: http://jiaeee.com/article-1-1773-en.html

Sakineh Seyyedi, Mostafa Ersali Salehi Nasab*

School of Electrical and Computer Engineering, University of Tehran

Abstract:

Artificial neural networks are a class of machine learning models inspired by the biological neural networks of the human brain and are capable of learning. They are applied in many fields, including natural language processing, pattern recognition, image processing, and computer vision. Convolutional neural networks (CNNs) are layered networks of this kind whose main operation is convolution. Because of their high computational load and heavy data traffic, CNNs place large demands on memory bandwidth and data transfers. Recent research has shown that the energy consumption and access time of external memory are 200x and 10x greater than those of internal memory, respectively, which increases energy consumption and creates an imbalance in the data path topology.
One of the main ways to reduce energy consumption is to increase data reuse and thereby reduce the number of accesses to external memory. Maximizing data reuse reduces both data movement and memory accesses. One method for data reuse is loop-level scheduling with tiling. This paper models how the number of external memory accesses depends on tiling. The model is a mathematical formula that gives the exact number of DRAM accesses as a function of the network parameters and the tile size. An optimization problem is then solved to find the parameters that minimize external memory use, which establishes the relationship between the network configuration parameters and the tile size.
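
To make the idea of tile-dependent DRAM traffic concrete, the following is a minimal Python sketch. It is not the formula derived in the paper; it assumes a generic output-stationary tiling of a single convolution layer, in which input and weight tiles are re-fetched for every output tile and each output is written back once. All names (`dram_accesses`, `best_tiling`, the parameters `M`, `N`, `R`, `C`, `K`, `S` and the tile sizes `Tm`, `Tn`, `Tr`, `Tc`) are illustrative assumptions, as is the single shared on-chip buffer measured in words.

```python
from itertools import product


def tile_sizes(total, tile):
    """Sizes of consecutive tiles covering `total`; the last tile is clipped."""
    return [min(tile, total - start) for start in range(0, total, tile)]


def dram_accesses(M, N, R, C, K, S, Tm, Tn, Tr, Tc):
    """Count words moved between DRAM and chip for one conv layer.

    Assumed (hypothetical) setup, not taken from the paper:
      M, N   : output / input channels
      R, C   : output rows / columns (input is already padded)
      K, S   : kernel size and stride
      T*     : tile sizes along the matching dimensions
    Partial sums stay on chip, so each output word is written once.
    """
    inputs = weights = outputs = 0
    for tm, tr, tc in product(tile_sizes(M, Tm), tile_sizes(R, Tr), tile_sizes(C, Tc)):
        for tn in tile_sizes(N, Tn):
            in_rows = (tr - 1) * S + K  # input rows needed, including halo
            in_cols = (tc - 1) * S + K
            inputs += tn * in_rows * in_cols
            weights += tm * tn * K * K
        outputs += tm * tr * tc
    return {"inputs": inputs, "weights": weights, "outputs": outputs,
            "total": inputs + weights + outputs}


def footprint(K, S, Tm, Tn, Tr, Tc):
    """On-chip words needed to hold one input, weight and output tile."""
    in_tile = Tn * ((Tr - 1) * S + K) * ((Tc - 1) * S + K)
    return in_tile + Tm * Tn * K * K + Tm * Tr * Tc


def best_tiling(M, N, R, C, K, S, buffer_words):
    """Exhaustively search tile sizes that fit the buffer; minimize DRAM words.

    Returns (total_accesses, (Tm, Tn, Tr, Tc)) or None if nothing fits.
    Brute force is only practical for small layers; it stands in here for
    the optimization problem mentioned in the abstract.
    """
    best = None
    for t in product(range(1, M + 1), range(1, N + 1),
                     range(1, R + 1), range(1, C + 1)):
        if footprint(K, S, *t) > buffer_words:
            continue
        total = dram_accesses(M, N, R, C, K, S, *t)["total"]
        if best is None or total < best[0]:
            best = (total, t)
    return best


if __name__ == "__main__":
    layer = dict(M=4, N=2, R=4, C=4, K=3, S=1)
    print(dram_accesses(**layer, Tm=4, Tn=2, Tr=4, Tc=4))  # whole layer on chip
    print(dram_accesses(**layer, Tm=4, Tn=2, Tr=2, Tc=2))  # 2x2 spatial tiles
    print(best_tiling(**layer, buffer_words=100))
```

In this toy layer, splitting the output plane into 2x2 tiles raises the total from 208 to 480 words, because the weights are fetched once per spatial tile and the overlapping input halo is fetched repeatedly. This is the kind of dependence on tile size that an access model has to capture.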

Keywords: CNNs, Energy Consumption, DRAM, Data Reuse, Tiling.

Type of Article: Research | Subject: Electronic
Received: 2024/11/25 | Accepted: 2025/02/06 | Published: 2025/08/15

Rights and permissions
	This journal is an open-access journal licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).