IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 4, APRIL 2012
Variation Trained Drowsy Cache (VTD-Cache): A History Trained Variation Aware Drowsy Cache for Fine Grain Voltage Scaling
Avesta Sasan, Member, IEEE, Kiarash Amiri, Student Member, IEEE, Houman Homayoun, Member, IEEE, Ahmed M. Eltawil, Member, IEEE, and Fadi J. Kurdahi, Fellow, IEEE
Abstract—In this paper we present the “Variation Trained Drowsy Cache” (VTD-Cache) architecture. VTD-Cache allows for a significant reduction in power consumption while addressing reliability issues raised by memory cell process variability. By managing voltage scaling at a very fine granularity, each cache way can be sourced at a different voltage, where the selection of voltage levels depends on both the vulnerability of the memory cells in that cache way to process variation and the likelihood of access to that cache location. After a short training period, the proposed architecture micro-tunes the cache, allowing significant power reduction with a negligible increase in the number of misses. In addition, the proposed architecture actively monitors the access pattern and reconfigures the supply voltage setting to adapt to the execution pattern of the program. The novel and modular architecture of the VTD-Cache and its associated controller make it easy to implement in memory compilers with a small area and power overhead. In a case study, a SimpleScalar simulation of the proposed 32 kB cache architecture reports a reduction in power consumption of over 57% on standard SPEC2000 integer benchmarks, while incurring an area overhead of less than 4% and an execution time penalty smaller than 1%.
Index Terms—Cache, drowsy cache, fault tolerance, leakage, low power, manufacturing defects, power efficient, process variation, static random access memory (SRAM), technology scaling, voltage scaling.
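As an illustration of the idea stated in the abstract, that the supply voltage of each cache way depends on both its vulnerability to process variation and its likelihood of access, the following is a minimal sketch and not the controller described in this paper. The discrete voltage levels, the `min_safe_v` field, the access counter, and the `hot_threshold` parameter are all hypothetical names and values introduced here for illustration only.

```python
from dataclasses import dataclass

# Hypothetical discrete supply levels in volts (not taken from the paper).
VOLTAGE_LEVELS = (0.6, 0.7, 0.8, 0.9, 1.0)
NOMINAL_V = VOLTAGE_LEVELS[-1]


@dataclass
class WayState:
    # Assumed: lowest voltage at which every cell in this way still holds data.
    min_safe_v: float
    # Assumed: number of accesses observed during a training window.
    accesses: int


def choose_voltage(way: WayState, hot_threshold: int) -> float:
    """Pick a supply level for one cache way (illustrative policy only)."""
    # Frequently accessed ways stay at nominal voltage.
    if way.accesses >= hot_threshold:
        return NOMINAL_V
    # Rarely accessed ways drop to the lowest level that is still safe
    # for the weakest cell in the way.
    for level in VOLTAGE_LEVELS:
        if level >= way.min_safe_v:
            return level
    # No listed level is safe below nominal: fall back to nominal.
    return NOMINAL_V


def configure(ways, hot_threshold=100):
    """Return one supply voltage per way, in the same order as `ways`."""
    return [choose_voltage(w, hot_threshold) for w in ways]


# Four hypothetical ways: one hot, three cold with different vulnerabilities.
ways = [
    WayState(min_safe_v=0.65, accesses=500),
    WayState(min_safe_v=0.65, accesses=3),
    WayState(min_safe_v=0.85, accesses=10),
    WayState(min_safe_v=0.55, accesses=0),
]
voltages = configure(ways)
print(voltages)
```

In this sketch the hot way keeps the nominal supply, while each cold way is lowered only as far as its own weakest cell permits, so a more variation-sensitive way settles at a higher voltage than a robust one. Rerunning `configure` with fresh access counts loosely mirrors the reconfiguration the abstract mentions.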
I. INTRODUCTION
RECENT studies [1]–[3] suggest that static power consumption is
References:
[1] S. Borkar, “Design challenges of technology scaling,” IEEE Micro, vol. 19, no. 4, pp. 23–29, Jul.–Aug. 1999.
[2] Y.-F. Tsai, D. Duarte, N. Vijaykrishnan, and M. J. Irwin, “Implications of technology scaling on leakage reduction techniques,” in Proc. Des. Autom. Conf., Jun. 2003, pp. 187–190.
[3] M. Anis, “Subthreshold leakage current: Challenges and solutions,” in Proc. 15th Int. Conf. Microelectron. (ICM), Dec. 2003, pp. 77–80.
[4] A. Agarwal, B. C. Paul, S. Mukhopadhyay, and K. Roy, “Process variation in embedded memories: Failure analysis and variation aware architecture,” IEEE Trans. Solid-State Circuits, vol. 40, no. 9, pp. 1804–1814, Sep. 2005.
[5] S. R. Nassif, “Modeling and analysis of manufacturing variation,” in Proc. CICC, 2001, pp. 223–228.
[6] Bhavnagarwala, X. Tang, and J. D. Meindl, “The impact of intrinsic device fluctuation on CMOS SRAM cell stability,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658–665, Apr. 2001.
[7] A. Sasan, H. Homayoun, A. Eltawil, and F. J. Kurdahi, “Process variation aware SRAM/cache for aggressive voltage-frequency scaling,” in Proc. DATE, pp. 911–916.
[8] A. Sasan (Mohammad A Makhzan), A. Khajeh, A. Eltawil, and F. Kurdahi, “Limits of voltage scaling for caches utilizing fault tolerant techniques,” in Proc. ICCD, pp. 488–495.
[9] C. Wilkerson, H. Gao, A. R. Alameldeen, Z. Chishti, M. Khellah, and S. L. Lu, “Trading off cache capacity for reliability to enable low voltage operation,” in Proc. ISCA, 2008, pp. 203–214.
[10] H. Mahmoodi, H. Mahmoodi, and K. Roy, “Modeling of failure probability and statistical design of SRAM array for yield enhancement in nano-scaled CMOS,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 24, no. 12, pp. 1859–1880, Dec. 2003.
[11] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, “Statistical design and optimization of SRAM cell for yield enhancement,” in Proc. Int. Conf. Comput.-Aided Des. (ICACD), 2004, pp. 10–13.
[12] J. P. Kulkarni, K. Kim, and K. Roy, “A 160 mV robust Schmitt trigger based subthreshold SRAM,” IEEE J. Solid-State Circuits, vol. 42, no. 10, pp. 2303–2313, Oct. 2007.
[13] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge, “Drowsy caches: Simple techniques for reducing leakage power,” in Proc. 29th Annu. Int. Symp. Comput. Arch., 2002, pp. 148–157.
[14] SimpleScalar LLC, Ann Arbor, MI, “SimpleScalar™ simulator,” 2003. [Online]. Available: http://www.simplescalar.com/
[15] J. Zushi, G. Zeng, H. Tomiyama, H. Takada, and K. Inoue, “Improved policies for drowsy caches in embedded processors,” in Proc. 4th IEEE Int. Symp. Electron. Des., Test, Appl. (DELTA), Jan. 2008, pp. 362–367.
[16] S. Kaxiras, Z. Hu, and M. Martonosi, “Cache decay: Exploiting generational behavior to reduce cache leakage power,” in Proc. Int. Symp. Comput. Arch., 2001, pp. 240–251.
[17] H. Zhou, M. C. Toburen, E. Rotenberg, and T. M. Conte, “Adaptive mode-control: A static-power-efficient cache design,” in Proc. Int. Conf. Parallel Arch. Compilation Techn., 2001, pp. 61–70.
[18] A. Sasan (Mohammad A Makhzan), A. Khajeh, A. Eltawil, and F. Kurdahi, “Limits of voltage scaling for caches utilizing fault tolerant techniques,” in Proc. ICCD, 2007, pp. 488–495.
[19] K. Ishimaru, “45 nm/32 nm CMOS ‘challenge and perspective’,” in Proc. 37th Eur. Solid-State Device Res. Conf. (ESSDERC), Sep. 2007, pp. 32–35.
[20] A. Agarwal, K. Roy, and T. N. Vijaykumar, “Exploring high bandwidth pipelined cache architecture for scaled technology,” in Proc. Des., Autom. Test Eur. Conf. Exhibition, 2003, pp. 778–783.
[21] M. A. Makhzan (A. Sasan), A. Khajeh, A. Eltawil, and F. J. Kurdahi, “A low power JPEG2000 encoder with iterative and fault tolerant error concealment,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 6, pp. 827–837, Jun. 2009.
[22] K. Flautner, K. Nam Sung, S. Martin, D. Blaauw, and T. Mudge, “Drowsy caches: Simple techniques for reducing leakage power,” in Proc. 29th Annu. Int. Symp. Comput. Arch., 2002, pp. 148–157.
[23] H. Homayoun, A. Sasan (M. A. Makhzan), and A. V. Veidenbaum, “Multiple sleep mode leakage control for cache peripheral circuits in embedded processors,” in Proc. CASES, pp. 197–206.
[24] M. A. Lucente, C. H. Harris, and R. M. Muir, “Memory system reliability improvement through associative cache redundancy,” in Proc. IEEE Custom Integr. Circuits Conf., May 1990, pp. 19.6/1–19.6/4.
[25] Z. Bo, D. Blaauw, D. Sylvester, and K. Flautner, “The limit of dynamic voltage scaling and insomniac dynamic voltage scaling,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 11, pp. 1239–1252, Nov. 2005.
[26] X. Tang, V. K. De, and J. D. Meindl, “Intrinsic MOSFET parameter fluctuations due to random dopant placement,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 4, pp. 369–376, Dec. 1997.
[27] K. Ishimaru, “45 nm/32 nm CMOS ‘challenge and perspective’,” in Proc. 37th Eur. Solid-State Device Res. Conf. (ESSDERC), Sep. 2007, pp. 32–35.
[28] R. K. Krishnamurthy, A. Alvandpour, V. De, and S. Borkar, “High-performance and low-power challenges for sub-70 nm microprocessor circuits,” in Proc. IEEE Custom Integr. Circuits Conf., 2002, pp. 125–128.
[29] K. Kuhn, “32nm SOC process, getting many things right at once,” Mar. 26, 2010. [Online]. Available: http://blogs.intel.com/technology/2010/03/32nm_soc_process_an_analogy_wi.php
Avesta Sasan (Mohammad A. Makhzan) (M’05) received the B.S. degree (summa cum laude) in computer engineering and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Irvine, in 2005, 2006, and 2010, respectively. He is currently with Broadcom Corporation. His research interests include low power design, process variation aware architectures, fault tolerant computing systems, nano-electronic power and device modeling, VLSI signal processing, processor power and reliability optimization, and logic-architecture-device co-design.
His latest publications and research updates can be found at http://www.avestasasan.com.
Kiarash Amiri (S’10) received the B.Sc. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2003, and the M.Sc. degree in electrical engineering from University of Southern California, Los Angeles, in 2006. He is currently pursuing the Ph.D. degree in electrical engineering in the Department of Electrical Engineering and Computer Science, University of California, Irvine. His research interests include multimedia data compression, low power video and image coding, process variation aware system design, and fault tolerant computing.
Houman Homayoun (M’04) received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2003, the M.S. degree in computer engineering from University of Victoria, Canada, in 2005, and the Ph.D. degree in computer science from the University of California, Irvine, in 2010. Dr. Homayoun was the recipient of the four-year UCI/ICS chair fellowship. His research interests include power-temperature and reliability-aware memory and processor design optimizations and span the areas of computer architecture and VLSI circuit design. The results of his research were published in top-rated conferences including ISLPED, DAC, DATE, HiPEAC, CASES, ICCD, CF, and LCTES.
Ahmed M. Eltawil (M’97) received the Doctorate degree from the University of California, Los Angeles, in 2003 and the M.Sc. and B.Sc. degrees (with honors) from Cairo University, Giza, Egypt, in 1999 and 1997, respectively. He is an Associate Professor with the Department of Electrical Engineering and Computer Science, University of California, Irvine, where he has been since 2005.
He is the founder and director of the Wireless Systems and Circuits Laboratory (http://newport.eecs.uci.edu/~aeltawil/), a member laboratory of the Center for Pervasive Communications and Computing (CPCC). He also holds a visiting professorship with King Saud University, Saudi Arabia. His current research interests include low power digital circuit and signal processing architectures for wireless communication systems, with a focus on physical layer design; he has published over 60 technical papers on the subject, including four book chapters. Dr. Eltawil has been on the technical program committees for numerous workshops, symposia, and conferences in the area of VLSI and communication system design. He has received several distinguished awards, including the NSF CAREER Award in 2010 supporting his research in low power systems, as well as the Best Paper Award in 2006 at ISQED. Since 2006, he has been a member of the Association of Public Safety Communications Officials (APCO) and has been actively involved in efforts towards integrating cognitive and software defined radio technology in critical first responder communication networks.
Fadi J. Kurdahi (M’87–SM’03–F’05) received the Ph.D. degree from the University of Southern California, Los Angeles, in 1987. Since then, he has been a faculty member with the Department of Electrical and Computer Engineering, University of California, Irvine (UCI), where he conducts research in the areas of computer-aided design of VLSI circuits, high-level synthesis, and design methodology of large scale systems, and serves as the Associate Director for the Center for Embedded Computer Systems (CECS). Dr. Kurdahi was an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II and an Area Editor in IEEE Design and Test for reconfigurable computing, and served as program chair, general chair, or on program committees of several workshops, symposia, and conferences in the area of CAD, VLSI, and system design.
He was a recipient of the Best Paper Award for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in 2002, the Best Paper Award in 2006 at ISQED, and four other Distinguished Paper Awards at DAC, EuroDAC, ASP-DAC, and ISQED. He also received the Distinguished Alumnus Award from his alma mater, the American University of Beirut, in 2008. He is a Fellow of the AAAS.