MARS Lab

Recent Research Highlights:
RF Interconnect for On-Chip Communication
Chip multiprocessors (CMPs) provide the opportunity to integrate cooperating cores onto a single piece of silicon to exploit thread-level parallelism. As the number of cores on a die increases, even higher throughput becomes possible, but the communication architecture between cores must manage both the growing bandwidth demand and the growing latency of communication between distant points on the die. Poor wire scaling complicates matters further, motivating the exploration of alternative interconnects. One promising alternative is radio frequency (RF) interconnect [15], which is fully compatible with super-scaled CMOS technology but had not previously been considered for on-chip communication in CMP designs.
We explore the use of multi-band RF interconnect (RF-I), with signals propagating at the speed of light, to provide shortcuts in a many-core network-on-chip (NoC) mesh topology [16]. Using real-world designs and ITRS projections, we investigate the costs associated with this technology and examine the latency and bandwidth benefits it can provide. Assuming a 400 mm² die, we demonstrate that in exchange for a 0.1% area overhead on the active layer, RF-I can provide an average 13% (max 18%) boost in application performance, corresponding to an average 22% (max 24%) reduction in packet latency. We observe that RF access points may become traffic bottlenecks when many packets try to use the RF-I at once, and present a dynamic strategy that adapts RF-I utilization at runtime to actively combat this congestion.
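Why shortcuts help can be seen with a simple hop-count model: treat the NoC as a k x k mesh, model each RF-I shortcut as one extra single-hop edge between two access points, and measure the average shortest-path distance with breadth-first search. This is a sketch under simplifying assumptions (uniform all-to-all traffic, hop count as a proxy for latency, no router pipeline or contention, hypothetical shortcut placement), not the evaluation methodology of [16].

```python
from collections import deque


def mesh(k):
    """Adjacency sets for a k x k 2D mesh; nodes are (x, y) tuples."""
    adj = {(x, y): set() for x in range(k) for y in range(k)}
    for x in range(k):
        for y in range(k):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < k and ny < k:
                    adj[(x, y)].add((nx, ny))
                    adj[(nx, ny)].add((x, y))
    return adj


def add_shortcut(adj, a, b):
    """Model an RF-I shortcut as a single-hop link between two access points."""
    adj[a].add(b)
    adj[b].add(a)


def average_hops(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


if __name__ == "__main__":
    g = mesh(10)
    print(average_hops(g))  # 20/3 (about 6.667) for a plain 10x10 mesh
    add_shortcut(g, (0, 0), (9, 9))  # hypothetical shortcut placement
    add_shortcut(g, (0, 9), (9, 0))
    print(average_hops(g))  # lower than the baseline
```

For a plain k x k mesh the average over distinct pairs is exactly 2k/3, so any drop below that value comes from the shortcuts alone. A real evaluation would also need to model contention at the access points, which is exactly where the congestion described above appears.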
We have further explored dynamically configured RF shortcuts that can adapt to changes in interconnect demand. With these, we demonstrate substantial power savings in the NoC by reducing conventional interconnect bandwidth and using RF-I to provide application-specific bandwidth where it is needed [12]. Finally, we have recently explored a CMP communication architecture that leverages wireless communication rather than propagating RF over a transmission line [9].
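Choosing application-specific shortcuts can be framed as a selection problem: given a traffic matrix and a fixed number of RF channels, decide which access-point pairs to connect. The sketch below uses a plain greedy heuristic that repeatedly adds the shortcut that most reduces traffic-weighted hop count. The heuristic, the access-point locations, and the traffic values are all hypothetical illustrations; this is not the reconfiguration method of [12], and power is not modeled.

```python
from collections import deque
from itertools import combinations


def mesh(k):
    """Adjacency sets for a k x k 2D mesh; nodes are (x, y) tuples."""
    adj = {(x, y): set() for x in range(k) for y in range(k)}
    for x in range(k):
        for y in range(k):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < k and ny < k:
                    adj[(x, y)].add((nx, ny))
                    adj[(nx, ny)].add((x, y))
    return adj


def hop_count(adj, src, dst):
    """BFS shortest-path hop count from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("destination unreachable")


def weighted_hops(adj, traffic):
    """Sum of volume * hop count over all (src, dst) flows."""
    return sum(vol * hop_count(adj, s, d) for (s, d), vol in traffic.items())


def pick_shortcuts(k, access_points, traffic, budget):
    """Greedily choose up to `budget` shortcuts between access points."""
    adj = mesh(k)
    chosen = []
    for _ in range(budget):
        best, best_cost = None, weighted_hops(adj, traffic)
        for a, b in combinations(access_points, 2):
            if b in adj[a]:  # already linked (mesh neighbor or chosen shortcut)
                continue
            adj[a].add(b)  # tentatively add the shortcut
            adj[b].add(a)
            cost = weighted_hops(adj, traffic)
            adj[a].remove(b)  # undo the tentative link
            adj[b].remove(a)
            if cost < best_cost:
                best, best_cost = (a, b), cost
        if best is None:  # no remaining shortcut reduces the cost
            break
        a, b = best
        adj[a].add(b)
        adj[b].add(a)
        chosen.append(best)
    return chosen
```

For example, on an 8x8 mesh with access points at the four corners and a heavy flow from (0, 0) to (7, 7), the first shortcut chosen connects those two corners. Rerunning the selection when the traffic matrix changes is what makes the configuration application-specific.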
Interactive Virtual Worlds
The immersive nature of interactive entertainment (IE) relies on rapid, dynamic content creation and realistic visual effects. We have been working to aggressively push the boundaries of what IE can provide at the application level, and then to find ways to satisfy the resulting increase in demand with customized hardware.
We have had initial success with physics-based animation (PBA), one component of IE applications. We have proposed a benchmarking suite for real-time physics to characterize this emerging workload [25] and have since released several revisions of it. However, the benefits of PBA come at a considerable computational cost, and interactive entertainment imposes soft performance bounds of 30-60 frames per second.
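The 30-60 frames-per-second target translates directly into a per-frame time budget. Physics simulation must fit into some fraction of that budget alongside rendering and game logic; the fraction is not specified here.

```latex
t_{\text{frame}} = \frac{1}{f_{\text{target}}}, \qquad
t_{\text{frame}}\big|_{30\,\text{fps}} = \frac{1000\,\text{ms}}{30} \approx 33.3\,\text{ms}, \qquad
t_{\text{frame}}\big|_{60\,\text{fps}} = \frac{1000\,\text{ms}}{60} \approx 16.7\,\text{ms}
```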
Fortunately, the physical simulation of complex scenes is massively parallel. To exploit this parallelism, we have designed ParallAX [20], a chip multiprocessor design for physics acceleration. ParallAX combines a larger number of simple, fine-grain compute resources with a smaller number of more powerful, coarse-grain compute resources to handle the diverse physics workload, along with a flexible scheme for allocating work between coarse- and fine-grain cores. ParallAX also uses L2 cache partitioning to protect high-locality phases from poor-locality phases [29].
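The benefit of partitioning can be sketched with a toy cache model: a small working set that is reused every round shares the cache with streaming data that is never reused. This is a generic capacity-partitioning illustration, not the mechanism in [29]; the fully associative LRU organization, the 8-block capacity, the 4/4 split, and the access trace are all hypothetical.

```python
from collections import OrderedDict


class LRUStore:
    """Fully associative LRU store holding at most `capacity` blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, insert addr, evicting the LRU block if full."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[addr] = None
        return False


def run(trace, partitioned, capacity=8, reuse_share=4):
    """Replay (phase, addr) pairs against one shared LRU or two private partitions."""
    if partitioned:
        stores = {"reuse": LRUStore(reuse_share),
                  "stream": LRUStore(capacity - reuse_share)}
    else:
        shared = LRUStore(capacity)
        stores = {"reuse": shared, "stream": shared}
    hits = {"reuse": 0, "stream": 0}
    for phase, addr in trace:
        if stores[phase].access(addr):
            hits[phase] += 1
    return hits


def make_trace(rounds):
    """Hypothetical trace: a 4-block working set reused every round, with two
    never-reused streaming blocks injected after each working-set access."""
    trace, next_stream = [], 1000
    for _ in range(rounds):
        for addr in range(4):
            trace.append(("reuse", addr))
            for _ in range(2):
                trace.append(("stream", next_stream))
                next_stream += 1
    return trace
```

With `make_trace(100)`, eight streaming blocks pass between consecutive reuses of each working-set block, so the shared 8-block cache evicts the working set every time and records zero working-set hits. The partitioned version hits on 396 of the 400 working-set accesses, missing only on the first round.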
We contend that physics simulation for interactive entertainment must be believable, but need not be 100% accurate. Our work suggests that an objective metric for believability can be approximated using the conservation of energy in a physical system [50]. We have further explored hierarchical floating-point hardware that leverages this observation through dynamic precision reduction, reducing the area required at each fine-grain core by sharing FPU resources [17].
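The energy-conservation idea can be made concrete with a small experiment: simulate a system whose energy should be conserved, reduce the precision of every stored value, and measure how far the total energy drifts. This sketch assumes a 1D mass-spring oscillator, symplectic Euler integration, and software truncation of float32 fraction bits as a stand-in for reduced-precision hardware; the hierarchical FPU of [17] is not modeled, and the 1% figure used below is an arbitrary illustrative threshold.

```python
import struct


def reduce_precision(x, mantissa_bits):
    """Convert x to float32, then keep only the top `mantissa_bits` of its 23
    fraction bits (truncation toward zero). Software emulation only."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # clear the dropped fraction bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def simulate(mantissa_bits, steps=1000, dt=0.01, k=1.0, m=1.0):
    """Mass-spring oscillator with symplectic Euler; every stored value is
    reduced to `mantissa_bits`. Returns the worst relative energy error."""
    def q(value):
        return reduce_precision(value, mantissa_bits)

    x, v = 1.0, 0.0
    e0 = 0.5 * k * x * x + 0.5 * m * v * v
    worst = 0.0
    for _ in range(steps):
        v = q(v - q(dt * k * x / m))  # velocity update from the spring force
        x = q(x + q(dt * v))          # position update using the new velocity
        energy = 0.5 * k * x * x + 0.5 * m * v * v
        worst = max(worst, abs(energy - e0) / e0)
    return worst
```

At the full 23 fraction bits the worst drift stays below 1%, since symplectic Euler keeps this oscillator's energy error near dt/2. With only 6 fraction bits, the drift exceeds 1% after the very first step, because truncation toward zero biases stored magnitudes downward. A believability metric of this kind would let hardware pick the lowest precision whose drift stays under an acceptable bound.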
We have also begun work on autonomous agent navigation in virtual worlds, a challenging problem because future virtual environments will feature thousands of interactive agents (or more) navigating complex, dynamically varying scenes. Our initial efforts have resulted in an aggressive benchmarking suite for agent steering [51], a novel rule-based algorithm for steering [14], and an online system for detecting user-specified behaviors in agent navigation [10]. This last component will be vital for providing feedback to our eventual acceleration hardware when trading accuracy for believability. While these directions lie outside my usual area of research, I have been a major contributor to this research effort.
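To illustrate what "rule-based" steering means in general, here is a minimal sketch with two prioritized rules: sidestep to the right if a neighbor is close and ahead, otherwise head straight for the goal. The rules, the `avoid_radius` and `speed` parameters, and the 2D point representation are hypothetical; this is not the algorithm from [14].

```python
import math


def steer(pos, goal, neighbors, speed=1.0, avoid_radius=2.0):
    """Return one agent's velocity for the next step from two prioritized rules."""
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy)
    if dist < 1e-9:
        return (0.0, 0.0)  # already at the goal
    dx, dy = gx / dist, gy / dist  # unit heading toward the goal

    # Rule 1 (higher priority): a neighbor ahead and within avoid_radius
    # triggers a sidestep, i.e. the heading is rotated 90 degrees clockwise.
    for nx, ny in neighbors:
        rx, ry = nx - pos[0], ny - pos[1]
        ahead = rx * dx + ry * dy  # projection onto the current heading
        if ahead > 0 and math.hypot(rx, ry) < avoid_radius:
            dx, dy = dy, -dx
            break

    # Rule 2 (default): move along the heading without overshooting the goal.
    step = min(speed, dist)
    return (dx * step, dy * step)
```

An agent at the origin heading to (10, 0) moves one unit along +x when the way is clear, but sidesteps to (0.0, -1.0) when a neighbor stands at (1, 0). Rule sets like this are cheap to evaluate per agent, which matters when thousands of agents must be updated every frame.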
Dynamically Leveraging Statically Partitioned Resources
Single-threaded applications cannot easily exploit the statically partitioned resources of a chip multiprocessor. Such sequential applications may be difficult to decompose statically into threads, or may simply be legacy code for which recompilation is not possible or cost-effective. We present a novel approach [53] that dynamically accelerates sequential application(s) using multiple cores: execution is allowed to spill from one core to another when the resources of the first core have been exhausted. We propose two techniques to enable low-overhead migration between cores: prespilling and locality-based filtering. We also develop and analyze an arbitration mechanism that intelligently allocates cores among a set of sequential applications on a CMP. On average, core spilling on an eight-core CMP accelerates single-threaded performance by 35%. We further evaluate an eight-core CMP running multiple-application workloads drawn from the entire SPEC 2000 benchmark suite, in various combinations and with various arrival times. Using core spilling to accelerate the currently running applications whenever idle cores are available, we achieve up to a 40% improvement in performance.
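The arbitration problem above, deciding which sequential application should receive each idle core, can be illustrated with a greedy marginal-benefit allocator. This is a generic sketch, not the arbitration mechanism of [53]: it assumes each application's speedup as a function of core count is known in advance (the `speedup_curves` input is hypothetical) and that cores are handed out whole. When the curves have diminishing returns, this greedy rule maximizes the summed speedup.

```python
import heapq


def allocate_cores(speedup_curves, idle_cores):
    """Give each idle core to the application whose estimated speedup would
    grow the most from one more core.

    speedup_curves[app][n] is the assumed speedup with n + 1 cores, so index 0
    is the application running alone on its home core.
    """
    alloc = {app: 1 for app in speedup_curves}  # every app keeps its home core
    heap = []  # entries are (-marginal_gain, app); ties break by app name
    for app, curve in speedup_curves.items():
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[1] - curve[0]), app))
    for _ in range(idle_cores):
        if not heap:
            break
        neg_gain, app = heapq.heappop(heap)
        if -neg_gain <= 0:  # no remaining application benefits from another core
            break
        alloc[app] += 1
        n = alloc[app]
        curve = speedup_curves[app]
        if n < len(curve):  # queue the gain from this app's next extra core
            heapq.heappush(heap, (-(curve[n] - curve[n - 1]), app))
    return alloc
```

For instance, with curves `{"a": [1.0, 1.5, 1.75], "b": [1.0, 1.25, 1.375]}` and one idle core, application `a` receives it because its marginal gain (0.5) beats that of `b` (0.25). A runtime arbiter would have to estimate these curves online rather than assume them.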
Publications:
Refereed Conference and Workshop Publications:
[1] Beayna Grigorian, Marco Vitanza, Jason Cong, and Glenn Reinman. Accelerating Vision and Navigation Applications on a Customizable Platform. International Conference on Application-specific Systems, Architectures and Processors (ASAP), Sep 2011.
[2] Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, and Yi Zou. An Energy-Efficient Adaptive Hybrid Cache. International Symposium on Low Power Electronics and Design (ISLPED), Aug 2011.
[3] Gyungsu Byun, Yangkyo Kim, Jongsun Kim, Sai-Wang Tam, Jason Cong, Glenn Reinman, and M. F. Chang. An 8.4Gb/s 2.5pJ/b Mobile Memory I/O Interface Using Bi-directional and Simultaneous Dual (Base+RF)-Band Signaling. International Solid-State Circuits Conference (ISSCC), Feb 2011.
[4] Jason Cong, Mohammadali Ghodrat, Michael Gill, Chunyue Liu, Glenn Reinman and Yi Zou. AXR-CMP: Architecture Support in Accelerator-Rich CMPs. Workshop on SoC Architecture, Accelerators and Workloads (SAW-2), Feb 2011.
[5] Shawn Singh, Mubbasir Kapadia, Glenn Reinman and Petros Faloutsos. A Modular Framework for Adaptive Agent-Based Steering. Symposium on Interactive 3D Graphics and Games (I3D), Feb 2011.
[6] Jason Cong, Chunyue Liu, and Glenn Reinman. ACES: Application-specific cycle elimination and splitting for deadlock-free routing on irregular network-on-chip. Design Automation Conference (DAC), Jun 2010.
[7] Shawn Singh, Mubbasir Kapadia, Petros Faloutsos, and Glenn Reinman. On the Interface Between Steering and Animation for Autonomous Characters. Workshop on Crowd Simulation held in conjunction with the 23rd Annual Conference on Computer Animation and Social Agents, May 2010.
[8] Shawn Singh, Mubbasir Kapadia, Glenn Reinman and Petros Faloutsos. An Open Framework for Developing, Evaluating, and Sharing Steering Algorithms. Motion In Games (MIG), Nov 2009.
[9] Suk-Bok Lee, Sai-Wang Tam, Ioannis Pefkianakis, Songwu Lu, M. Frank Chang, Chuanxiong Guo, Glenn Reinman, Chunyi Peng, Mishali Naik, Lixia Zhang, and Jason Cong. A Scalable Micro Wireless Interconnect Structure for CMPs. International Conference on Mobile Computing and Networking, Sept 2009.
[10] Mubbasir Kapadia, Shawn Singh, Brian Allen, Glenn Reinman, and Petros Faloutsos. An Interactive Framework for Specifying and Detecting Steering Behaviors. Symposium on Computer Animation (SCA), Aug 2009.
[11] Jason Cong, M. Frank Chang, Glenn Reinman, and Sai-Wang Tam. Multiband RF-Interconnect for Reconfigurable Network-on-Chip Communications. System Level Interconnect Prediction (SLIP), July 2009.
[12] M. Frank Chang, Jason Cong, Adam Kaplan, Mishali Naik, Jagannath Premkumar, Glenn Reinman, Eran Socher, and Sai-Wang Tam. Power Reduction of CMP Communication Networks via RF-Interconnects. International Symposium on Microarchitecture (MICRO), Nov 2008.
[13] Jason Cong, Karthik Gururaj, Guoling Han, Adam Kaplan, Mishali Naik, and Glenn Reinman. MC-Sim: An Efficient Simulation Tool for MPSoC Designs. International Conference on Computer-Aided Design (ICCAD), Nov 2008.
[14] Shawn Singh, Mubbasir Kapadia, Mishali Naik, Petros Faloutsos, and Glenn Reinman. Watch Out! A Framework for Evaluating Steering Behaviors. Proceedings of Motion In Games (MIG), June 2008.
[15] M. Frank Chang, Eran Socher, Sai-Wang Tam, Jason Cong, and Glenn Reinman. RF Interconnects for Communications On-Chip. International Symposium on Physical Design (ISPD), Apr 2008.
[16] M. Frank Chang, Jason Cong, Adam Kaplan, Mishali Naik, Glenn Reinman, Eran Socher, and Sai-Wang Tam. CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect. International Symposium on High-Performance Computer Architecture (HPCA), Feb 2008. BEST PAPER AWARD
[17] Tom Yeh, Petros Faloutsos, Sanjay Patel, Milos Ercegovac, and Glenn Reinman. The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration. International Symposium on Microarchitecture (MICRO), Dec 2007.
[18] Yongxiang Liu, Yuchun Ma, Eren Kursun, Jason Cong, and Glenn Reinman. Fine Grain 3D Integration for Microarchitecture Design Through Cube Packing Exploration. IEEE International Conference on Computer Design (ICCD), Oct 2007.
[19] Yongxiang Liu, Yuchun Ma, Eren Kursun, Jason Cong, and Glenn Reinman. 3D Architecture Modeling and Exploration. VLSI/ULSI Multilevel Interconnection Conference, Sept 2007.
[20] Tom Yeh, Petros Faloutsos, Sanjay Patel, and Glenn Reinman. ParallAX: An Architecture for Real-Time Physics. In 34th Annual International Symposium on Computer Architecture (ISCA), June 2007.
[21] Yuchun Ma, Zhuoyuan Li, Jason Cong, Xianlong Hong, Glenn Reinman, Sheqin Dong, and Qian Zhou. Micro-architecture Pipelining Optimization with Throughput-Aware Floorplanning. 12th Asia and South Pacific Design Automation Conference (ASPDAC), Jan 2007.
[22] Vasily G. Moshnyaga, Hua Vo, Glenn Reinman, and Miodrag Potkonjak. Reducing Energy of DRAM/Flash Memory System by OS-Controlled Data Refresh. In International Symposium on Circuits and Systems (ISCAS), May 2007.
[23] Anahita Shayesteh, Glenn Reinman, Norm Jouppi, Suleyman Sair, and Tim Sherwood. Improving the Performance and Power Efficiency of Shared Helpers in CMPs. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct 2006.
[24] Vasily Moshnyaga, Hoa Vo, Glenn Reinman, and Miodrag Potkonjak. Handheld System Energy Reduction by OS-Driven Refresh. Power and Timing Modeling, Optimization, and Simulation (PATMOS), September 2006.
[25] Tom Yeh, Petros Faloutsos, and Glenn Reinman. Enabling Real-Time Physics Simulation in Future Interactive Entertainment. ACM SIGGRAPH Video Game Symposium, Aug 2006.
[26] Jason Cong, Ashok Jagannathan, Yuchun Ma, Glenn Reinman, Jie Wei, and Yan Zhang. An Automated Design Flow for 3D Microarchitecture Evaluation. 11th Asia and South Pacific Design Automation Conference (ASPDAC), Jan 2006.
[27] Anahita Shayesteh, Eren Kursun, Tim Sherwood, Suleyman Sair, and Glenn Reinman. Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines. IEEE International Conference on Computer Design (ICCD), Oct 2005.
[28] Yongxiang Liu, Gokhan Memik, and Glenn Reinman. Reducing the Energy of Speculative Instruction Schedulers. IEEE International Conference on Computer Design (ICCD), Oct 2005.
[29] Tom Yeh and Glenn Reinman. Fast and Fair: Data-stream Quality of Service. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Sep 2005.
[30] Jason Cong, Ashok Jagannathan, Glenn Reinman, and Yuval Tamir. Understanding The Energy Efficiency of SMT and CMP with Multi-clustering. IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Aug 2005.
[31] Yongxiang Liu, Anahita Shayesteh, Gokhan Memik, and Glenn Reinman. Tornado Warning: the Perils of Selective Replay in Multithreaded Processors. International Conference on Supercomputing (ICS), June 2005.
[32] Jason Cong, Yiping Fan, Guoling Han, Ashok Jagannathan, Glenn Reinman, and Zhiru Zhang. Instruction Set Extension with Shadow Registers for Configurable Processors. 13th ACM International Symposium on Field-Programmable Gate Arrays, Feb 2005.
[33] Ashok Jagannathan, Hannah Honghua Yang, Kris Konigsfeld, Dan Milliron, Mosur Mohan, Michail Romesis, Glenn Reinman, and Jason Cong. Microarchitecture Evaluation with Floorplanning and Interconnect Pipelining. Asia South Pacific Design Automation Conference (ASPDAC), Jan 2005.
[34] Eren Kursun, Glenn Reinman, Suleyman Sair, Anahita Shayesteh, and Tim Sherwood. Low-Overhead Core Swapping for Thermal Management. Workshop on Power-Aware Computer Systems (PACS'04) held in conjunction with the 37th Annual International Symposium on Microarchitecture, December 2004.
[35] Yongxiang Liu, Anahita Shayesteh, Gokhan Memik, and Glenn Reinman. The Calm Before the Storm: Reducing Replays in the Cyclone Scheduler. IBM T.J. Watson Conference on Interaction between Architecture, Circuits, and Compilers, Oct 2004.
[36] Jason Cong, Ashok Jagannathan, Glenn Reinman, and Yuval Tamir. A Communication-Centric Approach to Instruction Steering for Future Clustered Processors. IBM T.J. Watson Conference on Interaction between Architecture, Circuits, and Compilers, Oct 2004.
[37] Yongxiang Liu, Anahita Shayesteh, Gokhan Memik, and Glenn Reinman. Scaling the Issue Window with Look-Ahead Latency Prediction. International Conference on Supercomputing (ICS), June 2004.
[38] Fang-Chung Chen, Foad Dabiri, Roozbeh Jafari, Eren Kursun, Vijay Raghunathan, Thomas Schoellhammer, Doug Sievers, Deborah Estrin, Glenn Reinman, Majid Sarrafzadeh, Mani Srivastava, Ben Wu, Yang Yang. Reconfigurable Fabric: An enabling technology for pervasive medical monitoring. Communication Networks and Distributed Systems Modeling and Simulation Conference, Jan 2004.
[39] Jason Cong, Ashok Jagannathan, Glenn Reinman, and Michail Romesis. Microarchitecture Evaluation with Physical Planning. Design Automation Conference (DAC), 2003.
[40] Gokhan Memik, Glenn Reinman, and William H. Mangione-Smith. Reducing Energy and Delay Using Efficient Victim Caches. IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Aug. 2003.
[41] Gokhan Memik, Glenn Reinman, and William H. Mangione-Smith. Just Say No: Benefits of Early Cache Miss Determination. In the proceedings of the 9th IEEE/ACM International Symposium on High Performance Computer Architecture (HPCA), Feb. 2003.
[42] Glenn Reinman, Brad Calder and Todd Austin. High Performance and Energy Efficient Serial Prefetch Architecture. In the proceedings of the 4th International Symposium on High Performance Computing, May 2002, (c) Springer-Verlag.
[43] Glenn Reinman, Brad Calder, and Todd Austin. Fetch Directed Instruction Prefetching. In 32nd International Symposium on Microarchitecture (MICRO), November 1999.
[44] Glenn Reinman, Brad Calder, Dean Tullsen, Gary Tyson, and Todd Austin. Classifying Load and Store Instructions for Memory Renaming. In ACM International Conference on Supercomputing (ICS), June 1999.
[45] Glenn Reinman, Todd Austin, and Brad Calder. A Scalable Front-End Architecture for Fast Instruction Delivery. In 26th Annual International Symposium on Computer Architecture (ISCA), May 1999.
[46] Brad Calder, Glenn Reinman, and Dean Tullsen. Selective Value Prediction. In 26th Annual International Symposium on Computer Architecture (ISCA), May 1999.
[47] Glenn Reinman and Brad Calder. Predictive Techniques for Aggressive Load Speculation. In 31st Annual International Symposium on Microarchitecture (MICRO), December 1998.
Refereed Journal Publications:
[48] Shawn Singh, Mubbasir Kapadia, Glenn Reinman and Petros Faloutsos. Footstep Navigation for Dynamic Crowds. Computer Animation and Virtual Worlds, April 2011.
[49] Jason Cong, Vivek Sarkar, Glenn Reinman, and Alex Bui. Customizable Domain-Specific Computing. IEEE Design & Test, March/April 2011.
[50] Tom Yeh, Glenn Reinman, Sanjay Patel, and Petros Faloutsos. Fool me twice: Exploring and exploiting error tolerance in physics-based animation. ACM Transactions on Graphics (TOG), December 2009.
[51] Shawn Singh, Mubbasir Kapadia, Petros Faloutsos, and Glenn Reinman. SteerBench: A Benchmark Suite for Evaluating Steering Behaviors. Journal of Computer Animation and Virtual Worlds, Feb 2009.
[52] Yuchun Ma, Yongxiang Liu, Eren Kursun, Glenn Reinman, and Jason Cong. Investigating the Effects of Fine-Grain Three-Dimensional Integration on Microarchitecture Design. ACM Journal on Emerging Technologies in Computing Systems (JETC), Oct 2008.
[53] Jason Cong, Guoling Han, Ashok Jagannathan, Glenn Reinman, and Krzysztof Rutkowski. Accelerating Sequential Applications on CMPs Using Core Spilling. In IEEE Transactions on Parallel and Distributed Systems (TPDS), August 2007.
[54] Glenn Reinman and Gruia Pitigoi-Aron. Trace Cache Miss Tolerance for Deeply Pipelined Superscalar Processors. In IEE Proceedings on Computers and Digital Techniques, September 2006.
[55] Eren Kursun, Anahita Shayesteh, Suleyman Sair, Tim Sherwood, and Glenn Reinman. An Evaluation of Deeply Decoupled Cores. In the Journal of Instruction Level Parallelism (JILP), February 2006.
[56] Anahita Shayesteh, Glenn Reinman, Norm Jouppi, Suleyman Sair, and Tim Sherwood. Dynamically Configurable Shared CMP Helper Engines for Improved Performance. In SIGARCH Computer Architecture News, November 2005.
[57] Gokhan Memik, Glenn Reinman, and Bill Mangione-Smith. Precise Instruction Scheduling. In the Journal of Instruction Level Parallelism (JILP), January 2005.
[58] Glenn Reinman. Using an Operand File to Save Energy and to Decouple Commit Resources. In the IEE Proceedings on Computers and Digital Techniques, Vol 152, Issue 5, September 2005.
[59] Glenn Reinman and Brad Calder. Using a Serial Cache for Energy Efficient Instruction Fetching. In the Journal of Systems Architecture (JSA), 2004.
[60] Brad Calder and Glenn Reinman. A Comparative Survey of Load Speculation Architectures. In the Journal of Instruction Level Parallelism (JILP), May 2000.
[61] Glenn Reinman, Brad Calder, and Todd Austin. Optimizations Enabled by a Decoupled Front-End Architecture. IEEE Transactions on Computers (TC), Vol 50, No 4, February 2000.
Textbook Chapters:
[62] Glenn Reinman. Chapter 2: Instruction Cache Prefetching. Speculative Execution in High Performance Computer Architectures. Edited by David Kaeli and Pen Yew. CRC Press, 2005.
Technical Reports:
[63] Glenn Reinman and Norm Jouppi. CACTI version 2.0