Businesses and academics are increasingly turning to Infrastructure as a Service (IaaS) clouds to fulfill their computing needs. Unfortunately, current IaaS systems provide a severely restricted palette of rentable computing options which do not optimally fit the workloads they execute. Yanqi’s research encompasses various aspects of computer architecture (the Sharing Architecture [ASPLOS 2014], MITTS [ISCA 2016], CASH [ISCA 2016]) aimed at improving the economic efficiency of IaaS clouds. The talk will focus on the design and evaluation of a manycore architecture (the Sharing Architecture) and a memory bandwidth provisioning mechanism (MITTS). The Sharing Architecture is specifically optimized for IaaS systems by being reconfigurable on a sub-core basis. It enables better matching of workloads to microarchitectural resources by replacing static cores with Virtual Cores, which can be dynamically reconfigured to have different numbers of ALUs and amounts of cache. MITTS (Memory Inter-arrival Time Traffic Shaping) is a distributed hardware mechanism that limits memory traffic at the source (core or LLC). MITTS shapes memory traffic based on memory request inter-arrival times using novel hardware, enabling fine-grain bandwidth allocation. In an IaaS system, MITTS enables cloud customers to express their memory distribution needs and pay commensurately. In a general-purpose multicore, MITTS can be used to optimize memory system throughput and fairness. MITTS has been implemented in the 32nm 25-core Princeton Piton processor [Hot Chips 2016], as well as in the open-source OpenPiton [ASPLOS 2016] processor framework. The Sharing Architecture and MITTS provide fine-grain hardware configurability, which improves economic efficiency in IaaS clouds.
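The abstract describes MITTS as shaping memory traffic by request inter-arrival time. A minimal software sketch of that idea is below; the bin boundaries, credit counts, and replenish policy are illustrative assumptions, not the actual hardware parameters from the talk.

```python
# Toy model of inter-arrival-time traffic shaping in the spirit of MITTS:
# requests consume credits from bins keyed by the gap since the last request,
# so bursty (short-gap) traffic can be throttled independently of steady traffic.

class InterArrivalShaper:
    def __init__(self, bin_edges, credits_per_bin):
        # bin_edges[i] is the upper bound (in cycles) of inter-arrival bin i
        self.bin_edges = bin_edges
        self.credits = list(credits_per_bin)
        self.initial = list(credits_per_bin)
        self.last_request_cycle = None

    def _bin_for(self, gap):
        for i, edge in enumerate(self.bin_edges):
            if gap <= edge:
                return i
        return len(self.bin_edges) - 1  # last bin catches very long gaps

    def try_issue(self, cycle):
        """Allow a memory request at `cycle` only if its inter-arrival
        bin still has credits; otherwise the requester must stall."""
        if self.last_request_cycle is None:
            self.last_request_cycle = cycle
            return True
        b = self._bin_for(cycle - self.last_request_cycle)
        if self.credits[b] > 0:
            self.credits[b] -= 1
            self.last_request_cycle = cycle
            return True
        return False

    def replenish(self):
        # Refill all bins, e.g., at the end of a scheduling window
        self.credits = list(self.initial)
```

With few credits in the short-gap bin, a burst of back-to-back requests is cut off after a couple of issues while widely spaced requests keep flowing, which is the fine-grain distribution control the abstract refers to.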
Yanqi Zhou is the first graduate student of Prof. David Wentzlaff. Her research areas are computer architecture, operating systems, and parallel computing. She received Bachelor’s degrees in Electrical Engineering, Computer Engineering, and Mathematics from the University of Michigan and Shanghai Jiao Tong University. As a research intern, she worked at Microsoft Research for two summers. Apart from research, she enjoys tennis, basketball, swimming, and yoga. As a music lover, she has been playing the violin for over ten years.
Due to the large amount of power that datacenters and HPC systems consume, energy-efficiency improvements are critical for sustainable performance scaling of these systems. Memories pose a looming power bottleneck for future data centers and HPC systems due to both application-level and microarchitectural trends. For example, in high-memory-capacity server systems, memory can occupy up to 40% of the server power budget; similarly, for future exascale supercomputers, memory power has been projected to occupy 40–70% of the power budget per compute node. To tackle the memory power problem, I propose common-case optimized memory design, which reduces the energy overheads of common-case memory accesses (e.g., fault-free accesses) at the expense of increasing the energy overheads of uncommon-case accesses (e.g., accesses to faulty locations). Since common-case accesses are much more frequent than uncommon-case accesses, this tradeoff significantly reduces overall memory energy overheads. In the talk, I will describe common-case optimized memory architectures both for off-chip main memory and for on-chip cache memories; the proposed common-case optimized off-chip main memory architecture improves memory energy efficiency by 30–50%, while the proposed common-case optimized on-chip cache architecture improves processor-core-wide energy efficiency by 16%.
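The common-case tradeoff above can be illustrated with a quick expected-value calculation. All energy costs and the fault rate below are invented for illustration, not figures from the talk.

```python
# Expected energy per access when a small fraction of accesses take the
# expensive uncommon path (e.g., accesses to faulty locations).

def expected_energy(e_common, e_uncommon, p_uncommon):
    """Average energy per access given common/uncommon path costs and the
    probability of taking the uncommon path."""
    return (1 - p_uncommon) * e_common + p_uncommon * e_uncommon

# Uniform design: every access pays the full (worst-case) cost of 1.0 unit.
baseline = expected_energy(1.00, 1.00, 1e-4)

# Common-case optimized design: a cheap common path (0.6 units) bought at the
# price of a 5x-costlier uncommon path, taken once per 10,000 accesses.
optimized = expected_energy(0.60, 5.00, 1e-4)

savings = 1 - optimized / baseline  # roughly 40% in this made-up example
```

Because the uncommon path is taken so rarely, its higher cost barely moves the average, which is why the tradeoff pays off.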
Xun Jian is a Ph.D. candidate in the Electrical and Computer Engineering department at the University of Illinois at Urbana-Champaign. He works in the area of computer architecture, with a special focus on server architectures for improving the future scaling of data centers and HPC systems. His graduate work has been recognized by several best paper awards (SRC TECHCON 2014, IEEE CAL 2013, SRC TECHCON 2015) and has led to technology transfer to industry (patent rights acquired by Empire Technology Development LLC). He was an invitee to the Heidelberg Laureate Forum and is a recipient of the M. E. Van Valkenburg Graduate Research Award, the highest award in ECE Illinois recognizing graduate research excellence in the areas of circuits, systems, or computers.
Deep learning has spawned a wide range of AI applications that are changing our lives. However, deep neural networks are both computationally and memory intensive, making them power-hungry when deployed on embedded systems and in data centers with a limited power budget. To address this problem, I will present an algorithm and hardware co-design methodology for improving the efficiency of deep learning.
I will first introduce "Deep Compression", which can compress deep neural network models by 10–49× without loss of prediction accuracy for a broad range of CNNs, RNNs, and LSTMs. The compression reduces both computation and storage. Next, by changing the hardware architecture to implement Deep Compression efficiently, I will introduce EIE, the Efficient Inference Engine, which can perform decompression and inference simultaneously, saving a significant amount of memory bandwidth. By taking advantage of the compressed model and handling its irregular computation pattern efficiently, EIE achieves 13× speedup and 3,000× better energy efficiency over a GPU. Finally, I will revisit the inefficiencies in current learning algorithms, present DSD training, and discuss the challenges and future work in efficient methods and hardware for deep learning.
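Two of the core stages behind Deep Compression, magnitude pruning and weight sharing, can be sketched in a few lines of numpy. The sparsity level, codebook size, and k-means details below are illustrative choices for this toy sketch, not the paper's tuned pipeline.

```python
# Toy sketch of pruning + weight sharing on one weight matrix.
import numpy as np

def prune(weights, sparsity=0.75):
    """Zero out the smallest-magnitude weights (magnitude pruning)."""
    k = int(weights.size * sparsity)
    threshold = np.sort(np.abs(weights.ravel()))[k]
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def share_weights(weights, n_clusters=4, iters=20):
    """Quantize surviving weights to a small shared codebook via 1-D k-means,
    so each weight is stored as a short codebook index instead of a float."""
    vals = weights[weights != 0]
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)
    for _ in range(iters):
        assign = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = vals[assign == c].mean()
    quantized = np.zeros_like(weights)
    nz = weights != 0
    idx = np.argmin(np.abs(weights[nz][:, None] - centroids[None, :]), axis=1)
    quantized[nz] = centroids[idx]
    return quantized, centroids
```

After pruning, only the nonzero positions and values need storing (e.g., in a sparse format), and after weight sharing each surviving weight is a log2(n_clusters)-bit index, which is where the storage reduction comes from.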
Song Han is a Ph.D. candidate supervised by Prof. Bill Dally at Stanford University. His research focuses on energy-efficient deep learning, at the intersection of machine learning and computer architecture. He proposed the Deep Compression algorithm, which can compress neural networks by 10–49× while fully preserving prediction accuracy. He designed the first hardware accelerator that can perform inference directly on a compressed sparse model, resulting in significant speedup and energy savings. His work has been featured by O’Reilly, TechEmergence, TheNextPlatform, and Embedded Vision, and has influenced industry practice. He led the research efforts in model compression and hardware acceleration that won the Best Paper Award at ICLR’16 and the Best Paper Award at FPGA’17. Before joining Stanford, Song graduated from Tsinghua University.
Mobile computing is experiencing a technological renaissance, and the Web is its Florence. Throughout the past decade, the Web has redefined the way people retrieve information, communicate with one another, and extract insights. Although this landscape is rife with opportunity, the energy-constrained nature of mobile devices remains a major roadblock to the potential that next-generation Web technologies promise.
In this talk, I will show a path for achieving an energy-efficient mobile Web by rethinking the conventional abstractions across the hardware/software interface along with deep introspection of Web domain knowledge. In particular, I will describe an energy-efficient mobile processor architecture specialized for Web technologies as well as programming language support that empowers Web developers to make calculated trade-offs between energy efficiency and end-user quality of service. Together, they form the core of my hardware-software co-design philosophy toward the next major milestone of the Web evolution: the Watt-Wise Web. As computing systems in the Internet-of-Things era increasingly rely on fundamental Web technologies while operating under even more stringent energy constraints, the Watt-Wise Web is here to stay.
Yuhao Zhu is a Visiting Research Fellow in the School of Engineering and Applied Sciences at Harvard University and a final-year Ph.D. candidate in the Department of Electrical and Computer Engineering at The University of Texas at Austin. He is interested in designing and prototyping better hardware and software systems to make next-generation edge and cloud computing fast, energy-efficient, intelligent, and safe. His dissertation work on energy-efficient mobile computing has been supported by the Google Ph.D. Fellowship. His paper awards include the Best of Computer Architecture Letters in 2014 and an IEEE Micro Top Picks in Computer Architecture Honorable Mention in 2016.
Society has become so dependent on computing power that any inefficiency in the way we process information can considerably impede productivity and quality of life. Three emerging trends pose challenges to the design of more efficient computer systems. First, energy constraints are becoming stricter amidst the rising interest in IoT and mobile computing. Yet traditional architectures waste a great deal of energy ensuring exactness for the naturally approximate applications that run on these systems (e.g., noisy sensor input, user-subjective output). Second, data sets are growing to enormous proportions due to the rapid gathering of information in modern devices. We can no longer rely on data being readily available in on-chip storage. Third, the active chip area is diminishing at smaller technology nodes due to thermal and power density limitations in process technology scaling. We can no longer fully utilize all on-chip hardware resources simultaneously. In this talk, I present new architectural techniques that tackle these challenges by recognizing that they stem from fundamental gaps in the way data is contextualized in hardware. The goal of a processor is to process real-world information; yet in modern architectures, hardware perceives data as nothing more than bits. First, I show that awareness of the type of information encoded in the bits enables approximation of data values for greater efficiency under strict energy constraints. Second, I show that awareness of the location of information enables more concise caching of massive data sets. Third, I show that awareness of the significance of information enables better scheduling of computations based on their impact on the quality of the final result, improving utilization of precious on-chip resources. These ideas aim to mitigate fundamental inefficiencies in the data movement, storage, and computation of today’s systems.
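The first idea, type-aware approximation of data values, can be illustrated with a toy sketch: if hardware knows a word holds an approximable float rather than opaque bits, it can return a cheaper reduced-precision value. The number of dropped mantissa bits below is an arbitrary illustrative choice, not a mechanism from the talk.

```python
# Emulate a reduced-precision read of a float32 value tagged as approximable
# by clearing its low-order mantissa bits.
import struct

def approx_load(value, drop_bits=12):
    """Return `value` (as a float32) with its low `drop_bits` mantissa bits
    cleared; the result deviates from the exact value by at most ~2**-(23-drop_bits)
    relative error, in exchange for (hypothetically) cheaper storage/transfer."""
    bits = struct.unpack('<I', struct.pack('<f', value))[0]
    bits &= ~((1 << drop_bits) - 1)  # zero the least-significant mantissa bits
    return struct.unpack('<f', struct.pack('<I', bits))[0]
```

For a user-subjective output such as a pixel color, an error on the order of 0.05% is imperceptible, which is the kind of slack these techniques exploit.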
Joshua San Miguel is a doctoral candidate in Electrical and Computer Engineering at the University of Toronto, where he is advised by Professor Natalie Enright Jerger. He received a BASc in Engineering Science with Honours from the University of Toronto in 2012. His research spans broadly across topics in computer architecture, touching on caches, memory systems, branch prediction, computation models, and networks-on-chip, with his dissertation focusing on approximate computing. He is an author of numerous conference and journal papers, fostering several collaborations with Cornell University, IBM Research, INRIA, NVIDIA Research, and the University of Washington. His work has received a HiPEAC Paper Award and a NOCS Best Paper Nomination, and has twice been recognized by IEEE Micro Top Picks as among the top contributions to computer architecture research (2015, and an honorable mention in 2016). He was also a recipient of the IBM Ph.D. Fellowship in 2016.
Uncertainties can cause significant performance degradations and functional failures in numerous engineering systems. Examples include (but are not limited to) nanoscale devices and systems with fabrication process variations, robot control without full knowledge of design and/or environmental parameters, energy systems with weather-dependent renewable energy sources, and magnetic resonance imaging (MRI) with incomplete and noisy scanning data. Modeling, controlling, and optimizing these systems are generally data-intensive tasks: one has to generate and analyze a huge amount of costly data in a parameter space. This often leads to the notorious curse of dimensionality: the complexity grows extremely fast with the number of uncertain and/or design parameters.
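The curse of dimensionality mentioned above is easy to quantify for the simplest sampling strategy, a tensor-product grid over the parameter space; the point counts below are illustrative.

```python
# Sampling each of d uncertain parameters at n points on a full tensor grid
# requires n**d simulations, each of which may be a costly circuit or device run.

def tensor_grid_cost(n_points_per_dim, n_dims):
    return n_points_per_dim ** n_dims

cost_2d = tensor_grid_cost(10, 2)    # 100 simulations: tractable
cost_20d = tensor_grid_cost(10, 20)  # 10**20 simulations: hopeless
```

This exponential blow-up is exactly what the fast non-Monte-Carlo and high-dimensional techniques in the talk are designed to avoid.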
This talk introduces some of our fast non-Monte-Carlo techniques and software for estimating the uncertain performance of engineering systems. The main application is variability analysis of nanoscale ICs, MEMS, and integrated photonics. Extended applications include energy systems and MRI. These techniques can accelerate many uncertainty-aware optimization, control, and data-inference tasks (e.g., yield optimization of silicon chips, robust control of robots and power systems, and electrical property tomography using MRI data). The first part will present fast algorithms to simulate nonlinear dynamic systems influenced by a small number of uncertain parameters. The second part will present high-dimensional algorithms to predict the performance uncertainties of an engineering system influenced by many random parameters.
Zheng Zhang is a postdoctoral associate at MIT, where he received his Ph.D. degree in EECS in 2015. He is interested in high-dimensional uncertainty analysis and data inference for nanoscale devices and systems, and for other applications including hybrid systems (e.g., power systems and robots) and MRI. Dr. Zhang received the 2016 ACM Outstanding Ph.D. Dissertation Award in Electronic Design Automation, the 2015 Doctoral Dissertation Seminar Award from the Microsystems Technology Laboratory of MIT, and the 2014 Best Paper Award from IEEE Transactions on CAD of Integrated Circuits and Systems. He is a TPC member of the Design Automation Conference (DAC) and the International Conference on Computer-Aided Design (ICCAD).