Over recent years, parallel computing has emerged in mobile devices, mainstream computers, and high-end servers as a means to reach aggressive performance targets while also maintaining acceptable power and thermal characteristics. As a result, current computer systems design — both at the software and hardware level — revolves around effective management of many different shared resources, such as processor cores, memory hierarchy capacity and bandwidth, or an overall power budget. This talk will discuss my group's research in power-aware computing across several scales, particularly highlighting examples where effective management of shared resources leads to power, performance or parallelism improvements.
Margaret Martonosi is Professor of Computer Science at Princeton University, where she has been on the faculty since 1994. She also holds an affiliated faculty appointment in Princeton EE. Martonosi's research interests are in computer architecture and the hardware/software interface, with particular focus on power-efficient systems and mobile computing. In the field of processor architecture, she has done extensive work on power modeling and management and on memory hierarchy performance and energy. This has included the development of the Wattch power modeling tool, the first architecture-level power modeling infrastructure for superscalar processors. In the field of mobile computing and sensor networks, Martonosi led the Princeton ZebraNet project, which included two real-world deployments of tracking collars on zebras in central Kenya. In addition to numerous publications, she has co-authored a technical reference book on power-aware computing and is a co-inventor on six granted US patents. Martonosi is a fellow of both IEEE and ACM. In 2010, she received Princeton University's Graduate Mentoring Award.
Emerging, fast non-volatile memories are thousands of times faster than conventional disks in terms of latency, and they offer enormous gains in bandwidth as well. Fully leveraging these technologies will require far-reaching changes in how storage systems operate. To understand the impact of this increased storage performance, we have developed a prototype high-performance storage system called Moneta. Our experience with Moneta shows that system and application software designed for a world of slow disks is a poor fit for storage devices based on these new technologies. Moneta's hardware interface and software stack work together to remove software overheads such as disk-centric IO scheduling, contentious locks, and system call overheads. Moneta also provides a generic facility for removing file system overheads almost entirely. The combination of these optimizations reduces latency for a 4 KB read request from 25.5 µs to 7.9 µs and increases sustained bandwidth for small requests by 26 times. We compare Moneta to a range of storage devices based on disks, flash memory, and advanced non-volatile memories, and find that further work is required at the application level to fully leverage the potential of these new memories.
Steven Swanson is an assistant professor in the Department of Computer Science and Engineering at the University of California, San Diego and the director of the Non-volatile Systems Laboratory. His research interests include the systems, architecture, security, and reliability issues surrounding non-volatile, solid-state memories. He also co-leads projects to develop low-power co-processors for irregular applications and to devise software techniques for using multiple processors to speed up single-threaded computations. In previous lives he has worked on scalable dataflow architectures, ubiquitous computing, and simultaneous multithreading. He received his PhD from the University of Washington in 2006.
This talk will introduce a new programming/architectural execution model for parallel threads. Unlike threads in conventional programming models, data-triggered threads are initiated on a change to a memory location. This enables increased parallelism and the elimination of redundant, unnecessary computation. This talk will focus primarily on the latter. We'll show that 78% of all loads fetch redundant data, leading to a high incidence of redundant computation. By expressing computation through data-triggered threads, that computation is executed once when the data changes, and is skipped whenever the data does not change. The C SPEC benchmarks show speedups of up to 5.9x, averaging 46%.
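To make the idea concrete, here is a minimal software sketch of computation that runs only when a stored value actually changes. It is an illustration under stated assumptions, not the model presented in the talk: the `TriggeredCell` class, its `store` method, and the `recompute` handler are hypothetical names, and the handler runs synchronously rather than as a separate thread.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class TriggeredCell(Generic[T]):
    """A memory location whose dependent computation runs only on a real change.

    Hypothetical illustration: the names and the synchronous call are
    assumptions, not the programming model presented in the talk.
    """

    def __init__(self, value: T, on_change: Callable[[T], None]) -> None:
        self._value = value
        self._on_change = on_change
        self.triggered = 0  # how many times the dependent computation ran
        self.skipped = 0    # how many stores wrote an unchanged value

    def load(self) -> T:
        return self._value

    def store(self, new_value: T) -> None:
        if new_value == self._value:
            # Redundant store: the data did not change, so skip the computation.
            self.skipped += 1
            return
        self._value = new_value
        self.triggered += 1
        self._on_change(new_value)


results = []


def recompute(values):
    # Stand-in for an expensive computation that depends on the cell's data.
    results.append(sum(values))


cell = TriggeredCell((1, 2, 3), recompute)
for update in [(1, 2, 3), (1, 2, 3), (4, 5, 6), (4, 5, 6), (7, 8, 9)]:
    cell.store(update)

print(cell.triggered, cell.skipped, results)  # prints: 2 3 [15, 24]
```

Of the five stores, three write a value that is already present, so the dependent computation runs only twice. A conventional loop would have recomputed the sum on every iteration.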
Dean Tullsen is a professor in the computer science and engineering department at UCSD. He received his PhD from the University of Washington in 1996, where he introduced the concept of simultaneous multithreading (hyper-threading). He has continued to work in the area of computer architecture and back-end compilation, where with various co-authors he has introduced many new ideas to the research community, including threaded multipath execution, symbiotic job scheduling for multithreaded processors, dynamic critical path prediction, speculative precomputation, heterogeneous multi-core architectures, conjoined core architectures, and event-driven simultaneous code optimization. He is a Fellow of the IEEE.
The Berkeley "Par Lab" was established in 2008 to address perhaps the greatest ever challenge in computing systems: the end of sequential processor performance scaling and the resultant need to move to parallel computing everywhere. Our ambitious goal is “to enable most programmers to productively write correct, portable, efficient software for manycore processors that will scale with the number of cores”. We formed a large co-located team of faculty and students collaborating to tackle this problem from applications down to architecture. In this talk, I will give an overview of our research agenda and describe the progress we have made in the first three years of the project.
Krste Asanović is currently an Associate Professor in the Computer Science Division at UC Berkeley. Asanović received a PhD in Computer Science from UC Berkeley in 1998, then joined the faculty at MIT, receiving tenure in 2005. He returned to join the faculty at Berkeley in 2007, where he co-founded the Berkeley Parallel Computing Laboratory ("Par Lab").
Scale-out architectures supporting flexible, incremental growth in capacity are common for computing and storage. However, the network remains the last bastion of the traditional scale-up approach, where increasing performance requires increasing levels of specialization at tremendous cost and complexity. Today, the network is often the weak link in data center application performance and reliability. In this talk, we summarize our work in bringing scale-out growth of capacity to data center networks. With a focus on the UCSD Triton architecture, we explore issues in managing the network as a single plug-and-play, virtualizable fabric scalable to hundreds of thousands of ports and petabits per second of aggregate bandwidth.
For more information: Triton Data Center Networking
Amin Vahdat is a Professor and holds the Science Applications International Corporation Chair in the Department of Computer Science and Engineering at the University of California San Diego. He is also the Director of UCSD's Center for Networked Systems. Vahdat's research focuses broadly on computer systems, including distributed systems, networks, and operating systems. He received his PhD in Computer Science from UC Berkeley under the supervision of Thomas Anderson after spending the last year and a half as a Research Associate at the University of Washington. Before joining UCSD, he was on the faculty at Duke University. He is a past recipient of the NSF CAREER award, the Alfred P. Sloan Fellowship and the Duke University David and Janet Vaughn Teaching Award.
For more information: Amin Vahdat, UCSD
We will present the potential of programmable analog signal processing techniques for low-power portable applications such as imaging, audio processing, and speech recognition. The range of analog signal processing functions available creates many opportunities to combine analog and digital signal processing systems for improved overall system performance. Dense, programmable analog techniques, built on programmable transistor approaches, make this combination possible. We show experimental evidence of a 1,000x to 10,000x improvement in power efficiency for programmable analog signal processing compared to custom digital implementations. We will focus in particular on configurable analog techniques, especially large-scale Field Programmable Analog Arrays (FPAAs).
Paul Hasler is an Associate Professor in the School of Electrical and Computer Engineering at Georgia Institute of Technology. Dr. Hasler received his M.S. and B.S.E. in Electrical Engineering from Arizona State University in 1991, and received his Ph.D. in Computation and Neural Systems from California Institute of Technology in 1997. His current research interests include low power electronics, mixed-signal system ICs, floating-gate MOS transistors, adaptive information processing systems, "smart" interfaces for sensors, cooperative analog-digital signal processing, device physics related to submicron devices or floating-gate devices, and analog VLSI models of on-chip learning and sensory processing in neurobiology. Dr. Hasler received the NSF CAREER Award in 2001 and the ONR YIP award in 2002. He also received the Paul Raphorst Best Paper Award from the IEEE Electron Devices Society (1997), the IEEE CICC best paper award (2005), the IEEE ISCAS Sensors best paper award (2005), and the best student paper award at the IEEE Ultrasound Symposium (2006). Dr. Hasler is a Senior Member of the IEEE.
Modern software is fraught with bugs that allow hackers to take over and control computing systems. The current approach to dealing with these attacks invites the criminal hacker into the software design cycle, resulting in great losses to customers and developers. In this talk, I'll introduce a better approach to building secure software that leverages security vulnerability analysis to find the bugs that hackers love, long before they can be exploited. This approach to secure software development is a powerful one, but also very expensive. To address this issue, I will introduce the Testudo project, which is developing novel hardware and software technologies to massively scale the performance of security vulnerability analyses by pushing them into the field, where they run continuously but with imperceptible performance impact.
Todd Austin is a Professor of Electrical Engineering and Computer Science at the University of Michigan in Ann Arbor. His research interests include computer architecture, secure and reliable system design, hardware and software verification, and performance analysis tools and techniques. Prior to joining academia, Todd was a Senior Computer Architect in Intel's Microcomputer Research Labs, a product-oriented research laboratory in Hillsboro, Oregon. Todd is the first to take credit (but the last to accept blame) for creating the SimpleScalar Tool Set, a popular collection of computer architecture performance analysis tools. In 2002, Todd was a Sloan Research Fellow, and in 2007 he received the ACM Maurice Wilkes Award "for innovative contributions in Computer Architecture including the SimpleScalar Toolkit and the DIVA and Razor architectures." Todd received his PhD in Computer Science from the University of Wisconsin in 1996.
In this talk I give an overview of the algorithms we have developed at UCSD to significantly lower the energy consumption in computing systems. We derived optimal power management strategies for stationary workloads that have been implemented both in HW and SW. Run-time adaptation can be done via an online learning algorithm that selects among a set of policies. We generalize the algorithm to include thermal management, since we found that minimizing the power consumption does not necessarily reduce the overall energy costs. To reduce the performance costs typically associated with state-of-the-art thermal management techniques, we developed a new set of proactive management policies. The experimental results using real datacenter workloads on an actual multicore system show that our proactive technique is able to reduce the adverse effects of temperature by over 60%. Most recently we have shown that symbiotic scheduling of workloads in virtualized environments can lead to average energy savings of 15% with a 20% performance benefit in high-utilization scenarios.
I will also present some of our recent work on energy savings in battery-powered and energy-harvesting systems. We are designing a new kind of “citizen infrastructure”, CitiSense, as an end-to-end health and environmental information system with near real-time data streams and feedback loops from the system to the sensing, processing, and actuation infrastructure. We have developed adaptive algorithms to trade off accuracy of computation against the available energy for such systems, while taking into account the energy harvesting capabilities.
Tajana Simunic Rosing is currently an Assistant Professor in the Computer Science Department at UCSD. Her research interests are energy-efficient computing and embedded and wireless systems. Tajana’s work on event-driven dynamic power management laid the mathematical foundations for the engineering problem, devised a globally optimal solution and, more importantly, defined the framework for future researchers to approach these kinds of problems in embedded system design. Her recent results demonstrate the importance of joint power and thermal management in multicore server systems in order to minimize the overall energy cost. Furthermore, she developed a novel class of proactive thermal management policies that can lower the incidence of hot spots in multicore processors by up to 60% with no performance impact. Her current work is focused on developing energy-efficient scheduling policies for virtualized server environments and on energy efficiency in population area healthcare networks.
From 1998 until 2005 she was a full-time research scientist at HP Labs while also leading research efforts at Stanford University. She finished her PhD in EE in 2001 at Stanford, concurrently with finishing her Masters in Engineering Management. Her PhD topic was dynamic management of power consumption. Prior to pursuing the PhD, she worked as a senior design engineer at Altera Corporation. She obtained her MS in EE from the University of Arizona. Her MS thesis topic was high-speed interconnect and driver-receiver circuit design. She has served on a number of Technical Paper Committees, and is currently an Associate Editor of IEEE Transactions on Mobile Computing. In the past she has been an Associate Editor of IEEE Transactions on Circuits and Systems.
Margaret Martonosi, Professor, Princeton University
"Energy-Efficient Computing: The Role of Parallelism"

Steven Swanson, Assistant Professor, UC San Diego
"Moneta: A Fast Storage Array Architecture for Next-Generation Non-Volatile Memories"

Dean Tullsen, Professor, UCSD
"Data Triggered Threads — Eliminating Redundant Computation"

Krste Asanović, Associate Professor, UC Berkeley
"The Berkeley Parallel Computing Laboratory"

Amin M. Vahdat, Professor, UCSD
"Scale Out Networking in the Data Center"

Paul E. Hasler, Associate Professor, Georgia Tech
"Programmable and Configurable Analog Signal Processing for Low-power Sensor Systems"

Todd Austin, Professor, University of Michigan
"Squash Your Security Bugs, Before They Squash You!"

Tajana Simunic Rosing, Assistant Professor, UC San Diego
"Energy Efficient Computing"