by
Hong Xu
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2013 by Hong Xu
Abstract
Efficient Workload and Resource Management in Datacenters
Hong Xu
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2013
This dissertation focuses on developing algorithms and systems to improve the efficiency of operating mega datacenters with hundreds of thousands of servers. In particular, it seeks to address two challenges: First, how to distribute the workload among the set of datacenters geographically deployed across the wide area? Second, how to manage the server resources of datacenters using virtualization technology?
In the first part, we consider the workload management problem in geo-distributed datacenters. We first present a novel distributed workload management algorithm that jointly considers request mapping, which determines how to direct user requests to an appropriate datacenter for processing, and response routing, which decides how to select a path among the set of ISP links of a datacenter to route the response packets back to a user. In the next chapter, we study some key aspects of cost and workload in geo-distributed datacenters that have not been fully understood before. Through extensive empirical studies of climate data and cooling systems, we make a case for temperature-aware workload management, where the geographical diversity of temperature and its impact on cooling energy efficiency can be used to reduce the overall cooling energy.
Moreover, we advocate for holistic workload management for both interactive and batch jobs, where the delay-tolerant elastic nature of batch jobs can be exploited to further reduce the energy cost. A consistent 15% to 20% cooling energy