DEJAN S. MILOJICIC†, FRED DOUGLIS‡, YVES PAINDAVEINE††, RICHARD WHEELER‡‡ and SONGNIAN ZHOU*
† HP Labs, ‡ AT&T Labs–Research, †† TOG Research Institute, ‡‡ EMC, and *University of Toronto and Platform Computing
Abstract
Process migration is the act of transferring a process between two machines. It enables dynamic load distribution, fault resilience, eased system administration, and data access locality. Despite these goals and ongoing research efforts, migration has not achieved widespread use. With the increasing deployment of distributed systems in general, and distributed operating systems in particular, process migration is again receiving more attention in both research and product development. As high-performance facilities shift from supercomputers to networks of workstations, and with the ever-increasing role of the World Wide Web, we expect migration to play a more important role and eventually to be widely adopted. This survey reviews the field of process migration by summarizing the key concepts and giving an overview of the most important implementations. Design and implementation issues of process migration are analyzed in general, and then revisited for each of the case studies described: MOSIX, Sprite, Mach, and Load Sharing Facility. The benefits and drawbacks of process migration depend on the details of implementation, and therefore this paper focuses on practical matters. This survey will help in understanding the potential of process migration and why it has not caught on.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems - network operating systems; D.4.7 [Operating Systems]: Organization and Design - distributed systems; D.4.8 [Operating Systems]: Performance - measurements; D.4.2 [Operating Systems]: Storage Management - distributed memories.

Additional Key Words and Phrases: process migration, distributed systems, distributed operating systems, load distribution.