Jason Maassen and Henri E. Bal
Dept. of Computer Science, Vrije Universiteit
Amsterdam, The Netherlands
jason@cs.vu.nl, bal@cs.vu.nl
ABSTRACT
Tightly coupled parallel applications are increasingly run in Grid environments. Unfortunately, on many Grid sites the ability of machines to create or accept network connections is severely limited by firewalls, network address translation (NAT) or non-routed networks. Multihoming further complicates connection setup and machine identification. Although ad-hoc solutions exist for some of these problems, it is usually up to the application's user to discover the cause of the connectivity problems and find a solution. In this paper we describe SmartSockets, a communication library that lifts this burden by automatically discovering the connectivity problems and solving them with as little support from the user as possible.
Categories and Subject Descriptors: C.2.4 [Distributed Systems]: Distributed applications
General Terms: Algorithms, Design, Reliability
Keywords: Connectivity Problems, Grids, Networking, Parallel Applications
1. INTRODUCTION
Parallel applications are increasingly run in Grid environments. Unfortunately, on many Grid sites the ability of machines to create or accept network connections is severely limited by network address translation (NAT) [14, 26] or firewalls [15]. There are even sites that completely disallow any direct communication between the compute nodes and the rest of the world (e.g., the French Grid5000 system [3]).
In addition, multihoming (machines with multiple network addresses) can further complicate connection setup.
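To illustrate why multihoming complicates machine identification, the following Python sketch groups the addresses of a single hypothetical host by scope. It is not taken from SmartSockets; the function names `classify` and `group_by_scope` and the sample addresses are assumptions chosen for illustration, and only the standard `ipaddress` module is used.

```python
import ipaddress


def classify(addr: str) -> str:
    """Return a coarse scope label for one IPv4 or IPv6 address.

    Loopback and link-local are tested first because the ipaddress
    module also reports those ranges as private.
    """
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "global"
    return "other"


def group_by_scope(addrs):
    """Group the addresses of one (hypothetical) multihomed host by scope."""
    groups = {}
    for addr in addrs:
        groups.setdefault(classify(addr), []).append(addr)
    return groups


if __name__ == "__main__":
    # Hypothetical addresses of a single multihomed cluster node.
    host_addrs = ["127.0.0.1", "10.0.0.5", "192.168.1.7", "130.37.20.20"]
    for scope, members in group_by_scope(host_addrs).items():
        print(scope, members)
```

A peer outside the site can use at most the global address to reach this host, and the private addresses may coincide with addresses used at other sites, so no single address identifies the machine unambiguously.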
For parallel applications that require direct communication between their components, these limitations have hampered the transition from traditional multiprocessor or cluster systems to Grids. When a combination of Grid sites is used, serious connectivity problems are often
References:
Area Network. IEEE Micro, 15(1):29–36, Jan. 1995.
Computational Methods in Science and Technology, 12, 2006.
Proceedings of the 2005 USENIX Technical Conference, 2005.
[14] P. Francis. Is the Internet Going NUTSS? IEEE Internet Computing, 7(6):94–96, 2003.
RFC 2979, Oct. 2000.
Addison Wesley, Reading (MA), USA, 1995.
Grid Computing. In Proc. of 20th International Parallel and Distributed Processing Symposium (IPDPS-2006), April 2006.
Internet Measurement Conference (IMC), 2005.
York, NY, USA, 2004. ACM Press.
In Proc. of 20th International Parallel and Distributed Processing Symposium (IPDPS-2006), April 2006.
Feb. 1996.
Network Address Translators (NATs). RFC 3489, Mar. 2003.
Framework. RFC 3303, Aug. 2002.