Nay Myo Sandar, School of Information Technology, Shinawatra University, Bangkok 10400, Thailand. Email: uniquechan014@gmail.com
Abstract—With the development and popularity of Information Technology, we create and store ever-growing volumes of data in traditional computing environments, leading to big data (e.g., terabytes or petabytes). Data comes from everywhere: business, commerce, education, manufacturing, and communication services. To cope with this growth, cloud computing has become a popular distributed computing environment for big data processing, since it provides Information Technology resources and capabilities (e.g., applications, storage, communication, collaboration, infrastructure) through services offered by cloud service providers. These large datasets can be transferred to a single sink site (e.g., AWS or Google datacenters) either over the Internet or by shipping storage devices (e.g., external or hot-plug drives, or SSDs) via carriers such as FedEx, UPS, or USPS, in order to reduce both the total dollar cost and the total transfer latency of the collective dataset. In this paper, we explore the problem of satisfying a latency deadline (i.e., the transfer finishes within a day) while minimizing dollar cost, and we investigate a heuristic algorithm that optimally balances the cost-latency tradeoff between Internet and shipping transfer for large datasets, under uncertainty in shipping time and data size, based on two-stage stochastic programming with recourse. Keywords—Big Data, Cloud Computing, Stochastic Programming
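To make the two-stage formulation concrete, the following minimal Python sketch enumerates first-stage choices (the fraction of the dataset shipped on drives) and averages second-stage recourse costs over a few discrete scenarios of shipping time and dataset size. All names, prices, bandwidths, and scenario values here are hypothetical illustrations, not figures or algorithms from this paper.

```python
# Minimal sketch of the internet-vs-shipping tradeoff under uncertainty,
# solved by enumerating first-stage splits and averaging recourse costs
# over discrete scenarios (a two-stage stochastic program with recourse).
# All parameters are illustrative assumptions.

DEADLINE_H = 24.0        # latency deadline: finish within a day
NET_RATE_GB_H = 40.0     # assumed internet throughput (GB/hour)
NET_COST_GB = 0.09       # assumed dollars per GB over the internet
SHIP_COST = 60.0         # assumed flat courier cost per shipment
DRIVE_PREP_H = 2.0       # assumed time to copy data onto drives

# Scenarios: (probability, shipping time in hours, dataset size in GB)
SCENARIOS = [
    (0.5, 18.0, 900.0),
    (0.3, 22.0, 1000.0),
    (0.2, 30.0, 1100.0),  # late courier: shipping misses the deadline
]

def expected_cost(ship_fraction):
    """First stage: choose the fraction of data to ship. Second stage
    (recourse): after observing shipping time and dataset size, re-send
    the shipped data over the internet if the courier would be late."""
    total = 0.0
    for prob, ship_h, size_gb in SCENARIOS:
        net_gb = (1.0 - ship_fraction) * size_gb
        ship_gb = ship_fraction * size_gb
        cost = net_gb * NET_COST_GB + (SHIP_COST if ship_gb > 0 else 0.0)
        # Recourse: courier too slow -> pay to resend shipped portion online
        if ship_gb > 0 and DRIVE_PREP_H + ship_h > DEADLINE_H:
            cost += ship_gb * NET_COST_GB
        # The internet portion must itself meet the deadline
        if net_gb / NET_RATE_GB_H > DEADLINE_H:
            cost = float("inf")  # infeasible first-stage split
        total += prob * cost
    return total

# First-stage decision: pick the split with minimum expected cost
best = min((f / 10 for f in range(11)), key=expected_cost)
```

Under these illustrative numbers, sending everything over the internet is infeasible (the 1000 GB scenario alone needs 25 hours), so the minimizer shifts load onto the courier; the same enumeration structure carries over when the heuristic explores finer-grained splits.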
I. INTRODUCTION
Cloud computing has attracted various communities, such as researchers, students, businesses, consumers, and government organizations. Big data is the main driver behind the rise of cloud computing, since we upload data on the scale of petabytes, which requires lots of