!
!
!
Introduction!
!
It is a challenge for some large-scale websites such as popular SNS websites and e-business websites to deal with problems that are brought about from increasing number of clients. And traditional architectures of a website and relational databases have limits on dealing with large number of concurrent users. Take taobao.com as an example. (taobao.com is a popular website in China similar to eBay, and IP/PV per day for taobao.com now is about
24,120,000/455,868,000.) At the first stage, when taobao.com just started up, the architecture it took is just LAMP (Linux+Apache+MySQL+PHP); with the number of users keeping increasing, taobao.com replaced MySQL with Oracle and updated the web server into a IBM mainframe to meet the needs; in the end, taobao.com has to develop its own distributed system as the number of users exploded in recent years. The same thing happens to many other companies.
For example, Google develops Google File System[1] and builds Bigtable[2] and MapReduce[3] computing framework on top of it for processing massive data; Amazon designs several distributed storage systems like Dynamo[4]; and Facebook uses Hive[5] and HBase for data analysis, and uses HayStack[6] for the storage of photos.!
!
These distributed systems often have some important features, such as availability, scalability, high performance/throughput and so on. These features can be useful for a large-scale website.!
!
However, there are two major problems when a company is developing and deploying such distributed systems,!
!
1)
Tremendous human and other resource have to be invested into the development of such distributed systems. Besides, when moving data from an existent system to the new one, the company has to take extra efforts on data moving.!
!
2)
Interfaces between the old system and the new one may not be compatible. For example, as most distributed