The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google
Niek Linnenbank
Faculty of Science, Vrije Universiteit, nlk800@few.vu.nl
March 17, 2010
The Google File System Outline
1 Introduction
2 Architecture
3 Measurements
4 Latest Work
5 Conclusion
The Google File System Introduction
Size of the Internet
6,767,805,208 people on earth
1,733,993,741 people on the internet
5,000,000 terabytes of data (Eric Schmidt, 2005)
Top 10 Search Providers in the US, January 2010
RANK  PROVIDER               SEARCHES (000)  SHARE (%)
      ALL SEARCH                 10,272,099      100.0
1     GOOGLE SEARCH               6,805,424       66.3
2     YAHOO SEARCH                1,488,476       14.5
3     MSN SEARCH                  1,116,546       10.9
4     AOL SEARCH                    251,762        2.5
5     ASK.COM SEARCH                194,161        1.9
6     MY WEB SEARCH SEARCH          112,356        1.1
7     COMCAST SEARCH                 59,608        0.6
8     YELLOW PAGES SEARCH            35,101        0.3
9     NEXTAG SEARCH                  34,736        0.3
10    BIZRATE SEARCH                 20,123        0.2
The Google Way
Google does web indexing (and more)
Cheap commodity hardware
Patented PageRank(tm) technology
Google Filesystem
Scalable distributed filesystem
Designed for cheap clusters
Capable of storing hundreds of terabytes
The Google File System Architecture
Assumptions
Component failures are the norm
Inexpensive commodity hardware
Large files
Files are mutated mostly by appends
Workload consists typically of large streaming reads and appends
Design
One master process keeps file metadata
Files are split into chunks
Multiple chunkservers store the chunks
Multiple clients may access files concurrently
POSIX-like API (create, read, write, append, delete)
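
To make "files are split into chunks" concrete, the following minimal sketch shows how a byte offset maps to a chunk and how a file of a given size divides into chunk ranges. The names `CHUNK_SIZE`, `chunk_index`, and `chunk_ranges` are hypothetical, and the 64 MB value is an assumption taken from the GFS paper rather than from these slides.

```python
# Assumed chunk size in bytes (64 MB, as reported in the GFS paper;
# treated here as a configurable assumption, not stated on the slide).
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_index(offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return the index of the chunk that contains the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // chunk_size


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return one (start, end) byte range per chunk, with end exclusive.

    The last chunk may be shorter than chunk_size.
    """
    return [
        (start, min(start + chunk_size, file_size))
        for start in range(0, file_size, chunk_size)
    ]


if __name__ == "__main__":
    # A 150 MB file would occupy three chunks under the assumed chunk size.
    print(len(chunk_ranges(150 * 1024 * 1024)))
    print(chunk_index(100 * 1024 * 1024))
```

With fixed-size chunks, a client can compute the chunk index locally from a file offset and only needs the master to translate that index into a chunk location.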
Design
[Figure: architecture diagram. Components: client, master, and eight chunk servers. Labeled flows: chunk locations, chunk data.]
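
The labels in the diagram (client, master, chunk locations, chunk data) suggest a read path in which the client asks the master where a chunk lives and then fetches the data from a chunk server. The toy in-memory sketch below illustrates that flow under stated assumptions: the classes `Master` and `ChunkServer`, the helpers `store` and `read`, the handle naming scheme, the round-robin placement, and the `replicas` parameter are all hypothetical and are not the actual GFS interfaces.

```python
class ChunkServer:
    """Holds chunk data, keyed by chunk handle (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk handle -> bytes


class Master:
    """Holds metadata only: which chunks make up a file and where they live."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.files = {}      # filename -> ordered list of chunk handles
        self.locations = {}  # chunk handle -> list of ChunkServer objects

    def lookup(self, filename, index):
        """Return (handle, servers) for the index-th chunk of a file."""
        handle = self.files[filename][index]
        return handle, self.locations[handle]


def store(master, servers, filename, data, replicas=2):
    """Split data into chunks and place each on `replicas` servers (round-robin)."""
    handles = []
    for start in range(0, len(data), master.chunk_size):
        handle = f"{filename}#{start // master.chunk_size}"  # assumed handle format
        chosen = [servers[(len(handles) + r) % len(servers)] for r in range(replicas)]
        for server in chosen:
            server.chunks[handle] = data[start:start + master.chunk_size]
        master.locations[handle] = chosen
        handles.append(handle)
    master.files[filename] = handles


def read(master, filename, offset, length):
    """Read up to `length` bytes: metadata from the master, data from a chunk server."""
    out = bytearray()
    while length > 0:
        index, within = divmod(offset, master.chunk_size)
        if index >= len(master.files[filename]):
            break  # offset is past the end of the file
        handle, servers = master.lookup(filename, index)          # chunk locations
        piece = servers[0].chunks[handle][within:within + length]  # chunk data
        if not piece:
            break  # nothing left in the last chunk
        out += piece
        offset += len(piece)
        length -= len(piece)
    return bytes(out)


if __name__ == "__main__":
    servers = [ChunkServer(f"cs{i}") for i in range(3)]
    master = Master(chunk_size=4)  # tiny chunk size, for illustration only
    store(master, servers, "f", b"abcdefghij")
    print(read(master, "f", 2, 5))
```

In this sketch the master never handles file contents; it only answers location queries, which is consistent with the slide's statement that one master process keeps file metadata.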