History of File Structures
I. Early Work
• Early work assumed that files were stored on tape.
• Access was sequential.
– The cost of access grew in direct proportion to the size of the file.

II. The emergence of Disks and Indexes
• As files grew very large, sequential access was not a good solution.
• Disks allowed for direct access.
– Indexes made it possible to keep a list of keys and pointers in a small file that could be searched very quickly.
– With the key and pointer, the user had direct access to the large, primary file.
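
The key-and-pointer idea above can be sketched in a few lines. This is a minimal illustration, not a historical implementation: the record layout (one "key|data" line per record), the in-memory stand-in for the primary file, and the names build_index and fetch are all assumptions made for the example.

```python
import io

def build_index(f):
    """Scan the primary file once and map each key to its record's byte offset."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()          # where the next record starts
        line = f.readline()
        if not line:
            break
        key = line.split(b"|", 1)[0].decode()
        index[key] = offset        # the "pointer" kept in the small index
    return index

def fetch(f, index, key):
    """Jump straight to the record instead of reading the file from the start."""
    f.seek(index[key])
    return f.readline().rstrip(b"\n").decode()

# Hypothetical primary file with three records (held in memory for the sketch).
primary = io.BytesIO(b"ames|Ames, Mary\nmason|Mason, Alan\nzed|Zed, Zoe\n")
index = build_index(primary)
record = fetch(primary, index, "mason")
print(record)
```

The index holds only keys and offsets, so it stays much smaller than the primary file and can be searched quickly before a single seek into the large file.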

III. The emergence of Tree Structures
• Indexes also have a sequential flavor.
• When they grow too large, they too become difficult to manage.
• The idea of using tree structures to manage the index emerged in the early 1960s.
• However, trees can grow very unevenly as records are added and deleted, resulting in long searches that require many disk accesses to find a record.
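
The uneven growth described above is easy to reproduce with a plain (unbalanced) binary search tree. The sketch below is illustrative only; the Node class and the helper names are assumptions for this example. It inserts the same 15 keys in two different orders and compares the resulting heights, where height stands in for the number of accesses in the worst-case search.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert into a plain binary search tree with no rebalancing."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

# Keys arriving in sorted order degrade the tree into a linked list.
sorted_height = height(build(range(1, 16)))
# The same keys in a favorable order give a perfectly even tree.
even_height = height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]))
print(sorted_height, even_height)
```

With sorted input every key becomes the right child of the previous one, so a search may visit all 15 nodes; the even tree needs at most 4 steps.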

IV. Balanced Trees
• In 1962, Adelson-Velsky and Landis introduced AVL trees, a self-balancing binary tree for data in memory.
• However, AVL trees did not carry over to files, because they work well when tree nodes are composed of single records rather than dozens or hundreds of them.
• In the 1970s came the idea of B-Trees, which require O(log_k N) access time, where N is the number of entries in the file and k is the number of entries indexed in a single block of the B-Tree structure.
• With a large enough k, B-Trees can guarantee that we can find an entry among millions of others with only 3 or 4 trips to the disk.
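
The "3 or 4 trips" figure follows from the logarithm above. As a rough back-of-the-envelope sketch, assume every block is completely full and holds exactly k entries (a simplification: real B-Trees only guarantee that blocks stay at least about half full, so the worst case is somewhat higher). The function name levels_needed is hypothetical.

```python
def levels_needed(n_entries, fanout):
    """Smallest number of levels h such that fanout**h >= n_entries.

    Integer arithmetic is used to avoid floating-point rounding in log().
    """
    levels = 1
    capacity = fanout
    while capacity < n_entries:
        capacity *= fanout
        levels += 1
    return levels

# One million entries, 100 entries per block: 100**3 == 1,000,000.
million_k100 = levels_needed(1_000_000, 100)
# One hundred million entries, same block size: 100**4 == 100,000,000.
hundred_million_k100 = levels_needed(100_000_000, 100)
# For contrast, a binary tree (k = 2) over one million entries.
million_k2 = levels_needed(1_000_000, 2)
print(million_k100, hundred_million_k100, million_k2)
```

Under these assumptions the block-based tree needs 3 or 4 levels where a binary tree would need 20, which is why packing many entries into each block matters so much when every level costs a disk access.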

V. Hash Tables
• Retrieving entries in 3 or 4 accesses is good, but it does not reach the goal of accessing data with a single request.
• Hashing was a good way to reach this goal with files that do not change size greatly over time.
• Later, extendible (dynamic) hashing made it possible to retrieve an entry in one or at most two disk accesses, no matter how big a file becomes.
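
A minimal sketch of the static hashing idea, for a file whose size stays roughly fixed: the key is transformed directly into a bucket address, so a lookup touches one bucket instead of walking an index. The bucket count, the use of CRC-32 as the hash function, and the names bucket_of, store, and lookup are assumptions for illustration; in-memory lists stand in for disk buckets, and extendible hashing (which grows the address space as the file grows) is not shown.

```python
import zlib

NUM_BUCKETS = 8  # fixed in advance; static hashing assumes the file size is stable

# Each list stands in for one disk bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_of(key):
    """Map a key directly to a bucket address with a deterministic hash."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS

def store(key, record):
    buckets[bucket_of(key)].append((key, record))

def lookup(key):
    """Read exactly one bucket, then scan the few entries inside it."""
    for k, record in buckets[bucket_of(key)]:
        if k == key:
            return record
    return None

store("ames", "Ames, Mary")
store("mason", "Mason, Alan")
found = lookup("mason")
missing = lookup("nobody")
print(found, missing)
```

If the file grows far beyond what the fixed number of buckets was sized for, buckets overflow and lookups slow down, which is the limitation that dynamic schemes such as extendible hashing address.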
