The manner in which data is loaded can greatly affect database performance. This holds regardless of when the data is loaded: before the applications begin accessing it, or concurrently while they are accessing it. This paper presents several strategies for placing data as it is loaded into the database and details the performance implications of each strategy.
Data Clustering, Working Sets, and Performance
With ObjectStore, access to persistent data can perform at in-memory speeds. Achieving in-memory speeds requires cache affinity. Cache affinity is the generic term for the degree to which data accessed within a program overlaps with data already retrieved on behalf of a previous request. Effective data clustering allows for better, if not optimal, cache affinity.
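As a rough illustration of this definition, cache affinity can be expressed as the fraction of the pages a request needs that are already in the client cache. The sketch below is a toy model, not ObjectStore code: the function name `cache_affinity` and the page numbers are hypothetical, and treating affinity as a simple hit fraction is an assumption made for illustration.

```python
from typing import Set


def cache_affinity(cached_pages: Set[int], requested_pages: Set[int]) -> float:
    """Fraction of the requested pages that are already in the client cache.

    Assumption: affinity is modeled as a simple hit fraction over pages.
    """
    if not requested_pages:
        return 1.0  # nothing is requested, so nothing can miss
    return len(requested_pages & cached_pages) / len(requested_pages)


# Pages left in the cache by a previous request (hypothetical page numbers).
cache = {10, 11, 12, 13}

# Every requested page is already cached.
high = cache_affinity(cache, {11, 12, 13})

# Only page 13 is cached; the other three must be fetched from the server.
low = cache_affinity(cache, {13, 40, 41, 42})

print(high, low)
```

In this model the first request needs no page transfers, while the second must fetch three of its four pages.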
Data density is defined as the proportion of objects within a given storage block that are accessed by a client during some scope of activation. Clustering is a technique for achieving high data density. The working set is defined as the set of database pages a client needs at a given time. ObjectStore has a page-based architecture that performs best when the following goals are met:
• Minimize the number of pages transferred between the client and server
• Maximize the use of pages already in the cache
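The definitions of working set and data density above can be made concrete with a small toy model. The sketch below is not ObjectStore code: the page capacity of four objects, the object names, and the helper functions `place`, `working_set`, and `data_density` are all hypothetical, and density is aggregated over every page the client touches rather than computed per page. It loads the same eight objects in two different orders and compares the result for a client that reads only customer A's objects.

```python
from typing import Dict, Iterable, List, Set

PAGE_CAPACITY = 4  # hypothetical number of objects that fit on one page


def place(load_order: List[str], capacity: int = PAGE_CAPACITY) -> Dict[str, int]:
    """Assign objects to pages in load order: the i-th object goes to page i // capacity."""
    return {obj: i // capacity for i, obj in enumerate(load_order)}


def working_set(layout: Dict[str, int], accessed: Iterable[str]) -> Set[int]:
    """Pages the client must have in its cache to read the accessed objects."""
    return {layout[obj] for obj in accessed}


def data_density(layout: Dict[str, int], accessed: Iterable[str]) -> float:
    """Accessed objects divided by all objects stored on the touched pages."""
    wanted = set(accessed)
    pages = working_set(layout, wanted)
    stored = sum(1 for page in layout.values() if page in pages)
    return len(wanted) / stored if stored else 0.0


customer_a = ["a1", "a2", "a3", "a4"]
customer_b = ["b1", "b2", "b3", "b4"]

# Interleaved load: objects of both customers alternate on every page.
interleaved = place(["a1", "b1", "a2", "b2", "a3", "b3", "a4", "b4"])

# Clustered load: each customer's objects are stored together.
clustered = place(customer_a + customer_b)

pages_interleaved = working_set(interleaved, customer_a)
density_interleaved = data_density(interleaved, customer_a)

pages_clustered = working_set(clustered, customer_a)
density_clustered = data_density(clustered, customer_a)

print(len(pages_interleaved), density_interleaved)
print(len(pages_clustered), density_clustered)
```

Under these assumptions the interleaved layout forces the client to fetch both pages while using only half of what they hold, whereas the clustered layout serves the same request from a single, fully used page.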
To achieve these goals, the working set of the application should be optimal, and the way to achieve an optimal working set is data clustering. With good data clustering, more of the needed data can be accessed in fewer pages, so a high data density is obtained. Higher data density results in a smaller working set as well as a better chance of cache affinity, and a smaller working set results in fewer page transfers. The following sections of this paper explain several clustering patterns and techniques for achieving better performance via cache affinity, higher data density and a smaller