Case Study Questions:
1. Why was it so difficult for the IRS to analyze the taxpayer data it had collected?
Initially, IRS data were stored in legacy systems that had been designed to process tax return forms efficiently, not to support analysis. The data were organized in many different formats, including hierarchical mainframe databases, Oracle relational databases, and non-database “flat” files. The data in the older-style hierarchical databases and “flat” files were nearly impossible to query and analyze and could not easily be combined with the relational data.
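To make the flat-file problem concrete, the following is a minimal sketch, not the IRS's actual method. It assumes a hypothetical fixed-width record layout (9-character taxpayer ID, 4-digit tax year, 10-digit amount in cents) and hypothetical table names. The records cannot be queried until a program parses each line by position and loads it into a relational table, where it can then be joined with existing relational data.

```python
import sqlite3

# Hypothetical fixed-width layout (an assumption for illustration only):
# columns 0-8 taxpayer ID, 9-12 tax year, 13-22 amount in cents.
FLAT_RECORDS = [
    "00000000120050000123456",
    "00000000220050000045000",
]

# Hypothetical rows that already live in a relational table.
TAXPAYER_ROWS = [("000000001", "MD"), ("000000002", "VA")]


def parse_record(line):
    """Split one fixed-width line into (taxpayer_id, tax_year, amount_cents)."""
    return line[0:9], int(line[9:13]), int(line[13:23])


def load_and_join(flat_records, taxpayer_rows):
    """Load flat-file records into SQLite and total the amounts by state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE returns (taxpayer_id TEXT, tax_year INTEGER, amount_cents INTEGER)"
    )
    conn.execute("CREATE TABLE taxpayers (taxpayer_id TEXT PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO returns VALUES (?, ?, ?)",
        [parse_record(line) for line in flat_records],
    )
    conn.executemany("INSERT INTO taxpayers VALUES (?, ?)", taxpayer_rows)
    # Once both sources are relational, combining them is a single join.
    rows = conn.execute(
        "SELECT t.state, SUM(r.amount_cents) "
        "FROM returns r JOIN taxpayers t ON r.taxpayer_id = t.taxpayer_id "
        "GROUP BY t.state ORDER BY t.state"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(load_and_join(FLAT_RECORDS, TAXPAYER_ROWS))
```

In this sketch every record layout needs its own parsing code before any query can run, which suggests why data spread across many formats resisted analysis.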
2. What kind of challenges did the IRS encounter when implementing its CDW? What management, organization, and technology issues had to be addressed?
The challenges the IRS encountered when it implemented its CDW include:
Management: Convincing the organization to undergo a sweeping upgrade like a data warehouse implementation was not easy, since government agencies are normally risk-averse and resist change. Data warehouses require extensive effort to keep up to date.
Organization: The structure of the data was not consistent because tax law had changed over the years, which made integrating the data a complicated process. The sheer amount of data that the CDW was slated to manage was far more than anything the IRS had previously handled. Data warehouses tend to require substantial funding to keep up to date.
Technology: The CDW has grown in capacity from three terabytes at its creation in the late 1990s to approximately 150 terabytes of data. The most important requirement for the data warehouse was that it be large enough to accommodate multiple terabytes of data yet accessible enough to allow queries using many different tools. The components that the IRS selected allowed the CDW to do that. Conversion of the legacy data to the new system was not a uniform process.
3. How did the CDW