To improve the quality of organizational information, and thus the effectiveness of decision making, businesses must formulate a strategy to keep information clean. Information cleansing, or scrubbing, is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
Specialized software tools use sophisticated algorithms to parse, standardize, correct, match and consolidate data warehouse information. This is vitally important because data warehouses often contain information from several different databases, some of which can be external to the organization.
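The parse, standardize, match, and consolidate steps can be illustrated with a small sketch. This is not how any commercial cleansing tool works internally; it is a minimal example that assumes a hypothetical customer record with `name`, `phone`, and `email` fields, and assumes that two records refer to the same customer when their standardized email addresses are equal.

```python
import re


def standardize(record):
    """Parse and standardize one raw record (hypothetical schema)."""
    # Collapse repeated whitespace and normalize capitalization.
    name = " ".join(record.get("name", "").split()).title()
    # Keep digits only, so "(303) 555-0142" and "303.555.0142" compare equal.
    phone = re.sub(r"\D", "", record.get("phone", ""))
    email = record.get("email", "").strip().lower()
    return {"name": name, "phone": phone, "email": email}


def consolidate(records):
    """Match records on standardized email and merge them.

    Assumption: email is a reliable match key. The first non-empty
    value seen for each field is kept.
    """
    merged = {}
    for raw in records:
        rec = standardize(raw)
        key = rec["email"]
        if key not in merged:
            merged[key] = rec
            continue
        for field, value in rec.items():
            if not merged[key][field]:
                merged[key][field] = value
    return list(merged.values())


# Records as they might arrive from two different source databases.
sources = [
    {"name": "  jane   DOE ", "phone": "", "email": "Jane.Doe@Example.com"},
    {"name": "Jane Doe", "phone": "(303) 555-0142", "email": "jane.doe@example.com "},
    {"name": "john smith", "phone": "720.555.0199", "email": "JSMITH@example.com"},
]
clean = consolidate(sources)
```

Here the two Jane Doe records collapse into one, and the phone number missing from the first source is filled in from the second. Real tools typically match on several fields with fuzzy comparison rather than on a single exact key.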
In a data warehouse, information cleansing occurs first during the ETL process and second on the information once it is in the data warehouse. Companies can choose information cleansing software from several different vendors, including Oracle, SAS, Ascential Software, and Group1 Software. Ideally, scrubbed information is error-free and consistent.
Textbook: Business Driven Technology, Baltzan/Phillips, pages 100-101
Definition: Data Cleaning
A process used to identify inaccurate, incomplete, or unreasonable data and then improve its quality by correcting the detected errors and omissions. The process may include format checks, completeness checks, reasonableness checks, limit checks, review of the data to identify outliers (geographic, statistical, temporal, or environmental) or other errors, and assessment of the data by subject area experts (e.g. taxonomic specialists). These processes usually result in flagging, documenting, and subsequent checking and correction of suspect records. Validation checks may also involve checking for compliance with applicable standards, rules, and conventions.
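The checks named in this definition can be sketched in a few lines. The example below is only an illustration under stated assumptions: the record layout (`species`, `latitude`, `longitude`, `observed_on`), the flag wording, and the rule that an observation date may not lie in the future are all hypothetical choices, not part of the definition. Suspect records are flagged rather than changed, so that they can be documented and reviewed later.

```python
from datetime import date

# Hypothetical schema for a species observation record.
REQUIRED_FIELDS = ("species", "latitude", "longitude", "observed_on")


def check_record(record, today):
    """Return a list of flags explaining why a record is suspect."""
    flags = []

    # Completeness check: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            flags.append(f"incomplete: {field} is missing")

    # Format check: the date string must parse as an ISO 8601 date.
    observed = None
    if record.get("observed_on"):
        try:
            observed = date.fromisoformat(record["observed_on"])
        except ValueError:
            flags.append("format: observed_on is not an ISO 8601 date")

    # Reasonableness check: an observation cannot be dated in the future.
    if observed is not None and observed > today:
        flags.append("unreasonable: observed_on is in the future")

    # Limit checks: coordinates must fall inside their valid ranges.
    lat, lon = record.get("latitude"), record.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        flags.append("limit: latitude outside [-90, 90]")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        flags.append("limit: longitude outside [-180, 180]")

    return flags


records = [
    {"species": "Puma concolor", "latitude": 39.7, "longitude": -105.0,
     "observed_on": "2023-06-01"},
    {"species": "", "latitude": 123.4, "longitude": -105.0,
     "observed_on": "03/14/2023"},
]
# Map each record's position to its flags; an empty list means no issue found.
report = {i: check_record(r, date(2024, 1, 1)) for i, r in enumerate(records)}
```

The first record passes every check, while the second is flagged three times (missing species, badly formatted date, latitude out of range). Outlier detection and expert review, also mentioned in the definition, need domain knowledge and are not covered by this sketch.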
The general framework for data cleaning (after Maletic & Marcus 2000) is: Define and determine error types; Search and identify error instances; Correct the errors; Document error instances and error