DM Defined
Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
The process of analyzing data from different perspectives and summarizing it into useful information.
A class of database applications that look for hidden patterns in a group of data that can be used to predict future behavior.
The relationships and summaries derived are referred to as models or patterns. Examples include linear equations, rules, clusters, graphs, tree structures and recurrent patterns in time series.
Data mining uses observational data as opposed to experimental data, that is, data that have already been collected for some purpose other than data mining analysis.
The relationships and structures sought should be novel. There is little point in regurgitating well-established relationships unless the exercise is aimed at confirming a hypothesis.
“Concepts”
• Definition: A “concept” is a set of objects, symbols or events grouped together because they share certain characteristics.
Concept: roughly, a set, class, group, or cluster.
• Classical View: A concept is a set with well-defined, deterministic inclusion rules. E.g. A home owner is a good credit risk.
• Probabilistic View: A set with probabilistic inclusion rules. E.g. A home owner has an 80% chance of being a good credit risk.
• Exemplar View: A given instance is determined to be an example of a particular concept if the instance is “similar enough” to a set of “one or more known examples” of the concept. E.g. Mr. Smith owns his own home and is a good credit risk.
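The three views above can be contrasted in a small Python sketch. This is an illustration only, not a method from these slides: the `Person` attributes other than home ownership, the 0.3 probability for non-owners, the matching-attribute similarity measure, and the 0.6 threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Person:
    owns_home: bool
    has_steady_job: bool      # hypothetical attribute, not in the slides
    has_prior_default: bool   # hypothetical attribute, not in the slides


def classical_view(p: Person) -> bool:
    """Deterministic inclusion rule: a home owner is a good credit risk."""
    return p.owns_home


def probabilistic_view(p: Person) -> float:
    """Probabilistic inclusion rule: a home owner has an 80% chance of being
    a good credit risk. The 0.3 for non-owners is an assumed placeholder."""
    return 0.8 if p.owns_home else 0.3


def similarity(a: Person, b: Person) -> float:
    """Fraction of attributes on which two people agree (assumed measure)."""
    pairs = list(zip(astuple(a), astuple(b)))
    return sum(x == y for x, y in pairs) / len(pairs)


def exemplar_view(p: Person, exemplars: list, threshold: float = 0.6) -> bool:
    """An instance belongs to the concept if it is 'similar enough' to at
    least one known example. The threshold value is an assumption."""
    return any(similarity(p, e) >= threshold for e in exemplars)


# Known example of the concept "good credit risk": Mr. Smith owns his home.
mr_smith = Person(owns_home=True, has_steady_job=True, has_prior_default=False)
applicant = Person(owns_home=True, has_steady_job=False, has_prior_default=False)

print(classical_view(applicant))             # True
print(probabilistic_view(applicant))         # 0.8
print(exemplar_view(applicant, [mr_smith]))  # True (agrees on 2 of 3 attributes)
```

The classical view returns a yes/no answer, the probabilistic view returns a degree of belief, and the exemplar view depends entirely on which known examples and which similarity measure are chosen.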
Example: An Investment Dataset
Possible Business Questions
“Supervised” Learning
In the last two questions, we single out ONE of the attributes as the one whose value we would like to be able to determine from the values of the others.
• What characteristics distinguish between Online and Broker investors? (DISCRIMINATION). (Transaction method