To perform data mining from Excel, a connection must first be established to SQL Server. The server used here is infodata.tamu.edu.
Classification: builds a model that predicts the class (target) attribute as a function of the input attributes. The resulting model is a decision tree, a neural network, or a logistic regression.
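To make the decision-tree case concrete, here is a minimal sketch of how a tree can choose a split on a numeric attribute such as Income. It tries each candidate threshold and keeps the one with the largest information gain (reduction in entropy). This generic entropy criterion is chosen for illustration only. It is not necessarily the scoring method SQL Server's decision tree algorithm uses, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split
    `value >= threshold` on a single numeric attribute.
    Generic entropy criterion, used here only for illustration."""
    base = entropy(labels)
    best = (None, 0.0)
    # Skip the smallest value so the left side of a split is never empty.
    for t in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


if __name__ == "__main__":
    # Hypothetical customers, for illustration only (not the report's data).
    incomes = [20000, 30000, 40000, 60000, 80000, 100000, 120000]
    occupations = ["Clerical", "Clerical", "Clerical",
                   "Professional", "Professional", "Management", "Management"]
    print(best_threshold(incomes, occupations))
```

With this hypothetical data, the chosen threshold is 60000, which separates the three clerical customers from the rest. A full tree repeats this search over every input attribute and recurses on each side of the chosen split.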
The series of screenshots below shows classification run with “OCCUPATION” as the target attribute. The results are then analyzed and interpreted.
Here, all columns except ID and Occupation are selected as inputs. Occupation is excluded because it is the target, and ID is excluded because a unique identifier carries little useful information for the model at this stage.
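The same input selection can be sketched in a few lines. Everything except the target and any explicitly excluded identifier columns becomes a model input. The function name and column list are hypothetical. The columns are limited to attributes mentioned in this report, not the full table.

```python
def select_input_columns(columns, target, exclude=()):
    """Return the model inputs: every column except the target attribute
    and any columns explicitly excluded (such as a row ID).
    Hypothetical helper; the original column order is preserved."""
    dropped = {target, *exclude}
    return [c for c in columns if c not in dropped]


if __name__ == "__main__":
    # Hypothetical column list, limited to attributes named in this report.
    columns = ["ID", "Income", "Region", "Education", "Age", "Occupation"]
    print(select_input_columns(columns, target="Occupation", exclude=["ID"]))
    # ['Income', 'Region', 'Education', 'Age']
```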
The screenshot shows the result generated by the classification function under Data Mining.
The result is a decision tree that splits on Income, Region, and Education, leading to different occupation outcomes at its nodes.
Analysis (Income >= 90000):
1. Among customers earning 90000 or more, an estimated 17.07% are in clerical occupations and 16.65% are in management.
2. Only 12.32% of these customers are in manual occupations.
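The percentages in these analyses can be read as the share of training customers in each occupation at a given tree node. A minimal sketch of that calculation follows. It assumes plain relative frequencies, although the Excel viewer may report slightly adjusted probabilities. It uses hypothetical cases rather than the report's data.

```python
from collections import Counter


def node_distribution(occupations):
    """Percentage of cases in each occupation at one tree node, most
    frequent first. Assumes plain relative frequencies (no smoothing)."""
    counts = Counter(occupations)
    total = len(occupations)
    return [(occ, round(100 * n / total, 2)) for occ, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical cases reaching one node, for illustration only.
    cases = ["Professional"] * 5 + ["Management"] * 3 + ["Clerical"] * 2
    print(node_distribution(cases))
    # [('Professional', 50.0), ('Management', 30.0), ('Clerical', 20.0)]
```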
Analysis (Income >= 58000 and <= 90000):
1. Customers earning between 58000 and 90000 are mostly in professional, management, and skilled manual occupations.
2. Among customers older than 60, 16.63% are in management.
3. Customers younger than 60 are mostly in professional occupations, and 27.3% are in the skilled manual category.
4. No occupations other than these three appear in this income group.
Analysis (Income >= 26000 and <= 42000):
1. Customers whose education is not ‘High School’ (for example, a graduate degree or partial high school) and who do not live in the ‘Pacific’ region are mostly in clerical jobs. About 48.48% are in clerical occupations and 37% in skilled
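Each analyzed group corresponds to a path from the root of the tree to a node, and a path is simply a conjunction of conditions. The sketch below writes the paths described in these analyses as if-then rules. It assumes the middle band ends below 90000, because the text gives both ">= 90000" and "<= 90000" and a single split cannot send 90000 both ways. The record field names and example values are hypothetical.

```python
# Rules for the tree paths discussed in the analyses, as predicates over a
# customer record. Assumptions (not from the report): the 58000-90000 band is
# treated as income < 90000, and the field names are hypothetical. A customer
# aged exactly 60 matches neither age rule, since the text only describes
# "more than 60" and "less than 60".
RULES = [
    ("Income >= 90000",
     lambda c: c["income"] >= 90000),
    ("58000 <= Income < 90000, Age > 60",
     lambda c: 58000 <= c["income"] < 90000 and c["age"] > 60),
    ("58000 <= Income < 90000, Age < 60",
     lambda c: 58000 <= c["income"] < 90000 and c["age"] < 60),
    ("26000 <= Income <= 42000, Education != High School, Region != Pacific",
     lambda c: 26000 <= c["income"] <= 42000
     and c["education"] != "High School"
     and c["region"] != "Pacific"),
]


def matching_rule(customer):
    """Return the label of the first rule the customer satisfies, or None
    if the customer falls on a path not analyzed in this report."""
    for label, predicate in RULES:
        if predicate(customer):
            return label
    return None


if __name__ == "__main__":
    # Hypothetical customer record.
    customer = {"income": 30000, "age": 40,
                "education": "Graduate Degree", "region": "North America"}
    print(matching_rule(customer))
```

Checking the rules in order mirrors how a tree routes each customer down exactly one path. Customers outside the analyzed groups return None rather than being forced into a band.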