This report applies a data mining approach to predicting human wine taste preferences. It considers a large data set of white and red wine samples (“Vinho Verde” wine from Portugal). The inputs are objective physicochemical tests (e.g. pH values), and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (excellent). Due to privacy and logistical issues, only the physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, or wine selling price).
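As an illustration of how the output variable is formed, the sketch below takes the median of a set of expert grades. The function name and the sample grades are hypothetical and are not part of the data set.

```python
from statistics import median


def quality_score(expert_grades):
    """Return the median of the expert grades for one wine sample."""
    # The report states that each score is based on at least 3 evaluations.
    if len(expert_grades) < 3:
        raise ValueError("at least 3 expert evaluations are required")
    # Each expert grades on a scale from 0 (very bad) to 10.
    if any(not 0 <= grade <= 10 for grade in expert_grades):
        raise ValueError("grades must lie between 0 and 10")
    return median(expert_grades)


# Hypothetical grades from three experts for a single wine sample.
print(quality_score([5, 6, 7]))
```

The median is less sensitive than the mean to a single unusually high or low grade, which is one plausible reason for using it here.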
Datasets Considered:
Each record contains 12 input attributes plus one additional attribute (quality), which is the class. The attributes considered are:
1 - fixed acidity, numeric
2 - volatile acidity, numeric
3 - citric acid, numeric
4 - residual sugar, numeric
5 - chlorides, numeric
6 - free sulfur dioxide, numeric
7 - total sulfur dioxide, numeric
8 - density, numeric
9 - pH, numeric
10 - sulphates, numeric
11 - alcohol, numeric
12 - R/W, nominal (R = red, W = white)
Class: quality (score between 0 and 10)
Software used - WEKA 3.6.9
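WEKA reads data in the ARFF format, so the attribute list above can be written as an ARFF header. The sketch below generates one possible header. The relation name wine_quality is hypothetical, and declaring quality as a nominal class with labels 0 to 10 is an assumption; the actual file used for this project may have been declared differently.

```python
# The 11 numeric input attributes, in the order listed in the report.
NUMERIC_ATTRIBUTES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def quote(name):
    """Quote an attribute name unless it is purely alphanumeric.

    ARFF requires quotes around names that contain spaces; other
    non-alphanumeric names are quoted here as a precaution.
    """
    return name if name.isalnum() else f"'{name}'"


def arff_header(relation="wine_quality"):
    """Build an ARFF header for the wine data (relation name is hypothetical)."""
    lines = [f"@relation {relation}", ""]
    for name in NUMERIC_ATTRIBUTES:
        lines.append(f"@attribute {quote(name)} numeric")
    # Attribute 12: nominal wine type, R = red, W = white.
    lines.append(f"@attribute {quote('R/W')} {{R,W}}")
    # Assumption: quality is treated as a nominal class with labels 0..10.
    labels = ",".join(str(score) for score in range(11))
    lines.append(f"@attribute quality {{{labels}}}")
    lines += ["", "@data"]
    return "\n".join(lines)


print(arff_header())
```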
A data set of 6497 instances was used for training. The entire data set was then used again for cross-validation of the model built from the training data. From the data set, 21 records were chosen for class prediction.
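The cross-validation and sampling steps can be sketched as follows. This is a simplified illustration and not WEKA's own procedure: the use of 10 folds is an assumption (the report does not state the number of folds), the folds are not stratified by class, and the function names and seed are hypothetical.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for plain k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def sample_for_prediction(n_instances, n_sample=21, seed=1):
    """Pick distinct record indices to use for class prediction."""
    return sorted(random.Random(seed).sample(range(n_instances), n_sample))


splits = list(k_fold_indices(6497))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
print(sample_for_prediction(6497))
```

Each instance appears in exactly one test fold, so every record is used for evaluation once and for training in the remaining rounds.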
Data mining technique used
Classification was used for this project. The training set and test set are analyzed to determine the relationship between the attributes and the class, and the accuracy of the model is measured on both sets. A random sample is then used for prediction with the model built. A multilayer perceptron model was used to make the predictions.
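To show how a multilayer perceptron turns attribute values into a class prediction, the sketch below implements the forward pass of a generic network with one hidden layer of sigmoid units. It is not WEKA's implementation, and the network size, weights, inputs, and class labels are all hypothetical; in practice the weights are learned from the training data.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Compute one fully connected layer of sigmoid units."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Propagate the inputs through the hidden layer and the output layer."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)


def predict_class(outputs, labels):
    """Return the label whose output unit has the highest activation."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best]


# Hypothetical toy network: 2 inputs, 2 hidden units, 2 output classes.
outputs = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[0.8, -0.2], [-0.5, 0.9]],
    hidden_biases=[0.1, -0.1],
    output_weights=[[1.2, -0.7], [-1.0, 0.6]],
    output_biases=[0.0, 0.0],
)
print(outputs, predict_class(outputs, ["5", "6"]))
```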
Training Data
Log obtained in WEKA software for the training set
=== Run information ===