The task in this assignment was to correctly classify each instance's genre of music. With this in mind, I decided the best way to evaluate my models was to use the correctly classified instances percentage as the measure of which model was better than the others. I started by training all the models we had covered during the lectures, then added the Multilayer Perceptron model that I had researched. I began training these models using their pre-set parameters and used the correctly classified instances percentage to decide which models I would explore further. I chose the models that had scored over 90% when predicting the training data, which left me with KNN and the Multilayer Perceptron.
I then started to change the parameter settings of these models to search for the best results. This included changing the number of cross-validation folds to avoid overfitting.
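To illustrate the kind of comparison I ran in WEKA, here is a rough sketch in scikit-learn (the Python library I mention in Q3). The generated data is a stand-in for the music dataset, and the model settings are the defaults:

    # Compare candidate models by cross-validated accuracy - the same
    # "correctly classified instances" percentage that WEKA reports.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier

    # Generated stand-in for the music dataset (40 attributes, 4 genres).
    X, y = make_classification(n_samples=500, n_features=40, n_informative=20,
                               n_classes=4, random_state=0)

    models = {"Naive Bayes": GaussianNB(),
              "KNN": KNeighborsClassifier(),
              "MLP": MLPClassifier(max_iter=500)}
    for name, model in models.items():
        # cv controls the number of cross-validation folds.
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        print(f"{name}: {scores.mean():.1%} correctly classified")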
Once I had exhausted parameter changes I started to explore combining the models using different parameters and sets of combinations.
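In scikit-learn terms, one way of combining models like this is a voting ensemble; the sketch below is only illustrative, and the two base models and their parameters stand in for the combinations I actually tried:

    # Majority vote over two of the stronger base models.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, n_features=40, n_informative=20,
                               n_classes=4, random_state=0)

    combo = VotingClassifier(estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500)),
    ], voting="hard")
    print(f"{cross_val_score(combo, X, y, cv=10).mean():.1%}")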
Before this point I hadn't looked into the attributes. As I wasn't getting any better results, I decided to pre-process the data before trialling more models. I looked further into the attributes and started to explore the attribute-selection functions. Using these filters I trimmed down my attributes in three stages (to 120, 80 and 50) using the same models and combinations I had previously used. This led me to my best performing model.
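In scikit-learn terms, that trimming step might look something like the following; the scoring function and the generated data are stand-ins for WEKA's evaluator and the real attributes:

    # Rank attributes and keep the top k, for each cut-off I trialled.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = make_classification(n_samples=500, n_features=200, n_informative=30,
                               n_classes=4, random_state=0)

    for k in (120, 80, 50):
        X_reduced = SelectKBest(mutual_info_classif, k=k).fit_transform(X, y)
        print(k, X_reduced.shape)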
Q2: Describe your best performing model.
In your own words explain the algorithm/intuition behind it. What was your worst performing model? Why do you think it did not perform well?
Best performing model
My best performing model used the Multilayer Perceptron classifier with a learning rate of 0.2, with the attributes filtered down to the most useful 120. I chose 120 because, when I ran the 'FilteredAttributeEval' evaluator with the 'Ranker' search, I kept every attribute ranked 0.1 or higher; I felt that anything ranked below 0.1 could interfere with the accuracy of the prediction. Filtering the attributes down to a lower number also sped up the run time of the classifier, as it had fewer parameters to compute.
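A rough scikit-learn equivalent of this final set-up is sketched below. The learning rate of 0.2 and the 120-attribute filter match what I used in WEKA; the hidden-layer size and the scoring function are assumptions, as WEKA chooses its own defaults:

    # Attribute selection down to 120, then an MLP with learning rate 0.2.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=500, n_features=200, n_informative=30,
                               n_classes=4, random_state=0)

    best = Pipeline([
        ("select", SelectKBest(mutual_info_classif, k=120)),
        ("mlp", MLPClassifier(learning_rate_init=0.2,
                              hidden_layer_sizes=(50,),  # assumed size
                              max_iter=500)),
    ])
    print(f"{cross_val_score(best, X, y, cv=10).mean():.1%}")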
How the model works (Beri, 2013)
The Multilayer Perceptron algorithm is a network made up of many neurons split into layers:
The input layer – the number of neurons matches the number of input attributes describing each instance we are trying to predict.
The hidden layers – one or more layers between the input and output layers. Their job is to transform the input into the output or, in the case of my assignment, to place each input into the correct genre of music.
The output layer – what we want to predict: each input placed in its correct genre of music.
The Multilayer Perceptron uses backpropagation to train the network. (Bengio, 2013) Simply put, backpropagation helps the algorithm learn from its mistakes and correct them as they are made. As this is a supervised learning task, we know the desired outputs. Each time an input is presented, the network's prediction is compared to the expected output (which we already know). Any error is then sent backwards through the layers and the weights are adjusted to reduce it. This is repeated until the outputs are close enough to the targets. Once the algorithm has run through all the training data it is considered ready to tackle new data, as it has learned from its mistakes and from the target labels.
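As a much-simplified illustration of a single backpropagation step, here is one weight update for a one-neuron network in plain numpy; the example values and learning rate are arbitrary:

    import numpy as np

    # One training example with three attributes and its known target label.
    x = np.array([0.5, -1.0, 2.0])
    target = 1.0

    w = np.zeros(3)   # starting weights (arbitrary)
    lr = 0.2          # learning rate

    # Forward pass: weighted sum squashed by a sigmoid.
    out = 1.0 / (1.0 + np.exp(-w @ x))

    # Backward pass: gradient of the squared error for each weight.
    grad = (out - target) * out * (1.0 - out) * x

    # Adjust the weights to reduce the error, then repeat over the training set.
    w -= lr * grad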
Worst performing model
My worst performing model used the Naïve Bayes classifier. The main reason is that this classifier assumes each input attribute is independent of the others. In this data set some of the attributes are correlated with each other, so evaluating them in isolation hinders the accuracy of the prediction. The number of instances in the data set also contributed to the poor performance of the model.
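The independence assumption can be demonstrated with a toy sketch: duplicating an attribute adds no information, but Naïve Bayes counts the copy as fresh, independent evidence, so its class probabilities become more extreme than they should be:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    # One informative attribute, two classes.
    X = rng.normal(size=(200, 1)) + np.repeat([[0.0], [1.5]], 100, axis=0)
    y = np.repeat([0, 1], 100)

    plain = GaussianNB().fit(X, y)
    # Duplicating the attribute adds no information, but Naive Bayes
    # treats the copy as independent evidence and double-counts it.
    doubled = GaussianNB().fit(np.hstack([X, X]), y)

    x_test = np.array([[0.75]])
    print(plain.predict_proba(x_test))
    print(doubled.predict_proba(np.hstack([x_test, x_test])))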
Q3: If you had one more month for modelling, what approaches would you try to improve the prediction accuracy? Why do you think those approaches may improve the performance?
If I had one more month for modelling, I would try to understand the attributes better. This would involve cleaning the data to remove missing values, or potentially imputing values for them; a more complete dataset would give the algorithms better material to work with. I would also look into filtering out attributes that were not very useful for classifying the genre of a clip.
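As a sketch of the kind of cleaning I have in mind, scikit-learn's SimpleImputer can fill each gap with the mean of that attribute's observed values:

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Toy attribute matrix with gaps (np.nan marks a missing value).
    X = np.array([[1.0, 2.0],
                  [np.nan, 3.0],
                  [7.0, np.nan]])

    # Replace each gap with the mean of that attribute's observed values.
    print(SimpleImputer(strategy="mean").fit_transform(X))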
During that time I would also look into combining attributes. This would reduce the size of my data set as well as remove some of the unhelpful noise. I feel this would improve performance because combining certain attributes could present better information about the target labels. SQL would provide a less restricted way of exploring the data set.
I would also look into whether accuracy improves if the average values of certain attributes are used, for example parameter 38, or whether it is better to keep them as individual attributes, parameters 4-37.
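In pandas, averaging a block of attributes into one is a single line; the column names below are hypothetical stand-ins for parameters 4-37:

    import numpy as np
    import pandas as pd

    # Hypothetical columns standing in for parameters 4-37.
    df = pd.DataFrame(np.random.default_rng(0).normal(size=(5, 34)),
                      columns=[f"param_{i}" for i in range(4, 38)])

    # Collapse the block into one averaged attribute (like parameter 38).
    df["param_mean"] = df.loc[:, "param_4":"param_37"].mean(axis=1)
    print(df["param_mean"])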
Having an extra month to improve my prediction accuracy would also allow me to explore other machine learning software such as R and MATLAB, or scikit-learn for Python. (Hartl, 2012 and Grennan, 2014) These are all more powerful and flexible than WEKA, and would give me the opportunity to run more complex models efficiently. They allow libraries to be imported, which gives a greater degree of freedom to clean, explore and transform my data sets, and far more freedom when editing algorithms.
Q4: Imagine you have been employed by an online search company, specializing in categorizing audio clips. You have been asked to recommend one model that the company will implement for automatic genre recognition in the clips. What would you recommend? You need to provide arguments, why this recommendation would be the best choice for the company.
Having evaluated a range of predictive models for automatic music genre recognition, I would recommend the K-Nearest Neighbour (KNN) method for categorizing audio clips. I believe this algorithm will give accurate and efficient automatic genre recognition on the company's datasets.
Why am I making this recommendation?
In my tests I explored different prediction algorithms across a range of parameter settings, and pre-processed the data to see how the different models performed. The best performing models were the Multilayer Perceptron and KNN.
Below I explain how the two algorithms work, along with their advantages and flaws, and why KNN is the model I recommend as best suited to the company's needs.
Multilayer Perceptron (Rita, 2006 and Singh and Singh Chauhan, 2009)
The Multilayer Perceptron algorithm is a network made up of many neurons split into layers:
The input layer – the number of neurons matches the number of input attributes describing each instance we are trying to predict.
The hidden layers – one or more layers between the input and output layers. Their job is to transform the input into the output or, in the case of the business's requirements, to place each input into the correct genre of music.
The output layer – what we want to predict: each input placed in its correct genre of music.
The Multilayer Perceptron uses backpropagation to train the network. Simply put, backpropagation helps the algorithm learn from its mistakes and correct them as they are made. As this is a supervised learning task, we know the desired outputs. Each time an input is presented, the network's prediction is compared to the expected output (which we already know). Any error is then sent backwards through the layers and the weights are adjusted to reduce it. This is repeated until the outputs are close enough to the targets. Once the algorithm has run through all the training data it is considered ready to tackle new data, as it has learned from its mistakes and from the target labels.
Multilayer Perceptron models are considered among the most complex predictive modelling algorithms. They learn the task from the training data, and their predictions are based entirely on what they learned during the training phase. As the model learns more and more about the data, it fine-tunes itself to incoming data based on what it has already seen, which creates a risk of overfitting. The model can also be slow to train, although cleaning the incoming data to remove missing values would give it less to deal with and help its speed.
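One common way to limit both problems is early stopping, shown here as a scikit-learn sketch; I believe WEKA's own MultilayerPerceptron offers a similar validation-set option:

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, n_features=40, n_informative=20,
                               n_classes=4, random_state=0)

    # Hold out 10% of the training data and stop once validation accuracy
    # stops improving, limiting both overfitting and run time.
    mlp = MLPClassifier(early_stopping=True, validation_fraction=0.1,
                        max_iter=500).fit(X, y)
    print(mlp.n_iter_, "epochs before stopping")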
K-Nearest Neighbour (KNN) (Bafandeh Imandoust and Bolandraftar, 2013)
The K-Nearest Neighbour algorithm classifies objects based on the closest training examples in the feature space; its predictions are based on distance. When a new example arrives, the distance from it to each training example (its neighbours) is measured and the class label of the nearest neighbour is used.
There are two common ways to measure the distance between neighbours: Manhattan distance and Euclidean distance. KNN is simple and effective, and it can handle noisy training data and large training sets. Classification can be slow on a large dataset, however, because the distance from each new instance to all of the training data must be computed. Choosing the parameters can also be an issue, as it is not obvious which distance measure to use or how many nearest neighbours to consider.
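In practice both choices are ordinary parameters to trial; in scikit-learn, for example, p=1 gives Manhattan distance, p=2 gives Euclidean, and n_neighbors is k:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=500, n_features=40, n_informative=20,
                               n_classes=4, random_state=0)

    # Trial both distance measures and a few values of k.
    for p, name in [(1, "Manhattan"), (2, "Euclidean")]:
        for k in (1, 5, 15):
            knn = KNeighborsClassifier(n_neighbors=k, p=p)
            acc = cross_val_score(knn, X, y, cv=10).mean()
            print(f"{name}, k={k}: {acc:.1%}")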
Conclusion – why I recommend KNN
The adaptive learning aspect of the Multilayer Perceptron puts it above most models, since it is constantly learning and will easily adapt to new audio clips coming in. However, I would not recommend it, because it is slow to train and the company may not have the luxury of waiting for the model to finish. KNN is simple, requires no lengthy training phase and can handle large training sets. It can be affected by noise in the data set, because all attributes contribute to the classification, but careful attribute selection avoids this. KNN is also a white-box approach to predictive modelling, so it is interpretable, meaning its predictions can be explained.
Taking these points into consideration, I think that KNN would be the best model for this company. The simplicity of the model means they would not need to spend much time or money implementing it, and its ability to handle large data sets makes it well suited to the company. As an online search company they will have a large database of clips, and the speed at which those clips can be classified will be important: customers using online services will want a fast service.