Marc A. LeMoine
Northwestern University
435-CIS_SEC61 Introduction to Predictive Analytics & Data Collection
August 24, 2014
Executive Summary
This report revisits Dr. Miller’s initial study of historical movie taglines. The follow-up analysis considered taglines from movies released between 1979 and 2014, a span that corresponds to my own personal “movie watching years”. The goal was to employ additional strategies, including stemming and trying various combinations of clustering algorithms, pairwise distance metrics, and the words extracted to create the terms-by-documents matrix, in order to understand their impact on cluster efficiency. Ultimately, I was looking to answer how movie classes may have changed over the last 35 years based on movie taglines, and whether that change seems consistent with my own observations over the same period. The final model resulted in 6 clusters named as follows: American_Music, Fear_Evil, Action_Minded, Anything_Fat, Forced_Away, and Beyond_Criminal. A chart of standardized text measures is provided at the right. This chart allows for a few conclusions:
1. For the most part, the clusters have been consistent over the period, with convergence towards the end
2. There was a rise into the 2000s of what appear to be health-related reality shows
3. The last few years show a spike in criminal drama
In addressing the question of how the classes have changed over the last 35 years, the results appear largely neutral and do not support as clear a conclusion as I had expected. Further work on the clustering and feature extraction may help to improve the results.
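To make the strategies named in this summary concrete, the following is a minimal Python sketch of stop-word removal, stemming, a terms-by-documents matrix, and one pairwise distance metric (cosine). It is an illustration under stated assumptions, not the code used in this analysis: the base stop list, the suffix-stripping `crude_stem` helper (a stand-in for a real stemmer), and the three taglines are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical base stop list, extended with movie-specific words.
BASE_STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "for", "this"}
MOVIE_STOPWORDS = {"actor", "actress", "movie"}
STOPWORDS = BASE_STOPWORDS | MOVIE_STOPWORDS


def crude_stem(word):
    """Strip a few common suffixes; a simple stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(tagline):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", tagline.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


def terms_by_documents(taglines):
    """Return (vocabulary, matrix): one row per term, one column per tagline."""
    counts = [Counter(tokenize(t)) for t in taglines]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def cosine_distance(matrix, i, j):
    """Cosine distance between document columns i and j."""
    a = [row[i] for row in matrix]
    b = [row[j] for row in matrix]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


# Made-up taglines for illustration only.
taglines = [
    "The movie where fear is hunting you",
    "Fear hunts the hunted",
    "An actor sings American music",
]
vocab, tdm = terms_by_documents(taglines)
```

In this sketch the first two taglines share stemmed terms and so sit closer together than either does to the third. A clustering algorithm would then be run on the full set of pairwise distances; swapping the stemmer, the stop list, or the distance metric changes those distances and therefore the clusters.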
Results of Analysis
The analysis started with a review of the taglines to determine whether additional words could be added to the stop list to sharpen the meaning of the final clusters. A few additional words were added given that the subject was movies. For example, actor, actress, and movie did not seem to add much to the goal of creating themes from the