ABSTRACT
The Internet Movie Database (IMDb) is an online database of information about movies, television shows, stars, and related topics. Our project covers IMDb movie data from 2008 to 2011. We extracted the fields Movie, Director, Star, Image Url, and Studio from the IMDb website using a tool named Mozenda. The extracted data was then analyzed so that, for a particular star, the movies, directors, and studios the star has worked with can be shown. A Graphical User Interface (GUI) was developed for this purpose: when the user selects a star, the corresponding movies, directors, and studios are displayed. A graph of the extracted data, built with a tool named NodeXL, is also shown. In this graph, stars and movies are the nodes, and an edge between a star and a movie indicates that the star worked in that movie.
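The star lookup and the star-movie graph described above can be illustrated with a minimal Python sketch. It is not the project's implementation, which used Mozenda, a GUI, and NodeXL. The rows are placeholder values rather than extracted data, and the function names build_star_index and build_edges are hypothetical; only the field names Movie, Director, Star, and Studio come from the text.

```python
from collections import defaultdict

# Placeholder rows shaped like the extracted fields (Movie, Director, Star, Studio).
# These values are illustrative only, not data from the project.
rows = [
    {"Movie": "Movie A", "Director": "Director X", "Star": "Star 1", "Studio": "Studio P"},
    {"Movie": "Movie A", "Director": "Director X", "Star": "Star 2", "Studio": "Studio P"},
    {"Movie": "Movie B", "Director": "Director Y", "Star": "Star 1", "Studio": "Studio Q"},
]


def build_star_index(rows):
    """Map each star to the sets of movies, directors, and studios they worked with."""
    index = defaultdict(lambda: {"movies": set(), "directors": set(), "studios": set()})
    for row in rows:
        entry = index[row["Star"]]
        entry["movies"].add(row["Movie"])
        entry["directors"].add(row["Director"])
        entry["studios"].add(row["Studio"])
    return index


def build_edges(rows):
    """Return the deduplicated, sorted (star, movie) pairs that form the graph's edges."""
    return sorted({(row["Star"], row["Movie"]) for row in rows})


star_index = build_star_index(rows)
edges = build_edges(rows)

# Selecting a star, as in the GUI, amounts to a dictionary lookup.
selected = star_index["Star 1"]
print(sorted(selected["movies"]), sorted(selected["directors"]), sorted(selected["studios"]))
```

An edge list of this kind, with one star and one movie per row, is the general shape of input a graph tool needs to draw star and movie nodes joined by "worked in" relations.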
DATA EXTRACTION TOOL: MOZENDA
This tool was used to extract the web data. In the Mozenda agent builder, the URL www.imdb.com was entered, and the website was loaded in the agent builder. One can then navigate to the pages from which the data is to be extracted. We chose to extract data from January 2008 to April 2011, so the URL for the January 2008 webpage (http://www.imdb.com/nowplaying/2008/1/) was entered. After the January 2008 webpage was loaded, "start new Agent from this page" was clicked in the agent builder. Because the same set of data (movie name, director, image, studio) has to be extracted for each movie, "Create list of items" was clicked in the agent builder. The names of the first two movies on the webpage were selected, after which a dialog box appeared and a corresponding field name, such as Movie, was given. The same procedure was repeated for Director, Studio, and Image Url. Because the same type of data has to be extracted from multiple pages, "Add list pager" was clicked in the agent builder, and then the link to the next month was clicked. Now the software