User Id: 2011143967

Tags                   Hits
FIFA World Cup 2014    2186
Web Exclusive          1647
Football               1320
Special                1296
Brazil                 1224
Uruguay                 845
Argentina               775
Netherlands             756
Greece                  703

Method 1
Tag Counting: In this method, we simply count the number of hits that a user has generated on each tag. For example, the data is stored as shown in the table above. We then select the top 3-4 tags by hit count, which are the tags the user is most interested in, and use them to profile the user.
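The counting step can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the page_views log, its layout (one list of tags per page opened), and the top_tags helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical click log (assumed layout): one list of tags per page the user opened.
page_views = [
    ["FIFA World Cup 2014", "Football", "Brazil"],
    ["FIFA World Cup 2014", "Web Exclusive"],
    ["Football", "Special"],
    ["FIFA World Cup 2014", "Football", "Web Exclusive"],
]

def count_hits(views):
    """Count one hit for every tag on every page view."""
    return Counter(tag for tags in views for tag in tags)

def top_tags(views, n=3):
    """Return the n tags with the most hits, used here as the user profile."""
    return [tag for tag, _ in count_hits(views).most_common(n)]

hits = count_hits(page_views)
profile = top_tags(page_views, n=3)
```

Here the profile is nothing more than the n highest counters, which is what keeps the method simple and fast.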
Advantages:
Simple and fast algorithms
Disadvantages:
The relationship among the tags is missing.
This information is very important. Take the above example, for instance.
The user may have read an article tagged “FIFA World Cup 2014 Football Special Web Exclusive”. But if we read the data produced by the tag counting method, we cannot make out the relation between “FIFA World Cup 2014”, “Football”, “Special” and “Web Exclusive”. In particular, the tags “Special” and “Web Exclusive” carry no meaning by themselves unless we combine them with other tags, and only the user knows which combination was intended.
Hence, we are still some distance away from what the user was actually looking for.
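A small sketch of this loss of information, under assumed data: two hypothetical reading histories that differ in which tags appeared together still produce identical per-tag counts. The pair_counts helper is not part of the method described here; it is included only to show that the difference becomes visible once co-occurring tags are kept.

```python
from collections import Counter
from itertools import combinations

def tag_counts(articles):
    """Per-tag hit counts, as in the tag counting method."""
    return Counter(tag for tags in articles for tag in tags)

def pair_counts(articles):
    """Counts of tag pairs that appeared on the same article (illustrative only)."""
    return Counter(
        pair for tags in articles for pair in combinations(sorted(tags), 2)
    )

# Hypothetical user A: one article carrying all four tags together.
user_a = [["FIFA World Cup 2014", "Football", "Special", "Web Exclusive"]]

# Hypothetical user B: two unrelated articles carrying the same four tags.
user_b = [["FIFA World Cup 2014", "Football"], ["Special", "Web Exclusive"]]

same_counts = tag_counts(user_a) == tag_counts(user_b)
same_pairs = pair_counts(user_a) == pair_counts(user_b)
```

With these inputs same_counts is True while same_pairs is False: the per-tag counters cannot tell the two users apart, even though “Special” was attached to football content for one of them and not for the other.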
The volume of data is quite high.
This method generates vast amounts of data, one set of counts for each new tag, and this data could be scaled down. Processing scaled-down data is generally more accurate and more to the point.
Does not give any importance to the season of the visit.
The time or season in which a user visits a web page is quite important. Think of it this way. A user is mad about cricket and does not even think about football or any other sport. Even on the day after the football World Cup final, he would still read news about the Ranji Trophy. If a user visits cricket pages during the football World Cup season, that tells us much more about the user than simply noting that the user has visited a couple of cricket-related pages.
Does not consider the time the user spends on each page.
The time spent tells a lot more about the user. A user spending