Joeran Beel, Stefan Langer, Bela Gipp, and Andreas Nürnberger. 2014. The Architecture and Datasets of Docear’s Research Paper Recommender System. In Proceedings of the 3rd International Workshop on Mining Scientific Publications (WOSP 2014) at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014). Downloaded from http://www.docear.org.
The Architecture and Datasets of Docear’s Research Paper Recommender System
Joeran Beel
Stefan Langer
Otto-von-Guericke University, Dept. of Computer Science, Magdeburg, Germany
Docear, Magdeburg, Germany
beel@ovgu.org
langer@docear.org
ABSTRACT
In the past few years, we have developed a research paper recommender system for our reference management software Docear. In this paper, we introduce the architecture of the recommender system and four datasets. The architecture comprises multiple components, e.g., for crawling PDFs, generating user models, and calculating content-based recommendations. It supports researchers and developers in building their own research paper recommender systems and is, to the best of our knowledge, the most comprehensive architecture that has been released in this field.
The four datasets contain metadata on 9.4 million academic articles, including 1.8 million articles publicly available on the Web; the articles’ citation network; anonymized information on 8,059 Docear users; information about the users’ 52,202 mind-maps and personal libraries; and details on the 308,146 recommendations that the recommender system delivered. The datasets are a unique source of information that enables, for instance, research on collaborative filtering, content-based filtering, and the use of reference management and mind-mapping software.
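The abstract names content-based recommendation as one component of the architecture. The Python sketch below illustrates that kind of step under stated assumptions; it is not Docear's implementation. It assumes a user model is simply a list of terms (plus citation identifiers treated as ordinary tokens) extracted from a user's mind-map, and it ranks candidate papers by cosine similarity of TF-IDF vectors. All function names, paper IDs, and tokens (including the `cit_*` placeholders) are hypothetical.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Inverse document frequency log(N / df) for every token in the corpus."""
    n = len(corpus)
    df = Counter(tok for tokens in corpus.values() for tok in set(tokens))
    return {tok: math.log(n / count) for tok, count in df.items()}


def tfidf(tokens, idf):
    """Weight raw token counts by IDF; tokens unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse {token: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_model_tokens, corpus, k=2):
    """Hypothetical helper: return the k paper IDs most similar to the user model."""
    idf = idf_table(corpus)
    query = tfidf(user_model_tokens, idf)
    scores = {pid: cosine(query, tfidf(toks, idf)) for pid, toks in corpus.items()}
    # Highest score first; ties broken alphabetically by paper ID.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:k]


# Hypothetical toy corpus; "cit_*" tokens stand in for citation identifiers.
corpus = {
    "paper_a": ["recommender", "system", "collaborative", "filtering", "cit_1"],
    "paper_b": ["mind", "map", "reference", "management", "cit_2"],
    "paper_c": ["content", "based", "filtering", "recommender", "cit_1"],
    "paper_d": ["citation", "network", "analysis", "cit_3"],
}
# Hypothetical user model, e.g. terms and citations taken from one mind-map.
user_model = ["recommender", "filtering", "cit_1", "collaborative"]

print(recommend(user_model, corpus, k=2))  # prints ['paper_a', 'paper_c']
```

Treating citation identifiers as tokens means rarely cited papers receive a higher IDF weight than widely cited ones, in the same way rare terms outweigh common terms.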
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – information filtering.
General Terms
Algorithms, Design, Experimentation
Keywords
Dataset,
Citations: dcr_doc_id_54421) (Figure 4). This allows applying weighting schemes, such as TF-IDF, to citations, i.e