The Architecture and Datasets of Docear’s Research Paper Recommender System

Pre-print of
Joeran Beel, Stefan Langer, Bela Gipp, and Andreas Nürnberger. 2014. The Architecture and Datasets of Docear’s Research Paper Recommender System. In Proceedings of the 3rd International Workshop on Mining Scientific Publications (WOSP 2014) at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014). Downloaded from http://www.docear.org.

Joeran Beel
Otto-von-Guericke University
Dept. of Computer Science
Magdeburg, Germany
beel@ovgu.org

Stefan Langer
Docear
Magdeburg, Germany
langer@docear.org

ABSTRACT
In the past few years, we have developed a research paper recommender system for our reference management software Docear. In this paper, we introduce the architecture of the recommender system and four datasets. The architecture comprises multiple components, e.g. for crawling PDFs, generating user models, and calculating content-based recommendations. It supports researchers and developers in building their own research paper recommender systems, and is, to the best of our knowledge, the most comprehensive architecture that has been released in this field.
The four datasets contain metadata of 9.4 million academic articles, including 1.8 million articles publicly available on the Web; the articles’ citation network; anonymized information on 8,059 Docear users; information about the users’ 52,202 mind-maps and personal libraries; and details on the 308,146 recommendations that the recommender system delivered. The datasets are a unique source of information to enable, for instance, research on collaborative filtering, content-based filtering, and the use of reference management and mind-mapping software.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – information filtering.

General Terms
Algorithms, Design, Experimentation

Keywords
Dataset,

Citations: dcr_doc_id_54421) (Figure 4). This allows weighting schemes such as TF-IDF to be applied to citations, i.e
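To illustrate how a weighting scheme such as TF-IDF might be applied to citation identifiers of the form dcr_doc_id_54421, the following is a minimal sketch and not Docear's actual implementation. It assumes that each user's collection is available as a plain list of citation identifiers and uses the textbook variant tf * log(N / df). The function name `citation_tfidf`, the sample data, and all identifiers other than dcr_doc_id_54421 are hypothetical.

```python
import math
from collections import Counter


def citation_tfidf(collections):
    """Return one {citation_id: weight} dict per collection.

    tf  = how often a citation occurs in the collection
    idf = log(N / df), where N is the number of collections and
          df is the number of collections containing the citation
    (textbook TF-IDF; the exact variant used by Docear is an assumption)
    """
    n = len(collections)
    df = Counter()
    for cites in collections:
        df.update(set(cites))  # count each citation once per collection

    weights = []
    for cites in collections:
        tf = Counter(cites)
        weights.append({c: tf[c] * math.log(n / df[c]) for c in tf})
    return weights


# Hypothetical collections (e.g. one per mind-map), citations as identifiers.
collections = [
    ["dcr_doc_id_54421", "dcr_doc_id_54421", "dcr_doc_id_100"],
    ["dcr_doc_id_100", "dcr_doc_id_7"],
    ["dcr_doc_id_100"],
]
weights = citation_tfidf(collections)
```

In this sketch, a citation that occurs in every collection (dcr_doc_id_100) receives a weight of zero, while a citation that is frequent in one collection but rare overall (dcr_doc_id_54421) receives the highest weight, which is the behaviour TF-IDF is normally chosen for.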
