Common Map of Academia

Goal

Our goal is to collect and organize publicly accessible bibliographic metadata and research information into the Common Map of Academia (COMAC). We aim to lower the barriers to entry for scientometric studies by performing the onerous task of data preparation and releasing the results to the public under an open license.

Data Sources

The Common Map of Academia is based solely on publicly available sources. By volume, the two largest are web pages processed via CommonCrawl and repositories harvested using the OAI-PMH protocol; both are already ingested. We want to add more: PDF files linked from those web pages, the Directory of Open Access Journals, arXiv.org, DBLP, NPG Linked Data, the Open Access subset of PubMed Central, and many others.
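
For readers unfamiliar with OAI-PMH harvesting, here is a minimal Python sketch of how a repository can be paged through with the protocol's ListRecords verb. It is an illustration only, not the COMAC harvester: the function names and the example base URL are hypothetical, and it assumes the repository serves Dublin Core (oai_dc) records.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    OAI-PMH treats resumptionToken as an exclusive argument, so it is
    sent without metadataPrefix when continuing a partial list.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title") if el.text]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    has_token = token_el is not None and token_el.text
    return titles, (token_el.text.strip() if has_token else None)


def harvest_titles(base_url):
    """Yield record titles, following resumption tokens until exhausted."""
    token = None
    while True:
        url = list_records_url(base_url, resumption_token=token)
        with urllib.request.urlopen(url) as response:
            titles, token = parse_list_records(response.read())
        yield from titles
        if not token:
            break
```

Calling harvest_titles("http://example.org/oai") with a real repository endpoint in place of the placeholder URL would stream titles page by page. A production harvester would also need error handling, rate limiting, and support for incremental (from/until) harvesting.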

Do you know other publicly available sources of information? Please contact us!

Under the Hood

We employ state-of-the-art machine learning techniques for tasks such as document deduplication, author name disambiguation, keyword extraction, and document analysis (page segmentation, zone classification). Thanks to Apache Hadoop and a modest cluster, we are able to handle tens of millions of records.
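
To give a feel for what document deduplication involves, here is a deliberately simple Python sketch that groups records sharing a normalized title and publication year. This is a plain rule-based stand-in, not the machine learning method used in COMAC; the record layout (dictionaries with "title" and "year" keys) and the function names are assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title):
    """Normalize a title: strip accents, lowercase, collapse punctuation."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def group_duplicates(records):
    """Return lists of records that share a normalized title and year."""
    groups = defaultdict(list)
    for record in records:
        key = (title_key(record["title"]), record.get("year"))
        groups[key].append(record)
    # Only groups with more than one member are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]
```

Exact-key grouping like this misses typos and truncated titles, which is one reason learned similarity models are attractive at the scale of tens of millions of records. A key of this kind is still useful as a cheap first pass that limits how many record pairs a more expensive comparison has to examine.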

Download and Experience

Feel free to download the compressed RDF file containing COMAC data and use it in your data processing projects.

Explore the data visually with COMAC Navigator to better understand the connections present in COMAC data.



Some parts of this software were created with co-financing from the European Union as part of the European Regional Development Fund.