The Value of Connected Systems

This post is written by Tim Bowman, Professor at Wayne State University.

First up was Joe Wass, Principal R&D Engineer at Crossref, who presented the development of Crossref Event Data, a transparent and open data source. Joe described Crossref and its business of registering content and issuing DOIs, noting that Crossref contains approximately 100 million items.

What they are trying to create with this new system is intended for everyone; they aren't creating metrics, just recording links. The new data they capture is referred to as an "event" and comes from outside both Crossref and scholarly publishing. Events are not easy to track: the links aren't consistent, they appear on a diverse range of platforms, there is no "publisher" to curate them, and there is no structured metadata. In addition, the plethora of event links and contexts is always changing. An "event" is an occasion on which they observe a link; sources currently tracked include Reddit, Twitter, Wikipedia, WordPress blogs, and Stack Exchange. The data source contains artifacts (a snapshot of what they know at the time of capture) as well as evidence records and logs covering everything they do at capture time, including failures and successes. All the code is open source, and all the data, in every version, is free under a CC license, which allows for reproducibility in research. The bottom line: Crossref will collect this event data; it is our job, as researchers, to interpret it.
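Because the data is open, anyone can query it. As a minimal sketch of what working with Event Data might look like: the query builder below follows the endpoint and parameter names in Crossref's public Event Data documentation, but the DOI and the sample event record are hypothetical placeholders, not real data.

```python
import json
from urllib.parse import urlencode

# Base endpoint from Crossref's Event Data docs (assumption: v1 API).
BASE = "https://api.eventdata.crossref.org/v1/events"

def build_query(obj_id, source=None, rows=100):
    """Return a query URL for events whose object is the given DOI."""
    params = {"obj-id": obj_id, "rows": rows}
    if source:
        params["source"] = source  # e.g. "twitter", "reddit", "wikipedia"
    return f"{BASE}?{urlencode(params)}"

# Each event is a subject-relation-object link plus provenance metadata.
# This record is illustrative only (hypothetical DOI and tweet).
sample_event = json.loads("""
{
  "subj_id": "https://twitter.com/example/status/1",
  "relation_type_id": "discusses",
  "obj_id": "https://doi.org/10.5555/12345678",
  "source_id": "twitter",
  "occurred_at": "2017-09-27T12:00:00Z"
}
""")

print(build_query("https://doi.org/10.5555/12345678", source="twitter"))
print(sample_event["relation_type_id"], "->", sample_event["obj_id"])
```

Note that the event itself carries no interpretation: it records only that a link from a subject (a tweet, a Wikipedia page) to an object (a DOI) was observed at a point in time, which is exactly the "links, not metrics" stance Joe described.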

Next up was Jason Priem, from Impactstory and co-developer of Unpaywall and Buzzpaper, who presented readership data gathered from the use of Unpaywall. He began by describing that we are ready for a leap from viewing an online artifact as a static source of information to a period in which we can view data from the network of events surrounding online artifacts. What excited Jason at the beginning of the altmetrics movement was discovering how many people are reading a "thing." Current sources of readership data include PLOS and other publishers, COUNTER, special arrangements with publishers (MESUR), Mendeley (by proxy), and Crossref DOI resolutions. Unpaywall, a free, open-source browser extension that helps people find legal, open access copies of scholarship, now enters the picture. Unpaywall is powered by oaDOI, a database of 90M papers with DOIs and links to OA versions of ~15M papers; the oaDOI API handles between 500k and 2M requests per day.
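The oaDOI database is also exposed through Unpaywall's public REST API, which takes a DOI and returns open-access availability metadata. A minimal sketch, assuming the `/v2/{doi}` endpoint shape from Unpaywall's docs; the DOI, email, and sample response below are illustrative placeholders, not real data.

```python
import json

# Base endpoint from Unpaywall's public API docs (assumption: v2 API).
API = "https://api.unpaywall.org/v2"

def oa_lookup_url(doi, email):
    """URL that returns OA availability metadata for one DOI.

    The API asks callers to identify themselves with an email parameter.
    """
    return f"{API}/{doi}?email={email}"

# Illustrative response: a paper with one OA copy in a repository.
sample_response = json.loads("""
{
  "doi": "10.5555/12345678",
  "is_oa": true,
  "best_oa_location": {"url": "https://example.org/paper.pdf",
                       "host_type": "repository"}
}
""")

def best_oa_url(record):
    """Return the best open-access URL for a record, or None."""
    if record.get("is_oa") and record.get("best_oa_location"):
        return record["best_oa_location"]["url"]
    return None

print(oa_lookup_url("10.5555/12345678", "you@example.org"))
print(best_oa_url(sample_response))
```

This lookup is essentially what the browser extension does on each article page a user visits, which is also what generates the usage events Jason described.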

Unpaywall has 100k+ active users from 100+ countries, has captured 4.6M total events, sees 80k new usage events per day, and has tracked 2M DOIs with at least one event. Unpaywall can track events, not just counts; it offers fine time granularity and IP-derived categorizations (such as geolocation). Jason presented Unpaywall data compared with tweets: because the data can be broken down by IP address, it is possible to tell whether readers are on a university campus and to compare university readers with those outside the university. The data also gives researchers the ability to look at country, networks of co-readership, article recommendations, and prediction.

Lastly, Kornelia Junge, Senior Research Manager at Wiley, presented on navigating the space between open science and business intelligence. Kornelia started by arguing that not all research is being captured by altmetrics. She went on to state that corporations don't publish articles, they publish patents, and called for us to pay closer attention to these corporate researchers. She found that authors who published throughout an 8-year period averaged 1 article per year, while output from academics who started in 2013 is less than 1 article per 2 years. When examining corporate research, she argued that authors don't necessarily publish standalone articles; instead they may publish a report, patent, clinical trial, white paper, annotated dataset, or email, some of which take more work than an article, others less. She then asked how many pieces of information exist worldwide for corporate researchers. The problem with answering this question is that corporate researchers don't share the data, so instead she focused on how many researchers exist. Data from the UNESCO Institute for Statistics revealed researcher headcounts for 148 countries and full-time-equivalent (FTE) counts for 130; UNESCO also presents the data by sector, covering business, higher education, government, and private nonprofit. This dataset represents 97% of the world population. She found that the majority of researchers work in industry, followed by academia, and estimated that there are over 9 million researchers, with more than half of them working in corporate environments.

She then looked at what we know about this large population, arguing that corporate research must make a difference for the business, that corporate researchers need to minimize the time and effort required to produce results, and that the tools available differ between large and small companies. In addition, these researchers cannot trust unknown sources when performing research. In this corporate world, she argued, researchers are limited by nondisclosure agreements, lack independent peer review, and have no insight into competitor research; the quality of output might suffer as a result. She argued that corporate research could be more transparent, which would benefit both companies and society. Corporate researchers should be encouraged to publish more in peer-reviewed journals and to serve as reviewers and editorial board members, and corporations should sponsor research information infrastructure for society.

This was a great session involving some very interesting new tools and datasets that are being made available to the research community. In addition, Kornelia's presentation on corporate researchers shines a spotlight on a vast community of scientists who remain relatively unknown.