4:AM Hack Day retrospective

This year’s hack day was cast as a do-a-thon. Although the term ‘hackday’ (and ‘hackathon’) borrows the word ‘hack’ from computer programming (in the “messy prototype” sense), the value of a hack day isn’t in the code left strewn around at the end of the day, but in the ideas that were explored. Stacy and I hosted this year’s, and from the start we decided to broaden the appeal by encouraging anyone who had an idea, or a desire to explore one, to participate.

Over the course of the Altmetrics17 Workshop and 4:AM Conference, it became clear that many of the ideas flying around concerned societal hurdles as much as they did technical ones. Two ideas that came out particularly strongly were: “Can we fairly make assessments based on social media interaction when various social groups, in particular women, are subject to mistreatment online?” and “How do we track the outputs of people who don’t participate in ‘mainstream’ scholarly publishing norms?”

We started the day with a roster of ideas submitted by conference attendees, remote attendees, and participants in the hack day. After several rounds of votes and discussion, we whittled the submissions down to four groups:

1: Using “motionchart” to plot the Reddit data from Crossref Event Data.

2: How do you capture metrics on research outputs that don’t have a DOI?

3: What would feminist approaches to altmetrics look like?

4: What proportion of scholarly links shared on Reddit are, or have, open access versions, according to the oaDOI API?

Over the course of the day we talked, hacked, wrote and ate. Pleasingly, we had a remote participant who did some interesting work on the Reddit data.

Given the sheer number of ideas that were generated and refinements that were made, we could never have hoped to cover all of the topics initially suggested at the start of the do-a-thon. The do-a-thon working document shows all the ground we covered exploring just the four above!

Toward the end of the day, each group presented its findings. Some of these projects will grow into longer pieces of research, and some participants are already talking about the next steps. I have summarised each project below from the notes and contributions, and hope that we’ll see longer write-ups and research projects that had their roots at 4:AM.

Using “motionchart” to plot the Reddit data from Crossref Event Data

From Ola Andersson: “This is the visualization implementation of Reddit event data that Hans Zijlstra and I developed. It is displaying Reddit data taken from the Crossref Event Data. The visualization tool, motionchart, is developed by Google. Take a moment and play around with the tool components and see different ways of visualizing the event data. By looking at the source HTML code in the web browser, it is possible to see how the tool can be used with a combination of javascript and HTML.”

How do you capture altmetrics for peer-reviewed content that isn’t assigned a DOI or other identifier? How can we improve coverage as a community?

Many publications don’t have a DOI or similar identifier, and most altmetrics aggregators need some kind of identifier to accurately track when content is mentioned online. The lack of DOIs could be down to a lack of funding (in some countries, funds are needed for more basic needs), technical skills, interest, awareness or understanding.

A government-led or institution-led policy would help communicate the importance of, and possibly mandate the use of, DOIs. There is a role for DOI Registration Agencies, such as Crossref and DataCite. Crossref in particular has an active and growing community outreach effort and is working with increasing numbers of parties to assign identifiers and take on the responsibilities necessary to make them work.

There are a number of outstanding questions. Why do publishers who are aware of Crossref not assign DOIs to publications like Masters and PhD theses? Can we identify where these researchers are publishing so we can bring them on board? How feasible is it to assign DOIs to different types of output, including presentations, proceedings, and preprints, with DataCite and Crossref? The infrastructure is already in place. Could it be used more widely, and if so, why isn’t it already?

Even if general best practice were applied across the board, there are also other potentially confounding factors that should be taken into account. We need some kind of analysis of disciplines so we can try to quantify subject-specific skew. We need to talk to professional societies, agencies and researchers to get their insight about the lack of uptake. They in turn might be able to identify significant publications which may form a representative corpus of currently unregistered works.

DOIs aren’t the be-all and end-all. If there is a better, easier way to track content, for example some kind of stable URL, would that be a reasonable replacement for a DOI for this kind of tracking?

Some tentative solutions were suggested: mediating DOI registration through local country agencies, to bring the message into places that Crossref doesn’t currently reach; allowing organisations to pool resources, bringing the ability to assign DOIs to new places; and better co-ordinating the different publications or departments of a given institution, to improve consistency of approach.

This is a sizeable question, and the notes, which you can find in the working document, go into more detail.

What would feminist approaches to altmetrics look like?

Much of this group’s time was spent rereading canonical feminist theory (cf. Crenshaw’s “intersectionality” corpus) and discussing how it could apply to understanding and providing altmetrics and, relatedly, how female academics participate in engagement given the challenges of erasure and harassment endemic to online life.

We discussed some purely technical means of addressing various challenges for altmetrics: automated detection and flagging or suppression of altmetrics data that contains harassing mentions; identifying and reporting on gender parity for departments’ and organizations’ research, vis-à-vis altmetrics and the support given for online engagement training; and crowdsourcing the identification of useful and not useful (i.e. harassing) mentions in altmetrics data.

We also took an hour to investigate whether gender studies research was disproportionately the target of high-profile, trollish Twitter users (whom we won’t name here). We did this by exporting from Altmetric Explorer all papers mentioned by one such user in particular, then using a quick-and-dirty script to query the Altmetric API for the publisher-assigned subjects of the journals that published those papers.

After plotting the related subjects (seen above), we realized that gender studies research, being interdisciplinary, is often not categorized under the “gender studies” subject area by publishers, or is published in other disciplinary journals. So, while our initial approach did not bear out our hypothesis, we’re going to continue tinkering to identify the subjects most likely to be targeted by trolls who share research online.
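The subject tally described above could be sketched roughly as follows. This is not the script we ran on the day: it assumes the API responses have already been fetched and parsed into dictionaries, and the field name "subjects" is an assumption about the shape of those responses. The sample records are invented for illustration.

```python
from collections import Counter


def tally_subjects(records, field="subjects"):
    """Count how often each subject appears across a list of paper records.

    `records` is a list of dicts, one per paper. The `field` name is an
    assumption about the response shape, not a documented guarantee.
    """
    counts = Counter()
    for record in records:
        # A journal may carry several subjects, or none at all.
        for subject in record.get(field) or []:
            counts[subject.strip().lower()] += 1
    # Most frequent subjects first, ready for plotting.
    return counts.most_common()


# Invented sample data standing in for real API responses.
sample_records = [
    {"subjects": ["Medicine", "Psychology"]},
    {"subjects": ["psychology"]},
    {},                      # no subject information at all
    {"subjects": None},      # field present but empty
]
subject_counts = tally_subjects(sample_records)
```

Because the count relies on publisher-assigned journal subjects, interdisciplinary work filed under another discipline is invisible to it, which is exactly the limitation we ran into.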

What proportion of scholarly links shared on Reddit are, or have, open access versions, according to the oaDOI API?

Reddit was mentioned a few times during the conference and, although it’s not a novel source, it piqued some people’s interest. Both Altmetric.com’s Explorer and Crossref Event Data capture Reddit links to articles, so we collected data from both sources.

All Reddit links from September that could be identified as a DOI were collected from each source and submitted to the oaDOI service API. Note that this isn’t an official canonical source, and it somewhat conflates different versions of the same article. But this prototype was an interesting experiment, and there was value both in getting an approximate headline number and in connecting scholarly APIs.
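As an illustration of what “identified as a DOI” can mean in practice, here is a minimal sketch that pulls a DOI out of a link’s path with a regular expression. The pattern, the function name and the sample links are all assumptions made for this example; the data sources we used do their own, more thorough matching, and links to publisher landing pages would need a separate lookup.

```python
import re
from urllib.parse import unquote, urlparse

# "10.", a 4-9 digit registrant prefix, "/", then a suffix up to whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(url):
    """Return the lowercased DOI found in a URL's path, or None."""
    # Only the path is searched, so query strings and fragments are ignored.
    path = unquote(urlparse(url).path)
    match = DOI_PATTERN.search(path)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the surrounding text.
    return match.group(0).rstrip(".,;").lower()


# Placeholder links; the DOIs are made up for illustration.
links = [
    "https://doi.org/10.5555/example.one",
    "http://dx.doi.org/10.5555/EXAMPLE.two?utm_source=reddit",
    "https://www.example.com/news/some-story",
]
dois = [doi for doi in (extract_doi(link) for link in links) if doi]
```

Lowercasing is safe here because DOIs are case-insensitive, and it makes it easier to compare the sets of DOIs obtained from the two sources.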

The oaDOI API gave a value of “true” (when the content at the DOI was recognised as Open Access, or there was an alternative free version available), “false”, or “unknown” (when the DOI wasn’t recognised).

The headline figures: of the Reddit links from Crossref Event Data, 23% were Open Access and 76% were not. Of the links from Altmetric.com, 21% were Open Access and 58% were not, with 20% unknown. Given that we didn’t have much time to look into the detail, it was interesting to note that the data derived from the two sources generally agreed, and that most links shared on Reddit were not Open Access by any available measure.
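The tally behind figures like these can be sketched in a few lines. This is a reconstruction under assumptions, not the code from the day: it takes lookup results that have already been fetched, treats a missing result as “unknown”, and reads a boolean field called "is_oa", a name borrowed from the later Unpaywall API that may not match what the oaDOI service returned at the time. The sample data is invented.

```python
from collections import Counter

LABELS = ("true", "false", "unknown")


def classify(response):
    """Map one lookup result to 'true', 'false' or 'unknown'."""
    # No response, or no OA flag, means the DOI was not recognised.
    if response is None or "is_oa" not in response:
        return "unknown"
    return "true" if response["is_oa"] else "false"


def tally_percentages(responses):
    """Return the whole-number percentage of DOIs in each category."""
    counts = Counter(classify(r) for r in responses.values())
    total = sum(counts.values())
    if total == 0:
        return {label: 0 for label in LABELS}
    return {label: round(100 * counts[label] / total) for label in LABELS}


# Invented lookup results keyed by placeholder DOI.
sample_responses = {
    "10.5555/example.one": {"is_oa": True},
    "10.5555/example.two": {"is_oa": False},
    "10.5555/example.three": {"is_oa": False},
    "10.5555/example.four": None,
}
percentages = tally_percentages(sample_responses)
```

Rounding each category separately means the three percentages need not sum to exactly 100.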

This gave rise to some further conversations about possible hypotheses.

One suggestion was that the people sharing the links in question were members of institutions with access. If this were true, it would mean that Reddit, which is a very large site with many members and specialised communities (“subreddits”), had cliques of people discussing articles that only they had access to and the broader public did not. This raised questions about the behaviour of people using a public medium to discuss “private” content, challenging the idea of “citizen scientists”, suggesting instead closed communities.

Another suggestion was that much of the posting might be attributable to promotional link spam from publishers, for whom a link on Reddit is valuable and carries no opportunity cost.

Both hypotheses could be tested by looking at the pool of authors who post links, the degree of discussion attached to each link, and the cliquishness of the network of people who participate in these discussions. These could be measured against the license of the content being discussed.
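A first, very rough cut at such a test might compare Open Access and non-Open Access links by how many distinct people post them and how much discussion they attract. The sketch below does only that; it does not attempt to measure the cliquishness of the discussion network. The record fields ("author", "is_oa", "num_comments") and the sample posts are hypothetical, chosen for illustration rather than taken from either data source.

```python
from collections import defaultdict


def summarise_by_access(posts):
    """Summarise poster diversity and discussion, grouped by OA status."""
    groups = defaultdict(list)
    for post in posts:
        groups[post["is_oa"]].append(post)

    summary = {}
    for is_oa, group in groups.items():
        posters = {post["author"] for post in group}
        summary[is_oa] = {
            "posts": len(group),
            "distinct_posters": len(posters),
            "mean_comments": sum(p["num_comments"] for p in group) / len(group),
        }
    return summary


# Invented posts; field names are assumptions for this sketch.
sample_posts = [
    {"author": "a", "is_oa": False, "num_comments": 0},
    {"author": "a", "is_oa": False, "num_comments": 0},
    {"author": "b", "is_oa": False, "num_comments": 6},
    {"author": "c", "is_oa": True, "num_comments": 4},
]
summary = summarise_by_access(sample_posts)
```

A handful of accounts posting many links that attract no comments would be consistent with the link spam suggestion, while lively discussion among a small, recurring set of posters would point more towards closed communities. Neither pattern would be conclusive on its own.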

Also notable was Bianca Kramer’s remote participation in our group’s topic, which she documented at https://github.com/bmkramer/4amhack_reddit_OADOI

The discussion document is available here, and is both more detailed and more fragmented!

See you next year!

Overall, we had a lot of fun (and, based on the feedback we’ve received from do-a-thon participants, it seems they did too!). We look forward to continuing our exploration at the 5:AM do-a-thon!

We’d like to thank the conference organisers and the Social Media Lab at the Ted Rogers School of Management at Ryerson University for hosting us and giving us a great opportunity to dig into these ideas.

Joe Wass & Stacy Konkiel