Research data repositories suitable for LSHTM researchers

Publishing data to accompany your journal articles

Data sharing can help you to increase the impact of your research. Studies by Piwowar, Day and Fridsma (2007) and Piwowar and Vision (2013) have found that journal articles with accompanying data receive more citations in comparison to those with no accompanying data, and that data is often used in new research, leading to the original creators being cited in data reuse papers. In this blog post I’ll discuss how you can publish resources – data, processing scripts, code and other material – with journals and digital repositories, and consider new opportunities offered by data journals.

Option 1: Publish open resources as supplemental material

The first and simplest approach is to make resources available as supplementary information with the paper itself. Depending upon the journal, these files may be hosted on the journal’s own server or on a repository service such as Dryad or Figshare, and referenced in the paper.

Journal supplementary resources

This approach is appropriate for resources that can be made openly available, and it saves time, since journal staff handle the file hosting on the authors’ behalf. However, there are limitations: some journals restrict the size of the files that they will host on their server (PLOS indicates that files should be a maximum of 10MB), and at present few journals assign a Digital Object Identifier (DOI) to these resources. Authors wishing to cite data as a research output in its own right are therefore forced to cite the URL, or to cite the resource as a subset of the paper.

Option 2: Deposit open/restricted resources with a digital repository

A second approach is to upload your resources to a digital repository and cite the Digital Object Identifier (DOI) in the article. Many researchers view digital repositories simply as file hosting services, but they provide functionality beyond file storage. Many repositories provide content preservation, ensuring that content remains usable in the long term by converting it into new file formats and keeping documentation up to date. They also publish structured XML metadata describing the resource’s content, which makes it easier for researchers to locate new content relevant to their research through research data catalogues and other third-party services.

Journals that encourage authors to publish resources in a digital repository are often non-prescriptive about which one should be used, simply stating that it should be able to assign a persistent identifier (such as a DOI) and maintain the content for at least 10 years. However, they do set conditions on how resources are made available, encouraging authors to publish them as open access under a Creative Commons Zero (CC0) or Creative Commons Attribution (CC-BY) licence, where feasible. The following image illustrates the data deposit process.

Data deposit process

LSHTM staff, students and their collaborators are allowed to deposit digital resources with the School’s research data repository, LSHTM Data Compass. In addition to providing access to open data, the repository offers a controlled access mechanism for restricted data, whereby interested parties can register their interest in a dataset and open a dialogue with the data custodian/corresponding author to gain access.

Other appropriate services can be found through the Registry of Research Data Repositories. These include domain-specific archives such as the UK Data Service, content-specific systems such as Flow Repository and PlasmoDB, and the many general repositories such as Dryad, Figshare and Zenodo.

Citing digital resources in your journal article

Journals often have specific requirements on how digital resources are cited in papers. Consult the LSHTM RDM pages for guidance on writing a Data Access Statement and citing data in your Reference List. A small number of journals prohibit all forms of non-paper citation. In these circumstances, it’s possible to cite data indirectly by publishing a Data Paper which describes the digital resource.
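As a rough illustration of what a data citation in a Reference List typically contains, the sketch below assembles one from a handful of metadata fields. The field names, the example dataset and the DOI are all made up for illustration, and the "Creators (Year). Title. Publisher. DOI" pattern is just one common layout; the journal’s own citation requirements always take precedence.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical minimal record of a deposited dataset (illustration only)."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # the repository that holds the data
    doi: str        # bare DOI, without the https://doi.org/ prefix


def format_data_citation(ds: Dataset) -> str:
    """Build a citation of the form 'Creators (Year). Title. Publisher. DOI URL'."""
    authors = "; ".join(ds.creators)
    return f"{authors} ({ds.year}). {ds.title}. {ds.publisher}. https://doi.org/{ds.doi}"


# Placeholder values: this dataset and DOI do not exist.
example = Dataset(
    creators=["Smith, J.", "Jones, A."],
    year=2016,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example.5678",
)

print(format_data_citation(example))
```

Writing the DOI as a full https://doi.org/ link means readers can follow the citation straight to the repository record.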

Taking the next step: Publishing a data paper

If you’ve deposited data in a digital repository, you can also publish a Data Paper that builds upon this work. A data paper, sometimes referred to as a Data Descriptor or Data Note, is a new form of scholarly publication which describes the information content and reuse potential of one or more data collections. Its purpose is to build awareness and use of data made available to the wider research community, rather than to discuss a research argument or hypothesis. Data creators who publish a data paper will gain academic credit by having their paper published in a peer-reviewed journal.

Data papers may be published in several locations: journals such as Biodiversity Data Journal, F1000Research, PLOS and GigaScience accept several types of article, including data papers. In addition, there is a small but growing number of data journals in existence, including the Journal of Open Health Data, Journal of Open Research Software, and Nature’s Scientific Data. Each journal provides a document template and guidance material that describes the key areas for authors to cover. The final publications are typically made available in an XML format for web presentation and as a PDF document. Article Processing Charges (APCs) for publication vary significantly, ranging from £100 for Open Health Data to £890 for Scientific Data. However, in all cases (so far) the data journal has followed open access principles, publishing the paper under a Creative Commons Attribution licence, whereby the author retains copyright but grants first publication rights to the journal.

Further advice

Data-based publications are a new and developing area. If you would like to deposit data with a digital repository or publish a data paper, you can contact the LSHTM Research Data Management Service for advice.
