This post introduces the steps that academics and non-academics need to make their computational analysis in R citable.
Although there are abundant resources about licensing and citation for R software packages, less attention is paid to making non-package (data science) code in R citable. Academics and researchers who want to embrace Open Science practices are often unaware of how to make their R code citable before publishing in academic journals, or of which license they can use to protect the intellectual property of their work.
Books and published journal articles have long been assigned DOIs (digital object identifiers), a key element of research and academic discourse, but less attention is paid to unpublished computational analysis, which is usually abandoned on GitHub (or on someone else's hard drive).
As researchers, we often spend a long time planning and designing our projects and collecting and processing data, but we often can't publish all the results and analyses if journal reviewers deem them uninteresting.
As a researcher, if I find an interesting analysis in a GitHub repository (which I always do), I can’t cite it as:
From github.com/BatoolMM/MetagenomicsAnalysis
This is because the repository's contents keep changing, the URL is unstable, and there's no metadata (e.g. author, date, …) associated with the repository. Therefore, it's best practice to generate a DOI and attach metadata plus a license to the repository, which yields a citation similar to:
Batool Almarzouq. (2021, June). Metagenomic analysis to the soil in Saudi Arabia. Zenodo. doi.org/10.5281/zenodo.4942110
A digital object identifier (DOI) is a persistent, unique identifier that permanently identifies a dataset, a piece of software, an article, or a document and links to it on the web. DOIs are designed so that links don't break when a website gets updated.
DOIs are generated by publishing organizations or open-access repositories such as Zenodo. Zenodo is a general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN. The previous citation was produced by Zenodo. “Zenodo helps researchers receive credit by making the research results citable. Citation information is also passed to DataCite and onto the scholarly aggregators” 1.
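To make the idea concrete, here is a minimal base R sketch that turns a DOI into its resolver link. The helper name `doi_to_url()` and the loose pattern it checks are my own illustrative assumptions, not part of Zenodo or any package; the DOI is the one from the example citation above.

```r
# Hypothetical helper (not from any package): normalise a DOI and build
# the persistent https://doi.org/ link that resolves to the record.
doi_to_url <- function(doi) {
  # Drop an optional resolver prefix such as "doi.org/" or "https://doi.org/"
  bare <- sub("^(https?://)?(dx\\.)?doi\\.org/", "", trimws(doi))
  # Loose sanity check (an assumption, not the full DOI syntax):
  # "10.", a numeric registrant code, a slash, then a non-empty suffix
  if (!grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", bare)) {
    stop("Not a recognisable DOI: ", doi)
  }
  paste0("https://doi.org/", bare)
}

doi_to_url("doi.org/10.5281/zenodo.4942110")
#> [1] "https://doi.org/10.5281/zenodo.4942110"
```

A plain repository URL fails the check, which mirrors the point made earlier: a GitHub link on its own is not a persistent identifier.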
I can't stress enough how important it is to add a license to any project you initiate. The license outlines how other researchers can use your data or analysis. Without a license, others have no permission to reuse the code, even if it has been publicly posted on GitHub. Adding a license to any R project is made extremely easy by the usethis package. Most package developers are familiar with usethis, which can also be extremely useful for non-package projects. You can add a license using a single line:
usethis::use_mit_license("My Name")
#> ✓ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
#> ✓ Writing 'LICENSE'
#> ✓ Writing 'LICENSE.md'
#> ✓ Adding '^LICENSE\\.md$' to '.Rbuildignore'
There are many types of licenses, but they are not the main focus of this article. You can read more about them at the Open Source Initiative.
It is best practice to use Git or another version control system when doing any kind of computational analysis. A version control system (VCS) allows you to track the iterative changes you make to your code or project. If you are not familiar with Git or with GitHub (https://github.com), one of its online hosting sites, I'd recommend that you go through this Carpentries lesson, which introduces Git to novice coders.
Again, there are abundant resources and tools for using Git within R, one of which is the usethis package. You can read more about it here.
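For a non-package analysis project, the same call can be tried safely in a throwaway folder first. The sketch below assumes usethis 2.0 or later is installed; the folder name `my-analysis` is a placeholder. As far as I can tell, when there is no DESCRIPTION file usethis skips the DESCRIPTION and .Rbuildignore steps shown above and only writes LICENSE.md, but it is worth checking the listed files on your own setup.

```r
library(usethis)

# Placeholder project folder inside the session's temporary directory
demo_project <- file.path(tempdir(), "my-analysis")
dir.create(demo_project, showWarnings = FALSE)

# force = TRUE lets usethis treat a plain folder as the active project
with_project(demo_project, force = TRUE, code = {
  use_mit_license("My Name")
})

# Inspect what was written
list.files(demo_project, all.files = TRUE, no.. = TRUE)
```

In a real analysis you would skip the temporary folder and run the license call once from the root of your own project.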
This is the step where you generate the DOI. You can use zen4R to create the DOI from R/RStudio. This package, created by Emmanuel Blondel, provides an interface to the Zenodo e-infrastructure API. The required steps are explained in this wiki, but you can also do the same thing from Zenodo itself. You start by logging in to Zenodo with your GitHub account, then create a release of your GitHub repository.
In three simple steps within Zenodo, you can link GitHub and generate a DOI. A very good tutorial by the Carpentries is available here.
Add your citation to CITATION.md or README.md in your GitHub repository. You can also copy a DOI badge from Zenodo into your README.md. Either way, you must add the DOI to the GitHub repository.
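As a rough sketch of this last step, base R can assemble the citation and a DOI badge and write them to CITATION.md. It reuses the example Zenodo record from earlier; the citation key, the `howpublished` field, the badge URL pattern (modelled on the badge Zenodo displays for a DOI), and the temporary output folder are all assumptions to adapt to your own repository.

```r
# DOI of the archived analysis (the example record cited earlier)
doi <- "10.5281/zenodo.4942110"

# Describe the record with bibentry() from the utils package (ships with R)
entry <- bibentry(
  bibtype      = "Misc",
  key          = "almarzouq2021metagenomic",  # illustrative key
  author       = person("Batool", "Almarzouq"),
  title        = "Metagenomic analysis to the soil in Saudi Arabia",
  year         = "2021",
  howpublished = "Zenodo",
  doi          = doi
)

# Markdown badge linking to the DOI (assumed Zenodo badge URL pattern)
badge <- sprintf(
  "[![DOI](https://zenodo.org/badge/DOI/%s.svg)](https://doi.org/%s)",
  doi, doi
)

# Written to a temporary folder here; point this at your repository root instead
citation_file <- file.path(tempdir(), "CITATION.md")
writeLines(
  c(
    "# Citation", "",
    badge, "",
    "Please cite this analysis as:", "",
    format(entry, style = "text"), "",
    "BibTeX:", "",
    paste0("    ", toBibtex(entry))
  ),
  citation_file
)

# Print the generated file to check it before committing
cat(readLines(citation_file), sep = "\n")
```

Keeping the DOI in a single variable means the badge, the plain-text citation, and the BibTeX entry cannot drift apart when you publish a new release.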
This way, your research outputs can be indexed, cited, and tracked, giving certainty to your scientific work.
This article was inspired by a talk about Open Data by Esther Plomp in the Open Life Science Program, Cohort 3. This is a link to the talk on YouTube with captions.
1. Find out more about Zenodo on their website.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/BatoolMM/Batool-s-Blabber/blob/master/_posts/2021-06-23-make-your-computational-analysis-citable/make-your-computational-analysis-citable.Rmd , unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Almarzouq (2021, March 18). Batool's Blabber: Make Your Computational Analysis Citable. Retrieved from https://batool-blabber.netlify.app/posts/2021-06-23-make-your-computational-analysis-citable/
BibTeX citation
@misc{almarzouq2021make,
  author = {Almarzouq, Batool},
  title = {Batool's Blabber: Make Your Computational Analysis Citable},
  url = {https://batool-blabber.netlify.app/posts/2021-06-23-make-your-computational-analysis-citable/},
  year = {2021}
}