rOpenSci aims to support packages that enable reproducible research and management of the data lifecycle for scientists. Packages submitted to rOpenSci should fit into one or more of the following categories. If you are unsure whether your package fits into one of these categories, please open an issue as a pre-submission inquiry (Examples).
As this is a living document, these categories may change over time, and not all previously onboarded packages would be in-scope today. While we strive to be consistent, we evaluate packages on a case-by-case basis and may make exceptions.
data retrieval: Packages for accessing and downloading data from online sources with scientific applications. Our definition of scientific applications is broad, including data storage services, journals, and other remote servers, as many data sources may be of interest to researchers. However, retrieval packages should be focused on data sources or topics, rather than services. For example, a general client for Amazon Web Services data storage would not be in-scope. (Examples: rotl, gutenbergr)
data extraction: Packages that aid in retrieving data from unstructured sources such as text, images and PDFs, as well as parsing scientific data types and outputs from scientific equipment. Statistical/ML libraries for modeling or prediction are typically not included in this category, but trained models that act as utilities (e.g., for optical character recognition) may qualify. (Examples: tabulizer, robotstxt, genbankr)
database access: Bindings and wrappers for generic database APIs. (Example: rrlite)
data munging: Packages for processing data from the formats above. This area does not include broad data manipulation tools such as reshape2 or tidyr, but rather tools for handling data in specific scientific formats. (Example: plateR)
data deposition*: Packages that support deposition of data into research repositories, including data formatting and metadata generation. (Example: EML)
reproducibility*: Tools that facilitate reproducible research. This includes packages that facilitate use of version control, provenance tracking, automated testing of data inputs and statistical outputs, and citation of software and scientific literature. It does not include general tools for literate programming (e.g., R markdown extensions not under the previous topics). (Example: assertr)
In addition, we have some specialty topics with a slightly broader scope.
text analysis: We are currently piloting a sub-specialty area for text analysis, which includes implementations of statistical/ML methods for analyzing or extracting text data. This area covers only implementations or wrappers of previously published methods, not packages that introduce new methods. As this is a pilot, the scope for this area is not fully defined and we are still developing a reviewer base and process for it. Please open a pre-submission inquiry if you are considering submitting a package that falls under this topic.
Packages should be general in that they should solve a problem as broadly as possible while maintaining a coherent user interface and code base. For instance, if several data sources use an identical API, we prefer a package that provides access to all of those data sources, rather than just one.
Here are some types of packages we are unlikely to accept:
We encourage authors of packages not accepted to rOpenSci to submit them to CRAN, Bioconductor, or other R package development initiatives (e.g., cloudyr), and to software journals such as JOSS, JSS, or the R Journal.
Note that not all packages developed internally by rOpenSci or through our events or collaborations are in-scope for the onboarding process.
rOpenSci encourages competition among packages, forking, and re-implementation, as they improve options for users overall. However, as we want packages in the rOpenSci suite to be our top recommendations for the tasks they perform, we aim to avoid duplicating the functionality of existing R packages in any repo without significant improvements. An R package that replicates the functionality of an existing R package may be considered for inclusion in the rOpenSci suite if it significantly improves on alternatives in any repository (RO, CRAN, BioC) by being:
These factors should be considered as a whole to determine if the package is a significant improvement. A new package would not meet this standard only by following our package guidelines while others do not, unless this leads to a significant difference in the areas above.
We recommend that packages highlight differences from and improvements over overlapping packages in their README and/or vignettes.
We encourage developers whose packages are not accepted due to overlap to still consider submitting them to other repositories or journals.
Package authors will continue to maintain and develop their software after acceptance into rOpenSci. Unless explicitly added as collaborators, rOpenSci's staff will not interfere much with day-to-day operations. However, staff may intervene with critical bug fixes or address urgent issues if package authors do not respond in a timely manner.
Authors of contributed packages essentially maintain the same ownership they had prior to their package joining the rOpenSci suite. Contributors will have write access to their repositories, but will need an rOpenSci staff member to add any new contributors.
In the unlikely scenario that a contributor of a package requests removal of their package from the suite, we retain the right to maintain a version of the package in our suite for archival purposes.
rOpenSci strives to develop and promote high quality research software. To ensure that your software meets our criteria, we review all of our submissions as part of the onboarding process, and even after acceptance will continue to chime in with improvements and bug fixes.
Despite our best efforts to support contributed software, errors are the responsibility of individual maintainers. Buggy, unmaintained software may be removed from our suite at any time.
If package maintainers do not respond in a timely manner to requests for package fixes from CRAN or from us, we will remind the maintainer a number of times, but after 3 months (or a shorter time frame, depending on how critical the fix is) we will make the changes ourselves.
The above is a bit vague, so the following are a few areas of consideration.
foo is depended on by 1 or more packages on CRAN, and
foo is broken, and thus would break its reverse dependencies.
bar may not have reverse dependencies on CRAN, but is widely used, thus quickly fixing problems is of greater importance.
hello is not on CRAN, or is on CRAN but has no reverse dependencies.
world needs some fixes. The maintainer has responded but is simply very busy with a new job, or for another reason, and will attend to the fixes soon.
We urge package maintainers to make sure they are receiving GitHub notifications, and that emails from rOpenSci staff and CRAN maintainers are not going to their spam folder. In addition, we encourage maintainers to join the rOpenSci Slack (https://ropensci.signup.team/) to chat with rOpenSci staff and the greater rOpenSci community.
Should authors abandon the maintenance of an actively used package in our suite, we will consider petitioning CRAN to transfer package maintainer status to rOpenSci.