The rewards of reusable machine learning code
Research papers can make a long-lasting impact when the code and software tools supporting the findings are made readily available and can be reused and built on. Our reusability reports
explore and highlight examples of good code sharing practices.

The availability of increasingly advanced deep learning tools in the past two decades has transformed scientific research in
most, if not all, disciplines. ‘AI for Science’ is a popular theme at key machine learning conferences such as NeurIPS and ICML and in academic as well as industry research labs. Example
application areas include protein design, materials discovery, precision medicine, quantum computing and analysis of complex dynamical systems such as in engineering or climate modelling.
Research efforts that involve well-designed machine learning tools can be of long-lasting value to the research community and beyond, provided that the methods, datasets and code are clearly
described and shared.

In recent years, we have observed a clear rise in standards regarding availability of code and data in submitted manuscripts, which is good news for open science and
reproducibility. Our editorial policies mandate that authors share code used to produce results that are central to the main claims. Code should be made available to referees during the peer
review process, and then released publicly upon publication. We ask referees to review the code and, if possible, to try to run it and reproduce the findings in a paper. To facilitate this
process, authors have the option to upload their code in the form of executable compute capsules via the Code Ocean platform. This enables referees to access code without needing to install
various libraries or software packages. Accessing Code Ocean compute capsules that accompany a manuscript has been simplified, as weblinks are now integrated in our online manuscript
system. Similar weblinks are also provided for uploading data to the Figshare repository.

The availability of code and data, and the reproducibility of the main research findings, are important
goals during the peer review process; however, the gold standard is ensuring that code can be re-implemented, extended and reused by other researchers on different datasets and applications.
Making code fully reusable can be a tall order as research groups may not have the resources to develop and maintain well-structured code repositories and accompanying documentation. At a
minimum, we expect authors to provide a clear README file to describe the code and its intended uses, an overview of dependencies, and some example data. If applicable, a description of the
pre-training process and a pre-trained model should be provided. Furthermore, authors should provide a license to explain terms of use and redistribution and mint a digital object identifier
(DOI) to ensure that a permanent version exists that is associated with the published paper.

To highlight the value of high-quality code developments, we introduced an article format in
2020 known as ‘reusability reports’1, which are dedicated to testing robustness, extendability and reusability of previously published code. So far, 12 reusability reports have been
published and we are encouraged by the consistently positive feedback from authors and referees. We regularly send out invitations to write a reusability report linked to selected accepted
papers that have promising code development. But we are also happy to receive proposals for such articles, which can be linked to papers published in _Nature Machine Intelligence_ or
elsewhere. However, for critical comments on articles that highlight issues with reproducibility or other technical problems, authors should refer to the Matters Arising format. Reusability
reports undergo peer review and count as primary research articles. Since last year, reusability reports have been based on our regular research article type to allow more space and ensure a
similar editorial and peer review process. In contrast to regular Articles, we do not assess novelty in reusability reports but instead examine whether reusability is tested in technically
correct and interesting ways, and whether clear value is added with respect to the original article.

To highlight some examples, in a reusability report in this issue, Tao Xu et al. test a
recent bilinear attention model for predicting drug–target interactions, with adaptability across domains. Xu et al. study and highlight this generalization capability of the model and also
apply the method on a task not explored in the original publication — the prediction of cell line–drug responses. In another example2, Yingying Cao et al. tested and re-used code from
PENCIL, a supervised method to identify cell populations with specific phenotypes from single-cell RNA data. They found that the method can be combined with an approach called gene set
variation analysis to predict responses to immune checkpoint blockade therapy in several skin cancer datasets2. As a last example, Yuhe Zhang et al.3 revisited a deep learning method that
uses a parameterized physical forward model to reconstruct holographic images and extended it to non-perfect optical systems (for example, subjected to noise or blurring) by incorporating
system-specific response functions in the forward propagator.

The development of good quality code and software, which can be re-implemented and extended to new data, even beyond the
original scope, can catalyse further research and inspire new directions. As our colleagues at _Nature Computational Science_ wrote a few years ago4, efforts in code and software development
deserve more praise and recognition, starting with ensuring that code is clearly shared, discoverable and citable. As we expand our series of reusability reports, we hope to contribute
to a virtuous circle of high standards in code development and sharing, and more recognition and reward for such efforts.

REFERENCES

1. _Nat. Mach. Intell._ 2, 729 (2020).
2. Cao, Y., Chang, T. G., Sahni, S. & Ruppin, E. _Nat. Mach. Intell._ 6, 307–314 (2024).
3. Zhang, Y., Ritschel, T. & Villanueva-Perez, P. _Nat. Mach. Intell._ 6, 284–290 (2024).
4. _Nat. Comput. Sci._ 1, 89 (2021).

CITE THIS ARTICLE

The rewards of reusable machine learning code. _Nat Mach Intell_ 6, 369 (2024). https://doi.org/10.1038/s42256-024-00835-5

Published: 24 April 2024
Issue Date: April 2024