The rewards of reusable machine learning code

Nature


Research papers can make a long-lasting impact when the code and software tools supporting the findings are made readily available and can be reused and built on. Our reusability reports explore and highlight examples of good code sharing practices.

The availability of increasingly advanced deep learning tools in the past two decades has transformed scientific research in most, if not all, disciplines. ‘AI for Science’ is a popular theme at key machine learning conferences such as NeurIPS and ICML and in academic as well as industry research labs. Example application areas include protein design, materials discovery, precision medicine, quantum computing and analysis of complex dynamical systems such as in engineering or climate modelling.


Research efforts that involve well-designed machine learning tools can be of long-lasting value to the research community and beyond, provided that the methods, datasets and code are clearly described and shared. In recent years, we have observed a clear rise in standards regarding availability of code and data in submitted manuscripts, which is good news for open science and reproducibility. Our editorial policies mandate that authors share code used to produce results that are central to the main claims. Code should be made available to referees during the peer review process, and then released publicly upon publication. We ask referees to review the code and, if possible, to try to run it and reproduce the findings in a paper.

To facilitate this process, authors have the option to upload their code in the form of executable compute capsules via the Code Ocean platform. This enables referees to access code without needing to install various libraries or software packages. Accessing Code Ocean compute capsules that accompany a manuscript has been simplified, as weblinks are now integrated in our online manuscript system. Similar weblinks are also provided for uploading data to the Figshare repository.

The availability of code and data, and the reproducibility of the main research findings, are important goals during the peer review process; however, the gold standard is ensuring that code can be re-implemented, extended and reused by other researchers on different datasets and applications.
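As a rough illustration of what basic availability can look like in practice, the following Python sketch checks a code repository for a few artefacts that commonly support reuse: a README, a license, a dependency listing and example data. The file-name patterns and the function names `check_repository` and `missing_items` are assumptions made for this example; they are not part of any journal policy or existing tool, and real repositories may follow other conventions.

```python
from pathlib import Path

# Hypothetical file-name conventions, chosen for illustration only.
# They are assumptions, not requirements stated by the journal.
CHECKS = {
    "readme": ["README*"],
    "license": ["LICENSE*", "LICENCE*", "COPYING*"],
    "dependencies": [
        "requirements*.txt",
        "environment.yml",
        "environment.yaml",
        "pyproject.toml",
        "setup.py",
    ],
    "example_data": ["data", "examples", "example_data"],
}


def check_repository(root):
    """Report, per item, whether a matching entry exists directly under root."""
    root = Path(root)
    return {
        item: any(any(root.glob(pattern)) for pattern in patterns)
        for item, patterns in CHECKS.items()
    }


def missing_items(root):
    """Return the sorted names of items with no matching file or folder."""
    report = check_repository(root)
    return sorted(name for name, present in report.items() if not present)


if __name__ == "__main__":
    # Check the current working directory and list what appears to be absent.
    for name in missing_items("."):
        print(f"missing: {name}")
```

A check like this only confirms that files exist; it says nothing about whether the documentation is clear or whether the code actually runs, which still requires a human reviewer.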


Making code fully reusable can be a tall order as research groups may not have the resources to develop and maintain well-structured code repositories and accompanying documentation. At a minimum, we expect authors to provide a clear README file to describe the code and its intended uses, an overview of dependencies, and some example data. If applicable, a description of the pre-training process and a pre-trained model should be provided. Furthermore, authors should provide a license to explain terms of use and redistribution and mint a digital object identifier (DOI) to ensure that a permanent version exists that is associated with the published paper.

To highlight the value of high-quality code developments, we introduced an article format in 2020 known as ‘reusability reports’ (ref. 1), which are dedicated to testing robustness, extendability and reusability of previously published code. So far, 12 reusability reports have been published and we are encouraged by the consistently positive feedback from authors and referees. We regularly send out invitations to write a reusability report linked to selected accepted papers that have promising code development. But we are also happy to receive proposals for such articles, which can be linked to papers published in _Nature Machine Intelligence_ or elsewhere. However, for critical comments on articles that highlight issues with reproducibility or other technical problems, authors should refer to the Matters Arising format.

Reusability reports undergo peer review and count as primary research articles. Since last year, reusability reports have been based on our regular research article type to enable more space and ensure a similar editorial and peer review process. In contrast to regular Articles, we do not assess novelty in reusability reports but instead examine whether reusability is tested in technically correct and interesting ways, and whether clear value is added with respect to the original article.

To highlight some examples, in a reusability report in this issue, Tao Xu et al. test a recent bilinear attention model for predicting drug–target interactions, with adaptability across domains. Xu et al. study and highlight this generalization capability of the model and also apply the method to a task not explored in the original publication: the prediction of cell line–drug responses. In another example (ref. 2), Yingying Cao et al. tested and reused code from PENCIL, a supervised method to identify cell populations with specific phenotypes from single-cell RNA data. They found that the method can be combined with an approach called gene set variation analysis to predict responses to immune checkpoint blockade therapy in several skin cancer datasets (ref. 2). As a last example, Yuhe Zhang et al. (ref. 3) revisited a deep learning method that uses a parameterized physical forward model to reconstruct holographic images and extended it to non-perfect optical systems (for example, subjected to noise or blurring) by incorporating system-specific response functions in the forward propagator.

The development of good quality code and software, which can be re-implemented and extended to new data, even beyond the original scope, can catalyse further research and inspire new directions. As our colleagues at _Nature Computational Science_ wrote a few years ago (ref. 4), efforts in code and software development deserve more praise and recognition, to start with, by ensuring that code is clearly shared, discoverable and citable. As we expand our series of reusability reports, we hope to contribute to a virtuous circle of high standards in code development and sharing, and more recognition and reward for such efforts.

REFERENCES

1. _Nat. Mach. Intell._ 2, 729 (2020).
2. Cao, Y., Chang, T. G., Sahni, S. & Ruppin, E. _Nat. Mach. Intell._ 6, 307–314 (2024).
3. Zhang, Y., Ritschel, T. & Villanueva-Perez, P. _Nat. Mach. Intell._ 6, 284–290 (2024).
4. _Nat. Comput. Sci._ 1, 89 (2021).

ABOUT THIS ARTICLE

Cite this article: The rewards of reusable machine learning code. _Nat Mach Intell_ 6, 369 (2024). https://doi.org/10.1038/s42256-024-00835-5

* Published: 24 April 2024
* Issue Date: April 2024
* DOI: https://doi.org/10.1038/s42256-024-00835-5

