Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts

Nature


ABSTRACT The study of structure–spectrum relationships is essential for spectral interpretation, impacting structural elucidation and material design. Predicting spectra from molecular


structures is challenging due to their complex relationships. Here we introduce NMRNet, a deep learning framework using the SE(3) Transformer for atomic environment modeling, following a


pretraining and fine-tuning paradigm. To support the evaluation of nuclear magnetic resonance chemical shift prediction models, we have established a comprehensive benchmark based on


previous research and databases, covering diverse chemical systems. Applying NMRNet to these benchmark datasets, we achieve competitive performance in both liquid-state and solid-state


nuclear magnetic resonance datasets, demonstrating its robustness and practical utility in real-world scenarios. Our work helps to advance deep learning applications in analytical and


structural chemistry.


DATA AVAILABILITY Source data for Fig. 2 and Extended Data Figs. 1, 2 and 4 are available with this Brief Communication. All structural datasets used for


pretraining are publicly accessible. The Aflow dataset (ref. 32) is available at https://aflowlib.org/, the Materials Project dataset (ref. 34) is accessible at https://next-gen.materialsproject.org/ and


the CSD dataset (ref. 33) is accessible at https://www.ccdc.cam.ac.uk/. All processed NMR datasets used for fine tuning are available via Zenodo at https://doi.org/10.5281/zenodo.13317524 (ref. 44).
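
As a rough illustration of how the Zenodo deposit mentioned here could be inspected programmatically, the sketch below turns a Zenodo DOI into a metadata URL and lists the files in a record. It rests on several assumptions that are not stated in the article: the helper names are hypothetical, the numeric DOI suffix is taken to be the record ID accepted by Zenodo's public `/api/records/` endpoint, and the returned JSON is assumed to list files under `files` with `key` and `size` fields. The network call is defined but not executed.

```python
import json
import re
from urllib.request import urlopen

# Assumed base URL of Zenodo's public records endpoint.
ZENODO_API = "https://zenodo.org/api/records/"


def record_id_from_doi(doi: str) -> str:
    """Extract the numeric record ID from a Zenodo DOI or DOI URL."""
    match = re.search(r"10\.5281/zenodo\.(\d+)$", doi.strip())
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return match.group(1)


def record_api_url(doi: str) -> str:
    """Build the metadata URL for the record behind a Zenodo DOI."""
    return ZENODO_API + record_id_from_doi(doi)


def list_files(record: dict) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) pairs from record metadata.

    Assumption: each entry of record["files"] has "key" and "size" fields.
    """
    return [(entry["key"], entry["size"]) for entry in record.get("files", [])]


def fetch_record(doi: str) -> dict:
    """Download record metadata (needs network access; not called here)."""
    with urlopen(record_api_url(doi), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # DOI of the processed NMR datasets cited in the text (ref. 44).
    print(record_api_url("https://doi.org/10.5281/zenodo.13317524"))
```

Calling `list_files(fetch_record(doi))` would then give a quick overview of what the deposit contains before downloading anything, provided the field names assumed above match the live response.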


CODE AVAILABILITY The NMRNet code is available via GitHub at https://github.com/Colin-Jay/NMRNet and via Zenodo at https://doi.org/10.5281/zenodo.14741405 (ref. 45) under an open-source


license. The trained model parameters are available via Zenodo at https://doi.org/10.5281/zenodo.13317524 (ref. 44). A demo notebook of NMRNet is available at


https://bohrium.dp.tech/notebooks/38356712597, and an online service is available at https://ai4ec.ac.cn/apps/nmrnet and https://bohrium.dp.tech/apps/nmrnet001.


REFERENCES

1. Xue, X. et al. Advances in the application of artificial intelligence-based spectral data interpretation: a perspective. _Anal. Chem._ 95, 13733–13745 (2023).
2. Lu, X.-Y. et al. Deep learning-assisted spectrum–structure correlation: state-of-the-art and perspectives. _Anal. Chem._ 96, 7959–7975 (2024).
3. Hu, G. & Qiu, M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. _Nat. Prod. Rep._ 40, 1735–1753 (2023).
4. Smith, S. G. & Goodman, J. M. Assigning stereochemistry to single diastereoisomers by GIAO NMR calculation: the DP4 probability. _J. Am. Chem. Soc._ 132, 12946–12959 (2010).
5. Tsai, Y.-H. et al. ML-_J_-DP4: an integrated quantum mechanics–machine learning approach for ultrafast NMR structural elucidation. _Org. Lett._ 24, 7487–7491 (2022).
6. Jonas, E., Kuhn, S. & Schlörer, N. Prediction of chemical shift in NMR: a review. _Magn. Reson. Chem._ 60, 1021–1031 (2022).
7. Cortés, I., Cuadrado, C., Hernández Daranas, A. & Sarotti, A. M. Machine learning in computational NMR-aided structural elucidation. _Front. Nat. Prod._ 2, 1122426 (2023).
8. Gerrard, W. et al. Impression–prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy. _Chem. Sci._ 11, 508–515 (2020).
9. Yang, Z., Chakraborty, M. & White, A. D. Predicting chemical shifts with graph neural networks. _Chem. Sci._ 12, 10802–10809 (2021).
10. Kuhn, S. & Schlörer, N. E. Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2—a free in-house NMR database with integrated LIMS for academic service laboratories. _Magn. Reson. Chem._ 53, 582–589 (2015).
11. Gupta, A., Chakraborty, S. & Ramakrishnan, R. Revving up 13C NMR shielding predictions across chemical space: benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules. _Mach. Learn. Sci. Technol._ 2, 035010 (2021).
12. Jonas, E. & Kuhn, S. Rapid prediction of NMR spectral properties with quantified uncertainty. _J. Cheminform._ 11, 50 (2019).
13. Zou, Z. et al. A deep learning model for predicting selected organic molecular spectra. _Nat. Comput. Sci._ 3, 957–964 (2023).
14. Atwi, R. et al. An automated framework for high-throughput predictions of NMR chemical shifts within liquid solutions. _Nat. Comput. Sci._ 2, 112–122 (2022).
15. Paruzzo, F. M. et al. Chemical shifts in molecular solids by machine learning. _Nat. Commun._ 9, 4501 (2018).
16. Lin, M. et al. Unravelling the fast alkali–ion dynamics in paramagnetic battery materials combined with NMR and deep-potential molecular dynamics simulation. _Angew. Chem._ 133, 12655–12661 (2021).
17. Lin, M., Fu, R., Xiang, Y., Yang, Y. & Cheng, J. Combining NMR and molecular dynamics simulations for revealing the alkali–ion transport in solid-state battery materials. _Curr. Opin. Electrochem._ 35, 101048 (2022).
18. Lin, M. et al. A machine learning protocol for revealing ion transport mechanisms from dynamic NMR shifts in paramagnetic battery materials. _Chem. Sci._ 13, 7863–7872 (2022).
19. Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In _Proc. International Conference on Learning Representations_ (eds Yan, L. et al.) (ICLR, 2023).
20. Kwon, Y., Lee, D., Choi, Y.-S., Kang, M. & Kang, S. Neural message passing for NMR chemical shift prediction. _J. Chem. Inf. Model._ 60, 2024–2030 (2020).
21. Han, J. et al. Scalable graph neural network for NMR chemical shift prediction. _Phys. Chem. Chem. Phys._ 24, 26870–26878 (2022).
22. Cordova, M. et al. A machine learning model of chemical shifts for chemically and structurally diverse molecular solids. _J. Phys. Chem. C_ 126, 16710–16720 (2022).
23. Liu, S. et al. Multiresolution 3D-DenseNet for chemical shift prediction in NMR crystallography. _J. Phys. Chem. Lett._ 10, 4558–4565 (2019).
24. Jeong, K. et al. Precisely predicting the 1H and 13C NMR chemical shifts in new types of nerve agents and building spectra database. _Sci. Rep._ 12, 20288 (2022).
25. Gao, P., Zhang, J., Peng, Q., Zhang, J. & Glezakou, V.-A. General protocol for the accurate prediction of molecular 13C/1H NMR chemical shifts via machine learning augmented DFT. _J. Chem. Inf. Model._ 60, 3746–3754 (2020).
26. Wu, A. et al. Elucidating structures of complex organic compounds using a machine learning model based on the 13C NMR chemical shifts. _Precis. Chem._ 1, 57–68 (2023).
27. Ai, W.-J. et al. A very deep graph convolutional network for 13C NMR chemical shift calculations with density functional theory level performance for structure assignment. _J. Nat. Prod._ 87, 743–752 (2024).
28. Vergnet, J., Saubanère, M., Doublet, M.-L. & Tarascon, J.-M. The structural stability of P2-layered Na-based electrodes during anionic redox. _Joule_ 4, 420–434 (2020).
29. Landrum, G. et al. RDKit. _Zenodo_ https://doi.org/10.5281/zenodo.14779836 (2024).
30. Larsen, A. H. et al. The Atomic Simulation Environment—a Python library for working with atoms. _J. Phys. Condens. Matter_ 29, 273002 (2017).
31. Ong, S. P. et al. Python Materials Genomics (pymatgen): a robust, open-source Python library for materials analysis. _Comput. Mater. Sci._ 68, 314–319 (2013).
32. Curtarolo, S. et al. Aflow: an automatic framework for high-throughput materials discovery. _Comput. Mater. Sci._ 58, 218–226 (2012).
33. Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. The Cambridge Structural Database. _Acta Cryst. B_ 72, 171–179 (2016).
34. Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. _APL Mater._ 1, 011002 (2013).
35. Cordova, M. et al. ShiftML. _Zenodo_ https://doi.org/10.5281/zenodo.6782653 (2022).
36. Luo, W. et al. Bridging machine learning and thermodynamics for accurate p_K_a prediction. _JACS Au_ 4, 3451–3465 (2024).
37. Yao, L. et al. Node-aligned graph-to-graph: elevating template-free deep learning approaches in single-step retrosynthesis. _JACS Au_ 4, 992–1003 (2024).
38. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold3. _Nature_ 630, 493–500 (2024).
39. Zhang, D. et al. DPA-2: a large atomic model as a multi-task learner. _NPJ Comput. Mater._ 10, 293 (2024).
40. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In _Proc. NAACL-HLT_ (eds Burstein, J. et al.) (Association for Computational Linguistics, 2019).
41. Fang, X. et al. MolParser: end-to-end visual recognition of molecule structures in the wild. Preprint at https://arxiv.org/abs/2411.11098v2 (2024).
42. Bergwerf, H. MolView: an attempt to get the cloud into chemistry classrooms. _Comm. Comput. Chem. Educ._ 9, 1–9 (2015).
43. Momma, K. & Izumi, F. VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data. _J. Appl. Crystallogr._ 44, 1272–1276 (2011).
44. Xu, F. et al. NMRNet dataset. _Zenodo_ https://doi.org/10.5281/zenodo.13317524 (2024).
45. Xu, F. NMRNet v1.0.0 code. _Zenodo_ https://doi.org/10.5281/zenodo.14741405 (2025).


ACKNOWLEDGEMENTS We thank Y. Ren and J. Zhang for their contributions to the design of the manuscript’s cover. We thank Y. Tang and J. Qiu for their valuable improvements


to the schematic diagram. We are also grateful for the insightful discussions and suggestions from Y. Liu, J. Zou, Y. Zhuang, Y. Jin, F. Fu, W. Luo, G. Zhou and J. Wang. F.T. acknowledges


the National Key R&D Program of China (grant no. 2024YFA1210804) and a startup fund at Xiamen University. J.C. acknowledges the National Natural Science Foundation of China (grant nos.


22225302, 92470201, 22021001, 92461312, 21991151, 21991150, 92161113 and 22411560277), the Fundamental Research Funds for the Central Universities (20720220009) and the Laboratory of AI for


Electrochemistry (AI4EC), IKKEM (grant nos. RD2023100101 and RD2022070501). AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * State Key Laboratory of Physical Chemistry of Solid Surface, College


of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China Fanjie Xu, Feng Wang, Zhong-Qun Tian & Jun Cheng * DP Technology, Beijing, China Fanjie Xu, Wentao Guo, Lin Yao, 


Hongshuai Wang, Zhifeng Gao & Linfeng Zhang * Department of Chemistry, University of California, Davis, CA, USA Wentao Guo * Pen-Tung Sah Institute of Micro-Nano Science and Technology,


Xiamen University, Xiamen, China Fujie Tang * Laboratory of AI for Electrochemistry, Tan Kah Kee Innovation Laboratory, Xiamen, China Fujie Tang, Zhong-Qun Tian & Jun Cheng * Institute


of Artificial Intelligence, Xiamen University, Xiamen, China Fujie Tang & Jun Cheng * AI for Science Institute, Beijing, China Linfeng Zhang & Weinan E * Center for Machine Learning


Research, Peking University, Beijing, China Weinan E * School of Mathematical Sciences, Peking University, Beijing, China Weinan E


CONTRIBUTIONS J.C., F.T. and Z.G. contributed to the design of the work. F.X. and W.G. completed


data collection and cleaning. F.X. developed the NMRNet code. F.X., W.G. and Z.G. contributed to the software development. F.X., W.G. and F.T. performed data analysis. All authors


participated in the discussion and wrote the manuscript. CORRESPONDING AUTHORS Correspondence to Fujie Tang, Zhifeng Gao or Jun Cheng. ETHICS DECLARATIONS COMPETING INTERESTS The authors


declare no competing interests. PEER REVIEW PEER REVIEW INFORMATION _Nature Computational Science_ thanks Joshua D. Hartman, Nav Nidhi Rajput and the other, anonymous, reviewer(s) for their


contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the _Nature Computational Science_ team. Peer reviewer reports are available.


ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. EXTENDED DATA EXTENDED DATA


FIG. 1 PERFORMANCE OF NMRNET IN LIQUID-STATE NMR PREDICTION. NMRNet’s correlation scatter plots of predicted versus experimental chemical shifts for (A) 1H, (B) 11B, (C) 13C, (D) 15N, (E)


17O, and (F) 19F in the nmrshiftdb2-2024 dataset. (G) Comparison of the prediction error (MAE) for different elements in the nmrshiftdb2-2024 dataset with (marked as ‘w/ pre-training’)


or without (marked as ‘w/o pre-training’) pre-trained weights, and when predicting a single element versus all elements simultaneously. (H) Comparison of prediction error (MAE) for


different elements in the nmrshiftdb2-2024 dataset using different proportions of the training set, noting that the data volume for the 19F element does not support a 0.1% setting. To


facilitate the presentation, both the horizontal and vertical axes are scaled logarithmically. Comparison of prediction error (MAE) for different elements in (I) the nmrshiftdb2-2018 dataset


and (J) the QM9NMR dataset against previous studies. Note that DetaNet has not reported results for 19F.

EXTENDED DATA FIG. 2 PERFORMANCE OF NMRNET IN SOLID-STATE NMR


PREDICTION. NMRNet’s correlation scatter plots of predicted versus DFT-calculated chemical shifts (chemical shieldings) for (A) 1H, (B) 13C, (C) 15N, and (D) 17O in the ShiftML1 dataset. (E)


Distribution of chemical shifts for four elements in the ShiftML1 dataset. (F) The impact of four strategies on the prediction error (MAE) for 1H in the ShiftML1 dataset using NMRNet.


Samples represent individual atoms with labeled chemical shifts; each data point corresponds to the absolute error between predicted and actual shifts. The sample size (n) is 29,913. S1–S3


used weights pre-trained on a molecular dataset and differ in the local environment defined for a single atom: the unit cell with an intra-cell distance matrix, the unit cell with a global distance matrix, and a cutoff radius of 6 Å,


respectively. S4 modifies S3 by pre-training with the cutoff format on a large-scale crystal database. (G) Comparison of the


prediction error (RMSE) for different elements in the ShiftML1 dataset using NMRNet with previous studies. (H) NMRNet’s correlation scatter plot of predicted versus DFT-calculated chemical


shifts (chemical shieldings) for 23Na in the NN-NMR dataset.

EXTENDED DATA FIG. 3 SIX EXAMPLES USED IN CONFIGURATION DETERMINATION. The top three are for the structure revision


task, and the bottom three are for the chiral isomer identification task. EXTENDED DATA FIG. 4 STRUCTURAL REPRESENTATIONS BY NMRNET. Local structural representations and their


relationship with chemical shifts for all Na+ in P2-type Na2/3(Mg1/3Mn2/3)O2, visualized using t-SNE for the (A) pre-trained NMRNet and (B) fine-tuned NMRNet. (C) Interaction information


between each central atom (represented as Na1) and its local environment (Na13Mg8Mn16O39), extracted from the 64-head attention mechanism of the Transformer; each head’s results are


represented as a separate row, and the rows are then concatenated. Identical elements are arranged in ascending order based on their distances from the central atom. The darker


color in the visualization indicates stronger correlations between the central atom and its local environment. (D) A unit cell of Na2/3(Mg1/3Mn2/3)O2. (E) The local environment of Na


extracted from the infinite crystal structure corresponding to the unit cell in (D).

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION Supplementary Notes 1 and 2, Figs. 1–14,


Tables 1–31 and additional references. PEER REVIEW FILE SOURCE DATA SOURCE DATA FIG. 2 Statistical source data for Fig. 2a,c. SOURCE DATA EXTENDED DATA FIG. 1 Statistical source data for


Extended Data Fig. 1a–j. SOURCE DATA EXTENDED DATA FIG. 2 Statistical source data for Extended Data Fig. 2a–h. SOURCE DATA EXTENDED DATA FIG. 4 Statistical source data for Extended Data Fig.


4a–c. RIGHTS AND PERMISSIONS Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or


other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


ABOUT THIS ARTICLE CITE THIS ARTICLE Xu, F., Guo, W., Wang, F. _et al._ Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance


chemical shifts. _Nat Comput Sci_ 5, 292–300 (2025). https://doi.org/10.1038/s43588-025-00783-z * Received: 14 August 2024 * Accepted: 26 February 2025 * Published: 28


March 2025 * Issue Date: April 2025 * DOI: https://doi.org/10.1038/s43588-025-00783-z

