Genome content predicts the carbon catabolic preferences of heterotrophic bacteria

Nature


ABSTRACT Heterotrophic bacteria—bacteria that utilize organic carbon sources—are taxonomically and functionally diverse across environments. It is challenging to map metabolic interactions


and niches within microbial communities due to the large number of metabolites that could serve as potential carbon and energy sources for heterotrophs. Whether their metabolic niches can be


understood using general principles, such as a small number of simplified metabolic categories, is unclear. Here we perform high-throughput metabolic profiling of 186 marine heterotrophic


bacterial strains cultured in media containing one of 135 carbon substrates to determine growth rates, lag times and yields. We show that, despite high variability at all levels of taxonomy,


the catabolic niches of heterotrophic bacteria can be understood in terms of their preference for either glycolytic (sugars) or gluconeogenic (amino and organic acids) carbon sources. This


preference is encoded by the total number of genes found in pathways that feed into the two modes of carbon utilization and can be predicted using a simple linear model based on gene counts.


This allows for coarse-grained descriptions of microbial communities in terms of prevalent modes of carbon catabolism. The sugar–acid preference is also associated with genomic GC content


and thus with the carbon–nitrogen requirements of their encoded proteome. Our work reveals how the evolution of bacterial genomes is structured by fundamental constraints rooted in


metabolism.
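The abstract states that the sugar–acid preference can be predicted with a simple linear model based on gene counts. The following is a minimal sketch of that idea, not the authors' implementation: it fits an ordinary least-squares model with NumPy. The function names, the two-feature layout (genes in sugar-feeding and acid-feeding pathways) and all numbers are assumptions made for illustration, and the preference score is a stand-in for the quantity defined in the main text, which is not reproduced here.

```python
import numpy as np

def fit_linear_model(gene_counts, preference):
    """Least-squares fit of preference ~ intercept + gene_counts @ weights."""
    X = np.asarray(gene_counts, dtype=float)
    y = np.asarray(preference, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, gene_counts):
    """Apply fitted coefficients to one or more rows of gene counts."""
    X = np.atleast_2d(np.asarray(gene_counts, dtype=float))
    return coef[0] + X @ coef[1:]

# Toy data (hypothetical): columns are gene counts in sugar and acid pathways.
toy_counts = np.array([[40, 10], [30, 15], [20, 30], [12, 44], [25, 20]], dtype=float)
# Synthetic preference generated from a known linear rule, so the fit can be checked.
toy_preference = 0.02 * toy_counts[:, 0] - 0.02 * toy_counts[:, 1]

toy_coef = fit_linear_model(toy_counts, toy_preference)
```

Because the toy preference is generated from a known rule, the fitted coefficients should recover an intercept near 0 and weights near +0.02 and −0.02; with real data the fit would of course be noisy and would need to be validated on held-out strains.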


DATA AVAILABILITY All growth and genomic data are available at https://doi.org/10.17632/xfh8t8568g.1. All isolates are available from either M.G. (Europe) or O.X.C. (USA) on request. All


genome assemblies are available under BioProjects PRJNA319196 and PRJNA478695, with the exception of strains 1A06 (PRJNA318805), 12B01 (PRJNA13568), 13B01 (PRJNA318805), DSS-3 (BioSample


SAMN02604003) as well as AS40, AS56, AS88 and AS94 (PRJNA996876). Source data are provided with this paper. CODE AVAILABILITY All code needed to reproduce the figures is available at


https://doi.org/10.17632/xfh8t8568g.1.

REFERENCES

1. Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. _Nature_ 486, 207–214 (2012).
2. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. _Nature_ 551, 457–463 (2017).
3. Sunagawa, S. et al. Structure and function of the global ocean microbiome. _Science_ 348, 1261359 (2015).
4. Pontrelli, S. et al. Metabolic cross-feeding structures the assembly of polysaccharide degrading communities. _Sci. Adv._ 8, eabk3076 (2022).
5. Gralka, M., Szabo, R., Stocker, R. & Cordero, O. X. Trophic interactions and the drivers of microbial community assembly. _Curr. Biol._ 30, R1176–R1188 (2020).
6. Pollak, S. et al. Public good exploitation in natural bacterioplankton communities. _Sci. Adv._ 7, eabi4717 (2021).
7. Moran, M. A. The global ocean microbiome. _Science_ 350, aac8455 (2015).
8. Datta, M. S., Sliwerska, E., Gore, J., Polz, M. F. & Cordero, O. X. Microbial interactions lead to rapid micro-scale successions on model marine particles. _Nat. Commun._ 7, 11965 (2016).
9. Enke, T. N. et al. Modular assembly of polysaccharide-degrading marine microbial communities. _Curr. Biol._ 29, 1528–1535 (2019).
10. Fahimipour, A. K. & Gross, T. Mapping the bacterial metabolic niche space. _Nat. Commun._ 11, 4887 (2020).
11. Kehe, J. et al. Positive interactions are common among culturable bacteria. _Sci. Adv._ 7, eabi7159 (2021).
12. Kirchman, D. L. The ecology of _Cytophaga_–_Flavobacteria_ in aquatic environments. _FEMS Microbiol. Ecol._ 39, 91–100 (2002).
13. Buchan, A., LeCleir, G. R., Gulvik, C. A. & González, J. M. Master recyclers: features and functions of bacteria associated with phytoplankton blooms. _Nat. Rev. Microbiol._ 12, 686–698 (2014).
14. Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. _Nucleic Acids Res._ 46, 7542–7553 (2018).
15. Barberán, A., Caceres Velazquez, H., Jones, S. & Fierer, N. Hiding in plain sight: mining bacterial species records for phenotypic trait information. _mSphere_ 2, e00237-17 (2017).
16. Mende, D. R. et al. ProGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. _Nucleic Acids Res._ 48, D621–D625 (2020).
17. Sueoka, N. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. _Proc. Natl Acad. Sci. USA_ 47, 1141–1149 (1961).
18. Hellweger, F. L., Huang, Y. & Luo, H. Carbon limitation drives GC content evolution of a marine bacterium in an individual-based genome-scale model. _ISME J._ 12, 1180–1187 (2018).
19. Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. _Science_ 370, 683–687 (2020).
20. Mende, D. R. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. _Nat. Microbiol._ 2, 1367–1373 (2017).
21. Musto, H. et al. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. _Biochem. Biophys. Res. Commun._ 347, 1–3 (2006).
22. Estrela, S. et al. Functional attractors in microbial community assembly. _Cell Syst._ 13, 29–42 (2022).
23. Amarnath, K. et al. Stress-induced metabolic exchanges between complementary bacterial types underly a dynamic mechanism of inter-species stress resistance. _Nat. Commun._ 14, 3165 (2023).
24. Estrela, S., Diaz-Colunga, J., Vila, J. C., Sanchez-Gorostiaga, A. & Sanchez, A. Diversity begets diversity under microbial niche construction. Preprint at _bioRxiv_ https://doi.org/10.1101/2022.02.13.480281 (2022).
25. Schink, S. J. et al. Glycolysis/gluconeogenesis specialization in microbes is driven by biochemical constraints of flux sensing. _Mol. Syst. Biol._ 18, e10704 (2022).
26. Basan, M. et al. A universal trade-off between growth and lag in fluctuating environments. _Nature_ 584, 470–474 (2020).
27. Plucain, J. et al. Epistasis and allele specificity in the emergence of a stable polymorphism in _Escherichia coli_. _Science_ 343, 160–164 (2014).
28. Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of _Escherichia coli_. _Proc. Natl Acad. Sci. USA_ 105, 7899–7906 (2008).
29. Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E. & Schneider, D. Ecological and evolutionary dynamics of coexisting lineages during a long-term experiment with _Escherichia coli_. _Proc. Natl Acad. Sci. USA_ 109, 9487–9492 (2012).
30. Hershberg, R. & Petrov, D. A. Evidence that mutation is universally biased towards AT in bacteria. _PLoS Genet._ 6, e1001115 (2010).
31. Ely, B. Genomic GC content drifts downward in most bacterial genomes. _PLoS ONE_ 16, e0244163 (2021).
32. Maddamsetti, R. & Grant, N. A. Divergent evolution of mutation rates and biases in the long-term evolution experiment with _Escherichia coli_. _Genome Biol. Evol._ 12, 1591–1603 (2020).
33. Yakovchuk, P., Protozanova, E. & Frank-Kamenetskii, M. D. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. _Nucleic Acids Res._ 34, 564–574 (2006).
34. Lassalle, F. et al. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. _PLoS Genet._ 11, e1004941 (2015).
35. Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. _Science_ 370, 683–687 (2020).
36. Smriga, S., Ciccarese, D. & Babbin, A. R. Denitrifying bacteria respond to and shape microscale gradients within particulate matrices. _Commun. Biol._ 4, 570 (2021).
37. Gowda, K., Ping, D., Mani, M. & Kuehn, S. Genomic structure predicts metabolite dynamics in microbial communities. _Cell_ 185, 530–546 (2022).
38. Moran, M. A. et al. Genome sequence of _Silicibacter pomeroyi_ reveals adaptations to the marine environment. _Nature_ 432, 910–913 (2004).
39. Ben-Haim, Y. et al. _Vibrio coralliilyticus_ sp. nov., a temperature-dependent pathogen of the coral _Pocillopora damicornis_. _Int. J. Syst. Evol. Microbiol._ 53, 309–315 (2003).
40. Hehemann, J. H. et al. Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes. _Nat. Commun._ 7, 12860 (2016).
41. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. _J. Comput. Biol._ 19, 455–477 (2012).
42. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. _Genome Res._ 25, 1043–1055 (2015).
43. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. _BMC Bioinform._ 11, 119 (2010).
44. Huerta-Cepas, J. et al. EGGNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. _Nucleic Acids Res._ 44, D286–D293 (2016).
45. Zhang, H. et al. DbCAN2: a meta server for automated carbohydrate-active enzyme annotation. _Nucleic Acids Res._ 46, W95–W101 (2018).
46. Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. _Bioinformatics_ 36, 1925–1927 (2020).
47. Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. _J. Genet. Genomics_ 48, 844–850 (2021).
48. Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-based reconstruction and analysis for Python. _BMC Syst. Biol._ 7, 74 (2013).
49. Wolfram Mathematica v. 13.2 (Wolfram, 2022).
50. R: A Language and Environment for Statistical Computing (R Core Team, 2022).
51. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. Ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. _Methods Ecol. Evol._ 8, 28–36 (2017).
52. Paradis, E. & Schliep, K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. _Bioinformatics_ 35, 526–528 (2019).
53. Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis version 11. _Mol. Biol. Evol._ 38, 3022–3027 (2021).
54. Schliep, K. P. phangorn: Phylogenetic analysis in R. _Bioinformatics_ 27, 592–593 (2011).
55. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. _Nucleic Acids Res._ 44, D457–D462 (2016).
56. Heinken, A. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. _Nat. Biotechnol._ https://doi.org/10.1038/s41587-022-01628-0 (2023).
57. Heinken, A., Magnúsdóttir, S., Fleming, R. M. T. & Thiele, I. DEMETER: efficient simultaneous curation of genome-scale reconstructions guided by experimental data and refined gene annotations. _Bioinformatics_ 37, 3974–3975 (2021).
58. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. _Nat. Methods_ 13, 581–583 (2016).
59. Hubert, B. SkewDB, a comprehensive database of GC and 10 other skews for over 30,000 chromosomes and plasmids. _Sci. Data_ 9, 92 (2022).
60. Lagadec, E., Småge, S. B., Trösse, C. & Nylund, A. Phylogenetic analyses of Norwegian _Tenacibaculum_ strains confirm high bacterial diversity and suggest circulation of ubiquitous virulent strains. _PLoS ONE_ 16, e0259215 (2021).
61. Ekborg, N. A. et al. _Saccharophagus degradans_ gen. nov., sp. nov., a versatile marine degrader of complex polysaccharides. _Int. J. Syst. Evol. Microbiol._ 55, 1545–1549 (2005).

ACKNOWLEDGEMENTS We thank S. Estrela (Yale University and Stanford University) for providing community composition data from their enrichment experiments (Fig. 4d); A. Sichert for


assembling genomes; and M. d. Bello, X. Shan, T. Hwa as well as all members of the Cordero laboratory and Simons PriME collaboration for their enriching discussions. We acknowledge funding


from the Simons Collaboration: Principles of Microbial Ecosystems (PriME) award number 542395 (O.X.C.) and Simons Foundation Postdoctoral Fellowship Award number 599207 (M.G.). AUTHOR


INFORMATION Author notes * Matti Gralka Present address: Systems Biology Group, Amsterdam Institute for Life and Environment (A-LIFE) and Amsterdam Institute of Molecular and Life Sciences


(AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands * Shaul Pollak Present address: Division of Microbial Ecology, Centre for Microbiology and Environmental Systems Science,


University of Vienna, Vienna, Austria AUTHORS AND AFFILIATIONS * Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA Matti Gralka, 


Shaul Pollak & Otto X. Cordero CONTRIBUTIONS M.G. designed the


study, performed all experiments, analysed all data and wrote the initial manuscript. S.P. analysed the genomic data from the proGenomes database. M.G., S.P. and O.X.C. discussed the


results. O.X.C. directed the project and edited the manuscript. CORRESPONDING AUTHORS Correspondence to Matti Gralka or Otto X. Cordero. ETHICS DECLARATIONS COMPETING INTERESTS The authors


declare no competing interests. PEER REVIEW PEER REVIEW INFORMATION _Nature Microbiology_ thanks Sara Mitri, Seppe Kuehn and the other, anonymous, reviewer(s) for their contribution to the


peer review of this work. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


EXTENDED DATA EXTENDED DATA FIG. 1 PHYLOGENETIC TREE OF ALL STRAINS USED IN THIS STUDY. The tree and taxonomy were created using the GTDB-tk classify workflow using standard parameters from


an alignment of 120 marker genes. The legend corresponds to expected substitutions per site. A full list of all strains is provided in Supplementary Table 1. EXTENDED DATA FIG. 2 OVERVIEW OF


GROWTH CHARACTERIZATION RESULTS. A, Number of carbon sources supporting growth per strain. B, Fraction of all strains that were able to use a given substrate as their sole carbon and energy


source. C, There was a lack of strong correlation between the number of carbon sources that support growth, growth rate and yield. Average yield (blue dots) and rate (red squares) binned by


the number of carbon sources that supported growth, shown as the mean ± s.d. (for a total of _n_ = 182 strains showing growth on at least one substrate). Lines and _P_ values are derived


from linear regressions. More generalist species (more carbon sources consumed) achieve slightly higher average yield but the effect size is likely not practically relevant. D, For each


condition (substrate × strain), we plotted the growth rate and yield, which are very slightly positively correlated (linear regression \(P = 2 \times 10^{-6}\), \(R^{2} = 0.005\)). Points on the far right


correspond to the maximal detectable growth rate given our spacing of experimental time points. E, Linear slopes for the per strain regression of yield with growth rate; only 3/186 strains


exhibited a statistically significant correlation (linear regression) between rate and yield. The vertical line corresponds to the slope of the regression over all conditions. EXTENDED DATA


FIG. 3 CORRELATION BETWEEN PHENOTYPE DISTANCE AND DIFFERENT GENOMIC DISTANCES. A–C, Phenotype distance, defined as the cosine distance between consumption vectors, as a function of genomic


distance between pairs of strains, where the genomic distance is the GTDB-tk distance (A), the Bray–Curtis distance between gene content (B; based on copy numbers of KEGG KO) or module


content (C; based on abundance of KEGG modules). Points are the mean ± s.d. of logarithmic bins; _n_ = 16,471 total comparisons. EXTENDED DATA FIG. 4 DETAILED PRINCIPAL COMPONENT ANALYSIS OF


THE GROWTH CHARACTERIZATION RESULTS. A, Principal component analysis of the full growth rate matrix, reproduced from Fig. 1 in the main text. B, Averaged loadings of fine-grained categories


of substrates normalized to unit length. Detailed loadings of all substrates in the principal component analysis in A. The full principal component analysis shows a clear separation of


preferences for organic (including alcohols and aromatics) and amino acids. C,D, Individual loadings per substrate for each principal component (PC; left). Note that all acids have negative loadings on PC1, but all except two organic acids switch sign on PC2 relative to amino acids. Scatter plots of the first principal component (based on the full growth rate matrix) versus the SAP as


defined in the main text, and the second principal component versus the amino acid–organic acid preference defined analogously (right). Each point is a different isolate, coloured by


taxonomic order (as in Fig. 1). _P_ values are derived from linear regressions. EXTENDED DATA FIG. 5 COMPARISON WITH EXTERNAL DATASETS. A, Re-analysis of data from Kehe and colleagues (ref. 11). The


heat map corresponds to their extended data fig. 2 (final optical density in each condition) except with rows and columns sorted by cosine similarity. B, Principal component analysis of


this matrix shows the clustering of the two taxonomic orders and their alignment with the average loadings of acids and sugars. C, Phylogenetic tree based on GTDB-tk of species contained in


the IJSEM and DEMETER trait databases as well as proGenomes (by species name). Note that two large phyla, Actinobacteriota and Firmicutes, are not at all represented in our strain library.
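The extended data captions above rely on three standard computations: the cosine distance between consumption vectors (phenotype distance), the Bray–Curtis distance between gene copy-number vectors (one of the genomic distances), and a principal component analysis of a growth matrix. The following is a minimal NumPy sketch of each, offered as an illustration rather than the authors' code. The function names are hypothetical, and the choice to mean-centre columns without rescaling in the PCA is an assumption, since the captions do not state the preprocessing.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors (e.g. growth on each substrate)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - float(u @ v) / denom

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    total = (u + v).sum()
    if total == 0:
        raise ValueError("Bray-Curtis is undefined when both vectors are all zero")
    return float(np.abs(u - v).sum() / total)

def pca(matrix, n_components=2):
    """PCA via SVD of the column-centred matrix (rows = strains, columns = substrates)."""
    X = np.asarray(matrix, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each column (assumption)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # strain coordinates
    loadings = Vt[:n_components]                      # substrate loadings per component
    explained = (S ** 2) / np.sum(S ** 2)             # fraction of variance per component
    return scores, loadings, explained[:n_components]
```

In this layout, averaging the rows of `loadings` over substrate categories (for example sugars versus acids) would give category-level loadings of the kind the captions describe.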


EXTENDED DATA FIG. 6 REPRODUCIBILITY BETWEEN EXPERIMENTS. A, Smooth histograms of the pairwise correlation coefficients between the growth vectors of strains across all three experiments


(V1, V2, V3; V3 is the experiment primarily discussed in the main text). B, Scatter plots of the SAP measured for each strain between all three replicate experiments. _P_ values are derived


from linear regressions. EXTENDED DATA FIG. 7 THREE MEASURES OF PATHWAY ABUNDANCE AND THEIR INTERRELATIONS. Completeness, coverage, and duplication are defined in detail in Methods. A,


Predicting coverage from completeness (linear model) generally yields higher quality fits than predicting coverage from duplication. B, After correcting for completeness, duplication tends


to explain more of the residuals than completeness does after correcting for duplication. C, Neither duplication nor coverage of any individual pathway correlated very strongly with SAP, and


whether duplication or coverage of a given pathway was more predictive of SAP depended on the pathway. D, Illustration of the concept of functional duplication using the example of the galactose degradation pathway (KEGG pathway ko00052). Shown is the central part of the pathway that converts lactose and other oligosaccharides first to β-D-galactose, which is transformed through multiple steps to α-D-glucose-6-phosphate, which then enters glycolysis. For some reactions, we found multiple orthologues in the same strain (for example, up to six orthologues of K01785 (galM, aldose 1-epimerase, EC:5.1.3.3)). These orthologues are not exact duplicates, as illustrated by the tree on the right. The tree is based on a multiple sequence alignment of all sequences annotated K01785 across all strains. We have highlighted the six copies found in the _Zobellia_ strain A2M03, which are spread around the tree and often grouped with orthologues found in distantly related species. In fact, across all highly duplicated orthologues (maximum number of orthologues per strain of at least six), the pairwise distance (computed from the multiple sequence alignments for each KEGG orthologue using the dist.ml function of the phangorn package in R) was about equally likely to be greater between orthologues in the same strain


relative to orthologues in different strains, as it was to be smaller. Thus, ‘duplicated’ orthologues in a strain probably represent functional variants of different evolutionary origin.


E,F, Average distances between KEGG orthologues within and between strains for genes associated with sugar and acid catabolism. The KEGG orthologues in black have a more than 10% difference


between the two distances. Points represent the mean ± s.e.m.; the number of comparisons differs for each gene, from _n_ = 496 to _n_ = 179,101. G, Comparison between measured and predicted


growth on individual substrates. Predicted growth was derived from FBA simulations of genome-scale metabolic models created using CarveMe using standard parameters (no gapfilling). This


procedure yielded 58% correct predictions (vertical line), which was within the range of correct predictions achieved when the comparison was performed with shuffled labels (distribution,


obtained by shuffling labels 1,000 times, each time measuring the proportion of correct predictions). EXTENDED DATA FIG. 8 THE NUMBER OF POLYSACCHARIDE-DEGRADING ENZYMES CORRELATES WITH SAP.


A–D, Number of CAZymes (A,B, glycosyl hydrolases; and C,D, polysaccharide lyases) and their correlation with SAPs (B,D). B,D, The insets show \(-\log_{10} P\) per order, the negative base-10 logarithm of the _P_ value obtained from linear regressions of CAZyme number with SAP within each order; \(-\log_{10} P > 2\) (vertical line) corresponds to a significant correlation at the 5% level, Bonferroni


corrected for multiple testing. B, The square symbols correspond to the squares in Fig. 1d. These are exceptions to the median metabolic preference per order, such as the acid-specialist


_Tenacibaculum_ genus in the Flavobacteriales, which includes fish pathogens (ref. 60). Conversely, the orders Pseudomonadales and Rhodobacterales (commonly thought to specialize in simple substrates; ref. 13) tended to prefer acids (SAP < 0), but we also found the sugar-specialist Pseudomonadales genus _Saccharophagus_, whose members are known sugar degraders (ref. 61). The Flavobacteriales and


Pseudomonadales strains with atypical phenotypes for their taxonomy tended to have fewer/more CAZymes than their close relatives, respectively. Small points correspond to individual


isolates, large points with error bars indicate the mean ± s.d. for each order (A,C, _n_ = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales), 32


(Flavobacteriales)) or SAP bin (B,D, total number of strains _n_ = 182). EXTENDED DATA FIG. 9 GENOMIC GC CONTENT AND CONSEQUENCES FOR NUTRIENT REQUIREMENTS. A, The GC content (measured across


all predicted coding regions) is relatively conserved at the order level across our strain library (_n_ = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales)


and 32 (Flavobacteriales)). B, The GC content predicts the carbon and nitrogen requirements per coded amino acid. All protein sequences were manually scored according to the number of carbon


and nitrogen atoms of each amino acid. C, Same data as Fig. 3b without binning: SAP is correlated with genomic GC content across the whole set of strains but not within orders,


possibly because GC content evolves very slowly and is thus relatively conserved below the order level. Notably, this correlation was much stronger than the correlation between GC content


and other basic characteristics of the genomes, such as the number of coding regions (linear model fit, _P_ = 0.2), and there was no practically significant difference between the GC content


of genes in sugar- and acid-catabolic pathways (E). D, Because of the correlation between GC content and both nutrient requirements and SAP, SAP is positively/negatively correlated with the


number of carbon/nitrogen atoms per coded amino acid. Small points correspond to individual strains, large points with error bars indicate the mean ± s.d. for the five main orders. Lines and


_P_ values are derived from linear regressions. E, The average GC contents of sugar- and acid-catabolic genes are very similar. Scatter plot of the GC content of all genes annotated as


sugar/acid genes (Supplementary Table 5), extracted from the genomes and averaged per strain. The line corresponds to equal GC content in sugar/acid genes. F, Residuals of the linear fit in


A, showing a weak but statistically significant (\(P = 6 \times 10^{-16}\)) trend for high-GC genomes to have a slightly higher GC content in sugar genes than in acid genes. G, Example of the


correlation and linear regression of pathway abundance with GC content in more than _n_ = 11,000 diverse reference genomes (proGenomes). H, Extracting the linear regression coefficients


(slopes) for each pathway, all of which were highly significant, yields a picture similar to Fig. 2b, that is, sugar pathways tended to decrease and acid pathways tended to increase in


abundance as a function of GC content. The slopes for sugar (_n_ = 7) and acid (_n_ = 26) pathways are significantly different from each other (_t_-test, dof = 31, _T_ = −4.26, _P_ = 


0.00017). EXTENDED DATA FIG. 10 DETAILS OF ENRICHMENTS AND SYNTHETIC COMMUNITY EXPERIMENTS. A, Taxonomic distribution and distribution of SAPs in the synthetic communities, coloured by order


(Fla, Flavobacteriales; Vib, Vibrionales; Alt, Alteromonadales; Pse, Pseudomonadales; Rho, Rhodobacterales; Cyt, Cytophagales). B, Richness over time in synthetic communities growing on one


of four carbon sources (Fig. 4a). Points with error bars indicate the mean ± s.d. across six replicates. C, Abundance-weighted average GC content of communities enriched on acids or sugars.


Genome-average GC for individual OTUs was estimated using SkewDB (Methods). The distributions are statistically significantly different (two-sided Welch’s _t_-test


\(T=6.95,{\rm{dof}}=13.8,{P}=7.5\times {10}^{-6}\)). D, Final richness in synthetic communities growing on four different concentrations of GlcNAc. The communities consisted of a complex


mixture of strains, of which only about half were capable of consuming GlcNAc in monoculture (consumers). The remaining species (crossfeeders) therefore must have been crossfeeding on


metabolites excreted by the consumers. E,F, Average number of C or N atoms per coded amino acid in the communities, weighted by the abundance of each strain. Shown is the average over the


last five time points. Asterisks indicate significant differences between conditions (\(P = \{2, 0.2, 5.8, 6.2\} \times 10^{-6}\) from top to bottom in E and \(P = \{0.01, 3.0, 3.8, 1.4\} \times 10^{-5}\) from top


to bottom in F) in a two-tailed Mann–Whitney test (using Bonferroni correction for multiple testing). D–F,H, Small points correspond to replicates (including different dilution factors, _n_ 


= 12 points per condition), large points with error bars indicate the mean ± s.d. G, Functional composition of synthetic communities growing on four different concentrations of GlcNAc as the


sole carbon (but not nitrogen) source. Final species compositions are shown as bar charts, where each species is coloured according to its SAP. At low GlcNAc concentrations, more


acid-specialist species (negative SAP, green tones) dominated. This trend was driven not by a change in the relative abundance of consumers (which was roughly constant across conditions) but


by both consumers and crossfeeders with lower SAP dominating at lower carbon concentrations. H, This pattern remained when the communities were perturbed. All four replicate communities at the intermediate dilution factor (grown for six cycles at the highest and lowest concentrations (20 and 0.02 mM GlcNAc, respectively)) were transferred into all of the other concentrations, in parallel to the unperturbed communities. Consistent with the unperturbed observation, an increase/decrease in GlcNAc concentration led to an increase/decrease in cSAP, respectively. This effect was overall stronger for more severe perturbations; for example, compare the 20 mM to 2 mM switched communities (yellow) with the 20 mM to 0.02 mM switched communities (red).
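Several of the captions above report quantities of two kinds: the number of carbon or nitrogen atoms per coded amino acid, and community-level averages weighted by the abundance of each strain (for example abundance-weighted GC content). The following is a minimal sketch of how such numbers could be computed, under stated assumptions: atoms are counted for the free amino acid (standard molecular formulas) rather than for the residue or side chain, non-standard letters are skipped, and the function names are hypothetical. The captions do not say which convention the authors used.

```python
from collections import Counter

# (carbon, nitrogen) atoms per free amino acid, from standard molecular formulas.
ATOMS = {
    "A": (3, 1), "R": (6, 4), "N": (4, 2), "D": (4, 1), "C": (3, 1),
    "Q": (5, 2), "E": (5, 1), "G": (2, 1), "H": (6, 3), "I": (6, 1),
    "L": (6, 1), "K": (6, 2), "M": (5, 1), "F": (9, 1), "P": (5, 1),
    "S": (3, 1), "T": (4, 1), "W": (11, 2), "Y": (9, 1), "V": (5, 1),
}

def atoms_per_amino_acid(proteins):
    """Mean (carbon, nitrogen) atoms per coded amino acid over a set of protein sequences."""
    counts = Counter()
    for seq in proteins:
        # Skip stop symbols and ambiguous letters that have no entry in ATOMS.
        counts.update(ch for ch in seq.upper() if ch in ATOMS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    carbon = sum(ATOMS[aa][0] * n for aa, n in counts.items()) / total
    nitrogen = sum(ATOMS[aa][1] * n for aa, n in counts.items()) / total
    return carbon, nitrogen

def abundance_weighted_mean(values, abundances):
    """Community-level average of a per-strain trait, weighted by strain abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    return sum(v * w for v, w in zip(values, abundances)) / total
```

For instance, a toy community of two strains with GC contents 0.4 and 0.6 at relative abundances 3:1 has an abundance-weighted GC content of 0.45; the same weighting applies to per-strain carbon or nitrogen counts.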


SUPPLEMENTARY INFORMATION REPORTING SUMMARY SUPPLEMENTARY TABLES 1–8 Supplementary Table 1. List of strains. Supplementary Table 2. List of substrates. Supplementary Table 3. Full dataset of


growth rates. Supplementary Table 4. KEGG pathways used for SAP predictions. Supplementary Table 5. KOs used for SAP predictions. Supplementary Table 6. List of sugar/acid KOs in our


strains. Supplementary Table 7. Predicted SAP for reference genomes. Supplementary Table 8. OTUs for synthetic communities on four carbon sources. SOURCE DATA SOURCE DATA FIGS. 1–4 Source


data for Figs. 1–4. RIGHTS AND PERMISSIONS Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the


author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


ABOUT THIS ARTICLE CITE THIS ARTICLE Gralka, M., Pollak, S. & Cordero, O.X. Genome content predicts the carbon catabolic preferences of heterotrophic bacteria. _Nat Microbiol_ 8, 1799–1808 (2023). https://doi.org/10.1038/s41564-023-01458-z * Received: 08 February 2023 * Accepted: 24 July 2023 * Published: 31 August 2023 * Issue Date: October 2023 * DOI: https://doi.org/10.1038/s41564-023-01458-z

