Genome content predicts the carbon catabolic preferences of heterotrophic bacteria
ABSTRACT
Heterotrophic bacteria—bacteria that utilize organic carbon sources—are taxonomically and functionally diverse across environments. It is challenging to map metabolic interactions
and niches within microbial communities due to the large number of metabolites that could serve as potential carbon and energy sources for heterotrophs. Whether their metabolic niches can be
understood using general principles, such as a small number of simplified metabolic categories, is unclear. Here we perform high-throughput metabolic profiling of 186 marine heterotrophic
bacterial strains cultured in media containing one of 135 carbon substrates to determine growth rates, lag times and yields. We show that, despite high variability at all levels of taxonomy,
the catabolic niches of heterotrophic bacteria can be understood in terms of their preference for either glycolytic (sugars) or gluconeogenic (amino and organic acids) carbon sources. This
preference is encoded by the total number of genes found in pathways that feed into the two modes of carbon utilization and can be predicted using a simple linear model based on gene counts.
This allows for coarse-grained descriptions of microbial communities in terms of prevalent modes of carbon catabolism. The sugar–acid preference is also associated with genomic GC content
and thus with the carbon–nitrogen requirements of their encoded proteome. Our work reveals how the evolution of bacterial genomes is structured by fundamental constraints rooted in
metabolism.
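As a rough illustration of what "a simple linear model based on gene counts" could look like, the sketch below fits a one-predictor least-squares line. Everything in it is an assumption made for illustration: the choice of predictor (sugar-pathway gene count minus acid-pathway gene count), the function names and the toy numbers are hypothetical and are not the authors' model or data.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gene_count_predictor(sugar_genes: int, acid_genes: int) -> int:
    """Hypothetical predictor: sugar-pathway genes minus acid-pathway genes."""
    return sugar_genes - acid_genes


# Toy values only (NOT data from the paper): per-strain gene counts as
# (sugar, acid) pairs, and a measured preference score per strain.
toy_counts = [(40, 90), (60, 80), (85, 70), (110, 60)]
toy_preference = [-0.6, -0.2, 0.2, 0.7]

x = [gene_count_predictor(s, a) for s, a in toy_counts]
slope, intercept = fit_line(x, toy_preference)


def predict_preference(sugar_genes: int, acid_genes: int) -> float:
    """Predict the preference score of a new genome from its gene counts."""
    return intercept + slope * gene_count_predictor(sugar_genes, acid_genes)
```

A single predictor keeps the example readable. A fit with separate coefficients for sugar and acid gene counts would follow the same idea with a multiple-regression solver.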
DATA AVAILABILITY
All growth and genomic data are available at https://doi.org/10.17632/xfh8t8568g.1. All isolates are available from either M.G. (Europe) or O.X.C. (USA) on request. All genome assemblies are available under BioProjects PRJNA319196 and PRJNA478695, with the exception of strains 1A06 (PRJNA318805), 12B01 (PRJNA13568), 13B01 (PRJNA318805) and DSS-3 (BioSample SAMN02604003), as well as AS40, AS56, AS88 and AS94 (PRJNA996876). Source data are provided with this paper.

CODE AVAILABILITY
All code needed to reproduce the figures is available at https://doi.org/10.17632/xfh8t8568g.1.

REFERENCES
1. Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. _Nature_ 486, 207–214 (2012).
2. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. _Nature_ 551, 457–463 (2017).
3. Sunagawa, S. et al. Structure and function of the global ocean microbiome. _Science_ 348, 1261359 (2015).
4. Pontrelli, S. et al. Metabolic cross-feeding structures the assembly of polysaccharide degrading communities. _Sci. Adv._ 8, eabk3076 (2022).
5. Gralka, M., Szabo, R., Stocker, R. & Cordero, O. X. Trophic interactions and the drivers of microbial community assembly. _Curr. Biol._ 30, R1176–R1188 (2020).
6. Pollak, S. et al. Public good exploitation in natural bacterioplankton communities. _Sci. Adv._ 7, eabi4717 (2021).
7. Moran, M. A. The global ocean microbiome. _Science_ 350, aac8455 (2015).
8. Datta, M. S., Sliwerska, E., Gore, J., Polz, M. F. & Cordero, O. X. Microbial interactions lead to rapid micro-scale successions on model marine particles. _Nat. Commun._ 7, 11965 (2016).
9. Enke, T. N. et al. Modular assembly of polysaccharide-degrading marine microbial communities. _Curr. Biol._ 29, 1528–1535 (2019).
10. Fahimipour, A. K. & Gross, T. Mapping the bacterial metabolic niche space. _Nat. Commun._ 11, 4887 (2020).
11. Kehe, J. et al. Positive interactions are common among culturable bacteria. _Sci. Adv._ 7, eabi7159 (2021).
12. Kirchman, D. L. The ecology of _Cytophaga_–_Flavobacteria_ in aquatic environments. _FEMS Microbiol. Ecol._ 39, 91–100 (2002).
13. Buchan, A., LeCleir, G. R., Gulvik, C. A. & González, J. M. Master recyclers: features and functions of bacteria associated with phytoplankton blooms. _Nat. Rev. Microbiol._ 12, 686–698 (2014).
14. Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. _Nucleic Acids Res._ 46, 7542–7553 (2018).
15. Barberán, A., Caceres Velazquez, H., Jones, S. & Fierer, N. Hiding in plain sight: mining bacterial species records for phenotypic trait information. _mSphere_ 2, e00237-17 (2017).
16. Mende, D. R. et al. ProGenomes2: an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes. _Nucleic Acids Res._ 48, D621–D625 (2020).
17. Sueoka, N. Correlation between base composition of deoxyribonucleic acid and amino acid composition of protein. _Proc. Natl Acad. Sci. USA_ 47, 1141–1149 (1961).
18. Hellweger, F. L., Huang, Y. & Luo, H. Carbon limitation drives GC content evolution of a marine bacterium in an individual-based genome-scale model. _ISME J._ 12, 1180–1187 (2018).
19. Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. _Science_ 370, 683–687 (2020).
20. Mende, D. R. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. _Nat. Microbiol._ 2, 1367–1373 (2017).
21. Musto, H. et al. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. _Biochem. Biophys. Res. Commun._ 347, 1–3 (2006).
22. Estrela, S. et al. Functional attractors in microbial community assembly. _Cell Syst._ 13, 29–42 (2022).
23. Amarnath, K. et al. Stress-induced metabolic exchanges between complementary bacterial types underly a dynamic mechanism of inter-species stress resistance. _Nat. Commun._ 14, 3165 (2023).
24. Estrela, S., Diaz-Colunga, J., Vila, J. C., Sanchez-Gorostiaga, A. & Sanchez, A. Diversity begets diversity under microbial niche construction. Preprint at _bioRxiv_ https://doi.org/10.1101/2022.02.13.480281 (2022).
25. Schink, S. J. et al. Glycolysis/gluconeogenesis specialization in microbes is driven by biochemical constraints of flux sensing. _Mol. Syst. Biol._ 18, e10704 (2022).
26. Basan, M. et al. A universal trade-off between growth and lag in fluctuating environments. _Nature_ 584, 470–474 (2020).
27. Plucain, J. et al. Epistasis and allele specificity in the emergence of a stable polymorphism in _Escherichia coli_. _Science_ 343, 160–164 (2014).
28. Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of _Escherichia coli_. _Proc. Natl Acad. Sci. USA_ 105, 7899–7906 (2008).
29. Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E. & Schneider, D. Ecological and evolutionary dynamics of coexisting lineages during a long-term experiment with _Escherichia coli_. _Proc. Natl Acad. Sci. USA_ 109, 9487–9492 (2012).
30. Hershberg, R. & Petrov, D. A. Evidence that mutation is universally biased towards AT in bacteria. _PLoS Genet._ 6, e1001115 (2010).
31. Ely, B. Genomic GC content drifts downward in most bacterial genomes. _PLoS ONE_ 16, e0244163 (2021).
32. Maddamsetti, R. & Grant, N. A. Divergent evolution of mutation rates and biases in the long-term evolution experiment with _Escherichia coli_. _Genome Biol. Evol._ 12, 1591–1603 (2020).
33. Yakovchuk, P., Protozanova, E. & Frank-Kamenetskii, M. D. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. _Nucleic Acids Res._ 34, 564–574 (2006).
34. Lassalle, F. et al. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. _PLoS Genet._ 11, e1004941 (2015).
35. Shenhav, L. & Zeevi, D. Resource conservation manifests in the genetic code. _Science_ 370, 683–687 (2020).
36. Smriga, S., Ciccarese, D. & Babbin, A. R. Denitrifying bacteria respond to and shape microscale gradients within particulate matrices. _Commun. Biol._ 4, 570 (2021).
37. Gowda, K., Ping, D., Mani, M. & Kuehn, S. Genomic structure predicts metabolite dynamics in microbial communities. _Cell_ 185, 530–546 (2022).
38. Moran, M. A. et al. Genome sequence of _Silicibacter pomeroyi_ reveals adaptations to the marine environment. _Nature_ 432, 910–913 (2004).
39. Ben-Haim, Y. et al. _Vibrio coralliilyticus_ sp. nov., a temperature-dependent pathogen of the coral _Pocillopora damicornis_. _Int. J. Syst. Evol. Microbiol._ 53, 309–315 (2003).
40. Hehemann, J. H. et al. Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes. _Nat. Commun._ 7, 12860 (2016).
41. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. _J. Comput. Biol._ 19, 455–477 (2012).
42. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. _Genome Res._ 25, 1043–1055 (2015).
43. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. _BMC Bioinform._ 11, 119 (2010).
44. Huerta-Cepas, J. et al. EGGNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. _Nucleic Acids Res._ 44, D286–D293 (2016).
45. Zhang, H. et al. DbCAN2: a meta server for automated carbohydrate-active enzyme annotation. _Nucleic Acids Res._ 46, W95–W101 (2018).
46. Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. _Bioinformatics_ 36, 1925–1927 (2020).
47. Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. _J. Genet. Genomics_ 48, 844–850 (2021).
48. Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-based reconstruction and analysis for Python. _BMC Syst. Biol._ 7, 74 (2013).
49. Wolfram Mathematica v. 13.2 (Wolfram, 2022).
50. R: A Language and Environment for Statistical Computing (R Core Team, 2022).
51. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T. Y. Ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. _Methods Ecol. Evol._ 8, 28–36 (2017).
52. Paradis, E. & Schliep, K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. _Bioinformatics_ 35, 526–528 (2019).
53. Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis version 11. _Mol. Biol. Evol._ 38, 3022–3027 (2021).
54. Schliep, K. P. phangorn: Phylogenetic analysis in R. _Bioinformatics_ 27, 592–593 (2011).
55. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. _Nucleic Acids Res._ 44, D457–D462 (2016).
56. Heinken, A. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. _Nat. Biotechnol._ https://doi.org/10.1038/s41587-022-01628-0 (2023).
57. Heinken, A., Magnúsdóttir, S., Fleming, R. M. T. & Thiele, I. DEMETER: efficient simultaneous curation of genome-scale reconstructions guided by experimental data and refined gene annotations. _Bioinformatics_ 37, 3974–3975 (2021).
58. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. _Nat. Methods_ 13, 581–583 (2016).
59. Hubert, B. SkewDB, a comprehensive database of GC and 10 other skews for over 30,000 chromosomes and plasmids. _Sci. Data_ 9, 92 (2022).
60. Lagadec, E., Småge, S. B., Trösse, C. & Nylund, A. Phylogenetic analyses of Norwegian Tenacibaculum strains confirm high bacterial diversity and suggest circulation of ubiquitous virulent strains. _PLoS ONE_ 16, e0259215 (2021).
61. Ekborg, N. A. et al. Saccharophagus degradans gen. nov., sp. nov., a versatile marine degrader of complex polysaccharides. _Int. J. Syst. Evol. Microbiol._ 55, 1545–1549 (2005).

ACKNOWLEDGEMENTS
We thank S. Estrela (Yale University and Stanford University) for providing community composition data from their enrichment experiments (Fig. 4d); A. Sichert for
assembling genomes; and M. d. Bello, X. Shan, T. Hwa as well as all members of the Cordero laboratory and Simons PriME collaboration for their enriching discussions. We acknowledge funding from the Simons Collaboration: Principles of Microbial Ecosystems (PriME) award number 542395 (O.X.C.) and Simons Foundation Postdoctoral Fellowship Award number 599207 (M.G.).

AUTHOR INFORMATION
Author notes
* Matti Gralka. Present address: Systems Biology Group, Amsterdam Institute for Life and Environment (A-LIFE) and Amsterdam Institute of Molecular and Life Sciences (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
* Shaul Pollak. Present address: Division of Microbial Ecology, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria

AUTHORS AND AFFILIATIONS
* Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA: Matti Gralka, Shaul Pollak & Otto X. Cordero

CONTRIBUTIONS
M.G. designed the study, performed all experiments, analysed all data and wrote the initial manuscript. S.P. analysed the genomic data from the proGenomes database. M.G., S.P. and O.X.C. discussed the results. O.X.C. directed the project and edited the manuscript.

CORRESPONDING AUTHORS
Correspondence to Matti Gralka or Otto X. Cordero.

ETHICS DECLARATIONS
COMPETING INTERESTS
The authors declare no competing interests.

PEER REVIEW
PEER REVIEW INFORMATION
_Nature Microbiology_ thanks Sara Mitri, Seppe Kuehn and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

ADDITIONAL INFORMATION
PUBLISHER’S NOTE
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
EXTENDED DATA

EXTENDED DATA FIG. 1 PHYLOGENETIC TREE OF ALL STRAINS USED IN THIS STUDY.
The tree and taxonomy were created with the GTDB-Tk classify workflow, run with standard parameters on an alignment of 120 marker genes. The legend corresponds to expected substitutions per site. A full list of all strains is provided in Supplementary Table 1.

EXTENDED DATA FIG. 2 OVERVIEW OF GROWTH CHARACTERIZATION RESULTS.
A, Number of carbon sources supporting growth per strain. B, Fraction of all strains that were able to use a given substrate as their sole carbon and energy source. C, The number of carbon sources that support growth was not strongly correlated with growth rate or yield. Average yield (blue dots) and rate (red squares) binned by the number of carbon sources that supported growth, shown as the mean ± s.d. (for a total of _n_ = 182 strains showing growth on at least one substrate). Lines and _P_ values are derived from linear regressions. More generalist species (more carbon sources consumed) achieve slightly higher average yield but the effect size is likely not practically relevant. D, For each condition (substrate × strain), we plotted the growth rate and yield, which are very slightly positively correlated (linear regression \(P = 2 \times 10^{-6}\), \(R^{2} = 0.005\)). Points on the far right correspond to the maximal detectable growth rate given our spacing of experimental time points. E, Linear slopes for the per-strain regression of yield against growth rate; only 3/186 strains exhibited a statistically significant correlation (linear regression) between rate and yield. The vertical line corresponds to the slope of the regression over all conditions.

EXTENDED DATA FIG. 3 CORRELATION BETWEEN PHENOTYPE DISTANCE AND DIFFERENT GENOMIC DISTANCES.
A–C, Phenotype distance, defined as the cosine distance between consumption vectors, as a function of genomic distance between pairs of strains, where the genomic distance is the GTDB-Tk distance (A), the Bray–Curtis distance between gene content (B; based on copy numbers of KEGG KO) or module content (C; based on abundance of KEGG modules). Points are the mean ± s.d. of logarithmic bins; _n_ = 16,471 total comparisons.

EXTENDED DATA FIG. 4 DETAILED PRINCIPAL COMPONENT ANALYSIS OF THE GROWTH CHARACTERIZATION RESULTS.
A, Principal component analysis of the full growth rate matrix, reproduced from Fig. 1 in the main text. B, Averaged loadings of fine-grained categories of substrates normalized to unit length. Detailed loadings of all substrates in the principal component analysis in A. The full principal component analysis shows a clear separation of preferences for organic acids (including alcohols and aromatics) and amino acids. C,D, Individual loadings per substrate for each principal component (PC; left). Note that all acids have negative loadings on PC1, but all except two organic acids switch sign on PC2 relative to amino acids. Scatter plots of the first principal component (based on the full growth rate matrix) versus the SAP as defined in the main text, and the second principal component versus the amino acid–organic acid preference defined analogously (right). Each point is a different isolate, coloured by taxonomic order (as in Fig. 1). _P_ values are derived from linear regressions.

EXTENDED DATA FIG. 5 COMPARISON WITH EXTERNAL DATASETS.
A, Re-analysis of data from Kehe and colleagues (ref. 11). The heat map corresponds to their extended data fig. 2 (final optical density in each condition), but with rows and columns sorted by cosine similarity. B, Principal component analysis of this matrix shows the clustering of the two taxonomic orders and their alignment with the average loadings of acids and sugars. C, Phylogenetic tree based on GTDB-Tk of species contained in the IJSEM and DEMETER trait databases as well as proGenomes (by species name). Note that two large phyla, Actinobacteriota and Firmicutes, are not at all represented in our strain library.
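The two distance measures named in Extended Data Fig. 3 (cosine distance between consumption vectors and Bray–Curtis distance between gene-content vectors) can be written down in a few lines. The following is a minimal sketch using the standard textbook definitions; the function names and the toy vectors are hypothetical, and it is not the authors' analysis code.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cos(angle) between two vectors, e.g. growth on each substrate."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A strain that grows on nothing has no direction in substrate space.
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative count vectors,
    e.g. copy numbers of each orthologue in two genomes."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        raise ValueError("Bray-Curtis is undefined when both vectors are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Toy vectors only (not data from the paper).
growth_strain_1 = [0.8, 0.0, 0.3, 0.0]   # growth rate on four substrates
growth_strain_2 = [0.7, 0.1, 0.0, 0.0]
copies_strain_1 = [2, 0, 1, 4]           # copy number of four orthologues
copies_strain_2 = [1, 1, 1, 2]

phenotype_distance = cosine_distance(growth_strain_1, growth_strain_2)
gene_content_distance = bray_curtis(copies_strain_1, copies_strain_2)
```

Cosine distance ignores overall magnitude, so two strains with the same substrate profile but different absolute growth rates count as identical, whereas Bray–Curtis is sensitive to differences in total copy number.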
EXTENDED DATA FIG. 6 REPRODUCIBILITY BETWEEN EXPERIMENTS.
A, Smooth histograms of the pairwise correlation coefficients between the growth vectors of strains across all three experiments (V1, V2, V3; V3 is the experiment primarily discussed in the main text). B, Scatter plots of the SAP measured for each strain between all three replicate experiments. _P_ values are derived from linear regressions.

EXTENDED DATA FIG. 7 THREE MEASURES OF PATHWAY ABUNDANCE AND THEIR INTERRELATIONS.
Completeness, coverage, and duplication are defined in detail in Methods. A, Predicting coverage from completeness (linear model) generally yields higher-quality fits than predicting coverage from duplication. B, After correcting for completeness, duplication tends to explain more of the residuals than completeness does after correcting for duplication. C, Neither duplication nor coverage of any individual pathway correlated very strongly with SAP, and whether duplication or coverage of a given pathway was more predictive of SAP depended on the pathway. D, Illustration of the concept of functional duplication, using the galactose degradation pathway (KEGG pathway ko00052) as an example. Shown is the central part of the pathway that converts lactose and other oligosaccharides first to β-d-galactose, which is transformed through multiple steps to α-d-glucose-6-phosphate, which then enters glycolysis. For some reactions, we found multiple orthologues in the same strain (for example, up to six orthologues of K01785 (galM, aldose 1-epimerase, EC:5.1.3.3)). These orthologues are not exact duplicates, as illustrated by the tree on the right. The tree is based on a multiple sequence alignment of all sequences annotated K01785 across all strains. We have highlighted the six copies found in the _Zobellia_ strain A2M03, which are spread around the tree and often grouped with orthologues found in distantly related species. In fact, across all highly duplicated orthologues (maximum number of orthologues per strain of at least six), the pairwise distance (computed from the multiple sequence alignments for each KEGG orthologue using the dist.ml function of the phangorn package in R) was about as likely to be greater between orthologues in the same strain than between orthologues in different strains as it was to be smaller. Thus, ‘duplicated’ orthologues in a strain probably represent functional variants of different evolutionary origin. E,F, Average distances between KEGG orthologues within and between strains for genes associated with sugar and acid catabolism. The KEGG orthologues in black have a more than 10% difference between the two distances. Points represent the mean ± s.e.m.; the number of comparisons differs for each gene, from _n_ = 496 to _n_ = 179,101. G, Comparison between measured and predicted growth on individual substrates. Predicted growth was derived from FBA simulations of genome-scale metabolic models created with CarveMe using standard parameters (no gapfilling). This procedure yielded 58% correct predictions (vertical line), which was within the range of correct predictions achieved when the comparison was performed with shuffled labels (distribution, obtained by shuffling labels 1,000 times, each time measuring the proportion of correct predictions).

EXTENDED DATA FIG. 8 THE NUMBER OF POLYSACCHARIDE-DEGRADING ENZYMES CORRELATES WITH SAP.
A–D, Number of CAZymes (A,B, glycosyl hydrolases; and C,D, polysaccharide lyases) and their correlation with SAPs (B,D). B,D, The insets show \(-\log_{10} P\) per order, the negative log10 of the _P_ value obtained from linear regressions of CAZyme number with SAP within each order; \(-\log_{10} P > 2\) (vertical line) corresponds to a significant correlation at the 5% level, Bonferroni corrected for multiple testing. B, The square symbols correspond to the squares in Fig. 1d. These are exceptions to the median metabolic preference per order, such as the acid-specialist _Tenacibaculum_ genus in the Flavobacteriales, which includes fish pathogens (ref. 60). Conversely, the orders Pseudomonadales and Rhodobacterales (commonly thought to specialize in simple substrates; ref. 13) tended to prefer acids (SAP < 0), but we also found the sugar-specialist Pseudomonadales genus _Saccharophagus_, whose members are known sugar degraders (ref. 61). The Flavobacteriales and Pseudomonadales strains with atypical phenotypes for their taxonomy tended to have fewer and more CAZymes, respectively, than their close relatives. Small points correspond to individual isolates, large points with error bars indicate the mean ± s.d. for each order (A,C, _n_ = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales), 32 (Flavobacteriales)) or SAP bin (B,D, total number of strains _n_ = 182).

EXTENDED DATA FIG. 9 GENOMIC GC CONTENT AND CONSEQUENCES FOR NUTRIENT REQUIREMENTS.
A, The GC content (measured across all predicted coding regions) is relatively conserved at the order level across our strain library (_n_ = 28 (Pseudomonadales), 34 (Rhodobacterales), 20 (Vibrionales), 58 (Alteromonadales) and 32 (Flavobacteriales)). B, The GC content predicts the carbon and nitrogen requirements per coded amino acid. All protein sequences were manually scored according to the number of carbon and nitrogen atoms of each amino acid. C, Same data as Fig. 3b without binning: SAP is correlated with genomic GC content across the whole set of strains but not within orders, possibly because GC content evolves very slowly and is thus relatively conserved below the order level. Notably, this correlation was much stronger than the correlation between GC content and other basic characteristics of the genomes, such as the number of coding regions (linear model fit, _P_ = 0.2), and there was no practically significant difference between the GC content of genes in sugar- and acid-catabolic pathways (E). D, Because of the correlation between GC content and both nutrient requirements and SAP, SAP is positively correlated with the number of carbon atoms and negatively correlated with the number of nitrogen atoms per coded amino acid. Small points correspond to individual strains, large points with error bars indicate the mean ± s.d. for the five main orders. Lines and _P_ values are derived from linear regressions. E, The average GC contents of sugar- and acid-catabolic genes are very similar. Scatter plot of the GC content of all genes annotated as sugar/acid genes (Supplementary Table 5), extracted from the genomes and averaged per strain. The line corresponds to equal GC content in sugar/acid genes. F, Residuals of the linear fit in A, showing a weak but statistically significant (\(P = 6 \times 10^{-16}\)) trend for high GC genomes to have a slightly higher GC content in sugar genes than acid genes. G, Example for the correlation and linear regression of pathway abundance with GC content in more than _n_ = 11,000 diverse reference genomes (proGenomes). H, Extracting the linear regression coefficients (slopes) for each pathway, all of which were highly significant, yields a picture similar to Fig. 2b, that is, sugar pathways tended to decrease and acid pathways tended to increase in abundance as a function of GC content. The slopes for sugar (_n_ = 7) and acid (_n_ = 26) pathways are significantly different from each other (_t_-test, dof = 31, _T_ = −4.26, _P_ = 0.00017).

EXTENDED DATA FIG. 10 DETAILS OF ENRICHMENTS AND SYNTHETIC COMMUNITY EXPERIMENTS.
A, Taxonomic distribution and distribution of SAPs in the synthetic communities, coloured by order (Fla, Flavobacteriales; Vib, Vibrionales; Alt, Alteromonadales; Pse, Pseudomonadales; Rho, Rhodobacterales; Cyt, Cytophagales). B, Richness over time in synthetic communities growing on one of four carbon sources (Fig. 4a). Points with error bars indicate the mean ± s.d. across six replicates. C, Abundance-weighted average GC content of communities enriched on acids or sugars. Genome-average GC for individual OTUs was estimated using SkewDB (Methods). The distributions are statistically significantly different (two-sided Welch’s _t_-test, \(T = 6.95,\ \mathrm{dof} = 13.8,\ P = 7.5 \times 10^{-6}\)). D, Final richness in synthetic communities growing on four different concentrations of GlcNAc. The communities consisted of a complex mixture of strains, of which only about half were capable of consuming GlcNAc in monoculture (consumers). The remaining species (crossfeeders) therefore must have been crossfeeding on metabolites excreted by the consumers. E,F, Average number of C or N atoms per coded amino acid in the communities, weighted by the abundance of each strain. Shown is the average over the last five time points. Asterisks indicate significant differences between conditions (\(P = \{2, 0.2, 5.8, 6.2\} \times 10^{-6}\) from top to bottom in E and \(P = \{0.01, 3.0, 3.8, 1.4\} \times 10^{-5}\) from top to bottom in F) in a two-tailed Mann–Whitney test (using Bonferroni correction for multiple testing). D–F,H, Small points correspond to replicates (including different dilution factors, _n_ = 12 points per condition), large points with error bars indicate the mean ± s.d. G, Functional composition of synthetic communities growing on four different concentrations of GlcNAc as the sole carbon (but not nitrogen) source. Final species compositions are shown as bar charts, where each species is coloured according to its SAP. At low GlcNAc concentrations, more acid-specialist species (negative SAP, green tones) dominated. This trend was driven not by a change in the relative abundance of consumers (which was roughly constant across conditions) but by both consumers and crossfeeders with lower SAP dominating at lower carbon concentrations. H, This pattern remained when the communities were perturbed. All four replicate communities at the intermediate dilution factor, grown for six cycles at the highest and lowest concentrations (20 and 0.02 mM GlcNAc, respectively), were transferred into all of the other concentrations, in parallel to the unperturbed communities. Consistent with the unperturbed observation, an increase/decrease in GlcNAc concentration led to an increase/decrease in cSAP, respectively. This effect was overall stronger for more severe perturbations; for example, compare the 20 mM to 2 mM switched communities (yellow) to the 20 mM to 0.02 mM switched communities (red).
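Several panels of Extended Data Fig. 10 report community-level quantities that are averages of a per-strain trait weighted by strain abundance (GC content in C, atoms per coded amino acid in E,F). A minimal sketch of such an abundance-weighted mean is given below. The function name, the strain labels and all numbers are hypothetical placeholders and are not taken from the paper's data or code.

```python
from typing import Mapping


def abundance_weighted_mean(
    abundance: Mapping[str, float], trait: Mapping[str, float]
) -> float:
    """Average a per-strain trait over a community, weighting each strain
    by its abundance. Abundances need not sum to one."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    missing = [strain for strain in abundance if strain not in trait]
    if missing:
        raise KeyError(f"no trait value for strains: {missing}")
    return sum(a * trait[strain] for strain, a in abundance.items()) / total


# Toy community (hypothetical labels and values): relative abundances and
# a genome-average GC fraction per strain.
toy_abundance = {"strain_1": 0.75, "strain_2": 0.25}
toy_gc = {"strain_1": 0.40, "strain_2": 0.60}

community_gc = abundance_weighted_mean(toy_abundance, toy_gc)
```

The same function applies unchanged to any other per-strain trait, for example a per-strain preference score, provided the trait table covers every strain present in the community.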
SUPPLEMENTARY INFORMATION
REPORTING SUMMARY
SUPPLEMENTARY TABLES 1–8
Supplementary Table 1. List of strains.
Supplementary Table 2. List of substrates.
Supplementary Table 3. Full dataset of growth rates.
Supplementary Table 4. KEGG pathways used for SAP predictions.
Supplementary Table 5. KOs used for SAP predictions.
Supplementary Table 6. List of sugar/acid KOs in our strains.
Supplementary Table 7. Predicted SAP for reference genomes.
Supplementary Table 8. OTUs for synthetic communities on four carbon sources.

SOURCE DATA
SOURCE DATA FIGS. 1–4
Source data for Figs. 1–4.

RIGHTS AND PERMISSIONS
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ABOUT THIS ARTICLE
CITE THIS ARTICLE
Gralka, M., Pollak, S. & Cordero, O.X. Genome content predicts the carbon catabolic preferences of heterotrophic bacteria. _Nat Microbiol_ 8, 1799–1808 (2023). https://doi.org/10.1038/s41564-023-01458-z
* Received: 08 February 2023
* Accepted: 24 July 2023
* Published: 31 August 2023
* Issue Date: October 2023
* DOI: https://doi.org/10.1038/s41564-023-01458-z