Chromosome-scale genome assemblies of sexually dimorphic male and female _Acrossocheilus fasciatus_
ABSTRACT
_Acrossocheilus fasciatus_ is a stream-dwelling fish species of the subfamily Barbinae. It is valued for its colorful striped appearance and delicious meat. This species is also characterized by pronounced sexual dimorphism and toxic ova. Biological and aquaculture research on _A. fasciatus_ has been hindered by the lack of a high-quality reference genome. Here, we report chromosome-level genome assemblies of male and female _A. fasciatus_. The HiFi-only genome assemblies of the female and male individuals were 899.13 Mb (contig N50 length of 32.58 Mb) and 885.68 Mb (contig N50 length of 33.06 Mb), respectively. Notably, 96.15% and 98.35% of the assembled sequences of the female and male genomes, respectively, were anchored onto 25 chromosomes using Hi-C data. We annotated the female assembly as a reference genome and identified a total of 400.62 Mb (44.56%) of repetitive sequences, 27,392 protein-coding genes, and 35,869 ncRNAs. The high-quality male and female reference genomes will provide genomic resources for developing sex-specific molecular markers, inform
single-sex breeding, and elucidate genetic mechanisms of sexual dimorphism.

BACKGROUND & SUMMARY
The Barbinae is a subfamily of the Cyprinidae, the largest family of freshwater fishes. This subfamily contains the most complex and diverse
fish groups within the Cyprinidae[1]. Their morphologies and habits vary widely. For example, _Sinocyclocheilus rhinocerous_ dwells in caves and has evolved cave-adapted traits[2]. Genome sequences of several Barbinae species, including three species of the genus _Sinocyclocheilus_ (_S. grahami_, _S. rhinocerous_, and _S. anshuiensis_), _Poropuntius huangchuchieni_, _Puntigrus tetrazona_, and _Onychostoma macrolepis_, have been deciphered, largely because of their phylogenetic features and notable evolutionary status[2,3,4]. Most Barbinae species have undergone an additional whole-genome duplication after the third-round teleost-specific genome duplication (TGD), generating tetraploid or even hexaploid lineages[5]. However, some species, such as _O. macrolepis_, _P. huangchuchieni_, and _P. tetrazona_, remain diploid and retain the original chromosome number of 2n = 50[3,4,6]. _Acrossocheilus fasciatus_ is also a diploid species in the Barbinae, with a chromosome number of 2n = 50[7]. It is mainly found in streams south of the Yangtze River and is extremely popular with recreational fisheries because of its colorful appearance with six dark stripes. It is a local delicacy and is considered highly nutritious[8] by people in southeast China, especially in Zhejiang Province. However, because of its small size and slow growth rate[9], this fish is chronically in short supply and has great market potential. In addition, _A. fasciatus_ is ichthyootoxic, with toxic ova[10], although the structures of the toxins remain unknown. Furthermore, it is sexually dimorphic in both body mass and appearance (Fig. 1). The weight of a two-year-old mature female is approximately 1.5 times that of a mature male[11]. In mature males, the six black transverse stripes gradually fade as secondary sexual characteristics such as pearl organs and abdominal redness appear, whereas females always retain the transverse stripes. Despite its biological and economic importance, genomic resources for _A. fasciatus_ are limited. Previous studies on _A. fasciatus_ have focused mainly on mitochondrial DNA or transcriptomes[12,13,14,15,16]. In this study, we sequenced and annotated chromosome-scale genome assemblies of male and female _A. fasciatus_ using PacBio HiFi reads and
high-throughput chromosome conformation capture (Hi-C) technologies. The genome size of female _A. fasciatus_ was estimated to be about 880.6 Mb by k-mer frequency distribution analysis of 126.33 Gb (~143×) of Illumina clean data. The female and male genomes were independently assembled into contigs from PacBio HiFi reads. The female genome assembly spans 899.13 Mb with a contig N50 length of 32.58 Mb, assembled from 62.01 Gb (~70×) of PacBio HiFi clean reads. The male genome assembly spans 885.68 Mb with a contig N50 length of 33.06 Mb, assembled from 97.67 Gb (~111×) of HiFi clean reads. Using Hi-C data, 96.15% and 98.35% of the contig sequences of the female (contig N50 length = 32.35 Mb; scaffold N50 length = 33.86 Mb) and male (contig N50 length = 32.84 Mb; scaffold N50 length = 33.78 Mb) genomes, respectively, were anchored onto 25 chromosomes (Supplementary Table 1). Finally, the female genome was annotated as a reference genome, with 44.56% (400.62 Mb) repetitive sequences, 27,392 protein-coding genes, and 35,869 ncRNAs. The female and male genome assemblies reported here provide genomic resources for developing sex-specific molecular markers and for single-sex breeding, as well as for a better understanding of the mechanisms of sexual dimorphism.

METHODS

SAMPLE COLLECTION
Two-year-old female and male adults of _A. fasciatus_ were randomly sampled from the second-generation progeny of a selective breeding program at Dingxin Ecological Agriculture Co., Ltd. (Xiuning County, Huangshan City, Anhui Province, China).
The sampled fish were euthanized with MS-222 (Sigma-Aldrich, #A5040) and dissected on ice. Eight tissues (brain, gill, heart, intestine, liver, ovary, muscle, and skin) of one female (body length = 16.23 cm, body weight = 43.56 g) were collected, immediately frozen in liquid nitrogen, and stored at −80 °C until DNA and RNA extraction. The blood and muscle tissues of one male (body length = 13.05 cm, body weight = 26.73 g) were collected for DNA extraction.

DNA EXTRACTION AND SEQUENCING FOR GENOMES
High-molecular-weight (HMW) genomic DNA was extracted from the female muscle and the male blood of _A. fasciatus_ using the phenol/chloroform method[17]. The quality and quantity of the extracted DNA were assessed using 1.0% agarose gel electrophoresis and a Qubit 4.0 fluorometer (Thermo Fisher Scientific, USA). For PacBio sequencing, high-quality DNA (main band > 30 kb) was randomly sheared into 15–18 kb fragments using a Covaris g-TUBE (Woburn, Massachusetts, USA), and SMRTbell libraries were then constructed using the PacBio HiFi Express Template Prep Kit 2.0 according to the manufacturer's instructions[18] (Pacific Biosciences, Menlo Park, CA, USA). For the female genome assembly, we generated two SMRT cells of HiFi clean reads on the PacBio Sequel IIe platform, yielding 62.01 Gb (~70×) of data with an N50 read length of 14.12 kb. For the male genome assembly, we generated a single cell of HiFi reads on the PacBio Revio platform, yielding 97.67 Gb (~111×) of data with an N50 read length of 13.96 kb (Table 1). For Illumina sequencing, the DNA was randomly sheared into ~350 bp fragments using a Covaris ultrasonicator. Libraries were constructed using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, #E7370L) and sequenced on the NovaSeq 6000 platform (Illumina, Inc., San Diego, CA, USA) in paired-end (PE) 150 bp mode. We obtained 126.33 Gb (~143×) of Illumina short reads to survey the female genome (Table 1). For genome scaffolding, Hi-C libraries were prepared from muscle tissues of the same female and male individuals used for PacBio genome sequencing. Hi-C library construction, including cell crosslinking, cell lysis, chromatin digestion (_Mbo_I), biotin labeling, proximal chromatin DNA ligation, and DNA purification, was performed according to the standard protocol described previously[19,20]. After quality control by Agilent 2100 Bioanalyzer and qPCR, the resulting Hi-C libraries were sequenced in PE 150 bp mode on the Illumina NovaSeq 6000 platform. In total, 137.24 Gb (~152×) and 104.69 Gb (~116×) of raw read
data were generated for the female and male genomes, respectively (Table 1).

RNA EXTRACTION AND TRANSCRIPTOME SEQUENCING
Total RNA was extracted from each of eight tissues (brain, gill, heart, intestine, liver, ovary, muscle, and skin) of the female _A. fasciatus_ using TRIzol™ reagent (Thermo Fisher Scientific, USA). The resulting RNA was treated with DNase I (NEB, USA) to remove genomic DNA. To facilitate genome annotation, both Iso-Seq and RNA-Seq were performed. For PacBio Iso-Seq, the concentration, integrity, and purity of the RNA isolated from each tissue were confirmed using Qubit, Agilent 2100, and NanoDrop, and the RNA samples were then pooled at equimolar concentrations. A double-stranded cDNA library was prepared with the SMARTer® PCR cDNA Synthesis Kit (Clontech, USA) and sequenced on the PacBio Sequel IIe platform. After filtering and processing with SMRT Link v11.0 (https://www.pacb.com/support/software-downloads/) with the parameter --minLength=50, a total of 20.25 Gb of subread data were obtained (Table 1). For Illumina RNA-Seq, eight cDNA libraries from the aforementioned tissues were constructed independently and sequenced on the Illumina NovaSeq 6000. A total of 56.32 Gb of clean data were generated after removing reads containing adapters, reads with more than 10% unknown nucleotides (Ns), and low-quality reads (more than 20% of bases with Phred
quality < 5) (Table 1).

_DE NOVO_ GENOME ASSEMBLY WITH PACBIO HIFI READS AND HI-C TECHNOLOGIES
Before _de novo_ assembly, the size of the female genome was estimated by k-mer analysis of Illumina reads. The Illumina clean reads were deduplicated with the in-house script redup.v2 developed by Novogene (Beijing, China) and used to calculate the k-mer frequency distribution (k = 17) with Jellyfish v2.2.7[21,22]. Using the formula genome size = total k-mer number / peak depth, the female genome size of _A. fasciatus_ was estimated to be 880.6 Mb, with a heterozygosity of 0.53% and a repeat content of 47.82% (Supplementary Fig. 1). PacBio HiFi reads from the female and male individuals were assembled into female and male contigs, respectively, using hifiasm v0.16.1[23] with default parameters. A total of 110 female contigs were built, with a total length of 899,126,031 bp and an N50 length of 32.58 Mb, and a total of 174 male contigs were built, with a total length of 885,680,593 bp and an N50 length of 33.06 Mb. The Hi-C raw reads were processed to remove read pairs containing adapters or low-quality bases (more than 20% of bases with Phred quality < 5) and quality-controlled with HiCUP[24]. Subsequently, the contigs were anchored into 25 pseudo-chromosomes using the ALLHiC pipeline[25] with the clean Hi-C data (Fig. 2a). Juicebox[26] was used to manually correct the assembly according to chromosome interaction strength (Supplementary Fig. 2). As a result, 84 scaffolds of the female genome were generated, with a total length of 899,129,631 bp and an N50 length of 33.86 Mb, of which 96.15% (864,515,734 bp) was anchored onto 25 chromosomes (Tables 2, 3). A total of 167 scaffolds of the male genome were generated, with a total length of 885,681,293 bp and an N50 length of 33.78 Mb, of which 98.35% (871,084,321 bp) was anchored onto 25 chromosomes (Tables 2, 3). Finally, we obtained high-quality chromosome-level male and female reference genomes with Hi-C technologies[20] for analysis of genome characteristics (Fig. 2b).

GENOMIC SYNTENY ANALYSIS
To assign the chromosome IDs of _A.
fasciatus_ genomes and assess the accuracy of the genome assemblies, we performed genomic synteny analysis between zebrafish (_Danio rerio_) and the female and male _A. fasciatus_. For synteny analysis between the zebrafish and female _A. fasciatus_ assemblies, MUMmer[27] (v4.0.0beta2) was used to identify maximal unique matches between the genomes with the parameter "--mincluster 500". Matched sequence sets with sequence similarity of less than 80% were removed. For synteny analysis between the female and male assemblies of _A. fasciatus_, matched sequence sets were filtered by removing those with sequence similarity of less than 95% and length of less than 10 kb. Genomic synteny graphs were generated from the matched sets using RectChr v1.36 (https://github.com/BGI-shenzhen/RectChr) (Fig. 2c). The synteny graphs revealed a moderate level of collinearity with minor rearrangements between the zebrafish and _A. fasciatus_ genomes, and indicated that the genome assemblies of the female and male _A. fasciatus_ are highly accurate. No obvious chromosomal structural variation was observed
between the female and male genomes in the synteny analysis.

REPEAT ANNOTATION OF THE FEMALE GENOME
Repeat sequences mainly consist of interspersed repeats (mainly transposable elements, TEs) and tandem repeats. TEs in the female _A. fasciatus_ genome were identified using a strategy combining homology-based alignment and _ab initio_ search, and tandem repeats were predicted _ab initio_ using TRF[28]. First, homology-based prediction of TEs was performed against the Repbase[29] database using RepeatMasker and RepeatProteinMask[30] (https://www.repeatmasker.org/) with default parameters. Second, _de novo_ repetitive elements were identified by LTR_FINDER[31], RepeatScout[32], and RepeatModeler[33] with default parameters. All repeat sequences longer than 100 bp and with less than 5% gap ('N') content constituted the _de novo_ TE library. Finally, a customized non-redundant library (combining the homology-based and _de novo_ TE libraries) was used for a homology search with RepeatMasker to identify TEs. As a result, repeat sequences, including tandem and interspersed repeats, accounted for approximately 44.56% (400.62 Mb) of the genome (Table 4), which is close to the repeat content of 47.82% estimated by the genome survey. Tandem repeat sequences totaled 57.51 Mb, accounting for 6.40% of the genome (Table 4).

GENE PREDICTION AND FUNCTIONAL ANNOTATION
Three strategies were used to predict gene structures in the female genome: homology
searching, _ab initio_ prediction, and transcriptome-assisted prediction. For homology searching, the protein sequences of _Danio rerio_, _Ctenopharyngodon idella_, _Megalobrama amblycephala_, _Poropuntius huangchuchieni_, _Puntigrus tetrazona_, _Onychostoma macrolepis_, and _Oryzias latipes_ were downloaded from the NCBI database (https://ftp.ncbi.nlm.nih.gov/genomes/refseq). These protein sequences were aligned to the genome using TBLASTN (v2.2.26; E-value ≤ 1e-5)[34], and the matched proteins were then realigned to the corresponding genomic regions with GeneWise (v2.4.1)[35] to obtain accurate spliced alignments and predict the gene structure within each protein-matched region. For _ab initio_ gene prediction, AUGUSTUS (v3.2.3)[36], GeneID (v1.4)[37], GENSCAN (v1.0)[38], GlimmerHMM (v3.04)[39], and SNAP (2013-11-29)[40] were used in an automated gene prediction pipeline. For RNA-Seq-assisted prediction, transcriptome read assemblies were generated with Trinity (v2.1.1)[41]. To optimize the genome annotation, RNA-Seq reads from different tissues were aligned to the genome sequences using HISAT (v2.0.4) with default parameters to identify exon regions and splice positions[42]. The alignment results were then used as input for Cufflinks (v2.2.1) with default parameters for genome-based transcript assembly[43]. A non-redundant reference gene set was generated by merging the genes predicted by the three methods with EvidenceModeler (EVM, v1.1.1) and was further refined with PASA (Program to Assemble Spliced Alignments)[44]. As a result, we identified 27,392 protein-coding genes in the female
reference genome (Table 5, Supplementary Fig. 3a). Gene functions were assigned according to the best match from aligning the protein sequences to the Swiss-Prot[45] database (http://www.uniprot.org/) using BLASTP (E-value ≤ 1e-5). Motifs and domains were annotated using InterProScan (v5.31-70.0)[46] (https://www.ebi.ac.uk/interpro/). Gene Ontology (GO) IDs for each gene were assigned according to the corresponding InterPro entry. Protein functions were also predicted by transferring annotations from the closest BLAST hit (E-value ≤ 1e-5) in the Swiss-Prot database and the closest DIAMOND (v0.8.22)/BLAST hit (E-value < 1e-5) in the NR database (ftp://ftp.ncbi.nih.gov/blast/db). We also mapped the gene set to KEGG pathways and identified the best match for each gene[47]. As a result, 96.1% of the 27,392 predicted protein-coding genes have functional annotations (Supplementary Fig. 3b). For non-coding RNA (ncRNA) annotation, tRNAs were predicted using tRNAscan-SE[48]. Because rRNAs are highly conserved, the rRNA sequences of _Homo sapiens_ were chosen as references, and rRNA sequences were predicted using BLASTN (E-value ≤ 1e-5). Other ncRNAs, including miRNAs and snRNAs, were identified by searching against the Rfam database with default parameters using Infernal[49]. In total, 35,869 ncRNAs were identified, including 2,588 miRNAs, 18,386 tRNAs, 12,709 rRNAs, and 2,186 snRNAs (Supplementary Table 2). Furthermore, the male genome of _A. fasciatus_ was annotated using the female genome annotation as a reference with Liftoff[50], an accurate gene annotation mapping tool that maps genes from a reference genome to a target
genome.

DATA RECORDS
All raw sequencing data for genome assembly have been deposited in the NCBI database (https://www.ncbi.nlm.nih.gov/bioproject). Specifically, for the female genome, the Illumina WGS data (SRR26993408[51]–SRR26993409[52]), PacBio WGS data (SRR26993393[53]–SRR26993394[54]), transcriptome data (SRR26993400–SRR26993407[55,56,57,58,59,60,61,62] and SRR26993392[63]), and Hi-C data (SRR26993395–SRR26993399[64,65,66,67,68]) were deposited under BioProject accession number PRJNA1045882. For the male genome, the PacBio WGS data (SRR27126179[69]) and Hi-C data (SRR27588553[70]) were deposited under BioProject accession number PRJNA1049304. The final assembled genomes of _A. fasciatus_ have been deposited in GenBank under accession numbers JAXUIB000000000 (female)[71] and JAZDCR000000000 (male)[72]. In addition, all data, including the male and female genome sequences and annotation files, are accessible through
Figshare[73].

TECHNICAL VALIDATION
Benchmarking Universal Single-Copy Orthologs (BUSCO)[74], the Core Eukaryotic Genes Mapping Approach (CEGMA)[75], and Merqury[76] were used to evaluate the genome assemblies. BUSCO (v5.2.2) was used to evaluate the completeness of the genome assemblies with the vertebrata database (vertebrata_odb10). Of the 3,354 orthologous genes, 3,304 (98.5%) were identified as complete, 16 (0.5%) as fragmented, and 34 (1.0%) as missing in the female genome assembly (Fig. 3a). For the male genome assembly, 3,301 (98.4%) genes were identified as complete, 19 (0.6%) as fragmented, and 34 (1.0%) as missing (Fig. 3b). CEGMA (v2.5) was also used to evaluate genome completeness. Of the 248 eukaryotic core genes, 235 (94.76%) and 233 (93.95%) were identified in the female and male genomes, respectively (Supplementary Table 3). To further assess the completeness of the genome assemblies, we identified telomeric repeats in both the female and male genomes using tidk (v0.2.41) (https://zenodo.org/records/10091385) with Cypriniformes-specific telomeric repeat sequences. Telomeric repeat sequences were identified at almost all chromosome ends (Supplementary Fig. 4). These results indicate a high level of completeness of the genome assemblies. To evaluate the quality and accuracy of the
female genome assembly, we employed a three-step validation process. First, the Illumina short reads from the genome survey were mapped to the genome assembly using BWA-MEM (v0.7.8)[77] with default parameters, and SAMtools[77] was then used for SNP calling. As a result, 99.30% of reads were mapped to the genome, covering approximately 99.95% of the assembly. Second, the base quality value (QV) of the genome sequences was quantified using Merqury, yielding a QV of 52.22. Third, the GC content of the genome assembly was calculated in 10 kb sliding windows using SOAP.coverage (v2.7.7)[78]. The GC content was 37.49%, with no obvious separation among windows, indicating no foreign contamination in the genome (Supplementary Fig. 5). Together, these results indicate a high-quality genome assembly.

CODE AVAILABILITY
No custom software code was developed. The tools used for read quality control are proprietary scripts developed by Novogene (Beijing, China). All bioinformatics tools and pipelines were run following the instructions in their manuals and protocols. The versions of the software used, along with their corresponding
parameters, have been thoroughly described in the Methods section.

REFERENCES
1. Zheng, L. P., Yang, J. X. & Chen, X. Y. Molecular phylogeny and systematics of the Barbinae (Teleostei: Cyprinidae) in China inferred from mitochondrial DNA sequences. _Biochem. Syst. Ecol._ 68, 250–259 (2016).
2. Yang, J. X. _et al_. The _Sinocyclocheilus_ cavefish genome provides insights into cave adaptation. _BMC Biol._ 14, 1 (2016).
3. Chen, L. _et al_. Chromosome-level genome of _Poropuntius huangchuchieni_ provides a diploid progenitor-like reference genome for the allotetraploid _Cyprinus carpio_. _Mol. Ecol. Resour._ 21, 1658–1669 (2021).
4. Li, J. T. _et al_. Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish. _Nat. Genet._ 53, 1493–1503 (2021).
5. Xu, M. R. X. _et al_. Maternal dominance contributes to subgenome differentiation in allopolyploid fishes. _Nat. Commun._ 14 (2023).
6. Cui, W. Y. _et al_. Embryonic development and phylogenetic analysis of _Puntius tetrazona_. _Journal of Fisheries of China (in Chinese)_ 44, 1286–1295 (2020).
7. Jiang, J., Li, M. Y. & Wu, E. M. Chromosome karyotyping of _Acrossocheilus fasciatus_. _Freshwater Fisheries of China (in Chinese)_ 39, 77–79 (2009).
8. Yu, Y. Y., Zhou, J. B., Zhang, Y. M. & Li, M. Y. The nutritional compositions and evaluation of wild and cultured _Acrossocheilus fasciatus_. _Journal of Fishery Sciences of China (in Chinese)_ 31, 207–210 (2012).
9. Yan, Y. Z. _et al_. Life-history strategies of _Acrossocheilus fasciatus_ (Barbinae, Cyprinidae) in the Huishui Stream of the Qingyi watershed, China. _Ichthyol. Res._ 59, 202–211 (2012).
10. Wu, H. L. _New records of toxic and medicinal fishes in China_. (China Agriculture Press, 2002).
11. Zhang, Y. M., Cheng, S., Jiang, J. H., Lei, S. Y. & Yang, L. J. Primary study on the growth of _Acrossocheilus fasciatus_ in cultivation. _Journal of Shanghai Ocean University (in Chinese)_ 21, 542–548 (2012).
12. Zhou, M. Y. _et al_. Historical landscape evolution shaped the phylogeography and population history of the cyprinid fishes of _Acrossocheilus_ (Cypriniformes: Cyprinidae) according to mitochondrial DNA in Zhejiang Province, China. _Diversity (Basel)_ 15 (2023).
13. Wei, Z. Z., Fang, Y., Shi, W., Chu, Z. J. & Zhao, B. Transcriptional modulation reveals physiological responses to temperature adaptation in _Acrossocheilus fasciatus_. _Int. J. Mol. Sci._ 24 (2023).
14. Wei, W. B. _et al_. Integrated mRNA and miRNA expression profile analysis of female and male gonads in _Acrossocheilus fasciatus_. _Biology_ 11 (2022).
15. Wang, L. _et al_. Influences of chronic copper exposure on intestinal histology, antioxidative and immune status, and transcriptomic response in freshwater grouper (_Acrossocheilus fasciatus_). _Fish Shellfish Immunol._ 139 (2023).
16. Wang, L. _et al_. Dietary berberine against intestinal oxidative stress, inflammation response, and microbiota disturbance caused by chronic copper exposure in freshwater grouper (_Acrossocheilus fasciatus_). _Fish Shellfish Immunol._ 139 (2023).
17. Green, M. R. & Sambrook, J. Isolation of high-molecular-weight DNA using organic solvents. _Cold Spring Harb. Protoc._ 2017, pdb.prot093450 (2017).
18. Eid, J. _et al_. Real-time DNA sequencing from single polymerase molecules. _Science_ 323, 133–138 (2009).
19. Belton, J. M. _et al_. Hi-C: a comprehensive technique to capture the conformation of genomes. _Methods_ 58, 268–276 (2012).
20. Rao, S. S. P. _et al_. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. _Cell_ 159, 1665–1680 (2014).
21. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of _k_-mers. _Bioinformatics_ 27, 764–770 (2011).
22. Li, R. Q. _et al_. _De novo_ assembly of human genomes with massively parallel short read sequencing. _Genome Res._ 20, 265–272 (2010).
23. Cheng, H. Y., Concepcion, G. T., Feng, X. W., Zhang, H. W. & Li, H. Haplotype-resolved _de novo_ assembly using phased assembly graphs with hifiasm. _Nat. Methods_ 18, 170–175 (2021).
24. Wingett, S. _et al_. HiCUP: pipeline for mapping and processing Hi-C data. _F1000Research_ 4, 1310 (2015).
25. Zhang, X. T., Zhang, S. C., Zhao, Q., Ming, R. & Tang, H. B. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. _Nat. Plants_ 5, 833–845 (2019).
26. Durand, N. C. _et al_. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. _Cell Syst._ 3, 95–98 (2016).
27. Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. _Nucleic Acids Res._ 30, 2478–2483 (2002).
28. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. _Nucleic Acids Res._ 27, 573–580 (1999).
29. Jurka, J. _et al_. Repbase Update, a database of eukaryotic repetitive elements. _Cytogenet. Genome Res._ 110, 462–467 (2005).
30. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. _Curr. Protoc. Bioinform._ Chapter 4, Unit 4.10 (2004).
31. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. _Nucleic Acids Res._ 35, W265–W268 (2007).
32. Price, A. L., Jones, N. C. & Pevzner, P. A. _De novo_ identification of repeat families in large genomes. _Bioinformatics_ 21, i351–i358 (2005).
33. Flynn, J. M. _et al_. RepeatModeler2 for automated genomic discovery of transposable element families. _Proc. Natl. Acad. Sci. USA_ 117, 9451–9457 (2020).
34. Altschul, S. F. _et al_. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. _Nucleic Acids Res._ 25, 3389–3402 (1997).
35. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. _Genome Res._ 14, 988–995 (2004).
36. Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. _Bioinformatics_ 19, ii215–ii225 (2003).
37. Parra, G., Blanco, E. & Guigó, R. GeneID in _Drosophila_. _Genome Res._ 10, 511–515 (2000).
38. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. _J. Mol. Biol._ 268, 78–94 (1997).
39. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source _ab initio_ eukaryotic gene-finders. _Bioinformatics_ 20, 2878–2879 (2004).
40. Korf, I. Gene finding in novel genomes. _BMC Bioinform._ 5 (2004).
41. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. _Bioinformatics_ 25, 1105–1111 (2009).
42. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. _Nat. Methods_ 12, 357–360 (2015).
43. Trapnell, C. _et al_. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. _Nat. Biotechnol._ 28, 511–515 (2010).
44. Haas, B. J. _et al_. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. _Genome Biol._ 9, R7 (2008).
45. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. _Nucleic Acids Res._ 28, 45–48 (2000).
46. Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. _Methods Mol. Biol._ 396, 59–70 (2007).
47. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. _Nucleic Acids Res._ 28, 27–30 (2000).
48. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. _Nucleic Acids Res._ 25, 955–964 (1997).
49. Ashburner, M. _et al_. Gene Ontology: tool for the unification of biology. _Nat. Genet._ 25, 25–29 (2000).
50. Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. _Bioinformatics_ 37, 1639–1643 (2021).
51. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993408 (2023).
52. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993409 (2023).
53. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993393 (2023).
54. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993394 (2023).
55. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993400 (2023).
56. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993401 (2023).
57. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993402 (2023).
58. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993403 (2023).
59. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993404 (2023).
60. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993405 (2023).
61. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993406 (2023).
62. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993407 (2023).
63. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993392 (2023).
64. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993395 (2023).
65. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993396 (2023).
66. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993397 (2023).
67. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993398 (2023).
68. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR26993399 (2023).
69. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR27126179 (2023).
70. _NCBI Sequence Read Archive_ https://identifiers.org/ncbi/insdc.sra:SRR27588553 (2023).
71. _NCBI GenBank_ https://identifiers.org/ncbi/insdc:JAXUIB000000000 (2023).
72. _NCBI GenBank_ https://identifiers.org/ncbi/insdc:JAZDCR000000000 (2023).
73. Yuan, Y. X. The genome annotations of _Acrossocheilus fasciatus_. _figshare_ https://doi.org/10.6084/m9.figshare.24995825 (2023).
74. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. _Mol. Biol. Evol._ 38, 4647–4654 (2021).
75. Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. _Bioinformatics_ 23, 1061–1067 (2007).
76. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. _Genome Biol._ 21 (2020).
77. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. _Bioinformatics_ 25, 1754–1760 (2009).
78. Li, R. _et al_. SOAP2: an improved ultrafast tool for short read alignment. _Bioinformatics_ 25, 1966–1967 (2009).
references ACKNOWLEDGEMENTS This work was financially supported by the National Key Research and Development Program of China (No.2022YFD2400102) and the National Natural Science Foundation
of China (No. 31872207). AUTHOR INFORMATION Author notes * These authors contributed equally: Yixin Yuan, Tianxing Zhong. AUTHORS AND AFFILIATIONS * Key Laboratory of Freshwater Aquatic
Genetic Resources certificated by the Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, 201306, China Yixin Yuan, Tianxing Zhong, Yifei Wang, Jinquan Yang, Lang
Gui, Yubang Shen, Jiale Li, Mingyou Li & Jianfeng Ren * Zhejiang Forest Resource Monitoring Center, Hangzhou, 310020, China Jiajun Zhou * Department of Fisheries and Wildlife, Michigan
State University, East Lansing, MI, 48824, USA Yu-Wen Chung-Davidson & Weiming Li * Huangshan Dingxin Ecological Agriculture Co., Ltd, Huangshan, 245431, China Jinkai Xu Authors * Yixin
Yuan View author publications You can also search for this author inPubMed Google Scholar * Tianxing Zhong View author publications You can also search for this author inPubMed Google
Scholar * Yifei Wang View author publications You can also search for this author inPubMed Google Scholar * Jinquan Yang View author publications You can also search for this author inPubMed
Google Scholar * Lang Gui View author publications You can also search for this author inPubMed Google Scholar * Yubang Shen View author publications You can also search for this author
inPubMed Google Scholar * Jiajun Zhou View author publications You can also search for this author inPubMed Google Scholar * Yu-Wen Chung-Davidson View author publications You can also
search for this author inPubMed Google Scholar * Weiming Li View author publications You can also search for this author inPubMed Google Scholar * Jinkai Xu View author publications You can
also search for this author inPubMed Google Scholar * Jiale Li View author publications You can also search for this author inPubMed Google Scholar * Mingyou Li View author publications You
can also search for this author inPubMed Google Scholar * Jianfeng Ren View author publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS J.F.R., M.Y.L. and
J.L.L. conceived and supervised the study. T.X.Z., Y.F.W. and J.K.X. collected the samples. Y.X.Y., T.X.Z. and J.F.R. performed the bioinformatics analysis. Y.X.Y., T.X.Z. and J.F.R. drafted
the manuscript. J.Q.Y., L.G., Y.B.S., J.J.Z., Y.-W.C.-D. and W.M.L. provided review comments and modification of the manuscript. All authors read and approved the final manuscript.

CORRESPONDING AUTHORS
Correspondence to Mingyou Li or Jianfeng Ren.

ETHICS DECLARATIONS
COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
PUBLISHER’S NOTE
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RIGHTS AND PERMISSIONS
OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE
CITE THIS ARTICLE
Yuan, Y., Zhong, T., Wang, Y. _et al._ Chromosome-scale genome assemblies of sexually dimorphic male and female _Acrossocheilus fasciatus_. _Sci Data_ 11, 653 (2024). https://doi.org/10.1038/s41597-024-03504-9
* Received: 05 February 2024
* Accepted: 10 June 2024
* Published: 21 June 2024