Application of RNA-seq for single nucleotide variation identification in a cohort of patients with hypertrophic cardiomyopathy

Nature


ABSTRACT

A variety of techniques for DNA sequencing, such as specific gene sequencing, whole genome sequencing, or exome sequencing, are currently used to detect single nucleotide variations


(SNVs). Although RNA-seq can be used to identify SNVs, studies that employ this approach are uncommon, and those that do often rely on outdated mapping methods or methods that are more


suitable for genomic and exomic alignment. In this work, our aim is to apply a modern RNA-seq-specific alignment method to identify SNVs in a cohort of patients with hypertrophic cardiomyopathy (HCMP), and to characterize those SNVs to gain insight into possible mechanisms of HCMP pathogenesis. An algorithm for identifying SNVs from transcriptomic sequencing data was developed, and its optimal quality threshold was determined based on allelic discrimination for the rs397516037 mutation (_MYBPC3_ c.3697 C > T) among patients. A total of


42,809 SNVs with a quality of 75 or higher were identified in 48 transcriptomes of hypertrophic cardiomyopathy (HCMP) myocardial tissue. Verification of missense and nonsense variants in key


HCMP genes using Sanger sequencing confirmed the accuracy of the pipeline results. To identify variants potentially associated with HCMP pathogenesis, a filtration process was conducted


based on minor allele frequency, substitution prediction score and ClinVar outcome. In total, 214 missense mutations and 6 nonsense mutations were selected. Together with the nonsense mutations, 19 missense mutations meeting the strictest SIFT and PolyPhen criteria were identified as potential factors influencing HCMP pathogenesis. We have developed and validated a method for identifying SNVs


based on transcriptomic data, which can be used to identify putative pathogenic variants. We identified mutations in key HCMP genes _MYBPC3_ and _MYH7_ in a cohort of patients. We also found


potentially pathogenic mutations in genes _ANXA6_ and _FEM1A_ and obtained data supporting the role of _NEBL_ in myocardial diseases. This method would be useful in analyzing


transcriptomic data available in the Gene Expression Omnibus, but should be used with caution as we have tested it on a specific disease.

INTRODUCTION

Various forms of genomic DNA sequencing are presently used for identification of single nucleotide variations (SNVs) associated


with hereditary diseases. These methods vary from sequencing of particular genes in precisely diagnosed and well-studied monogenic diseases up to full genome sequencing for diseases with an


unidentified genetic component. A cheaper alternative for full genome sequencing is represented by exome sequencing, clinical exome sequencing, and assays targeting a set of particular


single nucleotide polymorphisms (SNPs), associated with various diseases with hereditary components1,2,3. Full exome sequencing can identify variations in coding regions of a genome or in


closely adjacent segments of intron regions, while not investigating intergenic noncoding regions and most of the introns. This omission of most noncoding regions is generally considered


well justified, as the majority of currently known mutations and polymorphisms associated with or causative of hereditary diseases are located in coding regions. These mutations typically


result in an amino acid substitution (commonly referred to as a ‘missense mutation’), the appearance of a new stop codon (a ‘nonsense mutation’), or shifts in the open reading frame. The


variants in coding regions are likely to have a higher effect size, and their functional consequences are easier to interpret. For common diseases, however, causal variants are often


regulatory, affecting the expression of nearby genes. This makes the interpretation of intergenic variants more challenging. While emphasis can be placed on coding variants, the significance


of noncoding variants should not be underestimated. Since missense, nonsense, and frameshift mutations are located in the coding regions of mRNA, it is possible to identify them using transcriptome


sequencing. Identifying variations in transcripts could allow for identification of both somatic and germline mutations and can simultaneously provide information on transcription levels of


mutated mRNA. It is also worth noting that transcribed regions of the genome comprise only a small fraction of the full genome, which allows higher coverage of these regions with a smaller number of reads and, consequently, lower sequencing costs4. Also, since the RNA for transcriptome sequencing is usually extracted from a particular tissue of interest in the


pathogenesis of an investigated disease (e.g., heart tissue in hypertrophic cardiomyopathy (HCMP)), genes that are not expressed in this tissue and are therefore unlikely to be related to disease pathogenesis are automatically excluded from the analysis. Full transcriptome sequencing is widely used in investigations of various diseases. The results of many such


investigations are publicly available in open databases, such as Gene Expression Omnibus5. These accumulated data could be used for identification of novel mutations and disease-associated


variants, which would expand the current understanding of genetic components of a wide range of diseases. Although RNA-seq can be used to identify SNVs, studies employing this approach are quite rare, and those that do often use deprecated mapping methods6,7 or mapping methods better suited to genomic and exomic alignment4. Therefore, in this work, our aim is to apply a modern RNA-seq-specific alignment method to identify SNVs in a cohort of HCMP patients, and to characterize those SNVs to gain insight into


possible mechanisms of HCMP pathogenesis. HCMP is a disease characterized by hypertrophy of the left ventricle and subsequent clinical consequences, including sudden cardiac death (SCD), heart failure, and atrial fibrillation, followed by embolic stroke8,9,10. HCMP is usually considered to be a monogenic disease with a heterogeneous hereditary component and an autosomal dominant


type of inheritance10. Most cases of HCMP are associated with pathogenic variants in the main sarcomeric genes (_MYH7_, _MYBPC3_, _TNNT2_, _TNNI3_, _MYL2_, _MYL3_, _TPM1_, _ACTC1_)9,10.


However, these are not the only genes associated with HCMP pathogenesis, and currently over 1400 different associated mutations in dozens of additional genes have been identified (Chakova et


al., 2017). The genetic heterogeneity of HCMP makes it a fitting subject for investigating the potential of RNA-seq to identify novel, pathogenically significant SNVs. Our research group is currently working on transcriptome profiling of the myocardium of HCMP patients. The goals of the current study are to develop a pipeline for SNV identification from RNA-seq data and to apply this pipeline to the identification of putative pathogenically significant SNVs in accumulated RNA-seq data.

MATERIALS AND METHODS

ETHICAL COMPLIANCE

The study was


conducted in accordance with the World Medical Assembly Declaration of Helsinki. The study was approved by the Ethics Committee of the Institute of Molecular Genetics of the National Research Centre “Kurchatov Institute” (Protocol No. 22/5, 16.12.2022). Written informed consent was obtained from all participating patients and families.

PATIENT COHORT

Forty-eight unrelated adult patients with HCMP were enrolled in this study. The average age of the patients was 52 ± 24.48 years. The ratio of male to female patients was 1:1. Characteristics of the patients


enrolled in the study are presented in Supplementary Table 2. Patients were diagnosed in accordance with the _ESC Guidelines_ on diagnosis and management of hypertrophic cardiomyopathy


(a ≥ 15-mm-thick interventricular septum with no other identified causes of hypertrophy)11. Sixteen out of 48 patients had an older relative with SCD (6) and/or HCMP (11). Forty-three out of


48 patients were tested for the rs397516037 mutation (_MYBPC3_ c.3697 C > T) using a TaqMan allelic discrimination real-time PCR assay for detection of single nucleotide substitutions.


Primer and probe sequences used in the assay are listed in Supplementary Table 1 (page 3 of the Supplementary Material).

MYOCARDIUM TISSUE PREPARATION, RNA EXTRACTION, AND RNA SEQUENCING

Myocardium biopsy samples were placed in RNAlater (Invitrogen, United States) solution immediately after extraction, then stored at +4 °C for 24 h, and then kept at −20 °C for


transportation. Subsequently, samples were stored at −80 °C. Total RNA was extracted using TRIzol (Invitrogen, United States), in accordance with manufacturer’s recommendations. The quality


and quantity of total RNA were measured using a Bioanalyzer with an RNA 6000 Nano Kit (Agilent, United States). A poly(A) fraction of RNA was extracted, and libraries for sequencing were prepared using an NEBNext® mRNA Library Prep Reagent Set (NEB, United States). Sequencing was performed by Genoanalitica (Moscow, Russia) using a HiSeq 1500 (Illumina, United States), generating no


less than 15 million 50 bp reads.

RNA-SEQ DATA PREPARATION AND ALIGNMENT

Ambiguous and low-quality nucleotides were removed from the FASTQ files using AdapterRemoval v2 (ref. 12). Reads were aligned to the GRCh38 genome with STAR14 through the “rsem-calculate-expression” command of RSEM13, run with the “--star” option enabled.

SNV ANALYSIS

BAM files obtained in the previous step were sorted using “samtools sort” from SAMtools15. Sorted BAM files were converted into pileup with the “bcftools mpileup” command from BCFtools16. SNV calling was performed using the “bcftools call” command, followed by filtration with the “bcftools filter -i QUAL” command. The resulting VCF files were compressed and indexed using “bgzip” and “tabix” from the HTSlib library17. The variants were then sorted by quality. A quality threshold of 75 was selected based on concordance of the call and filtration results with previously obtained TaqMan allelic discrimination results for rs397516037, such that neither false positives nor false negatives were identified at the genomic position of rs397516037.

GENOMIC DNA EXTRACTION

Genomic DNA


extraction from peripheral blood and myocardium tissues (in cases in which no blood was available) was performed using a Quick-DNA™ Miniprep Plus Kit (Zymo Research Corp., United States) in


accordance with the manufacturer’s recommendations. DNA concentration was determined using a Qubit 3.0 fluorometer and a Qubit dsDNA BR (Broad-Range) Assay Kit (Invitrogen™, United States).
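The steps described in the SNV ANALYSIS section map onto a short chain of command-line calls. The Python sketch below only assembles those command lines as strings and does not execute them. The file names, the function name `snv_calling_commands`, and the specific options passed to `samtools sort`, `bcftools mpileup` and `bcftools call` (`-o`, `-f`, `-Ou`, `-mv`, `-Ov`) are illustrative assumptions, since the text specifies only the commands themselves and the QUAL threshold of 75.

```python
import shlex


def snv_calling_commands(bam, reference_fasta, prefix, min_qual=75):
    """Build (but do not run) the shell commands for one sample.

    File names and the mpileup/call options are assumptions made for
    illustration; only the tools and the QUAL cutoff come from the text.
    """
    sorted_bam = f"{prefix}.sorted.bam"
    raw_vcf = f"{prefix}.raw.vcf"
    filtered_vcf = f"{prefix}.q{min_qual}.vcf"

    sort_cmd = ["samtools", "sort", "-o", sorted_bam, bam]
    # mpileup streams uncompressed BCF (-Ou) into the caller.
    pileup_cmd = ["bcftools", "mpileup", "-f", reference_fasta, "-Ou", sorted_bam]
    # -m: multiallelic caller, -v: variant sites only (assumed options).
    call_cmd = ["bcftools", "call", "-mv", "-Ov", "-o", raw_vcf]
    # Keep records whose QUAL is at or above the chosen threshold.
    filter_cmd = ["bcftools", "filter", "-i", f"QUAL>={min_qual}",
                  "-o", filtered_vcf, raw_vcf]
    bgzip_cmd = ["bgzip", filtered_vcf]  # writes <filtered_vcf>.gz
    tabix_cmd = ["tabix", "-p", "vcf", filtered_vcf + ".gz"]

    return [
        shlex.join(sort_cmd),
        shlex.join(pileup_cmd) + " | " + shlex.join(call_cmd),
        shlex.join(filter_cmd),
        shlex.join(bgzip_cmd),
        shlex.join(tabix_cmd),
    ]


if __name__ == "__main__":
    # Hypothetical sample and reference names.
    for line in snv_calling_commands("patient01.bam", "GRCh38.fa", "patient01"):
        print(line)
```

Building the commands as argument lists and quoting them with `shlex.join` keeps the filter expression (`QUAL>=75`) safe to paste into a shell, where an unquoted `>` would be read as a redirection.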


SNV CONFIRMATION USING SANGER SEQUENCING

SNVs identified using RNA-seq were confirmed in eight patients using Sanger sequencing. Primers were designed based on the GRCh38 genome assembly using Beacon Designer 7.0 (Premier Biosoft International, Palo Alto, United States) (Supplementary Table 1, page 3 of the Supplementary Material). Target sequence amplification was performed


using the QuantStudio 3 (Thermo Fisher Scientific, United States) and PCR reagents (Sintol and DNK-Sintez (Russia), Thermo Fisher Scientific (United States)). The reaction mix included 3 µl of 10× buffer solution, 3 µl dNTP (2 nmol/µl each), 3 µl MgCl2 (25 nmol/µl), 1 µl of each primer (10 pmol/µl), 1 µl of each probe (5 pmol/µl), 0.2 µl Taq polymerase (5 U/µl), and ultrapure water to a final volume of 30 µl. The amplification protocol was as follows: 180 s at 95 °C, then 40 cycles of 5 s at 95 °C and 20 s at 60 °C. The resulting target fragments were separated using


gel electrophoresis in 2% agarose gel, excised, and purified with a Cleanup S-Cap kit (Evrogen, Russia). Sanger sequencing was performed by Evrogen (Moscow, Russia). Sequencing was performed in both the forward and reverse directions where possible.

RESULTS

The pipeline developed for SNV identification from RNA-seq is presented in Fig. 1. AdapterRemoval v2 is used to remove ambiguous and low-quality nucleotides from the FASTQ libraries. Reads are mapped to the reference genome using STAR and RSEM via the “rsem-calculate-expression” command with the “--star” option enabled. Alignment rates for each alignment are presented in Supplementary Table 3 (page 4 of the Supplementary Material). BAM files are then sorted using the “samtools sort” command from SAMtools. Sorted BAM files are converted into pileup format using “bcftools mpileup”, and SNVs are then called from the pileup into VCF format using the “bcftools call” command from BCFtools. VCF files are filtered based on quality with the “bcftools filter -i QUAL” command, and then compressed and indexed using bgzip and tabix from the HTSlib library17. Called variants were split into groups based on quality (< 20, 20–50, 50–75, 75–100, 100–120, 120–140, > 140) in order to select a proper threshold for quality


filtration. To evaluate the developed algorithm and aid in threshold selection, allelic discrimination data for the rs397516037 (_MYBPC3_ c.3697 C > T) mutation in the patient cohort were obtained. Four patients were identified as heterozygous carriers of rs397516037 (C/T). Thirty-nine patients were homozygous for the major allele (CC). Five patients were not tested due to an absence of blood samples. Using these data, the upper bound for the quality threshold was set at 120, the lowest quality at which false negatives were observed. Notably, no false positives were observed at any of the quality thresholds applied. For each quality group, a number of randomly selected variants were manually checked using IGV. The minimal quality threshold was selected such that at least five reads were found for each allele in heterozygous states. On this basis, the minimal quality threshold for filtration was determined to be 75. Since no false positives were detected at the previously verified rs397516037 genomic position at the threshold of 75, this threshold was used for further analysis. A total of 42,809 SNVs with a quality of 75 or higher were identified in 48 transcriptomes of HCMP myocardial tissue. Next, in order to identify variants potentially associated with


HCMP pathogenesis, filtration based on minor allele frequency (MAF), substitution prediction score and ClinVar outcome was conducted. A total of 35,574 SNVs were filtered out for having a MAF higher than 1% in the 1000 Genomes, gnomAD genome or gnomAD exome databases. Then, from the remaining SNVs, missense (214) (Supplementary Table 4) and nonsense (6) (Table 1) mutations were selected. Two groups of missense mutations were defined based on SIFT and PolyPhen scores: a group (19) with the most damaging PolyPhen (1) and SIFT (0) scores (Table 2), and a wider selection of missense mutations (214) with PolyPhen predictions of “possibly_damaging” or “probably_damaging” and SIFT predictions of either “deleterious” or “deleterious_low_confidence” (Supplementary Table 4). Together with the nonsense mutations, the 19 mutations filtered by the strictest SIFT and PolyPhen criteria can be considered potential factors affecting the pathogenesis


of HCMP (Table 2). To further check the results obtained using the pipeline, missense and nonsense variants in the hallmark HCMP genes _MYBPC3_ and _MYH7_ were selected (Table 3). Of the 8 variants obtained, 7 were verified using Sanger sequencing; the exception was one missense mutation in _MYH7_ (p.D1378G), for which there was not enough material for verification. The results of this verification can be found in Supplementary Fig. 1 (page 1 of the Supplementary Material).

DISCUSSION

At present, very few studies on the use of RNA-seq for SNV identification have been published. In a study by Chepelev et al., the high cost of full genome sequencing, as compared to transcriptome sequencing, is emphasized4. It is worth noting that exome


sequencing is also cheaper than full genome sequencing. However, this was most likely not considered by Chepelev et al. because of timing: their paper was published only a month after a study that introduced exome sequencing as a cheaper alternative to full genome sequencing for identification of novel variants18, and both papers were likely in progress simultaneously. Chepelev et al. also consider the ability to simultaneously evaluate both the presence of SNVs and changes in expression to be an advantage of transcriptome sequencing over full genome sequencing4, which would also be an advantage of transcriptome sequencing over exome sequencing. In the study by Chepelev et al.4, 20 SNVs were selected for verification. Regions containing 18 of these 20 SNVs were successfully amplified and sequenced using Sanger sequencing, with 16 out of 18 being


confirmed4. One of the unconfirmed SNVs was actually confirmed to be present in mRNA, based on cDNA sequencing. This result demonstrates that changes in the structure of the protein can


emerge as a result of RNA editing4. The ability to identify such variation, which ultimately leads to a change in protein structure without appearing in genomic DNA, can also be


considered an advantage of transcriptome sequencing. In a paper by Cirulli et al.1, a comparison between identification of SNVs using transcriptomic sequencing and genomic sequencing on


samples derived from the same individuals was conducted. This comparison allowed both evaluating the sensitivity and specificity of transcriptome sequencing for SNV identification and


estimating the effects of SNVs on gene expression. When considering all the coding SNVs, RNA-seq captured only 41% of the SNVs captured by genomic sequencing. However, when considering only the genes expressed in the target tissue, the intersection comprised 81% of SNVs1. Overall, 48,740 SNVs were identified in genomic DNA and 40,605 SNVs were identified in cDNA. A total of 19,054 of them were common to both genomic DNA and cDNA1. Both sensitivity (number of true positives divided by the sum of true positives and false negatives) and specificity (number of true positives


divided by the sum of true positives and false positives) were evaluated for transcriptome sequencing, treating the SNVs identified by genomic DNA sequencing as correct1.
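As a worked check of these definitions, the short Python sketch below recomputes both ratios from the overlap counts quoted above (48,740 genomic SNVs, 40,605 cDNA SNVs, 19,054 shared). It assumes, for illustration only, that every genomic SNV absent from the cDNA calls counts as a false negative and every cDNA SNV absent from the genomic calls counts as a false positive; the function name `overlap_metrics` is hypothetical and not part of any cited pipeline. Note that the quantity labelled specificity here, TP / (TP + FP), is more commonly called precision.

```python
def overlap_metrics(n_genomic, n_cdna, n_shared):
    """Return (sensitivity, tp_over_tp_fp) with genomic calls taken as truth."""
    true_pos = n_shared                 # found by both methods
    false_neg = n_genomic - n_shared    # genomic-only calls, missed by RNA-seq
    false_pos = n_cdna - n_shared       # cDNA-only calls, absent from genomic DNA
    sensitivity = true_pos / (true_pos + false_neg)
    tp_over_tp_fp = true_pos / (true_pos + false_pos)
    return sensitivity, tp_over_tp_fp


# Counts as quoted in the text for genomic DNA, cDNA, and their intersection.
sens, spec = overlap_metrics(48740, 40605, 19054)
print(f"sensitivity = {sens:.2f}, TP/(TP+FP) = {spec:.2f}")
```

Rounded to two decimals, the ratios come out at about 0.39 and 0.47.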


Quality filter thresholds chosen in the study were optimized in such a way as to maximize both sensitivity and specificity. When considering all the genes, including ones that are not


expressed in the target tissue, sensitivity amounted to 0.39 and specificity amounted to 0.47. Identified SNVs were also compared to dbSNP records. It has been noted that 94% of true positives were found in dbSNP, whereas only 23% of false positives and 89% of false negatives were found in dbSNP1. Also, the percentage of SNV intersection with dbSNP was found to be inversely correlated with coverage in transcriptomic sequencing, and a lower proportion of false positive SNVs was found in regions with higher coverage1. We have also analyzed pipelines used in


studies investigating the identification of SNVs in RNA-seq data and compared them with our own approach, paying special attention to the tools used to map the reads and to call SNVs. We have identified six appropriate works to compare our own pipeline with: Chepelev et al. 2009, Cirulli et al. 2010, Piskol et al. 2013, Quinn et al. 2013, Liu et al. 2019 and Dou et al. 2024 (refs. 4, 6, 7, 19, 20, 21). Chepelev et al. 2009 (ref. 4) use ELAND by Illumina as a read mapping tool. It is worth noting that ELAND is a tool for genomic alignments. Similarly, Piskol et al. 2013 (ref. 19) use


BWA for read alignment, which is also a tool developed for genomic/exomic alignment. However, many tools have since been developed specifically for transcriptome alignment and are generally considered to work better for that purpose, such as Rsubread22 and STAR14 (which is used in our pipeline), and we believe that using such specialized tools is a better approach. Quinn et al. 2013 and Cirulli et al. 2010 both use TopHat2, which is currently considered deprecated by its developers23. Dou et al. 2024 (ref. 21) do not describe the procedure for obtaining mappings in their paper. Finally, Liu et al.20 consider both STAR and GSNAP, ultimately choosing STAR because, based on their data, it leads to a higher true positive rate in SNV detection. For


SNV calling, both Chepelev et al. and Dou et al. used tools that they developed themselves, Point Mutation Analyzer and Monopogen, respectively. Piskol et al. also use their own filtering algorithm, SNPiR, in combination with GATK24. Quinn et al. use both SAMtools and GATK, while Cirulli et al. use only SAMtools. Finally, Liu et al.20 compare several different SNV calling methods, including SAMtools, GATK and several others. Their analysis concludes that SAMtools is the recommended method of SNV calling. However, it is worth noting that Liu et al. test both SNV calling and mapping as applied to scRNA-seq data, and their results might not be fully applicable to our research. While we acknowledge that GATK is widely considered to be the gold standard for SNV calling, based on the comparison by Liu et al. and its performance in our own test, we consider the combination of STAR and SAMtools to be a good choice for SNV calling


from RNA-seq data. To check the results obtained using our pipeline, we reviewed and validated a total of 8 SNVs using two different methods. The rs397516037 mutation was tested using a TaqMan real-time PCR assay with allelic discrimination in 43 patients. Seven missense and nonsense variants in the characteristic HCMP genes _MYBPC3_ and _MYH7_ were verified and confirmed in 8 patients using Sanger sequencing. We have not identified any discrepancies between the results of real-time PCR-based allelic discrimination or Sanger sequencing and the RNA-seq data. Therefore, within the limited scope of the experimental verification we have conducted, both specificity and sensitivity amount to 1. Due to the small number of verified mutations, this result probably requires further verification. It is worth noting, however, that verification of individual mutations with Sanger sequencing or allelic discrimination is more reliable than verification based on genomic sequencing. Most HCMP-associated genes carry both familial and de novo pathogenic variants9. Most of these variants are missense mutations and have


dominant hereditary properties9,25. Some pathogenic variants are also characterized by incomplete penetrance, which could depend on environmental and/or other genetic factors. A multitude of rare pathogenic variants with average or low penetrance are found in patients with sporadic HCMP and small families with familial HCMP9,10. It has been established that most HCMP-related genes encode sarcomere or sarcomere-associated proteins. From 70 to 80% of familial cases of HCMP carry mutations in the myosin heavy chain gene _MYH7_ and the myosin binding protein C gene


_MYBPC3_9,26. Patients with pathogenic variants in _MYH7_ have an increased risk of developing atrial fibrillation, earlier onset of disease, and more severe form of disease overall than


patients with pathogenic variants in _MYBPC3_. Also, it has been established that HCMP patients with mutations in _TNNI3_ had shorter life expectancy compared to carriers of mutations in


either _MYBPC3_ or _MYH7_, while HCMP patients with mutations in _TNNC1_ had an increased risk of developing a fatal atrial arrhythmia27. Based on the prevalence of _MYBPC3_ and _MYH7_ mutations in cases of HCMP, we decided to use mutations in those two genes for further verification. To identify SNVs potentially associated with HCMP pathogenesis, we employed several filters, such as MAF, outcome of the mutation, and amino acid substitution prediction score (in the case of missense mutations). The filtered lists were then used to evaluate putative mutations based on the genes in which they are located and their relevance to HCMP. The _MYBPC3_ gene, encoding myosin binding protein C, is considered to be one of the key sarcomere genes causatively


linked to HCMP development28. Based on various estimates, 40 to 50% of all HCMP-associated mutations are located within this gene29, and most of those lead to the production of truncated transcripts30, which is also the case with the nonsense mutations we have identified in _MYBPC3_ in our cohort of HCMP patients. Because cMyBP-C plays a key role in the regulation of myosin-actin cross-bridge kinetics, the nonsense mutations 11_47332189_G/A (p.Q1233X) and 11_47341180_C/A (p.E619X) could lead to haploinsufficiency, which could in turn disturb the contraction and relaxation of cardiomyocytes. It is worth noting that, despite not being annotated in dbSNP, 11_47341180_C/A was previously identified in young Russian patients with HCMP31, suggesting it might be endemic to this region. _NEBL_ encodes nebulette32, a mechanosensitive Z-disc protein that plays a key role in the organization and functioning of myofibrils, which


could imply that early termination of NEBL translation could play a role in HCMP pathogenesis. _NEBL_ is not considered to be a hallmark HCMP gene; however, variants in this gene were previously associated with several types of cardiomyopathies, including HCMP, dilated cardiomyopathy and left ventricular non-compaction cardiomyopathy33,34. It has also been shown that a mutation in _NEBL_ could be causative in the pathogenesis of Brugada syndrome35. Based on that, rs147622517 can be either a cause or, more likely, considering the relatively high frequency of this variant in the population (0.15%), a risk factor for HCMP. The nonsense variant rs371110900, which we have identified in our patient cohort, is located in _FEM1A_. _FEM1A_ encodes FEM1A, a component of the CRL2 complex that serves as an adaptor for protein ubiquitination by the E3 ligase36. Based on ClinVar data, rs371110900 has no known associations with cardiovascular diseases; however, this variant is known to be associated with polycystic ovary syndrome37. It is also worth noting that increased expression of _Fem1a_ was identified in ischemia-reperfusion in mice37, while RNA-seq of human myocardial tissues has shown repression of _FEM1A_ expression after ischemia38. Taken together with the identification of rs371110900 in an HCMP patient, these data suggest that the role of _FEM1A_ in normal and pathological heart physiology should be investigated. Missense mutations filtered by the strictest PolyPhen (1) and SIFT (0)


scores (Table 3) were also investigated for potential connections to HCMP in the literature. Among the genes in which they are located, _ANXA6_ appears to be the most relevant in this regard. In this gene, we have identified rs759582371, located in exon 8 and leading to R206/174H, with undetermined clinical significance in ClinVar. The strongly damaging PolyPhen and SIFT scores of this mutation are most likely explained by the fact that this is a substitution of an aliphatic amino acid with an aromatic one with weakly basic properties. ANXA6 is the main myocardial annexin from the family of


calcium and phospholipid binding membrane proteins, which plays an important role in regulation of endocytosis and exocytosis and supporting the homeostasis of calcium in cardiomyocytes.


Using transgenic mice, it has been shown that knockout of ANXA6 leads to dilatational changes in the myocardium, whereas its increased expression leads to hypertrophic changes39. These effects


could be linked to its interaction with atrial natriuretic peptide pathways, which participate in vitro in regulation of processes of hypertrophy and apoptosis of H9c2 rat myoblasts40,41.


ANXA6 also participates in cholesterol metabolism through interactions with phospholipase A2 and blocking of the EGFR-Ras pathway42, which could in turn affect cardiomyocyte growth. It has also been shown that annexin 6 directly interacts with sarcomeric alpha-actinin, and this interaction affects the excitability and contractility of cardiomyocytes43. Overall, _ANXA6_ plays an important role in cardiac function, which in turn means that a mutation in _ANXA6_ could potentially be related to HCMP. However, additional research is required in order to establish what causative role it plays, if any. It is also worth noting that the patient who carries rs759582371 also carries a missense mutation in _MYH7_, one of the hallmark HCMP genes (rs2754158). rs2754158 is considered to have an established pathogenic effect in HCMP44,45,46,47,48. We can conclude that, using RNA-seq for SNV identification, we have successfully


identified a number of mutations in the HCMP patient cohort, some of which have an established relation to HCMP pathogenesis. Use of transcriptomic sequencing for SNV identification would allow additional investigation of accumulated RNA-seq data on various diseases. It would allow additional data to be obtained from existing datasets that have already served their purpose in the investigation of transcriptional profiles, and would allow the effects of SNVs on gene expression within each sample to be observed. Also, the fact that reads are only assigned to genes expressed in the target tissue allows for more efficient use of coverage and, consequently, lower sequencing costs. However, it is also worth considering


the limitations of such an approach. Difficulties with reaching high coverage for transcripts with lower expression levels would probably lead to an increased chance of false negative results


for SNVs in such genes. An average coverage of no less than 70X for a full exome and no less than 30X for a full genome are usually considered to be standard49. In the case of RNA-seq,


since the coverage is directly linked to gene expression, reaching such coverages for less-expressed genes appears almost impossible. As such, the aforementioned increased risk of false


negatives seems inevitable. Verifying such false negatives is also a complicated task, since it is much easier and cheaper to confirm that an SNV indeed exists than to investigate genetic


loci that could have SNVs that have been missed. Another possible issue with the use of RNA-seq for SNV identification is nonsense-mediated decay (NMD). Transcripts carrying NMD-triggering mutations can be absent or less abundant in the transcriptome due to this very decay50, whereas detecting such mutations by exome or genome sequencing is no more complicated than detecting any other SNV. In the single case among the SNVs chosen for verification that had a predicted NMD effect (11_47341180_C/A in Table 1), we found no observable effect of NMD on


transcript abundance (Supplementary Fig. 2). However, this outcome is far from conclusive, and it is still worth considering the possibility of having a false negative result due to NMD in


other cases. It is also worth noting that genes _MYH7_ and _MYBPC3_ present the best-case scenario for use of RNA-seq for SNV identification, since both genes are very highly expressed in


the myocardium. Therefore, the results may not be as good for other diseases and target tissues. This method overall appears to be very well suited for the study of diseases with


autosomal-dominant inheritance, linked to mutations that affect the protein structure. Its applicability to the study of diseases with recessive inheritance with other types of causative


mutations should be evaluated separately. CONCLUSIONS In this study, the method of identification of SNVs based on transcriptome sequencing data has been developed, applied, and verified.
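As a rough illustration of the coverage limitation discussed above, the following Python sketch flags genes whose mean read depth falls below a chosen threshold and for which missed SNVs are therefore more likely. It is a minimal sketch rather than part of the published pipeline: the function name, the per-gene depth table, and the gene labels `GENE_LOW_A` and `GENE_LOW_B` are hypothetical, the depth values are invented for illustration, and reusing the 70X exome reference depth as a per-gene RNA-seq threshold is an assumption.

```python
from typing import Dict, List

# Reference depth taken from the exome standard quoted in the text (70X).
# Applying it per gene to RNA-seq data is an assumption made for illustration.
EXOME_REFERENCE_DEPTH = 70.0


def genes_at_risk_of_false_negatives(
    mean_depth_by_gene: Dict[str, float],
    min_depth: float = EXOME_REFERENCE_DEPTH,
) -> List[str]:
    """Return genes whose mean depth is below min_depth, lowest depth first."""
    low = [
        (depth, gene)
        for gene, depth in mean_depth_by_gene.items()
        if depth < min_depth
    ]
    # Sorting the (depth, gene) pairs orders genes by depth, then by name.
    return [gene for _, gene in sorted(low)]


# Hypothetical depths, invented for this example (not values from the study).
example_depths = {
    "MYH7": 2500.0,
    "MYBPC3": 900.0,
    "GENE_LOW_A": 12.0,
    "GENE_LOW_B": 45.5,
}

flagged = genes_at_risk_of_false_negatives(example_depths)
print(flagged)
```

In practice, per-gene depths would be derived from the aligned reads, and the threshold would need to be tuned to the variant caller and quality cutoff in use.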


Based on our own data and on results obtained in previous studies, this method could be used to identify putative pathogenic variants. We have identified a number of mutations in a cohort of HCMP patients, including mutations in the key HCMP genes _MYBPC3_ and _MYH7_. We have also identified potentially pathogenic mutations in genes with no previously established relation to HCMP, _ANXA6_ and _FEM1A_, and have obtained data that support a possible role of _NEBL_ in various myocardial diseases. Such a method would be especially interesting in the context of the vast amounts of already accumulated transcriptomic data that have yet to be examined in this way, such as the datasets available in the Gene Expression Omnibus. However, the limitations and applicability of the method should also be carefully considered. We have examined it on a disease and a type of inheritance that appear to be very well suited to it; for other diseases and types of inheritance, its applicability should be evaluated on a case-by-case basis.

DATA AVAILABILITY

Raw FASTQ files


and processed read counts are available from the Gene Expression Omnibus (GEO), accession number GSE273325.

REFERENCES

* Cirulli, E. T. et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. _Genome Biol._ 11, 1–8 (2010).
* Taqui, S. & Daniels, L. B. Putting it into perspective: multimarker panels for cardiovascular disease risk assessment. _Biomark. Med._ 7, 317–327 (2013).
* Bertens, L. C. M. et al. Use of expert panels to define the reference standard in diagnostic research: a systematic review of published methods and reporting. _PLoS Med._ 10 (2013).
* Chepelev, I., Wei, G., Tang, Q. & Zhao, K. Detection of single nucleotide variations in expressed exons of the human genome using RNA-Seq. _Nucleic Acids Res._ 37, e106 (2009).
* Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. _Nucleic Acids Res._ 30, 207–210 (2002).
* Cirulli, E. T. et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. _Genome Biol._ 11, R57 (2010).
* Quinn, E. M. et al. Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 Genomes data. _PLoS One_ 8, e58815 (2013).
* Filatova, E. V. et al. Targeted exome analysis of Russian patients with hypertrophic cardiomyopathy. _Mol. Genet. Genomic Med._ 9 (2021).
* Teekakirikul, P., Zhu, W., Huang, H. C. & Fung, E. Hypertrophic cardiomyopathy: an overview of genetics and management. _Biomolecules_ 9 (2019).
* Marian, A. J. & Braunwald, E. Hypertrophic cardiomyopathy: genetics, pathogenesis, clinical manifestations, diagnosis, and therapy. _Circ. Res._ 121, 749 (2017).
* Elliott, P. M. et al. 2014 ESC guidelines on diagnosis and management of hypertrophic cardiomyopathy: the Task Force for the Diagnosis and Management of Hypertrophic Cardiomyopathy of the European Society of Cardiology (ESC). _Eur. Heart J._ 35, 2733–2779 (2014).
* Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. _BMC Res. Notes_ 9 (2016).
* Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. _BMC Bioinform._ 12, 1–16 (2011).
* Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. _Bioinformatics_ 29, 15–21 (2013).
* Danecek, P. et al. Twelve years of SAMtools and BCFtools. _Gigascience_ 10, 1–4 (2021).
* Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. _Bioinformatics_ 27, 2987 (2011).
* Bonfield, J. K. et al. HTSlib: C library for reading/writing high-throughput sequencing data. _Gigascience_ 10, 1–6 (2021).
* Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. _Nature_ 461, 272–276 (2009).
* Piskol, R., Ramaswami, G. & Li, J. B. Reliable identification of genomic variants from RNA-seq data. _Am. J. Hum. Genet._ 93, 641–651 (2013).
* Liu, F. et al. Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data. _Genome Biol._ 20, 242 (2019).
* Dou, J. et al. Single-nucleotide variant calling in single-cell sequencing data with Monopogen. _Nat. Biotechnol._ 42, 803–812 (2024).
* Chen, Y., Lun, A. T. L. & Smyth, G. K. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. _F1000Res_ 5, 1438 (2016).
* Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. _Nat. Biotechnol._ 37, 907–915 (2019).
* McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. _Genome Res._ 20, 1297–1303 (2010).
* Maron, B. J., Maron, M. S. & Semsarian, C. Genetics of hypertrophic cardiomyopathy after 20 years: clinical perspectives. _J. Am. Coll. Cardiol._ 60, 705–715 (2012).
* Puckelwartz, M. J. & McNally, E. M. HCM gene testing: go big? _Circ. Cardiovasc. Genet._ 10 (2017).
* Abbas, M. T. et al. Role of genetics in diagnosis and management of hypertrophic cardiomyopathy: a glimpse into the future. _Biomedicines_ 12 (2024).
* Bonaventura, J., Polakova, E., Vejtasova, V. & Veselka, J. Genetic testing in patients with hypertrophic cardiomyopathy. _Int. J. Mol. Sci._ 22 (2021).
* Lopes, L. R., Ho, C. Y. & Elliott, P. M. Genetics of hypertrophic cardiomyopathy: established and emerging implications for clinical practice. _Eur. Heart J._ 45, 2727–2734 (2024).
* Glazier, A. A., Thompson, A. & Day, S. M. Allelic imbalance and haploinsufficiency in MYBPC3-linked hypertrophic cardiomyopathy. _Pflugers Arch._ 471, 781–793 (2019).
* Savostyanov, K. V. et al. _Annals Russian Acad. Med. Sci._ 72, 242–253 (2017).
* Maiellaro-Rafferty, K. et al. Altered regional cardiac wall mechanics are associated with differential cardiomyocyte calcium handling due to Nebulette mutations in preclinical inherited dilated cardiomyopathy. _J. Mol. Cell. Cardiol._ 60, 151–160 (2013).
* Hiruma, T. et al. Association of multiple nonhypertrophic cardiomyopathy-related genetic variants and outcomes in patients with hypertrophic cardiomyopathy. _JACC Heart Fail._ https://doi.org/10.1016/j.jchf.2024.08.005 (2024).
* Perrot, A. et al. Mutations in NEBL encoding the cardiac Z-disk protein Nebulette are associated with various cardiomyopathies. _Arch. Med. Sci._ 12, 263–278 (2016).
* Chen, J. et al. Whole exome sequencing in Brugada and long QT syndromes revealed novel rare and potential pathogenic mutations related to the dysfunction of the cardiac sodium channel. _Orphanet J. Rare Dis._ 17, 394 (2022).
* Chen, X. et al. Molecular basis for arginine C-terminal degron recognition by Cul2FEM1 E3 ligase. _Nat. Chem. Biol._ 17, 254–262 (2021).
* Cambier, L., Lacampagne, A., Auffray, C. & Pomiès, P. Fem1a is a mitochondrial protein up-regulated upon ischemia-reperfusion injury. _FEBS Lett._ 583, 1625–1630 (2009).
* Saddic, L. A. et al. The long noncoding RNA landscape of the ischemic human left ventricle. _Circ. Cardiovasc. Genet._ 10 (2017).
* Kaetzel, M. A. & Dedman, J. R. Annexin VI regulation of cardiac function. _Biochem. Biophys. Res. Commun._ 322, 1171–1177 (2004).
* Banerjee, P. & Bandyopadhyay, A. Cytosolic dynamics of Annexin A6 trigger feedback regulation of hypertrophy via atrial natriuretic peptide in cardiomyocytes. _J. Biol. Chem._ 289, 5371–5385 (2014).
* Banerjee, P., Chander, V. & Bandyopadhyay, A. Balancing functions of Annexin A6 maintain equilibrium between hypertrophy and apoptosis in cardiomyocytes. _Cell Death Dis._ 6, e1873 (2015).
* Grewal, T., Koese, M., Rentero, C. & Enrich, C. Annexin A6-regulator of the EGFR/Ras signalling pathway and cholesterol homeostasis. _Int. J. Biochem. Cell Biol._ 42, 580–584 (2010).
* Mishra, S. et al. Interaction of Annexin A6 with alpha actinin in cardiomyocytes. _BMC Cell Biol._ 12, 7 (2011).
* Chiou, K-R., Chu, C-T. & Charng, M-J. Detection of mutations in symptomatic patients with hypertrophic cardiomyopathy in Taiwan. _J. Cardiol._ 65, 250–256 (2015).
* Uchiyama, K. et al. Impact of QT variables on clinical outcome of genotyped hypertrophic cardiomyopathy. _Ann. Noninvasive Electrocardiol._ 14, 65–71 (2009).
* Funada, A. et al. Impact of renin-angiotensin system polymorphisms on development of systolic dysfunction in hypertrophic cardiomyopathy. Evidence from a study of genotyped patients. _Circ. J._ 74, 2674–2680 (2010).
* Marsiglia, J. D. C. et al. Screening of MYH7, MYBPC3, and TNNT2 genes in Brazilian patients with hypertrophic cardiomyopathy. _Am. Heart J._ 166, 775–782 (2013).
* Berge, K. E. & Leren, T. P. Genetics of hypertrophic cardiomyopathy in Norway. _Clin. Genet._ 86, 355–360 (2014).
* Ryzhkova, O. P. et al. Guidelines for the interpretation of massive parallel sequencing variants (update 2018, v2). _Nauchno-prakticheskii Zhurnal «Medicinskaia Genetika»_ 2, 3–23 (2020).
* Colombo, M., Karousis, E. D., Bourquin, J., Bruggmann, R. & Mühlemann, O. Transcriptome-wide identification of NMD-targeted human mRNAs reveals extensive redundancy between SMG6- and SMG7-mediated degradation pathways. _RNA_ 23, 189–201 (2017).

ACKNOWLEDGEMENTS

Not applicable.

FUNDING

The work was carried out within the state assignment of the National Research Centre “Kurchatov Institute” (agreement No.


5.F.5.9). The work of Ivan Vlasov, Anna Klass, Anastasia Chumakova, Petr Slominsky, Andrey Lysenko, and Gennady Salagaev was partially supported by the Russian Science Foundation (RSF)


(project No. 22-15-00242, https://rscf.ru/project/22-15-00243/). AUTHOR INFORMATION AUTHORS AND AFFILIATIONS * National Research Centre “Kurchatov Institute”, Kurchatov sq. 2, Moscow,


123182, Russia Anastasia Chumakova, Ivan Vlasov, Elena Filatova, Anna Klass, Maria Shadrina & Petr Slominsky * Petrovsky National Research Center of Surgery, Abrikosovsky Ln 2, Moscow,


119991, Russia Andrey Lysenko & Gennady Salagaev

CONTRIBUTIONS

Conceptualization, P.A.S., M.I.S., and I.N.V.;


methodology, I.N.V., A.L.K., E.V.F., A.B.C., A.V.L., and G.I.S.; software, I.N.V. and A.B.C.; validation, A.L.K., I.N.V., and E.V.F.; formal analysis, I.N.V., A.B.C.; investigation, P.A.S.,


M.I.S., I.N.V., A.B.C., A.L.K., E.V.F., A.V.L., G.I.S.; resources, A.V.L., G.I.S., M.I.S., and P.A.S.; data curation, I.N.V., P.A.S., M.I.S., A.V.L., and G.I.S.; writing (original draft),


A.B.C.; writing (review and editing), I.N.V., P.A.S., and M.I.S.; visualization, A.B.C., I.N.V., A.L.K.; supervision, P.A.S., and I.N.V.; project administration, I.N.V., M.I.S., and


P.A.S.; obtaining funding, P.A.S. All authors have read and agreed to the published version of the manuscript. CORRESPONDING AUTHOR Correspondence to Anastasia Chumakova. ETHICS DECLARATIONS


ETHICS APPROVAL AND CONSENT TO PARTICIPATE The study was conducted in accordance with the World Medical Assembly Declaration of Helsinki. The study was approved by the Ethics Committees of


the Institute of Molecular Genetics of the National Research Centre “Kurchatov Institute” (Protocol No. 22/5, 16 December 2022). Written informed consent was obtained from all participating patients and


families. CONSENT FOR PUBLICATION Not applicable. COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with


regard to jurisdictional claims in published maps and institutional affiliations. ELECTRONIC SUPPLEMENTARY MATERIAL Below is the link to the electronic supplementary material. SUPPLEMENTARY


MATERIAL 1 SUPPLEMENTARY MATERIAL 2 SUPPLEMENTARY MATERIAL 3 RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License,


which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link


to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless


indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or


exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Chumakova, A., Vlasov, I., Filatova, E. _et al._ Application of RNA-seq for single nucleotide variation identification in a cohort of patients with hypertrophic cardiomyopathy. _Sci Rep_ 15, 18788 (2025). https://doi.org/10.1038/s41598-025-03226-x

* Received: 13 August 2024
* Accepted: 19 May 2025
* Published: 29 May 2025
* DOI: https://doi.org/10.1038/s41598-025-03226-x

