Massive coronavirus sequencing efforts urgently need patient data
Researchers mapping the genetic blueprint of the novel coronavirus SARS-CoV-2 have by now shared more than 12,000 genome sequences from across the world on the open platform Global
Initiative on Sharing All Influenza Data (GISAID). The repository has seen unprecedented activity since December when the first sequence from Wuhan in China came in. On NCBI’s GenBank, more
than 20,000 nucleotide and protein sequences of the virus have already been submitted. The virus is all set to become the most sequenced ever in history. Researchers, however, warn that
unless the sequences are accompanied by de-identified data from patients, the billions of dollars being spent in sequencing the virus globally will not be of much clinical or epidemiological
value, a crucial need during a rapidly evolving pandemic. Laboratories, clinicians, epidemiologists and governments wanting to quickly use this gold mine of information are meeting a
stumbling block as they look for more granular data that should ideally supplement the primary sequence data. “We badly need de-identified meta-data from the patients from whom these
sequences came so that it makes sense for any kind of analysis,” says Seshadri Vasan, who leads the Dangerous Pathogens team at the Australian Animal Health Laboratory and is senior
principal research consultant for Health and Biosecurity at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia's national science agency. De-identified
data does not reveal the identity of the patient. Vasan says the minimum set of de-identified data that researchers need is the patient’s age, gender, if they had mild, moderate or severe
disease and if they survived. Questions around lifestyle and comorbidities, such as do they smoke, have pre-existing respiratory illness or diabetes, are also important to add meaning to
this data. “We usually get information on country and city, but it may be beneficial to have postcode and ethnicity data too,” he says. India has announced an ambitious 1000-genome
sequencing project to better understand the viral and host genomics of the COVID-19 outbreak. India’s Council of Scientific and Industrial Research (CSIR), which undertook a mega 1008-human
genome sequencing project last year, has been leading the sequencing efforts in India. Scientists at the Centre for Cellular and Molecular Biology (CCMB), Hyderabad; Institute of Genomics
and Integrative Biology (IGIB), Delhi; Institute of Microbial Technology, Chandigarh; the National Institute of Virology, Pune, and Gujarat Biotechnology Research Centre, Gandhinagar are
sequencing the viral genome. Besides, the Central Drug Research Institute (CDRI), Lucknow and IICB, Kolkata are also gearing up to sequence the viral genome. With the 1000-genome project,
about 10 more facilities across the country will be pulled in to sequence the virus. Virologist Mitali Mukerji, a genomic scientist at IGIB who is coordinating CSIR’s sequencing efforts, says
at the moment scientists are only trying to analyse the strain of the virus and where the sequences came from. “Clinical history is not getting submitted from any place. It’s very important
since this is not the end of the outbreak we are seeing,” she says. Epidemiologists need to identify people who might be more at risk and analysing clinical information will be crucial, she
says. IGIB director and clinician scientist Anurag Agrawal, who is overseeing a molecular and digital surveillance project around the genome sequences from India, says it would be extremely
useful to know the viral loads and numbers of symptomatic versus asymptomatic cases. “Nothing is meaningful for molecular epidemiology or our knowledge of clusters unless these clinical
parameters are well defined in the data,” he says. The biggest barrier, he says, is coordination among researchers sequencing the data and agencies uploading it on to the databases. “We work
with the National Centre for Disease Control (NCDC), who have the underlying patient information and since they upload the sequences, they do add much more value to the data.” Upasana Ray
Banerjee, a virologist at the CSIR-Indian Institute of Chemical Biology (IICB) whose team recently analysed the genome sequence from a COVID-19 patient from Gujarat, agrees. “This remains a
concern for most of us – to correlate this data with our analysis,” she told _Nature India_ . “It is extremely important for us when we want to assign clinical significance to our sequencing
efforts,” she says. The reason this additional data is needed is that the same viral strain could be fatal for one person, and result in mild, moderate or severe symptoms in others. “And
some strains could also be more or less virulent than others,” Vasan adds. Vasan, who holds an honorary chair in Health Sciences at the University of York in the UK, says the World Health
Organisation should lead this effort to standardise the meta-dataset that can be followed globally, with consistent definitions to categorise severity and outcomes of COVID-19. “No country
can solve this problem in isolation. It is important for the WHO to specify the minimal meta-dataset not just for SARS-CoV-2 but also a future ‘Disease X’,” he told _Nature India_ . In the
absence of patient meta-data “we don’t know how the disease is progressing, how long the virus shedding occurs in different settings and what kind of immunity levels exist in individuals or
populations," says epidemiologist Giridhara R Babu from the Public Health Foundation of India (PHFI). “As we move forward, we have to be very careful in improving the quality of the
meta-data and, more importantly, have it unbiasedly assessed by people who don’t run the clinical trials,” Babu told _Nature India_ . That way measurement errors and selection biases can be
removed from the data to make it more useful. Information on severity of symptoms and disease progression dynamics would be immensely helpful when combined with the genomic sequences. “For
instance, one could actually know if there is a sub-group of asymptomatic people who never go on to develop the disease. They would be way more useful to design a disease modifying mechanism
or immunomodulation, instead of the quest for a vaccine as the endgame.” Disregarding all these data elements eliminates the possibility of other non-pharmacological interventions to
disrupt the transmission of the virus, Babu says.

EVOLUTION, MUTATIONS AND CLADES

The global effort to peer into the genetic make-up of the pandemic-causing virus since the start of the
COVID-19 outbreak has provided real-time understanding of the organism. Databases such as the GenBank and GISAID provide ammunition to researchers trying to understand the evolution and
mutations of viruses. They are also solid tools for research and development of drugs and vaccines against the virus. The data so far reveals some minor mutations in the virus which may have
no functional consequence, Vasan says. “For instance, when we looked at 388 sequences from Australia, only 162 had protein-changing mutations,” he says. However, his team was unable to
determine clinical or epidemiological impacts of these minor mutations without the underlying meta-data. Only 14 of these 388 sequences had clinical annotations; the rest were either
annotated as unknown or not annotated at all. CSIRO has developed [1] a novel visualisation platform – similar to the one used to analyse the human genome – to pinpoint differences among the thousands of
individual genetic sequences of COVID-19 now globally available. The data visualisation platform highlights evolving genetic mutations of the virus as it continues to change and adapt to
new environments. “Analysing global data on the published genome sequences of this novel coronavirus will help fast-track our understanding of this complex disease, how changes in the
virus could affect its behaviour and impact,” Vasan says. “Assessing the evolutionary distance between these data points helps researchers find out about the different strains of
the virus – including where they came from and how they continue to evolve,” he says. Vasan, whose team has analysed the first 181 published genome sequences from the current COVID-19
outbreak, says the RNA virus can “evolve into a number of distinct clusters that share mutations.” The analysis has already helped determine which strains of the virus are suitable
for testing vaccines underway at the Australian Centre for Disease Preparedness in Geelong. RNA viruses, Vasan adds, generally evolve into clusters and show ‘quasispecies diversity’, meaning
not just a single genotype but an ensemble of related sequences. Quasispecies arise from rapid genomic evolution powered by the high mutation rate of RNA viral replication. The novel
coronavirus, an RNA virus, emerged from China and restrictions on air travel and movements of people did not come into place for a while after the outbreak in Wuhan. “Therefore, the clusters
do not correspond to countries. For instance, the first 181 published genomic sequences could be grouped into three clusters (with three more emerging), and Australian isolates can be found
in each of them,” he says. For this reason it is unhelpful to call the virus ‘an Indian strain’ or ‘Australian strain’ or ‘Chinese strain’ or make claims that one regional strain is more
virulent than the other. “Over time, we may likely find clusters with varied virulence in all countries. The real question is whether we can link the accumulated mutations in the genome to
clinical meta-data and find clinically/epidemiologically meaningful correlations,” he says. A GISAID statement says the virus strains circulating globally can be classified into a
number of clades based on genetic variation. “These are part of the natural evolution of the virus currently not known to be associated with any differences in virulence,” it says. Data from
the early outbreak period is not enough for a detailed interpretation of the early history of global transmissions from a few genomes, according to GISAID. Ray Banerjee, whose team reported
in a preprint paper [2] two novel mutations in the spike protein of the SARS-CoV-2 isolate from Gujarat as compared with the Wuhan virus isolates, says these mutations have a somewhat
different origin. “One of the mutations is exclusive in the virus obtained from Gujarat whereas the other was also seen in North American and European isolates.” Almost 95 per cent of the
strains reported in global databases till now are from Wuhan in China where the outbreak began. “The rest five per cent are from the rest of the world. So some descriptions of virulence
being low or high in a particular region are wishful thinking at best,” Giridhara Babu says.

REFERENCES

1. Bauer, D. C. et al. Pandemic response using genomics & bioinformatics, a case study on the emergent SARS-CoV-2 outbreak. Transbound. Emerg. Dis. (2020) doi: 10.1111/tbed.13588
2. Banerjee, A. K. et al. Novel mutations in the S1 domain of COVID 19 spike protein of isolate from Gujarat origin, Western India. Preprints (2020) doi: 10.20944/preprints202004.0450.v1