Massive coronavirus sequencing efforts urgently need patient data

Nature

Researchers mapping the genetic blueprint of the novel coronavirus SARS-CoV-2 have by now shared more than 12,000 genome sequences from across the world on the open platform Global Initiative on Sharing All Influenza Data (GISAID). The repository has seen unprecedented activity since December, when the first sequence from Wuhan in China came in. On NCBI’s GenBank, more than 20,000 nucleotide and protein sequences of the virus have already been submitted. The virus is set to become the most sequenced virus in history.

Researchers, however, warn that unless the sequences are accompanied by de-identified data from patients, the billions of dollars being spent on sequencing the virus globally will yield little of the clinical or epidemiological value that is crucially needed during a rapidly evolving pandemic. Laboratories, clinicians, epidemiologists and governments wanting to quickly use this gold mine of information are hitting a stumbling block as they look for the more granular data that should ideally supplement the primary sequence data.

“We badly need de-identified meta-data from the patients from whom these sequences came so that it makes sense for any kind of analysis,” says Seshadri Vasan, who leads the Dangerous Pathogens team at the Australian Animal Health Laboratory and is senior principal research consultant for Health and Biosecurity at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia’s national science agency. De-identified data does not reveal the identity of the patient.

Vasan says the minimum set of de-identified data that researchers need is the patient’s age and gender, whether they had mild, moderate or severe disease, and whether they survived. Information on lifestyle and comorbidities, such as whether the patient smokes or has a pre-existing respiratory illness or diabetes, is also important to give this data meaning. “We usually get information on country and city, but it may be beneficial to have postcode and ethnicity data too,” he says.

India has announced an ambitious 1000-genome
sequencing project to better understand the viral and host genomics of the COVID-19 outbreak. India’s Council of Scientific and Industrial Research (CSIR), which undertook a project to sequence 1,008 human genomes last year, has been leading the sequencing efforts in India. Scientists at the Centre for Cellular and Molecular Biology (CCMB), Hyderabad; the Institute of Genomics and Integrated Biology (IGIB), Delhi; the Institute of Microbial Technology, Chandigarh; the National Institute of Virology, Pune; and the Gujarat Biotechnology Research Centre, Gandhinagar, are sequencing the viral genome. The Central Drug Research Institute (CDRI), Lucknow, and the CSIR-Indian Institute of Chemical Biology (IICB), Kolkata, are also gearing up to sequence it. With the 1000-genome project, about 10 more facilities across the country will be pulled in to sequence the virus.

Mitali Mukerji, a genomics scientist at IGIB who is coordinating CSIR’s sequencing efforts, says that at the moment scientists are only analysing the strain of the virus and where the sequences came from. “Clinical history is not getting submitted from any place. It’s very important since this is not the end of the outbreak we are seeing,” she says. Epidemiologists need to identify the people who might be more at risk, and analysing clinical information will be crucial for that, she says.

IGIB director and clinician scientist Anurag Agrawal, who is overseeing a molecular and digital surveillance project around the genome sequences from India, says it would be extremely useful to know the viral loads and the numbers of symptomatic versus asymptomatic cases. “Nothing is meaningful for molecular epidemiology or our knowledge of clusters unless these clinical parameters are well defined in the data,” he says.

The biggest barrier, he says, is coordination between the researchers who sequence the virus and the agencies that upload the sequences to the databases. “We work with the National Centre for Disease Control (NCDC), who have the underlying patient information, and since they upload the sequences, they do add much more value to the data.”

Upasana Ray Banerjee, a virologist at IICB whose team recently analysed the genome sequence from a COVID-19 patient from Gujarat, agrees. “This remains a concern for most of us – to correlate this data with our analysis,” she told _Nature India_. “It is extremely important for us when we want to assign clinical significance to our sequencing efforts,” she says.

This additional data is needed because the same viral strain could be fatal for one person and cause mild, moderate or severe symptoms in others. “And some strains could also be more or less virulent than others,” Vasan adds.

Vasan, who holds an honorary chair in Health Sciences at the University of York in the UK, says the World Health Organization should lead an effort to standardise a meta-dataset that can be followed globally, with consistent definitions for categorising the severity and outcomes of COVID-19. “No country can solve this problem in isolation. It is important for the WHO to specify the minimal meta-dataset not just for SARS-CoV-2 but also a future ‘Disease X’,” he told _Nature India_.

In the absence of patient meta-data, “we don’t know how the disease is progressing, how long the virus shedding occurs in different settings and what kind of immunity levels exist in individuals or populations,” says epidemiologist Giridhara R Babu of the Public Health Foundation of India (PHFI). “As we move forward, we have to be very careful in improving the quality of the meta-data and, more importantly, have it unbiasedly assessed by people who don’t run the clinical trials,” Babu told _Nature India_. That way, measurement errors and selection biases can be removed from the data, making it more useful.

Information on the severity of symptoms and the dynamics of disease progression would be immensely helpful when combined with the genomic sequences. “For instance, one could actually know if there is a sub-group of asymptomatic people who never go on to develop the disease. They would be way more useful to design a disease modifying mechanism or immunomodulation, instead of the quest for a vaccine as the endgame.” Disregarding all these data elements eliminates the possibility of other non-pharmacological interventions to
disrupt the transmission of the virus, Babu says.

EVOLUTION, MUTATIONS AND CLADES

The global effort to peer into the genetic make-up of the pandemic-causing virus since the start of the COVID-19 outbreak has provided a real-time understanding of the organism. Databases such as GenBank and GISAID give ammunition to researchers trying to understand the evolution and mutations of viruses. They are also solid tools for the research and development of drugs and vaccines against the virus.

The data so far reveals some minor mutations in the virus that may have no functional consequence, Vasan says. “For instance, when we looked at 388 sequences from Australia, only 162 had protein-changing mutations,” he says. However, without the underlying meta-data, his team was unable to determine the clinical or epidemiological impact of these minor mutations. Only 14 of these 388 sequences had clinical annotations; the rest were either annotated as unknown or not annotated at all.

CSIRO has developed¹ a novel visualisation platform – similar to the one used to analyse the human genome – to pinpoint differences among the thousands of individual genetic sequences of SARS-CoV-2 now available globally. The platform highlights evolving genetic mutations of the virus as it continues to change and adapt to new environments. “Analysing global data on the published genome sequences of this novel coronavirus will help fast track our understanding of this complex disease, how changes in the virus could affect its behaviour and impact,” Vasan says. “Assessing the evolutionary distance between these data points helps researchers find out about the different strains of the virus – including where they came from and how they continue to evolve,” he says.

Vasan, whose team has analysed the first 181 published genome sequences from the current COVID-19 outbreak, says the RNA virus can “evolve into a number of distinct clusters that share mutations.” The analysis has already helped determine which strains of the virus are suitable for the vaccine testing underway at the Australian Centre for Disease Preparedness in Geelong.

RNA viruses, Vasan adds, generally evolve into clusters and show ‘quasispecies diversity’, meaning that they exist not as a single genotype but as an ensemble of related sequences. Quasispecies arise from rapid genomic evolution powered by the high mutation rate of RNA viral replication. The novel coronavirus, an RNA virus, emerged from China, and restrictions on air travel and the movement of people did not come into force until some time after the outbreak in Wuhan. “Therefore, the clusters do not correspond to countries. For instance, the first 181 published genomic sequences could be grouped into three clusters (with three more emerging), and Australian isolates can be found in each of them,” he says.

For this reason, it is unhelpful to call the virus ‘an Indian strain’, ‘an Australian strain’ or ‘a Chinese strain’, or to claim that one regional strain is more virulent than another. “Over time, we may likely find clusters with varied virulence in all countries. The real question is whether we can link the accumulated mutations in the genome to clinical meta-data and find clinically/epidemiologically meaningful correlations,” he says.

A GISAID statement says the virus strains circulating globally can be classified into a number of clades based on genetic variation. “These are part of the natural evolution of the virus currently not known to be associated with any differences in virulence,” it says. According to GISAID, the few genomes available from the early outbreak period are not enough for a detailed interpretation of the early history of global transmission.

Ray Banerjee, whose team reported two novel mutations in the spike protein of the SARS-CoV-2 isolate from Gujarat, compared with the Wuhan virus isolates, in a preprint², says these mutations have somewhat different origins. “One of the mutations is exclusive in the virus obtained from Gujarat whereas the other was also seen in North American and European isolates.”

Almost 95 per cent of the strains reported in global databases so far are from Wuhan in China, where the outbreak began. “The rest five per cent are from the rest of the world. So some descriptions of virulence
being low or high in a particular region are wishful thinking at best,” Giridhara Babu says.

REFERENCES

1. Bauer, D. C. et al. Pandemic response using genomics & bioinformatics, a case study on the emergent SARS-CoV-2 outbreak. Transbound. Emerg. Dis. (2020). doi: 10.1111/tbed.13588

2. Banerjee, A. K. et al. Novel mutations in the S1 domain of COVID 19 spike protein of isolate from Gujarat origin, Western India. Preprints (2020). doi: 10.20944/preprints202004.0450.v1

