30 Oct 2020
Meet the Team: Dr Sebastian Duchene uses computational biology to expose COVID-19
University of Melbourne Dr Sebastian Duchene, ARC Discovery Early Career Fellow at the Doherty Institute, shines a light on the genetic detective work required to map the evolution and outbreak of COVID-19.
Can you introduce yourself and your role at the Doherty Institute?
I am an ARC early career fellow and joined the Doherty Institute in early 2019. My expertise is in computational biology. Specifically, I focus on molecular evolutionary analyses of infectious pathogens. For example, we can estimate the time of origin of infectious outbreaks, map their spread across geographic regions, and ultimately understand their epidemiological dynamics. This is entirely computational work that consists of analysing genomic sequence data using a range of statistical models.
What originally attracted you to this unique area of science?
When I was studying biology, I became really passionate about the practicality of computational biology - I loved the fact that I could make discoveries on my home computer by analysing data that was publicly available. From there, I became fascinated by a topic called molecular clocks. The idea behind molecular clocks is that we can use genome sequencing data to estimate when and how species evolved; I began applying this technique to understand the evolution of killer whales.
A couple of years later, I moved to Australia for my PhD with Professor Simon Ho. Virologist Professor Eddie Holmes joined our department and he pointed out that my work in molecular clocks was relevant to viruses because they evolve very quickly, allowing us to quantify their evolution in calendar time. Together, we developed a few approaches to understand the evolutionary timescales and rates over which many viruses evolve, including HIV, influenza and others. We found that there is a lot of variation in virus evolutionary rates and that they depend on the timescale over which they are measured - a phenomenon known as time-dependent rates. Overall, these techniques could tell us when a virus emerged in humans or any other host, giving us information that could be used to trace transmission pathways and make broad inferences about their epidemiology.
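As a rough illustration of the molecular clock idea described above, the sketch below regresses genetic distance against sampling date, a simple approach often called root-to-tip regression. The slope gives an evolutionary rate and the x-intercept gives a time of origin. This is not the method used in the studies mentioned here, which rely on fuller statistical models; the function name `fit_clock` and all of the numbers are hypothetical and synthetic, chosen only so the fit can be checked against known values.

```python
# Toy molecular clock: regress genetic distance on sampling date.
# All numbers below are synthetic and chosen purely for illustration.


def fit_clock(dates, distances):
    """Ordinary least-squares fit of distance = rate * (date - origin).

    Returns (rate, origin): the slope in substitutions/site/year and the
    x-intercept, i.e. the date at which the fitted distance is zero.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    origin = -intercept / rate
    return rate, origin


# Hypothetical samples: decimal sampling dates and distances generated
# from an assumed rate and origin (not real SARS-CoV-2 estimates).
TRUE_RATE = 1.0e-3    # assumed, substitutions per site per year
TRUE_ORIGIN = 2019.9  # assumed, decimal year
sample_dates = [2020.00, 2020.05, 2020.10, 2020.15, 2020.20]
sample_distances = [TRUE_RATE * (t - TRUE_ORIGIN) for t in sample_dates]

rate, origin = fit_clock(sample_dates, sample_distances)
print(f"estimated rate: {rate:.2e} subs/site/year")
print(f"estimated origin: {origin:.2f}")
```

Because the synthetic distances lie exactly on a line, the fit recovers the assumed rate and origin. Real sequence data are noisy and the samples share ancestry, so a regression like this is best treated as a quick exploratory check rather than a final estimate.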
Since then, there have been significant developments in this field. The most transformational has been that genomic data is now ubiquitous in many evolutionary studies. For instance, it’s played an important role in the COVID-19 pandemic. The pandemic has been the first time that we’ve been able to sequence positive cases as they're diagnosed. This wasn’t possible several years ago, when our analyses were all retrospective – helping us to understand how things had happened. Now, we’re able to trace the evolution of the virus almost in real time. It has really changed the perspective of my research, as it can help to ‘nowcast’ as opposed to simply understanding the past.
Can you provide an example of genomic ‘nowcasting’?
The initial study that I completed on SARS-CoV-2 is a good example of understanding very recent evolutionary signatures. Scientists were sharing a few sequences a day of SARS-CoV-2 (the causative agent of COVID-19) on GISAID, a platform for depositing genome data from influenza and other viruses. At the time, it was mostly Chinese sequences and some from Europe. It was during the early days of the pandemic, at the start of 2020, and my team began collecting these sequences every day to determine when the virus originated – at the time, there was some controversy around whether it had been manufactured in a lab. Within a couple of weeks, our estimates were robust enough to demonstrate that the virus had originated in late 2019 - between October and early December - consistent with our understanding that the virus probably emerged from a spillover event from wildlife a short time before the first cases were reported in China. We also estimated the evolutionary rate of the virus, which is pretty ordinary for a coronavirus. Our results are incompatible with theories that the virus was circulating much earlier in the year or that it was a laboratory strain; such strains typically have unusually high evolutionary rates.
At the time, I was struck by how quickly we’d made these inferences and by how rapidly the virus was spreading. This was roughly the same time that epidemiologists were obtaining their first estimates of R naught - the basic reproductive number – and we were able to estimate that using genomic data. This has been a significant example of ‘nowcasting’ in action.
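To make the link between epidemic growth and R naught concrete, here is a minimal sketch of the relationship used in simple birth-death models of transmission: if infections grow exponentially at rate r and people stay infectious for an average of D days (exponentially distributed), then R0 = 1 + r × D. The function names and the parameter values below are hypothetical and are not estimates from the study described above.

```python
# Birth-death relationship between epidemic growth rate and R0.
# Parameter values are hypothetical, not estimates from any study.


def r0_from_growth(growth_rate, infectious_period):
    """R0 = 1 + r * D, assuming an exponentially distributed infectious
    period with mean D (so the rate of becoming uninfectious is 1 / D)."""
    return 1.0 + growth_rate * infectious_period


def r0_from_rates(transmission_rate, uninfectious_rate):
    """Equivalent form: R0 = lambda / delta, the transmission rate divided
    by the rate at which infected people stop being infectious."""
    return transmission_rate / uninfectious_rate


growth_rate = 0.15        # assumed exponential growth rate, per day
infectious_period = 10.0  # assumed mean infectious period, days

delta = 1.0 / infectious_period  # rate of becoming uninfectious
lam = growth_rate + delta        # transmission rate, since r = lambda - delta

r0_a = r0_from_growth(growth_rate, infectious_period)
r0_b = r0_from_rates(lam, delta)
print(f"R0 from growth rate: {r0_a:.2f}")
print(f"R0 from rates:       {r0_b:.2f}")
```

Both functions express the same quantity in two ways. In a genomic analysis the growth rate would be inferred from the shape and timing of the virus family tree rather than from case counts.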
You have pivoted the focus of your research this year to support the response to COVID-19. What have been some of the challenges?
A challenge has been that I didn’t expect that the SARS-CoV-2 virus would spread so quickly and that we would be required to obtain such a large number of genomic sequences. Initially, we were dealing with fewer than a thousand genomes in Victoria. Today, we’ve sequenced nearly 10,000 samples since the second wave.
The challenge has been dealing with these vast data sets and asking the right questions that can be answered with the data. For example, Victoria’s second wave of infections has been linked with a small number of imported cases in hotel quarantine. We call this a ‘clonal outbreak’, where most of these genomes are very similar. If you were to ask a question about who infected whom, it is very difficult to tease that information out of the data because the cluster spread so quickly, and the genomes are very similar. Our techniques to address these questions have improved very rapidly, but there is more work to do in this space.
What does a ‘typical’ day look like doing this work?
It really varies day to day. With the SARS-CoV-2 genomics work, I’ll receive a data set from the Microbiological Diagnostic Unit (MDU) Public Health Laboratory. There’s a phenomenal team of bioinformaticians and epidemiologists at the laboratory and we have some of the best data quality in the world. When I receive a data set, it often comes with a request to provide statistical evidence of the effectiveness of public health interventions. Finding that evidence is a process. I like to use a pen to draw equations that I might need to test different hypotheses around transmission models. Then, I’ll write computer code and run a range of analyses. I spend a lot of time lying face down coding. I don’t like using my desk too much. On some days, I receive support from members of the team to clean the data, making sure that the metadata and the sequences match. That’s very important. I also have plenty of Zoom meetings, like everybody else.
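The data-cleaning step mentioned above, checking that metadata and sequences match, can be sketched in a few lines. This is only an illustration under assumed conventions: sequences in FASTA format, metadata as a CSV file with a `sample_id` column, and made-up sample names. The function names are hypothetical, and the laboratory's actual pipeline is not described in this interview.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Return the set of sequence IDs, taken as the first
    whitespace-separated token after '>' on each header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def compare_ids(fasta_text, metadata_text, id_column="sample_id"):
    """Return (sequences lacking metadata, metadata rows lacking a sequence),
    each as a sorted list of IDs."""
    seq_ids = fasta_ids(fasta_text)
    reader = csv.DictReader(io.StringIO(metadata_text))
    meta_ids = {row[id_column] for row in reader}
    return sorted(seq_ids - meta_ids), sorted(meta_ids - seq_ids)


# Made-up example inputs; real files would be read from disk.
example_fasta = ">sampleA\nACGT\n>sampleB\nACGA\n>sampleC\nACGG\n"
example_metadata = (
    "sample_id,collection_date\n"
    "sampleA,2020-07-01\n"
    "sampleB,2020-07-03\n"
    "sampleD,2020-07-05\n"
)

no_metadata, no_sequence = compare_ids(example_fasta, example_metadata)
print("sequences without metadata:", no_metadata)
print("metadata without sequences:", no_sequence)
```

Any ID that appears in only one of the two lists points to a record that needs to be fixed or excluded before analysis, since a sequence without a sampling date cannot inform a molecular clock.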
On some days, a special request will come in. For instance, when New Zealand had their second wave of infections, one of my colleagues was working in New Zealand and called me. She wanted to learn more about the virus lineages that we had in Australia to help work out where the cases may have come from. I was able to access our servers and go through our databases and examine a range of Australian virus lineages. All of these lineages have labels – B.1, B.1.1, etc. I was able to confirm that the sequence did not match one of our clusters. That was a hectic day, given the urgency that was required.
What are some of the biggest misconceptions you’ve encountered while working on COVID-19?
One that is close to my heart is misinformation around the virus mutating. Viruses mutate and evolve – that’s what they do. I get asked many questions from friends and family about whether SARS-CoV-2 is becoming ‘more deadly’.
When we initially estimated the rate of evolution of the virus, it was pretty much what we would expect for any coronavirus. In fact, it’s five times slower than flu. The probability of it becoming more virulent or more transmissible in a short timeframe is quite low. It can happen and there is evidence that there’s a lineage that is very slightly more transmissible, but not more deadly. I’m concerned when this issue gets blown out of proportion.
This year has been a big year for you. What is an achievement that you’re proud of?
The Doherty Institute has done a great job at obtaining genome sequence data and the initial publications (Nature and Virus Evolution) examining the early evolution of the virus were quite influential. Some of my team are collaborating with people in countries in the northern hemisphere because they want to understand how their pandemic started. It’s been exciting to open the door to some meaningful international collaborations and I'm proud of the methods I've developed that are being applied in these global settings, including those that have been published in Molecular Biology and Evolution. For instance, some of these methods can tell us whether public health interventions have had an impact - we have collaborators in Spain looking at how their first lockdown has decreased the genetic diversity and also the transmission rate of the virus. We have also been able to apply this in New Zealand.
Computational science is evolving rapidly. Do you have any recommendations for emerging scientists who would like to move into this field?
To begin, I think it’s useful to read about the science as much as possible, particularly ‘popular science’. I think this is very helpful. Some of the knowledge I have – for instance, on how HIV originated - is from books by David Quammen and Peter Doherty. Popular science makes some difficult concepts more accessible. I wish I could have given my younger self that advice. As an area of science, computational research can seem quite difficult; however, it's a matter of utilising the tools to understand it. We've written many tutorials online that are available to the public. Finally, the other thing that I strongly recommend is to get in touch with scientists. I really enjoy receiving emails from high school students asking me about something they saw in the media. It’s always good to support up-and-coming scientists.