Tracking SARS-CoV-2 using real-time phylogenetics with Nextstrain
Richard Neher
Biozentrum, University of Basel
slides at neherlab.org/202004_TAGC.html
I won't talk specifically about my own work or my group's work
Instead:
Overview of a collective effort from the Nextstrain perspective
Acknowledgements
Trevor Bedford and his lab -- terrific collaboration since 2014
James Hadfield, Emma Hodcroft and Tom Sibley have led the recent development
Data we analyze are contributed by scientists from all over the world
Data are shared and curated by GISAID
SARS-CoV-2 and its relatives
SARS-CoV-2 is in the same family as SARS-CoV-1, MERS and two seasonal coronaviruses
The seasonal coronaviruses cause mild disease; SARS-CoV-1 and MERS cause severe disease
SARS-CoV-2 spreads more easily than SARS-CoV-1 or MERS
Relatives are found in many different animals
Closest known relative is a virus isolated from bats (RaTG13, approx 96% identical)
Coronaviruses recombine: trees need to be interpreted with caution
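To make the "96% identical" figure concrete, here is a minimal sketch of how pairwise identity can be computed from two aligned sequences. The function name and the toy sequences are hypothetical, and skipping gap and N positions is an assumption; the slides do not say how the 96% figure was calculated.

```python
def percent_identity(seq_a, seq_b):
    """Percent of compared alignment columns where both sequences agree.

    Assumes the two sequences are already aligned (equal length).
    Columns with a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy example (made-up sequences): one mismatch in ten compared positions.
print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))
```

On a genome of roughly 29,000 bases, 96% identity corresponds to on the order of a thousand differing positions.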
The SARS-CoV-2 genome
About 29 kb linear (+)ssRNA genome -- one of the largest RNA virus genomes
The first two-thirds encode the replication machinery
Early lessons from SARS-CoV-2 genomes (mid Jan)
The outbreak originated from one source
→ not repeated zoonoses from a diverse reservoir.
The common ancestor of all samples dates to November to early December 2019.
Family clusters showed up as identical genomes (expected)
A second clade emerged that will continue to spread
→ the closest to "real-time" we have experienced so far...
Figure by James Hadfield
Subset of available data on April 22
About two mutations per month, but mutations are often clustered and overdispersed
nextstrain.org/ncov
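As a back-of-the-envelope check, the "two mutations per month" figure can be converted into a per-site rate. This sketch uses only the rounded numbers from the slides (about two mutations per month, a genome of roughly 29,000 bases); the function name is hypothetical and the result is an order-of-magnitude estimate, not a fitted clock rate.

```python
# Rounded inputs taken from the slides; both are approximations.
MUTATIONS_PER_MONTH = 2.0
GENOME_LENGTH = 29_000


def substitutions_per_site_per_year(mutations_per_month, genome_length):
    """Convert a genome-wide monthly mutation count into a per-site yearly rate."""
    return mutations_per_month * 12 / genome_length


rate = substitutions_per_site_per_year(MUTATIONS_PER_MONTH, GENOME_LENGTH)
print(f"{rate:.1e} substitutions per site per year")  # prints 8.3e-04 ...
```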
Current take-home from SARS-CoV-2 genomes
SARS-CoV-2 was disseminated widely before travel restrictions
→ Border closures had little effect
The virus typically spread locally before it was noticed
(while testing focussed on people with travel history to hotspots)
Temporal resolution by mutation is about 4 weeks
→ too low to resolve direct transmissions or directionality
→ sufficient to connect outbreaks and identify clusters
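Why direct transmissions cannot be resolved can be illustrated with a simple Poisson model. This is a sketch under stated assumptions: mutations are treated as a homogeneous Poisson process at about two per month (the slides note that real mutations are clustered and overdispersed), and the 5-day gap between infections is a hypothetical value chosen for illustration.

```python
import math

# "About two mutations per month"; a 30-day month is assumed.
MUTATIONS_PER_DAY = 2.0 / 30.0


def prob_no_mutation(days, rate_per_day=MUTATIONS_PER_DAY):
    """Probability of zero new mutations over `days` under a Poisson model."""
    return math.exp(-rate_per_day * days)


# 5 days is a hypothetical gap between linked infections;
# 14 and 28 days are shown for comparison.
for days in (5, 14, 28):
    print(days, round(prob_no_mutation(days), 2))
```

Under these assumptions most directly linked cases have identical genomes, so the sequences alone cannot say who infected whom, while outbreaks separated by several weeks usually differ by at least one mutation.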
Currently no solid evidence for mutations with functional significance
see Sergei Pond's analysis of signals of positive selection
Conspiracy theories and sensationalist research
WRONG:
LIKELY WRONG:
Prevalence is high, the mortality is low
False-positive rates are not accounted for correctly; study populations are unrepresentative
Some strains are more severe than others
confounders not accounted for
Some strains are adapted to different parts of the world
confounders not accounted for
many more....
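The false-positive point above can be made concrete with the Rogan-Gladen estimator, one standard way to correct an observed positive fraction for imperfect test sensitivity and specificity. The slides do not name a specific correction, so this is an illustrative choice, and all numbers below are hypothetical.

```python
def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the observed positive fraction.

    The result is clamped to [0, 1] because sampling noise can push the
    raw estimate outside that range.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


# Hypothetical survey: 1.5% test positive, sensitivity 90%, specificity 99%.
print(corrected_prevalence(0.015, 0.90, 0.99))
```

With these made-up numbers, roughly two thirds of the observed positives are explained by false positives, which is why small errors in the assumed specificity can dominate a low-prevalence estimate.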
Colorful trees are easily misinterpreted...
Low genetic diversity combined with very biased sampling → no directionality can be inferred
nextstrain.org/ncov
Sidney Bell, Cassia Wagner, Emma Hodcroft, James Hadfield, Nicola Müller and others
Acknowledgements
People at the frontlines: health care staff and everybody else who keeps things going...