Tracking SARS-CoV-2 using real-time phylogenetics with Nextstrain


Richard Neher
Biozentrum, University of Basel


slides at neherlab.org/202004_TAGC.html

I won't talk specifically about work by myself or by my group

Instead:

Overview of a collective effort from the Nextstrain perspective

Acknowledgements

Trevor Bedford and his lab -- terrific collaboration since 2014

James Hadfield, Emma Hodcroft and Tom Sibley have led the recent development

Data we analyze are contributed by scientists from all over the world

Data are shared and curated by GISAID

Data summarized by Ian MacKay
by Trevor Bedford

SARS-CoV-2 and its relatives

  • SARS-CoV-2 is in the same family as SARS-CoV-1, MERS and two seasonal coronaviruses
  • The latter cause mild disease, the former severe
  • SARS-CoV-2 spreads more easily than SARS-CoV-1 or MERS
  • Relatives are found in many different animals
  • Closest known relative is a virus isolated from bats
    (RaTG13, approx 96% identical)
  • Coronaviruses recombine: trees need to be interpreted with caution

Tracking diversity and spread of SARS-CoV-2 in nextstrain

The SARS-CoV-2 genome

  • 29 kb linear (+)ssRNA genome -- one of the largest RNA virus genomes
  • the first 2/3 code for the replication machinery

Available data on Jan 26

Early genomes differed by only a few mutations, suggesting very recent emergence
nextstrain.org/ncov/2020-01-26

Early lessons from SARS-CoV-2 genomes (mid Jan)

  • The outbreak originated from one source
    → not repeated zoonoses from a diverse reservoir.
  • The common ancestor of all samples dates to Nov - early Dec 2019.
  • Family clusters showed up as identical genomes (expected)
  • A second clade emerged that will continue to spread
→ the closest to "real-time" we have experienced so far...
Figure by James Hadfield

Available data on April 22

nextstrain.org/ncov

Subset of available data on April 22

About two mutations per month, but mutations are often clustered and overdispersed
nextstrain.org/ncov
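
As a back-of-the-envelope check, the genome-wide rate quoted on this slide can be converted into a per-site rate. This is a minimal sketch, assuming the roughly 29,000-base genome length mentioned earlier and treating "two mutations per month" as a genome-wide average; the function name is made up for illustration and is not part of Nextstrain.

```python
def per_site_rate_per_year(mutations_per_month: float, genome_length: int) -> float:
    """Convert genome-wide mutations per month into substitutions per site per year."""
    mutations_per_year = mutations_per_month * 12
    return mutations_per_year / genome_length


# Assumed inputs: ~2 mutations per month, ~29,000 bases (both rounded values from the slides).
rate = per_site_rate_per_year(2.0, 29_000)
print(f"{rate:.1e} substitutions per site per year")
```

With these rounded inputs the result is a little under 1e-3 substitutions per site per year. The calculation says nothing about the clustering and overdispersion noted above; it only rescales the average.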

Tracing the origins of samples from Iceland

SARS-CoV-2 is remarkably mixed geographically -- genomes connect outbreaks in different places
Gudbjartsson et al, nextstrain.org/ncov

Current take-home from SARS-CoV-2 genomes

  • SARS-CoV-2 was disseminated widely before travel restrictions
    → Border closures had little effect
  • The virus typically spread locally before it was noticed
    (while testing focussed on people with travel history to hotspots)
  • Temporal resolution by mutation is about 4 weeks
    → too low to resolve direct transmissions or directionality
    → sufficient to connect outbreaks and identify clusters
  • Currently no solid evidence for mutations with functional significance
    see Sergei Pond's analysis of signals of positive selection
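
The limited temporal resolution can be illustrated with a simple calculation: how likely is it that no mutation separates two linked infections after a given number of days? The sketch below assumes mutations accumulate as a Poisson process at about two per month (a simplification, since mutations are in practice clustered and overdispersed), uses a 30-day month, and picks 5 days as a purely illustrative gap between two directly linked cases. None of these choices come from the Nextstrain analysis itself.

```python
import math


def prob_no_mutation(days: float,
                     mutations_per_month: float = 2.0,
                     days_per_month: float = 30.0) -> float:
    """Probability of zero mutations in `days`, under an assumed Poisson clock."""
    expected_mutations = mutations_per_month * days / days_per_month
    return math.exp(-expected_mutations)


# Illustrative intervals: a short gap between linked cases vs. four weeks.
for label, days in [("5 days", 5), ("4 weeks", 28)]:
    print(f"{label}: P(no mutation) = {prob_no_mutation(days):.2f}")
```

Under these assumptions, genomes a few days apart are more likely than not to be identical, so who infected whom cannot be read off the sequences, while after about four weeks at least one mutation has usually accumulated, which is enough to link outbreaks and delineate clusters.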

Fighting misinformation

Conspiracy theories and sensationalist research

WRONG:

LIKELY WRONG:

  • Prevalence is high, the mortality is low
    False positive rates are not accounted for correctly, unrepresentative study populations
  • Some strains are more severe than others
    confounders not accounted for
  • Some strains are adapted to different parts of the world
    confounders not accounted for
  • many more...
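
To see why false positive rates matter for the prevalence claim above, it helps to work through the arithmetic. The sketch below uses the standard Rogan-Gladen correction for imperfect tests; the sensitivity, specificity, and prevalence values are hypothetical and are not taken from any particular study.

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Fraction of tests expected to be positive, counting false positives."""
    true_positives = true_prev * sensitivity
    false_positives = (1.0 - true_prev) * (1.0 - specificity)
    return true_positives + false_positives


def corrected_prevalence(apparent: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence, clipped to the range [0, 1]."""
    estimate = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(max(estimate, 0.0), 1.0)


# Hypothetical scenario: 1% true prevalence, 90% sensitivity, 98% specificity.
observed = apparent_prevalence(0.01, 0.90, 0.98)
recovered = corrected_prevalence(observed, 0.90, 0.98)
print(f"observed positive fraction: {observed:.4f}")
print(f"corrected prevalence:       {recovered:.4f}")
```

In this hypothetical scenario the raw positive fraction is almost three times the true prevalence, because false positives outnumber true positives when the infection is rare. An infection fatality estimate built on the uncorrected number would be too low by a similar factor.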

Colorful trees are easily misinterpreted...

Low genetic diversity combined with very biased sampling → no directionality can be inferred
nextstrain.org/ncov

Nextstrain situation reports

Sidney Bell, Cassia Wagner, Emma Hodcroft, James Hadfield, Nicola Müller and others

Acknowledgements

Trevor Bedford and his lab -- terrific collaborations since 2014

James Hadfield, Emma Hodcroft, and Tom Sibley have led the recent development

People at the frontlines: health care staff and everybody else who keeps things going...

Data we analyze are contributed by scientists all over the world

Data are shared and curated by GISAID