Using open data to track and predict infectious disease
Richard Neher
Biozentrum & SIB, University of Basel
slides at neherlab.org/201910_RIVM.html
Sequences record the spread of pathogens
Mutations accumulate at a rate of $10^{-5}$ per site per day!
images by Trevor Bedford
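To get a feel for what this rate means, here is a minimal back-of-the-envelope sketch. The rate is the one quoted above; the genome length of 13,500 nucleotides is an assumed round number (roughly an influenza A genome) and is not taken from the slides.

```python
# Rate quoted on the slide: mutations per site per day.
RATE_PER_SITE_PER_DAY = 1e-5

# Assumption for illustration only: approximate genome length in nucleotides.
GENOME_LENGTH = 13_500


def expected_mutations(days, genome_length=GENOME_LENGTH,
                       rate=RATE_PER_SITE_PER_DAY):
    """Expected number of mutations accumulated along one lineage."""
    return rate * genome_length * days


per_week = expected_mutations(7)
per_year = expected_mutations(365)
print(f"per week: {per_week:.2f}, per year: {per_year:.1f}")
```

Under these assumptions a lineage picks up roughly one mutation per week, which is why most viruses sampled in the same outbreak or season are distinguishable by sequence.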
Frequent mutations imply...
- most viruses in an outbreak/season differ from each other
- transmission chains can be inferred
- transmission can be ruled out!
- geographic spread can be reconstructed
- drug resistance surveillance
- specific mutations might mediate antigenic mismatch
- Influenza viruses evolve to avoid human immunity
- Vaccines need frequent updates
Vaccine strain selection schedule
Klingen and McHardy, Trends in Microbiology
GISRS and GISAID -- Influenza virus surveillance
- comprehensive coverage of the world
- timely sharing of data through GISAID -- often within 2-3 weeks of sampling
- hundreds of sequences per week (in peak months)
→ requires continuous analysis and easy dissemination
→ interpretable and intuitive visualization
Visualization features of nextstrain
- Regular and time-scaled phylogenies
- Mutations are mapped to the tree
- Filtering to time interval, region, country, authors, ...
- Zoom into clades
- Information on specific viruses
- Color by amino acid or nucleotide
- Frequency trajectories of clades and mutations
- Color by antigenic advance, predictive scores, etc
Hadfield et al, 2018
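As an illustration of what a frequency trajectory is, the sketch below counts the fraction of samples carrying a mutation in fixed time bins. This is a deliberately simple stand-in, not the estimator nextstrain uses; the function name, the bin width, and the toy samples are all hypothetical.

```python
from collections import defaultdict


def binned_frequencies(samples, bin_width=0.25):
    """Fraction of samples carrying a mutation in each time bin.

    samples: iterable of (decimal_year, carries_mutation) pairs.
    Returns a dict mapping the start of each bin to the observed frequency.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [with mutation, total]
    for date, carries in samples:
        bin_start = (date // bin_width) * bin_width
        counts[bin_start][0] += int(carries)
        counts[bin_start][1] += 1
    return {b: k / n for b, (k, n) in sorted(counts.items())}


# Toy data (hypothetical): sampling dates and whether the mutation is present.
toy_samples = [(2019.1, False), (2019.2, True), (2019.3, True), (2019.4, True)]
trajectory = binned_frequencies(toy_samples)
print(trajectory)
```

Raw binned counts are noisy when few sequences fall into a bin, so a real analysis would typically smooth over time and account for uneven sampling across regions.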
Beyond tracking: can we predict?
Fitness variation in rapidly adapting populations
- Speed of adaptation is logarithmic in population size
- Environment (fitness landscape), not mutation supply, determines adaptation
- Different models have universal emerging properties
RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine
Predicting evolution
Given the branching pattern:
- can we predict fitness?
- pick the closest relative of the future?
RN, Russell, Shraiman, eLife, 2014
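One way to make "closest relative of the future" concrete is to score each candidate by its mean distance to the sequences observed later. The sketch below does this retrospectively with Hamming distance on aligned sequences; the strain names and sequences are made up, and this scoring is an illustrative assumption rather than the exact procedure of the cited paper.

```python
def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def closest_to_future(candidates, future):
    """Name of the candidate with the smallest mean distance to `future`.

    candidates: dict mapping strain name -> aligned sequence.
    future: list of aligned sequences sampled in the following season.
    """
    def mean_distance(seq):
        return sum(hamming(seq, f) for f in future) / len(future)

    return min(candidates, key=lambda name: mean_distance(candidates[name]))


# Hypothetical toy alignment.
candidates = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
future = ["ACGA", "TCGA", "ACGA"]
best = closest_to_future(candidates, future)
print(best)
```

A prediction method can then be judged by how close its pick comes to this retrospective optimum.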
Fitness inference from trees
$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$
RN, Russell, Shraiman, eLife, 2014
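The factorized form above can be evaluated by summing log-propagators over the two children of every internal node. The sketch below shows only that bookkeeping: the Gaussian random-walk propagator and the Gaussian root prior are placeholder assumptions standing in for $g$ and $p_0$, the tree encoding is hypothetical, and the normalization $Z(T)$ is omitted.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_g(x_child, t_child, x_parent, t_parent, diffusion=1.0):
    """Placeholder propagator (assumption): fitness diffuses along a branch."""
    return log_gaussian(x_child, x_parent, 2 * diffusion * (t_child - t_parent))


def unnormalized_log_prob(children, x, t, root=0, prior_var=1.0):
    """log of p_0(x_0) * prod_i g(child 1 | i) * g(child 2 | i), without Z(T).

    children: dict mapping each internal node to its two child nodes.
    x, t: dicts mapping node -> fitness and node -> time.
    """
    logp = log_gaussian(x[root], 0.0, prior_var)  # placeholder prior p_0
    for parent, pair in children.items():
        for child in pair:
            logp += log_g(x[child], t[child], x[parent], t[parent])
    return logp


# Hypothetical three-node tree: root 0 with children 1 and 2.
children = {0: (1, 2)}
t = {0: 0.0, 1: 1.0, 2: 1.0}
flat = unnormalized_log_prob(children, {0: 0.0, 1: 0.0, 2: 0.0}, t)
shifted = unnormalized_log_prob(children, {0: 0.0, 1: 3.0, 2: 0.0}, t)
print(flat, shifted)
```

With the placeholder propagator, assigning a child a fitness far from its parent lowers the score, which is the qualitative behavior any branch propagator of this kind would show.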
Prediction of the dominating H3N2 influenza strain
- no influenza specific input
- how can the model be improved? (see model by Luksza & Laessig)
- in what other contexts might this apply?
RN, Russell, Shraiman, eLife, 2014
Our current prediction...
Enterovirus D68 -- with Robert Dyrdak, Emma Hodcroft & Jan Albert
- Non-polio enterovirus
- Almost everybody has antibodies against EV-D68
- Large outbreak in 2014 with severe neurological symptoms in young children (acute flaccid myelitis)
- Another outbreak in 2016
- Outbreaks tend to start in late summer/fall
- Several reports of EV-D68 outbreaks last fall (201 AFM cases in the US in 2018)
EV-D68 whole genome deep sequencing project across Europe
Geographic and demographic distribution of EV-D68
Acknowledgments
- Trevor Bedford
- Pavel Sagulenko
- James Hadfield
- Emma Hodcroft
- Tom Sibley
- and others
Influenza and Theory acknowledgments
- Boris Shraiman
- Colin Russell
- Trevor Bedford
- Oskar Hallatschek
Acknowledgments -- Enterovirus
- Robert Dyrdak
- Jan Albert
- Lina Thebo
- Emma Hodcroft
- Bert Niesters (Groningen)
- Randy Poelman (Groningen)
- Elke Wollants (Leuven)
- Adrian Egli (Basel)
- Andrés Antón Pagarolas (Barcelona)