### Real-time analysis and forecasting of influenza virus evolution

Richard Neher

Biozentrum, University of Basel

slides at neherlab.org/201803_IMRP.html

- Influenza viruses evolve to avoid human immunity
- Vaccines need frequent updates

### Large scale sequencing -- A/H3N2 genomes in GISAID

### Joint work with

- Boris Shraiman
- Colin Russell
- Trevor Bedford

## Features

- Maps mutations to the tree
- Calculates frequency trajectories of every major mutation
- Allows subsetting of data to date ranges and geographic region
- Time-scaled and standard phylogenetic trees
- Updated frequently, reflects GISAID data
- Integrates HI data and molecular evolution data
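
As a rough illustration of what a frequency trajectory is, the sketch below counts the fraction of sampled sequences that carry a given mutation in each time window. The input format, the function name `frequency_trajectory`, and the example samples are assumptions made for illustration. The slides do not describe the estimator nextflu uses, so this plain binning should not be read as its method.

```python
from collections import defaultdict


def frequency_trajectory(samples, mutation, window_edges):
    """Fraction of sampled sequences carrying `mutation` in each time window.

    samples: iterable of (decimal_date, set_of_mutations) pairs (hypothetical format).
    window_edges: sorted boundaries; window k covers [edges[k], edges[k + 1]).
    Returns a list of (window_midpoint, frequency), with None for empty windows.
    """
    carrying = defaultdict(int)
    total = defaultdict(int)
    for date, mutations in samples:
        for k in range(len(window_edges) - 1):
            if window_edges[k] <= date < window_edges[k + 1]:
                total[k] += 1
                carrying[k] += mutation in mutations  # True counts as 1
                break

    trajectory = []
    for k in range(len(window_edges) - 1):
        midpoint = 0.5 * (window_edges[k] + window_edges[k + 1])
        frequency = carrying[k] / total[k] if total[k] else None
        trajectory.append((midpoint, frequency))
    return trajectory


# Made-up samples: sampling date and the mutations each sequence carries.
samples = [
    (2016.1, {"HA1:N171K"}),
    (2016.2, set()),
    (2016.6, {"HA1:N171K"}),
    (2016.7, {"HA1:N171K", "HA1:N121K"}),
    (2016.8, set()),
    (2016.9, {"HA1:N171K"}),
]
traj = frequency_trajectory(samples, "HA1:N171K", [2016.0, 2016.5, 2017.0])
for midpoint, frequency in traj:
    print(midpoint, frequency)
```

Restricting `samples` to a date range or region before calling the function corresponds to the subsetting mentioned in the list above.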

## Beyond tracking: can we predict?

## Different approaches to predict IAV evolution

- extrapolation of current frequency trajectories
  - sampling bias can affect this dramatically

- explicit fitness score based on historical patterns (Łuksza and Lässig)
  - epitope mutations
  - other mutations -- interfere with virus function

- fitness inference from branching patterns in the tree (RN, Russell, Shraiman)
  - requires no historical data
  - not influenza specific
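
The first approach in the list above can be sketched as a straight-line fit to the log-odds of a variant's frequency, extrapolated forward in time. This is a minimal sketch under the assumption of logistic growth. The function names and the example numbers are hypothetical, and the sketch does nothing to correct for the sampling bias noted above.

```python
import math


def logit(x):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(x / (1.0 - x))


def extrapolate_frequency(times, freqs, t_future):
    """Least-squares line through logit(frequency) versus time, evaluated at t_future.

    Returns (predicted_frequency, slope), where slope is the fitted growth
    rate of the log-odds per unit time.
    """
    ys = [logit(f) for f in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    y_future = y_mean + slope * (t_future - t_mean)
    return 1.0 / (1.0 + math.exp(-y_future)), slope


# Made-up trajectory: time in years relative to the first estimate.
times = [0.0, 0.25, 0.5]
freqs = [0.10, 0.20, 0.35]
predicted, growth_rate = extrapolate_frequency(times, freqs, t_future=1.0)
print(predicted, growth_rate)
```

Fitting in log-odds space keeps the extrapolated frequency between 0 and 1, which a straight line through the raw frequencies would not guarantee.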

Recent review: Morris et al, Trends in Microbiology, 2017

### Model of an adapting influenza virus population

RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine

#### Typical tree

#### Bolthausen-Sznitman Coalescent

RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007; Desai, Walczak, Fisher, Genetics, 2013
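
To make the multiple mergers of the Bolthausen-Sznitman coalescent concrete, the sketch below tabulates merger rates and samples the sequence of merger sizes that reduces `n` lineages to one. It uses the standard rate for this coalescent: a specific group of $k$ out of $b$ lineages merges at rate $(k-2)!\,(b-k)!/(b-1)!$, so some group of size $k$ merges at total rate $b/(k(k-1))$. The function names and the choice of 50 lineages are illustrative assumptions.

```python
from fractions import Fraction
import random


def merger_rates(b):
    """Total rate at which any k of b lineages merge, for k = 2..b.

    Each specific k-tuple merges at rate (k-2)! (b-k)! / (b-1)!;
    multiplying by binomial(b, k) gives b / (k (k - 1)).
    """
    return {k: Fraction(b, k * (k - 1)) for k in range(2, b + 1)}


def sample_merger_sizes(n, rng):
    """Sizes of successive mergers that take n lineages down to one ancestor."""
    sizes = []
    b = n
    while b > 1:
        rates = merger_rates(b)
        ks = list(rates)
        k = rng.choices(ks, weights=[float(rates[x]) for x in ks])[0]
        sizes.append(k)
        b -= k - 1  # k lineages are replaced by their single ancestor
    return sizes


rng = random.Random(1)
sizes = sample_merger_sizes(50, rng)
print(sizes)
```

The rates over all $k$ sum to $b-1$. In the Kingman coalescent only pairs merge, whereas here larger mergers occur with appreciable probability, which is what produces the bursts in the tree.
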
## Bursts in a tree ↔ high fitness genotypes

### Fitness inference from trees

$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$

RN, Russell, Shraiman, eLife, 2014
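
The full posterior above is not reproduced here. As a simpler stand-in, the sketch below computes the local branching index (LBI), a heuristic from the cited eLife paper that scores each node by the tree length in its neighborhood, discounted exponentially with distance on a scale `tau`. The dictionary-based tree format, the function name, the toy tree, and the value of `tau` are all assumptions made for illustration.

```python
import math


def local_branching_index(children, branch_length, root, tau):
    """Exponentially discounted tree length around every node.

    children: dict node -> list of child nodes (tips map to []).
    branch_length: dict node -> length of the branch to its parent (root: 0.0).
    """
    # Preorder visits parents before children; reversed, children come first.
    preorder = []
    stack = [root]
    while stack:
        node = stack.pop()
        preorder.append(node)
        stack.extend(children[node])
    postorder = preorder[::-1]

    def propagate(message, length):
        # Discount what lies beyond the branch and add the branch's own share.
        decay = math.exp(-length / tau)
        return message * decay + tau * (1.0 - decay)

    # Message each node sends up to its parent (its subtree plus its branch).
    up = {}
    for node in postorder:
        up[node] = propagate(sum(up[c] for c in children[node]), branch_length[node])

    # Message each node receives from the rest of the tree.
    down = {root: 0.0}
    for node in preorder:
        for child in children[node]:
            siblings = sum(up[c] for c in children[node] if c != child)
            down[child] = propagate(down[node] + siblings, branch_length[child])

    return {n: down[n] + sum(up[c] for c in children[n]) for n in preorder}


# Toy tree: node "A" has a burst of eight descendants, node "B" has two.
burst_tips = [f"a{i}" for i in range(8)]
sparse_tips = ["b0", "b1"]
children = {"root": ["A", "B"], "A": burst_tips, "B": sparse_tips}
children.update({tip: [] for tip in burst_tips + sparse_tips})
branch_length = {"root": 0.0, "A": 0.5, "B": 0.5}
branch_length.update({tip: 0.2 for tip in burst_tips + sparse_tips})

lbi = local_branching_index(children, branch_length, "root", tau=0.3)
print(lbi["A"], lbi["B"])
```

In this toy tree the rapidly branching node `A` receives a higher score than `B`, which is the sense in which bursts in the tree point to high-fitness genotypes.
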
### Validation on simulated data

RN, Russell, Shraiman, eLife, 2014
### Prediction of the dominating H3N2 influenza strain

- Since 2015: Reports with (conservative) predictions are available on nextflu.org

RN, Russell, Shraiman, eLife, 2014
### Sept 2015: "3c2.a will continue to dominate"

### Feb 2016: "...we predict the HA1:171K (now 3c2.a1) variant to dominate..."

### Sep 2016: "...we predict that clade 3c2.a1 variant to dominate, but..."

### Feb 2017: "...we predict clades 171K/121K (3c2.a1a) and 131K/142K (3c2.a2) to be successful..."

### Sep 2017: "...we think clades 3c2.a1a/135K, 3c2.a2, 3c2.a3 are competitive"

### A reassortant dominated A/H3N2 circulating this past season

### HI data sets

- Long list of distances between sera and viruses
- Tables are sparse: only closely related pairs are measured
- Structure of space is not immediately clear
- MDS in 2 or 3 dimensions
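
The distances in such a list are commonly expressed as log2 titer drops: the titer of a serum against its homologous virus divided by its titer against the test virus. The sketch below applies that convention to a small sparse table. The virus and serum names, the titers, and the function name are made up for illustration.

```python
import math


def distances_from_table(titers, homologous):
    """Convert raw HI titers into log2 distances relative to the homologous titer.

    titers: dict (virus, serum) -> titer; missing pairs were never measured.
    homologous: dict serum -> the virus that serum was raised against.
    Pairs whose serum lacks a homologous measurement are skipped.
    """
    distances = {}
    for (virus, serum), titer in titers.items():
        reference = titers.get((homologous[serum], serum))
        if reference is None:
            continue  # cannot normalize without the homologous titer
        distances[(virus, serum)] = math.log2(reference / titer)
    return distances


# Made-up sparse table: serum_C has no homologous measurement.
titers = {
    ("virus_A", "serum_A"): 1280,
    ("virus_B", "serum_A"): 160,
    ("virus_B", "serum_C"): 320,
}
homologous = {"serum_A": "virus_A", "serum_C": "virus_C"}

dist = distances_from_table(titers, homologous)
print(dist)
```

An eightfold titer drop, as for `virus_B` against `serum_A` here, corresponds to a distance of three log2 units.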

Smith et al, Science 2004

Slide by Trevor Bedford

### Integrating antigenic and molecular evolution

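- $H_{a\beta} = v_a + p_\beta + \sum_{i\in (a,b)} d_i$: titer drop of virus $a$ against serum $\beta$ (raised against virus $b$), with virus avidity $v_a$, serum potency $p_\beta$, and a sum over the branches $i$ on the tree path from $a$ to $b$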
- each branch contributes $d_i$ to antigenic distance
- sparse solution for $d_i$ through $l_1$ regularization
- related model where $d_i$ are associated with substitutions
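
Evaluating the tree model for one virus-serum pair can be sketched as follows, reading $v_a$ as a virus avidity and $p_\beta$ as a serum potency (an interpretation taken from the cited PNAS paper rather than from this slide). The tree layout, parameter values, and function names are hypothetical. Fitting the $d_i$ with $l_1$ regularization is not shown.

```python
def path_branches(parent, a, b):
    """Branches on the tree path between nodes a and b.

    parent: dict node -> parent node (the root maps to None).
    Each branch is identified by the node at its lower end.
    """
    def ancestors(node):
        chain = []
        while node is not None:
            chain.append(node)
            node = parent.get(node)
        return chain

    # Branches above the common ancestor appear in both chains and cancel.
    return set(ancestors(a)) ^ set(ancestors(b))


def predicted_titer(parent, d, v, p, virus, serum, serum_virus):
    """H = v_a + p_beta + sum of d_i along the path from the test virus to the
    virus the serum was raised against. Branches absent from d contribute 0."""
    branches = path_branches(parent, virus, serum_virus)
    return v[virus] + p[serum] + sum(d.get(branch, 0.0) for branch in branches)


# Hypothetical tree: virusA and virusB share ancestor n1; virusC sits below n2.
parent = {
    "root": None,
    "n1": "root",
    "n2": "root",
    "virusA": "n1",
    "virusB": "n1",
    "virusC": "n2",
}
d = {"n2": 2.0, "virusB": 0.5}  # sparse: most branches carry no antigenic change
v = {"virusA": 0.0, "virusB": 0.0, "virusC": 0.25}
p = {"serumA": 0.5}

h = predicted_titer(parent, d, v, p, "virusC", "serumA", serum_virus="virusA")
print(h)
```

Because only a few branches carry a nonzero $d_i$, the same handful of parameters explains the titers of every pair whose path crosses those branches.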

RN et al, PNAS, 2016
### Integrating antigenic and molecular evolution

- MDS: $(d+1)$ parameters per virus
- Tree model: $2$ parameters per virus
- Sparse solution

→ identify branches or substitutions that cause titer drop

RN et al, PNAS, 2016
### Rate of antigenic evolution

- Cumulative antigenic evolution since the root: $\sum_i d_i$
- A/H3N2 evolves faster antigenically
- A/H3N2 has a more rapid population turn-over
- The proportion of children is higher among influenza B infections than among A/H3N2 infections
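
One way to turn the cumulative distance $\sum_i d_i$ into a rate is to regress it against sampling date. The sketch below does this on a toy ladder-shaped tree. The tree, the dates, the $d_i$ values, and the function names are invented for illustration and are not estimates from the cited work.

```python
def cumulative_distance(parent, d, node):
    """Sum of d_i over all branches between the root and `node`."""
    total = 0.0
    while node is not None:
        total += d.get(node, 0.0)
        node = parent.get(node)
    return total


def antigenic_rate(parent, d, dates):
    """Least-squares slope of cumulative antigenic distance against date.

    dates: dict tip -> sampling date in decimal years.
    The result is in units of d per year.
    """
    tips = list(dates)
    xs = [dates[tip] for tip in tips]
    ys = [cumulative_distance(parent, d, tip) for tip in tips]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Toy ladder tree: each step along the trunk adds one unit of antigenic change.
parent = {"root": None, "n1": "root", "n2": "n1", "t0": "root", "t1": "n1", "t2": "n2"}
d = {"n1": 1.0, "n2": 1.0}
dates = {"t0": 2010.0, "t1": 2012.0, "t2": 2014.0}

rate = antigenic_rate(parent, d, dates)
print(rate)
```

Applying the same regression to trees of different lineages would give the kind of comparison the list above summarizes.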

### How many sites are involved?

| Mutation | Effect |
|---|---|
| K158N/N189K | 3.64 |
| K158R | 2.31 |
| K189N | 2.18 |
| S157L | 1.29 |
| V186G | 1.25 |
| S193F | 1.2 |
| K140I | 1.1 |
| F159Y | 1.08 |
| K144D | 1.08 |
| K145N | 0.91 |
| S159Y | 0.89 |
| I25V | 0.88 |
| Q1L | 0.85 |
| K145S | 0.85 |
| K144N | 0.85 |
| N145S | 0.85 |
| N8D | 0.73 |
| T212S | 0.69 |
| N188D | 0.65 |

### Exploring HI data relative to individual sera

### nextflu and nextstrain.org

- Trevor Bedford
- Colin Megill
- Pavel Sagulenko
- Sidney Bell
- James Hadfield
- Wei Ding