Human seasonal influenza viruses
Slide by Trevor Bedford
Influenza viruses evolve to avoid human immunity
Vaccines need frequent updates
Model of rapidly adapting virus populations
RN, Annual Reviews, 2013; Desai & Fisher; Brunet & Derrida; Kessler & Levine
Typical tree
Bolthausen-Sznitman Coalescent
RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007; Desai, Walczak, Fisher, Genetics, 2013
Traveling waves and the Bolthausen-Sznitman coalescent
Branching process approximation: $P(n_i, t|x_i)$
Does a sample (blue dots) have a common ancestor $\tau$ generations ago?
$\quad Q_b = \langle \sum_i \left(\frac{n_i}{\sum_j n_j}\right)^b\rangle \approx \frac{\tau-T_c}{T_c(b-1)} $
All other merger rates are also consistent with the Bolthausen-Sznitman coalescent: $\quad\lambda_{b,k} = \frac{(k-2)!(b-k)!}{T_c (b-1)!}$
RN, Hallatschek, PNAS, 2013; see also Brunet and Derrida, PRE, 2007
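The merger rates above can be checked with exact arithmetic. The sketch below evaluates $\lambda_{b,k}$ with time in units of $T_c$ by default. It confirms that the rate at which all $b$ lineages merge at once, $\lambda_{b,b} = 1/(T_c(b-1))$, equals the slope of $Q_b$ in $\tau$. It also computes the quantity averaged in $Q_b$ for one set of clone sizes $n_i$. The function names are illustrative, and this is an arithmetic check, not code from the cited papers.

```python
from fractions import Fraction
from math import comb, factorial


def bsc_rate(b, k, T_c=1):
    """lambda_{b,k}: rate at which one particular set of k out of b lineages merges."""
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1)) / T_c


def total_merger_rate(b, T_c=1):
    """Rate of any merger among b lineages: sum over k of C(b, k) * lambda_{b,k}."""
    return sum(comb(b, k) * bsc_rate(b, k, T_c) for k in range(2, b + 1))


def clone_size_moment(sizes, b):
    """sum_i (n_i / sum_j n_j)^b for one realisation of clone sizes n_i.

    This is the probability that b individuals drawn with replacement all belong
    to the same clone; averaging it over realisations gives Q_b.
    """
    total = sum(sizes)
    return sum(Fraction(n, total) ** b for n in sizes)


for b in range(2, 6):
    # columns: b, lambda_{b,b}, total merger rate
    print(b, bsc_rate(b, b), total_merger_rate(b))
# 2 1 1
# 3 1/2 2
# 4 1/3 3
# 5 1/4 4
```

The total merger rate comes out as $(b-1)/T_c$. For clone sizes $(3, 1)$ and $b = 2$, `clone_size_moment` gives $(3/4)^2 + (1/4)^2 = 5/8$.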
Bursts in a tree ↔ high fitness genotypes
Predicting evolution
Given the branching pattern:
can we predict fitness?
pick the closest relative of the future?
RN, Russell, Shraiman, eLife, 2014
Fitness inference from trees
$$P(\mathbf{x}|T) = \frac{1}{Z(T)} p_0(x_0) \prod_{i=0}^{n_{int}} g(x_{i_1}, t_{i_1}| x_i, t_i)g(x_{i_2}, t_{i_2}| x_i, t_i)$$
RN, Russell, Shraiman, eLife, 2014
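The probability above factorises over branches. Each internal node $i$ contributes one propagator factor $g$ for each of its two children $i_1, i_2$. The sketch below evaluates the unnormalised log-probability $\log[Z(T)\,P(\mathbf{x}|T)]$ for a given assignment of fitness values to nodes. Purely for illustration, it assumes a Gaussian diffusion kernel for $g$ and a standard normal prior for $p_0$. The propagator in the eLife model also accounts for how fitness affects branching, so this is not that model. The tree encoding (`children`) and the parameter `D` are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_propagator(x_child, t_child, x_parent, t_parent, D=1.0):
    # Hypothetical stand-in for g: fitness diffuses with diffusion constant D,
    # so after time dt it is normal with variance 2 * D * dt.
    dt = t_child - t_parent
    return log_gaussian(x_child, x_parent, 2 * D * dt)


def unnormalized_log_prob(x, t, children, root=0, D=1.0):
    """log p0(x_root) + sum over internal nodes of log g(child | parent) for both children.

    x, t: dicts node -> fitness and node -> time.
    children: dict internal node -> (child_1, child_2).
    The -log Z(T) term is omitted; it depends on the tree but not on x.
    """
    logp = log_gaussian(x[root], 0.0, 1.0)  # assumed prior p0 = N(0, 1)
    for parent, kids in children.items():
        for child in kids:
            logp += log_propagator(x[child], t[child], x[parent], t[parent], D)
    return logp


children = {0: (1, 2)}
t = {0: 0.0, 1: 1.0, 2: 1.0}
print(unnormalized_log_prob({0: 0.0, 1: 0.0, 2: 0.0}, t, children))
print(unnormalized_log_prob({0: 0.0, 1: 1.0, 2: 0.0}, t, children))  # lower: child moved away
```

Because $Z(T)$ does not depend on $\mathbf{x}$, differences of this score are enough to compare fitness assignments on a fixed tree.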
Prediction of the dominating H3N2 influenza strain
no influenza-specific input
how can the model be improved? (see model by Luksza & Laessig)
in what other contexts might this apply?
RN, Russell, Shraiman, eLife, 2014
Prediction of the dominating H3N2 influenza strain
Hemagglutination inhibition (HI) assays
Slide by Trevor Bedford
HI data sets
Long list of distances between sera and viruses
Tables are sparse: only pairs of closely related viruses and sera are measured
Structure of space is not immediately clear
Multidimensional scaling (MDS) in 2 or 3 dimensions
Smith et al, Science 2002
Slide by Trevor Bedford
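Fitting a low-dimensional map to a sparse HI table can be sketched as stress minimisation over the measured pairs only. The sketch makes several assumptions. Titers are converted to distances as $\log_2$ of the serum's highest titer minus $\log_2$ of the titer, a common antigenic-cartography convention. Viruses and sera are both placed as points. Plain gradient descent is used in place of the optimiser of Smith et al., and threshold titers and other details of that method are ignored. All function names are hypothetical.

```python
import math
import random


def titers_to_distances(titers):
    """titers: {(virus, serum): HI titer}; unmeasured pairs are simply absent.

    Assumed convention: distance = log2(highest titer of that serum) - log2(titer).
    """
    best = {}
    for (_, serum), titer in titers.items():
        best[serum] = max(best.get(serum, -math.inf), math.log2(titer))
    return {(virus, serum): best[serum] - math.log2(titer)
            for (virus, serum), titer in titers.items()}


def stress(coords, dist):
    """Sum of squared differences between map distances and HI distances."""
    return sum((math.dist(coords[("virus", v)], coords[("serum", s)]) - d) ** 2
               for (v, s), d in dist.items())


def sparse_mds(dist, dim=2, steps=2000, lr=0.01, seed=0):
    """Place viruses and sera as points by gradient descent on the stress,
    using only the measured (virus, serum) pairs."""
    rng = random.Random(seed)
    points = sorted({("virus", v) for v, _ in dist} | {("serum", s) for _, s in dist})
    coords = {p: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for p in points}
    for _ in range(steps):
        grad = {p: [0.0] * dim for p in points}
        for (v, s), d in dist.items():
            pv, ps = ("virus", v), ("serum", s)
            r = math.dist(coords[pv], coords[ps]) or 1e-12  # avoid division by zero
            scale = 2.0 * (r - d) / r  # d/dr of (r - d)^2, times dr/dx = diff / r
            for k in range(dim):
                diff = coords[pv][k] - coords[ps][k]
                grad[pv][k] += scale * diff
                grad[ps][k] -= scale * diff
        for p in points:
            for k in range(dim):
                coords[p][k] -= lr * grad[p][k]
    return coords


# Made-up sparse table: each serum is titrated only against nearby viruses
titers = {("A/1", "A/1"): 1280, ("A/2", "A/1"): 640, ("A/3", "A/1"): 80,
          ("A/2", "A/2"): 1280, ("A/3", "A/2"): 320, ("A/3", "A/3"): 640}
dist = titers_to_distances(titers)
print(stress(sparse_mds(dist, steps=0), dist), stress(sparse_mds(dist), dist))
```

With `steps=0` the function returns the random starting layout, so the two printed numbers compare the stress before and after fitting.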
Integrating antigenic and molecular evolution
$H_{a\beta} = v_a + p_\beta + \sum_{i\in (a,b)} d_i$
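each branch $i$ on the tree path between virus $a$ and virus $b$, against which serum $\beta$ was raised, contributes $d_i$ to the antigenic distance
sparse solution for $d_i$ through $\ell_1$ regularization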
RN et al, PNAS, 2016
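The sparse tree fit can be sketched as a non-negative lasso. This is a simplification under stated assumptions. Avidities $v_a$ and potencies $p_\beta$ are taken as already subtracted from the titers. Branch effects are constrained to $d_i \ge 0$. The objective $\frac{1}{2}\sum(\text{residual})^2 + \lambda \sum_i d_i$ is minimised by cyclic coordinate descent. The tree encoding (a `parent` dict, each branch named after its child node) and all function names are hypothetical, and the fitting procedure in the PNAS paper may differ.

```python
def branches_to_root(node, parent):
    """Branches from node up to the root; each branch is named after its child node."""
    branches = set()
    while node in parent:
        branches.add(node)
        node = parent[node]
    return branches


def tree_path(a, b, parent):
    """Branches between a and b: those above their common ancestor cancel out."""
    return branches_to_root(a, parent) ^ branches_to_root(b, parent)


def fit_branch_effects(observations, parent, lam, n_iter=10000, tol=1e-12):
    """Non-negative l1-penalised least squares by cyclic coordinate descent.

    observations: list of (virus, reference_virus, H), where H is the log2 titer drop
    with avidity and potency assumed already removed.
    """
    paths = [tree_path(a, b, parent) for a, b, _ in observations]
    d = {branch: 0.0 for branch in parent}
    for _ in range(n_iter):
        max_change = 0.0
        for branch in d:
            rows = [k for k, path in enumerate(paths) if branch in path]
            if not rows:
                continue  # branch not constrained by any measurement
            # Sum of residuals with this branch's own contribution left out
            partial = sum(observations[k][2] - sum(d[j] for j in paths[k] if j != branch)
                          for k in rows)
            # Exact minimiser along this coordinate, clipped at zero
            new_value = max(0.0, (partial - lam) / len(rows))
            max_change = max(max_change, abs(new_value - d[branch]))
            d[branch] = new_value
        if max_change < tol:
            break
    return d


# Toy tree: clades {A, B} and {C, D} joined through branch Y; only Y changes antigenicity
parent = {"A": "X", "B": "X", "Y": "X", "C": "Y", "D": "Y"}
clade = {"A": 0, "B": 0, "C": 1, "D": 1}
obs = [(a, b, 2.0 if clade[a] != clade[b] else 0.0) for a in clade for b in clade if a != b]
print({k: round(v, 3) for k, v in fit_branch_effects(obs, parent, lam=0.4).items()})
# expected: Y close to 1.95 (= 2 - lam/8), every other branch 0.0
```

The $\ell_1$ penalty shrinks the one informative branch by $\lambda/8$, because it appears in 8 measurements. It sets every other branch exactly to zero. That exact zeroing is what lets a sparse fit point at the few branches that cause titer drops.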
Integrating antigenic and molecular evolution
MDS: $(d+1)$ parameters per virus
Tree model: $2$ parameters per virus
Sparse solution → identify branches or substitutions that cause titer drop
but...
HI titers are a crude proxy for selection in humans
need human serology + infection history
better coverage where most influenza infections happen
RN et al, PNAS, 2016
Lots of reassortment
next to no idea what mutations in segments other than HA do...
What about bacteria?
vertical and horizontal transmission
genome rearrangements
much larger genomes
NGS genome assemblies tend to be fragmented
very little understanding of relevant phenotypes
what are the relevant spatial and temporal scales?
pan-genome identification pipeline
phylogenetic analysis of each orthologous cluster
detect associations with phenotypes
fast: analyze hundreds of genomes in a few hours
github.com/neherlab/pan-genome-analysis
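The first step, grouping genes into orthologous clusters, can be caricatured as single-linkage clustering of pairwise homology hits. The sketch below uses a union-find structure and then builds a gene presence/absence matrix, the kind of table one could test against phenotypes. It assumes the hits are already computed, for example by an all-against-all similarity search, and treats each gene as a `(genome, gene_id)` pair. All names are hypothetical. This does not reproduce what github.com/neherlab/pan-genome-analysis actually does; single linkage, for instance, merges paralogs that a real pipeline would need to separate.

```python
from collections import defaultdict


def cluster_orthologs(genes, hits):
    """Single-linkage clustering of genes via union-find.

    genes: list of (genome, gene_id); hits: pairs of genes judged homologous.
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(list)
    for g in genes:
        clusters[find(g)].append(g)
    return sorted(sorted(c) for c in clusters.values())


def presence_absence(clusters, genomes):
    """One row per cluster, one column per genome: 1 if the genome has a member."""
    return [[int(any(g == genome for g, _ in cluster)) for genome in genomes]
            for cluster in clusters]


genes = [("g1", "a"), ("g1", "b"), ("g2", "x"), ("g2", "y"), ("g3", "p")]
hits = [(("g1", "a"), ("g2", "x")), (("g2", "x"), ("g3", "p")), (("g1", "b"), ("g2", "y"))]
clusters = cluster_orthologs(genes, hits)
print(presence_absence(clusters, ["g1", "g2", "g3"]))  # [[1, 1, 1], [1, 1, 0]]
```

The first cluster is a core gene present in all three genomes. The second is accessory, missing from `g3`.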
Bacteria: Species trees and gene trees
Structural variation
closely related strains differ in dozens to hundreds of genes
frequent rearrangements/loss of synteny
often multiple plasmids, ranging from a few thousand to several hundred thousand bases
recombination necessary for genome maintenance and segregation
most evolutionary analyses look for SNPs in a "core genome"
but the action is somewhere else...
Genome assembly with short reads
10s of millions of short reads (<500bp)
Too short to bridge repetitive elements
→ assemblies are fragmented into 100s of "contigs"
(really terrible example)
Images: illumina.com, github.com/rrwick
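A toy de Bruijn graph shows why short reads fragment assemblies. De Bruijn graphs are one common assembly approach; the slides do not say which assembler is used. The sketch assumes error-free reads, complete coverage and a single strand. It builds the graph from all $k$-mers of a sequence and reads off maximal non-branching paths (unitigs). Once a repeat is at least as long as the $(k-1)$-mers, its copies collapse into shared nodes, the graph branches and the sequence splits into several contigs. When the $(k-1)$-mers are longer than the repeat, one contig is recovered. The test genome is made up, and the function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All distinct k-mers of seq (idealised: perfect, complete coverage)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unitigs(kmer_set):
    """Maximal non-branching paths of the de Bruijn graph.

    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix.
    Isolated cycles with no branching node are not reported in this sketch.
    """
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    for km in kmer_set:
        out_edges[km[:-1]].append(km[1:])
        in_deg[km[1:]] += 1
    nodes = set(out_edges) | set(in_deg)

    def one_in_one_out(node):
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior of a path; reached from its start instead
        for nxt in out_edges[node]:
            contig = node + nxt[-1]
            while one_in_one_out(nxt):  # extend until the graph branches or ends
                nxt = out_edges[nxt][0]
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)


genome = "AATTC" + "GCGCAT" + "TTAGG" + "GCGCAT" + "CCTAA"  # repeat GCGCAT occurs twice
print(len(unitigs(kmers(genome, 4))))  # several contigs: 3-mers cannot resolve the repeat
print(unitigs(kmers(genome, 8)) == [genome])  # True: 7-mers span repeat plus a flank
```

Real read sets add sequencing errors, uneven coverage and reverse complements, so real assemblers do far more than this. The basic limitation is the same, however: repeats longer than what a read can span leave the graph ambiguous.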
Ancestral recombination graphs
find orthologous clusters → use as seeds for local alignments
stitch together into a larger graph
identify synteny blocks, recombination hotspots, cross-species transfers...
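Once genes are labelled by orthologous cluster, a crude way to find synteny blocks is to split one genome's gene order wherever two neighbouring genes are not also neighbours in the other genome. This sketch assumes each cluster occurs at most once per genome, ignores circularity, and accepts adjacency in either orientation, so an inverted segment stays one block. It is an illustration only, not the graph-based method on the slide, and the function name is hypothetical.

```python
def synteny_blocks(order_a, order_b):
    """Split order_a into maximal runs whose consecutive genes are also adjacent
    (in either direction) in order_b. Genes are ortholog cluster IDs."""
    if not order_a:
        return []
    adjacent_b = set()
    for x, y in zip(order_b, order_b[1:]):
        adjacent_b.add((x, y))
        adjacent_b.add((y, x))  # allow inverted segments
    blocks = [[order_a[0]]]
    for x, y in zip(order_a, order_a[1:]):
        if (x, y) in adjacent_b:
            blocks[-1].append(y)  # adjacency conserved: extend current block
        else:
            blocks.append([y])  # breakpoint: rearrangement, insertion or gene loss
    return blocks


print(synteny_blocks([1, 2, 3, 4, 5, 6], [1, 2, 3, 6, 5, 4]))  # [[1, 2, 3], [4, 5, 6]]
```

Genes missing from the second genome, such as plasmid or accessory genes, automatically end up as breakpoints.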
Another global experiment...
Sequencing of Carbapenem resistant bacteria
Global spread over the last 30 years
multiple carbapenemases
chromosomal and plasmid mediated
resistance stacking: plasmids or integrons loaded with lots of resistance genes
Questions:
What are the right questions?
Gene or genome?
What are the assembly rules?
Species boundaries? Or geography/treatment regimens?
How does this translate to multispecies environments?