From 1'000s to 1'000'000s of samples

From a few labs to hundreds of contributing labs

Needed tools to make sense of an avalanche of data

Fast: immediate results/summaries
Robust: data quality varies and sampling is biased. Complex models fail in unpredictable ways when their assumptions are violated.
Simple: anything too complex can't be interpreted reliably.

Nextstrain's focus: enable teams to make sense of their data

Workflows to analyze custom data + background

All open sequences annotated and aligned, updated daily:

Prevalent QC problems changed over time:
- Initially: assembly/mapping problems with clustered SNPs
- Later: reference calls in uncovered regions
- Later: contamination/cross-talk
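As an illustration of the first QC problem, clustered SNPs can be flagged with a sliding window over the mutation positions of a sequence. This is a minimal sketch, not the implementation of any particular tool: the function name `find_snp_clusters` and the defaults `window=100` and `max_snps=5` are assumptions chosen for the example.

```python
def find_snp_clusters(positions, window=100, max_snps=5):
    """Return (start, end) stretches where more than `max_snps` mutations
    fall within fewer than `window` bases of each other.

    `positions` are mutation coordinates relative to the reference.
    The thresholds are illustrative assumptions, not calibrated values.
    """
    positions = sorted(set(positions))
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until the window spans fewer than `window` bases.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 > max_snps:
            start = positions[left]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previously flagged stretch: extend it.
                clusters[-1] = (clusters[-1][0], pos)
            else:
                clusters.append((start, pos))
    return clusters


# Hypothetical input: seven mutations packed into 50 bases, plus scattered ones.
example_positions = [10, 500, 1000, 1005, 1010, 1020, 1030, 1040, 1050, 5000]
example_clusters = find_snp_clusters(example_positions)
```

A check like this is fast and easy to interpret: the output is a list of genome stretches that a person can inspect directly.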
Feature additions:
- Differentiating between biological (known) frame shifts/stop codons and artefacts
- Detecting mixed lineages (recombinants, chimeras, contaminants)
- Added Pango lineage calling
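The first feature above can be sketched as a lookup against a curated list: an indel whose length is not a multiple of three shifts the reading frame, and it is only treated as a likely artefact if it is not on the list of known biological frame shifts. The `Indel` type, the `classify_indels` function, and the gene names and coordinates below are all hypothetical placeholders, not real annotations.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    gene: str
    codon: int   # 1-based codon where the indel starts
    length: int  # net change in nucleotides (negative = deletion)


def classify_indels(indels, known_frameshifts):
    """Sort indels into in-frame, known frame shifts, and unexpected frame shifts."""
    result = {"in_frame": [], "known_frameshift": [], "unexpected_frameshift": []}
    for indel in indels:
        if indel.length % 3 == 0:
            # Length is a multiple of three: the reading frame is preserved.
            result["in_frame"].append(indel)
        elif indel in known_frameshifts:
            # Frame shift already observed in circulating viruses (curated list).
            result["known_frameshift"].append(indel)
        else:
            # Frame shift not on the list: more likely a sequencing artefact.
            result["unexpected_frameshift"].append(indel)
    return result


# Made-up example data: "geneA" and "geneB" are placeholders.
known = {Indel("geneA", 27, -1)}
observed = [Indel("geneA", 27, -1), Indel("geneB", 5, -3), Indel("geneB", 40, 2)]
classified = classify_indels(observed, known)
```

Keeping the known cases in a plain list means the rule stays simple and a curator can update it without changing the code.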
Rewrites:
- Summer 2020: "web-first" JavaScript application (plus Node.js CLI)
- Spring 2021: rewrite in C++ with WebAssembly
- June 2022: release of the Rust version

Simple tools often more helpful than sophisticated ones
Client-side computing in the browser has a lot of potential
Upsides: no expensive backends, no data privacy issues
Lineage annotations are immensely useful!
→ use sequence information without phylogenetics (done by humans).

Ivan Aksamentov
Cornelius Roemer