neherlab@biozentrum

Running nextstrain on GISAID downloads

This tutorial will teach you how to run nextstrain/nextflu on a download from GISAID and visualize the result on your computer.

First, select a sensible number of sequences, all of the same type/lineage. This is critical for a successful analysis, and only seasonal influenza virus lineages are supported at the moment.

[Screenshot: GISAID browse screen]

Once you have assembled the collection of isolates that you want to analyze, you need to download the sequences. When downloading, it is crucial that you download one segment at a time, that the fields in the FASTA header have the indicated order, and that you choose YYYY-MM-DD as the date format. Your download screen should look somewhat like this.

[Screenshot: GISAID download screen]

If you mess up the FASTA header, nextflu won't know which field is the date, which is the accession number, and which is the name of the isolate.
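Before running anything, it can help to sanity-check the headers of your download. The following is a minimal sketch, not part of nextstrain: it assumes that header fields are separated by "|" and only checks that every header contains one field that is a valid YYYY-MM-DD date. The function names and the example headers are hypothetical, and the sketch does not verify the field order.

```python
import re
from datetime import datetime

# Shape of a complete date as requested in the download dialog.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def has_full_date(header, sep="|"):
    """Return True if any field of the header is a valid YYYY-MM-DD date.

    The separator "|" is an assumption; adjust it to match your download.
    """
    for field in header.lstrip(">").split(sep):
        field = field.strip()
        if DATE_RE.match(field):
            try:
                datetime.strptime(field, "%Y-%m-%d")
                return True
            except ValueError:
                # Right shape, but not a real calendar date (e.g. 2019-02-30).
                pass
    return False


def headers_without_date(fasta_lines, sep="|"):
    """Collect all FASTA header lines that lack a complete date field."""
    return [
        line.strip()
        for line in fasta_lines
        if line.startswith(">") and not has_full_date(line.strip(), sep)
    ]
```

To use it, pass an open file, for example headers_without_date(open("data/h3n2_ha.fasta")), and inspect any headers it returns before starting the analysis.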

The next important step is to put the file in the correct place and name it so that nextstrain knows what lineage and segment to expect. Your sequence download has to be placed in the directory data and named as follows:

data/h3n2_ha.fasta

or more generally

data/LINEAGE_SEGMENT.fasta

where the allowed values for LINEAGE are h3n2, h1n1pdm, vic, and yam, and the allowed values for SEGMENT are ha, na, pb1, pb2, pa, ns, np, and ma. Note that these are case sensitive!
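The naming rule above can be checked with a few lines of Python. This is a small sketch under the assumptions stated in the text (directory data, pattern LINEAGE_SEGMENT.fasta, lowercase values). The function name parse_input_name is made up for illustration and is not part of nextstrain.

```python
from pathlib import Path

# Allowed values as listed in this tutorial; both are case sensitive.
LINEAGES = {"h3n2", "h1n1pdm", "vic", "yam"}
SEGMENTS = {"ha", "na", "pb1", "pb2", "pa", "ns", "np", "ma"}


def parse_input_name(path):
    """Return (lineage, segment) for a path like data/h3n2_ha.fasta.

    Raises ValueError if the file is not in a directory called data,
    does not end in .fasta, or uses an unknown lineage or segment.
    """
    p = Path(path)
    if p.parent.name != "data":
        raise ValueError(f"{path}: file must be placed in the directory data")
    if p.suffix != ".fasta":
        raise ValueError(f"{path}: file name must end in .fasta")
    lineage, _, segment = p.stem.partition("_")
    if lineage not in LINEAGES:
        raise ValueError(f"{path}: unknown lineage {lineage!r}")
    if segment not in SEGMENTS:
        raise ValueError(f"{path}: unknown segment {segment!r}")
    return lineage, segment
```

For example, parse_input_name("data/h3n2_ha.fasta") returns ("h3n2", "ha"), while "data/H3N2_HA.fasta" is rejected because of the upper-case letters.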

Once you have your data in the right place, you are ready to run nextstrain. First, activate the nextstrain environment using

conda activate nextstrain

(unless it is already activated). Then run the analysis by typing

snakemake auspice/h3n2_ha_tree.json

This tells the workflow manager snakemake to produce the file auspice/h3n2_ha_tree.json, which contains the analysis results that can be visualized in your web browser. You don't need to worry about the details for now, but if you want to change the analysis or the file naming scheme, you will need to edit the pipeline defined in the Snakefile.
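If you analyze several lineages or segments, you can derive the snakemake target from the name of the input file. The sketch below assumes that the pattern auspice/LINEAGE_SEGMENT_tree.json seen in the h3n2_ha example holds for the other inputs as well. That is an extrapolation from this one example, and the Snakefile remains the authoritative definition. The function names are hypothetical.

```python
import shlex
from pathlib import Path


def target_for(fasta_path):
    """Map data/LINEAGE_SEGMENT.fasta to auspice/LINEAGE_SEGMENT_tree.json.

    Assumes the naming scheme of the h3n2_ha example in this tutorial.
    """
    stem = Path(fasta_path).stem
    return f"auspice/{stem}_tree.json"


def snakemake_command(fasta_path):
    """Build the argument list for the snakemake call of one input file."""
    return ["snakemake", target_for(fasta_path)]


# Print the command for the example input used in this tutorial.
print(shlex.join(snakemake_command("data/h3n2_ha.fasta")))
```

The returned list can be passed to subprocess.run(..., check=True) from within the activated nextstrain environment if you prefer to launch the runs from a script.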

To view your analysis results, type

auspice view

and open http://localhost:4000 in your browser.

You can find more information on how to customize your workflow in this Tutorial or on the documentation pages of nextstrain.

Annotated trees

It is often useful to annotate the branches of a tree with the mutations that likely occurred on each branch. This is straightforward to do using treetime and the nextstrain output:

treetime --aln results/aligned_h1n1pdm_ha.fasta --tree results/tree-raw_h1n1pdm_ha.nwk --outdir myMutationTree

This command will create a directory myMutationTree containing a tree file in NEXUS format with mutations assigned to branches. The tree can be viewed in common tree viewers such as FigTree.
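If you want to process the annotations in a script rather than in a tree viewer, a rough sketch like the following may be a starting point. It assumes that the mutations are stored as NEXUS comments of the form [&mutations="A10G,C25T"]. This format is an assumption, so please check it against your own output file first. The function name and the example tree string are made up.

```python
import re

# Assumed annotation format: mutations="A10G,C25T" inside a NEXUS comment.
MUT_RE = re.compile(r'mutations="([^"]*)"')


def mutations_per_branch(nexus_text):
    """Return one list of mutation strings per annotated branch.

    Branches are reported in the order in which they appear in the text.
    An empty annotation yields an empty list.
    """
    return [m.split(",") if m else [] for m in MUT_RE.findall(nexus_text)]


# Hypothetical example tree, not real output.
example = (
    'tree t = ((A:0.1[&mutations="A10G"],B:0.2[&mutations=""]):0.0,'
    'C:0.3[&mutations="C5T,G7A"]);'
)
branch_mutations = mutations_per_branch(example)
```

For real analyses, a proper tree parser is the more robust choice because it keeps the link between each branch and its node name.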


Published

Aug 26, 2019

Category

teaching

Tags

  • bioinformatics
  • phylogenetics