This workshop provides an overview of SARS-CoV-2 sequence analysis with Nextstrain. We will try to cover the following topics.
- Preliminary analysis of new data with Nextclade
- Producing your own build using a background set, a download from GISAID, and your data
- Discussing the possibility of establishing a weekly updated analysis for each country in Africa.
For the first two points, I hope you will be able to follow along.
The previous webinar already used Nextstrain, so I assume most of you have augur and the ncov pipeline installed.
- Example data for the Nextclade analysis (180 recent global sequences):
  - sequences.fasta.xz
  - metadata.tsv
  - If you are unsure how to decompress the sequence file, use this uncompressed link: sequences.fasta
  - If this is all going a bit too fast, click on the following link and Nextclade will open and load the data file directly from the web:
    clades.nextstrain.org?input-fasta=https://neherlab.org/teaching_notes/2021-07-27_AfricaCDC_sequences.fasta
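If you do not have a command-line tool for `.xz` files, the decompression step can be sketched with Python's standard library. This is a minimal sketch, assuming `sequences.fasta.xz` has been downloaded into the current directory; the function names `decompress_xz` and `count_fasta_records` are made up for this example.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dst):
    """Stream-decompress an .xz file to dst without loading it all into memory."""
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def count_fasta_records(path):
    """Count sequences by counting FASTA header lines, which start with '>'."""
    with open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumption: the example file sits in the current working directory.
    if Path("sequences.fasta.xz").exists():
        decompress_xz("sequences.fasta.xz", "sequences.fasta")
        print(count_fasta_records("sequences.fasta"), "sequences")
```

Counting the header lines afterwards is a quick sanity check that the download was complete before you drag the file into Nextclade.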
- Background data for the ncov build:
  - GISAID allows you to download data in a format that augur can read directly. One challenge is to find search parameters that yield a data set of suitable size (see here).
    I used the following (this will generate a few thousand sequences, but we will subsample them further):
    - collection date: 2021-06-01 --
    - region: Africa
  - Add your own data with appropriately formatted metadata.
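Before adding your own data, it can help to check that the metadata table and the FASTA file agree. The sketch below is illustrative only. It assumes a tab-separated metadata file with at least the columns `strain`, `date`, `region`, and `country` (the latter three are the fields the subsampling settings in this workshop rely on), and it assumes that each FASTA header line, without the leading `>`, is the strain name. The helper names are hypothetical and not part of augur or the ncov pipeline.

```python
import csv

# Assumed minimal column set; the pipeline may expect additional columns.
REQUIRED_COLUMNS = ("strain", "date", "region", "country")


def missing_columns(metadata_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header."""
    with open(metadata_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]


def metadata_strains(metadata_path):
    """Collect the values of the 'strain' column."""
    with open(metadata_path, newline="") as handle:
        return {row["strain"] for row in csv.DictReader(handle, delimiter="\t")}


def fasta_names(fasta_path):
    """Collect sequence names, taking the whole header line after '>'."""
    with open(fasta_path) as handle:
        return {line[1:].strip() for line in handle if line.startswith(">")}


def sequences_without_metadata(metadata_path, fasta_path):
    """Names present in the FASTA file but missing from the metadata table."""
    return sorted(fasta_names(fasta_path) - metadata_strains(metadata_path))
```

For example, `missing_columns("ACDC_data/metadata.tsv")` should return an empty list, and `sequences_without_metadata("ACDC_data/metadata.tsv", "ACDC_data/sequences.fasta")` lists any sequences that have no metadata row.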
Changes required to my_profile/getting_started/builds.yaml
inputs:
  - name: global_context
    metadata: s3://nextstrain-data/files/ncov/open/global/metadata.tsv.xz
    sequences: s3://nextstrain-data/files/ncov/open/global/sequences.fasta.xz
  - name: africa_set
    metadata: ACDC_data/gisaid_auspice_input_hcov-19_2021_07_26_15.tar
    sequences: ACDC_data/gisaid_auspice_input_hcov-19_2021_07_26_15.tar
  - name: additional_data
    metadata: ACDC_data/metadata.tsv
    sequences: ACDC_data/sequences.fasta
builds:
  my-build:
    subsampling_scheme: my-subsampling
    region: global
subsampling:
  my-subsampling:
    world:
      group_by: "country year month"
      seq_per_group: 10
      exclude: "--exclude-where region=='Africa'"
    africa:
      group_by: "country year month"
      seq_per_group: 100
      exclude: "--exclude-where region!='Africa'"
      min_date: "--min-date 2021-06-01"
refine:
  root: "Wuhan-Hu-1/2019"
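To make the intent of the two subsampling tiers concrete, here is a rough Python sketch of the idea: split the records into a "world" tier (everything outside Africa) and an "africa" tier (African sequences collected on or after 2021-06-01), group each tier by country, year, and month, and keep at most a fixed number of sequences per group. This is not how augur implements subsampling. The function names are hypothetical, the random choice within a group is a simplification, and dates are assumed to be complete `YYYY-MM-DD` strings.

```python
import random
from collections import defaultdict


def in_world_tier(rec):
    """World tier: every record that is not from Africa."""
    return rec["region"] != "Africa"


def in_africa_tier(rec, min_date="2021-06-01"):
    """Africa tier: African records on or after min_date.

    Complete ISO dates compare correctly as plain strings.
    """
    return rec["region"] == "Africa" and rec["date"] >= min_date


def subsample(records, seq_per_group, seed=0):
    """Keep at most seq_per_group records per (country, year, month) group."""
    groups = defaultdict(list)
    for rec in records:
        year, month = rec["date"].split("-")[:2]
        groups[(rec["country"], year, month)].append(rec)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    picked = []
    for key in sorted(groups):
        members = groups[key]
        picked.extend(rng.sample(members, min(seq_per_group, len(members))))
    return picked


def build_sample(records):
    """Combine both tiers with the per-group limits used in this workshop."""
    world = subsample([r for r in records if in_world_tier(r)], 10)
    africa = subsample([r for r in records if in_africa_tier(r)], 100)
    return world + africa
```

The much larger per-group limit for the africa tier is what keeps the build focused on African sequences, while the world tier only provides sparse global context.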
If fetching from s3 doesn't work, you can download these files and point the inputs directly to your local copies.
You can find the links to these files above.
In addition, if you have more than one core available on your computer, you should change my_profile/getting_started/config.yaml to reflect that.
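If you are unsure how many cores your machine has, a quick check with Python's standard library is sketched below. The helper name `available_cores` is made up for this example. Which entry to edit in config.yaml is an assumption here: Snakemake profiles mirror its command-line options, so the relevant key is most likely `cores`.

```python
import os


def available_cores():
    """Number of logical cores reported by the OS, falling back to 1."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical cores detected:", available_cores())
```

You may want to set the value somewhat below the reported number so the computer stays responsive while the build runs.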
Important links
Nextstrain
- nextstrain.org
- Nextclade
- nextstrain documentation
- Exploring interactive phylogenies with Auspice
- Analysis with augur and Snakemake -- viruses
- Time-scaled phylogenies with TreeTime
- Explore SARS-CoV-2 evolution in Auspice by Cassia Wagner
- Nextstrain webinar by US CDC