Nextstrain webinar at Africa CDC

This workshop provides an overview of SARS-CoV-2 sequence analysis with Nextstrain. We will try to cover the following topics:

- Preliminary analysis of new data with Nextclade
- Producing your own build using a background set, a download from GISAID, and your own data
- Discussing the possibility of establishing weekly updated analyses for each country in Africa

For the first two points, I hope you'll be able to follow along. The previous webinar already used Nextstrain, so I assume most of you have augur and the ncov pipeline installed.

Example data for the Nextclade analysis (180 recent global sequences):
- sequences.fasta.xz
- metadata.tsv
- If you are unsure how to decompress the sequence file, use the uncompressed version instead: sequences.fasta
- If this is all going a bit too fast, click on the following link and Nextclade will open and load the data file directly from the web:
  clades.nextstrain.org?input-fasta=https://neherlab.org/teaching_notes/2021-07-27_AfricaCDC_sequences.fasta
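
If you would rather decompress the `.xz` file yourself, the following is a minimal sketch using only the Python standard library. The function names `decompress_xz` and `count_fasta_records` are illustrative and not part of Nextstrain or Nextclade; the record count is just a quick sanity check against the expected 180 sequences.

```python
import lzma
import shutil
from pathlib import Path


def decompress_xz(src, dest):
    """Decompress the .xz file at src into dest and return the bytes written."""
    with lzma.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return Path(dest).stat().st_size


def count_fasta_records(path):
    """Count sequences by counting FASTA header lines (lines starting with '>')."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Calling `decompress_xz("sequences.fasta.xz", "sequences.fasta")` followed by `count_fasta_records("sequences.fasta")` should then report the number of sequences in the example file.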
Background data for the ncov build:
- sequences.fasta.xz
- metadata.tsv
GISAID lets you download data in a format that augur can read directly. One challenge is to find search parameters that yield a data set of suitable size (see here). I used the following (this yields a few thousand sequences, which we will subsample further):
- collection date: from 2021-06-01 onward
- region: Africa

Finally, add your own data with appropriately formatted metadata.
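
Before adding your own files to the build, it can help to check that the metadata and sequences agree. The sketch below is one way to do this in plain Python. The column list in `ASSUMED_COLUMNS` is an assumption made for illustration (the ncov workflow documentation lists the columns it actually expects), and `check_inputs` is a hypothetical helper, not an augur function.

```python
import csv

# Assumed minimal column set, for illustration only; consult the ncov
# documentation for the columns the workflow actually requires.
ASSUMED_COLUMNS = ("strain", "date", "region", "country")


def fasta_names(path):
    """Return the sequence names found on FASTA header lines."""
    names = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(">"):
                names.append(line[1:].strip())
    return names


def check_inputs(metadata_path, fasta_path, required=ASSUMED_COLUMNS):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        rows = list(reader)
    missing = [c for c in required if c not in columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    # Every sequence name should have a matching 'strain' entry in the metadata.
    strains = {row.get("strain", "") for row in rows}
    for name in fasta_names(fasta_path):
        if name not in strains:
            problems.append("no metadata row for sequence: " + name)
    return problems
```

For example, `check_inputs("ACDC_data/metadata.tsv", "ACDC_data/sequences.fasta")` returns a list of messages that you can print and fix before starting the build.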

Changes required to `my_profile/getting_started/builds.yaml`

inputs:
  - name: global_context
    metadata: s3://nextstrain-data/files/ncov/open/global/metadata.tsv.xz
    sequences: s3://nextstrain-data/files/ncov/open/global/sequences.fasta.xz
  - name: africa_set
    metadata: ACDC_data/gisaid_auspice_input_hcov-19_2021_07_26_15.tar
    sequences: ACDC_data/gisaid_auspice_input_hcov-19_2021_07_26_15.tar
  - name: additional_data
    metadata: ACDC_data/metadata.tsv
    sequences: ACDC_data/sequences.fasta

builds:
  my-build:
    subsampling_scheme: my-subsampling
    region: global

subsampling:
  my-subsampling:
    world:
      group_by: "country year month"
      seq_per_group: 10
      exclude: "--exclude-where 'region=Africa'"
    africa:
      group_by: "country year month"
      seq_per_group: 100
      exclude: "--exclude-where region!='Africa'"
      min_date: "--min-date 2021-06-01"

refine:
  root: "Wuhan-Hu-1/2019"

If fetching from S3 doesn't work, you can download these files and point the `metadata` and `sequences` entries to your local copies instead. The links to these files are listed above.

In addition, if you have more than one core available on your computer, you should raise the `cores` value in `my_profile/getting_started/config.yaml` to reflect that.
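
To make the effect of `group_by`, `seq_per_group`, and `min_date` in the subsampling scheme more concrete, here is a simplified sketch in plain Python. It is not augur's implementation (augur handles priorities, incomplete dates, and more); the `subsample` function and the record layout are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def subsample(records, seq_per_group, min_date=None, seed=0):
    """Keep at most seq_per_group records per (country, year, month) group.

    records: list of dicts with 'strain', 'country', and a complete
             ISO 'date' (YYYY-MM-DD).
    min_date: optional ISO date string; earlier records are dropped first.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        # Complete ISO dates compare correctly as plain strings.
        if min_date is not None and rec["date"] < min_date:
            continue
        year, month = rec["date"].split("-")[:2]
        groups[(rec["country"], year, month)].append(rec)
    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > seq_per_group:
            # Too many sequences in this group: draw a random subset.
            members = rng.sample(members, seq_per_group)
        kept.extend(members)
    return kept
```

With `seq_per_group=100` and `min_date="2021-06-01"`, this mirrors the intent of the `africa` rule: recent sequences from every country and month are kept, but no single country-month group contributes more than 100 sequences.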
