Nextstrain webinar at Africa CDC

This workshop provides an overview of SARS-CoV-2 sequence analysis with Nextstrain. We will try to cover the following topics:

  • Preliminary analysis of new data with Nextclade
  • Producing your own build using a background set, a download from GISAID, and your data
  • Discussing the possibility of establishing weekly updated analyses for each country in Africa

For the first two points, I hope you'll be able to follow along. The previous webinar already used Nextstrain, so I assume most of you have augur and the ncov pipeline installed.

  • example data for the Nextclade analysis (180 recent global sequences):

    • sequences.fasta.xz
    • metadata.tsv
    • If you are unsure how to decompress the sequence file, use this link to the uncompressed file: sequences.fasta
    • If this is all going a bit too fast, click on the following link and Nextclade will open and load the data file directly from the web:
      clades.nextstrain.org?input-fasta=https://neherlab.org/teaching_notes/2021-07-27_AfricaCDC_sequences.fasta
  • background data for the ncov build:

    • sequences.fasta.xz
    • metadata.tsv
  • GISAID lets you download data in a format that augur can read directly. The challenge is finding search parameters that yield a data set of suitable size (see here). I used the following, which returns a few thousand sequences that we will subsample further:
    • collection date: from 2021-06-01 onward
    • region: Africa
  • add your own data with appropriately formatted metadata.
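Before adding your own data to a build, it can help to check that every sequence has a matching metadata row. The following is a minimal sketch using only the Python standard library. It also reads the `.xz` files directly, so no separate decompression step is needed. The function names are hypothetical. It assumes that the metadata is tab-separated with a `strain` column holding the sequence names, and that the full FASTA header line (after `>`) is the sequence name.

```python
import csv
import lzma


def open_maybe_xz(path):
    """Open a text file, decompressing on the fly if the name ends in .xz."""
    if str(path).endswith(".xz"):
        return lzma.open(path, "rt")
    return open(path, "rt")


def read_fasta_names(handle):
    """Return the sequence names (header lines without '>') from a FASTA handle."""
    return [line[1:].strip() for line in handle if line.startswith(">")]


def names_missing_metadata(fasta_path, metadata_path, name_column="strain"):
    """List FASTA names that have no row in the tab-separated metadata file.

    Assumption: `name_column` ("strain" by default) holds the sequence names.
    """
    with open_maybe_xz(fasta_path) as fh:
        names = read_fasta_names(fh)
    with open_maybe_xz(metadata_path) as fh:
        known = {row[name_column] for row in csv.DictReader(fh, delimiter="\t")}
    return [name for name in names if name not in known]
```

For example, `names_missing_metadata("ACDC_data/sequences.fasta", "ACDC_data/metadata.tsv")` returns the names of sequences that lack a metadata row. An empty list means every sequence is covered.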

Changes required to my_profile/getting_started/builds.yaml

inputs:
  - name: global_context
    metadata: s3://nextstrain-data/files/ncov/open/global/metadata.tsv.xz
    sequences: s3://nextstrain-data/files/ncov/open/global/sequences.fasta.xz
  - name: africa_set
    metadata: ACDC_data/gisaid_auspice_input_hcov-19_2021_07_26_15.tar
    sequences: ACDC_data/gisaid_auspice_input_hcov-19_2021_07_26_15.tar
  - name: additional_data
    metadata: ACDC_data/metadata.tsv
    sequences: ACDC_data/sequences.fasta

builds:
  my-build:
    subsampling_scheme: my-subsampling
    region: global

subsampling:
  my-subsampling:
    world:
      group_by: "country year month"
      seq_per_group: 10
      exclude: "--exclude-where region=='Africa'"
    africa:
      group_by: "country year month"
      seq_per_group: 100
      exclude: "--exclude-where region!='Africa'"
      min_date: "--min-date 2021-06-01"

refine:
  root: "Wuhan-Hu-1/2019"
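To make the effect of the subsampling section more concrete, here is a rough Python sketch of the idea behind it: sequences are grouped by country, year, and month, and at most `seq_per_group` are kept per group. This illustrates the concept and is not how augur implements it. The function names are hypothetical, and the sketch assumes complete dates in YYYY-MM-DD format and a simple random choice within each group.

```python
import random
from collections import defaultdict


def subsample(records, seq_per_group, keep, seed=0):
    """Keep at most `seq_per_group` records per (country, year, month) group.

    records: list of dicts with 'region', 'country' and 'date' (YYYY-MM-DD).
    keep: function returning True for records that pass the exclusion rules.
    """
    groups = defaultdict(list)
    for rec in records:
        if keep(rec):
            year, month = rec["date"].split("-")[:2]
            groups[(rec["country"], year, month)].append(rec)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chosen = []
    for members in groups.values():
        rng.shuffle(members)
        chosen.extend(members[:seq_per_group])
    return chosen


def in_world_sample(rec):
    """Mirror of the 'world' rule: drop everything from Africa."""
    return rec["region"] != "Africa"


def in_africa_sample(rec, min_date="2021-06-01"):
    """Mirror of the 'africa' rule: only Africa, collected on or after min_date."""
    # ISO-formatted dates compare correctly as plain strings.
    return rec["region"] == "Africa" and rec["date"] >= min_date
```

With these helpers, `subsample(records, 10, in_world_sample)` corresponds to the `world` sample and `subsample(records, 100, in_africa_sample)` to the `africa` sample. Together they give a sparse global background and a much denser sample of recent African sequences.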

If fetching from S3 doesn't work, download the two global_context files and point the metadata and sequences entries at your local copies. The links to these files are listed above.

In addition, if your computer has more than one core available, change my_profile/getting_started/config.yaml to reflect that.
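If you are not sure how many cores your machine has, a short Python snippet can tell you. This is only a sketch: the helper name is hypothetical, and I am assuming the core count in config.yaml is set by an entry called `cores`, as is usual for Snakemake profiles. Check your own file for the exact key.

```python
import os


def suggested_cores(reserve=1):
    """Return a core count for the workflow, leaving `reserve` cores free."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, available - reserve)


# Prints a line you could copy into config.yaml (assumed key name: cores).
print(f"cores: {suggested_cores()}")
```

Leaving one core free keeps the computer responsive while the build runs. Set `reserve=0` if you want the workflow to use every core.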

Important links

Nextstrain

  • nextstrain.org
  • Nextclade
  • nextstrain documentation
  • Exploring interactive phylogenies with Auspice
  • Analysis with augur and Snakemake -- viruses
  • Time-scaled phylogenies with TreeTime
  • Explore SARS-CoV-2 evolution in Auspice by Cassia Wagner
  • Nextstrain webinar by US CDC

General

  • getting miniconda
  • snakemake tutorial

Published

Jul 27, 2021

Category

teaching

Tags

  • bioinformatics
  • phylogenetics