The augur toolchain is the bioinformatics engine of nextstrain and produces the files that can be visualized in the web browser using auspice. Augur consists of a number of tools that let the user filter and align sequences, build trees, and integrate the phylogenetic analysis with metadata. The tools are meant to be composable: the output of one tool serves as the input of others. To compose such tools into an analysis pipeline, augur advocates the workflow management tool Snakemake.
Snakemake
Snakemake breaks a workflow into a set of rules that are specified in a file called Snakefile.
Each rule takes a number of input files, specifies a few parameters, and produces output files.
A simple rule would look like this:
    rule align:
        input:
            seq="data/sequences.fasta"
        params:
            nthreads=2
        output:
            aln="results/alignment.fasta"
        shell:
            '''
            augur align --sequences {input.seq} --nthreads {params.nthreads} --output {output.aln}
            '''
This rule would produce results/alignment.fasta from the input file data/sequences.fasta using the augur align command, and additionally specifies that augur is supposed to use 2 CPUs for this task.
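To make the placeholder mechanism concrete, here is a minimal Python sketch of how fields such as {input.seq} could be filled into the shell command before it runs. It relies on plain str.format with SimpleNamespace objects as stand-ins and is only an illustration of the idea, not Snakemake's actual implementation; the function name render_command is hypothetical.

```python
from types import SimpleNamespace


def render_command(template, input, params, output):
    """Fill {input.x}, {params.x} and {output.x} fields in a shell template.

    Hypothetical helper for illustration; Snakemake does this internally.
    """
    return template.format(
        input=SimpleNamespace(**input),
        params=SimpleNamespace(**params),
        output=SimpleNamespace(**output),
    )


# Same command line as in the align rule, written as a template string.
template = (
    "augur align --sequences {input.seq} "
    "--nthreads {params.nthreads} --output {output.aln}"
)

command = render_command(
    template,
    input={"seq": "data/sequences.fasta"},
    params={"nthreads": 2},
    output={"aln": "results/alignment.fasta"},
)
print(command)
```

The printed string is the command that would be handed to the shell, with every named field replaced by the value declared in the rule.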
When executing a rule, Snakemake checks whether all necessary input files are present. If not, it determines which rules produce the missing files and executes those first. Say there was an additional rule to build a tree that depended on the alignment:
    rule tree:
        input:
            aln="results/alignment.fasta"
        output:
            tree="results/tree.nwk"
        shell:
            '''
            augur tree --alignment {input.aln} --output {output.tree}
            '''
If results/alignment.fasta does not exist yet, running

    snakemake results/tree.nwk

would make Snakemake:
- find the rule that produces results/tree.nwk (rule tree in this case)
- determine that it needs to run rule align to produce results/alignment.fasta
- run rules align and tree
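The resolution steps listed here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Snakemake is implemented: the RULES table and the functions find_rule and plan are hypothetical, and real Snakemake additionally handles wildcards, file modification times, and much more.

```python
# Hypothetical, simplified rule table: rule name -> input and output files.
RULES = {
    "align": {
        "input": ["data/sequences.fasta"],
        "output": ["results/alignment.fasta"],
    },
    "tree": {
        "input": ["results/alignment.fasta"],
        "output": ["results/tree.nwk"],
    },
}


def find_rule(target, rules):
    """Return the name of the rule that lists `target` as an output, or None."""
    for name, rule in rules.items():
        if target in rule["output"]:
            return name
    return None


def plan(target, rules, existing, order=None):
    """Return the rule names, in execution order, needed to build `target`."""
    if order is None:
        order = []
    if target in existing:
        return order  # the file is already present, nothing to run
    name = find_rule(target, rules)
    if name is None:
        raise FileNotFoundError(f"no rule produces missing file {target!r}")
    # Resolve every input of the producing rule before the rule itself.
    for dependency in rules[name]["input"]:
        plan(dependency, rules, existing, order)
    if name not in order:
        order.append(name)
    return order


execution_order = plan(
    "results/tree.nwk", RULES, existing={"data/sequences.fasta"}
)
print(execution_order)
```

With only data/sequences.fasta present, the plan contains align followed by tree; if results/alignment.fasta were already in the set of existing files, only tree would be scheduled.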
These simple examples only scratch the surface of what Snakemake can do but should give you the general idea.
Augur commands and Snakefiles
We will work from the Zika virus tutorial on the nextstrain website and the GitHub repository nextstrain/zika-tutorial. Clone the tutorial using
git clone