Basecalling with albacore on the compute cluster
The albacore basecaller is installed on the braid computing cluster at /sw/qbio/bin/read_fast5_basecaller.py.
Since basecalling requires substantial computational resources, it is best run on the cluster.
A suitable submit script is located at /home/neher/submit_albacore.sh:
```bash
#!/bin/bash
# specify number of nodes and cores per node
#PBS -l nodes=1:ppn=8
# specify the time you expect the job to run (hh:mm:ss)
#PBS -l walltime=2:00:00
# specify the amount of memory needed
#PBS -l mem=16G
# output and error files
#PBS -o myout.o$PBS_JOBID
#PBS -e myout.e$PBS_JOBID
# load paths
source /home/neher/.bashrc
# move to the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# call albacore: $1 = flow cell, $2 = kit, $3 = directory with the reads;
# -t 8 matches the 8 cores requested above
read_fast5_basecaller.py -f "$1" -k "$2" -i "$3" -s new_calls -t 8 -o fastq
# example with explicit arguments:
#read_fast5_basecaller.py -f FLO-MIN107 -k SQK-RAD002 -i reads/fail/0 -s new_calls -t 8 -o fastq
```
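The submit script passes its three positional arguments straight to albacore, so a forgotten argument or a mistyped directory would only surface once the job has started. A minimal sketch of a guard that could be placed after the `cd` and before the albacore call is shown here. The function name `check_albacore_args` and its messages are hypothetical and not part of the original script.

```shell
# Hypothetical guard, not part of the original submit script.
# Intended use inside the script: check_albacore_args "$@" || exit 1
check_albacore_args() {
    # expect exactly: <flow cell> <kit> <read directory>
    if [ "$#" -ne 3 ]; then
        echo "usage: <flowcell> <kit> <read_directory>" >&2
        return 1
    fi
    # the read directory is resolved relative to the current directory
    if [ ! -d "$3" ]; then
        echo "read directory not found: $3" >&2
        return 1
    fi
    return 0
}
```

The check only verifies the number of arguments and that the read directory exists; it does not validate the flow cell or kit names.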
The submit script can be called via qsub as
qsub ../submit_albacore.sh -F "FLO-MIN107 SQK-RAD002 reads/pass/0"
where the three arguments specify the flow cell (FLO-MIN107), the sequencing kit (SQK-RAD002), and the directory with the reads to be called.
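When the reads are spread over several numbered subdirectories (reads/pass/0, reads/pass/1, and so on), one job per subdirectory may be convenient. The following is a sketch only: the helper name `print_albacore_jobs` is hypothetical, and it merely prints the qsub commands so they can be inspected before anything is submitted. Note that the submit script writes to new_calls for every job, so the script would likely need a per-job output directory before several jobs are run side by side.

```shell
# Hypothetical helper: print one qsub command per subdirectory of a reads folder.
# Nothing is submitted; the commands are only written to stdout.
print_albacore_jobs() {
    paj_flowcell="$1"
    paj_kit="$2"
    paj_parent="$3"
    for paj_dir in "$paj_parent"/*/; do
        # skip the literal pattern if the glob matched nothing
        [ -d "$paj_dir" ] || continue
        # ${paj_dir%/} strips the trailing slash added by the glob
        printf 'qsub ../submit_albacore.sh -F "%s %s %s"\n' \
            "$paj_flowcell" "$paj_kit" "${paj_dir%/}"
    done
}
```

Calling `print_albacore_jobs FLO-MIN107 SQK-RAD002 reads/pass` lists the commands; once they look right, the output can be piped to `sh` to submit them.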
My preliminary tests indicate that the cluster can call between 3 and 10 MB of fastq per minute when run on 16 CPUs.
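These rates can be used for a rough walltime estimate before submitting. The sketch below assumes the throughput stays constant over the run and that the expected fastq size is known in MB; the helper name `estimate_minutes` is hypothetical. Since the rates were measured on 16 CPUs while the submit script requests 8 cores, the real run time may well be longer.

```shell
# Hypothetical helper: minutes needed to produce size_mb of fastq
# at rate_mb_per_min, rounded up to a whole minute (integer arithmetic).
estimate_minutes() {
    size_mb="$1"
    rate_mb_per_min="$2"
    echo $(( (size_mb + rate_mb_per_min - 1) / rate_mb_per_min ))
}

estimate_minutes 1000 3    # slow end of the measured range, prints 334
estimate_minutes 1000 10   # fast end of the measured range, prints 100
```

By the same arithmetic, the 2:00:00 walltime in the submit script covers roughly 360 MB of fastq at 3 MB per minute and 1200 MB at 10 MB per minute, so larger runs would need a longer walltime request.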