PanGraph: scalable bacterial pan-genome graph construction

Nicholas Noll, Marco Molari, Liam Shaw and Richard Neher

Microbial Genomics, in press, 2022.02.24.481757, 2023
10.1101/2022.02.24.481757

Abstract

The genomic diversity of microbes is commonly parameterized as population genetic polymorphisms relative to a reference genome of a well-characterized, but arbitrary, isolate. Reference genomes contain only a fraction of the microbial pangenome, the set of genes observed across all isolates of a given species, and are thus blind to both the dynamics of the accessory genome and variation in gene order and copy number. With the widespread use of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. Traditional computational approaches to whole-genome analysis either scale poorly or treat genomes as dissociated bags of genes, and are thus not suited to this new era. Here, we present PanGraph, a Julia-based library and command-line interface for aligning whole genomes into a graph, wherein each genome is represented as an undirected path along vertices, which, in turn, encapsulate homologous multiple sequence alignments. The resulting data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization.
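
The data structure described in the abstract can be pictured with a small toy model. The sketch below is not PanGraph's API or file format. The names `Node`, `paths`, `block_copy_numbers`, and `core_blocks`, the orientation flag, and the three example genomes are all hypothetical, chosen only to illustrate how representing genomes as paths over shared blocks exposes gene order and copy-number variation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One step of a genome's path (hypothetical toy type, not PanGraph's)."""
    block_id: str   # the homologous block (graph vertex) being traversed
    forward: bool   # assumed orientation flag, included for illustration


# Hypothetical toy data: three genomes written as paths over blocks A-D.
paths = {
    "genome1": [Node("A", True), Node("B", True), Node("C", True)],
    "genome2": [Node("A", True), Node("C", False), Node("B", True)],
    "genome3": [Node("A", True), Node("B", True), Node("B", True), Node("D", True)],
}


def block_copy_numbers(paths):
    """Count how many times each genome traverses each block."""
    return {name: Counter(n.block_id for n in path) for name, path in paths.items()}


def core_blocks(paths):
    """Return blocks that occur exactly once in every genome."""
    counts = block_copy_numbers(paths)
    all_blocks = set().union(*(c.keys() for c in counts.values()))
    return sorted(b for b in all_blocks if all(c[b] == 1 for c in counts.values()))


copies = block_copy_numbers(paths)
core = core_blocks(paths)
```

In this toy example, block B is duplicated in genome3 and blocks B and C swap order in genome2. A gene presence/absence table alone would not record the reordering.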


Publication date

Apr 17, 2023
