Bacterial genomes differ in gene content because parts of the genome can be duplicated, lost, or picked up from the environment. Uptake from the environment can be mediated by phages or by dedicated DNA uptake mechanisms. This type of horizontal transfer complicates the phylogenetic analysis of bacteria. As more genomes are analyzed, the number of genes found in all strains (often referred to as the core genome) goes down, while the total number of distinct genes (the pan-genome) goes up.
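To make the distinction between core genome and pan-genome concrete, here is a minimal toy sketch in Python. It is not part of panX; the strain names, gene names, and the helper core_and_pan are made up for illustration. Each strain is represented simply as the set of gene families it carries, which assumes the genes have already been grouped into families.

```python
# Toy illustration: each strain is the set of gene families it carries.
# Strain names and gene content are hypothetical, chosen only for this example.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetA"},
    "strain_C": {"dnaA", "gyrB", "vanA"},
}


def core_and_pan(gene_sets):
    """Return (core genome, pan-genome) for an iterable of gene sets."""
    gene_sets = list(gene_sets)
    core = set.intersection(*gene_sets)  # genes present in every strain
    pan = set.union(*gene_sets)          # genes present in at least one strain
    return core, pan


# Add the strains one at a time and watch the two numbers diverge.
names = sorted(strains)
for n in range(1, len(names) + 1):
    core, pan = core_and_pan(strains[name] for name in names[:n])
    print(f"{n} genomes: core = {len(core)}, pan = {len(pan)}")
```

With these three toy strains, the core genome shrinks from 4 to 3 to 2 genes as strains are added, while the pan-genome grows from 4 to 5 to 6. Real data first requires clustering genes from different strains into families, which is the hard part of the problem.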
Here we will use panX, the pan-genome pipeline developed by Wei Ding in my lab. PanX combines a clustering pipeline based on DIAMOND and MCL, phylogenetic post-processing, and a visualization tool.
To install panX, type
git clone https://github.com/neherlab/pan-genome-analysis.git
cd pan-genome-analysis
git submodule update --init
on the command line. All other dependencies should already be installed. You can now run the pipeline, either on the test data provided or on your own data.
Pan-genome visualization
The visualization is best installed on your own laptop. If you have git
installed, the following series of commands should do the trick
git clone https://github.com/neherlab/pan-genome-visualization.git
cd pan-genome-visualization
git submodule update --init
npm install
npm start
For this to work, you need an up-to-date version of Node.js, which can be obtained from nodejs.org. On your laptop, you should now be able to view the page at localhost:8000.