In a collaboration with the Sommer lab, we have a new paper out in Genetics. We report whole genome resequencing of 104 globally sampled strains of Pristionchus pacificus and the draft genome of a close outgroup, P. exspectatus. The strains were collected and sequenced by the group of Ralf Sommer here at the MPI in Tuebingen. This is one of the largest data sets available for analyzing population genetic processes in a natural population.
Evolution and population genetics on multiple time scales
The population is diverse and highly structured, which allows us to study evolution at various time scales. Intra-clade diversity is about 0.3% and inter-clade diversity about 1%, while the outgroup is roughly 10% diverged. The strong population structure means that little variation is shared across clades, but the chromosomal diversity profiles (see figure) are strongly correlated across clades, suggesting that similar but independent processes shape genetic diversity. Two further observations point to purifying selection as the dominant force shaping diversity: an increasing amount of non-synonymous variation is lost with increasing separation, and local gene density is negatively correlated with diversity. LD, however, extends over many megabases, and the dynamics of most polymorphisms are determined predominantly by their genetic background rather than by their own effect on fitness. The latter only manifests itself on time scales comparable to the separation of clades.
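To make the intra- versus inter-clade comparison concrete, here is a minimal sketch of how such diversity numbers can be computed from aligned sequences. It is not the pipeline used in the paper: the function names, the toy sequences, and the clade labels are all made up for illustration, and each strain is treated as a single haploid sequence.

```python
from itertools import combinations


def pairwise_divergence(a, b):
    """Fraction of sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_diversity(strains, clade_of):
    """Return (mean within-clade, mean between-clade) pairwise divergence.

    strains:  dict mapping strain name -> aligned sequence (hypothetical input)
    clade_of: dict mapping strain name -> clade label (hypothetical input)
    """
    within, between = [], []
    for s, t in combinations(sorted(strains), 2):
        d = pairwise_divergence(strains[s], strains[t])
        if clade_of[s] == clade_of[t]:
            within.append(d)
        else:
            between.append(d)
    if not within or not between:
        raise ValueError("need at least one within- and one between-clade pair")
    return sum(within) / len(within), sum(between) / len(between)


# Toy data, invented for illustration only.
toy_strains = {
    "A1": "AAAAAAAAAA",
    "A2": "AAAAAAAAAT",
    "B1": "TTAAAAAAAA",
    "B2": "TTAAAAAAAC",
}
toy_clades = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

if __name__ == "__main__":
    intra, inter = mean_diversity(toy_strains, toy_clades)
    print(f"intra-clade: {intra:.3f}, inter-clade: {inter:.3f}")
```

On real data one would restrict the comparison to sites that are confidently called in both strains, which this sketch does not attempt.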
Site frequency spectra are shaped by drift and draft
Linked selection acting on many loci -- whether deleterious or beneficial does not matter -- results in distortions of genealogies with characteristic effects on the site frequency spectrum (SFS). In particular, one expects a U-shaped spectrum with an excess of rare and high frequency derived alleles. After careful polarization, this is exactly what we see. Up to frequencies of around 10%, the SFS is compatible with neutral models (1/k decay), but at intermediate frequencies it resembles the SFS expected in the presence of linked selection. This is plausible, as there is ample opportunity for linked selection: most non-synonymous polymorphisms are selected against and LD extends over several megabases. While individual effects might be weak, many of them aggregated on long linked haplotypes can exert strong selection.
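The comparison with the 1/k decay can be sketched in a few lines. The code below assumes a polarized 0/1 matrix (0 = ancestral, 1 = derived) with one haploid sample per strain; the function names and the toy matrix are illustrative and not taken from the paper. The neutral expectation is the standard one, in which the number of sites with k derived copies is proportional to 1/k.

```python
from collections import Counter


def derived_sfs(sites, n):
    """Unfolded SFS: entry k-1 is the number of sites with k derived copies.

    sites: list of per-site lists of 0/1 calls for n samples (hypothetical input).
    Sites with 0 or n derived copies are not segregating and are skipped.
    """
    counts = Counter(sum(site) for site in sites)
    return [counts.get(k, 0) for k in range(1, n)]


def neutral_sfs(n, total):
    """Expected neutral SFS (proportional to 1/k), scaled to `total` segregating sites."""
    harmonic = sum(1.0 / k for k in range(1, n))
    return [total / (k * harmonic) for k in range(1, n)]


# Toy polarized genotype matrix for n = 4 samples, invented for illustration.
toy_sites = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],  # monomorphic ancestral, ignored
    [1, 1, 1, 1],  # fixed derived, ignored
]

if __name__ == "__main__":
    observed = derived_sfs(toy_sites, n=4)
    expected = neutral_sfs(4, total=sum(observed))
    for k, (obs, exp) in enumerate(zip(observed, expected), start=1):
        print(f"k={k}: observed {obs}, neutral expectation {exp:.2f}")
```

An excess of observed over expected counts at high k is the signature discussed above; on a log-log plot the neutral expectation is a straight line with slope -1.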
Why do we care?
Neutral models have long been used to interpret genetic diversity data -- but they rarely fit well, at least when there is enough data. Rather than forcing the data into a particular model framework, we have attempted to find informative observables. The U-shaped SFS, for example, clearly shows the inadequacies of neutral models. The SFS suggests that drift-like effects are important on short time scales, but that genetic diversity is shaped by different processes on longer time scales. Those processes probably have little to do with population size, and caution is warranted when using inference methods based on neutral null models. While linked selection on predominantly deleterious variation is a plausible cause of these non-drift-like features, further research is necessary to tell to what extent ecological processes shape genetic diversity in this species.
Christian Rodelsperger and Richard Neher