Philipp W. Messer and Richard A. Neher
Genetics, vol. 191, pp. 593–605, 2012
Selective sweeps are typically associated with a local reduction of genetic diversity around the adaptive site. However, selective sweeps can also quickly carry neutral mutations to observable population frequencies if those mutations arise early in a sweep and hitchhike with the adaptive allele. We show that the interplay between mutation and exponential amplification through hitchhiking produces a characteristic frequency spectrum of the resulting novel haplotype variation, and that this spectrum depends only on the ratio of the mutation rate to the selection coefficient of the sweep. On the basis of this result, we develop an estimator for the selection coefficient driving a sweep. Because this estimator uses the novel variation arising from mutations during a sweep, it does not rely on preexisting variation and can also be applied to loci that lack recombination. Compared with standard approaches that infer selection coefficients from the size of the dip in genetic diversity around the adaptive site, our estimator requires much shorter sequences, although these must be sampled at high population depth to capture low-frequency variants; given such data, it consistently outperforms standard approaches. We investigate analytically and numerically how the accuracy of our estimator is affected by the decay of the sweep pattern over time as a consequence of random genetic drift, and we discuss potential effects of recombination, soft sweeps, and demography. As an example of its use, we apply our estimator to deep sequencing data from human immunodeficiency virus populations.
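Why the spectrum should depend only on the ratio of mutation rate to selection coefficient can be illustrated with a deliberately simplified toy model. This is not the authors' model or estimator. It assumes the following: the adaptive lineage grows as a pure-birth process from a single copy, with no death and no competition with the wild type; each carrier divides at rate `s` and acquires a new neutral mutation (a brand-new haplotype) at rate `mu`; and the sweep counts as complete once `n_final` copies exist. The function name `sweep_haplotype_frequencies` and the parameter `n_final` are hypothetical. Because every event is either a birth or a mutation of a uniformly chosen carrier, the sequence of events depends only on `mu / (mu + s)`, so rescaling time by `s` leaves the final haplotype frequencies unchanged.

```python
import random
from collections import Counter


def sweep_haplotype_frequencies(mu, s, n_final=10_000, seed=None):
    """Toy model of novel haplotype variation arising during a sweep.

    Illustrative assumptions only (not the paper's model): the adaptive
    lineage is a pure-birth process started from one copy; each carrier
    divides at rate s (> 0) and mutates to a new haplotype at rate mu;
    the sweep is treated as complete once n_final copies exist.
    Returns haplotype frequencies among the final carriers, largest first.
    """
    rng = random.Random(seed)
    # Each event is a birth (rate s) or a mutation (rate mu) of a uniformly
    # chosen carrier, so the outcome depends on mu and s only through mu/s.
    p_mutation = mu / (mu + s)
    labels = [0]      # haplotype label carried by each copy of the adaptive allele
    next_label = 1
    while len(labels) < n_final:
        i = rng.randrange(len(labels))
        if rng.random() < p_mutation:
            # The chosen carrier picks up a new neutral mutation.
            labels[i] = next_label
            next_label += 1
        else:
            # The chosen carrier divides; the offspring inherits its haplotype.
            labels.append(labels[i])
    counts = Counter(labels)
    return sorted((c / n_final for c in counts.values()), reverse=True)


if __name__ == "__main__":
    freqs = sweep_haplotype_frequencies(mu=1e-3, s=0.05, seed=1)
    print("haplotypes:", len(freqs), "top frequencies:", freqs[:5])
```

With a fixed seed, calling the function with `(mu, s)` and with `(2 * mu, 2 * s)` gives identical output. This is the time-rescaling argument in miniature. Mutations that happen while the lineage is still small end up at high frequency, which is why early hitchhikers become observable.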