TY - JOUR
T1 - Population-genetic inference from pooled-sequencing data
AU - Lynch, Michael
AU - Bost, Darius
AU - Wilson, Sade
AU - Maruki, Takahiro
AU - Harrison, Scott H
PY - 2014/1/1
Y1 - 2014/1/1
AB - Although pooled-population sequencing has become a widely used approach for estimating allele frequencies, most work has proceeded in the absence of a proper statistical framework. We introduce a self-sufficient, closed-form, maximum-likelihood estimator for allele frequencies that accounts for errors associated with sequencing, and a likelihood-ratio test statistic that provides a simple means for evaluating the null hypothesis of monomorphism. Unbiased estimates of allele frequencies < 5/N (where N is the number of individuals sampled) appear to be unachievable, and near-certain identification of a polymorphism requires a minor-allele frequency > 10/N. A framework is provided for testing for significant differences in allele frequencies between populations, taking into account sampling at the levels of individuals within populations and sequences within pooled samples. Analyses that fail to account for the two tiers of sampling suffer from very large false-positive rates and can become increasingly misleading with increasing depths of sequence coverage. The power to detect significant allele-frequency differences between two populations is very limited unless both the number of sampled individuals and depth of sequencing coverage exceed 100. © 2014 The Author(s).
KW - Allele-frequency estimation
KW - Population genomics
KW - Population subdivision
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84902964148&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=84902964148&origin=inward
U2 - 10.1093/gbe/evu085
DO - 10.1093/gbe/evu085
M3 - Article
C2 - 24787620
SN - 1759-6653
VL - 6
SP - 1210
EP - 1218
JO - Genome Biology and Evolution
JF - Genome Biology and Evolution
IS - 5
ER -