“In this paper, we review the performance of hidden Markov model-based imputation methods in animal breeding populations. Imputation is a technique that decreases the cost of genotyping animals by allowing animal breeders to genotype most of their animals on a low-cost, low-density genotyping array and a few of their animals on a higher-cost, higher-density genotyping array. Statistical regularities between the low-density and high-density genotypes can then be used to “fill in” the genotypes missing from the low-density array.
Traditionally, pedigree- and heuristic-based imputation methods have been used in animal breeding because of their computational efficiency, scalability, and accuracy. Recent advances in human genetics have increased the ability of probabilistic hidden Markov model methods to perform accurate phasing and imputation in large populations. These methods tend to be more accurate than existing heuristics in populations where pedigree information is not available (e.g., in cattle or in developing countries), but historically they were too computationally expensive to be used in routine breeding applications.
To test the performance of these hidden Markov model-based imputation methods, we evaluated the accuracy and computational cost of several methods in a series of simulated populations and a real animal population. First, we tested single-step (diploid) imputation, which performs both phasing and imputation. Second, we tested pre-phasing followed by haploid imputation. Overall, we used four available diploid imputation methods (fastPHASE, Beagle v4.0, IMPUTE2, and MaCH), three phasing methods (SHAPEIT2, HAPI-UR, and Eagle2), and three haploid imputation methods (IMPUTE2, Beagle v4.1, and Minimac3).
We found that performing pre-phasing and haploid imputation was faster and more accurate than diploid imputation. In particular, among all the methods tested, pre-phasing with Eagle2 or HAPI-UR and imputing with Minimac3 or IMPUTE2 gave the highest accuracies with both simulated and real data.
The results of this study suggest that hidden Markov model-based imputation algorithms are an accurate and computationally feasible approach for performing imputation without a pedigree when pre-phasing and haploid imputation are used. Of the algorithms tested, the combination of Eagle2 and Minimac3 gave the highest accuracy across the simulated and real datasets. These results may help decrease the costs of genotyping animals when pedigree information is not available.”
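
The evaluation described in the abstract (hide the markers that are absent from the low-density array, impute them, then compare against the known high-density genotypes) can be illustrated with a minimal sketch. The abstract does not say which accuracy measure was used; the sketch assumes the correlation between true genotypes and imputed dosages at the masked markers, a measure commonly used in animal breeding. All function names, marker positions, and numbers below are hypothetical, and the imputed dosages are made up rather than produced by any of the tools named above.

```python
from math import sqrt


def mask_to_low_density(genotypes, ld_markers):
    """Keep genotypes at low-density marker positions; set all others to None."""
    keep = set(ld_markers)
    return [g if i in keep else None for i, g in enumerate(genotypes)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variation
    return sxy / sqrt(sxx * syy)


def imputation_accuracy(true_genotypes, imputed_dosages, ld_markers):
    """Correlation between truth and imputation at the masked markers only.

    Assumption: accuracy is scored as a genotype/dosage correlation;
    other measures (e.g. concordance) are equally possible.
    """
    keep = set(ld_markers)
    masked = [i for i in range(len(true_genotypes)) if i not in keep]
    return pearson(
        [true_genotypes[i] for i in masked],
        [imputed_dosages[i] for i in masked],
    )


# Hypothetical animal genotyped at 8 high-density markers (0, 1, or 2 copies
# of the alternative allele); markers 0, 4, and 7 are on the low-density array.
true_hd = [0, 1, 2, 1, 0, 2, 1, 0]
ld_markers = [0, 4, 7]

# What an imputation program would receive as input for this animal.
observed = mask_to_low_density(true_hd, ld_markers)

# Made-up dosages standing in for the output of an imputation program.
imputed = [0.0, 0.9, 1.8, 1.2, 0.0, 2.0, 0.7, 0.0]

accuracy = imputation_accuracy(true_hd, imputed, ld_markers)
print(observed)
print(round(accuracy, 3))
```

Scoring only the masked markers keeps the markers that were actually genotyped on the low-density array from inflating the accuracy estimate.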