We recently published a preprint on how to use high-throughput phenotypes to infer an individual's genotype and enable genomic selection. Genomic selection is key for increasing genetic gain in breeding programs because it allows breeders to more accurately evaluate which individuals to select. However, genotyping is expensive. To help make genomic selection more cost effective, AlphaGenes has been researching different ways to decrease genotyping costs over the last 7 years.
One way to decrease genotyping costs is to use genomic imputation. Under this approach, most of the individuals in the population are genotyped using a low-cost, low-density genotyping array, and only a small number of individuals are genotyped using a higher-cost, high-density array. We can then use statistical regularities between the low- and high-density individuals to impute (or fill in) the ungenotyped markers. The key question is how to balance genotyping costs (by using lower-density marker panels) while maintaining selection accuracy.
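Here is a deliberately toy sketch of the imputation idea: a target individual genotyped at only a few markers is matched against reference haplotypes genotyped at high density, and the missing markers are filled in from the best match. The panel sizes and the simple matching rule are illustrative assumptions, not the statistical method used in practice or in the preprint.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hd = 50                          # markers on the high-density panel
ld_idx = np.arange(0, n_hd, 5)     # low-density panel: every 5th marker

# Reference haplotypes genotyped at high density (e.g., the parents).
reference = rng.integers(0, 2, size=(4, n_hd))

# A target individual genotyped only at the low-density markers; in this
# simplified setup its true haplotype is one of the reference haplotypes.
true_hap = reference[2]
observed_ld = true_hap[ld_idx]

# Impute: find the reference haplotype that best matches the observed
# low-density markers, then fill in the unobserved markers from it.
matches = (reference[:, ld_idx] == observed_ld).sum(axis=1)
imputed = reference[matches.argmax()]

acc = (imputed == true_hap).mean()  # fraction of markers imputed correctly
print(acc)
```

Real imputation methods use probabilistic models rather than exact matching, but the principle is the same: a handful of observed markers is often enough to identify which haplotype an individual carries.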
This work lies at one end of the spectrum, where the goal is to obtain moderate-accuracy imputation at very low cost. For the past few years, we have consistently found that in structured populations (e.g., bi-parental crosses or full-sib families), moderate imputation accuracies can be obtained using 1-5 markers per chromosome. Can we push marker densities even lower? Do we even need genotypes? What if we just had high-throughput phenotypes?
High-throughput phenotypes encompass a range of sensor technologies that can be used to non-invasively evaluate phenotypes on large numbers of individuals. Examples include flying a drone with an infrared camera over a field, or sending an animal through a suite of sensors. The advantage of high-throughput phenotypes is that they can be collected at very low cost.
This paper demonstrates (in simulation) that it may be possible to use high-throughput phenotypes as a stand-in for genetic markers to infer genotypes. To obtain moderate imputation accuracies (~0.5), roughly 100 quantitative phenotypes (traits such as yield, or spectrometry data) need to be measured, each with a heritability of 0.5.
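As a reminder of what "a phenotype with heritability 0.5" means in simulation terms: the phenotype is a genetic value plus environmental noise, with the noise variance chosen so that h² = var(g) / (var(g) + var(e)). The numbers below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
h2 = 0.5

# Genetic values for 10,000 individuals (variance ≈ 1).
g = rng.normal(size=10_000)

# Environmental noise scaled so that h2 = var(g) / (var(g) + var(e)).
e = rng.normal(scale=np.sqrt(g.var() * (1 - h2) / h2), size=g.size)
phenotype = g + e

# The squared correlation between phenotype and genetic value ≈ h2.
r2 = np.corrcoef(phenotype, g)[0, 1] ** 2
print(r2)
```

A heritability of 0.5 is thus a fairly strong requirement: half of the phenotypic variation must be genetic for the phenotype to carry that much information about the genotype.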
Here’s some intuition for how it works. Imagine you have a bi-parental cross or full-sib family. We assume the parents are genotyped (and phased if they are outbred). Our goal is to impute an offspring's genotype from its high-throughput phenotypes. To do this, we (1) simulate a large number of putative offspring, (2) calculate estimated genetic values for each putative offspring for each of the high-throughput phenotypes, and (3) impute the offspring by finding simulated genotypes whose estimated genetic values are close to the observed phenotypes. Because the number of possible offspring genotypes is far too large to enumerate by direct simulation, we actually use a sampling approach that does a guided search for likely haplotypes.
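The three steps above can be sketched in simulation. This is a hypothetical toy version, not the paper's implementation: the guided search is replaced by naive random sampling, marker effects are assumed known, and linkage between markers is ignored.

```python
import numpy as np

rng = np.random.default_rng(42)
n_markers, n_traits, h2 = 200, 100, 0.5

# Marker effects for each high-throughput trait (assumed known here).
effects = rng.normal(size=(n_markers, n_traits))

# Two inbred parents of a bi-parental cross (genotypes coded 0/1).
p1 = rng.integers(0, 2, n_markers)
p2 = rng.integers(0, 2, n_markers)

def simulate_offspring():
    # Each marker is inherited from parent 1 or parent 2 at random
    # (ignoring linkage for simplicity in this sketch).
    pick = rng.integers(0, 2, n_markers)
    return np.where(pick == 0, p1, p2)

# The "true" offspring we want to impute, observed only via noisy phenotypes.
true_geno = simulate_offspring()
gv = true_geno @ effects                      # true genetic values

# Per-trait genetic sd within the family, estimated from sampled offspring,
# used to scale the noise so that each trait has heritability h2.
sample_gv = np.array([simulate_offspring() for _ in range(500)]) @ effects
noise_sd = sample_gv.std(axis=0) * np.sqrt((1 - h2) / h2)
phenos = gv + rng.normal(size=n_traits) * noise_sd

# Steps 1-3: simulate putative offspring, score each by the distance between
# its estimated genetic values and the observed phenotypes, keep the best.
best_geno, best_dist = None, np.inf
for _ in range(5000):
    cand = simulate_offspring()
    dist = np.sum((cand @ effects - phenos) ** 2)
    if dist < best_dist:
        best_geno, best_dist = cand, dist

accuracy = np.corrcoef(true_geno, best_geno)[0, 1]
print(f"imputation accuracy: {accuracy:.2f}")
```

Even this naive search recovers the genotype better than chance, because 100 moderately heritable phenotypes jointly constrain which combination of parental haplotypes the offspring could carry. A guided sampler, as described above, explores the haplotype space far more efficiently than random draws.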
The paper presents just a proof of concept of this idea. We have not tested it on real data, and the number of phenotypes and their heritabilities are likely outside of what we could easily (and cheaply) collect. But this idea has a lot of potential: maybe in the future a breeder could fly a drone over a field, or send a pig through a sensor, and get very low-cost (albeit low-accuracy) genotypes on hundreds or thousands of individuals.