INTRODUCTION
Owing to the reduced cost of genotyping and the availability of high-density single nucleotide polymorphism (SNP) panels, genome-assisted selection has become a popular method in animal breeding. In genomic selection, genomic estimated breeding values (GEBV) are used to predict the genetic merit of selection candidates. Genomic selection refers to the incorporation of DNA marker information, often whole-genome SNP data, into the prediction of the breeding values used to make selection decisions. Genomic prediction is believed to provide greater genetic gain for quantitative traits than can be achieved with phenotypic data alone [1]. Selection based on genomic data can be applied to young animals without sacrificing the selection candidates, which is apparently the most important advantage of this method. Bayesian methods for GEBV have proven accurate and efficient for phenotypes controlled by a few genes with large effects; however, high-density SNP data sets make the parameter estimation algorithms computationally demanding. The accuracy of genomic prediction with different methods (genomic best linear unbiased prediction [GBLUP] and Bayesian methods) depends on the genetic architecture of the phenotypes [2]. Recently, many reports have suggested that genomic selection is better than traditional best linear unbiased prediction (BLUP) in terms of breeding value prediction accuracy [3,4]. Two of the most important factors for GEBV are the size and structure of the reference population and the linkage disequilibrium between SNPs and quantitative trait loci (QTL). Both of these factors affect prediction accuracy considerably.
So far, genome-wide association studies have been performed in many cattle breeds to identify causative mutations and QTL controlling economic traits. Various statistical models are available for detecting QTL, such as single-marker regression and interval mapping. Since economic traits are likely to be influenced by many loci with small effects, whole-genome SNP data should detect QTL more accurately than statistical models that analyse a few SNP markers in candidate genes. In animal breeding, the estimated breeding value (EBV) has been calculated from phenotype and pedigree data with the statistical model known as BLUP, and it has been one of the most important criteria that animal breeders use to select genetically superior animals. Although the traditional BLUP model has been used successfully to select animals, it has some drawbacks, such as a long generation interval and pedigree errors in breeding animals; pedigree errors in turn decrease the reliability of EBV. In the present study, we report genomic prediction for intramuscular fat and compare the accuracy of breeding values obtained by BLUP and by GBLUP with different genomic relationship matrices. The accuracies thus obtained were compared with those obtained from the deterministic prediction equations given by Goddard [5] and Daetwyler [6].
RESULTS AND DISCUSSION
Statistics of the pedigree and genomic relationship coefficients for the 778 genotyped animals (706 steers and 72 bulls) are shown in Table 2. In the genomic relationship matrices (GOF, G05, and Yang), the average of the diagonal elements was quite similar to that of the pedigree-based relationship matrix (A) (Table 2). The average minor allele frequency was 0.33, and the minor allele frequencies ranged from 0 to 0.5 (Figure 1). The average off-diagonal coefficient for GOF and Yang was lower than that of A, whereas for G05 it was greater than that of A. In GOF and Yang, the average off-diagonal coefficient was equal to zero, which allowed a matrix with average diagonal elements equal to 1 (Table 2). For the genomic relationship matrices (GOF, G05, and Yang), the variance of the diagonal coefficients was greater than that of A (Table 2). A larger variance for G than for A would be expected because genomic relationships reflect realized relationships, that is, the actual fraction of genes shared between individuals, whereas a pedigree-based coefficient is the average expected value (Supplementary Figure S1). In this study, the relationships calculated from genomic data were continuously distributed from 0 to 0.5 in half-sib families, while the relationships from pedigree data were not.
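The construction of such matrices can be illustrated with a minimal sketch. It assumes that GOF and G05 denote VanRaden-type matrices built with observed allele frequencies and with all frequencies fixed at 0.5, respectively, and that Yang denotes the estimator of Yang et al.; these readings, the 0/1/2 genotype coding, and the function names `vanraden_g` and `yang_g` are assumptions made for illustration and are not taken from the study's own software.

```python
import numpy as np


def vanraden_g(genotypes, freqs=None):
    """VanRaden-type G from an (animals x SNPs) matrix coded 0/1/2.

    freqs=None uses allele frequencies observed in the data (assumed here to
    correspond to GOF); freqs=0.5 fixes every frequency at 0.5 (assumed G05).
    """
    m = np.asarray(genotypes, dtype=float)
    if freqs is None:
        p = m.mean(axis=0) / 2.0
    else:
        p = np.broadcast_to(np.asarray(freqs, dtype=float), (m.shape[1],))
    z = m - 2.0 * p                      # centre each SNP column by 2p
    scale = 2.0 * np.sum(p * (1.0 - p))  # expected heterozygosity summed over SNPs
    return z @ z.T / scale


def yang_g(genotypes):
    """Yang-type G: each SNP is standardised by its own 2p(1-p)."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)         # drop monomorphic SNPs (zero variance)
    m, p = m[:, keep], p[keep]
    het = 2.0 * p * (1.0 - p)
    w = (m - 2.0 * p) / np.sqrt(het)
    g = w @ w.T / m.shape[1]             # off-diagonal elements
    # Diagonal elements use the separate expression of Yang et al.
    diag = 1.0 + np.mean((m**2 - (1.0 + 2.0 * p) * m + 2.0 * p**2) / het, axis=1)
    np.fill_diagonal(g, diag)
    return g
```

Because the observed-frequency version centres every SNP column, all elements of that matrix sum to zero, which is consistent with an average off-diagonal coefficient close to zero; fixing the frequencies at 0.5 removes this property.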
The additive variance estimated with the genomic relationship matrices (GRM) was slightly higher than that estimated with A, whereas the residual variance estimated with the GRM was two times higher (Table 3). Compared with estimates obtained with A, most of the additive variance estimates using the GRM in the smaller dataset were inflated. The inflation was approximately inversely proportional to the difference between the average diagonal and the average off-diagonal elements of G. The covariances in G (relationships between individuals) were more continuously distributed than those in A (Supplementary Figure S1), which allowed the covariances in G to be larger within families, because the residual variance with G would be greater than that with A when pedigree information is shallow (only sire-side pedigree).
Breeding values estimated for the genotyped animals (n = 778) were on average similar across the three genomic relationship matrices (GOF, G05, and Yang). However, the correlation between breeding values from A and G was quite low (0.28 to 0.45) for the bulls that had genotypes but no phenotypes (Table 4). In contrast, a high correlation among the G matrices was observed for the intramuscular fat trait in Hanwoo cattle. No drastic difference in estimated breeding values was observed among the three GRMs.
Only Yang’s [8] genomic relationship matrix showed a high correlation between pedigree-based EBV and genomic-based EBV in this study (Table 4). Statistics for breeding values obtained with the three GRMs and A for the genotyped steers (n = 706; genotypes, phenotypes, and pedigree) are given in Tables 5 and 6. The means for A, G05, GOF, and Yang were −0.018, −0.026, 0.028, and 0.03, respectively. However, the variance of the breeding values calculated with A was much larger than that with the GRMs (Tables 5, 6), so breeding values calculated with A would span a larger range than those calculated with the GRMs. Statistics on the breeding values computed for the genotyped bulls (n = 72; genotypes only, no phenotypes) also differed between A and the GRMs, with a smaller mean and a larger variance. Estimates of accuracy for the genotyped bulls (n = 72), calculated from the prediction error variance with the different relationship matrices, are given in Table 7. On average, the accuracy of breeding values was 0.13 for A and 0.37, 0.45, and 0.38 for the three GRMs (GOF, G05, and Yang), respectively. This indicates that the accuracy with the GRMs was 1.5 times higher than that with A. In cross-validation (n = 70), that is, with 10 sets of randomly sampled data, the accuracy of breeding values differed by only 2% between A and the GRMs.
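How an accuracy can be derived from the prediction error variance (PEV) may be sketched as follows. The example assumes a simple model with an overall mean as the only fixed effect, known variance components, and the common definition accuracy = sqrt(1 − PEV/σ²a); the function name `gblup_accuracy`, its arguments, and the small `ridge` added to keep the relationship matrix invertible are hypothetical choices for illustration, and the model fitted in the study may have differed.

```python
import numpy as np


def gblup_accuracy(y, g, has_record, var_a, var_e, ridge=1e-3):
    """Solve the mixed model equations for y = mu + u + e.

    y          : phenotypes of the animals with records
    g          : relationship matrix (A or G) for all animals
    has_record : boolean mask marking which animals have a phenotype
    ridge      : small value added to diag(g) so it can be inverted (assumption)
    Returns (EBV, accuracy) for every animal in g.
    """
    y = np.asarray(y, dtype=float)
    has_record = np.asarray(has_record, dtype=bool)
    n = g.shape[0]
    z = np.eye(n)[has_record]            # incidence matrix: records -> animals
    x = np.ones((z.shape[0], 1))         # overall mean as the only fixed effect
    lam = var_e / var_a
    g_inv = np.linalg.inv(g + ridge * np.eye(n))
    lhs = np.block([[x.T @ x, x.T @ z],
                    [z.T @ x, z.T @ z + lam * g_inv]])
    rhs = np.concatenate([x.T @ y, z.T @ y])
    c = np.linalg.inv(lhs)
    ebv = (c @ rhs)[1:]
    pev = np.diag(c)[1:] * var_e         # prediction error variance per animal
    acc = np.sqrt(np.clip(1.0 - pev / var_a, 0.0, None))
    return ebv, acc
```

Under this sketch, animals without records, such as the genotyped bulls, obtain a nonzero accuracy only through their relationships with phenotyped animals, so the choice of relationship matrix directly determines their PEV.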
The accuracy of GEBV prediction relies on many factors, for example the size of the reference population, marker density, heritability of the trait, QTL effects, the extent of linkage disequilibrium (LD) between the markers and the QTL, and the persistence of LD phase between the reference and validation populations [1,6,12]. In terms of LD and effective population size (Ne), Li and Kim [13] reported that the Korean cattle population has a relatively large effective population size (Ne = 600). Therefore, a closer relationship between the reference and validation sets resulted in higher accuracy of GEBV (Table 7). In this study, the GEBV of the genotyped bulls showed higher accuracy than those of the randomly sampled validation set because of their close relatedness.
In Australian cattle populations, Bolorma [14] investigated the accuracy of molecular breeding values estimated from a panel of 14 SNPs against real intramuscular fat (IMF) phenotypes. The molecular breeding values estimated from the 14 SNPs explained 5.6% and 15.6% of the phenotypic and genetic variance of IMF, respectively. This result suggests that SNPs identified in genome-wide association studies cannot capture most of the genetic and phenotypic variation. Therefore, genomic BLUP would be expected to capture more of the variation in polygenic quantitative traits such as IMF.
The heritability estimated using the genomic relationship matrix was 0.55 for IMF, which is similar to that of marbling score in Hanwoo [15]. The accuracy of GEBV for the genotyped bulls was 0.37 (GOF), 0.45 (G05), and 0.38 (Yang) in Hanwoo using the 50K SNP panel, whereas the accuracy of conventional BLUP was 0.13 (A).
A similar study was performed by Forni [8] in a pig population, in which there was no large difference in accuracy between the GRM and A. The accuracy with A and the GRM ranged from 0.791 to 0.799, and the scales of the estimated breeding values and genomic breeding values were also similar. The differences between Forni’s work and this study are likely due to the size of the reference population (three times larger) and the use of a deep pedigree (n = 338,346) in that work. The composition and size of the reference dataset and the methods used to predict breeding values are major factors in the accuracy achieved in breeding strategies [12]. In this study, a very limited reference population (n = 706) and genotyped bulls (n = 72) were used to estimate GEBV; hence, a considerable difference in accuracy was observed between A and G. We found that the three different GRMs were very consistent in their matrix statistics and in the scale of the genomic breeding values.