The effectiveness of genomic selection for milk production traits of Holstein dairy cattle

Objective: This study was conducted to test the efficiency of genomic selection for milk production traits in a Korean Holstein cattle population. Methods: A total of 506,481 milk production records from 293,855 animals (2,090 head with single nucleotide polymorphism information) were used to estimate breeding values by single-step best linear unbiased prediction. Results: The heritability estimates for milk, fat, and protein yields in the first parity were 0.28, 0.26, and 0.23, respectively. As the parity increased, the heritability decreased for all milk production traits. The estimated generation intervals of sires for the production of bulls (LSB) and for the production of cows (LSC) were 7.9 and 8.1 years, respectively, and the estimated generation intervals of dams for the production of bulls (LDB) and cows (LDC) were 4.9 and 4.2 years, respectively. In the overall data set, the reliability of genomic estimated breeding value (GEBV) increased by 9% on average over that of estimated breeding value (EBV), and increased by 7% in cows with test records, about 4% in bulls with progeny records, and 13% in heifers without test records. The difference in reliability between GEBV and EBV was especially large for young bulls, i.e. 17% on average for milk (39% vs 22%), fat (39% vs 22%), and protein (37% vs 22%) yields. When selection for milk yield was based on GEBV, the genetic gain increased by about 7.1% over the gain with EBV in cows with test records, and by 2.9% in bulls with progeny records, while the genetic gain increased by about 24.2% in heifers without test records and by 35% in young bulls without progeny records. Conclusion: More genetic gain can be expected through the use of GEBV than EBV, and genomic selection was more effective in the selection of young bulls and heifers without test records.


INTRODUCTION
With the development of DNA analysis technology and the reduced cost of single nucleotide polymorphism (SNP) chip analysis, considerable research has been conducted on the genomic selection of dairy cattle [1][2][3][4][5][6][7]. Gengler et al [8] proposed an algorithm that could predict genomic information for individuals without genomic information, and VanRaden [9] developed methods to calculate the genomic relationship matrix and to estimate the genomic estimated breeding value (GEBV). Misztal et al [10] later proposed a new algorithm that combines existing pedigree information with genomic information. Recently, Liu et al [11] developed the SNP single-step genomic model and presented methods for estimating the effects of SNPs directly from the analysis model. The use of the GEBV of dairy cattle has been formalized in Germany since August 2010, and young bulls without daughters can be selected as 'proven' bulls, while those that have been selected by GEBV are called 'genomic' bulls [12].
In the USA and Canada, young bulls have been evaluated using genomic information since 2009. For individuals evaluated in these countries without phenotypic information, it has been reported that the GEBV estimated using genomic information was more reliable than the estimated breeding value (EBV) estimated from the conventional best linear unbiased prediction (CBLUP) method [13]. In Japan, a reference population of about 4,000 young bulls was established and genomic information has since been applied in the juvenile selection of young bulls and heifers in the Japanese population [14].
Schaeffer [15] proposed the multiple-trait across country evaluation (MACE) project, in which 35 countries, including the Republic of Korea, are now participating [12]. Sullivan and VanRaden [16] proposed the genomic MACE (GMACE) project, which uses genomic information in the evaluation of cattle and has been in operation since 2014. Korea is currently establishing the reference populations and accumulating the genomic data it needs to participate in GMACE. The purpose of the present study was to test the efficiency of genomic selection for milk production traits in the domestic population of dairy cattle in Korea.

MATERIALS AND METHODS
Single nucleotide polymorphism data
A total of 2,090 head of cattle, consisting of both bulls (507 head) and cows (1,583 head), were genotyped using a Bovine SNP50k chip (Illumina, San Diego, CA, USA), through which 50,908 SNPs were identified. To ensure the quality of the genotypic data obtained, SNPs were excluded from analyses if they were located on the sex chromosomes, lacked chromosomal information, had missing rates higher than 10%, lacked polymorphism (all homozygous or all heterozygous), had a minor allele frequency less than 1%, or had a chi-squared value for Hardy-Weinberg disequilibrium greater than 23.9 (p<1.0×10⁻⁶). Animals with SNP missing rates greater than 10% were also excluded from analyses. After the quality control tests, 2,007 individuals and 41,837 SNPs were used in the following analysis (Supplementary Table S1).
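The SNP-level filters listed above can be sketched in code. This is a minimal illustration, not the pipeline used in the study: the function name `snp_passes_qc`, the 0/1/2 allele-count coding, and the use of `None` for a missing call are assumptions made for the example. The sex-chromosome and unmapped-SNP filters are left out because they depend on a map file. The thresholds are the ones stated in the text.

```python
def snp_passes_qc(genotypes, max_missing=0.10, min_maf=0.01, max_hwe_chi2=23.9):
    """Return True if one SNP passes the call-rate, polymorphism, MAF and HWE filters.

    genotypes: list of allele counts (0, 1 or 2) per animal, None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    # Missing rate higher than the threshold: exclude.
    if n == 0 or 1 - n / len(genotypes) > max_missing:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]
    # No polymorphism: every animal has the same genotype class.
    if max(counts) == n:
        return False
    p = (counts[1] + 2 * counts[2]) / (2 * n)   # frequency of the counted allele
    if min(p, 1 - p) < min_maf:
        return False
    # Chi-squared statistic against Hardy-Weinberg expected genotype counts.
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return chi2 <= max_hwe_chi2
```

The animal-level filter (SNP missing rate greater than 10%) would follow the same pattern, applied across the SNPs of one animal instead of across the animals of one SNP.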

Milk production data
Based on the test records for the dairy cows that calved from 2002 to 2016, individuals were excluded from analyses if their records fell outside the following bounds: 305-day milk yield of 2,500 to 16,000 kg, 305-day fat yield of 70 to 600 kg, or 305-day protein yield of 80 to 500 kg; records from cows beyond the third parity were also excluded. Additionally, data from cows were not used if fewer than 5 records were available within one herd-year-season (HYS), or if calving ages were outside the range of 17 to 31 months in the first parity, 31 to 45 months in the second parity, or 45 to 59 months in the third parity. These exclusions were made to remove potential outliers and records with ambiguous parity. As a result, a total of 506,481 milk production records from 293,855 animals were used for the final analyses (Supplementary Table S2).
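The editing rules above can be expressed as a short filter. The sketch below is illustrative only: the dictionary keys, the function names, the inclusive treatment of the range boundaries, and the choice to count records per HYS after the other filters have been applied are all assumptions, since the text does not specify them.

```python
from collections import Counter

# Bounds stated in the text (kg for yields, months for calving age).
YIELD_BOUNDS = {"milk": (2500, 16000), "fat": (70, 600), "protein": (80, 500)}
CALVING_AGE_BOUNDS = {1: (17, 31), 2: (31, 45), 3: (45, 59)}

def record_is_valid(rec):
    """Check one 305-day record against the parity, calving-age and yield bounds."""
    age_bounds = CALVING_AGE_BOUNDS.get(rec["parity"])
    if age_bounds is None:          # parity beyond the third
        return False
    if not age_bounds[0] <= rec["calving_age"] <= age_bounds[1]:
        return False
    return all(lo <= rec[trait] <= hi for trait, (lo, hi) in YIELD_BOUNDS.items())

def filter_records(records, min_per_hys=5):
    """Keep valid records that belong to an HYS class with enough records."""
    valid = [r for r in records if record_is_valid(r)]
    hys_counts = Counter(r["hys"] for r in valid)
    return [r for r in valid if hys_counts[r["hys"]] >= min_per_hys]
```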

Statistical model
The HYS and parity×month of age at calving (PA) were included as fixed effects in a statistical analysis that used the following model:
$$y_i = X_i b_i + Z_i a_i + e_i$$
where $y_i$ = n×1 vector of observations in the ith parity, $b_i$ = p×1 vector of fixed effects, $a_i$ = q×1 vector of additive random genetic effects, $e_i$ = n×1 vector of residual effects, and $X_i$ (n×p) and $Z_i$ (n×q) were known incidence matrices corresponding to $b_i$ and $a_i$, respectively. The total numbers of HYS levels, PA levels, and animals in the pedigree included in the analysis using this model were 62,287, 75, and 384,406, respectively. Since there were no comparable observed values across parities for each trait, the covariances were set equal to zero in the error variance and covariance matrix shown below:
$$\mathrm{Var}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = \begin{bmatrix} I\sigma^2_{e_1} & 0 & 0 \\ 0 & I\sigma^2_{e_2} & 0 \\ 0 & 0 & I\sigma^2_{e_3} \end{bmatrix}$$
The GEBV was estimated using single-step genomic best linear unbiased prediction, which integrates the genomically derived relationships with pedigree relationships [10]. The mixed model equation (MME) used in further analyses was as follows:
$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G_0^{-1} \otimes H^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$
$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$$
where $y$, $X$, $Z$, $\hat{b}$, and $\hat{a}$ stack the corresponding parity-specific terms, $R$ = the error variance and covariance matrix, $G_0$ = the additive genetic variance and covariance matrix among parities, $A^{-1}$ = the inverse matrix of the numerator relationship matrix of all animals in the pedigree, $G^{-1}$ = the inverse matrix of the genomic relationship matrix, and $A_{22}^{-1}$ = the inverse matrix of the numerator relationship matrix of dairy cattle with genomic information. The reliability ($r^2$) of breeding value was calculated using the prediction error variance (PEV) value by the following formula:
$$r^2 = 1 - \frac{\mathrm{PEV}}{\sigma^2_a}$$
where $\sigma^2_a$ = the additive genetic variance of the trait. Variance components, and EBV and GEBV values were estimated using the BLUPF90 family program [17].
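To make the "genomically derived relationships" concrete, the sketch below builds a genomic relationship matrix following the first method of VanRaden [9]. Whether the study used exactly this form is an assumption; the text only states that genomic and pedigree relationships were combined. The function name, the 0/1/2 genotype coding, and the use of allele frequencies observed in the genotyped sample are also illustrative choices.

```python
def genomic_relationship_matrix(genotypes):
    """Return G = ZZ' / (2 * sum(p * (1 - p))) for rows of 0/1/2 allele counts.

    Assumption: allele frequencies p are taken from the genotyped sample itself.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(n_snps)] for row in genotypes]
    # Cross-products of centred genotypes, scaled to the relationship scale.
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]
```

In a single-step analysis this matrix would be inverted and combined with the pedigree-based inverses; that step needs a linear algebra library and is omitted here.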

RESULTS
Generation interval
In the domestic population of dairy cattle examined, the estimated generation intervals of sires for the production of bulls (LSB) and for the production of cows (LSC) were 7.9 and 8.1 years, respectively, and the estimated generation intervals of dams for the production of bulls (LDB) and cows (LDC) were 4.9 and 4.2 years, respectively (Table 1).
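For context, the four path-specific intervals enter the annual genetic gain through the classical four-path formula of Rendel and Robertson. The formula is not stated in the text and is given here only as background; the path-specific gains in the numerator are left symbolic because they are not reported at this point. With the intervals above, the denominator is 25.1 years, which corresponds to a mean interval of about 6.3 years per path.

```latex
\Delta G_{\text{year}}
  = \frac{\Delta G_{SB} + \Delta G_{SC} + \Delta G_{DB} + \Delta G_{DC}}
         {L_{SB} + L_{SC} + L_{DB} + L_{DC}},
\qquad
L_{SB} + L_{SC} + L_{DB} + L_{DC} = 7.9 + 8.1 + 4.9 + 4.2 = 25.1\ \text{years}
```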

Genetic parameters
The estimated heritabilities of milk yield in the first, second, and third parity were 0.28, 0.20, and 0.16, respectively, while those for fat yield were 0.26, 0.23, and 0.20, and those for protein yield were 0.23, 0.18, and 0.15, respectively (Table 2).

Estimated breeding value and genomic estimated breeding value
The overall regression coefficient estimates between EBV and GEBV for all milk production data analyzed were 0.9075 for milk, 0.9202 for fat, and 0.9012 for protein yields. The regression coefficient estimates between EBV and GEBV for the cows with test records and bulls with progeny records were in the ranges of 0.9210 to 0.9511 and 0.9378 to 0.9519, respectively, while those for bulls without progeny records were the lowest, in the range of 0.5348 to 0.6047 (Supplementary Table S3).

Reliability
When genomic information was used, the reliability increased by 9% on average in the overall data set compared with the method using only pedigree information; with genomic information, the reliability similarly increased by 7% for cows with test records, 4% for bulls with progeny records, 13% for heifers without test records, and 17% for young bulls without progeny records (Table 3).

Genetic gain
When selection was based on genomic information, the genetic gains in milk yield for the cows with test records increased by about 7.1% over the gains achieved with the CBLUP method, and gains similarly increased by about 2.9% for bulls with progeny records, 24.2% for heifers without test records, and 35% for bulls without progeny records (Table 4). Compared with the CBLUP method, the genetic gains in fat yield were increased by about 7.7% in cows with test records and 2.7% in bulls with progeny [...] (Table S5).

DISCUSSION
The heritability estimates for milk, fat, and protein yields in the first parity in this study were 0.28, 0.26, and 0.23, respectively; these results have been reported to Interbull. As the parity increased, the heritability decreased for all milk production traits. The genetic correlation coefficients among parities for milk, fat and protein yields were in the range of 0.85 to 0.99, while the phenotypic correlation coefficients among parities were lower than the genetic correlation coefficients and in the range of 0.42 to 0.52. Similar results to these were previously reported in other countries [18,19].
In the Korean dairy cattle population examined, the estimated LSB, LSC, LDB, and LDC were 7.9, 8.1, 4.9, and 4.2 years, respectively. For the Holstein population in the USA, the generation intervals for LSB and LSC reported in 2010 (before genomic selection was applied) were about 7 years, and those for LDB and LDC were about 4 years. After genomic selection had been applied for 5 years, the generation intervals for LSB, LSC, LDB, and LDC were reported to have decreased to about 3, 5, 3, and 3.6 years, respectively [5]. In Canada, the average generation interval for Holstein cattle was 7 years in the 1970s, and it has since decreased to about 5.8 years. The LDC was 4.2 years and has remained stable around this value [20].
Table 3. Reliabilities on GEBVs (EBVs) and standard deviations of animals with SNP information for milk production traits (kg) in each group
When the regression coefficients between GEBV estimated from the single-step best linear unbiased prediction (ssBLUP) method and EBV estimated from the CBLUP method were compared, the coefficients estimated for young bulls without progeny records were the lowest, and in the range of 0.54 to 0.61. It can thus be concluded that genomic selection was more efficient in heifers and in young bulls without test records [21].
The reliability of GEBV was higher than that of EBV, especially for animals without phenotypic data. These results agreed with those of Forni et al [22], who reported that the accuracy of selection was increased by using genomic information compared with that using only pedigree information. In this study it was found that selection was relatively more accurate in young bulls and heifers without phenotypic data, and the accuracy of selection increased even more when genomic information was used.
The increased accuracy obtained with genomic information might be explained by the fact that the pedigree-based relationship matrix used in the CBLUP method was replaced by a genomic relationship matrix, which was derived from the genotype similarity calculated across all markers and takes Mendelian sampling into account [23,24]. In the overall data set, the reliability of GEBV increased by 9% on average over that of EBV, and increased by 7% in cows with test records, about 4% in bulls with progeny records, and 13% in heifers without test records. The difference in reliability between GEBV and EBV was especially great for young bulls, for which it increased by 17% on average for milk (39% vs 22%), fat (39% vs 22%), and protein (37% vs 22%) yields. Similar results were obtained by VanRaden et al [13], who reported that in the USA's Holstein population combined genomic predictions had realized reliabilities that were 23% greater than reliabilities of parent averages (50% vs 27%) when averaged across all traits. These results suggested that genomic selection was more effective in the selection of young bulls and heifers without test records [21].
In other studies that compared the reliability of genomic and conventional selection methods for the estimation of breeding values, the reliability of GEBV was comparable to that of either the parent average or the pedigree index method [25][26][27]. These types of comparisons are possible since the reliability of genomic selection is very high for the selection of young bulls without test records for their daughters. Compared with conventional selection methods, genomic selection can accelerate the improvement of animals, since the reliability of genomic selection is relatively high and it can be used to reduce generation intervals. Therefore, genomic selection can be efficiently used for the juvenile selection of dairy cattle.
For the selection of proven bulls in Korea, first about 40 head of young bulls are selected and then 2 of them are further selected on the basis of the progeny test records from 20 of their daughter heifers. For the selection of young heifers, pedigree information is used. Therefore, the selection rate of young bulls is 5% and the selection intensity (i) is 2.06, while for young heifers the selection rate is 90% (9 out of 10) and the selection intensity (i) is 0.20 [28].
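The selection intensities quoted above follow from the selected proportion under truncation selection on a normally distributed criterion. The sketch below reproduces that relationship with the Python standard library. It assumes a large (effectively infinite) pool of candidates, so with only 40 young bulls the exact finite-sample value would be slightly lower; the function name is illustrative.

```python
from statistics import NormalDist

def selection_intensity(selected_proportion):
    """Standardised selection differential i = pdf(x) / p for truncation selection."""
    if not 0 < selected_proportion < 1:
        raise ValueError("selected_proportion must be strictly between 0 and 1")
    std_normal = NormalDist()
    # Truncation point x such that a fraction p of candidates lies above it.
    x = std_normal.inv_cdf(1 - selected_proportion)
    return std_normal.pdf(x) / selected_proportion

print(round(selection_intensity(2 / 40), 2))   # young bulls: 2 of 40 selected
print(round(selection_intensity(9 / 10), 3))   # heifers: 9 of 10 selected
```

The two calls correspond to the 5% and 90% selection rates in the text and give values close to the 2.06 and 0.20 cited there.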
When selected for the milk yield using GEBV, the genetic gain increased in this study by about 7.1% over the gain with the EBV method in cows with test records, and by 2.9% in bulls with progeny records, while it increased by about 24.2% in heifers without test records and by 35% in young bulls without progeny records. Therefore, the application of genomic selection to gene introgression can help to speed up the process of introgression of a gene while simultaneously increasing the genetic gain [3].
Since the selection intensity actively used in the domestic population in Korea was applied in the present study, more genetic gains to this population can be expected through the use of genomic selection, since more young bulls and heifers can be selected to improve desirable traits.
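A rough way to see why larger reliability gains translate into larger genetic gains is the standard response equation, ΔG = i × r × σ_a / L, where r is the accuracy (the square root of the reliability). If selection intensity, genetic standard deviation, and generation interval are held fixed, the ratio of gains reduces to the square root of the ratio of reliabilities. Holding those terms fixed is an assumption of this sketch and not necessarily the calculation used in the study; the function name is illustrative.

```python
from math import sqrt

def relative_gain(rel_gebv, rel_ebv):
    """Ratio of expected genetic gains when only the reliability changes."""
    if rel_gebv < 0 or rel_ebv <= 0:
        raise ValueError("reliabilities must be positive")
    # Accuracy is the square root of reliability; i, sigma_a and L cancel out.
    return sqrt(rel_gebv / rel_ebv)

# Reliabilities for milk yield in young bulls, as reported in the text.
extra_gain = relative_gain(0.39, 0.22) - 1
print(f"{extra_gain:.1%}")
```

For the young-bull milk yield reliabilities this gives an increase of roughly one third, which is of the same order as the 35% reported for that group.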
Wiggans et al [26] reported that during the genomic selection of cattle conducted in 2011 in the USA, the reliability for milk yield increased by 34.0% over the parent average, and that for fat and protein yields increased by 33.8% and 24.9%, respectively, indicating that reliabilities can be increased even more than those we obtained in our study. The smaller improvements we found might have been due to the relatively small reference population we used [29,30]. When genomic selection is applied in the selection of dairy cattle in the domestic population, the size of the reference population will increase continuously and potentially result in greater improvements, but this will take time.
Therefore, through the participation of Korea in international genetic performance evaluation programs using genomic information, or by sharing data with overseas dairy cattle populations related to the genetic resources of domestic dairy cattle populations, the improvement of dairy cattle can be facilitated. Also, the efficiency of data utilization should be increased and the introduction of new technologies should be accelerated in Korea to facilitate dairy cattle improvement.