Association of UDP-galactose-4-epimerase with milk protein concentration in the Chinese Holstein population

Objective An initial RNA-Sequencing study revealed that UDP-galactose-4-epimerase (GALE) was one of the most promising candidates for milk protein concentration in Chinese Holstein cattle. This enzyme catalyzes the interconversion of UDP-galactose and UDP-glucose, an important step in galactose catabolism. To further validate the genetic effect of GALE on milk protein traits, genetic variations were identified, and genotypes-phenotypes associations were performed. Methods The entire coding region and the 5′-regulatory region (5′-UTR) of GALE were re-sequenced using pooled DNA of 17 unrelated sires. Association studies for five milk production traits were performed using a mixed linear animal model with a population encompassing 1,027 Chinese Holstein cows. Results A total of three variants in GALE were identified, including two novel variants (g.2114 A>G and g.2037 G>A) in the 5′-UTR and one previously reported variant (g.3836 G>C) in an intron. All three single nucleotide polymorphisms (SNPs) were associated with milk yield (p<0.0001), fat yield (p = 0.0006 to <0.0001), protein yield (p = 0.0232 to <0.0001) and protein percentage (p<0.0001), while no significant associations were detected between the SNPs and fat percentage. A strong linkage disequilibrium (D’ = 0.96 to 1.00) was observed among all three SNPs, and a 5 Kb haplotype block involving three main haplotypes with GAG, AGC, and AGG was formed. The results of haplotype association analyses were consistent with the results of single locus association analysis (p<0.0001). The phenotypic variance ratio above 3.00% was observed for milk protein yield that was explained by SNP-g.3836G >C. Conclusion Overall, our findings provided new insights into the polymorphic variations in bovine GALE gene and their associations with milk protein concentration. The data indicate their potential uses for marker-assisted breeding or genetic selection schemes.


INTRODUCTION
Milk proteins are important nutrients and milk protein concentration serves as valuable index for milk quality. Dairy industry concerns have driven increasing efforts to improve milk protein concentration [1]. With the development of genomics, bioinformatics and statistical genetics, a single gene or chromosome segments affecting important economic traits can be analyzed [2]. It is possible to improve milk protein concentration through marker assisted selection (MAS) or genomic selection schemes, the challenge, however, is to identify key genes or causal variations affecting milk protein traits [3][4][5][6]. Our previously published research has identified that UDP-galactose-4-epimerase (GALE) was a strong candidate gene for milk protein traits due to its large differential expression (Log 2 fold-change = -0.74, q-value = 4.41E-03) in mammary tissues of cows with high and low milk protein percentage [7]. In addition, strong interactions were also observed between GALE and several other genes such as lactalbumin, alpha (LALBA), beta 1,4galactosyltransferase, polypeptide 1 (B4GALT1), and UDPglucose 6-dehydrogenase (UGDH) that play important roles in milk composition synthesis (Supplementary Figure S1) [8][9][10]. Therefore, based on the biological function and transcriptional effects on milk protein traits, the current study mined to screen the full-length coding regions of the GALE gene for single nucleotide polymorphisms (SNPs) and to evaluate the genetic effects of polymorphisms on milk production traits in a large Chinese Holstein population.

Ethics statement
Animal handling and sample collection procedures were performed in accordance with protocols approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University (Permit Number: DK996).

Genetic sampling
A total of 1,027 Chinese Holstein cows and their 17 corresponding sires were considered as the study population. Cows were selected from 17 farms in the Beijing Sanyuan Lvhe Dairy Farm Center, where routine standard performance test, i.e. Dairy Herd Improvement system (DHI) has been implemented since 1999. The phenotype observations for all individuals for five milk production traits (305 d milk yield, 305 d protein yield, 305 d fat yield, average 305 d protein percentage and average 305 d fat percentage) were collected for subsequent analyses via the complete DHI data from the Chinese dairy cattle population.

Single nucleotide polymorphism identification and genotyping
Blood samples were collected from 1,027 cows via coccygeal vein to isolate genomic DNA using DP (318) Blood DNA kits (TianGen, Beijing, China). Genomic DNA were also isolated from frozen semen of 17 sires using standard phenol-chloroform procedures and were pooled with 50 ng/μL DNA of each individual to identify variants of the GALE gene. GALE gene is 4,591 bp in length located at BTA2, contains 11 exons and 10 introns encoding 348 amino acids. All exons and their adjacent intronic sequences were targeted for selective amplification by polymerase chain reaction (PCR). A total of 12 pairs of nucleotide primers (Supplementary Table S1) targeting the regions of interest were designed using Primer3 based on the genomic sequence of the bovine GALE gene referring to Bos_taurus_UMD_3.1 assembly (NCBI reference sequence: AC_000159.1).
The PCR amplification was performed in a total volume of 25 μL containing 50 to 100 ng of genomic DNA, 0.5 μL of each primer, 2.5 mM of dNTP mix, 2.5 μL of 10× PCR buffer, and 1 U of Taq DNA Polymerase (Takara Biotechnology Co., Ltd, Dalian, China). The PCR reaction conditions included a pre-denaturation at 95°C for 5 min, followed by 34 cycles of 94°C for 30 s, annealing from 46°C to 56°C for 30 s, 72°C for 30 s, and a final extension at 72°C for 10 min. The PCR products were directly sequenced using the ABI3730xl DNA analyzer (Applied Biosystems, Foster City, CA, USA), and the sequences aligned to the bovine reference sequence (UMD3.1.1) using BLAST (http://blast.ncbi.nlm.nih.gov/ Blast.cgi) to identify potential SNPs.
The details of novel SNPs that were identified in the present study were submitted to dbSNP (http://www.ncbi.nlm. nih.gov/SNP/) and are publicly available (accession numbers from ss1996900612 to ss1996900613). All identified SNPs for subsequent genotyping in the 1,027 Chinese Holstein cows were performed with matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS, Squenom MassARRAY, Bioyong Technologies Inc. Hong Kong, China) assay.

Linkage disequilibrium analysis
Haploview [11] was used to measure pairwise linkage disequilibrium (LD) for all identified SNPs within GALE. Briefly, missing genotypes were first imputed for each individual using the Beagle3.2 software program [12]. Subsequently, the LD blocks were generated with the subject genotype data using the LD coefficient (D') [13]. A haplotype with a frequency >5% was considered as a distinguishable haplotype, while the haplotypes with relative frequency <5% were pooled into a single group. Haplotype blocks within these SNPs were used to test their associations with milk production traits.

Association analyses
A goodness-of-fit test (Chi-square) was applied to compare the numbers of expected and observed genotypes to test Hardy-Weinberg equilibrium for each identified SNP, 0.05 as the significant threshold value. Association analyses were conducted to estimate the effects of GALE variants on milk production traits based on both single SNP genotypes and the haplotype blocks. The effects of single SNP or haplotypes in GALE on the five milk production traits were analyzed with the mixed procedure of SAS9.3 software (SAS Institute Inc., Cary, NC, USA) using the following mixed linear animal model: y ijklmn = μ+F i +YS j +P k +b×M+G l +α m +e ijklmn where, y ijklmn was the phenotypic value of each trait for each cow (n = 1,027 for each trait); μ was the overall mean; F i was www.ajas.info

1727
Li et al (2020) Asian-Australas J Anim Sci 33:1725-1731 the fixed effect of farm; YS j was the fixed effect of year-season; P k was the fixed effect of parity; M was the covariate effect of calving month; b was the regression coefficient of M; G l was the fixed effect corresponding to the genotype of polymorphisms or haplotype (genotypes of SNPs were modelled as 0-1-2, haplotypes were modelled as 0-5); α m was the random polygenic effect, distributed as N (0, Aσ a 2 ), with the additive genetic relationship matrix A and the additive genetic variance σ a 2 ; A-matrix was constructed by tracing the pedigree back to three generations of 2,312 involved individuals; and e ijklmn was the random residual, distributed as N (0, Iσ e 2 ), with identity matrix I and residual error variance σ e 2 .
For single SNP and haplotype analyses, the Bonferroni method was adopted to correct for multiple-testing according to the number of SNP loci or haplotype blocks. Associations were considered as significant if a raw p value <0.05/N, where N was the number of SNP loci or haplotype blocks tested in analyses. The additive (a), dominance (d), and allele substitution (α) effects were estimated using the equation from Falconer and Mackay [14], i.e. a = (AA-BB)⁄2, d = AB-(AA+ BB)⁄2, and α = a+d(q-p), where AA and BB represented the two homozygous genotypes, AB was heterozygous genotype, and p and q were the allele frequencies of corresponding loci.
The effect of a SNP on a specific trait was measured as the proportion of phenotypic variance of the trait explained by the SNP. The proportion of variance explained by a SNP was calculated as 6 was the fixed effect of parity; M was the covariate effect of calving month; b was the regression coefficient of M; was the fixed effect corresponding to the genotype of polymorphisms or haplotype (genotypes of SNPs were modelled as 0-1-2, haplotypes were modelled as 0-5); α m was the random polygenic effect, distributed as N (0, Aσa 2 ), with the additive genetic relationship matrix A and the additive genetic variance σa 2 ; A-matrix was constructed by tracing the pedigree back to three generations of 2,312 involved individuals; and e ijklmn was the random residual, distributed as N (0, Iσe 2 ), with identity matrix I and residual error variance σe 2 .
For single SNP and haplotype analyses, the Bonferroni method was adopted to correct for

RESULTS
A total of three SNPs were discovered in the GALE gene of which two (g.2114A>G and , where p was the allele frequency of SNP, α was the average effect of gene substitution calculated based on the linear mixed model, and ree SNPs were discovered in the GALE gene of which two (g.2114A>G and was the estimate of phenotypic variance using the complete DHI data of the Chinese dairy cattle population.

RESULTS
A total of three SNPs were discovered in the GALE gene of which two (g.2114A>G and g.2037G>A) in the 5'-UTR are novel (ss1996900612 and ss1996900613). The other SNP (g.3836G>C) previously reported was located in the intronic region (rs211659075) ( Table 1). All three SNPs were in Hardy-Weinberg equilibrium (p>0.05, Table 2).
The association results between the identified SNPs in GALE and five milk production traits are presented in Table  3. All three SNPs (g.2114A>G, g.2037G>A and g.3836G>C) were highly associated with milk yield (p<0.0001), fat yield (p = 0.0006 to <0.0001), protein yield (p = 0.0232 to <0.0001), and protein percentage (p<0.0001). No significant associations were observed between the SNPs and milk fat percentage. Greater than 1% of phenotypic variation accounted for by the three SNPs was detected in six significant SNP-trait pairs. Within these pairs, the pairs of g.3836G>C and milk yield, g.3836G>C and milk protein yield and g.3836G>C and milk protein percentage accounted for up to 2.61%, 3.00%, and 1.08% of phenotypic variation, respectively. In addition, significant additive effects, dominant effects and allele substitution effects were observed for the significant related traits ( Table  4).

DISCUSSION
Polymorphisms located in the promoter of a gene may affect transcription by altering transcription factor binding sites or RNA stability [15], which indicated the importance of the two novel SNPs (g.2114A>G and g.2037G>A) identified in the 5'-UTR of GALE. The intronic SNP (g.3836G>C) may have a potential regulatory effect on gene expression, regulation, transcription and mRNA splicing, although it does not hold a sequence encoding a protein [16][17][18][19]. The greater expression of GALE in mammary tissues of cows with high versus low milk protein percentage [7] agreed with such effect. To our knowledge, this was the first evidence showing significant associations of the GALE gene with milk protein traits in dairy cattle. From a statistical standpoint, the single SNP association analysis was less powerful than multiple SNPs analysis due to the lack of simultaneous use of multiple SNPs information [20,21]. Thus, the haplotype-based association analysis was further performed to investigate the association of GALE variants with milk production traits in the present study. We observed that the three identified SNPs were associated with milk yield and milk protein traits, which was further con-  Protein GALE is UDP-galactose-4-epimerase, which catalyzes the interconversion of UDP-galactose and UDP-glucose in the final step of the Leloir pathway [22,23], and catalyzes the epimerization of UDP-N-acetylglucosamine to UDP-N-acetylgalactosamine [24,25]. GALE plays critical roles in dietary galactose metabolism, endogenous galactose production, and glycoprotein and glycolipid biosynthesis [26,27]. String interaction network (https://string-db.org/network/ 9606.ENSP00000363621) revealed that GALE protein interacts with LALBA, UDP-Gal: betaGlcNAc B4GALT1, and UGDH. Among this list, LALBA is the major component of milk protein and a subunit of lactose synthase. As one of the well-studied glycosyltransferases, B4GALT1 is responsible for the synthesis of complex-type N-liked oligosaccharides in many glycoproteins [8]. The association of polymorphisms of the B4GALT1 with milk production traits in Holstein cows has been reported in previously published research [9]. The UGDH gene was suggested to be associated with milk yield and milk composition [10]. Taken altogether, the influences of GALE on milk production and composition are likely due to the interaction of GALE with those known genes.

CONCLUSION
Results in the present study demonstrated the significant genetic effects of GALE on milk protein traits, which is in close agreement with our previous RNA-Seq study. Results also confirmed the phenotypes of milk protein were directly af-  Figure 1.   1) < 0.0001 < 0.0001 0.0807 < 0.0001 < 0.0001 GALE, UDP-galactose-4-epimerase; LSM, least square mean; SE, standard error. 1) p-value refers to the results of association analysis between each haplotype and milk production traits. Different letter superscripts indicate significant differences among the haplotypes (p < 0.01). H1, H2, and H3 represented the types of haplotypes, of these, H1 = GAG, H2 = AGC, H3 = AGG.