Animals, management, and feeding
The dataset comprised 131 bulls with phenotypic records for VOL (n = 13,533), NS (n = 12,773), and MOT (n = 12,660) obtained from the Semen Production and Dairy Genetic Evaluation Center of the Dairy Farming Promotion Organization of Thailand (DPO). These bulls were the offspring of 62 sires and 112 dams. The sires of the bulls were associated with the Semen Production and Dairy Genetic Evaluation Center of the DPO, while the dams were from 87 dairy farms situated across the Central, Northeastern, Northern, and Southern regions of Thailand. The bull population consisted of purebred Holstein (H) individuals as well as H crossbred animals with various fractions of Brahman, Brown Swiss, Jersey, Red Dane, Red Sindhi, Sahiwal, and Thai Native [
1]. Crossbred bulls accounted for 95% of the population, with most of them having a predominant H fraction and smaller fractions of other breeds. Further, all bulls in the population had H fractions ranging from 62.5% to 100%, with an average of 92%. The pedigree file encompassed a total of 304 animals, including bulls, sires, and dams.
The bulls were housed in open-barn stalls throughout the study period, except during semen collection, and were provided unrestricted access to mineral supplements, water, and fresh roughage. Concentrate feed (Charoen Pokphand Foods, Bangkok, Thailand) containing 16% crude protein, 2% fat, 14% fiber, and 13% moisture was administered to the bulls once daily. Fresh roughage comprised Guinea grass (Panicum maximum), Ruzi grass (Brachiaria ruziziensis), Napier grass (Pennisetum purpureum), and Para grass (Brachiaria mutica) harvested and transported to the bull stalls. Additionally, Guinea and Ruzi grass hay and silage were provided to the bulls during the dry season (November to June) when fresh grass was scarce.
Genotypic data
The genotypic data came from semen samples collected from 61 out of the 131 bulls with available phenotypic records. Additionally, blood samples were obtained from 11 dams of sires [
14]. Genomic DNA was extracted from frozen semen samples using the GenElute Mammalian Genomic DNA Miniprep kit (Sigma, Ronkonkoma, NY, USA) and from blood samples using the MasterPure DNA Purification kit (Epicentre Biotechnologies, Madison, WI, USA). The quality of the DNA samples was assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA). DNA samples with a concentration of 15 ng/μL and an absorbance ratio of 1.8 at 260/280 nm were sent to GeneSeek for genotyping using GeneSeek Genomic Profiler (GGP) chips (GeneSeek Inc., Lincoln, NE, USA). Specifically, the 61 bulls were genotyped using GGP80K (76,694 SNP), whereas the 11 dams were genotyped using GGP9K (8,590 SNP). Dams genotyped with GGP9K were imputed to GGP80K [
8] using version 2.2 of the FImpute program [
15]. This imputation step utilized 2,661 animals genotyped with GGP9K (1,412 cows), GGP20K (570 cows), and GGP26K (540 cows), and GGP80K (89 sires and 50 cows) in a previous research project [
14]. SNP genotypes from autosomes and the X chromosome with either a call rate lower than 0.90 or a minor allele frequency below 0.05 were excluded from the analysis. Thus, the edited genotype file had 76,519 actual and imputed SNP markers per animal. Due to limitations of the imputation program utilized for transitioning from low-density chips (GGP9K, GGP20K, and GGP26K) to a high-density chip (GGP80K), SNP markers from the Y chromosome were excluded during the construction of the input file.
Figure 1 provides an overview of the total number of SNP markers per chromosome.
Genome-wide association analysis
The selection of five distinct window sizes (1, 10, 30, 50, and 100 SNPs) for both overlapping and non-overlapping windows, defined in terms of the values of SNP variances for all three semen traits, was based on the SNP variance values obtained through a GWAS (Wang et al [
16]). The estimation of SNP variances was performed using a single-step genomic best linear unbiased prediction (GBLUP) procedure [
17] implemented in program POSTGSF90 from the BLUPF90 family of programs [
18].
A 3-trait genomic-polygenic repeatability model was employed to estimate the variance and covariance components among semen traits using restricted maximum likelihood. The estimation was performed with an average information algorithm implemented in the AIREMLF90 program [
19]. Fixed effects in the model comprised contemporary group (year and month of semen collection), ejaculate order (first or second), age of the bull (months), ambient temperature (°C), and heterosis calculated as a function of heterozygosity. Heterozygosity was determined based on expected Holstein fraction in the sire × expected O fraction in the dam + expected O fraction in the sire × expected Holstein fraction in the dam, where O = other breeds (Brahman, Brown Swiss, Red Danish, Jersey, Red Sindhi, Sahiwal, and Thai Native [
1]). Random effects were animal additive genetic, permanent environment, and residual. The mean of the animal additive genetic effects, permanent environment effects, and residual was assumed to be zero. The model in matrix notation can be represented as follows:
where y represents the vector of phenotypic records (VOL, NS, and MOT), b was a vector of fixed effects, aa was a vector of random animal additive genetic effects, pp was a vector of random permanent environmental effects, and e was a vector of random residuals. The incidence matrices X, Za, and Zp related records to fixed effects in vector b, to random animal additive genetic effects in vector aa, and to random permanent environmental effects in vector pp, respectively. The mean of the animal additive genetic effects, permanent environment effects, and residual was assumed to be zero.
The variance-covariance matrix among animal additive genetic effects in the 3-trait genomic-polygenic repeatability model was defined as follows:
where H represented the genomic-polygenic relationship matrix, Va denoted a 3×3 matrix of additive genetic variances and covariances among VOL, NS, and MOT, and ⊗ was the Kronecker product. I was the identity matrix and Vpp represented a 3×3 matrix of permanent environment variances and covariances among three semen traits. Similarly, Ve was a 3×3 matrix of residual variances and covariances among semen traits.
The genomic-polygenic relationship matrix
H [
20] was calculated as follows:
where
A11 represents the matrix of additive genetic relationships among non-genotyped animals,
A12 denotes the matrix of additive relationships between non-genotyped and genotyped animals,
A22−1 is the inverse of the matrix of additive relationships among genotyped animals, and
G22 designates the matrix of genomic relationships among genotyped animals [
20]. The matrix
G22 was constructed as follows:
where
pj is the frequency of allele 2 in locus j, and
zij was defined as follows: (0–2
pj) for the homozygous genotype 11 in locus j, (1–2
pj) for the heterozygous genotypes 12 or 21 in locus j, and (2–2
pj) for the homozygous genotype 22 in locus j [
21,
17]. The matrix
G22 was constructed using default weight factors (tau = 1, alpha = 0.95, beta = 0.05, gamma = 0, delta = 0, and omega = 1) and scaled using default restrictions (mean of diagonal elements of
G22 = mean of diagonal elements of
A22 and mean of off-diagonal elements of
G22 = mean of off-diagonal elements of
A22) as defined by the PREGSF90 program of the BLUPF90 Family of Programs [
18]. The proportion of the additive genetic variance explained by overlapping and non-overlapping SNP windows for SW1, SW10, SW30, SW50, and SW100 was determined using program POSTGSF90. The percentage of additive genetic variance explained by each SNP window was computed using the following formula:
where Var(ai) represents the additive genetic variance associated with the ith SNP window. For overlapping windows, Var(ai) was equal to the sum of the variances of all the contiguous SNP markers in the ith SNP window. For non-overlapping windows, Var(ai) was equal to the sum of the variances of all the SNP markers the ith SNP window. The term σa2 represents the total additive genetic variance in the population. By applying this formula, we quantified the proportion of the additive genetic variance explained by each SNP window as a percentage of the total additive genetic variance. This measure provides information on the relative contribution of each SNP window to the total additive genetic variance of each semen trait in this study.
Identification of SNP markers, genes, and pathway analysis
SNP markers that explained a minimum of 0.001% of the additive genetic variance for the three semen traits were selected to identify genes associated with VOL, NS, and MOT in both overlapping and non-overlapping windows. Thus, the minimum percentage of the additive genetic variance explained per overlapping and non-overlapping window was 0.001% for SW1, 0.01% for SW10, 0.03% for SW30, 0.05% for SW50, and 0.1% for SW100. The base pair (bp) positions of these SNP markers were used to locate genes or nearby genes in the UMD Bos taurus 3.1 assembly of the bovine genome database from the National Center for Biotechnology Information (NCBI) with R package Map2NCBI [
22]. The pathway analysis included SNP markers that were located inside genes, within 2,500 bp, between 2,500 bp and 5,000 bp, between 5,000 bp and 25,000 bp, and more than 25,000 bp away from genes in the NCBI database.
Genes identified by the five overlapping and non-overlapping window sizes were used to identify biological pathways related to VOL, NS, and MOT. The Kyoto encyclopedia of genes and genomes database (KEGG) and the ClueGo plugin of Cytoscape [
23] were employed for this analysis. The statistical test used for pathway analysis was a two-sided hypergeometric test, and multiple testing was corrected using the Bonferroni step-down procedure [
24]. Biological pathways were considered significantly enriched or depleted for those traits if their p-values were lower than 0.05.