A genome-wide association study on growth traits of Korean commercial pig breeds using Bayesian methods

Objective This study aims to identify the significant regions and candidate genes of growth-related traits (adjusted backfat thickness [ABF], average daily gain [ADG], and days to 90 kg [DAYS90]) in Korean commercial GGP pig (Duroc, Landrace, and Yorkshire) populations. Methods A genome-wide association study (GWAS) was performed using single-nucleotide polymorphism (SNP) markers for imputation to Illumina PorcineSNP60. The BayesB method was applied to calculate thresholds for the significance of SNP markers. The identified windows were considered significant if they explained ≥1% genetic variance. Results A total of 28 window regions were related to genetic growth effects. Bayesian GWAS revealed 28 significant genetic regions including 52 informative SNPs associated with growth traits (ABF, ADG, DAYS90) in Duroc, Landrace, and Yorkshire pigs, with genetic variance ranging from 1.00% to 5.46%. Additionally, 14 candidate genes with previous functional validation were identified for these traits. Conclusion The identified SNPs within these regions hold potential value for future marker-assisted or genomic selection in pig breeding programs. Consequently, they contribute to an improved understanding of genetic architecture and our ability to genetically enhance pigs. SNPs within the identified regions could prove valuable for future marker-assisted or genomic selection in pig breeding programs.


INTRODUCTION
The pig breeding industry is comprised in several layers [1].Due to these structural characteristics of the industry, the maternal and paternal lines are separated and selected, respectively, maximizing hybrid vigor in the crosses.In addition, since breeding pigs are maintained through generations of candidate selection by performance tests, the genes of the selected individuals are transferred to the lower layer and used to produce excellent finishers [2].Therefore, the performance of breeding pigs is a factor that determines competitiveness in the pig industry.Since it takes a long time to move traits from purebreds to finishers, the ripple effect on the improvement of excellent breeding pigs is very large.In domestic Korean GGP farms, the main breeds used are the Yorkshire, Landrace, and Duroc purebreds.
In pig breeding programs, growth traits are economically important indicators of pig production performance and affect farm profits.Adjusted backfat thickness (ABF), average daily gain (ADG), and days to 90 kg body weight (DAYS90) are vital growth-related traits that are used to measure growth rate because of their significant impact on production efficiency [3].Effectively managing and improving these traits can lead to higher production efficiency, better profitability, and a more sustainable and competitive pig farming operation.Pigs that grow faster with optimal backfat thickness and reach market weight more quickly are cost-effective for producers.Therefore, improvements in growth traits are necessary in the pig industry.
With the development of DNA technology that incorporates statistical methodology, it has become possible to study quantitative trait loci (QTL) or candidate genes [4,5].Genomewide association studies (GWAS) are a powerful approach in genetics and genomics to identify genetic variants (single nucleotide polymorphisms [SNPs]) associated with specific phenotypic traits or diseases, and are widely used to detect effective causal alterations and define narrow genomic regions in contact with these alterations [6].In association analysis, DNA markers related to major genes can be identified because a SNP chip that covers the entire genome is densely integrated with SNP markers.
Bayesian methods in GWAS are invaluable when dealing with a large number of SNPs and a relatively small number of individuals [7].In addition, by specifying an appropriate prior distribution for a SNP effect, only the effect affecting the phenotype is fitted to the model, and a small SNP effect is assumed to be zero depending on its π-value (a measure of statistical significance).By setting a threshold for SNP effects, the number of false positives during analysis can be controlled.SNPs with small effects that do not pass this threshold may be considered as not having a significant impact on the phenotype.Therefore, it is advantageous to find the candidate region for GWAS because the inferences are based on joint posterior distribution, which accounts for all unknown parameters [8,9].Furthermore, it is possible to increase the efficiency of genetic improvement in livestock by increasing the accuracy of genetic ability evaluation through additional information on DNA markers with significant effects discovered during GWAS [10].
The aim of this study was to identify significant regions and candidate genes related to ABF, ADG, and DAYS90 in commercial Korean GGP pig (Duroc, Landrace, and Yorkshire) populations.This was achieved by conducting a GWAS using SNP chip platforms and the Bayesian method (BayesB with response variable).

Phenotypic data
In this study, phenotypic data were collected from purebred Duroc (38,941), Landrace (23,451), and Yorkshire (91,146) pigs raised at a GGP farm in Korea between 2005 and 2022.The ADG was obtained from the difference in final weight and initial weight divided by the number of feed intake days.The BFAT was calculated based on the average fat thickness values of the shoulder (on the fourth thoracic vertebrae), mid-back (on the last thoracic vertebrae), and loin (on the last lumbar vertebrae) measured using an amplitude mode (A-mode) ultrasound device (PIGLOG 105).The DAYS90 was estimated according to the suggestions of the Korean Swine Performance Recording Standards (KSPRS), following previously reported procedures [3].
SNP markers without map information, those existing on sex chromosomes, and those with a call rate of 0.95 or less and genomic data that overlapped were excluded from the analysis.In total, 43,861, 52,580, and 52,403 SNPs from Duroc, Landrace, and Yorkshire pigs, respectively, were used for the analysis.The collected genomic data were analyzed by imputation with a medium-density (MD) platform (Illumina Porcine SNP60Kv2) using FImuteV3 [11] for each breed (Table 1).Since the FImputeV3 program does not provide an r-square value, the imputation accuracy was estimated using leave-one-out cross validation.

Estimation of genetic parameters:
To estimate the genetic parameters, a linear animal model including additive genetic effects and fixed effects (sex, contemporary group) was applied to the multivariate model.The likelihood function logarithm (log e L) was used to determine the most appropriate models for the trait.The analysis was performed using the ASREML4.1 program [12], and the analysis model was as follows: y = Xb+Z a u a +e where y is the vector of observation (ABF, ADG, DAYS90), X and Z a are the incidence matrices for fixed and random effects, u a is the vector of the additive genetic effect, and e is the vector of the residual effect.
Response variables: Response variables for genomic analysis were estimated using estimated genetic parameters, variance components, estimated breeding values (EBV), and reliability according to the model for the genetic evaluation of each trait.Genomic analysis typically involves estimating response variables for specific traits.In this case, the response variables are related to growth traits such as ABF, ADG, and DAYS90.The estimates of genetic and residual variances and heritability for these traits are shown in Table 2.
Deregressed estimated breeding values (DEBV) including parent average were re-estimated using the EBV and reliabilities of each individual to help provide more accurate and reliable estimates of an individual's genetic merit, particularly in situations where data is limited or unreliable.In addition, the weighting factor was calculated using the following formula [10]: ase, the response variables are related to growth traits such as ABF, ADG, and genetic and residual variances and heritability for these traits are shown in Table breeding values (DEBV) including parent average were re-estimated using the each individual to help provide more accurate and reliable estimates of an particularly in situations where data is limited or unreliable.In addition, the lated using the following formula [10]: of EBVs, ℎ � is the heritability of each trait,  is the genetic variance ratio ker information assumed to be 0.4 [13].The weighting factor was then applied the heterogeneous variance resulting from the different reliabilities of each ated response variables.
riables were converted into DEBVs, GWAS was performed after removing y (reliability EBVs -reliability of parent average) of 0.1 or less.
where for specific traits.In this case, the response variables are related to growth traits such as ABF, ADG, and DAYS90.The estimates of genetic and residual variances and heritability for these traits are shown in Table 2.
Deregressed estimated breeding values (DEBV) including parent average were re-estimated using the EBV and reliabilities of each individual to help provide more accurate and reliable estimates of an individual's genetic merit, particularly in situations where data is limited or unreliable.In addition, the weighting factor was calculated using the following formula [10]: where  � � is the reliability of EBVs, ℎ � is the heritability of each trait,  is the genetic variance ratio explained by the SNP marker information assumed to be 0.4 [13].The weighting factor was then applied to the GWAS to explain the heterogeneous variance resulting from the different reliabilities of each individual from the re-estimated response variables.
Once the response variables were converted into DEBVs, GWAS was performed after removing individuals with a reliability (reliability EBVs -reliability of parent average) of 0.1 or less.
Bayesian Method for GWAS: To estimate the effects of SNP markers for GWAS analysis, BayesB with is the reliability of EBVs, h 2 is the heritability of each trait, c is the genetic variance ratio explained by the SNP marker information assumed to be 0.4 [13].The weighting factor was then applied to the GWAS to explain the heterogeneous variance resulting from the different reliabilities of each individual from the re-estimated response variables.
Once the response variables were converted into DEBVs, GWAS was performed after removing individuals with a reliability (reliability EBVs -reliability of parent average) of 0.1 or less.

Bayesian Method for GWAS:
To estimate the effects of SNP markers for GWAS analysis, BayesB with a fixed π value of 0.99 was used [14].This analysis usually employs a highdensity marker dataset which contains an extensive number of genetic loci as well as multiple samples for each locus.Therefore, a more targeted analysis was performed by focusing on identifying candidate gene regions.Parameters and effects were obtained using Gibbs sampling for a total number of 50,000 Markov chain Monte Carlo (MCMC) iterations and a sampling interval (thinning) of 5.The first 5,000 MCMC iterations were discarded for burn-in before estimating posterior means of marker effects and variances.As autocorrelation occurs within the Markov chain, it was carried out to prevent the occurrence of bias in advance, and the response variable was estimated as follows: where  � is the response variable (DEBVs or EBVs),  is the population mean,  is the number of markers, 153  is the allelic state (0, 1, 2) at marker j in individual i,  is the random substitution effect for marker j, 154 where y i is the response variable (DEBVs or EBVs), μ is the population mean, k is the number of markers, Z ij is the allelic state (0, 1, 2) at marker j in individual i, u j is the random substitution effect for marker j, and δ j indicates the presence or absence of a marker in the model (0 or 1).
In the case of the threshold, to search for a significant 1 Mb region through GWAS analysis, a region with ≥1% additive genetic variance explanatory power among the total genetic variance that can be explained by SNP markers is defined as a region with a significant effect.The model frequency statistic was estimated and used to select SNP markers with significant effects on each trait.A total of 2,451 1 Mb regions were included in the total region, and the additive dielectric dispersion ratio of each region was approximately 0.041% (100%/2,451).
The BayesB method uses the t-distribution as a prior distribution for SNP marker effects, and is sampled from the prior assumption that each SNP marker has a different variance.This process was performed using the GenSel4R program [15].
Therefore, to identify the significant 1 Mb windows, including SNPs, GWAS was performed with three traits (ABF, ADG, and DAYS90) based on BayesB.A total of 28 window regions were found, including 52 informative SNPs based on their genetic effects (Tables 3 to 5; Figure 1).
In Yorkshire pigs, the most significant window explained 1.16% of the additive genetic variance and was captured on SSC17 at 13 Mb (ASGA0089601).The NFE2L3 gene adjacent to H3GA0051040, which was the most significant region in Landrace pigs, is required for myoblast differentiation and fusion in cellular processes, autophagy, and endoplasmic reticulum [17], and it is also associated with lipids in humans [18] and pigs [19].
The MRPS22 gene, which located on SSC13 at the 80 Mb region, impairs mitochondrial mRNA translation and lowers coupling efficiency and energy storage, thereby altering the energy balance with potential consequences on lipid accumulation and adiposity [20].Moreover, it has been suggested as a new susceptibility gene for human obesity [21,22].
All regions associated with ABF in Duroc pigs were found on SSC6.Although there have been no studies related to ABF, many studies have searched for the QTL in this SSC6 region associated with ABF [23][24][25].In Landrace pigs, the SLC23A2 gene on SSC17 at the 13 Mb region encodes sodium-coupled vitamin C transporter 2 (SVCT2), whose expression may be regulated by insulin-like growth factor signaling [26].Similarly, its association with ABF has not been studied.Thus, research using a high-density genotyping chip is needed to identify the region.

Average daily gain
For ADG, 25 informative SNPs were identified in 10 significant window regions (≥1% genetic variance) (Table 5).These regions explained 1.00% to 5.46% of the total genetic variance for ADG, and 31 genes were annotated in these genomic regions.Candidate regions associated with ADG were identified in SSCs 1, 3, 5, 6, 10, 13, 15, and 17 in Duroc pigs, SSCs 10 In Landrace pigs, the most significant 1 Mb window region explained 5.46% of the additive genetic variance and was captured on SSC10 at the 11 Mb region (DRGA0010301).In Duroc pigs, the most significant window explained 3.53% of the genetic variance and was captured on SSC3 at the 123 Mb region (MARC0091117).In Yorkshire pigs, the most significant window explained 1.46% of the genetic variance and was captured on SSC5 at the 95 Mb region (ASGA0026863).
The FAM177B gene adjacent to SSC10 at the 11 Mb re-gion, which was the region most significantly related to ADG, is related to body size in Hulun Buir sheep [27].The TRIB2 gene, which is close to SSC3 at the 123 Mb region, the most significant region in Duroc pigs, is related to growth traits in Thoroughbred pigs [28].The RBMS3 gene, which is close to SSC13 at the 15 Mb region, is related to growth traits in cattle [29].
In Yorkshire pigs, the MGAT4C gene close to SSC5 at the 95 Mb region coincided with the region discovered in a study on Nero Siciliano pigs [30], which is also consistent with the results of a study on Italian Large White pigs [31].The RNF152 gene adjacent to SSC1 at the 159 Mb region coincided with the region discovered in a study on ADG in Landrace×Large Whites [32] and that discovered in a study on ABF and DAYS100 in Duroc pigs. Lee et al [16] examined ABF, DAYS90, loin muscle area, and lean percentage in Duroc pigs.
The FERMT1 gene adjacent to SSC17 at the 15 Mb region positively regulates the transforming growth factor beta (TGFbeta) receptor signaling pathway in pigs and is significantly associated with carcass length.The BMP2 gene is involved in the TGF-beta signaling pathway, playing a role in bone and cartilage development, and has been proposed as a strong candidate gene for carcass length [33].In addition, it has been associated with body weight and body conformation traits in pigs [27].

Days to 90 kg
For DAYS90, 25 SNPs were identified in eight significant windows (≥1% genetic variance) (Table 6).The candidate regions associated with DAYS90 were identified at SSCs 1, 5, 6, 10, 12, 15, and 16 in Duroc pigs, SSC10 in Landrace pigs, and SSCs 5 and 17 in Yorkshire pigs.These regions explained 1.02% to 5.07% of the total genetic variance, and 28 genes were annotated in these genomic regions.
In Landrace pigs, the most significant 1 Mb window region explained 5.07% of the additive genetic variance and was captured on SSC10 at the 11 Mb region (DRGA0010301).
In Duroc pigs, the most significant region, explaining 4.86% of the variance, was captured on SSC6 at the 80 Mb region (H3GA0018314).In Yorkshire pigs, the most significant region explained 1.54% and was captured on SSC5 at the 71 Mb region (INRA0019895, ASGA0026241, ASGA0026236, and ALGA0032782).
The LACTBL1 gene is located on SSC6 at the 80 Mb region, the most significant region in Duroc pigs, and has been associated with weight among the British population [34].
The FAIM2 gene, which is located on SSC5 at the 15 Mb region, is correlated with muscle mass in cattle and it is a candidate gene for growth and carcass traits [35].It is also closely related to obesity in humans, and many studies have been conducted with the MC4R gene [36,37].NCKAP5L, located at the same position, has also been reported as a candidate gene for daily weight gain, which is a growth trait in cattle [38,39].
PIK3R1, located on SSC6 at the 45 Mb region, is directly related to lipid metabolism [40] and is involved in skeletal muscle differentiation and proliferation [41].In addition, Chen et al [42] reported that it regulates feed intake and fat deposition in Chinese Laiwu pigs, while another study reported a QTL closely related to backfat thickness in pigs [43,44].
For the identified significant regions, there were six overlapping windows for ADG and DAYS90, which explained  the different proportions of genetic variance in these two traits.For complex quantitative traits, it was assumed that the linear effects of genes fit the average of the traits completely.However, in practice, the effects of genes are not always linear for traits, and the nonlinear assumption is more appropriate, which means that genes contribute differently and QTL has pleiotropic effects between traits.The region with the largest explained genetic variance for ADG and DAYS90, located on SSC10 at the 57 Mb region, seemingly had pleiotropic effects on growth traits in pigs.

CONCLUSION
We identified 28 significant window regions associated with three growth traits (ABF, ADG, and DAYS90) in Duroc, Landrace, and Yorkshire pig populations using the Bayesian GWAS method.The genetic variance of the identified windows varied from 1.00% to 5.46%.Furthermore, 14 genes with related functional validation in previous studies were identified as candidate genes for growth traits.Such a full use of phenotypic and genotypic data and genealogical information will further advance our understanding of genetic architecture and accelerate the genetic improvement of economically important traits in pigs.
In addition, the SNPs within the identified regions may be useful for marker-assisted selection or genomic selection in future pig breeding programs.

Table 1 .
Basic statistics of single-nucleotide polymorphism data set SNP, single-nucleotide polymorphism; QC, quality control.

Table 2 .
Variance components and heritabilities for adjusted backfat thickness, average dairy gain, days to 90 kg by pig breed

Table 3 .
Frequencies, means and standard deviations for adjusted backfat thickness, average dairy gain, days to 90 kg by pig breed ABF, adjusted backfat thickness; ADG, average daily gain; DAYS90, days to 90 kg.www.animbiosci.org811

Table 5 .
Summary of informative single-nucleotide polymorphisms in the significant 1-Mb windows for average daily gain by breed