Genome-wide association study to reveal new candidate genes using single-step approaches for productive traits of Yorkshire pig in Korea

Objective The objective is to identify genomic regions and candidate genes associated with age to 105 kg (AGE), average daily gain (ADG), backfat thickness (BF), and eye muscle area (EMA) in Yorkshire pig. Methods This study used a total of 104,380 records and 11,854 single nucleotide polymorphism (SNP) data obtained from Illumina porcine 60K chip. The estimated genomic breeding values (GEBVs) and SNP effects were estimated by single-step genomic best linear unbiased prediction (ssGBLUP). Results The heritabilities of AGE, ADG, BF, and EMA were 0.50, 0.49, 0.49, and 0.23, respectively. We identified significant SNP markers surpassing the Bonferroni correction threshold (1.68×10−6), with a total of 9 markers associated with both AGE and ADG, and 4 markers associated with BF and EMA. Genome-wide association study (GWAS) analyses revealed notable chromosomal regions linked to AGE and ADG on Sus scrofa chromosome (SSC) 1, 6, 8, and 16; BF on SSC 2, 5, and 8; and EMA on SSC 1. Additionally, we observed strong linkage disequilibrium on SSC 1. Finally, we performed enrichment analyses using gene ontology and Kyoto encyclopedia of genes and genomes (KEGG), which revealed significant enrichments in eight biological processes, one cellular component, one molecular function, and one KEGG pathway. Conclusion The identified SNP markers for productive traits are expected to provide valuable information for genetic improvement as an understanding of their expression.


INTRODUCTION
Over time, genetic improvement in pig breeding for economically important traits has been achieved through continuous research efforts.Productive traits, including average daily gain (ADG), age to 105 kg (AGE), backfat thickness (BF), and eye muscle area (EMA) exhibit moderate to high heritability, enabling their enhancement through selective breeding strategies [1].ADG and AGE directly impact pig growth [2,3], while BF plays a crucial role in the reproductive performance of Landrace and Yorkshire sows [4], thereby influencing the breeding potential of the maternal line.
According to the Korean Swine Performance Recording Standards (KSPRS) established by the Ministry of Agriculture, Food and Rural Affairs (MAFRA), performance testing is conducted within the weight range of 70 to 110 kg, with the current endpoint set at 90 kg.To assess growth trait performance, the number of days required to reach 90 kg and backfat thickness are considered.However, the 90 kg endpoint weight has remained unchanged since its establishment in 1984, reflecting the market weight of finishing pigs at that time.
Considering the prevailing trend of market weights exceed ing 110 kg, there is a consensus emerging that the endpoint weight for performance testing should be adjusted accord ingly.Consequently, a new adjustment formula based on a weight of 105 kg has been developed by the National Institute of Animal Science (NIAS).This updated formula aligns more closely with market weights, enabling accurate evaluation of productive traits in breeding animals and enhancing genetic improvement and efficiency.
Genomewide association study (GWAS) have been exten sively applied to various domains, including the identification of genetic variants associated with economically significant traits.Most economic traits in livestock exhibit quantitative nature with polygenic inheritance patterns, thereby making their underlying genetic mechanisms not fully elucidated.Multiple candidate genes and significant markers have been reported for the same trait, often showing associations be tween multiple traits at the same genomic locus.While these findings are inherent to quantitative traits, GWAS by single marker analysis may have limited power in detecting quanti tative trait loci (QTLs) and mapping accuracy [5].Moreover, the cost associated with analyzing single nucleotide poly morphism (SNP) panels and the availability of genomic data across individuals pose additional challenges.Several recent studies have employed this approach to explore production, carcass, and reproductive traits in livestock species [68].
In this study, our objective is using singlestep approaches to identify genomic regions and candidate genes associated with productive traits (AGE, ADG, BF, and EMA) in York shire pig.Additionally, we conducted gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) en richment analyses to gain deeper insights into the underlying biological processes and functional terms associated with the identified candidate genes for productive traits.

Animals and phenotypes
The animals used in this study were raised in five greatgrand parents farms in Korea.In brief, a total of 104,380 Yorkshire (17,899 males and 86,481 females) born between 2015 and 2021 were used in this study (Supplementary Table S1).In this study, productive traits such as AGE, ADG, BF, and EMA adjusted to 105 kg were calculated as reported by the NIAS in Korea (https://www.nias.go.kr/images/promote/result/file/ 2021_2_5.pdf).

Single nucleotide polymorphism data and quality control
In this study, the Illumina Porcine 60K V1 and V2 were used, and V2 was selected as the reference panel for imputation.Prior to imputation, phasing was performed using Shapeit4 [9], which is a fast and accurate method for haplotype estima tion that uses a PBWTbased approach to select informative conditioning haplotypes.Imputation was then conducted using Impute5 [10], which assumes phased samples with no missing alleles.After imputation, quality control (QC) was performed by PLINK v1.90 [11] to exclude SNPs with low call rates (<90%), low minor allele frequencies (<0.01), or deviation from HardyWeinberg equilibrium (10 -6 ).After QC, we used the number of animals and SNPs were 11,854 and 29,732, respectively.

Statistical analysis
We estimated the genetic parameters of AGE, ADG, BF, and EMA by average information restricted maximum likelihood method.We considered two approaches: pedigreebased best linear unbiased prediction (PBLUP) and singlestep genomic best linear unbiased prediction (ssGBLUP).Each trait was estimated with a singletrait animal model, and the equation as follows: where y is the vector of phenotypic observations; b is the vector of fixed effects (herdbirth yearseason, sex); a is the vector of additive genetic effects; e is the vector of residuals; and X and Z are the incidence matrices for b, a, and e. Heri tability was estimated as We estimated the genetic parameters of AGE, ADG, BF, and EMA by average information restric 93 maximum likelihood method.We considered two approaches: pedigree-based best linear unbia 103 Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from th 104 GEBVs.In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the pedig 105 relationship matrix ( �� ) with the inverse of the combined matrix  �� , which incorporated both 106 pedigree and genomic relationship matrices [12].The  �� can be represented as follows: 107 108 where  is the numerator relationship matrix based on pedigree for all animals;  �� is the numera 111 relationship matrix for genotyped animals; and  is the genomic relationship matrix [13].The SNP eff 112 of each marker was estimated using the reverse operation method of GEBVs, using the following equati 113 114 where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  is the coefficient matrix of 117 SNP, and  is a weighted vector [14].This study utilized the BLUPF90 software family [15], wh

118
, where 103 Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from th 104 GEBVs.In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the pedig 105 relationship matrix ( �� ) with the inverse of the combined matrix  �� , which incorporated both 106 pedigree and genomic relationship matrices [12].The  �� can be represented as follows: 107 108 where  is the numerator relationship matrix based on pedigree for all animals;  �� is the numer 111 relationship matrix for genotyped animals; and  is the genomic relationship matrix [13].The SNP ef 112 of each marker was estimated using the reverse operation method of GEBVs, using the following equat 113 114 where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  is the coefficient matrix of 117 SNP, and  is a weighted vector [14].This study utilized the BLUPF90 software family [15] 103 Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from t 104 GEBVs.In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the ped 105 relationship matrix ( �� ) with the inverse of the combined matrix  �� , which incorporated both 106 pedigree and genomic relationship matrices [12].The  �� can be represented as follows: 107 108 where  is the numerator relationship matrix based on pedigree for all animals;  �� is the nume 111 relationship matrix for genotyped animals; and  is the genomic relationship matrix [13].The SNP e 112 of each marker was estimated using the reverse operation method of GEBVs, using the following equa 113 114 where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  is the coefficient matrix o 117 SNP, and  is a weighted vector [14].This study utilized the BLUPF90 software family [15], w 118 were additive genetic and residual variances, respectively.Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from these GEBVs.In con trast to the conventional BLUP approach, ssGBLUP substituted the inverse of the pedigree relationship matrix (A -1 ) with the inverse of the combined matrix H -1 , which incorporated both the pedigree and genomic relationship matrices [12].The H -1 can be represented as follows: We estimated the genetic parameters of AGE, ADG, BF, and EMA by average information restricted 93 maximum likelihood method.We considered two approaches: pedigree-based best linear unbiased 103 Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from these 104 GEBVs.In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the pedigree 105 relationship matrix ( �� ) with the inverse of the combined matrix  �� , which incorporated both the 106 pedigree and genomic relationship matrices [12].The  �� can be represented as follows: 107 108 where  is the numerator relationship matrix based on pedigree for all animals;  �� is the numerator 111 relationship matrix for genotyped animals; and  is the genomic relationship matrix [13].The SNP effect 112 of each marker was estimated using the reverse operation method of GEBVs, using the following equation: 113 where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  is the coefficient matrix of the 117 SNP, and  is a weighted vector [14].This study utilized the BLUPF90 software family [15], which 118 where A is the numerator relationship matrix based on pedi gree for all animals; A 22 is the numerator relationship matrix for genotyped animals; and G is the genomic relationship matrix [13].The SNP effect of each marker was estimated using the reverse operation method of GEBVs, using the fol lowing equation: We estimated the genetic parameters of AGE, ADG, BF, and EMA by average information restricted 93 maximum likelihood method.We considered two approaches: pedigree-based best linear unbiased 103 Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from these 104 GEBVs.In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the pedigree 105 relationship matrix ( �� ) with the inverse of the combined matrix  �� , which incorporated both the 106 pedigree and genomic relationship matrices [12].The  �� can be represented as follows: 107 108 where  is the numerator relationship matrix based on pedigree for all animals;  �� is the numerator 111 relationship matrix for genotyped animals; and  is the genomic relationship matrix [13].The SNP effect 112 of each marker was estimated using the reverse operation method of GEBVs, using the following equation: where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  is the coefficient matrix of the 117 SNP, and  is a weighted vector [14].This study utilized the BLUPF90 software family [15], which 118 where, 103 Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from 104 GEBVs.In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the ped 105 relationship matrix ( �� ) with the inverse of the combined matrix  �� , which incorporated bot 106 pedigree and genomic relationship matrices [12].The  �� can be represented as follows: 107 108 where  is the numerator relationship matrix based on pedigree for all animals;  �� is the nume 111 relationship matrix for genotyped animals; and  is the genomic relationship matrix [13].The SNP 112 of each marker was estimated using the reverse operation method of GEBVs, using the following equa 113 114 where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  is the coefficient matrix o 117 SNP, and  is a weighted vector [14].This study utilized the BLUPF90 software family [15], w 118 is the vector of the SNP effect, where,  � is the vector of the SNP effect,  � � is the vector of the GEBV,  117 SNP, and  is a weighted vector [14].This study utilized the BLUPF 118 is the vector of the GEBV, M is the coefficient matrix of the SNP, and D is a weighted vector [14].This study utilized the BLUPF90 soft ware family [15], which includes RENUMF90, BLUPF90+, and POSTGSF90, for the GWAS.The pvalues were used to generate a Manhattan plot using the R software and CMplot package [16,17].

Linkage analyses
In this study, we conducted linkage analyses between the most significant regions associated with teat traits and the identi fied SNPs within these regions.To investigate the distribution of linkage disequilibrium (LD) blocks in the genotype data after completing SNP QC, we utilized the Haploview software, which generates marker quality statistics, LD information, haplotype blocks, population haplotype frequencies, and single marker association statistics in a userfriendly format [18].Haploview calculates various pairwise measures of LD and utilizes them to create a graphical representation.Thus, we performed LD block analysis and visualization of the identi fied SNPs on SSC 1 in the three pig breeds.The visualization using Haploview represents LD values between markers with colors, with stronger associations displayed in shades of red.

Identification of candidate genes and functional enrichment analysis
We conducted gene annotation for the markers showing sig nificance in the GWAS analysis.The significance level was determined using the Bonferroni suggestive threshold, and markers surpassing this threshold were subjected to gene annotation and functional enrichment analysis.To identify genes within the identified QTL regions, particularly within the significant windows, we utilized the ensemble Sus scrofa 11.1 database (https://www.ensembl.org/biomart).Further more, to gain deeper insights into the biological processes associated with these regions, we performed GO and KEGG analyses using the Database for Annotation, Visualization, and Integrated Discovery (DAVID v6.8, https://david.ncifcrf.gov/).GO terms and KEGG pathways showing significant enrichment were determined based on a pvalue threshold of <0.05.Through these analyses, we gained valuable knowl edge regarding crucial molecular pathways and biological functions associated with the observed genetic variations.

Heritability
We compared the heritability of productive traits obtained between PBLUP and ssGBLUP (Table 1).Except for EMA, all other traits exhibited heritability values of 0.45 or higher.Comparing PBLUP and ssGBLUP, the estimated heritability values were consistently higher with ssGBLUP.The ssGBLUP method, which incorporates both pedigree and genotypic information, theoretically provides more accurate estimates of genetic parameters [6].

Genome-wide association study and linkage disequilibrium analysis
In the majority of instances, livestock's primary economic characteristics comprise quantitative traits, with the excep tion of select specific traits.These quantitative traits possess an intricate genetic architecture, rendering the identification of candidate genes a paramount goal in animal breeding programs.These traits represent pivotal determinants that significantly influence the financial gains of agricultural en terprises, thus necessitating their careful consideration during breeding assessments.In this study, SNP markers showing significance above the Bonferroni correction (1.68×10 -6 ) were found to be identical for AGE and ADG, with a total of 9 markers, while for BF and EMA, 4 markers were identified (Table 2; Figure 1).The region with the highest number of markers detected, excluding BF, was found on SSC 1 for all three traits, and for BF, two markers were found on SSC 5. Previous GWAS studies on productive traits adjusted to 100 kg in Duroc, Landrace, Yorkshire, and Pietrain pig popula tions reported significant SNPs associated with ADG on SSC 1, 3, 7, 10, and 13, and significant SNPs associated with AGE on SSC 1, 3, 4, 6, 7, 8, 9, and 13 [19].
In our study, the significant regions identified through GWAS for AGE and ADG were consistently the same.The most significant marker for both traits was ALGA0006623,  2) Gene symbols when intronic or gene symbols (distance) adjacent to the marker when intergenic.
which is located in the 5' untranslated region region of the gene ENSSSCG00000048538.ALGA0006623 has been re ported as a significant marker and gene associated with lean meat percentage (LMP) and ADG in American Duroc and Canadian Duroc pigs [20].Furthermore, it was identified as the most significant SNP in relation to the fatness trait in pigs [21].The marker ASGA0038525 was found to be com mon for AGE, BF, and EMA, with significance above the Bonferroni threshold observed for BF.For complex quanti tative traits, it is often more appropriate to assume a nonlinear relationship of gene effects rather than a linear one [19], as genes may contribute differently, and pleiotropic effects of QTLs between traits can occur [5].Pleiotropic QTLs are common in the pig genome, as exemplified by QTLs related to vertebral number, body length, and nipple number located on SSC 7 [22].The ASGA0038525 marker identified in our study can be considered as an example of such pleiotropic effects on quantitative traits.Additionally, the 6 markers found on SSC 1 can be regarded as having pleiotropic effects on AGE, ADG, and EMA.In this study, we observed strong LD among significant markers identified on SSC 1 for AGE, ADG, and EMA, as confirmed by the LD analysis using Haploview (Figure 2; Supplementary Figure S1).Specifically, we detected LD be tween the SNP ASGA0005017 located within the CCBE1 gene and adjacent SNPs, spanning a 1.9 Mb region for AGE and ADG, and a 1.8 Mb region for EMA.Within these re gions, the widely known MC4R gene, associated with pig growth, fat deposition, and feed intake [23,24], is located, indicating its potential as a candidate gene for the observed LD and significant effects within this region.Therefore, if MC4R is considered a causal gene, this supports the presence of LD and suggests that the observed significant effects in this region may be influenced by the observed LD.
According to a GWAS study on feed efficiency and related traits in Yorkshire and Duroc pigs, the markers ASGA0004992, ASGA0005017, and ALGA0006707, located on SSC 1, were reported as markers associated with ADG in Yorkshire pigs in the PigQTLdb [25].The marker ALGA0006707, located near the MC4R gene, is widely known as a gene that signifi cantly influences pig growth traits and average feed intake [23,2628].The ALPK2 gene on SSC 1 plays important roles in cardiogenesis and is specifically expressed in muscle tissue, including the longissimus dorsi muscle, where it was upreg ulated in Wannanhua pigs compared to Yorkshire pigs [29,30].
The ABL1 gene has been reported as a gene associated with fat metabolism in Yorkshire pigs [31] and considered as a candidate gene related to meattofat ratio [32].The CPE gene is hypothesized to be a candidate gene for meat quality traits related to intramuscular fat level and glucose metabo lism in bovines [33].
The CCND2 gene's expression is significantly associated with the lead SNP in the liver, lung, and spleen, and it has been reported as a gene that influences backfat thickness [34].CCND2, a candidate gene for back quality in Landrace pigs, is also essential for the growth of pancreatic islets, which regulate animal growth through hormonal activities [35].Recent research identified CCND2 as the most likely causal gene for backfat thickness and osteochondrosis in pigs [36].The APBB2 gene, located on SSC 8, may have roles in cellular and physiological functions related to BF accumulation [37].
In this study, we compared the markers and candidate genes showing significance for each growth trait with previ ous research findings.While some of the identified markers and candidate genes were consistent with traits reported in previous studies, we also found instances where candidate genes associated with ADG showed significance for other traits such as BF or LMP in other studies.This suggests that economic traits in pigs, being quantitative traits, can exhibit pleiotropic effects.Furthermore, the involvement of genetic 2) Gene symbols when intronic or gene symbols (distance) adjacent to the marker when intergenic.
associations between productive traits can also account for these results.Moreover, the research findings specifically related to EMA in pigs were sparse, and no reports on the candidate genes identified through GWAS were found.How ever, we observed that the markers showing significance for EMA in this study were mostly overlapping with the markers  identified for ADG and AGE, and they also included markers found for BF.Therefore, the markers and candidate genes identified in this study provide valuable information for understanding the complex genetic influences underlying quantitative traits.

Gene ontology terms and Kyoto encyclopedia of genes and genomes pathway enrichment analysis
In this study, gene set enrichment analyses were conducted to examine the associations between various terms and pro ductive traits.The results revealed significant enrichments in 8 biological processes, 1 cellular component, 1 molecular function, and 1 KEGG pathway.Among them, the most sig nificant GO term identified was GO:0002020, which pertains to protease binding (Table 3).Although the GO terms asso ciated with the CCDH20 and CDH8 genes were frequently observed, there have been no reports on their association with productive traits in pigs and other livestock.
Positive regulation of interleukin2 production (GO:003 2743) refers to any process that activates or increases the fre quency, rate, or extent of interleukin2 production.One of the genes related to this term, MAP3K7, has been reported to be associated with growth traits [38].Positive regulation of bone resorption (GO:0045780) refers to any process that activates or increases the frequency, rate, or extent of bone resorption.FSHB, one of the genes associated with this term, has been reported to be linked to the major gene controlling litter size [39], and it also affects reproductive capacity [40].Phosphatidylinositol phosphate biosynthetic process (GO: 0046854) refers to the chemical reactions and pathways that result in the formation of phosphatidylinositol phosphate.Phosphatidylinositol is involved in various physiological functions in the body, including muscle contraction, cell proliferation, and differentiation [41].Furthermore, one of the genes associated with this term, IP6K3, has recently been announced as a candidate gene related to body weight [5].IP6K3 encodes inositol hexakisphosphate kinase 3, which generates inositol pyrophosphates and regulates diverse  cellular functions, including metabolism and body weight.Mice lacking this gene exhibited lower growth rates, impaired metabolism, and shorter lifespans [42].
The SOCS2 gene is one of the genes associated with cell growth, metabolism, and immunity [43].It has been reported as a gene that prevents the occurrence of various tumors and liver diseases caused by fat accumulation [44,45].The results of the GO enrichment analyses further support the involve ment of numerous genes in growth development.

CONCLUSION
The identified SNP markers for productive traits are expect ed to provide valuable information for genetic improvement as an understanding of their expression.In this study, we have identified novel genes (ENSSSCG00000052156, ENSSSCG 00000033497, ENSSSCG00000046071) associated with pro ductive traits in Yorkshire pigs.Furthermore, the GO and KEGG analyses revealed that, except for GO:0032743 and GO:0042981, the remaining GO terms and KEGG pathways have not been directly implicated in the study of productive traits in Yorkshire pigs.With the accumulation of more phe notype and SNP data in the future, it is anticipated that more effective SNP markers can be identified.
94 prediction (PBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP).Each trait w 95 estimated with a single-trait animal model, and the equation as follows: 96 97  �  �  �  98 99where  is the vector of phenotypic observations;  is the vector of fixed effects (herd-birth year-seas 100 sex);  is the vector of additive genetic effects;  is the vector of residuals; and  and  are the incide 101 matrices for , , and .Heritability was estimated as ℎ � 94 prediction (PBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP).Each trait was 95 estimated with a single-trait animal model, and the equation as follows: 96 97  �  �  �  98 99where  is the vector of phenotypic observations;  is the vector of fixed effects (herd-birth year-season, 100 sex);  is the vector of additive genetic effects;  is the vector of residuals; and  and  are the incidence 101 matrices for , , and .Heritability was estimated as ℎ � 94 prediction (PBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP).Each trait was 95 estimated with a single-trait animal model, and the equation as follows: 96 97  �  �  �  98 99where  is the vector of phenotypic observations;  is the vector of fixed effects (herd-birth year-season, 100 sex);  is the vector of additive genetic effects;  is the vector of residuals; and  and  are the incidence 101 matrices for , , and .Heritability was estimated as ℎ �

Table 1 .
Variance components and heritabilities for productive traits

Table 1 .
Variance components and heritabilities for productive traits 399

Table 1 .
Variance components and heritabilities for productive traits 399

Table 1 .
Variance components and heritabilities for productive traits 399

Table 2 .
Significant SNPs associated with productive traits and their annotated candidate genes SNP, single nucleotide polymorphism; SSC, Sus scrofa chromosome; AGE, age to 105 kg; UTR, untranslated region; ADG, average daily gain; BF, backfat thickness; EMA, eye muscle area.1)Boldindicate p-values exceeding the Bonferroni significance threshold.

Table 2 .
Significant SNPs associated with productive traits and their annotated candidate genes (Continued) Sus scrofa chromosome; AGE, age to 105 kg; UTR, untranslated region; ADG, average daily gain; BF, backfat thickness; EMA, eye muscle area.1)Boldindicate p-values exceeding the Bonferroni significance threshold.