Genome-wide association study to reveal new candidate genes using single-step approaches for productive traits of Yorkshire pig in Korea
Article information
Abstract
Objective
The objective is to identify genomic regions and candidate genes associated with age to 105 kg (AGE), average daily gain (ADG), backfat thickness (BF), and eye muscle area (EMA) in Yorkshire pig.
Methods
This study used a total of 104,380 records and 11,854 single nucleotide polymorphism (SNP) data obtained from Illumina porcine 60K chip. The estimated genomic breeding values (GEBVs) and SNP effects were estimated by single-step genomic best linear unbiased prediction (ssGBLUP).
Results
The heritabilities of AGE, ADG, BF, and EMA were 0.50, 0.49, 0.49, and 0.23, respectively. We identified significant SNP markers surpassing the Bonferroni correction threshold (1.68×10−6), with a total of 9 markers associated with both AGE and ADG, and 4 markers associated with BF and EMA. Genome-wide association study (GWAS) analyses revealed notable chromosomal regions linked to AGE and ADG on Sus scrofa chromosome (SSC) 1, 6, 8, and 16; BF on SSC 2, 5, and 8; and EMA on SSC 1. Additionally, we observed strong linkage disequilibrium on SSC 1. Finally, we performed enrichment analyses using gene ontology and Kyoto encyclopedia of genes and genomes (KEGG), which revealed significant enrichments in eight biological processes, one cellular component, one molecular function, and one KEGG pathway.
Conclusion
The identified SNP markers for productive traits are expected to provide valuable information for genetic improvement as an understanding of their expression.
INTRODUCTION
Over time, genetic improvement in pig breeding for economically important traits has been achieved through continuous research efforts. Productive traits, including average daily gain (ADG), age to 105 kg (AGE), backfat thickness (BF), and eye muscle area (EMA) exhibit moderate to high heritability, enabling their enhancement through selective breeding strategies [1]. ADG and AGE directly impact pig growth [2,3], while BF plays a crucial role in the reproductive performance of Landrace and Yorkshire sows [4], thereby influencing the breeding potential of the maternal line.
According to the Korean Swine Performance Recording Standards (KSPRS) established by the Ministry of Agriculture, Food and Rural Affairs (MAFRA), performance testing is conducted within the weight range of 70 to 110 kg, with the current endpoint set at 90 kg. To assess growth trait performance, the number of days required to reach 90 kg and backfat thickness are considered. However, the 90 kg endpoint weight has remained unchanged since its establishment in 1984, reflecting the market weight of finishing pigs at that time. Considering the prevailing trend of market weights exceeding 110 kg, there is a consensus emerging that the endpoint weight for performance testing should be adjusted accordingly. Consequently, a new adjustment formula based on a weight of 105 kg has been developed by the National Institute of Animal Science (NIAS). This updated formula aligns more closely with market weights, enabling accurate evaluation of productive traits in breeding animals and enhancing genetic improvement and efficiency.
Genome-wide association study (GWAS) have been extensively applied to various domains, including the identification of genetic variants associated with economically significant traits. Most economic traits in livestock exhibit quantitative nature with polygenic inheritance patterns, thereby making their underlying genetic mechanisms not fully elucidated. Multiple candidate genes and significant markers have been reported for the same trait, often showing associations between multiple traits at the same genomic locus. While these findings are inherent to quantitative traits, GWAS by single marker analysis may have limited power in detecting quantitative trait loci (QTLs) and mapping accuracy [5]. Moreover, the cost associated with analyzing single nucleotide polymorphism (SNP) panels and the availability of genomic data across individuals pose additional challenges. Several recent studies have employed this approach to explore production, carcass, and reproductive traits in livestock species [6–8].
In this study, our objective is using single-step approaches to identify genomic regions and candidate genes associated with productive traits (AGE, ADG, BF, and EMA) in Yorkshire pig. Additionally, we conducted gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses to gain deeper insights into the underlying biological processes and functional terms associated with the identified candidate genes for productive traits.
MATERIALS AND METHODS
Animals and phenotypes
The animals used in this study were raised in five great-grand-parents farms in Korea. In brief, a total of 104,380 Yorkshire (17,899 males and 86,481 females) born between 2015 and 2021 were used in this study (Supplementary Table S1). In this study, productive traits such as AGE, ADG, BF, and EMA adjusted to 105 kg were calculated as reported by the NIAS in Korea (https://www.nias.go.kr/images/promote/result/file/2021_2_5.pdf).
Single nucleotide polymorphism data and quality control
In this study, the Illumina Porcine 60K V1 and V2 were used, and V2 was selected as the reference panel for imputation. Prior to imputation, phasing was performed using Shapeit4 [9], which is a fast and accurate method for haplotype estimation that uses a PBWT-based approach to select informative conditioning haplotypes. Imputation was then conducted using Impute5 [10], which assumes phased samples with no missing alleles. After imputation, quality control (QC) was performed by PLINK v1.90 [11] to exclude SNPs with low call rates (<90%), low minor allele frequencies (<0.01), or deviation from Hardy-Weinberg equilibrium (10−6). After QC, we used the number of animals and SNPs were 11,854 and 29,732, respectively.
Statistical analysis
We estimated the genetic parameters of AGE, ADG, BF, and EMA by average information restricted maximum likelihood method. We considered two approaches: pedigree-based best linear unbiased prediction (PBLUP) and single-step genomic best linear unbiased prediction (ssGBLUP). Each trait was estimated with a single-trait animal model, and the equation as follows:
where y is the vector of phenotypic observations; b is the vector of fixed effects (herd-birth year-season, sex); a is the vector of additive genetic effects; e is the vector of residuals; and X and Z are the incidence matrices for b, a, and e. Heritability was estimated as
Furthermore, GEBVs calculated using ssGBLUP approach, and marker effects were derived from these GEBVs. In contrast to the conventional BLUP approach, ssGBLUP substituted the inverse of the pedigree relationship matrix (A−1) with the inverse of the combined matrix H−1, which incorporated both the pedigree and genomic relationship matrices [12]. The H−1 can be represented as follows:
where A is the numerator relationship matrix based on pedigree for all animals; A22 is the numerator relationship matrix for genotyped animals; and G is the genomic relationship matrix [13]. The SNP effect of each marker was estimated using the reverse operation method of GEBVs, using the following equation:
where, û is the vector of the SNP effect,
Linkage analyses
In this study, we conducted linkage analyses between the most significant regions associated with teat traits and the identified SNPs within these regions. To investigate the distribution of linkage disequilibrium (LD) blocks in the genotype data after completing SNP QC, we utilized the Haploview software, which generates marker quality statistics, LD information, haplotype blocks, population haplotype frequencies, and single marker association statistics in a user-friendly format [18]. Haploview calculates various pairwise measures of LD and utilizes them to create a graphical representation. Thus, we performed LD block analysis and visualization of the identified SNPs on SSC 1 in the three pig breeds. The visualization using Haploview represents LD values between markers with colors, with stronger associations displayed in shades of red.
Identification of candidate genes and functional enrichment analysis
We conducted gene annotation for the markers showing significance in the GWAS analysis. The significance level was determined using the Bonferroni suggestive threshold, and markers surpassing this threshold were subjected to gene annotation and functional enrichment analysis. To identify genes within the identified QTL regions, particularly within the significant windows, we utilized the ensemble Sus scrofa 11.1 database (https://www.ensembl.org/biomart). Furthermore, to gain deeper insights into the biological processes associated with these regions, we performed GO and KEGG analyses using the Database for Annotation, Visualization, and Integrated Discovery (DAVID v6.8, https://david.ncifcrf.gov/). GO terms and KEGG pathways showing significant enrichment were determined based on a p-value threshold of <0.05. Through these analyses, we gained valuable knowledge regarding crucial molecular pathways and biological functions associated with the observed genetic variations.
RESULTS AND DISCUSSION
Heritability
We compared the heritability of productive traits obtained between PBLUP and ssGBLUP (Table 1). Except for EMA, all other traits exhibited heritability values of 0.45 or higher. Comparing PBLUP and ssGBLUP, the estimated heritability values were consistently higher with ssGBLUP. The ssGBLUP method, which incorporates both pedigree and genotypic information, theoretically provides more accurate estimates of genetic parameters [6].
Genome-wide association study and linkage disequilibrium analysis
In the majority of instances, livestock's primary economic characteristics comprise quantitative traits, with the exception of select specific traits. These quantitative traits possess an intricate genetic architecture, rendering the identification of candidate genes a paramount goal in animal breeding programs. These traits represent pivotal determinants that significantly influence the financial gains of agricultural enterprises, thus necessitating their careful consideration during breeding assessments. In this study, SNP markers showing significance above the Bonferroni correction (1.68×10−6) were found to be identical for AGE and ADG, with a total of 9 markers, while for BF and EMA, 4 markers were identified (Table 2; Figure 1). The region with the highest number of markers detected, excluding BF, was found on SSC 1 for all three traits, and for BF, two markers were found on SSC 5. Previous GWAS studies on productive traits adjusted to 100 kg in Duroc, Landrace, Yorkshire, and Pietrain pig populations reported significant SNPs associated with ADG on SSC 1, 3, 7, 10, and 13, and significant SNPs associated with AGE on SSC 1, 3, 4, 6, 7, 8, 9, and 13 [19].
In our study, the significant regions identified through GWAS for AGE and ADG were consistently the same. The most significant marker for both traits was ALGA0006623, which is located in the 5' untranslated region region of the gene ENSSSCG00000048538. ALGA0006623 has been reported as a significant marker and gene associated with lean meat percentage (LMP) and ADG in American Duroc and Canadian Duroc pigs [20]. Furthermore, it was identified as the most significant SNP in relation to the fatness trait in pigs [21]. The marker ASGA0038525 was found to be common for AGE, BF, and EMA, with significance above the Bonferroni threshold observed for BF. For complex quantitative traits, it is often more appropriate to assume a nonlinear relationship of gene effects rather than a linear one [19], as genes may contribute differently, and pleiotropic effects of QTLs between traits can occur [5]. Pleiotropic QTLs are common in the pig genome, as exemplified by QTLs related to vertebral number, body length, and nipple number located on SSC 7 [22]. The ASGA0038525 marker identified in our study can be considered as an example of such pleiotropic effects on quantitative traits. Additionally, the 6 markers found on SSC 1 can be regarded as having pleiotropic effects on AGE, ADG, and EMA.
In this study, we observed strong LD among significant markers identified on SSC 1 for AGE, ADG, and EMA, as confirmed by the LD analysis using Haploview (Figure 2; Supplementary Figure S1). Specifically, we detected LD between the SNP ASGA0005017 located within the CCBE1 gene and adjacent SNPs, spanning a 1.9 Mb region for AGE and ADG, and a 1.8 Mb region for EMA. Within these regions, the widely known MC4R gene, associated with pig growth, fat deposition, and feed intake [23,24], is located, indicating its potential as a candidate gene for the observed LD and significant effects within this region. Therefore, if MC4R is considered a causal gene, this supports the presence of LD and suggests that the observed significant effects in this region may be influenced by the observed LD.
According to a GWAS study on feed efficiency and related traits in Yorkshire and Duroc pigs, the markers ASGA0004992, ASGA0005017, and ALGA0006707, located on SSC 1, were reported as markers associated with ADG in Yorkshire pigs in the PigQTLdb [25]. The marker ALGA0006707, located near the MC4R gene, is widely known as a gene that significantly influences pig growth traits and average feed intake [23,26–28]. The ALPK2 gene on SSC 1 plays important roles in cardiogenesis and is specifically expressed in muscle tissue, including the longissimus dorsi muscle, where it was upregulated in Wannanhua pigs compared to Yorkshire pigs [29,30]. The ABL1 gene has been reported as a gene associated with fat metabolism in Yorkshire pigs [31] and considered as a candidate gene related to meat-to-fat ratio [32]. The CPE gene is hypothesized to be a candidate gene for meat quality traits related to intramuscular fat level and glucose metabolism in bovines [33].
The CCND2 gene's expression is significantly associated with the lead SNP in the liver, lung, and spleen, and it has been reported as a gene that influences backfat thickness [34]. CCND2, a candidate gene for back quality in Landrace pigs, is also essential for the growth of pancreatic islets, which regulate animal growth through hormonal activities [35]. Recent research identified CCND2 as the most likely causal gene for backfat thickness and osteochondrosis in pigs [36]. The APBB2 gene, located on SSC 8, may have roles in cellular and physiological functions related to BF accumulation [37].
In this study, we compared the markers and candidate genes showing significance for each growth trait with previous research findings. While some of the identified markers and candidate genes were consistent with traits reported in previous studies, we also found instances where candidate genes associated with ADG showed significance for other traits such as BF or LMP in other studies. This suggests that economic traits in pigs, being quantitative traits, can exhibit pleiotropic effects. Furthermore, the involvement of genetic associations between productive traits can also account for these results. Moreover, the research findings specifically related to EMA in pigs were sparse, and no reports on the candidate genes identified through GWAS were found. However, we observed that the markers showing significance for EMA in this study were mostly overlapping with the markers identified for ADG and AGE, and they also included markers found for BF. Therefore, the markers and candidate genes identified in this study provide valuable information for understanding the complex genetic influences underlying quantitative traits.
Gene ontology terms and Kyoto encyclopedia of genes and genomes pathway enrichment analysis
In this study, gene set enrichment analyses were conducted to examine the associations between various terms and productive traits. The results revealed significant enrichments in 8 biological processes, 1 cellular component, 1 molecular function, and 1 KEGG pathway. Among them, the most significant GO term identified was GO:0002020, which pertains to protease binding (Table 3). Although the GO terms associated with the CCDH20 and CDH8 genes were frequently observed, there have been no reports on their association with productive traits in pigs and other livestock.
Positive regulation of interleukin-2 production (GO:003 2743) refers to any process that activates or increases the frequency, rate, or extent of interleukin-2 production. One of the genes related to this term, MAP3K7, has been reported to be associated with growth traits [38]. Positive regulation of bone resorption (GO:0045780) refers to any process that activates or increases the frequency, rate, or extent of bone resorption. FSHB, one of the genes associated with this term, has been reported to be linked to the major gene controlling litter size [39], and it also affects reproductive capacity [40].
Phosphatidylinositol phosphate biosynthetic process (GO: 0046854) refers to the chemical reactions and pathways that result in the formation of phosphatidylinositol phosphate. Phosphatidylinositol is involved in various physiological functions in the body, including muscle contraction, cell proliferation, and differentiation [41]. Furthermore, one of the genes associated with this term, IP6K3, has recently been announced as a candidate gene related to body weight [5]. IP6K3 encodes inositol hexakisphosphate kinase 3, which generates inositol pyrophosphates and regulates diverse cellular functions, including metabolism and body weight. Mice lacking this gene exhibited lower growth rates, impaired metabolism, and shorter lifespans [42].
The SOCS2 gene is one of the genes associated with cell growth, metabolism, and immunity [43]. It has been reported as a gene that prevents the occurrence of various tumors and liver diseases caused by fat accumulation [44,45]. The results of the GO enrichment analyses further support the involvement of numerous genes in growth development.
CONCLUSION
The identified SNP markers for productive traits are expected to provide valuable information for genetic improvement as an understanding of their expression. In this study, we have identified novel genes (ENSSSCG00000052156, ENSSSCG 00000033497, ENSSSCG00000046071) associated with productive traits in Yorkshire pigs. Furthermore, the GO and KEGG analyses revealed that, except for GO:0032743 and GO:0042981, the remaining GO terms and KEGG pathways have not been directly implicated in the study of productive traits in Yorkshire pigs. With the accumulation of more phenotype and SNP data in the future, it is anticipated that more effective SNP markers can be identified.
Notes
CONFLICT OF INTEREST
We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.
FUNDING
The authors received no financial support for this article.
SUPPLEMENTARY MATERIAL
Supplementary file is available from: https://doi.org/10.5713/ab.23.0255
Supplementary Table S1. Basic statistics for productive traits of Yorkshire pig
Supplementary Figure S1. LD block analysis between significant markers found SSC 1 region using Haploview for (A) AGE and (B) EMA.
ab-23-0255-Supplementary.pdf