Multifactor Dimensionality Reduction ( MDR ) Analysis to Detect Single Nucleotide Polymorphisms Associated with a Carcass Trait in a Hanwoo Population

Studies to detect genes responsible for economic traits in farm animals have been performed using parametric linear models. A non-parametric, model-free approach using the ‘expanded multifactor-dimensionality reduction (MDR) method’ considering high dimensionalities of interaction effects between multiple single nucleotide polymorphisms (SNPs), was applied to identify interaction effects of SNPs responsible for carcass traits in a Hanwoo beef cattle population. Data were obtained from the Hanwoo Improvement Center, National Agricultural Cooperation Federation, Korea, and comprised 299 steers from 16 paternal half-sib proven sires that were delivered in Namwon or Daegwanryong livestock testing stations between spring of 2002 and fall of 2003. For each steer at approximately 722 days of age, the Longssimus dorsi muscle area (LMA) was measured after slaughter. Three functional SNPs (19_1, 18_4, 28_2) near the microsatellite marker ILSTS035 on BTA6, around which the QTL for meat quality were previously detected, were assessed. Application of the expanded MDR method revealed the best model with an interaction effect between the SNPs 19_1 and 28_2, while only one main effect of SNP19_1 was statistically significant for LMA (p<0.01) under a general linear mixed model. Our results suggest that the expanded MDR method better identifies interaction effects between multiple genes that are related to polygenic traits, and that the method is an alternative to the current model choices to find associations of multiple functional SNPs and/or their interaction effects with economic traits in livestock populations. (


INTRODUCTION
Detection of genes responsible for economic traits in farm animals has been widely practiced.Most traits of economic importance in livestock species are multi-factorial, i.e., influenced by multiple genes and their interactions with environmental factors.Generally, models used to test the effects of genes on traits have been based on parametric methods, such as general linear models or the Animal model (Henderson, 1976).However, when multiple factors, e.g., multiple genes and their interaction effects, are taken into account, model building may be cumbersome and overparameterization problems can arise.As an option for efficiently detecting multiple genes and their interaction effects, a multifactor dimensionality reduction (MDR) method was introduced (Ritchie et al., 2001;Chung et al., 2006), which was designed to handle high-order dimensional data and to uncover complex relationships without relying on specified models fitting multiple genes' interactions (Bastione et al., 2004).
Most quantitative trait loci (QTL) studies have been performed using experimental crosses between different breeds or commercial populations comprising large paternal half-sib families (Kim and Farnir, 2006;Kim et al., 2007;Lee et al., 2007).Results of primary genome scans, i.e., detection of QTL, are followed by candidate gene studies to find functional single nucleotide polymorphisms (SNPs) around the QTL region that are associated with the production trait of interest.As more SNP information is available due to increasing numbers of SNPs in livestock genome DBs (www.animalgenome.org) as well as in human and other mammalian genomes, selection of multiple genes around QTL, i.e., functional SNPs, is a better option for simultaneous identification of several SNPs using highthroughput tools such as DNA chips (Barendse et al., 2007).
We herein report association studies using multiple SNPs by applying a MDR method to test main and interaction effects of multiple SNPs on meat quality around the QTL on BTA6, which is located near the ILST035 microsatellite regionin Hanwoo cattle (Yeo et al., 2004).The results were also compared with results using parametric linear models.

Animals, phenotypes and genetic markers
Data were obtained from the Hanwoo Improvement Center, National Agricultural Cooperation Federation, Korea, and comprised 229 steers that were born at Namwon or Daegwanryong livestock testing station from 16 paternal half-sib proven sires between spring of 2002 and fall of 2003.For each steer at approximately 722 days of age, the Longssimus dorsi muscle area (LMA) was measured after slaughter according to the standards of the Korean Animal Products Grading Service.Six functional SNPs were selected that were located near the microsatellite marker, ILSTS035 on BTA6, around which the QTL for meat quality were detected in our previous study (Yeo et al., 2004).To determine whether the SNPs were independently distributed, linkage disequilibrium was tested between pairs of SNPs.There were complete linkage disequilibria between two SNPs and between another three SNPs, which left three independent SNPs (19_1, 18_4, 28_2) for this study (results not shown).

Statistical analysis using general linear model
The general linear model to analyze the phenotype was: Where, Y ijklmn is an observed phenotype, μ is the overall mean, C i is the i th contemporary group (i = 1 to 8), S j is the j th sire's random effect (j = 1 to 16), β is a linear effect of the steer's age, Mi k is the k th genotype effect of a marker (k = 1, 2, 3) for SNP i, MiMj is an interaction term between markers i and j, M1M2M3 is a three-way interaction term for the three markers and ε ijklmn is random error.The contemporary group was defined as a group of individuals fed in the same pen and slaughtered on the same date.The analyses were performed using MIXED procedure of SAS v.9.1.

MDR analysis
Multifactor-dimensionality reduction (MDR) method is non-parametric and model-free, and was initially implemented in case-control studies (Hahn et al., 2003).For application to continuous data, the CART (classification and regression tree) algorithm was developed and combined into the MDR method (Paolo, 2003).The expanded MDR method involves classification into two groups using a regression tree, i.e., high and low for the phenotypes, and impurity in the group can be defined as: Where, g y is a mean value for the node (group) and gj y is the observation of the j th individual in a total of N g individuals of the group (g = high or low).Each individual is assigned to a cell (e.g., if there are two SNPs with three genotypes per SNP, then there are 9 possible cells), and then each cell can be defined as high or low group.The expected numbers of the high and low group are: Where n is the number of total cells and I g (i) is a indicator function where Then the average squared error (ASE) is: is the sum of squared errors in the high group, is the sum of squared errors in the low group, and Y ij is the j th individual phenotype in the i th cell.
The procedures involved in the expanded MDR method are displayed in Figure 1 and summarized in the following steps: Step Step 4. The impurity function in the CART algorithm uses the variance impurity, so that the group with the higher average value is labeled as high and the remainders are labeled as low.
Step 5.The expanded MDR model with the smallest ASE is chosen among all of the two-factor combinations (e.g., SNPs 1 and 2, 1 and 3, and 2 and 3).Step 6.In order to evaluate the predictive ability of the model, the predicted ASE (P_ASE) is estimated using 10-fold cross-validation.
These six steps were repeated for each possible combination of given n (= 1, 2, and 3).The model with the minimum predicted ASE was selected as the best-model.However, for the selected best model, statistical significance was not determined by the predicted ASE.Thus, permutation tests were performed to determine empirical significance thresholds by applying the same MDR method (Good, 1994).Before the MDR implementation, phenotypes were adjusted for the contemporary, sire and steer's age effects using residuals that were obtained after fitting the general linear model without SNP effects.

RESULTS
The effects of contemporary groups and sire effects on LMA were statistically significant (p<0.01).However, no interaction term between marker pairs, i.e., M1M2, M1M3, M2M3, M1M2M3, was associated with the LMA variation (p>0.3)(results not shown).The analysis was conducted again after removing the marker interaction factors, and only one SNP (19_1) had a significant effect on LMA (p<0.01).For this SNP, the G allele conferred a LMA 3.1 cm 2 greater than the A allele in additive fashion, i.e., with no dominance effect, explaining 4.5% of the phenotypic variation (Table 1).
Table 2 presents ASEs and P_ASEs for different combinations of SNPs that were obtained by applying the expanded MDR method to LMA analysis.Among the models with one SNP, the model with SNP19_1 had the smallest P_ASE value of 599.1.However, when considering two SNPs, the model with SNP19_1 and SNP28_2 had a the P_ASE value of 596.71, which was lower than that for the one SNP model with SNP19_1, and thus represented the best model among all combinations of SNPs.Permutation tests also revealed statistical significance (p = 0.004) for the interaction effects of SNP19_1 and SNP28_2 (p = 0.0045 for the one SNP model with SNP19_1).

DISCUSSION
A novel method, MDR, was applied to reduce the dimensionality caused by simultaneously fitting multiple genes and their interactions into models.In this MDR method, a CART algorithm was added to adjust for continuous properties of phenotypes.This model-free and non-parametric method produced different results when comparing with the linear mixed model.While the latter model did not detect any interaction effects between the SNPs, but detected only the main effect of one SNP, SNP19_1, the MDR method enabled the choice of the best model with an interaction effect between two SNPS, SNP19_1*SNP28_2 (Tables 1 and 2).In the linear mixed model, factors to estimate parameters are fitted in orderly form, i.e., main factors that are followed by interaction effects between the main effects.That is, quadratic or higher-orderly effects are tested as conditional on the main effects.In contrast, the MDR method is free of factordimension orders such that a model that has an effect with the most significant contribution to phenotypic variation can be chosen first.Intrigued by the results of the MDR analysis, another linear parametric model was applied in which, among all possible SNP effects, only the interaction effect of SNP19_1*SNP28_2 was fitted without main effects of SNPs.Sum of squares explained by the interaction term was 940.7, which was greater than the value of 661.0 for SNP19_1 when fitting only main SNP effects in the model (Table 1).However, the p value for the interaction effect was not lower (0.018) than that (p = 0.002) under the model with main SNP effects, because the p value is a function of the sum of squares and degrees of freedom between numerator and denominator in the Fstatistics as well as factors fitted in the model.
The number of SNPs used in this study to evaluate effects of high-order interactions under the parametric and non-parametric models was limited in comparison to other MDR reports, in which more than 10 genes or SNPs were tested in human multi-factorial diseases (Bastone et al., 2004;Cho et al., 2004).However, the MDR method applied in this study allowed detection of an interaction effect of two SNPs, while the parametric linear model did not.This result suggests that the expanded MDR method is an alternative to model choices that can find associations of multiple functional SNPs and/or their interaction effects with economic traits in livestock.Indeed, most of these traits are influenced by multiple genes with environmental interactions, which may be strongly affected by interactions among genes (Carlborg and Haley, 2004).
One disadvantage of using the expanded MDR method is that this MDR method requires demanding computational analysis when applied at genome-wide level, e.g., using 10 K or 50 K DNA chips (Barendse et al., 2007).However, application of the method with moderate numbers of genes or SNPs, e.g., around several QTL regions that were previously detected, may be a good alternative for better identifying interaction effects between genes that are related to polygenic traits.
1.The data are randomly divided into 10 equal parts: one testing set and nine training sets as parts of the cross-validation.Step 2. A set of n (n = 1, 2, or 3) SNPs is selected from the pool of all SNPS (= 3).Step 3. Based on the observed level of each n, steers are partitioned into classes, referred to as cells.If n = 2, then a set of two SNPs is selected and, as one SNP has three genotypes, there are 3 2 = 9 possible cells.Phenotypic means are calculated within each cell.

Figure 1 .
Figure 1.The expanded MDR method procedures.Steps 1-6 are repeated for each possible cross-validation interval.Numerals in cells represent the means with each multifactor combination.The darker-shaded cells in steps 4 and 6 represent high genotype combinations and the lighter-shaded cells represent low genotype combinations using the regression tree.LMA, the Longissimus muscle dorsi area.

Table 1 .
Least squares means and standard errors (SE) of the SNP genotypes for Longissimus muscle dorsi area (LMA) using a The linear model included a fixed effect of contemporary groups, a covariate of steer's age, and a random effect of sire.In addition, three SNPs were fitted as fixed factors in the model.a The same letter indicates no significant difference between the means (p>0.05). *

Table 2 .
Average squared error (ASE) and predicted ASE (P_ASE) for different numbers of SNPs using the expanded MDR