Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

The main objectives of this study were to investigate the accuracy, bias and power of linear and threshold model segregation analysis methods for detecting major genes in categorical traits in farm animals. Maximum Likelihood Linear Model (MLLM), Bayesian Linear Model (BALM) and Bayesian Threshold Model (BATM) methods were applied to simulated data on normal, categorical and binary scales as well as to disease data in pigs. Simulated data on the underlying normally distributed liability (NDL) were used to create categorical and binary data. The MLLM method was applied to data on all scales (normal, categorical and binary), whereas the BATM method was developed and applied only to binary data. Compared with normal traits, MLLM analyses underestimated parameters for both binary and categorical traits, with very severe bias for binary traits. The accuracy of major gene and polygene parameter estimates was also much lower for binary data than for categorical data; the latter gave results similar to those for normal data. When disease incidence (on the binary scale) is close to 50%, segregation analysis has higher accuracy and less bias than for diseases with rare incidences. Results from NDL data were always better than those from categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently and unusually high (whereas lower values are expected because of the loss of information in categorical data), indicating high false discovery rates of major genes if linear models are applied to categorical traits. With Bayesian segregation analysis, the 95% highest posterior density regions of major gene variances were checked for whether they included the value of zero (boundary parameter); given this difference in how likelihood and Bayesian approaches test for a major gene, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in terms of accuracy. 
Based on the results, threshold models are recommended when the trait distributions are discontinuous. Further, segregation analysis could be used in an initial scan of the data for evidence of major genes before embarking on molecular genome mapping. (Asian-Aust. J. Anim. Sci. 2005. Vol 18, No. 8 : 1088-1097)


INTRODUCTION
Traditional quantitative genetics theory and its application to animal breeding are based on the classical assumption that traits are controlled by a very large number of independent genes, each having a small effect. Nevertheless, several genes with major effects on quantitative traits of economic importance have been identified in livestock. Notable examples are the double-muscling gene in cattle (Hanset and Michaux, 1985), the Booroola gene in sheep (Piper and Bindon, 1982), the gene determining high milk flow in goats (Ricordeau et al., 1990) and the natural resistance-associated macrophage protein 1 (NRAMP1) gene controlling resistance to various intracellular parasites in pigs (Yan et al., 2004).
Segregation analysis (Elston and Stewart, 1971), also referred to as mixed inheritance models (MIM, denoting polygenic and major gene influences on the trait), has been proposed as a suitable and powerful method to identify a segregating single major gene in livestock populations (Le Roy et al., 1989; Hill and Knott, 1990; Knott et al., 1991). It involves maximizing and comparing the likelihoods of the data under different genetic transmission models to determine whether the inheritance of the trait is controlled by a major gene. A significant improvement in the likelihood obtained by incorporating a major gene in the model provides evidence for the segregation of a major gene in the population under study (Knott et al., 1991). In a Bayesian inference framework, Guo and Thompson (1994) adapted the Gibbs sampler algorithm to solve computing problems in complex pedigrees in animal genetics. Gibbs sampling algorithms are now widely used in the genetic analysis of quantitative traits recorded in pedigreed animal populations because of their flexibility in solving complex and demanding statistical models, especially for categorical traits (e.g. Kadarmideen et al., 2001; Lee, 2002).
In segregation analysis of pedigreed animal populations, both maximum likelihood and Bayesian approaches were first developed for normally distributed traits. For example, Bayesian models with Gibbs sampling have recently been applied to detect major genes for milking speed in Swiss dairy cows (Ilahi and Kadarmideen, 2004) and for egg weight in chickens (Hagger et al., 2004), both continuous, normally distributed traits. However, in many animal breeding applications, the data comprise observations expressed in two or more categories, representing binary and categorical traits, respectively (e.g. disease traits, ovulation rate in sheep, non-return rates in cows, degree of calving difficulty, conformation and type traits). Linear and non-linear (threshold) models have been applied and compared using disease data from dairy cattle (e.g. Kadarmideen et al., 2000a), and their efficiency has been compared in quantitative trait loci (QTL) mapping in livestock populations (Kadarmideen et al., 2000b; Yi and Xu, 2000; Kadarmideen and Dekkers, 2001). However, no study has so far evaluated and compared such models (linear and threshold models) or statistical methods (maximum likelihood versus Bayesian methods based on Markov chain Monte Carlo algorithms) in the context of segregation analysis of binary or categorical traits in animal populations.
The main objective of this study was to compare segregation analyses based on linear models (LM) and threshold models (TM), using maximum likelihood and Bayesian methods. The specific objectives were to:
i. Investigate the impact of the trait distribution (normal versus categorical or binary data) on the accuracy and power of detecting major genes in the population, using the maximum likelihood linear model (MLLM) method.
ii. Investigate the impact of different incidences of a binary trait on the accuracy and power of detecting major genes by segregation analysis under both the MLLM and the Bayesian Threshold Model (BATM) methods.
iii. Apply the developed Bayesian methods (under linear and threshold models) to osteochondral diseases (categorical scores) in Swiss pig populations.

Liability theory
Genetic analyses of categorical traits are difficult because the observed phenotype cannot be described by a linear function of genetic and environmental effects. Wright (1934) proposed that a continuous underlying variable, called liability, determines the expression of a categorical trait. The link between the observable discrete variable and the underlying variable is generated by a set of fixed thresholds. The underlying liability is then described by the usual linear model (Gianola, 1982; Gianola and Foulley, 1983; Falconer and Mackay, 1996). These liability-threshold models have been applied to the detection and mapping of genes or quantitative trait loci (QTL) (Xu and Atchley, 1996; Yi and Xu, 2000; Kadarmideen et al., 2000b; Kadarmideen and Dekkers, 2001; Kadarmideen and Janss, 2003).
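To make the liability-threshold link concrete, the sketch below computes the probability of each observed category when the liability is normal with mean η and unit variance. It is a minimal illustration under that assumption; the function name `category_probabilities` and the argument `eta` are illustrative and are not part of any software used in the study.

```python
from statistics import NormalDist


def category_probabilities(eta, thresholds, sd=1.0):
    """Probability of each observed category under a liability-threshold model.

    The liability is N(eta, sd^2); category k (1-based) is observed when the
    liability falls between thresholds t_{k-1} and t_k, with t_0 = -inf and
    t_K = +inf, so P(y = k) = Phi(t_k) - Phi(t_{k-1}) on the liability scale.
    """
    cuts = [float("-inf")] + sorted(thresholds) + [float("inf")]
    cdf = NormalDist(mu=eta, sigma=sd).cdf
    return [cdf(upper) - cdf(lower) for lower, upper in zip(cuts[:-1], cuts[1:])]


# One threshold at 0 and a mean liability of 0: two equally likely categories.
print(category_probabilities(0.0, [0.0]))  # [0.5, 0.5]
# Raising the mean liability moves probability into the upper category.
print(category_probabilities(1.0, [0.0]))
```

Because only the thresholds and the liability mean enter, shifting genetic or environmental effects on the liability scale changes all category probabilities at once, which is why the observed categories cannot be modelled by a simple linear function of those effects.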

Data simulations
Normally distributed liability (NDL) data: The liability data were simulated using a mixed inheritance model (polygenes plus a major gene) with a hierarchical, balanced family structure: one population consisted of 20 sire families with 100 dams per sire, giving 2,000 dams. Three records (phenotypes) were simulated per dam (i.e. 300 records per sire), so the total number of records in the population was 6,000. We assumed more than one phenotype per dam because, in most cases, livestock data sets consist of repeated records. Sires and dams were assumed to be unrelated and to have the same allele frequency at the major gene.
The liability data were simulated as follows:

$$z_{ij} = m_i + u_{ij} + pe_{ij} + e_{ij} \qquad (1)$$

where $z_{ij}$ is the liability, $m_i$ is the effect of the $i$th genotype at the major gene, $u_{ij}$ is the polygenic effect of the $j$th individual bearing the $i$th genotype, with $u_{ij} \sim N(0, \sigma_u^2)$, $pe_{ij}$ is the permanent environmental effect, with $pe_{ij} \sim N(0, \sigma_{pe}^2)$, and $e_{ij}$ is the residual effect, with $e_{ij} \sim N(0, \sigma_e^2)$; $\sigma_u^2$, $\sigma_{pe}^2$ and $\sigma_e^2$ are the genetic (polygenic), permanent environmental and residual variances, respectively. The single major gene is assumed to be an additive, biallelic (A1 and A2), autosomal locus with Mendelian transmission probabilities. The frequencies of alleles A1 and A2 are p1 = 0.6 and p2 = 1 − p1. Three genotypes can occur, A1A1, A1A2 and A2A2, with frequencies p1², 2p1p2 and p2², respectively. The A2 allele is assumed to increase the trait value and is called the favorable allele (whether an increase in trait value is actually favorable depends on the trait). We further assume no dominance. The difference between the values of the two homozygotes (2a, where a is the additive effect of the major gene) was 3.7 phenotypic standard deviation units of the trait, and the major gene variance, $\sigma_m^2 = 2p_1p_2a^2$ (Falconer and Mackay, 1996), was 5.88, which is 80% of the total genetic variance. Traits in animal populations controlled by major genes with effects close to this magnitude have been reported in the literature (Ilahi et al., 2000; Hagger et al., 2004).
The liability data were simulated using a heritability, h², of 0.41 and a repeatability, r, of 0.52 on the liability scale (the total heritability and repeatability, taking the major gene effect into account, were 0.78 and 0.82, respectively). The genotype of each offspring was determined according to the Mendelian transmission probabilities. The polygenic effect of each offspring was determined as the sum of the mean of the parents' polygenic effects and a Mendelian sampling effect. The true values of the parameters (major gene and polygene) used to simulate the population are given in Table 1.
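The values quoted above are mutually consistent, and the variances used to simulate model (1) can be backed out from them. The sketch below assumes that h² and r are within-major-gene ratios over σu² + σpe² + σe², that a is the deviation of each homozygote from the midpoint (so σm² = 2p1p2a² and the homozygote difference is 2a), and that the "phenotypic standard deviation" is the within-genotype one. Under those assumptions it recovers a = 3.5, σu² = 1.47, σpe² ≈ 0.39 and σe² ≈ 1.72, together with the quoted totals of 0.78 and 0.82. The `simulate` function is a hypothetical illustration of the design and transmission rules in which each sire × dam mating is assumed to produce one progeny carrying the three repeated records; the original simulation program may differ.

```python
import math
import random


def variance_components(sigma2_m=5.88, frac_major=0.80, h2=0.41, r=0.52, p1=0.6):
    """Back out the simulation variances from the values quoted in the text (assumptions above)."""
    p2 = 1.0 - p1
    a = math.sqrt(sigma2_m / (2.0 * p1 * p2))              # sigma2_m = 2 p1 p2 a^2
    sigma2_u = sigma2_m * (1.0 - frac_major) / frac_major  # major gene = 80% of genetic variance
    sigma2_w = sigma2_u / h2                               # within-genotype phenotypic variance
    sigma2_pe = r * sigma2_w - sigma2_u                    # r = (sigma2_u + sigma2_pe) / sigma2_w
    sigma2_e = sigma2_w - sigma2_u - sigma2_pe
    total = sigma2_m + sigma2_w
    return {
        "a": a, "sigma2_u": sigma2_u, "sigma2_pe": sigma2_pe, "sigma2_e": sigma2_e,
        "h2_total": (sigma2_m + sigma2_u) / total,
        "r_total": (sigma2_m + sigma2_u + sigma2_pe) / total,
        "homozygote_diff_sd": 2.0 * a / math.sqrt(sigma2_w),
    }


def simulate(n_sires=20, dams_per_sire=100, n_rec=3, p1=0.6, seed=1):
    """Hypothetical simulation of model (1); returns (sire, dam, n_A2_alleles, liability) tuples."""
    rng = random.Random(seed)
    vc = variance_components(p1=p1)
    a = vc["a"]
    sd_u, sd_pe, sd_e = (math.sqrt(vc[k]) for k in ("sigma2_u", "sigma2_pe", "sigma2_e"))

    def base_animal():
        # Unrelated base animal: two alleles drawn independently, P(A2) = 1 - p1; 1 codes A2.
        alleles = (int(rng.random() >= p1), int(rng.random() >= p1))
        return alleles, rng.gauss(0.0, sd_u)

    records = []
    for sire in range(n_sires):
        sire_alleles, u_sire = base_animal()
        for dam in range(dams_per_sire):
            dam_alleles, u_dam = base_animal()
            # Mendelian transmission: one random allele from each parent.
            n_a2 = rng.choice(sire_alleles) + rng.choice(dam_alleles)
            m = (n_a2 - 1) * a  # -a, 0, +a for A1A1, A1A2, A2A2 (no dominance)
            # Parent average plus Mendelian sampling (variance sigma2_u / 2, non-inbred parents).
            u = 0.5 * (u_sire + u_dam) + rng.gauss(0.0, sd_u / math.sqrt(2.0))
            pe = rng.gauss(0.0, sd_pe)  # shared by the repeated records
            for _ in range(n_rec):
                records.append((sire, dam, n_a2, m + u + pe + rng.gauss(0.0, sd_e)))
    return records


vc = variance_components()
print(round(vc["a"], 2), round(vc["h2_total"], 2), round(vc["r_total"], 2),
      round(vc["homozygote_diff_sd"], 1))  # 3.5 0.78 0.82 3.7
print(len(simulate()))  # 6000
```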
Categorical data: The simulated NDL data, $z_{ij}$, were standardized using the mean, µ, and the standard deviation, σz, of the trait as $z^*_{ij} = (z_{ij} - \mu)/\sigma_z$, so that $z^*_{ij} \sim N(0, 1)$. The standardized liability data were then transformed into categorical data (5 categories) using 4 thresholds $t_\lambda$ (λ = 1, …, 4): when the standardized NDL value $z^*_{ij}$ lies in the interval $(t_{\lambda-1}, t_\lambda]$, the discrete phenotype is $y^c_{ij} = \lambda$ (λ = 1, …, 5), with the extreme bounds $t_0 = -\infty$ and $t_5 = +\infty$. The category thresholds used in this study were t1 = 0, t2 = 0.674, t3 = 1.036 and t4 = 1.282, corresponding to probabilities of exceeding each threshold of 50, 25, 15 and 10%, respectively (i.e. frequencies of 50, 25, 10, 5 and 10% for categories 1 to 5).
Binary (0/1) data: Binary data are the simplest form of categorical trait, with just two categories. Based on the liability concept, the standardized liability data $z^*_{ij}$ can be transformed into binary data as follows:

$$y^b_{ij} = \begin{cases} 1 & \text{if } z^*_{ij} > t \\ 0 & \text{if } z^*_{ij} \le t \end{cases}$$

where t is the threshold point. Here $y^b = 1$ can be considered as diseased and $y^b = 0$ as healthy, so this represents a liability model for complex polygenic diseases.
The values of t were chosen to represent two scenarios: a less common disease with 15% incidence and a more common disease with 40% incidence. The corresponding thresholds were t = 1.036 for 15% and t = 0.253 for 40% incidence.
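The categorical and binary transformations can be sketched as below; the function names are hypothetical. Each threshold follows from the normal quantile function as the point with the stated upper-tail probability on the standardized scale, which reproduces t = 0, 0.674, 1.036 and 1.282 for the categories, and t = 1.036 (15%) and t = 0.253 (40%) for the binary data.

```python
import bisect
import random
from statistics import NormalDist, fmean, pstdev

CATEGORY_THRESHOLDS = [0.0, 0.674, 1.036, 1.282]  # t1..t4 from the text


def threshold_for_upper_tail(prob):
    """Threshold t on the N(0, 1) scale with P(z* > t) = prob."""
    return NormalDist().inv_cdf(1.0 - prob)


def standardize(z):
    """z* = (z - mean) / sd, using the sample mean and SD of the trait."""
    mu, sd = fmean(z), pstdev(z)
    return [(x - mu) / sd for x in z]


def to_categorical(z_std, thresholds=CATEGORY_THRESHOLDS):
    """Category k in 1..5 when z* lies in (t_{k-1}, t_k], with t_0 = -inf and t_5 = +inf."""
    return [bisect.bisect_left(thresholds, x) + 1 for x in z_std]


def to_binary(z_std, t):
    """1 ('diseased') if z* > t, else 0 ('healthy')."""
    return [int(x > t) for x in z_std]


print([round(threshold_for_upper_tail(p), 3) for p in (0.50, 0.25, 0.15, 0.10)])  # [0.0, 0.674, 1.036, 1.282]
print(round(threshold_for_upper_tail(0.15), 3), round(threshold_for_upper_tail(0.40), 3))  # 1.036 0.253

# Simulated liabilities on an arbitrary scale give about 15% 'diseased' at t = 1.036.
rng = random.Random(3)
z_std = standardize([rng.gauss(10.0, 3.0) for _ in range(50_000)])
print(fmean(to_binary(z_std, 1.036)))
```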
The NDL, categorical and binary data sets (the latter two obtained by transforming the same NDL data) were kept for segregation analyses by the MLLM and BATM methods.

Maximum likelihood linear model (MLLM) analysis
There were 4 types of data sets: the original NDL data, the categorical data and the two binary data sets with 15% and 40% incidences. The same MLLM method was applied to all data sets. Simulations and analyses were replicated 100 times for each data set. Different parameter values were used as starting values for the estimation. The segregation analysis method used in this study was based on comparing the likelihoods under two inheritance hypotheses (Le Roy et al., 1990; Ilahi et al., 2000; Bodin et al., 2002):
Mixed inheritance hypothesis (H1): This model describes the genetic transmission of the simulated trait by polygenic effects and a single major gene effect. The parameters to be estimated are the mean of each genotype (µA1A1, µA1A2, µA2A2), the variance components (σu², σpe², σe²) and the genotypic frequencies (f(A1A1), f(A1A2), f(A2A2)). These estimated parameters allowed computation of the within-major-gene heritability, the repeatability and the fraction of total genetic variance explained by the major gene.
Polygenic inheritance hypothesis (H0): This model, a sub-model of the mixed inheritance hypothesis H1, is obtained by setting µA1A1 = µA1A2 = µA2A2 = µ. In this case, the parameters to be estimated are µ, σu², σpe² and σe², from which the heritability and the repeatability can be computed.
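Using the simulated population structure and following model (1), the likelihood under H1, λ1, can be written as in Le Roy et al. (1995) and Ilahi (1999); the likelihood under H0, λ0, is the corresponding likelihood under the constraint µA1A1 = µA1A2 = µA2A2 = µ. The likelihood ratio test statistic, $LR = -2\ln(\lambda_0/\lambda_1)$, is compared with the $\chi^2_d$ distribution, where the degrees of freedom d equal the difference in the number of parameters between the mixed and polygenic inheritance hypotheses (Le Roy et al., 1989; Kadarmideen et al., 2000b; Kadarmideen and Dekkers, 2001). In this analysis, d = 4 (three genotype means and two free genotypic frequencies under H1, versus a single mean under H0).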
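For 4 degrees of freedom the chi-square upper-tail probability has the closed form exp(−x/2)(1 + x/2), so the test can be sketched with the standard library alone, assuming the usual statistic LR = 2(ln λ1 − ln λ0). The function names are hypothetical, and the log-likelihood values in the usage lines are made up, chosen only so that LR = 165, the empirical mean reported for the liability data.

```python
import math


def chi2_sf_df4(x):
    """P(X > x) for X ~ chi-square with 4 df: exp(-x/2) * (1 + x/2)."""
    if x <= 0.0:
        return 1.0
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def chi2_critical_df4(alpha, lo=0.0, hi=200.0, iters=200):
    """Critical value c with P(X > c) = alpha, by bisection (the tail probability is decreasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chi2_sf_df4(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def likelihood_ratio_test(loglik_h1, loglik_h0, alpha=0.01):
    """Return (LR, p-value, significant) for LR = 2 (ln lambda1 - ln lambda0) against chi-square(4)."""
    lr = 2.0 * (loglik_h1 - loglik_h0)
    return lr, chi2_sf_df4(lr), lr > chi2_critical_df4(alpha)


print(round(chi2_critical_df4(0.01), 2))                 # 13.28
print(likelihood_ratio_test(-1000.0, -1082.5)[::2])      # (165.0, True)
```

The closed form only holds for even degrees of freedom (here d = 4); for other values of d a general chi-square routine would be needed.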
The parameters maximising the likelihoods were estimated using the Gauss-Hermite quadrature (D01BAF) and optimisation (E04JBF) subroutines of the NAG FORTRAN Library (1990), with a quasi-Newton algorithm in which the derivatives were estimated by finite differences.
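Gauss-Hermite quadrature replaces an integral over a normally distributed random effect (for example, a sire's polygenic effect that must be integrated out of a family likelihood) with a weighted sum over a few nodes. The sketch below is a pure-Python illustration of that idea only; it does not reproduce the NAG routine D01BAF, and the names `gauss_hermite` and `expect_over_normal` are hypothetical. Nodes are located by bracketing and bisecting the roots of the Hermite polynomial, and weights use the standard closed form w_i = 2^(n−1) n! √π / (n² H_{n−1}(x_i)²).

```python
import functools
import math
from statistics import NormalDist


def hermite_pair(n, x):
    """Return (H_n(x), H_{n-1}(x)) for physicists' Hermite polynomials, n >= 1."""
    h_prev, h = 1.0, 2.0 * x  # H_0(x), H_1(x)
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev  # H_{k+1} = 2x H_k - 2k H_{k-1}
    return h, h_prev


@functools.lru_cache(maxsize=None)
def gauss_hermite(n):
    """Nodes and weights with integral exp(-x^2) f(x) dx ~= sum_i w_i f(x_i)."""
    bound, step = math.sqrt(2.0 * n + 1.0) + 1.0, 1e-3  # all roots lie inside +-bound
    nodes, x = [], -bound
    while x < bound:
        x_next = x + step
        if hermite_pair(n, x)[0] * hermite_pair(n, x_next)[0] < 0.0:  # sign change brackets a root
            lo, hi = x, x_next
            for _ in range(60):  # bisection
                mid = 0.5 * (lo + hi)
                if hermite_pair(n, lo)[0] * hermite_pair(n, mid)[0] <= 0.0:
                    hi = mid
                else:
                    lo = mid
            nodes.append(0.5 * (lo + hi))
        x = x_next
    scale = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n ** 2
    weights = [scale / hermite_pair(n, xi)[1] ** 2 for xi in nodes]
    return tuple(nodes), tuple(weights)


def expect_over_normal(f, sigma, n=20):
    """E[f(S)] for S ~ N(0, sigma^2), via the change of variable s = sqrt(2) * sigma * x."""
    nodes, weights = gauss_hermite(n)
    total = sum(w * f(math.sqrt(2.0) * sigma * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)


# E[exp(S)] = exp(sigma^2 / 2) for a normal random effect S.
print(abs(expect_over_normal(math.exp, 1.0) - math.exp(0.5)) < 1e-9)  # True
# A probit integrated over a normal effect: E[Phi(eta + S)] = Phi(eta / sqrt(1 + sigma^2)).
eta = 0.5
print(expect_over_normal(lambda s: NormalDist().cdf(eta + s), 1.0),
      NormalDist().cdf(eta / math.sqrt(2.0)))
```

Twenty nodes are exact for polynomials up to degree 39 and are usually ample for smooth integrands such as these.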

Bayesian analysis
The analyses were carried out on the same simulated binary data sets (with 15% and 40% incidences) using a Bayesian threshold model (BATM) with Gibbs sampling. The MAGGIC software package (Janss, 1998) was used to estimate the genetic parameters of the population. This method constructs Monte Carlo chains of realisations of the model parameters through Gibbs sampling. These samples constitute the marginal posterior distributions of the model parameters, from which Bayesian inferences on these parameters can be drawn. The method was adapted to the current study as follows.
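Starting genotypes of animals were randomly allocated, and the initial allele frequency was 0.50. Uniform prior distributions were assumed for the variance components (polygenic variance σu², permanent environmental variance σpe² and residual variance σe²), for the additive major gene effect a, and for the allele frequencies p1 and p2. Variance components are a priori positive, and the allele frequencies are bounded between zero and one, bounds included (Janss et al., 1995; Janss, 1998). The joint posterior density of all the parameters to be estimated, given the data y_c or y_b, is denoted as:

$$p(\mu, \sigma_u^2, \sigma_{pe}^2, \sigma_e^2, a, p_1, \mathbf{u}, \mathbf{pe}, \mathbf{g} \mid \mathbf{y}) \qquad (2)$$

where $\mathbf{g}$ is the vector of major gene genotypes. Multivariate normal distributions were adopted as priors for the animal genetic and permanent environmental effects: $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2)$ and $\mathbf{pe} \sim N(\mathbf{0}, \mathbf{I}\sigma_{pe}^2)$, where $\mathbf{A}$ is the numerator relationship matrix and $\mathbf{I}$ is the identity matrix. For the dispersion parameters, σu², σpe² and σe², uniform bounded priors were adopted.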
In the case of the Bayesian Threshold Model (BATM) applied to a binary disease, the observed data vector y_b is replaced by the underlying liability z, with the same model terms as in (1). Further details and distributional assumptions of threshold models are given in Janss (2004).
The Gibbs sampler, a Markov chain Monte Carlo method, is used to generate samples from the joint density (Geman and Geman, 1984; Gelfand and Smith, 1990). These samples allow the study of all marginal densities of the joint density. The Gibbs Markov chain is a continuing sequence of realisations of the parameters in the joint density (2).
Let $\theta^{[t]} = (\mu^{[t]}, \sigma_u^{2[t]}, \sigma_{pe}^{2[t]}, \sigma_e^{2[t]}, a^{[t]}, p_1^{[t]})$ denote the set of realisations of the parameters to be estimated at cycle t of the Gibbs chain. Constructing the Gibbs chain requires generating a new set of realisations $\theta^{[t+1]}$ given the current set $\theta^{[t]}$.
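The Gibbs sampler is generally used to study the marginal posterior densities of parameters, treating the other parameters as nuisance parameters. The parameters are updated in the order given in the joint density (2):
General mean: Under a flat prior, the realisation of µ at cycle t+1 is drawn as

$$\mu^{[t+1]} \sim N\left(\bar{y}^{*}, \frac{\sigma_e^{2[t]}}{n}\right)$$

where $\bar{y}^{*}$ is the mean of the observations corrected for the current major gene, polygenic and permanent environmental effects.
Polygenic, permanent environmental and residual variances: Variance components follow inverted chi-square distributions (Janss et al., 1995), and the new realisations for σu², σpe² and σe² are obtained as

$$\sigma_u^{2[t+1]} = \frac{\mathbf{u}^{[t+1]\prime}\mathbf{A}^{-1}\mathbf{u}^{[t+1]}}{\chi^2_{(q-2)}}, \qquad \sigma_e^{2[t+1]} = \frac{\mathbf{e}^{[t+1]\prime}\mathbf{e}^{[t+1]}}{\chi^2_{(n-2)}}$$

with σpe² obtained analogously from $\mathbf{pe}^{[t+1]\prime}\mathbf{pe}^{[t+1]}$ (the identity matrix replacing A), where A is the numerator relationship matrix, q is the number of animals, n is the total number of observations, and χ²(q−2) and χ²(n−2) are random deviates from chi-square distributions with q−2 and n−2 degrees of freedom.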
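The inverted chi-square updates can be sketched as follows, assuming the flat priors stated above and the degrees of freedom q − 2 and n − 2 given in the text. A chi-square deviate with ν degrees of freedom is drawn as a gamma deviate with shape ν/2 and scale 2. The helper names are hypothetical, the dense double loop for u′A⁻¹u is for illustration only, and MAGGIC's internals are not reproduced.

```python
import random


def sample_variance(sum_of_squares, df, rng):
    """Draw sigma^2 = SS / chi2_df, i.e. a scaled inverse chi-square deviate."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) == Gamma(shape=df/2, scale=2)
    return sum_of_squares / chi2


def sample_residual_variance(residuals, rng):
    """sigma_e^2 = e'e / chi2_(n-2), with n the number of observations."""
    ss = sum(e * e for e in residuals)
    return sample_variance(ss, len(residuals) - 2, rng)


def sample_polygenic_variance(u, a_inv, rng):
    """sigma_u^2 = u' A^-1 u / chi2_(q-2); a_inv is a q x q list of lists (hypothetical layout)."""
    q = len(u)
    quad = sum(u[i] * a_inv[i][j] * u[j] for i in range(q) for j in range(q))
    return sample_variance(quad, q - 2, rng)


rng = random.Random(7)
residuals = [1.0, -1.0] * 500  # e'e = 1000 over n = 1000 records
draws = [sample_residual_variance(residuals, rng) for _ in range(2000)]
print(sum(draws) / len(draws))  # close to 1000 / 996, about 1.004
```

With e′e = 1000 and n = 1000, the draws average about e′e/(n − 4), the mean of a scaled inverse chi-square with n − 2 degrees of freedom.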
Major gene effect: Using genotypes as a known classification factor, the major gene effect is estimated as the deviation of the homozygotes from an assumed mean of zero (Janss et al., 1995).
Sample $a^{[t+1]}$ from

$$a^{[t+1]} \sim N\left(\frac{\tilde{y}_{3.} - \tilde{y}_{1.}}{n_1 + n_3}, \frac{\sigma_e^{2[t+1]}}{n_1 + n_3}\right)$$

where ỹ3. and ỹ1. are the sums of corrected data for genotypes A2A2 and A1A1, respectively, and n1 and n3 are the numbers of animals with genotypes A1A1 and A2A2, respectively.
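Assuming a flat prior on a, genotype values coded −a, 0 and +a for A1A1, A1A2 and A2A2, and data already corrected for all other effects, the conditional posterior of a is normal with mean (ỹ3. − ỹ1.)/(n1 + n3) and variance σe²/(n1 + n3). In this hypothetical sketch each record is treated as one observation, so n1 and n3 count records rather than animals.

```python
import math
import random


def sample_major_gene_effect(y_corr, n_a2, sigma2_e, rng):
    """Draw a | rest ~ N((y3 - y1) / (n1 + n3), sigma2_e / (n1 + n3)).

    y_corr: records corrected for all effects except the major gene.
    n_a2: genotype per record as copies of A2 (0 = A1A1, 1 = A1A2, 2 = A2A2),
    i.e. an assumed coding of -a, 0, +a; heterozygotes carry no information on a.
    Returns (draw, conditional mean).
    """
    y1 = sum(y for y, g in zip(y_corr, n_a2) if g == 0)
    y3 = sum(y for y, g in zip(y_corr, n_a2) if g == 2)
    n1 = sum(1 for g in n_a2 if g == 0)
    n3 = sum(1 for g in n_a2 if g == 2)
    mean = (y3 - y1) / (n1 + n3)
    return rng.gauss(mean, math.sqrt(sigma2_e / (n1 + n3))), mean


rng = random.Random(11)
draw, mean = sample_major_gene_effect([-3.5, 0.0, 3.5, -3.5, 3.5], [0, 1, 2, 0, 2], 1.7, rng)
print(mean)  # 3.5
```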
Allele frequency: Given the genotypes of the base animals, the allele frequency in the base population has a beta distribution (Janss et al., 1995):

$$p_1^{[t+1]} \sim \mathrm{Beta}(N_1 + 1, N_2 + 1)$$

where N1 is the number of A1 alleles and N2 is the number of A2 alleles in the genotypes of the base animals.
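Under the uniform prior on p1 stated earlier, the allele frequency update is a beta draw with parameters N1 + 1 and N2 + 1. A minimal sketch follows; the function name is hypothetical and the base-population genotype counts are made up for illustration.

```python
import random


def sample_allele_frequency(base_n_a2, rng):
    """Draw p1 | base genotypes ~ Beta(N1 + 1, N2 + 1) (uniform prior on p1).

    base_n_a2: copies of A2 (0, 1 or 2) for each base animal.
    """
    n2 = sum(base_n_a2)               # N2: number of A2 alleles
    n1 = 2 * len(base_n_a2) - n2      # N1: number of A1 alleles
    return rng.betavariate(n1 + 1, n2 + 1)


rng = random.Random(5)
base = [0] * 36 + [1] * 48 + [2] * 16  # hypothetical base genotypes: N1 = 120, N2 = 80
draws = [sample_allele_frequency(base, rng) for _ in range(5000)]
print(sum(draws) / len(draws))  # close to 121 / 202, about 0.599
```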
Ten replicate Gibbs chains of 50,000 cycles each were run with a spacing of 50 cycles, giving 1,000 Gibbs samples per chain and 10,000 samples in total for each trait. A burn-in period of 1,000 cycles was used to allow the Gibbs chains to reach equilibrium.
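The text does not state whether the 1,000 burn-in cycles are part of the 50,000. Assuming they are run in addition, storing every 50th cycle after burn-in yields exactly 1,000 samples per chain and 10,000 over ten chains, as stated. The helper below is hypothetical.

```python
def kept_cycles(n_cycles=50_000, burn_in=1_000, spacing=50):
    """Gibbs cycles whose samples are stored: every `spacing`-th cycle after burn-in.

    Assumes the burn-in is run in addition to the n_cycles sampling cycles.
    """
    start = burn_in + spacing
    return list(range(start, burn_in + n_cycles + 1, spacing))


per_chain = kept_cycles()
print(len(per_chain), 10 * len(per_chain))  # 1000 10000
```

If the burn-in were instead counted within the 50,000 cycles, the same spacing would give 980 samples per chain.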
Gibbs samples of the following parameters were obtained directly in each Gibbs cycle: the variance components σu², σpe² and σe², the additive major gene effect a, and the allele frequencies p1 and p2. From these, the within-major-gene heritability h², the repeatability r, the genotypic frequencies f(A1A1), f(A1A2) and f(A2A2), and the major gene variance σm² were computed in each Gibbs cycle.
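The per-cycle derived quantities can be sketched as a function of one Gibbs sample; averaging them over the kept samples gives marginal posterior means. This sketch assumes Hardy-Weinberg genotype frequencies from the sampled p1 and the within-major-gene definitions h² = σu²/(σu² + σpe² + σe²) and r = (σu² + σpe²)/(σu² + σpe² + σe²); the function names and dictionary keys are hypothetical.

```python
def derived_quantities(sample):
    """Quantities computed from one Gibbs sample (dict of sampled parameters).

    Assumes Hardy-Weinberg genotype frequencies from the sampled p1.
    """
    p1 = sample["p1"]
    p2 = 1.0 - p1
    sigma2_w = sample["sigma2_u"] + sample["sigma2_pe"] + sample["sigma2_e"]
    return {
        "h2": sample["sigma2_u"] / sigma2_w,  # within-major-gene heritability
        "r": (sample["sigma2_u"] + sample["sigma2_pe"]) / sigma2_w,
        "f_A1A1": p1 * p1, "f_A1A2": 2.0 * p1 * p2, "f_A2A2": p2 * p2,
        "sigma2_m": 2.0 * p1 * p2 * sample["a"] ** 2,
    }


def posterior_means(samples):
    """Marginal posterior means of the derived quantities over all kept samples."""
    derived = [derived_quantities(s) for s in samples]
    return {key: sum(d[key] for d in derived) / len(derived) for key in derived[0]}


# A single sample set near the simulated true values.
s = {"p1": 0.6, "a": 3.5, "sigma2_u": 1.47, "sigma2_pe": 0.394, "sigma2_e": 1.721}
print({k: round(v, 3) for k, v in derived_quantities(s).items()})
```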
With the Bayesian method, there is no likelihood ratio test or significance test equivalent to those routinely applied in standard statistical methods such as ML or regression. Therefore, we were unable to run the MAGGIC software under two different hypotheses and construct a likelihood ratio test for a significant contribution of a major gene.
However, the highest posterior density (HPD) regions, following Box and Tiao (1973) and based on a nonparametric density estimate using the average shifted histogram, were determined for all model parameters. The HPD region guarantees that the density at each point within the region is equal to or greater than the density at each point outside it. The HPD region allows, e.g., the following reasoning: if the region for a variance component or a frequency includes the boundary value of zero, then this parameter is not important for this particular trait. In the present investigation, 1 − α = 0.95 was used to construct the HPD regions. HPD regions can thus be used to test for the significance of the major gene, as in other studies (e.g. Miyake et al., 1999; Ilahi and Kadarmideen, 2004; Hagger et al., 2004).
For the post-Gibbs analysis of the samples, an analysis of variance was used to check for equality of chains. This test also yields information about the dependency of the samples kept within chains (convergence of chains) (Janss et al., 1997). The marginal posterior means were used as estimators of the parameters.
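The between-chain check can be sketched as a one-way ANOVA on the Gibbs samples, with chain as the classification factor. This is a generic F statistic offered under stated assumptions, not necessarily the exact procedure of Janss et al. (1997); because Gibbs samples are autocorrelated, the nominal F reference distribution is only approximate. The function name is hypothetical.

```python
def chain_anova_f(chains):
    """One-way ANOVA F statistic comparing the means of several Gibbs chains.

    A large F relative to an F(k-1, N-k) reference suggests the chains have
    not converged to the same distribution.
    """
    k = len(chains)
    n_total = sum(len(c) for c in chains)
    grand_mean = sum(sum(c) for c in chains) / n_total
    means = [sum(c) / len(c) for c in chains]
    ss_between = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(chains, means))
    ss_within = sum(sum((x - m) ** 2 for x in c) for c, m in zip(chains, means))
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


print(chain_anova_f([[0, 1, 0, 1], [10, 11, 10, 11]]))  # about 600: chains disagree
print(chain_anova_f([[1, 2, 3], [1, 2, 3]]))            # 0.0: identical chain means
```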

Detection of major gene for bone lesions in pigs
Osteochondrosis (OC) in pigs is an abnormal bone development represented by ossification on the growth plates and cartilages at bone joints (Kadarmideen et al., 2004). Data on OC were obtained from SUISAG, a stock company providing services in pig production in Switzerland, including herd book, field and station tests and artificial insemination. Animals subject to performance testing are sampled and recorded for OC lesions at slaughter. Trained SUISAG personnel conducted morphological examinations of the front and hind leg bones of slaughtered pigs and manually recorded OC lesions with a score of 1 = 'normal' and 2 to 6 = 'mildly to severely affected', depending on the lesion. OC lesions are scored at 10 different bone sites. To test the MIM methods proposed here, we applied Bayesian Linear Models (BALM) and Bayesian Threshold Models (BATM) to one of these lesions in the front legs: condylus medialis humeri (CMH). A total of 1,291 animals were scored for OC lesions (scores 1-5). The original CMH data were categorical scores; they were treated as a normally distributed trait and analysed by BALM. To apply BATM, the same data set was transformed into a binary data set as follows: animals with a score of 1 received a score of 0 (healthy) and animals with a score of 2 or above received a score of 1 (diseased). With this recoding, the incidence of binary CMH lesions was 10.0%. Three replicate Gibbs chains of 50,000 cycles were run with a spacing of 50 cycles, giving 1,000 Gibbs samples per chain and 3,000 samples in total. A burn-in period of 1,000 cycles was used to allow the Gibbs chains to reach equilibrium. Posterior means and standard deviations (SD) were computed from these 3,000 samples.

Liability data
The parameter estimates from segregation analyses using MLLM for liability data are given in Table 1. The empirical mean of the test statistic (likelihood ratio) comparing the mixed and polygenic transmission models was about 165, greatly exceeding 13.3, the tabulated value of the χ²4 distribution at the 1% significance level. This confirmed the true mixed genetic determinism of the simulated trait.
Parameters estimated under mixed inheritance (H1) were similar to the true values used in the simulation, except that the major gene variance was underestimated. The estimated frequency of the favorable allele A2 was 0.41. The major gene accounted for 73% of the total genetic variance of the trait, and the difference between the values of the two homozygotes was 3.8 phenotypic standard deviation units of the trait. These estimates were very close to the true values of the major gene parameters. However, the parameters estimated under polygenic inheritance (H0) were overestimated, especially the genetic and permanent environmental variances. This is explained by the genetic model used to simulate the data set: the major gene has a large additive effect on the trait, and under H0 this effect is not taken into account when explaining the genetic variability of the trait, which results in overestimation of the genetic and permanent environmental variances. Similar findings were reported by Ilahi (1999) and Ilahi et al. (2000).

Categorical data
The parameter estimates from segregation analyses using MLLM for categorical data are given in Table 1. Under H1, genetic parameters were underestimated (e.g. true values of h² = 0.41 and r = 0.52 vs. estimates of 0.25 and 0.36, respectively). The estimated frequency of the favorable allele A2 was 0.37. The major gene accounted for 74% of the total genetic variance of the trait, and the difference between the values of the two homozygotes was 3.2 phenotypic standard deviation units.
It should be noted that, under both the H0 and H1 hypotheses, the estimated variance components and genetic parameters for the discrete trait were lower than those for the continuous trait (Table 1), as expected. These results are similar to those reported in a previous study (Le Roy and Elsen, 1991). Similar findings were also reported for QTL detection in non-normal traits (Rebai, 1997).

Binary data
Maximum likelihood linear model: Results of segregation analyses using this model for binary traits are given in Table 2. Using the formula proposed by Robertson and Lerner (1949), the true parameter values on the normal liability scale used in the simulation were transformed to the observed scale. Kadarmideen et al. (2000b) showed that this transformation also works well for transforming major gene/QTL effects between the liability and observed scales. Here, the transformation of the true values was applied only to the high incidence (40%) (Table 2), as the transformation using the low incidence (15%) resulted in values outside the parameter space (e.g. h² > 1). For both incidences (15% and 40%) under H0, the estimated heritabilities and repeatabilities were the same and higher than their expected values on the observed scale. Under H1, however, the estimates were lower than the true values. This is because, under H0 and for both incidences, the estimates of all 3 variance components (σu², σpe² and σe²) were lower than the true values by a generally similar degree, which is not the case under H1. Moreover, the MLLM did not allow estimation of the permanent environmental variance (σpe²). This may be due to the loss of variability and information when normally distributed data were truncated to 0/1 binary form (Xu and Atchley, 1996; Rebai, 1997; Kadarmideen et al., 2000b). In a study on segregation analysis of binary traits, Miyake et al. (2002) also found similar problems in estimating variance components and in obtaining good convergence to the true values. With low incidence (15%), the frequency of the favorable genotype A2A2 was underestimated, whereas with high incidence (40%) it was overestimated. This is consistent with earlier findings that statistical power is lower and bias is higher for low incidence than for intermediate incidence (Kadarmideen et al., 2000b).
Bayesian threshold model: The results of segregation analyses using the Bayesian threshold model for binary traits are given in Table 3. These estimates of model parameters are based on 10,000 Gibbs samples from ten replicate chains. Convergence of the Gibbs sampler was tested by comparing the output of multiple chains using ANOVA on all samples. These tests showed that the Gibbs samples of the parameters (major gene effect, genotype frequencies and all variances) did not reach a good stationary phase. The density estimates were higher than the true values. A similar overestimation of major gene variance was reported by Janss et al. (1995). Miyake et al. (2002) also had problems obtaining good convergence of the estimated parameters for binary traits. Thaller et al. (1996) found that segregation analysis of binary traits by maximum likelihood based on the infinitesimal mixed model is computationally feasible only when there are no relationships between parents and no additional random or fixed effects in the model.
Significance testing for major gene under BATM: The highest posterior density regions of the major gene variances ranged from 2.25 to 26.50 and from 3.5 to 29.60 for the 15% and 40% incidences, respectively, and did not include zero. This confirmed the existence of a segregating major gene in both simulated binary data sets. The estimated polygenic and permanent environmental variances were higher with 15% incidence than with 40% incidence. Conversely, the estimated major gene effect and variance were higher with 40% incidence than with 15% incidence (Table 3). This may be because a founder population with the same allele frequency at the major locus can lead to severe under- or overestimation of parameters (Janss, 2003, personal communication). Normally, the magnitude of the estimated parameters (of both the polygenic background and the major gene) increases with the incidence (Kadarmideen et al., 2000b). In this study, the genetic variability of the simulated trait was almost entirely explained by the major gene effect, which may have reduced the magnitude of the polygenic parameter estimates as the incidence increased.
Comparison of liability vs. binary data: The test statistic (likelihood ratio) from segregation analysis of normal data (165) would be expected to be higher than that from segregation analysis of the same normal data transformed to categorical or binary data. However, this was not the case in the present study. The empirical means of the test statistic were 511.40, 729.55 and 1,160.83 for the categorical, 40% incidence binary and 15% incidence binary data sets, respectively. Assuming normality for discrete traits considerably inflates the test statistic and may therefore lead to false inference of a segregating major gene. This suggests that MLLM is sensitive to deviations from normality. A similar finding was reported by Elsen and Le Roy (1990).
For liability data, the estimated polygenic heritabilities and repeatabilities decreased from H0 to H1, from 0.54 to 0.38 and from 0.80 to 0.51, respectively. Similar trends were observed for the categorical trait (from 0.39 to 0.25 and from 0.74 to 0.36, respectively). This was expected because the major gene effect is taken into account in H1 (Ilahi et al., 2000). For binary traits using MLLM, these estimates decreased dramatically from H0 to H1 for both incidences: from 0.38 to 0.01 and from 0.60 to 0.01 for 15% incidence, and from 0.39 to 0.012 and from 0.60 to 0.012 for 40% incidence. Moreover, even with the very large effect of the postulated major gene, the MLLM was not able to obtain accurate estimates of the major gene parameters. In contrast, the Bayesian threshold method (BATM) gave more reasonable and accurate estimates than MLLM, in agreement with earlier studies (Yi and Xu, 2000). With the MLLM method, the estimated residual variance (σe²) did not change from H0 to H1 for data on any scale; the polygenic variance (σu²) was underestimated for discrete traits, especially for binary traits with low incidence; and the permanent environmental variance (σpe²) was not estimable for binary traits (both incidences). BATM estimates of polygenic effects were more accurate with 40% incidence than with 15% incidence. Because the loss of information when NDL data are truncated to binary data is far larger than when they are truncated to categorical data (5 categories), the bias in parameters estimated by MLLM is higher on the binary scale than on the categorical scale. Similar results were reported for QTL detection (Rebai, 1997).

Application to bone diseases in pigs
Results of the Bayesian methods based on linear models (BALM), applied to CMH observed on the original scale (scores 1-5), and of those based on threshold models (BATM), applied to the transformed binary scale (0/1), are given in Table 4. Both methods showed the presence of a major gene, with a significant additive effect at the major gene (0.587 for BALM and 4.358 for BATM) and a very high additive genetic variance (0.0477 for BALM and 17.993 for BATM) compared with the polygenic variance for the disease. Therefore, the heritability at the major gene, hm², was much higher than the heritability at the polygenes, hp². The estimated parameters were larger in magnitude under BATM than under BALM, as expected from theory (on the continuous liability scale, variables have a larger range than on truncated binary data, whose probabilities range from 0 to 1.0). This has also been shown in other studies (e.g. Kadarmideen et al., 2001, 2004). The frequency of the allele that increases the incidence of the disease (A2) was high (0.71) for BATM, whereas it was 0.38 for BATM. In general, the probability of the HPD region (lowest interval with zero) at the 95% level was small for both methods. The posterior SDs were high for all parameters and for both methods, confirming that genetic parameters for categorical (ordinal or binary) traits are difficult to estimate with good precision.
In general, both the methods developed here and their application to bone disease data in pigs show that segregation analysis could be used to look first for evidence of a major gene affecting a trait, before embarking on establishing resource populations for that trait and genotyping animals for hundreds of DNA markers, as shown, for example, in Kim et al. (2003) and Yan et al. (2004).

CONCLUSION
Maximum Likelihood Linear Model (MLLM) segregation analysis accurately estimated major gene parameters for ordinal or score data (e.g. disease severity), similar to those for normal data, under various simulation scenarios. The MLLM method applied to binary traits (e.g. healthy or diseased), however, could not estimate all of the model parameters under either the polygenic or the mixed inheritance model, whereas the Bayesian Threshold Model (BATM) detected the segregating major gene and estimated its parameters. BATM had higher accuracy and less bias than MLLM for detecting major genes in binary traits. Application of the developed Bayesian methods under linear and threshold models to real bone disease data in pigs showed the presence of a major gene with a significant additive effect and a very high additive genetic variance compared with the polygenic variance, indicating that this disease could be under the control of a major gene. The power and precision to detect major genes are generally lower for diseases scored on a binary scale than for diseases scored with more variability (ordinal, with more than 2 categories). When the incidence of a disease (on the binary scale) is close to intermediate levels (near 50%), segregation analysis had higher accuracy and less bias than for diseases with extreme or rare incidences. Normally distributed (liability) data usually gave higher accuracy and fewer problems in inference and estimation of polygene and major gene parameters than categorical data.

Table 1.
True values of parameters and parameter estimates by maximum likelihood linear model segregation analyses for liability and categorical data: means and standard deviations of 100 replicates. Means with ** are significantly different from the true value (p<0.01).

Table 2.
True values of parameters and parameter estimates by maximum likelihood linear model segregation analyses for binary trait using two incidences on the observed scale: means and standard deviations of 100 replicates

Table 3.
True values of parameters and Bayesian marginal posterior means and marginal posterior standard deviations of parameters for binary trait 1 in a mixed inheritance model, based on 10,000 Gibbs samples from ten replicated chains

Table 4.
Estimated major gene and polygenic parameters for osteochondral disease in pigs by mixed inheritance models, using Bayesian Linear Models (BALM) and Bayesian Threshold Models (BATM). Results are based on 10,000 Gibbs samples from three replicated chains. HPD L: Highest probability density region at 95% level, lower limit. HPD U: Highest probability density region at 95% level, upper limit.