Methodology of Mapping Quantitative Trait Loci for Binary Traits in a Half-sib Design Using Maximum Likelihood

Maximum likelihood methodology was applied to analyze the efficiency and statistical power of interval mapping by using a threshold model. The factors that affect QTL detection efficiency (e.g. QTL effect, heritability and incidence of categories) were simulated in our study. Daughter design with multiple families was applied, and the size of segregating population is 500. The results showed that the threshold model has a great advantage in parameters estimation and power of QTL mapping, and has nice efficiency and accuracy for discrete traits. In addition, the accuracy and power of QTL mapping depended on the effect of putative quantitative trait loci, the value of heritability and incidence directly. With the increase of QTL effect, heritability and incidence of categories, the accuracy and power of QTL mapping improved correspondingly.


INTRODUCTION
Since the advent of molecular markers, novel statistical techniques to detect and map individual genes affecting quantitative traits have been developed and subsequently refined.Most methods of QTL mapping are based on the interval mapping method, using either least squares regression or maximum likelihood (Haley et al., 1992;Grignola et al., 1996;Weller, 2001;Feingold et al., 2002;Liu et al., 2004;Tae-Hun Kim et al., 2004), in which a putative QTL is fitted at every possible position along a chromosome and the best fitting position is taken as the estimate of QTL position.In maximum likelihood (ML), the likelihood function of the observed data is maximized with respect to all parameters in the model postulated, and hypothesis testing is based on the likelihood ratio statistic.
Many important traits in livestock are binary or categorical traits (observed as two or more discrete categories) with a multifactorial polygenic mode of inheritance.Examples are mastitis, calving ease and insemination success or failure in cattle.Such traits vary in a discontinuous manner but are not inherited in a simple Mendelian fashion.Genetic analyses for categorical traits are difficult because the observed phenotype cannot be described by a linear function of genetic and environmental effect.For such traits threshold models are more appropriate than linear models, which assumes that there is a continuous underlying variable which determines the expression of discrete trait.The link between the observable discrete variable and the underlying variable is generated by a set of fixed thresholds, and the underlying variable is then described by the usual linear quantitative genetic model (Gianola et al., 1982;Falconer and Mackay, 1996).Threshold models have been used to estimate genetic parameters and breeding values for categorical traits in livestock population (Kadarmideen et al., 2000).Interval mapping for QTL affecting a binary polygenic trait in crossbred and outbred populations has also been developed based on threshold model concepts (Yi and Xu, 1999).This method will henceforth be referred to as generalized interval mapping or GIM.The GIM method is computer intensive because the likelihood has to be maximized through an iterative procedure for each putative QTL position.Yi and Xu (1999) reported that GIM had more power than linear model method to detect QTL for categorical traits, which strengthens the need for threshold models in mapping QTL for categorical traits.The advantages of the threshold model over the linear model could increase as complexity increases (e.g.multiple families in outbred populations).Combining data from multiple families is deemed to be more useful in outbred populations.For example, animal and plant breeders usually combine data from many half-or full-sib families.The main advantages of QTL mapping using multiple families are the increased power of QTL detection.However, systematic investigation of QTL mapping for discrete traits under the threshold model has been lacking.Here, we applied maximum likelihood methodology to analyze the efficiency of interval mapping under the threshold model, and analyze the power of QTL mapping in different parameter case.Application of the method is illustrated using a set of simulated data.

Model and estimation
A complex binary trait is assumed to be controlled by a latent variable, referred to as liability, which is considered to be continuous and normally distributed.It can be described by the usual linear model z = g QTL +g poly +e (1 where z is the liability, g QTL is the QTL genotypic value , g poly represents the polygenic effect with a distribution of N(0, σ 2 poly ), and e is the residual with a distribution of N(0, σ 2 e ).The discrete phenotypes are determined by some underlying thresholds.The device that translates liability into discrete phenotype is the physiological threshold model (Wright, 1934).For a discrete trait with two categories of phenotype, there is only one fixed threshold (v).When the value of the liability is above the threshold value, an individual shows one status of phenotype on the observed scale, otherwise, it shows the other status of phenotype.The translation can be summarized by Mapping quantitative trait loci (QTL) for this complex binary traits is more challenging than for normally distributed traits due to the nonlinear relationship between the observed phenotype and unobservable genetic effects, especially when the mapping population contains multiple outbred families.
Under a single locus model for the QTL is assumed with alleles Q and q coming from heterozygous sires and random dam population, the liability of daughters is three mixed distribution.The density can be described as follows Where fre i(k) = Pr(G=k⏐m), means the QTL genotype (k = QQ, Qq, qq) probabilities conditional on marker genotype.The conditional probabilities are summarized in Table 1.f (k) (z i ) represent the normal distribution density of each QTL genotype, assuming with same variance and different mean (a, d, -a).Where a and d means additive and dominance effect of the putative QTL respectively.
The likelihood of population is: where Pr(y i = k⏐m) is the conditional probability of phenotypic status (k = 0, 1) with given marker genotype (m) of the ith individual.
where Pr(y i =1⏐G = k) stands for the conditional probability of phenotypic status 1 with given QTL genotype (k).
where Φ(ξ) stands for the standardized cumulative normal distribution function and ξ is the argument., it stands for the residual of the liability model, and ef QTL(k) is three genotypic effects.Analysis involving Φ(ξ) is referred to as probit analysis.However, the probit model is difficult r is the recombination fraction between the two markers; r1 and r2 is the recombination fraction between the putative QTL and each marker, respectively; p is the frequency of alleles in dam population.
to manipulate because numerical integration is required.So, a logistic model is employed to approximate Φ(ξ) for estimation purpose.Logistic regressions have been used by human geneticists in segregation analysis (e.g.Bonney, 1986).The logistic model is expressed by .Therefore, The maximum likelihood estimation (MLE) for parameters can be derived by computing the derivative of the log of the likelihood with respect to the a, d, v, and setting this function equal to zero.We can obtain a set of equations as follow The MLE of P( k ) can obtain from ( 5), ( 6) and ( 7) using iterative methods, and can obtain the maximum log likelihood value (lnL max ).A statistical test for H 0 is carried out by the likelihood ratio (LRT) approximation.The likelihood ratio test involves calculation of log likelihood under the full model (lnL max ), and under the restricted model (lnL 0 ).The likelihood ratio is 2(lnL max -lnL 0 ), which asymptotically follows a chi-square distribution with two degree of freedom under the null hypothesis.The additive and dominance effect of QTL can obtain the following formula

Experimental design
Half-sib design is applied in our study, and half-sib population was used for linkage analysis of putative QTL and genetic markers.There are 10 sires each of which mate 50 dams, and produce 50 half-sib progeny.So, the size of segregating population is 500 with ten half-sib families.The unrelated individuals in base population are mated randomly.The phenotype for a binary trait is determined by many genes of small effect and by a relatively major QTL with two alleles, Q and q, at population frequencies of p and 1-p, respectively.The putative QTL is situated between two flanking markers, M1 and M2, at a recombination distance of r.The notation will assume only two alleles at each genetic marker, Mi and m i .The linkage phase of QTL and markers is assumed to be known or estimated with a high degree of accuracy.The frequencies of all alleles in the dam population are assumed equal, setting to 0.5, and the population is assumed to be at Hardy-Weinberg equilibrium with respect to the QTL and marker locus.

Data simulation
The QTL additive variance (σ 2 q ) can derive from total genetic variance (σ 2 g ) and QTL effect (a ratio of QTL variance to the total genetic variance, ∆ q = σ 2 q /σ 2 g ).We can educe these variance component giving heritability (h 2 ), ∆ q and phenotypic variance (σ 2 p ).
The genotypic values are derived from Falconer's model, σ 2 q = 2pqa 2 (where σ 2 q is QTL additive variance, p and q are the allele frequencies at the QTL, a is the average effect of gene substitution), assuming that p is equal to q and QTL has no dominance effect in the base population, for a given QTL variance σ 2 q , 2 2 q a σ = , and the genotypic values of the three QTL genotypes QQ, Qq and qq are a, 0 and -a, respectively.
The genotypes of the offspring are generated according to the haplotypes produced by their parents, the polygenic effect of offspring i with sire s and dam d is simulated by gi = 0.5g s i +0.5g d i +m i , where g s i and g d i represent the polygenic value of its sire and dam, respectively, and m i represents the Mendelian sampling effect, which follow a normal distribution with mean 0 and variance σ 2 m (σ 2 m = 0.25[(1- a , where f s and f d are inbreeding coefficients of the sire and dam).The liability is generated in the same way as for the individuals in the base population, assuming σ 2 e is constant across generations.

Parameter case
The recombination rate (r) between two markers is set to be 0.40.The recombination rate (r 1 ) between the putative QTL and left marker is set to be 0.20.The QTL effect is set to be 0.1, 0.3, and 0.50, respectively.The proportion of type 1 animals (incidence, π) is set to be 0.1 and 0.40, respectively.The phenotypic variance is set to be 1, and heritability is 0.1, 0.2, and 0.4, respectively.Therefore, the number of parameter combinations is 18.For each parameter combination we simulate 100 replicates.

Parameters estimation
Average parameter estimates obtained from analyses of data sets simulated under the threshold model are presented in Table 2.The efficiency and accuracy of QTL mapping are affected by heritability directly, and with the increase of heritability, the accuracy of parameter estimates improved correspondingly.The bias between parameter estimates and true value is the lowest under the high heritability and QTL effect.As the usual linear methods, the power of QTL mapping is affected by the putative QTL effect under the threshold model.The efficiency of the interval mapping largely depends on the incidence of categories (π).The maximum efficiency occurs when π is 0.40 in our study.Consider that statistical power is a monotonically increasing function of the heritability of the putative.Then, this heritability has a maximum value when the incidence is 0.50.As the incidence of categories deviates from 50%, more information will be lost, therefore, the efficiency and statistical power of QTL mapping is decreasing.

Statistical power
The power is defined as the proportion of the replicate that the value of the test statistic (LRT) larger than 5.991 (significance level is 0.05).When r = 0.5, the null hypothesis of no linkage is true, and the corresponding power of test is the type I error.As shown in Table 2, the type I error of threshold model under all the parameter case is close to the pre-specified type I error rate of 0.05.This is a robust method even when the incidence of categories is low (π = 0.1).
The values of likelihood ratio (LRT) are affected by heritability and QTL effect level directly.With the increase of QTL effect and heritability, the value of LRT and the power improved correspondingly.The values of LRT are also affected by incidence of categories (π).When π is low, the genotypic information which can be used is less.As a result, it is difficult to detect QTL.The upper and lower limit values of LRT in different replicate and different location are shown in Figure 1.In the true location of the putative QTL, the range and interval of LRT is the widest.n is the number of daughter; h 2 is heritability; ∆ is the QTL effect (QTL variance contribution); π is the incidence of categories; a stands for the genotype value of QQ; r 1 is the recombination fraction between the putative QTL and the left marker; level of significance is 0.05.

DISCUSSION
Many quantitative traits of economical importance are discrete in nature.Although methods of mapping quantitative trait loci for continuous quantitative characters are well developed, such methods for discrete traits are generally lacking.Mapping QTL for this discrete trait is more challenging than for normally distributed traits due to the nonlinear relationship between the observed phenotype and unobservable genetic effects (Xu, 1996;Chen et al., 2004), especially when the mapping population contains multiple outbred families.In this paper, A QTL mapping technique for binary traits in half-sib design is proposed.A binary trait is assumed to be controlled by an underlying liability with normal distribution.The liability is treated by the usual quantitative trait.Translation from the liability into the binary phenotype is through the threshold model.The conditional probability of discrete phenotype given QTL and marker genotypes is described by the probit model.The genotype of a putative QTL is uncertain, but it is inferred from the genotypes of two flanking markers.Therefore, a mixture of likelihood is used for the threshold model analysis.
The interval mapping method for binary characters is usually less powerful than the well developed mapping methods for continuous traits.This is because some information will be lost during the translation from the underlying liability into the observed binary phenotype.The threshold value is the leading parameter that determines the amount of information loss.The threshold determines the incidence of categories in the population of interest.The efficiency of the interval mapping largely depends on the incidence.Although the threshold value cannot be controlled in natural populations, it can be manipulated in designed experiments.
In this paper we only consider the single threshold trait, but for discrete trait with multiple phenotypes or multiple threshold value, threshold model can also be applied.The main difference is the likelihood function which be selected.Example, for litter size, we can apply a Poisson-based model.Here we propose a modified method.We can combine multiple observed phenotypic value into two categories, which transforming discrete trait with multiple thresholds into one with single threshold, e.g. for monotocous species, the litter size of, double and more can be grouped into one category.
Although a series of experimental designs and analysis techniques for QTL mapping based on linkage analysis have been developed, and some QTLs with large effect had been located successfully (Zhang et al., 1998;Lee et al., 2003), linkage analysis based on marker and QTL is still a coarse-scale QTL mapping.It usually maps QTL on a region of 10-20 cM, and for multiple linked QTLs, linkage analysis probably yield larger bias (Zeng et al., 1994).Recombination events are the key to QTL fine mapping.For linkage analysis, marker density and population size are two factors to increase the frequency of the recombination events.But higher marker density doesn't simply imply a better QTL fine mapping (Darvasi et al., 1997), on the other hand, it is not always realistic to enlarge population size.Therefore, QTL fine mapping must look for new strategy, the main technique is to combine linkage analysis with some new methods, such as linkage disequilibrium (LD) analysis, transmission disequilibrium test (TDT) and identity-by-descent-based (IBD), and so on.

Table 1 .
The conditional probabilities of a QTL genotype with given marker genotype (Pr(G = k⏐m)

Table 2 .
Efficiency of QTL detection computed by threshold model (In parentheses are the standard errors (n = 500, r 1 = 0.