Detection of QTL on Bovine X Chromosome by Exploiting Linkage Disequilibrium

A fine-mapping method exploiting linkage disequilibrium was used to detect quantitative trait loci (QTL) on the X chromosome affecting milk production, body conformation and productivity traits. The pedigree comprised 22 paternal half-sib families of Black-and-White Holstein bulls in the Netherlands in a grand-daughter design for a total of 955 sons. Twenty-five microsatellite markers were genotyped to construct a linkage map on the chromosome X spanning 170 Haldane cM with an average inter-marker distance of 7.1 cM. A covariance matrix including elements about identical-by-descent probabilities between haplotypes regarding QTL allele effects was incorporated into the animal model, and a restricted maximum-likelihood method was applied for the presence of QTL using the LDVCM program. Significance thresholds were obtained by permuting haplotypes to phenotypes and by using a false discovery rate procedure. Seven QTL responsible for conformation types (teat length, rump width, rear leg set, angularity and fore udder attachment), behavior (temperament) and a mixture of production and health (durable prestation) were detected at the suggestive level. Some QTL affecting teat length, rump width, durable prestation and rear leg set had small numbers of haplotype clusters, which may indicate good classification of alleles for causal genes or markers that are tightly associated with the causal mutation. However, higher maker density is required to better refine the QTL position and to better characterize functionally distinct haplotypes which will provide information to find causal genes for the traits. (


INTRODUCTION
Mapping chromosomal regions or quantitative trait loci (QTL) responsible for economically important traits in livestock species has been practiced routinely in populations of experimental crossbreds or commercial within-breeds (Andersson, 2001;Kim et al., 2007).Most of the mapping approaches relating to genome-wide scans for QTL are based on 'linkage' analysis, in which the patterns of co-inheritance between a putative gene (or locus) and flanking genetic markers are analyzed in a structured pedigree.However, in this 'linkage' mapping, information on QTL allele segregation most likely comes from a few families and only the chromosomal recombinants that are generated in the observable pedigree in a limited number of generations are utilized; this limits the detection and precise localization of the QTL.
One option to improve the mapping efficiency is to exploit population-wide linkage disequilibrium (LD).This utilizes a resource of historic informative recombinants since a mutation occurred many generations ago in the unobservable pedigree.LD has been shown to extend over long distances (on the order of tens of centimorgans) in livestock species (Farnir et al., 2000;McRae et al., 2002;Harmegnies et al., 2006), in contrast to the small extent of LD in the human genome.This can allow efficient implementation of QTL fine mapping even with mediumdensity microsatellite maps (www.animalgenome.org).Some QTL studies on farm animals that exploited LD information also showed promising (mapping power/ precision) results (Kim and Georges 2002;Blott et al., 2003;Schnabel et al., 2005;Kim and Farnir 2006).
Some studies on QTL of economic importance on X chromosome were reported in pig QTL populations under experimental breed cross models (Knott et al., 1998;Harlizius et al., 2000;Malek et al., 2001).However, QTL mapping studies in dairy cattle commonly use the granddaughter design (Weller et al., 1990), in which only sires and their sons are genotyped and information from maternal gametes are not considered.Thus, this design will not allow QTL analysis of the X chromosome by conventional linkage analysis.
In this study, a LD mapping approach was presented as an extension of Meuwissen and Goddard (2001), which included a hierarchical haplotype clustering step to detect QTL responsible for apparent sex-related characteristic that resided on the bovine X chromosome (BTX).

Pedigree material and phenotypes
The pedigree material used in this study was the previously described Holstein-Friesian "grand-daughter design" (GDD) sampled in the Netherlands and was composed of 22 paternal half-sib families for a total of 955 sons (Fanir et al., 2000).The phenotypes of the sires were "daughter yield deviations" (DYD) that were obtained directly from Holland Genetics (Arnhem-The Netherlands).DYDs correspond to unregressed weighted averages of the daughters' lactation performances adjusted for systematic environmental effects and breeding values of the daughters' dams and are expressed as deviations from the population mean (van Raden and Wiggans, 1991).Forty-seven traits including milk production, body conformation, fertility, udder health, and productivity were analyzed in this study.Detailed descriptions of the traits can be found in Sandor et al. (2006).

Marker genotypes
As the records or genotypes of dams are not known in the GDD, only BTXs of sons (N = 955) were considered in this LD mapping analysis.Twenty two-and 3microsatellite markers were genotyped on the non-pseudoand pseudo-autosomal region, respectively.As the linkage map construction was not available in the GDD, map distances between the markers were obtained from publicly available maps (www.animalgenome.org).This Xchromosome covers 170 Haldane cM (135 cM for nonpseudo-autosomal region) with an average distance of 7.1 cM between adjacent markers.

QTL mapping exploiting LD
To test for the presence of a QTL at map position p of BTX, Identity-by-descent (IBD) probabilities (φ p ) were computed for all pair-wise combinations of the son's BTX haplotypes according to the protocol of Meuwissen and Goddard (2001).However, as the derivation of IBD probability in their report was based on autosomal chromosomes, some equations were modified to fit to an Xchromosome analysis in non-pseudo-autosomal regions (Sandor et al., 2006).This method approximates the probability that two chromosomes are IBD at a given map position (p) conditional on the identity-by-state (IBS) status of flanking markers in the defined haplotype based on a coalescent model (Meuwissen and Goddard, 2001).Two parameters, the number of generations since the founders (T) and the effective population size (Ne), are assumed to be known under the coalescent model, such that the most likely set of T = 10 was based on the history of the population (Farnir et al., 2000).Ne was set at 200 (thus male Ne (Nm) = 100 and female Ne (Nf) = 200 for the Xchromosome), which was estimated as the average of all pairs of syntenic markers in autosomal chromosomes using the equation Ne = 1/(4r 2 c), where r 2 is a LD measure and c is the recombination rate between the two markers (Sandor et al., 2006).A set of ten linked markers was considered a haplotype unit to compute φ p.
Using (1-φ p ) as a distance measure, a hierarchical clustering algorithm, unweighted pair-group method average (UPGMA) was applied to generate a dendrogram representing the genetic relationship at position p between all haplotypes encountered in the population (Mount, 2001).
The logical framework provided by this dendrogram was applied to group the haplotypes in functionally distinct clusters.In this work, the clusters were defined such that all haplotypes within a cluster had a distance measure of (1-φ p ) <T (Blott et al., 2003).
The following linear mixed model was used: y is the vector of all sons' phenotype records.b is a vector of fixed effects that reduces to the overall mean in this study.X is an incidence matrix relating fixed effects to individual sons, which in this study reduces to a vector of ones.h is the vector of random QTL effects corresponding to the defined haplotype clusters.Z h is an incidence matrix relating haplotype clusters to individual sons.In Z h , only one element per line has the value of "1" in the column corresponding to the cluster to which the haplotype belongs.
Haplotype cluster effects with corresponding variance, σ 2 H , individual polygenic effects with corresponding variance, σ 2 A , and individual error terms with corresponding variance, σ 2 E , were estimated using the AIREML program (Johnson and Thompson, 1995) by maximizing the natural log restricted likelihood function L: In this equation, V equals: The covariance between the QTL effects of the different haplotype clusters is set at zero based on the principle of phylogeny that each branch has been evolved independently.Thus H reduces to an identity matrix with a size equal to the number of clusters.A is the additive genetic relationship matrix.
Steps 3 and 4 are repeated for all possible values of T (from 1 to 0 by 0.01 decrement units), in order to identify a restricted maximum likelihood (REML) solution for map position p.The hypothesis corresponding to this REML solution is denoted as H 1 .
The likelihood of the data under the H 1 hypothesis was compared with that under the null hypothesis, H 0 (σ 2 H = 0), of no QTL at map position p.The latter is computed as described above but using the reduced model: Evidence in favor of a QTL at map position, p, can then be expressed as a LRT (likelihood ratio test statistic) score: Hypothetical positions of the QTL were chosen at the point near the former marker and at the mid-point of each interval throughout the BTX map totaling 48 map points, and LRT scores were computed at each map position to generate chromosome-wide LRT score profiles using LDVCM software (Blott et al., 2003;Schnabel et al., 2005).

Significance thresholds
Significance threshold values for QTL detection to control the experiment-wise type I error rate were based on single trait analyses.Three hundred permutations were performed for the seven traits that had LRT values higher than 10 at most likely (ML) positions by randomly shuffling linked sets of markers and assigning them to phenotypes to determine threshold values of the null hypothesis.From the obtained empirical distributions for the ML QTL positions, LRT p had a chi-squared distribution with 2 degrees of freedom corrected (Bonferroni correction) for 4.5 to 7 independent traits when testing the seven traits.The α values of the null hypothesis obtained at the ML positions were then compared with the 'suggestive' level (α = 0.033), for which it is expected to obtain one significant result per genome scan by chance when considering that a total of 30 independent chromosomes were analyzed, each with a probability P of yielding a significant result (assuming that the number of significant chromosomes follows a binomial distribution, the expected number of significant chromosomes, 30P, is then set equal to one), and the 'significant' level (α = 0.0017), for which one significant result in twenty genome scans is expected by chance and thus (1-0.0017) 30= 1-0.05(Lander and Kruglyak, 1995).
An alternative to obtain significance threshold values was false discovery rate (FDR), which defines the expected proportion of true null hypotheses among the tests in which the null hypotheses were rejected (Benjamini and Hochberg, 1995;Weller et al., 1998).The expected FDR can be computed as FDR i = N×P (i) /i, where P is the comparisonwise type I error rate for a test statistic, i is the number of the null hypothesis ranked by descending P and N is the total number of tests (47 traits×48 map points = 2256).P values can be obtained from the one half chi-square variable with 1 degree of freedom because the 2×LRT in a variance component model with H 0 of σ 2 H = 0 has a test statistic with a mixture distribution of 0.5×chi-square variable with 1 df and 0.5×a point mass at zero (Self and Liang, 1987).However, the test statistic with chi-square distribution of 1 df was used to standardize p values such that the maximum p value should be 1.

RESULTS
Among the forty-seven traits analyzed in this study, seven QTL responsible for conformation types (teat length, rump width, rear leg set, angularity and fore udder attachment), behavior (temperament) and mixture of production and health (durable prestation sum) were detected at the suggestive level (Table 1 and Figure 1).The QTL with the highest LRT value (14 LRT value), responsible for 'teat length', was found at 113.1 cM between HAUT37 and XBM16 (α = 0.0059).A QTL for durable prestation was localized at 142.3 cM in the pseudoautosomal region flanked by TGLA325 and MAF45.The two QTL for rump width and temperament were positioned at 130.8 cM in the marker bracket of MCM74 and BMS911 neighboring the pseudo autosomal region.Three QTL for fore udder attachment, rear leg set and angularity were positioned in the marker brackets of URB010-BMS811 (23.9 cM), BMS811-BMS1616 (42.7 cM) and BMS417-ILSTS017 (68.3 cM), respectively.The estimates of proportion of trait variance due to the QTL were largest for teat length (12.6% for the QTL at 113.1 cM), followed by durable prestation (8.1%), angularity (6.6%) and the rest of the traits (below 5%) (Table 1).
The FDR for the first ranked test for teat length was 42% and decreased gradually; the distributions of the FDR values showed zigzag patterns around 20% along the top twelve tests (Table 1).This was due to disproportionate changes in the numerator and denominator of the FDR, in which increments of p values were small while one unit in the denominator increased per each ranked test (Weller et al., 1998;Lee et al., 2002).The FDR values never went below 10% for these top ranked tests and none of the α values reached the 'significant' level (0.0017) of QTL evidence.
The numbers of haplotype groups that generated the highest LRT values were 3, 4, 7 and 8 for the QTL influencing teat length, rump width, durable prestation and rear leg set, respectively (Table 1).

DISCUSSION
QTL mapping by exploiting LD information has been used to fine-map the position of QTL that are detected and localized with a moderately sized confidence region by conventional 'linkage' analysis in livestock species (Kim and Georges 2002;Blott et al., 2003;Schnabel et al., 2005).To efficiently refine the QTL position, a high marker density is required, the degree of which depends on how much LD is distributed in the tested chromosomes (Georges and Andersson, 1996).
In this study, a primary gene scan was performed on BTX with an average marker distance of 7.1 cM by applying a variance component approach, where the relationships (IBD) between random QTL alleles in the corresponding multi-marker haplotypes in sire×chromosomes were used with LD information extracted from the unobservable pedigree under a coalescence model.From the LD mapping analyses, five QTL for body conformation, one QTL for behavior characteristics and one QTL for a mixed trait of milk production and health were found (Table 1).Further QTL evidence could be confirmed with higher marker density around the QTL region, e.g., QTL studies using LD and linkage information on BTA14 and 20 of a Holstein dairy cattle population (Kim and Georges, 2002;Blott et al., 2003).Sandor et al. (2006) previously reported detection of QTL on BTX using the same LD model, phenotypes and marker genotypes as in this study.They found five QTL affecting fat yield, direct durability, milking speed, durable prestation and rear leg set, with limited statistical support, i.e., at the marginal 5% chromosome-wide level.However, this study confirmed only the latter two QTL, and another five QTL were detected (Table 1).The differing results from the two studies may be caused partly by application of different methods to obtain significance thresholds, i.e., simulations vs. permutations in Sandor et al. (2006) and this study, respectively.Also, two QTL map positions were chosen in each marker interval in this study, i.e., the point, 0.1 cM left of the former marker and the mid-point of the interval, while Sandor et al. (2006) investigated QTL only at the mid-points of intervals.Two QTL for angularity and fore udder attachment were detected at positions near the left markers in their respective intervals (Figure 1 and Table 1).In addition, all of the detected QTL in Sandor et al. (2006) were detected with limited statistical support, which is likely to be undetected when using different significant thresholds methods.
Application of LD mapping can be justified to detect QTL in BTX, for which the conventional linkage analysis cannot be conducted in the GDD design by exploiting extensive LD (Sandor et al., 2006).Also, exploiting LD information by using multi-marker haplotypes in a variance component approach enabled QTL detection and high precision for QTL localization in the simulation studies of Meuwissen and Goddard (2000), where the most likely QTL positions were within a 3-cM interval and in which the real QTL was positioned in the middle in 76% of replicates with 1 cM inter-marker distance, 500 phenotypes and 100 generations since the founders (T = 100).However, they reported that the QTL position was poorly estimated in the simulations with 1, 0.5 or 0.25 cM marker distance when T = 10 (the parameter estimate used in this study).The detected QTL would be in the flanking or right marker intervals where the true QTL reside, because the sizes of marker intervals are large (average 7.1 cM) and small numbers of T would generate large non-recombined blocks and allow flanking markers to be in LD state.This was confirmed by analyses on the simulation data that were generated according to the population structure of this study (Farnir et al., 2000), in which most QTL with at least moderate size were detected on the right or flanking intervals in a 5 cM inter-marker map on autosomal chromosomes when T = 10 (results not shown).Also, when T was set to 100, all of the most likely positions for the detected QTL were in the same or flanking marker intervals (results not shown).
The small numbers of haplotype clusters, e.g., 3, 4, 7 and 8 for the QTL influencing teat length, rump width, durable prestation and rear leg set, may indicate good classification of alleles or QTN (quantitative trait nucleotide) within the haplotype clusters for causal genes or markers that are tightly associated with the causal mutation (Kim and Georges, 2002;Schnabel et al., 2005).However, there was no consistent marker haplotypes (at least for the flanking markers) within the defined haplotype clusters (results not shown).This may be due to low densities of marker distributions (about 2 or 3 cM distance between the flanking markers) around the QTL region, which allowed recombination between QTL and neighboring markers to occur more frequently.Therefore, higher maker density is needed to better characterize functionally distinct haplotypes and to refine the QTL position.

Figure 1 .
Figure 1.Likelihood ratio test (LRT) score profiles obtained for teat length, rump width, rear leg set, durable prestation, temperament, angularity and fore udder attachment along the X chromosome (BTX) using the LDVCM programs.The names of the markers on the map are given at the top of the figure and their respective positions in centimorgan (Haldane) at the bottom.

Table 1 .
QTL on BTX that were detected with at least suggestive evidence Position (Haldane cM) at which the test statistic was maximized for the QTL model.b Number of clusters in the haplotype dendrogram that yielded the maximum test statistic.c Likelihood ratio statistic value (two times difference of the likelihood values between the full model including QTL allelic effect and the reduced model with no QTL effect.d Chromosome-wide p value taken from the adjusted chi-square distributions with 2 df under the null hypothesis of no QTL determined from 300 permutations.e FDR: false discovery rate (= p-value×N/Rank), where p value is comparison-wise type I error rate (at the point-wise level) taken from chi-square distribution with 1 df, N is number of all tests (47 traits×48 map points = 2,256), and Rank is the number of the null hypothesis ranked by descending p values across all N tests.Fraction of the trait variance due to the QTL, computed as Fraction of the trait variance due to the polygenic background, computed as a f g