Bootstrap Analysis and Major DNA Markers of BM 4311 Microsatellite Locus in Hanwoo Chromosome 6

LOD scores for marbling score and a permutation test were applied to detect quantitative trait loci (QTL), and BM4311 was selected as a major locus. K-means clustering was then applied to mine the major DNA markers of the BM4311 microsatellite locus on Hanwoo chromosome 6; five traits were divided into three cluster groups, and the three cluster groups were classified according to six DNA markers. Finally, a bootstrap test, which uses resampling to calculate confidence intervals, was applied to find the major DNA markers. It was concluded that the major markers of the BM4311 locus on Hanwoo chromosome 6 were the 100 and 95 bp DNA markers. (Asian-Aust. J. Anim. Sci. 2004. Vol 17, No. 8 : 1033-1038)


INTRODUCTION
The problem of detecting and locating quantitative trait loci (QTL) has received considerable attention over the past several years. A variety of methods have been developed to analyze quantitative trait data (Weller, 1986; Lander and Botstein, 1989; Churchill and Doerge, 1994). Many research groups have intensively analyzed the linkage between markers and traits in order to identify the chromosomal regions responsible for economically important traits such as meat quality and carcass length. Some traits, such as "double muscle" in cattle and RN in swine, were shown to result from particular genes (McPherron and Lee, 1997). Identifying the genes responsible for such traits requires a great deal of research work, time and some luck. If the arrangement of genes along the chromosomes is determined completely or nearly completely, candidate genes for traits can be selected very efficiently, which speeds up identification of the genes responsible for the traits. A problem common to all of these methods is the difficulty of determining appropriate significance thresholds (critical values) against which to compare test statistics (usually LOD scores or likelihood ratios) for the purpose of detecting QTL. Knott and Haley (1992) used a simulation study to examine the distributional properties of likelihood ratio tests for QTL detection. They suggested that the chi-square approximation to the distribution of the likelihood ratio test statistic is not reliable in many cases and that further theoretical work is needed. In 1994, Churchill and Doerge proposed permutation tests to detect QTL effects in the genome. An introduction to the theory of permutation testing is provided by Good (1994).
In this paper, we try a method based on the concept of the permutation test (Good, 1994). Because the candidate major LOD scores do not have theoretical significance levels (critical values or p-values), 10,000 repetitions of the permutation process were used to obtain critical values. The BM4311 microsatellite locus was selected by the permutation test and includes six genes (DNA markers): 95, 100, 103, 105, 107 and 110 bp. Next, the relations between the DNA markers and the economic traits are identified by K-means clustering analysis. Finally, we applied the bootstrap test (Visscher et al., 1996) to calculate confidence intervals of QTL locations for traits. The number of bootstrap samples for each DNA marker was 1,000, and 95% confidence intervals were calculated for economically important traits (Figures 2 through 6).

Animals and trait
One hundred thirty-seven steers from 10 paternal half-sib families, obtained from the Hanwoo Improvement Center, National Agricultural Cooperation Federation, Korea, were used for linkage mapping and QTL analysis. Daily gain from birth to 720 days of age and marbling score at slaughter at 720 days of age were measured. Marbling was scored on a 19-degree scale and classified as 1+, 1, 2 or 3 for the market system. Marbling score, backfat thickness and M. longissimus dorsi area were graded according to the standards of the Korean Animal Products Grading Service.

Permutation tests
Candidate loci for detecting and locating quantitative trait loci (QTL) were selected from the LOD graphs of the Hanwoo marbling scores (LOD scores exceeding 3) and are listed in Table 1. Because the LOD score at which significance is declared cannot be obtained theoretically, we applied the genomewise (experimentwise) permutation test (Churchill and Doerge, 1994). In the simplest case, a permutation test is used to detect a location shift in data that are divided into two sets of observations. We follow the five-step procedure of Good (1994; p. 20):
Step 1 Analyze the problem (hypothesis, distribution drawn, etc.).
Step 2 Choose the test statistic (sum of observations in the first sample) that will distinguish the hypothesis from the alternative.
Step 3 Compute the test statistic for the original labeling of the observations.
Step 4 Rearrange (permute) the labels and recompute the test statistic for the rearranged labels. Repeat until you obtain the distribution of the test statistic for the possible permutations.
Step 5 Calculate levels of significance using this permutation distribution of the statistic.
The empirical 100(1-P) percentile obtained from 10,000 repetitions of the permutation process was taken as the estimated critical value at genomewise significance level P. The critical value at P=0.01 was used to detect the presence of a QTL somewhere in the genome, so that the type I error rate is 0.01 or less (Table 1). Six loci, including BM4311, were selected. After the permutation test, we need to mine the major DNA markers of BM4311 with respect to economically important traits such as meat quality and carcass length.
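The five steps above can be illustrated with a minimal sketch of a two-sample permutation test. It assumes the simplest case named in the text (two sets of observations, with the sum of the first sample as the test statistic); the function name `permutation_test`, its parameters and the random-shuffle approximation to the full permutation distribution are illustrative assumptions, not the authors' implementation.

```python
import random


def permutation_test(first, second, p=0.01, n_perm=10_000, seed=0):
    """Two-sample permutation test using the sum of the first sample.

    Returns (observed statistic, empirical critical value, one-sided p-value).
    The permutation distribution is approximated by n_perm random relabelings.
    """
    rng = random.Random(seed)
    pooled = list(first) + list(second)
    n_first = len(first)

    # Step 3: test statistic for the original labeling.
    observed = sum(first)

    # Step 4: permute the labels and recompute the statistic each time.
    perm_stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_stats.append(sum(pooled[:n_first]))
    perm_stats.sort()

    # Step 5: empirical 100(1 - p) percentile as the critical value,
    # plus a one-sided p-value (the +1 terms count the observed labeling).
    critical = perm_stats[min(n_perm - 1, int((1 - p) * n_perm))]
    exceed = sum(1 for s in perm_stats if s >= observed)
    p_value = (1 + exceed) / (n_perm + 1)
    return observed, critical, p_value
```

A location shift is declared when the observed statistic exceeds the critical value, which corresponds to comparing a LOD score with its permutation threshold.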

K-means clustering method
Grouping, or clustering, can provide an informal means of assessing dimensionality, identifying outliers and suggesting interesting hypotheses concerning relationships. The K-means method, suggested by MacQueen, is a non-hierarchical clustering technique. The process is composed of the following three steps:
Step 1 Partition the items into K initial clusters.
Step 2 Proceed through the list of items, assigning each item to the cluster whose centroid (mean) is nearest. Recalculate the centroid for the cluster receiving the new item and for the cluster losing the item.
Step 3 Repeat Step 2 until no more reassignments take place.
Distance is usually computed as Euclidean distance with either standardized or unstandardized observations $x_i$ ($i=1, \ldots, n$). That is, given the data matrix $X$ and the variance-covariance matrix $S$, the Euclidean distance between items $x_i$ and $x_j$ depends on $(x_i-x_j)'(x_i-x_j)$, and the standardized Euclidean distance depends on $(x_i-x_j)'D^{-1}(x_i-x_j)$, where $D$ is the diagonal matrix of the trait variances taken from the diagonal of $S$. Rather than starting with a partition of all items into K preliminary groups in Step 1, we could specify K initial centroids and then proceed to Step 2.
The K-means results for BM4311 are given in Tables 2 through 4 and Figure 1.
That is, five traits are divided into three cluster groups, and the cluster groups are classified according to genes (DNA marker bps).

Bootstrapping (BCa (bias-corrected and accelerated)) analysis
Bootstrap samples were created by sampling, with replacement, n individual observations. An observation consists of a marker genotype and a phenotype. Thus, for each bootstrap sample, we draw n observations with replacement from the pool of n original observations. Some records can appear more than once in a bootstrap sample, while others are not included at all. After the bootstrap samples have been analyzed, the empirical central 90% and 95% confidence intervals (CI) of the QTL position are determined by ordering the bootstrap estimates and taking the bottom and top 5th and 2.5th percentiles, respectively. The bootstrap idea is simply to replace the unknown population distribution with the known empirical distribution function.
Let $\hat{\theta}$ denote the sample estimate of a parameter $\theta$. The bootstrap distribution for $\hat{\theta}$ is the distribution obtained by generating values $\theta^*$ by sampling independently with replacement from the empirical distribution $F_n$. The bootstrap estimate of the standard error of $\hat{\theta}$ is then the standard deviation of the bootstrap distribution for $\hat{\theta}$. It should be noted that almost any parameter of the bootstrap distribution may serve as a "bootstrap" estimate of the corresponding population parameter. We could consider the skewness, the kurtosis, the median or the 95th percentile of the bootstrap distribution for $\hat{\theta}$. The basic idea behind the bootstrap is that the variability of $\theta^*$ around $\hat{\theta}$ will be similar to the variability of $\hat{\theta}$ around $\theta$. There is good reason to believe this will be true for large sample sizes, since as n gets larger and larger, sampling with replacement from $F_n$ is almost like random sampling from $F$.
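The resampling scheme just described can be sketched as below. This is a minimal illustration, assuming the statistic of interest is supplied as a function of the resampled observations; the name `bootstrap_summary` and its parameters are hypothetical and do not come from the paper.

```python
import random
import statistics


def bootstrap_summary(observations, estimator, n_boot=1000, level=0.95, seed=0):
    """Bootstrap standard error and central percentile interval.

    Returns (standard error, lower endpoint, upper endpoint).
    """
    rng = random.Random(seed)
    n = len(observations)

    # Draw n observations with replacement, n_boot times, and re-estimate.
    estimates = sorted(
        estimator([observations[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )

    # Standard error = standard deviation of the bootstrap distribution.
    std_error = statistics.stdev(estimates)

    # Central interval: cut the same tail fraction from the bottom and the top.
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - tail) * n_boot))]
    return std_error, lower, upper
```

For example, `bootstrap_summary(values, statistics.mean, n_boot=1000, level=0.95)` gives a 95% percentile interval for a trait mean, and `level=0.90` gives the 90% interval.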
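The following steps produce BCa (bias-corrected and accelerated) bootstrap intervals:
Step 1: Generate a sample of size n with replacement from the empirical distribution.
Step 2: Compute $\theta^*$, the value of $\hat{\theta}$ obtained by using the bootstrap sample in place of the original sample.
Step 3: Repeat Steps 1 and 2 k times.
Step 4: The BCa interval endpoints are also given by percentiles of the bootstrap distribution, but the percentiles used depend on two numbers, $a$ (acceleration) and $z_0$ (bias-correction). The interval with nominal coverage $1-2\alpha$ is $(\hat{\theta}^{*(\alpha_1)}, \hat{\theta}^{*(\alpha_2)})$, with

$$\alpha_1 = \Phi\left(z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a\,(z_0 + z^{(\alpha)})}\right), \qquad \alpha_2 = \Phi\left(z_0 + \frac{z_0 + z^{(1-\alpha)}}{1 - a\,(z_0 + z^{(1-\alpha)})}\right),$$

where $\Phi$ is the standard normal cumulative distribution function and $z^{(\alpha)}$ is the 100α-th percentile point of the standard normal distribution.
If $a$ and $z_0$ both equal zero, the BCa interval is the same as the percentile interval; if $a$ and $z_0$ are non-zero, the BCa interval endpoints change. The bias-correction $z_0$ is obtained from the proportion of bootstrap values below the original estimate,

$$z_0 = \Phi^{-1}\left(\frac{\#\{\theta^* < \hat{\theta}\}}{k}\right),$$

where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function.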
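A minimal sketch of Steps 1 to 4 is given below. The text does not state how the acceleration is computed, so the sketch assumes the usual jackknife estimate; the function name `bca_interval`, its parameters and the clamping of the bias-correction proportion are illustrative assumptions rather than the authors' implementation.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, estimator, n_boot=1000, level=0.95, seed=0):
    """BCa bootstrap interval; returns (lower, upper, z0, a)."""
    rng = random.Random(seed)
    data = list(data)
    n = len(data)
    norm = NormalDist()
    theta_hat = estimator(data)

    # Steps 1-3: resample with replacement and re-estimate n_boot times.
    boot = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )

    # Bias-correction: proportion of bootstrap values below the estimate.
    prop = sum(1 for b in boot if b < theta_hat) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))  # keep finite
    z0 = norm.inv_cdf(prop)

    # Acceleration (assumption): jackknife estimate from leave-one-out values.
    jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    # Step 4: adjusted percentiles of the bootstrap distribution.
    def adjusted(alpha):
        z = norm.inv_cdf(alpha)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    tail = (1 - level) / 2
    lo_idx = min(n_boot - 1, max(0, int(adjusted(tail) * n_boot)))
    hi_idx = min(n_boot - 1, max(0, int(adjusted(1 - tail) * n_boot)))
    return boot[lo_idx], boot[hi_idx], z0, a
```

When `z0` and `a` are both zero, the adjusted percentiles reduce to `tail` and `1 - tail`, so the result coincides with the ordinary percentile interval.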

QTL methodology
The LOD graph and permutation test were used to detect and locate quantitative trait loci (QTL) from the Hanwoo marbling scores, and the results are given in Table 1. We selected several candidate loci whose maximum LOD score exceeds 3, which is generally considered significant (Chotai, 1984). Because the LOD score at which significance is declared cannot be obtained theoretically, we applied the genomewise (experimentwise) permutation test (Churchill and Doerge, 1994). The empirical 100(1-P) percentile obtained from 10,000 repetitions of the permutation process for each locus was taken as the estimated critical value at genomewise significance level P. The critical value at P=0.01 was used to detect the presence of a QTL somewhere in the genome, so that the type I error rate is 0.01 or less (Table 1).
In Table 1, AFR227 is not statistically significant, but the other loci are highly significant. In particular, ILSTS035 and BM4311 showed the best results. In this paper, we first attempt to mine the major DNA markers of the BM4311 microsatellite locus on Hanwoo chromosome 6.
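One common way to obtain a genomewise critical value is to permute the phenotypes, recompute the statistic at every locus, and record the maximum over loci for each permutation. The sketch below illustrates that idea only. As a loud assumption, it replaces the LOD score with a simple stand-in statistic (the absolute difference in mean phenotype between two genotype classes coded 0 and 1); the names `locus_statistic` and `genomewise_threshold` are hypothetical.

```python
import random


def locus_statistic(genotypes, phenotypes):
    """Stand-in for a LOD score: |mean difference| between classes 1 and 0."""
    g1 = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    g0 = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def genomewise_threshold(genotype_matrix, phenotypes, p=0.01,
                         n_perm=10_000, seed=0):
    """Empirical 100(1 - p) percentile of the maximum statistic over loci.

    genotype_matrix holds one list of 0/1 genotype codes per locus; every
    locus is assumed to contain both genotype classes.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        # Permuting phenotypes breaks any marker-trait association
        # while keeping the marker data intact.
        rng.shuffle(shuffled)
        maxima.append(max(locus_statistic(g, shuffled) for g in genotype_matrix))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - p) * n_perm))]
```

A locus whose observed statistic exceeds this threshold is significant at genomewise level p, because the threshold accounts for testing all loci at once.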

K-means clustering and results
One hundred thirty-seven steers from the Hanwoo Improvement Center, National Agricultural Cooperation Federation, Korea, were used for the analysis. We analyzed the BM4311 microsatellite locus on chromosome 6. Eleven DNA markers were obtained, including 95, 100 and 103 bp, and five economic traits were analyzed: marbling score, daily gain, backfat thickness, M. longissimus dorsi area and carcass weight.
K-means clustering was applied to these data; the five traits were divided into three cluster groups (Table 2) and the DNA markers were clustered (Figure 1). From Table 2, we conclude that Cluster 1 is the group useful for backfat thickness (high value=0.34431), Cluster 2 is the group useful for marbling score (high value=1.265126), and Cluster 3 is the group useful for carcass weight, daily gain and M. longissimus dorsi area. Figure 1 compares the proportions of the DNA markers across clusters. Cluster 1 has a large proportion of the 103, 105, 107 and 110 bp markers, Cluster 2 has a large proportion of the 100 bp marker, and Cluster 3 has a large proportion of the 95 and 110 bp markers. However, the 110 bp marker has very few individuals (n=4), which may not be sufficient to draw conclusions.
Similarly, Table 3 gives the standardized mean results of the DNA markers in BM4311 for each trait. The 100 bp marker has a higher standardized marbling score (0.2996), and the 95 bp marker has higher values for all traits except marbling score. The 110 bp marker has higher backfat thickness and carcass weight.
Based on Tables 2 and 3 and Figure 1, we summarize the results in Table 4. That is, the 110 bp marker is useful for backfat thickness, but the data are not sufficient; the 100 bp marker is useful for marbling score; and the 95 and 110 bp markers are useful for daily gain, M. longissimus dorsi area and carcass weight. Although the 110 bp marker may be important for backfat thickness, it is carried by only 4 individuals, which is insufficient for a conclusion. We therefore carry out a more specific analysis of the DNA markers, namely the bootstrap testing method.

Bootstrap (BCa method) analysis
Bootstrap samples were created by sampling with replacement for each individual DNA marker and trait. We applied the bootstrap testing method (Visscher et al., 1996) to calculate confidence intervals for finding the major DNA markers. The number of bootstrap samples for each DNA marker was 1,000, and 95% bootstrap confidence intervals were calculated for the five traits, i.e. marbling score, daily gain, backfat thickness, M. longissimus dorsi area and carcass weight (Figures 2 through 6).
In Figure 2, the 100 bp marker has a better marbling score interval (7.6536-9.6626) and mean (8.6125) than the others. The 100 bp marker has a slightly better bootstrap confidence interval for backfat thickness in Figure 3. In Figure 4, the 95 bp marker has a high mean value (0.6898) but a somewhat wide confidence interval for daily gain. In Figures 5 and 6, the 95 bp marker performs well for M. longissimus dorsi area and carcass weight, whereas the 110 bp marker has an unfavorable influence on M. longissimus dorsi area and carcass weight. Therefore, we conclude that the 100 bp marker presents a good confidence interval for marbling score, and the 95 bp marker for M. longissimus dorsi area and carcass weight.

DISCUSSION
LOD scores for marbling score and the permutation test were applied for major DNA marker mining. Significant QTL were demonstrated for BM3026, BMS690, ILSTS035, BM4311, BMS511 and BMC4203, whereas no significant QTL was detected for AFR227. The BM4311 microsatellite was selected as the most notable major locus. Next, K-means clustering was applied to mine the major DNA markers of the BM4311 microsatellite locus on Hanwoo chromosome 6, and five traits were divided into three cluster groups. The three cluster groups were then classified according to genes (DNA marker bps). The 95, 100 and 110 bp DNA markers were shown to be the most useful genes in the BM4311 locus. The 110 bp marker is useful for backfat thickness, but the data are not sufficient. The 100 bp marker is useful for marbling score. The 95 and 110 bp markers are useful for daily gain, M. longissimus dorsi area and carcass weight. However, the sample size of 4 for the 110 bp marker is insufficient for a conclusion. Therefore, we applied the bootstrap test to calculate confidence intervals for the traits. The 95% confidence intervals were calculated for all traits, and the 110 bp marker was shown to have an unfavorable influence on both M. longissimus dorsi area and carcass weight. We concluded that the major markers of the BM4311 locus on Hanwoo chromosome 6 are only the 100 and 95 bp DNA markers. In future research, the 100 and 95 bp markers of the BM4311 locus should be applied not only to progeny-tested Hanwoo but also to performance-tested Hanwoo.
By replicating Steps 1 and 2 k times, we obtain a Monte Carlo approximation to the distribution of $\theta^*$. Let $\hat{\theta}^{*(\alpha)}$ indicate the 100×α-th percentile of B=1,

Figure 4. Bootstrap confidence intervals for BM4311 daily gain

Table 1. Permutation test results of Hanwoo chromosome 6 based on marbling score
* Test statistic for this significance level is the sum of observations in the first sample.

Table 3. Standardized mean results of five traits and DNA markers in BM4311
( ): total number of individuals.

Table 4. Clustering comparison between standardized means and K-means mining results