Simulation Study on Parentage Analysis with SNPs in the Japanese Black Cattle Population

,


INTRODUCTION
The production of livestock animals through such reproductive technologies as artificial insemination and embryo transfer is common.Although the use of these technologies has many advantages, it involves the risk of intentional or accidental recording of incorrect parental information; from the perspective of consumer protection, this risk should be minimized to the greatest possible extent.The accumulation of correct pedigree information is also fundamental to the precise prediction of genetic merits of animals.For the avoidance of the incorrect pedigree recording, importance of parentage analysis using polymorphic DNA markers is recognized in the field of livestock production (Heaton et al., 2002;Werner et al., 2004;Dodds et al., 2005;Eenennaam, 2007;Gomez-Raya et al., 2008).The use of SNPs for analysis is recommended because of its low mutation and genotyping error rates, abundance across the genome, and adaptability to low-cost genotyping (Anderson and Garza, 2006;Rohrer et al., 2007;Baruch and Weller, 2008).
Parentage analysis techniques are largely classified as exclusion and likelihood-based methods.Exclusion methods exclude the parentage of a putative parent who has an incompatible genotype as a parent, based on Mendelian rules of inheritance; likelihood-based methods assign parentage to the putative parent having the largest likelihood ratio among several putative parents.The accuracy of both techniques depends on such specific breeding and genetic structure in the population of interest as allele frequencies, genotype frequencies, mating systems, the number of putative parents, and the degree of their relatedness.Although there are a number of recent theoretical developments in likelihood-based parentage analysis for livestock (Dodds et al., 2005;Hill et al., 2008) and natural populations (Marshall et al., 1998;Jones and Ardren, 2003;Kalinowski et al., 2007), most methods are evaluated by simulations assuming randomly mating population under Hardy-Weinberg equilibrium (HWE) or nearly so.Therefore, the results of such evaluations are not simply applicable to the livestock population under intensive selection.
Currently, in the Japanese Black cattle, an exclusion method using microsatellites is carried out for candidate AI sires before selection and offspring produced by embryo transfer, only to confirm that their fathers or mothers are correctly reported.However, an alternative SNP system aimed at parentage analysis for the breed has recently been developed, and nearly 80 loci are now available (Dr.Mannen of Kobe University, personal communication).Evaluation of accuracy prior to the introduction of a new DNA marker set is necessary.In addition, since the introduction of the likelihood-based method widens the applicable scope of parentage analysis (e.g., when several putative parents cannot be excluded by a given marker set), the accuracy of this type of parentage analysis is also of interest.
In this study, we used computer simulations to examine the precision of several parentage tests using SNP system.In the simulations, pedigree data of subpopulations of Japanese Black cattle were used to examine breeding and genetic structure largely deviating from the randomly mating population.

Data
We simulated situations in which parentage tests were carried out for all the offspring born in each of the two Hyogo and Shimane prefecture subpopulations in 2004.We examined whether the difference in degree of relatedness between individuals in each population affects test precision.Pedigrees of Japanese Black cattle, which are recorded by the Wagyu Registry Association, can be traced to the ancestral population of the foundation period (1944 or before).All available pedigree records of the two subpopulations were used in the simulation.According to conventional pedigree analysis, ancestors with unknown parents are referred to as founders and all their descendants as non-founders.Table 1 presents the numbers of offspring in the two subpopulations, along with those of their sires, dams, and all the ancestors that appeared in the pedigree data.Since the subpopulation of Hyogo has been closed to other populations for a long time, the amount of inbreeding and relationships among animals is considerably higher than for the other subpopulations in the breed (cf.Honda et al., 2002).Average inbreeding coefficients and average coancestries of the two offspring populations are also presented in Table 1.

Simulation
We conducted two simulations.Simulation 1 investigated the precision of current parentage tests carried out in the Japanese Black cattle, with the assumption that the SNP system was used instead of microsatellites.Simulation 2 examined the extent to which the likelihoodbased method is useful if used with the SNP system.
In the simulation, all the offspring and parents were assumed to be genotyped at n l (= 10, 30, and 50) unlinked SNP loci segregating two alleles with minor allele frequencies (MAF) of q (= 0.05≤q≤0.15,0.25≤q≤0.35,and 0.45≤q≤0.5) in each parental population of sires and dams (i.e., nine conditions in terms of marker information were simulated).To generate possible genotype patterns of all the offspring and their parents, slightly modified gene dropping simulation (MacCluer et al., 1986) was applied to the pedigree data.For a single locus, the simulation randomly generated the genotype of each founder by sampling two alleles from a hypothetical base population, in which frequencies of the two types of alleles followed the Dirichlet distribution with all parameters set to one (i.e., the marginal distribution of each allele frequency is uniform).Following the Mendelian segregation rules, alleles assigned to the founders were then transmitted along the pedigree, and all the genotypes of the non-founders in the pedigree were determined.Mutations were ignored because SNPs have considerably low mutation rates and would not substantially affect the results.This process was repeated until n l loci having MAF of q in the parental populations of each sex were generated.
We assumed in the simulation that the dam of each offspring is known but the sire is unknown, and investigated the results of excluding a putative sire that is not the true sire of the offspring (simulation 1), and of sire identification among multiple putative sires related to each other (simulation 2).
Simulation 1 : After the genotypes of all the offspring and actual parents in the pedigree were determined at n l loci, one hypothetical full-sib was generated for each actual sire, using the simulated genotypes of sires and dams of the actual sires (i.e., paternal grand sire and grand dam of the offspring).Hypothetical paternity tests were then carried out for each offspring, assuming that the full-sib of the true sire was a putative sire.If incompatibility of genotypes between the putative sire and offspring-dam set was found at one or more loci, the test was regarded as successful.The success rate of this paternity test in each subpopulation was computed as , where N T is the number of tested trios (set of offspring, dam, and putative sire), which is equivalent to the number of offspring in each subpopulation, and n FS is the number of trios in which a putative sire can be excluded from paternity.
Because paternal grand sires of the offspring also seem to have comparative relationships to the actual sires, another hypothetical test was carried out assuming that the paternal grand sire of the offspring was a putative sire.The success rate was computed as , where n FATH is the number of trios in which a putative sire (paternal grand sire of the offspring) can be excluded from paternity.
Simulation 2 : In a manner similar to that used in simulation 1, N FS (= 1 to 50) hypothetical full-sibs were generated for all the actual sires after genotype determination, and hypothetical paternity identification tests were carried out for each offspring, assuming that N FS +1 males, including the true sire of the offspring, were putative sires.Each identification test began with the exclusion method, so that every putative sire having an incompatible genotype as a parent at one or more loci was excluded from paternity.If all the putative sires except the true sire were excluded, the test was considered to be correctly accomplished.If several sires, including the true sire, could not be excluded, the LOD score (natural logarithm of likelihood ratio, cf.Meagher, 1986;Marshall et al., 1998;Jones and Ardren, 2003) was calculated for each nonexcluded putative sire, and paternity was assigned to the putative sire with the highest LOD score.
To examine different types of performances, three success rates were computed in this simulation.The first one investigated the performance of the exclusion method, and was computed by where n EX is the number of trios, in which the true sire was identified by only the exclusion method.The second one investigated the comprehensive performance in paternity assignment, which was computed by where n LOD is the number of trios in which the true sire was identified by assigning paternity to the putative sire with the highest LOD score among non-excluded putative sires.The third one, which was computed by investigated the accuracy of the paternity assignment based solely on LOD scores.
Simulations for all the parameter sets were run with 50,000 replicates; all the success rates, except for acc2 LOD , were averaged over all the replicates.Here, acc2 LOD was averaged over all the replicates, except for the replicates in which sire identification succeeded in all the trios by only the exclusion method (i.e., except for the replicates in which N T = n EX ).

RESULTS AND DISCUSSION
To investigate the realized SNP marker conditions in each subpopulation, expected heterozygosity (H e ), observed heterozygosity (H o ), and Nei (1977) were averaged over n l = 50 loci in the parental populations of each sex and their offspring (Figure 1).Because almost the same values were obtained, results for n l = 10 and 30 are not indicated in Figure 1.The differences of expected heterozygosity (H e ) within the true sires and dams were very small between the two subpopulations, indicating that the quality of the SNP marker sets available for the parentage tests was almost equivalent in each of them.Averaged F IS were smaller than zero for all the cases suggesting heterozygote excess, but the deviations from zero were not so large.Therefore, marker genotype frequencies in each population are considered not to be greatly deviated from those expected under HWE.But, it should be noted that the two subpopulations are different from a randomly mating population under HWE from the perspective of an association between mating frequencies of parents (especially those of sires) and their marker genotypes.
The success rates of excluding a single relative of the true sire (acc1 FS for full-sib and acc1 FATH for sire) computed for various combinations of the number of SNP loci (n l ) and MAF (q) are presented in Table 2.For a given number of marker loci (n l ), an increase of MAF inflated both success rates, but the differences between the results for intermediate MAF and those for high MAF were not so large.For example, when n l = 50, an increase of MAF from 0.05≤q≤0.15 to 0.25≤q≤0.35inflates acc1 FS by 0.179 and acc1 FATH by 0.1232 in the subpopulation of Hyogo.However, corresponding increments are only 0.0042 and 0.0054 when MAF increases from 0.25≤q≤0.35 to 0.45≤q≤0.5.This tendency can also be found in the theoretical results obtained by calculating the conventional exclusion probability of a random putative sire in a randomly mating population under HWE (Pr RND ; cf.Dodds et al., 1996) (Table 3).A similar tendency was observed in the effect of n l for a given MAF.
In the subpopulation of Hyogo, the success rates of acc1 FS were consistently higher than those of acc1 FATH when MAF was 0.25≤q≤0.35and 0.45≤q≤0.5, while such a tendency could not be observed in the subpopulation of Shimane.Since the average inbreeding coefficient of hypothetical full-sibs (0.173) was higher than that of the sires of the true sires (0.133) in Hyogo, the full-sibs would form homozygote with higher frequency than the paternal grand sire.This would be one of the reasons for the tendency described above because SNPs were assumed to be biallelic so that putative sires could be excluded only when they formed homozygotes.No difference was observed between the corresponding average inbreeding coefficients in Shimane, both of which were 0.101.Assuming a randomly mating population under HWE, Double et al. (1997) theoretically demonstrated that the exclusion probability of a full-sib of the true sire with a single biallelic locus (Pr FS ) is simply computed by Pr FS = Pr RND /2.The probabilities computed with corresponding n l and MAF used in the simulation are presented in Table 3.
When MAF is intermediate or high (i.e., 0.25≤q≤0.35or 0.45≤q≤0.5),acc1 FS in Hyogo was higher than the corresponding exclusion probability (Pr FS ), and acc1 FATH of Hyogo and both success rates in Shimane were almost the same as the corresponding Pr FS .These results indicated that deviation from HWE and random mating in an actual population does not necessarily degrade the performance of excluding a close relative of the true sires.A strict parentage test based on the exclusion method seems to require at least 50 loci, each having MAF of 0.3 or above.
Two success rates (acc2 EX and acc2) in the two subpopulations were simulated with various conditions of marker set and numbers of putative sires (N FS +1, where N FS is the number of full-sibs of the true sire) (Figure 2).For almost all the combinations of the parameters simulated, both success rates in Hyogo were higher than those in Shimane.The success rate differences between the two subpopulations were generally negligible; however, nontrivial differences were sometimes observed.For example, the largest difference in acc2 EX was 0.1 for n l = 50, 0.25≤q≤0.35,and N FS = 50, and the corresponding value in acc2 was 0.083 when n l = 30, 0.25≤q≤0.35,and N FS = 50.One might expect that the accuracy in distinguishing the true sire from its close relatives using polymorphic DNA markers would be higher in the subpopulation of Shimane, which has a lower degree of relatedness between the individuals within the population (cf.Table 1).In general, The symbols L, M, and H represent the minor allele frequencies (MAF) in the population: L = low (q = 0.1), M = middle (q = 0.3), and H = high (q = 0.5). 2 Pr RND was computed following Dodds et al. (1996), and Pr FS was computed following Double et al. (1997).the degree of relatedness within a population, which is frequently expressed by average coancestry of the population, is interpreted as the expected proportion of the reduced expected heterozygosity (H e ) relative to the base population (Caballero and Toro, 2000;Frankham et al., 2002).Therefore, in a practical field, the difference in the degree of relatedness within each subpopulation might be reflected in the amount of information contained in SNPs sampled, which is associated with H e .In that sense, a parentage test might be more difficult in the subpopulation in Hyogo.However, simulation in the present study assumes that the presented marker sets had already been detected in each subpopulation, so that the higher degree of relatedness within Hyogo did not simply result in lower  (F) 50 loci accuracy.The accuracies of the different parentage tests in each subpopulation were dependent on the specific genotype patterns of the tested parents, which were affected by historical events of selection and mating, and on the current breeding structure in the parental generation.Except for n l = 50 and 0.25≤q≤0.35or 0.45≤q≤0.5, the success rates of acc2 EX in both subpopulations rapidly diminished with increasing N FS , especially when N FS was small.According to Chakraborty et al. (1998), in a randomly mating population under HWE, the probability of excluding N-1 random sires except the true sire of the offspring among N putative sires can be simply calculated as the exclusion probability of a random sire to the power of N-1 (i.e., 1 Pr − N RND ).Computation of this probability with Pr RND (Table 3) suggests that only n l = 30 SNPs with q = 0.3 can exclude 10 random sires with a probability of 0.9578 in an idealized population.However, for both subpopulations in this study, success rates of acc2 EX ≈ 0.9 or above were observed only when n l = 50 and 0.25≤q≤0.35or 0.45≤q≤0.5, indicating the difficulty in distinguishing the true sire from close relatives using only the exclusion method.
The success rates of paternity assignment in the two subpopulations solely by the likelihood-based method (acc2 LOD ) are given in Table 4. Similar to the results obtained by simulation 1 (Table 2), increasing the MAF from low to intermediate values (i.e., from 0.05≤q≤0.15 to 0.25≤q≤0.35) is much more beneficial than increasing it from intermediate to high values (i.e., from 0.25≤q≤0.35 to 0.45≤q≤0.5).Similar results were obtained in the simulation study of Hill et al. (2008), in which sire identification tests using LOD scores were carried out assuming five full-sib putative sires.
For n l = 30 or 50 and 0.25≤q≤0.35or 0.45≤q≤0.5, decreasing rates of acc2 LOD with the increase of N FS were gradual, while those for the other marker conditions were fairly rapid (Table 4).Even when N FS = 50 full-sibs of the true sire competed, the tests making use of n l = 50 loci with MAF of 0.25≤q≤0.35succeeded at a rate of 0.75 in the subpopulation of Hyogo and approximately 0.7 in the subpopulation of Shimane.Marshall et al. (1998) examined the accuracy of likelihood-based paternity tests using the actual allele frequency data at three protein and nine microsatellite loci sampled from the red deer population.One of their simulation results revealed that the reduction in test accuracy was not substantial even when the number of putative sires in the population varied from 2 to 101.Differences in the marker conditions generated large differences in the comprehensive success rates of sire identification (acc2), which is the most important index in this simulation (see Figure 1).For all cases except n l = 50 and 0.25≤q≤0.35or 0.45≤q≤0.5, the success rates rapidly diminished as the number of putative sires (N FS ) increased.However, when n l = 50 loci with 0.25≤q≤0.35or 0.45≤q≤0.5, the decrease in success rate was much more gradual, with acc2 of 0.9430 and 0.9681 in the subpopulation of Hyogo, and 0.8999 and 0.9399 in Shimane, even for N FS = 50.Therefore, good performance in the sire identification tests may be expected with the detection of more than n l = 50 SNPs having MAF larger than 0.3, and the introduction of the combined method of the exclusion and the paternity assignment based on the LOD score.To realize high performance in each prefectural subpopulation unit, SNPs having a large amount of information should be selected in each of them.
In livestock species, a large number of breeds or subpopulations under selection are expected to deviate from an idealized randomly mating population under HWE.In general, the variance of progeny number among parents should be larger than that expected under random mating; and together with the degree of relatedness between the mates, the difference of allele frequency between male and female parents generated by genetic drift affect genotype frequencies in the offspring population (Robertson, 1965;Wang, 1996).Results obtained in the present study illustrated that application of the reasonable SNP system to the parentage test will result in good performance, even in intensively selected livestock populations.Recently, Hill et al. (2008) developed a novel likelihood ratio for SNP systems, especially dealing with the problem of genotyping errors.Furthermore, they proposed posterior probability, considering the expected mating frequencies of the animals, to accommodate the likelihood-based method to the populations in which a large variance of progeny number among parents is expected.Since one of our interests was the effect of deviation from random mating and HWE on the accuracy of parentage tests, the effect of genotyping errors was not considered to prevent it from masking the former effect.Together with the performance of posterior probability proposed by Hill et al. (2008), the effect of genotyping errors will be examined in a future study.

Figure 1 .
Figure 1.Mean of realized expected heterozygosity (H e ), observed heterozygosity (H o ), and F IS of Nei (1977) averaged over n l = 50 loci computed within parental populations of each sex and their offspring.The symbols L, M, and H in each figure represent the minor allele frequencies (MAF) in the parental populations of each sex: L = low (0.05≤q≤0.15), M = middle (0.25≤q≤0.35), and H = high (0.45≤q≤0.5).Standard errors were ≤0.00005 for H e , ≤0.00005 for H o , and ≤0.00011 for F IS .

Figure 2 .
Figure 2. Mean success rates of sire identification tests solely by the exclusion method (acc2 EX ) and those by the combined method of exclusion and the paternity assignment based on the LOD score (acc2), in the Hyogo ((A), (B), and (C)) and Shimane ((D), (E), and (F)) subpopulations, simulated with various conditions of SNP marker sets and the number of full-sib putative sires.The symbols L, M, and H in each figure represent the MAF in the parental populations of each sex: L = low (0.05≤q≤0.15), M = middle (0.25≤q≤0.35), and H = high (0.45≤q≤0.5).The standard error of acc2 EX was ≤0.0005 and that of acc2 was ≤0.0006 for Hyogo.The standard error of acc2 EX was ≤0.0005 and that of acc2 was ≤0.0004 for Shimane.

Table 1 .
Numbers of offspring born in each subpopulation of Hyogo and Shimane prefecture in 2004 (N o ), and those of their sires (N s ) and dams (N d ), and the total number of ancestors appearing in each pedigree file (N a ).

Table 3 .
Exclusion probability of a random sire (Pr RND ) and a full-sib of the true sire (Pr FS ) in a randomly mating population under Hardy-Weinberg equilibrium, computed for various conditions of SNP marker sets

Table 2 .
Mean success rates of paternity tests based on the exclusion method assuming that a single full-sib (acc1 FS ) or sire (acc1 FATH ) of the true sire was a putative sire, in the two subpopulations of Hyogo and Shimane prefectures, simulated with various conditions of SNP marker sets

Table 4 .
Mean success rates 1 of the sire identification tests based on the LOD score (acc2 LOD ), in the two subpopulations of Hyogo and Shimane prefectures, simulated with various conditions of SNP marker sets and the number of full-sib putative sires (N FS )