Individual-breed Assignment Analysis in Swine Populations by Using Microsatellite Markers

Individual-breed assignments were implemented in six swine populations using twenty-six microsatellites recommended by the Food and Agriculture Organization and the International Society for Animal Genetics (FAO-ISAG). Most microsatellites were highly polymorphic, as shown by the number of alleles and the polymorphism information content. The assignment accuracy per locus obtained with the Bayesian method ranged from 33.33% (CGA) to 68.47% (S0068), and the accumulated assignment accuracy of the ten best loci combined reached 96.40%. The assignment power of microsatellites based on the Bayesian method was positively correlated with the number of alleles and the gene differential coefficient (Gst) per locus, whereas it showed no relationship with genetic heterozygosity, polymorphism information content per locus, or the exclusion probabilities under case II and case III. The percentage of correct assignments was highest for the Bayesian method, followed by the gene frequency and distance-based methods. The assignment efficiency of microsatellites rose with the number of loci used, reaching 98% with a ten-locus combination. This indicates that such a set of ten microsatellites is sufficient for breed verification purposes. (Asian-Aust. J. Anim. Sci. 2005. Vol 18, No. 11 : 1529-1534)


INTRODUCTION
Individual identification and relationship verification are crucial problems in forensic cases, and the detection approaches have developed from conventional methods based on physical characteristics, fingerprints and blood type to molecular techniques such as DNA fingerprinting, mtDNA, and microsatellites. Individual assignment and breed verification for domestic animals did not receive much attention until recent years. However, with the development of molecular markers and the demands of some particular cases (especially for horses, dairy cattle, dogs, and rare and precious animals), the applications of microsatellites are growing in this field (Marklund et al., 1994; Usha et al., 1995; Mommens et al., 1998; Bjørnstad and Roe, 2001; Villanueva et al., 2002; Yoon et al., 2005).
Microsatellites have been commonly used for the assessment of genetic diversity, construction of genetic maps, quantitative trait loci (QTL) mapping, parentage testing and heterosis prediction in domestic species, as they are numerous, highly polymorphic, and co-dominantly inherited, which makes them easy to genotype (Barker, 1994; Behl et al., 2002; Li et al., 2004; Wang et al., 2004; Zhang et al., 2005). The Food and Agriculture Organization and the International Society for Animal Genetics (FAO-ISAG) have recommended panels of microsatellites for genetic diversity measurements in cattle, horse, pig, chicken and other domestic species, and have also chosen microsatellites for parentage verification analysis (Barker, 1994; http://www.fao.org/dad-is; http://www.isag.org.uk/comparison.htm). Heyen et al. (1997) arranged twenty-two cattle microsatellites into six multiplex PCR systems, each containing three or four loci, and carried out parentage testing in five cattle breeds. The non-parental exclusion probability varied among microsatellite combinations and breeds, and the accumulated exclusion probability reached 99.99% when all six multiplex systems were combined. Koskinen and Bredbacka (1999) combined ten microsatellites into three multiplex PCR systems, each comprising three loci, for assignment efficiency evaluation in dogs, and the exclusion probabilities obtained ranged from 99.34 to 99.93% in four canine breeds.
Here genetic variability and individual-breed classification were examined in six swine populations by using the microsatellites recommended by FAO-ISAG, with the objective of clarifying the efficiency and power of the assignment approaches and the factors affecting assignment accuracy, so as to offer an effective and convenient method for parentage analysis in pigs.

Experimental animals
A total of 109 animals belonging to six populations were collected in the study. Auckland Island pigs (AL, n = 21) came from Auckland Island in New Zealand, and genomic DNA was extracted from blood by using the Puregene kit (Gentra Systems) according to the manufacturer's protocol. Detailed sampling information about four Chinese indigenous breeds, namely Erhualian Pig (EH, n = 23), Qingping Pig (QP, n = 10), Tongcheng Pig (TC, n = 14) and Wannanhua Pig (WN, n = 11), and one Australian commercial population (AC, n = 30) was given in our previous report (Li et al., 2000).

Microsatellite genotyping
A panel of 27 microsatellites was chosen from set No. XI of the fluorescent primers distributed by the U.S. Pig Genome Coordinated Project, which were kindly donated by Prof. Max Rothschild. Detailed information about the 27 pairs of primers can be obtained from the following websites: http://www.toulouse.inra.fr/lgc/pig/panel.htm; http://www.genome.iastate.edu/resources/fprimerintr.html.
PCR was performed on a PTC-100 thermal cycler (MJ Research Company, USA) according to a standard touchdown protocol for all the primers (Gongora et al., 2002). Genotyping was performed on an ABI 373 DNA sequencer (Applied Biosystems/Perkin Elmer, USA). The GeneScan-350 TAMRA size standard and two control samples (Swedish 51 and Swedish 79 from the PiGMaP resource family) were used to calibrate the microsatellite allele sizes. Genotype calling was accomplished with Genotyper (Ver 2.0).

Statistical analyses
The number of alleles, the allele frequencies and the exact test for Hardy-Weinberg equilibrium of locus-population combinations were computed with the GENEPOP (Ver 1.2) software package (Raymond and Rousset, 1995). The genetic heterozygosity and the gene differential coefficients (Gst) were calculated by using DISPAN (Ota, 1993). The polymorphism information content (PIC) and the effective number of alleles were estimated by using a computer programme written by ourselves in accordance with the formulae of Botstein et al. (1980) and Kimura et al. (1964), respectively.
The assignment power of single and combined microsatellite systems can be evaluated with the non-parental exclusion probability and the accumulated exclusion probability (Jamieson and Taylor, 1997; Koskinen and Bredbacka, 1999; Luikart et al., 1999). Jamieson and Taylor (1997) presented the general formulae for three kinds of cases: (i) the genotypes of the individual and of its maternal and paternal animals are known; (ii) the genotypes of the individual and one of the parents are known, while that of the other parent is unknown; (iii) the genotypes of both parental animals are unknown. The exclusion probabilities of microsatellites under cases II and III were calculated with Cervus 2.0 (Marshall et al., 1998). Cornuet et al. (1999) recommended two principal kinds of approaches for assigning an individual to its original population, i.e., the likelihood-based method and the genetic distance-based method. The former consists of the gene frequency and Bayesian methods. The latter includes six kinds of distance measures, i.e., Nei's standard and minimum genetic distances, DA of Nei et al. (1983), Cavalli-Sforza and Edwards' chord distance, the shared allele distance (DAS) and (δμ)² of Goldstein. The assignment procedures with each of the above approaches were performed with GENECLASS (Cornuet et al., 1999). In addition, it is also of interest to estimate how many loci are enough for individual-breed verification. The following assignment methodology was therefore introduced: assignment efficiency was assessed by plotting the mean percentage of correct assignments for one locus, for two loci (20 random combinations from 27 loci), for three loci (20 random combinations from 27 loci), and so on, until there was no further improvement in the assignment accuracy.
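The per-locus diversity statistics named above can be illustrated with a minimal sketch. It is not the authors' programme; it only assumes the standard forms usually attributed to the cited sources, i.e., PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (i < j) and an effective number of alleles of 1/Σp_i², where p_i is the frequency of allele i. The function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """Effective number of alleles: reciprocal of the expected homozygosity."""
    return 1.0 / sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content in its usual Botstein et al. (1980) form."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered pairs of distinct alleles.
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical allele frequencies at one locus (not data from this study).
example = [0.4, 0.3, 0.2, 0.1]
print(round(expected_heterozygosity(example), 4))
print(round(effective_number_of_alleles(example), 4))
print(round(pic(example), 4))
```

PIC is never larger than the expected heterozygosity for the same frequencies, because the pairwise correction term is non-negative.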
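The locus-number procedure described above can be sketched as a resampling loop. This is an illustrative outline only: `percent_correct` is a hypothetical callable standing in for a full assignment run (e.g., with GENECLASS) on a chosen subset of loci, the locus names are placeholders, and the sketch draws 20 random subsets at every subset size, including single loci.

```python
import random
from statistics import mean


def accuracy_curve(loci, percent_correct, max_loci, n_draws=20, seed=0):
    """Mean percent correct for 1..max_loci loci, using n_draws random subsets each."""
    rng = random.Random(seed)
    curve = []
    for k in range(1, max_loci + 1):
        # Draw k distinct loci at random and score the assignment, n_draws times.
        scores = [percent_correct(rng.sample(loci, k)) for _ in range(n_draws)]
        curve.append(mean(scores))
    return curve


def plateau_size(curve, tol=0.5):
    """Smallest number of loci after which adding one more gains less than tol points."""
    for k in range(1, len(curve)):
        if curve[k] - curve[k - 1] < tol:
            return k  # curve[k - 1] is the accuracy obtained with k loci
    return len(curve)


def toy_score(subset):
    """Placeholder scoring rule (an assumption, not real data): saturates with subset size."""
    return 100.0 * (1.0 - 0.5 ** len(subset))


loci = [f"locus{i}" for i in range(1, 27)]  # 26 placeholder locus names
curve = accuracy_curve(loci, toy_score, max_loci=10)
print(plateau_size(curve))
```

In a real analysis `toy_score` would be replaced by a function that reassigns all individuals using only the selected loci and returns the percentage assigned to their population of origin.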

Microsatellite polymorphism
Data from S0178 were discarded since most of the genotyping results of AC pigs for this locus were ambiguous. Of the remaining 26 microsatellites, most loci were polymorphic in the studied populations, except for S0228 and S0355 in AC (Table 1). The mean observed number of alleles per locus ranged from 4.167 (S0355) to 8.333 (S0218); S0218 had the largest number of alleles in TC (n = 13), as did Sw240 in EH (n = 13). The mean PIC values per locus were above 0.5, which indicated that all the microsatellites were highly polymorphic. The mean expected heterozygosity per locus ranged between 0.505 (S0355) and 0.781 (S0068); S0218 had the highest expected heterozygosity in TC, while the values of S0228 and S0355 were zero in AC. The parameter Gst measures the extent of genetic differentiation among populations at a locus. A Gst of 0.215 showed that about 21% of the total genetic variation was due to differentiation among the six populations, and the other 79% resulted from variation within populations. S0155 had the lowest Gst value while S0355 had the highest, although the latter had the lowest number of alleles and expected heterozygosity, as shown in Table 1.

Assignment power
The exclusion probabilities of the 26 microsatellites were between 0.368 (Sw72) and 0.713 (CGA) under case II, and between 0.551 (Sw72) and 0.833 (CGA) under case III (Table 1). The accumulated exclusion probabilities were more than 99.99% for both cases. In addition, the exclusion probabilities of case II and case III showed a significant positive correlation (r = 0.998). Using the Bayesian method, the assignment accuracies of the 26 microsatellites varied from 33.33% (CGA) to 68.47% (S0068), and ten loci had assignment accuracies of more than 55%. The accumulated accuracy reached 96.4% when these ten loci were combined. Pearson correlation analysis (SAS ver. 8.2) demonstrated that the assignment accuracy of the Bayesian method had hardly any correlation with the exclusion probabilities of case II and case III, the mean polymorphism information content or the expected genetic heterozygosity, while it was positively correlated to some extent with the mean observed number of alleles (r = 0.362) and the gene differential coefficient (r = 0.294). The exclusion probabilities of both cases had no correlation with the mean observed number of alleles, the polymorphism information content or the expected genetic heterozygosity, but they were correlated with the gene differential coefficient (r = 0.357).
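How per-locus exclusion probabilities accumulate can be shown with a short sketch. It assumes the usual rule for independent loci, P = 1 - Π(1 - P_i), which may differ in detail from what Cervus reports; the function names are hypothetical, and the only values taken from the text are the lowest and highest case II probabilities (0.368 and 0.713).

```python
from math import prod


def combined_exclusion(per_locus):
    """Accumulated exclusion probability, assuming the loci are independent."""
    return 1.0 - prod(1.0 - p for p in per_locus)


def loci_needed(p, target):
    """Number of loci, each with exclusion probability p, needed to reach target."""
    n, not_excluded = 0, 1.0
    while 1.0 - not_excluded < target:
        not_excluded *= 1.0 - p  # chance that a non-parent passes one more locus
        n += 1
    return n


# Lowest (Sw72) and highest (CGA) case II values reported above.
print(round(combined_exclusion([0.368, 0.713]), 4))
# Loci required if every locus were only as informative as the weakest one.
print(loci_needed(0.368, 0.9999))
```

Under this independence assumption, even a panel in which every locus performed like the weakest one would pass 99.99% with fewer loci than the 26 genotyped, which is in line with the accumulated values reported.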

Assignment approaches comparison
All individuals from the six populations were pooled and reassigned to populations with each assignment approach on the basis of multilocus genotypes. Here, the assignment procedure of the Bayesian method is taken as an example (Table 2). All animals from AL, AC, TC and WN were correctly classified into their populations of origin, but one EH animal was mistaken for QP and one QP animal was assigned to WN. In total, only two of the 109 animals were not assigned to their breeds of origin, and the percentage of correct assignments was 98.2%. The above assignment procedure was repeated in turn for the other seven methods, and the percentage of correct assignments attained for each method is shown in Table 3. The Bayesian method was the best (p = 98.2%) and (δμ)² of Goldstein was the least successful (p = 81.1%).
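The Bayesian result can be checked with a small tally. The counts below are reconstructed from the sample sizes given under Experimental animals and from the two misassignments described in the text, not copied from Table 2; the variable names are hypothetical.

```python
# Rows: population of origin; inner keys: population the animals were assigned to.
assigned = {
    "AL": {"AL": 21},
    "AC": {"AC": 30},
    "EH": {"EH": 22, "QP": 1},  # one EH animal assigned to QP
    "QP": {"QP": 9, "WN": 1},   # one QP animal assigned to WN
    "TC": {"TC": 14},
    "WN": {"WN": 11},
}

total = sum(sum(row.values()) for row in assigned.values())
correct = sum(row.get(pop, 0) for pop, row in assigned.items())
percent_correct = 100.0 * correct / total

print(total, correct, round(percent_correct, 1))
```

The tally reproduces 107 correct assignments out of 109 animals, i.e., the 98.2% reported for the Bayesian method.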

Microsatellite numbers for assignment
The number of loci used in parentage testing plays an important role in the assignment efficiency. In order to roughly estimate a suitable number of microsatellites for testing, the percentage of correct assignments with the Bayesian method was calculated for one-, two- and three-locus combinations, which were chosen randomly from the 26 loci, and the calculation was repeated twenty times. The procedure was continued with larger combinations until the mean percentage of correct assignments no longer rose. As shown in Figure 1, the mean percentage of correct assignments was 47.1% when only one locus was used and 77.1% when two loci were used. The assignment efficiency increased with the number of loci used and eventually reached 98% when ten loci were used, at which point the plotted curve levelled off.
In addition, the assignment efficiency was always high in AL and AC, while it was low in TC and EH, i.e., most individuals from the former two groups could be easily classified into their original populations, while this was somewhat more difficult for the latter two groups. Differences in genetic structure and the small sample sizes of the studied populations might be the cause of this phenomenon.

Comparisons among assignment approaches
The allele frequency distribution is distinctive for each breed and is the main feature of the population genetic structure. Genetic relationships and individual-breed assignments are deduced from the multilocus genotypes or alleles of microsatellites. The current assignment approaches can be classified into the gene frequency method, the Bayesian method, and the distance-based methods (Cornuet et al., 1999). The gene frequency method is based on the expected frequency of the tested individual's genotype at each locus in each breed. The individual-breed likelihood values are combined over all loci, and the breed with the highest likelihood is taken as the most probable breed of origin of the individual. The Bayesian method uses the marginal probability distribution of individual-breed combinations (Rannala and Mountain, 1997; Dawson and Belkhir, 2001). Distance-based methods assign the individual to the 'closest' population; the individual-population distance is defined by treating an individual as a sample of two genes (possible values of allelic frequencies are 0, 0.5 and 1), and distances are measured according to the corresponding formula.
With all eight assignment methods used in the study, most individuals were correctly classified into their populations of origin. The assignment accuracy of the Bayesian method was the highest, followed by DA of Nei et al. (1983), Nei's standard distance, the gene frequency method, Nei's minimum distance, the shared allele distance (DAS) and (δμ)² of Goldstein. This ranking of the methods by assignment accuracy was slightly different from that of the simulation results obtained by Cornuet et al. (1999). In their studies, the likelihood-based methods, including the gene frequency method, always had greater accuracy than the distance-based methods. However, in the actual study most population-locus combinations deviated from Hardy-Weinberg equilibrium and microsatellites were occasionally in linkage disequilibrium. The population size also affects the assignment accuracy. All of the above factors might contribute to the difference in assignment accuracy between the actual and the simulated data. The Bayesian method can be considered a superior approach in parentage testing, although the calculation process seems tedious. Besides the methods used in this study, other assignment approaches could be attempted for parentage testing analysis (Blouin et al., 1996; Götz et al., 1998; Blot et al., 1999).
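The two likelihood-based criteria can be illustrated with a minimal sketch. It is not the GENECLASS implementation. It assumes Hardy-Weinberg genotype frequencies (p² and 2pq) for the gene frequency method, with a hypothetical floor of 0.01 replacing zero frequencies, and the Rannala and Mountain (1997) posterior form with a 1/K prior weight per allele for the Bayesian method. The toy genotypes, population names and function names are invented for illustration, and the leave-one-out step normally used when reassigning reference animals is omitted.

```python
from collections import Counter
from math import log


def allele_counts(genotypes):
    """Count gene copies of each allele from a list of (a, b) genotypes."""
    return Counter(allele for genotype in genotypes for allele in genotype)


def log_lik_frequency(genotype, counts, floor=0.01):
    """Log genotype frequency under Hardy-Weinberg; zero frequencies are floored."""
    n = sum(counts.values())
    a, b = genotype
    pa = max(counts[a] / n, floor)
    pb = max(counts[b] / n, floor)
    return log(pa * pa) if a == b else log(2.0 * pa * pb)


def log_lik_bayesian(genotype, counts, n_alleles):
    """Log posterior genotype probability with a 1/K prior weight per allele."""
    n = sum(counts.values())
    a, b = genotype
    prior = 1.0 / n_alleles
    denom = (n + 2.0) * (n + 1.0)
    if a == b:
        return log((counts[a] + prior + 1.0) * (counts[a] + prior) / denom)
    return log(2.0 * (counts[a] + prior) * (counts[b] + prior) / denom)


def assign(individual, reference, method="bayesian"):
    """Return (best population, per-population summed log-likelihoods)."""
    scores = {}
    for pop, loci in reference.items():
        total = 0.0
        for locus, genotype in individual.items():
            counts = allele_counts(loci[locus])
            if method == "bayesian":
                # K: alleles seen at this locus in any population or in the individual.
                seen = set(genotype)
                for other in reference.values():
                    seen.update(allele_counts(other[locus]))
                total += log_lik_bayesian(genotype, counts, len(seen))
            else:
                total += log_lik_frequency(genotype, counts)
        scores[pop] = total
    return max(scores, key=scores.get), scores


# Invented reference genotypes (allele sizes in bp) for two toy populations.
reference = {
    "A": {"L1": [(100, 100), (100, 102), (100, 100)],
          "L2": [(200, 204), (200, 200), (204, 204)]},
    "B": {"L1": [(104, 106), (106, 106), (104, 104)],
          "L2": [(208, 208), (208, 210), (210, 210)]},
}
individual = {"L1": (100, 102), "L2": (200, 204)}

print(assign(individual, reference, method="bayesian")[0])
print(assign(individual, reference, method="frequency")[0])
```

Summing log-likelihoods over loci is equivalent to multiplying per-locus probabilities and presumes that the loci are independent, which linkage disequilibrium would violate.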

Factors influencing the assignment efficiency
Many factors influence the assignment efficiency, such as the number of loci and alleles, polymorphism information content, genetic heterozygosity and the gene differential coefficient. Generally speaking, the higher the estimates of these factors, the more robust the assignment results of the microsatellites would be expected to be. It could be deduced from this study that the assignment accuracy based on the Bayesian method was related to the number of alleles and the gene differential coefficient, but not to polymorphism information content or genetic heterozygosity. Polymorphism information content and genetic heterozygosity are both estimated from the frequencies and the number of alleles per locus. Also, some particular alleles with extreme values could mislead the assignment. Thus, alleles with extreme values should be considered carefully during breed assignment, and rare private alleles in particular should be discarded.
Undoubtedly, the assignment efficiency will be enhanced by an increase in the number of microsatellites used. However, the testing expenditure, including PCR and genotyping, will also rise accordingly. It is therefore necessary to determine a feasible number of loci to be used. As shown in Figure 1, a combination of ten microsatellites would be sufficient to obtain 98% correct assignment. Taking the assignment power of each marker into consideration, the following microsatellites are proposed as suitable for parentage testing: S0005, S0068, S0215, S0218, S0225, S0228, S0355, Sw632, Sw72 and Sw936.

Assignment conditions
Besides the intrinsic characteristics of microsatellites, such as polymorphism, assignment power and size homoplasy (Peischl et al., 2005), external factors should also be taken into consideration for individual-breed assignments. The stability of PCR results, stutter bands, and the occurrence of null alleles during microsatellite genotyping should be examined with great caution. A molecular weight ladder or standard control DNA samples should be used in each genotyping experiment. In addition, it is convenient to use multiplex PCR and a multiple-dye detection system for microsatellite genotyping. Nechtelberger et al. (2001) recommended two multiplex systems containing fifteen microsatellite loci for swine comparison tests, five of which were the same as those of our study. More effort should be devoted to developing an efficient and reliable microsatellite-based identification system for pigs.

Table 1. Genetic variation and assignment power of 26 microsatellite loci in the studied populations
N: mean number of alleles; Ne: mean effective number of alleles; Hs: mean genetic heterozygosity; PIC: mean polymorphism information content; Gst: mean gene differential coefficient; Excl. II: mean exclusion probability under case II; Excl. III: mean exclusion probability under case III; P: the percentage of correct assignments using the Bayesian method.

Table 2. Assignment of individuals into populations by using the Bayesian method

Table 3. The percentage of correct assignments for six approaches based on multilocus genotypes