Optimal population size to detect quantitative trait loci in Korean native chicken: a simulation study

Objective: A genomic region associated with a particular phenotype is called a quantitative trait locus (QTL). To determine the optimal F2 population size for detecting QTLs in native chicken, we performed a simulation study on F2 populations derived from crosses between two different breeds. Methods: A total of 15 males and 150 females were randomly selected from the last generation of each F1 population, which was composed of different breeds, to create two different F2 populations. The progenies produced from these selected individuals were simulated for six more generations. Marker genotypes were simulated at a density of 50K, and traits were simulated at three heritability levels (0.1, 0.3, and 0.5). We compared reference population (RP) sizes of 100, 500, and 1,000 at each of the three heritability levels. A total of 35 QTLs were simulated at randomly assigned locations. Results: With an RP size of 100, no QTL exceeded the Bonferroni threshold at any of the three heritability levels. With an RP size of 500, two QTLs were detected when the heritability was 0.5. With an RP size of 1,000, one QTL was detected at a heritability of 0.1 and five QTLs were detected at a heritability of 0.5. Thus, RP size and heritability play a key role in QTL detection: the larger the RP size and the greater the heritability, the higher the probability of detecting QTLs. Conclusion: Our study suggests that a large RP and a high heritability can improve QTL detection in an F2 chicken population.


INTRODUCTION
The application of genomics in agriculture focuses on identifying genes responsible for economically important traits in plants and animals. Some of these traits are characterized by wide variability in the expression of genes at certain loci, i.e., quantitative trait loci (QTL). A genomic region associated with a particular phenotype is called a QTL. Identifying the chromosomal regions that contain QTLs could be useful in marker-assisted selection to increase breeding efficiency [1]. In addition, combining a molecular linkage map with powerful statistical approaches enables the genetic partitioning of complex traits. The chicken has particular advantages in such analyses because of its short life cycle and large number of offspring [2]. However, several factors can influence the detection of QTLs, such as genotyping errors, training population size, the level of replication of phenotypic data, and various environmental effects. Evaluating some of these factors is either difficult or time consuming in practice. As an alternative, simulation experiments are generally performed to evaluate them [3].
A simulation study allows the testing of several theories, permitting an unravelling of the multifaceted evolutionary patterns that are otherwise difficult to understand. For example, the elucidation of the history of human migration provides significant insight into the present patterns of DNA variation in humans [3-5]. Simulation studies of beef cattle and other livestock have provided information on their potential for genomic evaluation. Studies have included the prediction of total genetic value [6], genomic prediction of simulated multibreed and purebred cattle [7], genomic selection accuracy in simulated populations [8], and a comparison between single-step and two-step genomic best linear unbiased prediction methods in simulated beef cattle [9,10]. The chicken 60K single-nucleotide polymorphism (SNP) panel currently provides a level of genome coverage and map resolution that is unavailable from microsatellite markers. The high-density SNP panel also has the potential to achieve improved accuracy in determining QTL locations. An F2 population is useful for detecting QTLs because it is a cross between two populations differing phenotypically in a trait [2]. Ledur et al. [11] showed that designed populations, such as F2 populations for use in genome-wide association studies (GWAS), had advantages over random populations in terms of reducing the false discovery rate and improving mapping accuracy. Several experiments have been conducted based on this design in different livestock species. The design is especially useful in pigs and chickens because of their shorter generation interval and higher prolificacy than other species. The objective of this study was to investigate the optimal size of an F2 population in QTL detection through simulation using QMSim software.

Simulation of F2 population, population structure, and simulation parameters
The number of QTLs was examined in two different F2 populations. A total of six chicken populations were simulated, including Line 1 and Line 2, which served as a typical sire and dam population, respectively. Crossing males of Line 1 with females of Line 2 produced the F1a population, whereas mating males of Line 2 with females of Line 1 produced the F1b population. Similarly, the males of F1a and females of F1b produced the F2a population, and the females of F1a and males of F1b produced the F2b population. However, we did not include the effect of the mating system in this study.
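The reciprocal crossing scheme described above can be written down as a small lookup table. The following is a minimal sketch, not part of the QMSim setup: the population labels and the `founder_line` helper are hypothetical names introduced here for illustration.

```python
# Hypothetical labels; each entry is (sire population, dam population),
# restating the reciprocal crosses described in the text.
CROSSES = {
    "F1a": ("Line1", "Line2"),
    "F1b": ("Line2", "Line1"),
    "F2a": ("F1a", "F1b"),
    "F2b": ("F1b", "F1a"),
}


def founder_line(population, parent):
    """Follow only sires (parent="sire") or only dams (parent="dam")
    back to the pure line at the top of that lineage."""
    index = 0 if parent == "sire" else 1
    while population in CROSSES:
        population = CROSSES[population][index]
    return population


for f2 in ("F2a", "F2b"):
    print(f2, "sire lineage:", founder_line(f2, "sire"),
          "| dam lineage:", founder_line(f2, "dam"))
```

Under this reading of the design, the all-sire and all-dam lineages of F2a both trace back to Line 1 and those of F2b both trace back to Line 2, while each F2 population still receives autosomal contributions from both lines through its F1 parents.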
The QMSim software package [12] was used to simulate the phenotypic and genotypic datasets of the populations. These simulated datasets mimicked the actual population structure and extent of linkage disequilibrium (LD) in the Korean native chicken population [13]. Table 1 summarizes the simulation parameters. A 50K marker-density panel was simulated to generate biallelic markers distributed across 18 autosomal chromosomes of different lengths. First, a historical population (HP) was simulated with a constant size of 10,000 individuals across 1,000 generations. The size was then gradually reduced to 8,000 individuals in the subsequent 1,050 generations to create an initial LD and mutation-drift equilibrium. The number of individuals produced for each sex was equal (equal probability of being male or female), and mating among parents was random. To simulate two different pure lines (Lines 1 and 2), 60 males and 600 females were selected from the last generation of the HP. Because Line 1 acted as the sire population, its individuals were selected for a higher true breeding value (TBV). Conversely, because Line 2 was the dam population, its individuals were selected for a lower TBV. The mating design in each population was based on positive assortative mating. A total of 660 selected individuals were used as the effective population size (Ne), and the lines were simulated across 20 generations, with each dam producing 10 offspring per generation in all simulations. A total of 330 individuals (30 males and 300 females) were chosen from the last generation of HP and bred for five generations to create two different F1 populations (F1a and F1b). Finally, 15 males and 150 females were randomly chosen from the last generation of each F1 population and randomly bred for six more generations to create two different F2 populations (F2a and F2b), following a similar mating design as described earlier.
The replacement ratio for both sires and dams was 100%. Traits with a phenotypic variance of 1 and heritability levels of 0.1, 0.3, and 0.5 were used in the simulation. Three reference populations (RPs) consisting of 100, 500, and 1,000 individuals were created by randomly selecting individuals from generations 5 and 6 of the F2 population.
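The RP sampling step can be sketched as follows. This is an illustrative sketch only: the candidate pool size assumes 150 F2 dams each producing 10 offspring per generation, as stated for the simulations, and the individual IDs, function names, and seed are hypothetical.

```python
import random

OFFSPRING_PER_DAM = 10  # stated for all simulations
F2_DAMS = 150           # dams selected per F2 population


def offspring_per_generation(n_dams, litter=OFFSPRING_PER_DAM):
    """Number of progeny produced in one generation."""
    return n_dams * litter


def sample_reference_population(candidate_ids, size, seed=None):
    """Randomly draw `size` distinct individuals from the candidates."""
    rng = random.Random(seed)
    return rng.sample(candidate_ids, size)


# Hypothetical IDs for the progeny of F2 generations 5 and 6.
pool = [f"F2_g{gen}_{i}"
        for gen in (5, 6)
        for i in range(offspring_per_generation(F2_DAMS))]

reference_populations = {
    size: sample_reference_population(pool, size, seed=1)
    for size in (100, 500, 1000)
}

for size, rp in reference_populations.items():
    print(size, "sampled from", len(pool), "candidates; first ID:", rp[0])
```

Sampling without replacement ensures that no individual appears twice in an RP; the three RPs here are drawn independently of one another, which is one possible reading of the design.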
Our simulated genome comprised 18 pairs of chromosomes, with a total length identical to the actual Korean native chicken genome length of 2,729.4 cM [13]. A marker density of 50K was selected to ensure a sufficient number of segregating biallelic loci. The markers had no effect on the traits, and the QTLs were assumed to explain 100% of the genetic variance. The whole genome contained 35 segregating QTLs, each with 2 to 4 alleles per locus (randomly distributed) and a minor allele frequency greater than 0.01. The additive genetic effects of the QTLs were sampled from a gamma distribution with a shape parameter of 0.4. The rates of missing marker genotypes and marker genotyping errors were 0.05 and 0.005, respectively. A recurrent mutation rate of 10⁻⁵ was used for markers and QTLs throughout the simulation to obtain mutation-drift equilibrium in the population. Phenotypes were generated by adding random residuals to the QTL effects.
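The way phenotypes arise from QTL effects and residuals can be illustrated with a short sketch. This is not the QMSim implementation: it assumes biallelic QTLs (the study used 2 to 4 alleles per locus), a gamma scale of 1, random effect signs, and uniformly drawn allele frequencies, and all function names are hypothetical.

```python
import random
import statistics


def random_genotypes(n_individuals, n_qtl=35, seed=1):
    """Draw 0/1/2 allele counts at each QTL (biallelic simplification)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_qtl)]
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_individuals)]


def simulate_phenotypes(genotypes, h2, shape=0.4, seed=0):
    """Return (true breeding values, phenotypes) with phenotypic variance ~1."""
    rng = random.Random(seed)
    n_qtl = len(genotypes[0])
    # Additive effects from a gamma distribution with shape 0.4;
    # the scale of 1.0 and the random sign are assumptions of this sketch.
    effects = [rng.gammavariate(shape, 1.0) * rng.choice((-1, 1))
               for _ in range(n_qtl)]
    raw = [sum(a * g for a, g in zip(effects, ind)) for ind in genotypes]
    mean = statistics.fmean(raw)
    # Rescale so the QTLs explain all genetic variance, which equals h2.
    scale = (h2 / statistics.pvariance(raw)) ** 0.5
    tbv = [(g - mean) * scale for g in raw]
    residual_sd = (1.0 - h2) ** 0.5
    phenotypes = [g + rng.gauss(0.0, residual_sd) for g in tbv]
    return tbv, phenotypes


genotypes = random_genotypes(2000)
for h2 in (0.1, 0.3, 0.5):
    tbv, phenotypes = simulate_phenotypes(genotypes, h2)
    print(h2, round(statistics.pvariance(tbv), 3),
          round(statistics.pvariance(phenotypes), 3))
```

Because the shape parameter is below 1, most sampled effects are small and a few are large, so only a handful of the 35 QTLs account for most of the genetic variance.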

Statistical model for quantitative trait loci detection
The F2 population was chosen as the RP because its parents were produced by crossing two different families. In GWAS, all markers are required to be in LD with causal variants in close proximity. All SNPs were coded as AA = 0, AB = 1, and BB = 2 [14]. The statistical model was as follows:

$$y = \mu + CG_i + b_1 SNP_k + A_1 + e_{ijk}$$

where $y$ is the phenotype of individuals; $\mu$ is the overall mean; $CG_i$ is the vector of fixed contemporary group effects for generation by sex; $b_1$ is the fixed/random effect of marker genotype; $SNP_k$ is the recoded marker genotype (0, 1, and 2); $A_1$ is the vector of random polygenic effects, $A_1 \sim N(0, G\sigma_a^2)$, where $G$ is the additive genomic relationship matrix (GRM) and $\sigma_a^2$ is the additive genetic variance of the animals; and $e_{ijk}$ is the random residual effect, $e_{ijk} \sim N(0, I\sigma_e^2)$, where $I$ is the identity matrix and $\sigma_e^2$ is the residual variance.
To map QTLs, a modified Bonferroni-type multiple testing correction threshold was used [15] to restrict the experiment-wise error rate to 0.05 [16].
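The marker test and significance threshold can be illustrated with a simplified sketch. It deliberately swaps in two simpler techniques: an ordinary least-squares regression of the phenotype on a single SNP (omitting the contemporary group and polygenic terms of the model above) and a plain Bonferroni threshold rather than the modified version of [15]. The count of 50,000 tests is assumed from the nominal marker density, and all function names and the toy data are hypothetical.

```python
import math
import random
from statistics import fmean


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold for an experiment-wise error rate alpha."""
    return alpha / n_tests


def single_snp_test(y, snp):
    """Regress phenotype on 0/1/2 genotype; return (slope, two-sided p).

    The p-value uses a normal approximation, reasonable for large samples.
    """
    n = len(y)
    mean_y, mean_x = fmean(y), fmean(snp)
    sxx = sum((x - mean_x) ** 2 for x in snp)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(snp, y))
    slope = sxy / sxx
    rss = sum((v - mean_y - slope * (x - mean_x)) ** 2
              for x, v in zip(snp, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = slope / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return slope, p_value


# Toy data: one SNP with a true effect of 1.0 phenotypic unit per allele.
rng = random.Random(0)
snp = [0, 1, 2] * 100
y = [1.0 * x + rng.gauss(0.0, 1.0) for x in snp]

threshold = bonferroni_threshold(50_000)
slope, p_value = single_snp_test(y, snp)
print("threshold on -log10 scale:", round(-math.log10(threshold), 2))
print("slope:", round(slope, 3), "significant:", p_value < threshold)
```

With 50,000 tests and an experiment-wise error rate of 0.05, the per-marker threshold is about 10⁻⁶, so a marker must reach roughly 6 on the -log10(p) scale to be declared significant under this plain correction.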

RESULTS AND DISCUSSION
To investigate the optimal size of an F2 population in QTL detection, QMSim software was used to simulate datasets under different scenarios (e.g., h² = 0.1, 0.3, and 0.5; RP size = 100, 500, and 1,000), as shown in Figures 1, 2, and 3. As the RP size increased, we observed an overall increase in the number of significant QTLs across the different chromosomes.
With an RP size of 100, no QTL exceeded the Bonferroni threshold at any of the three heritability levels. With an RP size of 500, two QTLs were detected when the heritability was 0.5. With an RP size of 1,000, one QTL was detected at a heritability of 0.1 and five QTLs were detected at a heritability of 0.5. In short, RP size and heritability play a key role in detecting QTLs. This result implies that the RP size should be set according to the heritability of the trait in an F2 chicken population. With an RP size of 1,000, QTLs were detected at different h² levels, even at an h² value of 0.1 (Figure 1). The results of this study indicate that a larger RP size and a higher heritability level improve QTL detection in an F2 population. Accordingly, the RP for QTL detection should comprise at least 500 individuals across traits with low to high heritability levels (h² = 0.1, 0.3, and 0.5) to obtain more significant QTLs in an F2 chicken population. These results support an earlier study by Hocking [17], who detected QTLs for production traits in F2 crosses of two breeds comprising 250 to 700 birds. In 1992, the Korean government launched the nationwide Korean native chicken restoration project, which was mainly administered by the National Institute of Animal Science (NIAS) and focused on the development of meat-type native chicken lines [18]. As part of this project, Korean Ogye and White Leghorn cross populations were investigated to determine QTLs and, eventually, the causative mutations for meat- and egg-related traits. The results of the present study can be used as an initial framework for designing and implementing QTL detection in an F2 chicken population, especially cross populations between the Korean Ogye and White Leghorn breeds. However, the population structure and genetic architecture of traits should also be considered to optimize the RP sizes for QTL detection in the chicken industry.
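The dependence of QTL detection on RP size and heritability reported in the results can be made plausible with a rough power calculation. The sketch below is not the analysis used in this study: it assumes a single-marker z-test, a marker in complete LD with the QTL, a plain Bonferroni correction over 50,000 tests, and a hypothetical QTL that explains 20% of the genetic variance.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def detection_power(n, q, n_tests=50_000, alpha=0.05):
    """Approximate power of a two-sided single-marker z-test.

    n: RP size; q: fraction of phenotypic variance explained by the QTL,
    assuming the tested marker is in complete LD with it.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / (2 * n_tests))
    ncp = math.sqrt(n * q / (1 - q))  # expected value of the z statistic
    return (1 - STD_NORMAL.cdf(z_crit - ncp)) + STD_NORMAL.cdf(-z_crit - ncp)


LARGEST_QTL_SHARE = 0.2  # hypothetical share of genetic variance for one QTL

for h2 in (0.1, 0.3, 0.5):
    for n in (100, 500, 1000):
        power = detection_power(n, LARGEST_QTL_SHARE * h2)
        print(f"h2={h2} n={n} power={power:.3f}")
```

In this approximation the expected test statistic grows with the square root of n·q/(1 − q), so both a larger RP and a higher heritability (which raises q for a QTL of a given share) push it past the fixed threshold, in line with the pattern observed across the simulated scenarios.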

CONCLUSION
In general, a large RP size (1,000) had a positive effect on QTL detection compared with an RP size of 100 or 500. Both the RP size and the heritability level should be considered for QTL detection in an F2 chicken population.