Comparative Analysis of Repetitive Elements of Imprinting Genes Reveals Eleven Candidate Imprinting Genes in Cattle

Few studies have reported the existence of imprinted genes in cattle compared to the human and mouse. Imprinted genes are expressed monoallelically, and their expression depends on the parental origin of the allele. Comparative analysis of mammals other than the human is a valuable tool for explaining the genomic basis of imprinted genes. In this study, we investigated 34 common imprinted genes in the human and mouse as well as 35 known non-imprinted genes in the human. We found short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), and long terminal repeats (LTRs) in imprinted (human and mouse) and control (cattle) genes. Pair-wise comparisons for the three species were conducted using SINEs, LINEs, and LTRs. We also calculated 95% confidence intervals of frequencies of repetitive sequences for the three species. As a result, most genes had a similar interval between species. We found 11 genes with conserved SINEs, LINEs, and LTRs in the human, mouse, and cattle. In conclusion, eleven genes (CALCR, Grb10, HTR2A, KCNK9, Kcnq1, MEST, OSBPL5, PPP1R9A, Sgce, SLC22A18, and UBE3A) were identified as candidate imprinted genes in cattle.


INTRODUCTION
Imprinted genes do not follow the laws of Mendelian genetics, in which inheritance of traits is described as either recessive or dominant (e.g., Lee et al., 2007). Genomic imprinting is monoallelic and involves epigenetically regulated, parent-of-origin-dependent expression of specific autosomal genes inherited from the mother (egg) or father (sperm) (Cheng et al., 2007). Several studies have reported that imprinted genes are conserved among placental and marsupial species (Reik et al., 2001). Many imprinted genes have been reported in the human and mouse, but few reports have investigated imprinted genes in cattle. In a study of imprinted genes (51 in human and 69 in mice), only 26 of these genes were common between the two species (Jirtle, 2006). Humans had fewer imprinted genes than mice, and the imprinted genes in humans were different from those in mice. Although these genes were highly conserved between these two species, some genes showed non-imprinted patterns in both species. To understand the biological mechanisms of genomic imprinting, comparative analysis of the sequence between imprinted and non-imprinted genes is important to identify species-specific monoallelic expression. The status of known imprinted genes and comparative analysis among mammalian species provide a tool for identifying the epigenetic mechanism of genomic imprinting (Ismail et al., 2006).
The classification of imprinted and non-imprinted genes on the basis of genomic sequence characteristics was proposed using distinction functions, and the mechanisms representing monoallelic expression of imprinted genes can be used in a genome-wide prediction to verify putative candidate imprinted genes (Ke et al., 2002). Hence, analysis of sequence characteristics of genomic repeated elements plays an important role in identifying imprinted genes. Differences in these conserved repetitive elements may be important in the regulation of monoallelically expressed genes in mammalian species. Short interspersed nuclear elements (SINEs) tend to occur at significantly different densities in human imprinted regions. SINE content was significantly lower in imprinted loci compared to non-imprinted loci (Greally et al., 2002). In a search for sequence characteristics of IGF2 in imprinted and non-imprinted genes, Weidman et al. (2004) noted that paternally expressed IGF2 was strongly associated with a shortage of SINEs. In the region outside of the imprinted domain, the SINE density increased to 13.45% in Ppp2r5c and Dnchc1 (Tierling et al., 2005). Khatib et al. (2007) reported that densities of long interspersed nuclear elements (LINEs) and long terminal repeats (LTRs) were notably lower in imprinted genes compared to control genes. The frequency of LINEs was significantly higher in control genes (13.7%) than in imprinted genes (4.7%), and LTRs were significantly under-represented in imprinted genes (0.4%) compared with control genes (1.7%). Khatib et al. (2007) found two imprinted genes (TSSC4 and XIST) in cattle. Walter et al. (2006) found that LINEs contained significantly fewer coding sequences in imprinted genes compared to control genes in cattle. Also, LINEs were denser in imprinted genes than in non-imprinted genes in mice. It is possible that the frequency of LINEs in imprinted genes involves species-specific expression.
The lower percentage of repetitive elements in imprinted regions makes them valuable for a biologically based assessment of the differential expression of imprinted genes. However, analyses of sequence characteristics of repetitive elements have not been reported in cattle. Hence, the purpose of this study was to identify putative candidate imprinted genes in cattle by comparing known imprinted genes in the human and mouse. This comparative analysis of mammalian species should provide useful information for the study of genomic imprinting in mammals.

Selection of genes
Human and mouse imprinted genes were used in this study because a draft sequence was not available for imprinted genes in cattle. In the human and mouse, there are 179 and 77 known imprinted genes, respectively. In total, 34 genes known to be imprinted in both the human and mouse were selected from the geneimprint catalog (http://www.geneimprint.com/site/genes-by-species) to identify orthologous genes in cattle. Among these, 15 genes were maternally expressed, 18 genes were paternally expressed, and one gene (GRB10) had isoform-dependent expression. For comparison with the imprinted genes, 35 known non-imprinted genes in the human were compiled from the 'Catalogue of Parent of Origin Effects' (http://igc.otago.ac.nz/home.html) to serve as biallelically expressed controls.

Detection of repetitive elements
To account for differences in genomic imprinting expression, we examined the repetitive elements of each gene. Repetitive elements (SINEs, LINEs, and LTRs) were searched in the genomic region using the UCSC Genome Browser (http://genome.ucsc.edu/) with the March 2006 (hg18), July 2007 (mm9), and October 2007 (bosTau4) builds for human, mouse, and cattle, respectively. The genome sequence elements were obtained from the UCSC Genome Browser site to identify sequence characteristics of each repetitive element (from chromosome 1 to chromosome X, except for chromosome Y). The X chromosome has richer resources than the Y chromosome because the X chromosome has a low mutation rate, moderate genetic drift, a high recombination rate, a high number of usable loci, and a high effective population size. Next, to show the position of the sequences in each gene, we downloaded data from the National Center for Biotechnology Information (NCBI) because the UCSC web site lacked gene annotations. To identify regions of conserved sequence characteristics, we downloaded pair-wise alignment data from the UCSC Genome Browser for Human/Cow (bosTau4), Mouse/Cow (bosTau4), and Human/Mouse (mm9). Consequently, we identified orthologous genes with repetitive elements (SINEs, LINEs, and LTRs) between human, mouse, and cattle. We then counted the number of SINEs, LINEs, and LTRs for each gene in each pair of species. We also performed a comparative analysis of the frequency of these repetitive elements in imprinted and non-imprinted genes. A Python script was used to analyze the structure of sequence characteristics and to calculate the frequency of repetitive elements of each gene based on RepeatMasker results.
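The per-gene counting step described above can be illustrated with a minimal Python sketch. It is not the script used in the study: the tuple layout (chromosome, start, end, repeat class), the half-open coordinates, the function name `repeat_summary`, and the definition of frequency as the percentage of gene length covered by each class are all assumptions made for illustration, and the coordinates are invented.

```python
from collections import defaultdict

REPEAT_CLASSES = ("SINE", "LINE", "LTR")

def repeat_summary(gene, repeats):
    """Count repeats per class inside a gene and report percent coverage.

    gene    -- (chrom, start, end), half-open coordinates (assumed layout)
    repeats -- iterable of (chrom, start, end, repeat_class) annotations
    Assumes annotations of the same class do not overlap each other.
    """
    chrom, g_start, g_end = gene
    gene_len = g_end - g_start
    counts = defaultdict(int)
    covered = defaultdict(int)
    for r_chrom, r_start, r_end, r_class in repeats:
        if r_chrom != chrom or r_class not in REPEAT_CLASSES:
            continue
        # Length of the repeat that falls inside the gene interval.
        overlap = min(g_end, r_end) - max(g_start, r_start)
        if overlap > 0:
            counts[r_class] += 1
            covered[r_class] += overlap
    return {c: (counts[c], 100.0 * covered[c] / gene_len) for c in REPEAT_CLASSES}

# Hypothetical gene and annotations, for illustration only.
example_gene = ("chr7", 1000, 2000)
example_repeats = [
    ("chr7", 900, 1100, "SINE"),   # partly inside the gene
    ("chr7", 1500, 1800, "LINE"),
    ("chr7", 1950, 2100, "SINE"),  # partly inside the gene
    ("chr1", 1000, 1200, "LTR"),   # different chromosome, ignored
    ("chr7", 1200, 1300, "DNA"),   # other repeat class, ignored
]
summary = repeat_summary(example_gene, example_repeats)
print(summary)
```

Clipping each repeat to the gene boundaries keeps elements that straddle the gene edge from inflating the coverage; whether the original analysis counted such partial elements is not stated in the text.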

Statistical analysis
To show conserved relationships between human and mouse (imprinted genes), and cattle (control genes), we computed Pearson's, Spearman's, and Kendall's correlation coefficients, which measure both the direction and the strength of the association between two variables. Unlike the classical parametric Pearson's correlation, which measures linear association, Spearman's and Kendall's correlations are rank-based and measure how well the rankings of the same set of observations agree. Boxplots are suited for comparing two or more data sets and for identifying the approximate shape of the distribution of a data set. We therefore used boxplots to compare the frequency of a repetitive sequence between imprinted and non-imprinted genes in the human and cattle. A confidence interval for the mean frequencies was constructed centered on the sample mean with a width that was a multiple of the standard deviation. On this basis, we estimated 95% confidence intervals (CI) for mean frequencies in human and cattle loci. Statistical analyses of all data were performed with the statistical package R (http://www.r-project.org/).
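The interval construction described above (centered on the sample mean, with a half-width that is a multiple of the standard deviation) can be sketched as follows. The analysis itself was run in R; this Python version is only an illustration, the multiplier 1.96 for a 95% interval is an assumption, the function name `interval_around_mean` is hypothetical, and the counts are invented. The optional `use_standard_error` flag shows the more common textbook form that divides the standard deviation by the square root of the sample size.

```python
import math
import statistics

def interval_around_mean(values, z=1.96, use_standard_error=False):
    """Return (lower, upper) for an interval centered on the sample mean.

    z                  -- multiplier; 1.96 is assumed here for a 95% interval
    use_standard_error -- if True, scale by sd / sqrt(n) instead of sd
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    scale = sd / math.sqrt(len(values)) if use_standard_error else sd
    return mean - z * scale, mean + z * scale

# Hypothetical per-gene repeat counts, for illustration only.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
sd_low, sd_high = interval_around_mean(counts)
se_low, se_high = interval_around_mean(counts, use_standard_error=True)
print(sd_low, sd_high)
print(se_low, se_high)
```

When the standard deviation is large relative to the mean, the lower bound of the first form can fall below zero even though counts cannot be negative.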

RESULTS AND DISCUSSION
To date, few analyses have been conducted on sequence characteristics in repetitive elements of known imprinted genes. In addition, the epigenetic mechanisms regulating imprinted genes in cattle are poorly understood. Knowledge of the characteristics of imprinted genes would improve our understanding of these mechanisms. Ismail et al. (2006) reported that, to understand the biological mechanisms of imprinted genes, comparative analysis of sequence structural features between imprinted and non-imprinted genes is important to identify species-specific monoallelic expression.

Identification of SINEs, LINEs, and LTRs in human, mouse, and cattle using repeat masking
Figure 1A shows the counts of orthologous SINEs between the human and cattle based on imprinted genes of the human and mouse. Total numbers of genes with SINEs were 81 between human and cattle, 24 between mouse and cattle, and 56 between human and mouse. Eleven genes had common SINEs between species. Figure 1B represents the common LINEs in human and cattle. Total numbers of LINEs were 57 between human and cattle, 16 between mouse and cattle, and 36 between human and mouse. Six genes had common LINEs in both species. Figure 1C shows the common LTRs in the human and cattle. Total numbers of LTRs were 26 between human and cattle, 3 between mouse and cattle, and 25 between human and mouse, and only one gene had common LTRs between species.
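One plausible way to obtain the genes shared across the three pair-wise comparisons is a set intersection, sketched below in Python. This is an assumption about how "common" genes could be derived, not the procedure stated in the text. The gene names are placeholders, the helper names are hypothetical, and matching orthologs by upper-cased symbol (so that spellings such as Grb10 and GRB10 agree) is a simplification.

```python
def normalize(symbols):
    """Upper-case gene symbols so species-specific capitalization matches."""
    return {s.strip().upper() for s in symbols}

def shared_across_pairs(*pairwise_sets):
    """Return the sorted symbols present in every pair-wise comparison."""
    normalized = [normalize(s) for s in pairwise_sets]
    return sorted(set.intersection(*normalized))

# Placeholder gene lists, for illustration only.
human_cattle = ["GENE_A", "Gene_B", "GENE_C", "GENE_D"]
mouse_cattle = ["Gene_a", "GENE_B"]
human_mouse = ["GENE_A", "GENE_B", "GENE_C"]

common = shared_across_pairs(human_cattle, mouse_cattle, human_mouse)
print(common)
```

The smallest pair-wise set bounds the size of the intersection, so a comparison with few shared genes limits how many can be common to all three.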

Comparative analysis of imprinted and control genes
To test whether SINE, LINE, and LTR distributions differed between the imprinted and control genes, we identified 35 genes known to be non-imprinted in the human. We calculated the number of repeat elements for each gene in imprinted genes and biallelically expressed genes in the human and cattle. Observed frequencies of imprinted and control genes are displayed in Figure 2 along with the frequencies of repetitive elements. The sequence characteristic analysis showed significant differences between the two groups of genes. SINE densities were significantly lower in imprinted genes compared to control genes in both human and cattle. Similarly, we found that LINE and LTR densities were much lower in imprinted genes than in control regions in both species. The sequence characteristics were consistent for imprinted genes. These results agree with previous reports that the frequency of SINEs was significantly lower in imprinted regions and that concentrations of LINEs and LTRs were significantly lower in imprinted genes compared to control genes (Greally et al., 2002; Tierling et al., 2005; Khatib et al., 2007). The sequence characteristics were consistent with maternal and paternal genes (Ke et al., 2002). The sequence features in imprinted genes may help to identify differences in genomic functions for candidate imprinted genes in cattle. Such analyses of sequence characteristic differences would help to develop a determinant marker for classification of imprinted genes.

Correlation analysis
We computed correlation coefficients to show the conserved relationship of imprinted gene frequency in human, mouse, and cattle. For SINEs and LINEs, correlation coefficients between all pair-wise species were strongly positive (Table 1). For LTRs, the correlations were significantly positive for the human/cattle and human/mouse pairs. However, the mouse/cattle correlation was not significant (Pearson's correlation coefficient = 0.21, Spearman's rho = 0.33, Kendall's tau = 0.22). This was because there were few common genes between mouse/cattle compared to human/cattle and human/mouse.
Though the mouse/cattle correlation for LTRs was non-significant, these repetitive elements might still be useful indicators for identifying imprinted genes because the number of data points for LTRs was much smaller than for SINEs and LINEs. A higher correlation coefficient reflects a strong linear relationship between two variables. As the numbers of repetitive elements increase in human and mouse, they also increase in cattle; therefore, the correlation suggests that repetitive element counts are similar between the species.
Figure 2. Boxplot of SINEs. Short interspersed nuclear element (SINE) frequency (%) of repetitive elements in imprinted genes and biallelically expressed genes in (A) human and (B) cattle. SINE frequencies were significantly different between imprinted and control genes. The frequency of SINE elements had significantly lower densities in imprinted genes compared to control genes in both human and cattle.
Because SINEs have shorter repeat sequence units than LINEs and LTRs in the genome, the correlation between human, mouse, and cattle was more likely to indicate similarity for SINEs.
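The three coefficients compared in this section can be computed from paired per-gene counts as in the following Python sketch. The study used R for its statistics; these plain-Python functions are an illustrative re-implementation under stated assumptions (average ranks for ties in Spearman's rho, the tau-b variant for Kendall's tau, and at least some variation in both variables), and the counts are invented.

```python
import math

def pearson(x, y):
    """Pearson's r: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, tie-adjusted."""
    conc = disc = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom

# Hypothetical per-gene repeat counts in two species, for illustration only.
species_a = [1, 2, 3, 4, 5]
species_b = [1, 4, 9, 16, 25]  # increases with species_a, but not linearly
r = pearson(species_a, species_b)
rho = spearman(species_a, species_b)
tau = kendall_tau_b(species_a, species_b)
print(r, rho, tau)
```

In this toy example the counts rise together but not in a straight line, so the rank-based coefficients reach 1 while Pearson's r stays below 1.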

Confidence interval for means
We estimated 95% confidence intervals (CI) for mean frequencies of repetitive sequences in the human, mouse, and cattle (Table 2). For the human/cattle pair, the 95% CI of SINEs ranged from -39.35 to 75.75 in the human and from -33.52 to 67.35 in cattle. The 95% CI of LINEs ranged from -51.55 to 92.89 in the human and from -37.03 to 60.79 in cattle. The 95% CI of LTRs ranged from -24.66 to 48.50 in the human and from -22.90 to 36.52 in cattle. Although the lower boundary of the CI has no biological meaning, the 95% CI in the human had larger ranges than in cattle, and the 95% CI in mouse also had larger ranges than in cattle (Table 2; Mouse/Bovine). With increased complexity of an organism, the density of encoding genes decreases and the density of repeated sequences in the DNA increases (Watson et al., 2007). Because the repetitive sequence density is higher in the human and mouse than in cattle, the variation of repetitive element frequency per gene in human and mouse is high. As expected, the variation of repetitive element frequency in the human and mouse was more broadly distributed than in cattle, and the range of counts for each gene in the human and mouse was much more widely scattered than in cattle. We conclude that the human and mouse genomes might have more complex genetic features and functions. These widespread mechanisms would contribute to identifying complexity in the genome. The low frequencies of SINEs, LINEs, and LTRs in imprinted regions could serve as a useful tool in the study of biological mechanisms leading to verification of the extent of imprinted domains.
In summary, we concluded that eleven genes (CALCR, Grb10, HTR2A, KCNK9, Kcnq1, MEST, OSBPL5, PPP1R9A, Sgce, SLC22A18, and UBE3A) were candidate imprinted genes. Three additional findings were noted. First, the elements of sequences between imprinted and control genes had distinguishing sequence characteristics, because the sequences of imprinted genes were more regular than biallelic genes. Thus, these sequence differences are keys in the classification of imprinted genes and regions. Second, the correlation of SINEs in the human, mouse, and cattle was stronger than that of LINEs and LTRs, because SINEs were short and frequently repeated. Hence, the relationship between human, mouse, and cattle indicated the possibility of similarity for SINEs. Third, the range of 95% CIs was larger in the human and mouse than in cattle. This indicates that the number of repeat elements for each gene in the human and mouse was more broadly scattered than in cattle. It appears that the human and mouse genomes have greater genetic complexity. These widespread biological mechanisms enable us to identify complexity in a genome.
This study should provide useful information for the study of genomic imprinting in mammals. Our genomic imprinting study focused on highly conserved sequence characteristics in the particular set of species being studied. These repetitive elements are important in the imprinted region. Further studies of cattle and other species are necessary to identify additional candidate imprinted genes and to determine whether imprinted genes have specific and important roles.

Biological functions of the imprinting genes
Among the eleven candidate imprinting genes, the physiological functions of seven genes (CALCR, Grb10, HTR2A, KCNK9, MEST, PPP1R9A, and Sgce) are known, but four genes (Kcnq1, OSBPL5, SLC22A18, and UBE3A) are not yet well known in cattle. CALCR functions in calcitonin binding and calcium homeostasis hormone activity (Steven et al., 1993) and is known to be a brain-specific imprinted gene in the mouse (Hoshiya et al., 2003). Grb10 functions as an SH2/SH3-containing adapter protein, in insulin receptor binding, and in inhibition of tyrosine kinase activity (Akhilesh et al., 1995). The Grb10 gene shows equal biallelic expression in almost all tissues and organs in the human, while it is almost always expressed paternally in the fetal brain, which is similar to the mouse Meg1/Grb10 gene (Hikichi et al., 2003). HTR2A functions as a G protein-coupled receptor with receptor activity (Quist et al., 2000). Recently, a contrary result was reported: HTR2A is imprinted in neither human nor cattle, although it is maternally expressed in the mouse (Zaitoun and Khatib, 2008). This controversy needs to be evaluated in future studies. KCNK9 functions in potassium channel activity (Lin et al., 2003); it is known to be predominantly expressed in the brain, and is a known oncogene (Philippe et al., 2007). PPP1R9A functions in the control of cytoskeleton reorganization and in protein phosphatase I binding (Nakabayashi et al., 2004).

Figure 1. Counts of repetitive elements for each gene in human and cattle. (A) Short interspersed nuclear elements (SINEs), (B) Long interspersed nuclear elements (LINEs), and (C) Long terminal repeats (LTRs). Black and gray shading represent human and cattle, respectively. The horizontal axis shows the names of imprinted genes, and the vertical axis denotes the count of repetitive elements (SINEs, LINEs, and LTRs). The frequencies of SINEs and LINEs in cattle and human were highly conserved, and LTRs in cattle had densities similar to those in the human.

Table 1. The correlation coefficients of repetitive elements in human, mouse, and cattle