I need a GTF file with the positions of known transposable elements in the sheep genome, but unfortunately I have not been able to find one yet. My question is:
Is there a database where transposable element annotations for mammalian genomes are deposited, or how else can I find these data (for example for the sheep, cattle, or human genome)?
GTF files for gene annotation can be obtained from UCSC RefSeq,
Ensembl, iGenomes or other annotation databases. GTF files for TE
annotations are customly generated from UCSC RepeatMasker or other
annotation database. They contain two custom attributes, class_id and
family_id, corresponding to the class (e.g. LINE) and family (e.g. L1)
of the corresponding transposable element. A unique ID (e.g.
L1Md_Gf_dup1) is also assigned for each TE annotation in the
transcript_id attribute. Pre-generated TE GTF files are available for
a number of organisms, and can be downloaded here. If the organism or
genome build of your interest is not available, please contact us and
provide a curated annotation of the transposable elements (e.g.
genomic location and TE name/type). We will do our best to help you
generate the suitable TE GTF file.
(http://hammelllab.labsites.cshl.edu/software/#TEtranscripts)
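For a genome without a pre-generated file, the attribute layout described above can be approximated from a UCSC RepeatMasker table dump. This is only a rough sketch, not the TEtranscripts authors' own script. It assumes the standard 17-column `rmsk.txt` layout, and the function name `rmsk_to_gtf`, the `source` label, and the extra `gene_id` attribute are illustrative choices.

```python
from collections import defaultdict

# 0-based column positions in a UCSC rmsk.txt dump (assumed 17-column layout):
# genoName, genoStart, genoEnd, strand, repName, repClass, repFamily
CHROM, START, END, STRAND, NAME, CLASS, FAMILY = 5, 6, 7, 9, 10, 11, 12


def rmsk_to_gtf(lines, source="rmsk"):
    """Yield one GTF line per RepeatMasker row (tab-separated text lines)."""
    copies = defaultdict(int)  # running copy number per repeat name
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        name = f[NAME]
        copies[name] += 1
        # UCSC starts are 0-based; GTF starts are 1-based. Ends are unchanged.
        start = int(f[START]) + 1
        attrs = (
            f'gene_id "{name}"; '
            f'transcript_id "{name}_dup{copies[name]}"; '
            f'family_id "{f[FAMILY]}"; class_id "{f[CLASS]}";'
        )
        yield "\t".join(
            [f[CHROM], source, "exon", str(start), f[END], ".", f[STRAND], ".", attrs]
        )
```

To use it, open the decompressed table and write the yielded lines to a file, one per line. The table also contains simple repeats and low-complexity regions, so you may want to drop those classes first.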
Transposable (aka "mobile") elements are categorized as repeats in most genomes, so you'll need to start by downloading the RepeatMasker track from UCSC. How you proceed from there depends entirely on (A) what you want to do and (B) what's known about the biology. In humans and mice, for example, it's known that some ERVs are still mobile, while others seem not to be. The simplest way to get a list of these is to filter the RepeatMasker track by homology, length, etc. You can find an example of that for the mouse here. That's based on some human settings that someone else came up with (I can probably dig up the reference if needed). Not all of the candidate regions will be mobile, but it's a useful list to start with.
Of course, if it's unknown which families of repeats, if any, are still mobile in sheep, then the best you can do is filter for likely candidate families based on other organisms.
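The kind of filtering described above might look roughly like the sketch below. It is not the mouse example referred to in the answer. It assumes a 17-column UCSC `rmsk.txt` dump, uses the `milliDiv` column (mismatches per thousand bases against the consensus) as a stand-in for homology, and the thresholds and the function name `candidate_elements` are placeholders rather than published settings.

```python
def candidate_elements(lines, classes=("LTR",), min_len=5000, max_millidiv=50):
    """Yield (chrom, start, end, strand, name) for rows that pass all filters.

    classes      -- repClass values to keep (hypothetical default)
    min_len      -- minimum genomic length in bp (hypothetical default)
    max_millidiv -- maximum divergence from consensus, per thousand bases
    """
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 13:
            continue  # skip blank or malformed rows
        start, end = int(f[6]), int(f[7])  # genoStart, genoEnd (0-based, half-open)
        if (
            f[11] in classes                  # repClass
            and end - start >= min_len        # near full length
            and int(f[2]) <= max_millidiv     # milliDiv: close to consensus
        ):
            yield f[5], start, end, f[9], f[10]
```

Long, low-divergence copies are more likely to be recent insertions, which is why length and divergence are the two cut-offs here. Sensible values differ between families and species.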
Thanks for your answer. So you mean there is no file (in GTF format) that gives the positions of Alu, SINE, LINE, LTR, etc. in the genome (even for human)?
I have some SNPs and want to know which of them are in these regions.
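Once the repeat intervals are in hand, checking which SNPs fall inside them is an interval lookup. Tools such as `bedtools intersect` are commonly used for this. The sketch below shows the same idea in plain Python, assuming 0-based half-open repeat intervals (as in UCSC tables) and 1-based SNP positions (as in VCF). The names `build_index` and `overlapping` are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(intervals):
    """Index (chrom, start, end, name) tuples with 0-based, half-open coordinates."""
    by_chrom = defaultdict(list)
    for chrom, start, end, name in intervals:
        by_chrom[chrom].append((start, end, name))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _, _ in ivs]
        # Running maximum of end coordinates, so nested repeats are not missed.
        max_end, running = [], 0
        for _, e, _ in ivs:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, ivs, max_end)
    return index


def overlapping(index, chrom, pos):
    """Return names of all intervals containing the 1-based position `pos`."""
    if chrom not in index:
        return []
    starts, ivs, max_end = index[chrom]
    p = pos - 1  # convert to 0-based
    i = bisect_right(starts, p) - 1  # last interval starting at or before p
    hits = []
    while i >= 0 and max_end[i] > p:
        s, e, name = ivs[i]
        if s <= p < e:
            hits.append(name)
        i -= 1
    return hits
```

For example, with repeats `("chr1", 100, 200, "L1")` and `("chr1", 150, 160, "AluY")`, a SNP at `chr1:155` is reported inside both elements, while one at `chr1:250` returns an empty list.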
Hey,
5-6 years have passed. Is there one for the mouse genome from Ensembl, specifically GRCm39? The gene annotation (GTF) for it is Mus_musculus.GRCm39.112.gtf, but is there a corresponding TE annotation (GTF)?
Thanks, the answers here are very helpful.