Hi All,
I have run LTRharvest and LTRdigest on my assembly. Specifically, I used the -hmms option of LTRdigest with the complete GyDB database of HMMs to annotate the identified elements. Now I have a GFF output. The data below are for a single LTR retrotransposon.
As you can see from the annotated domains, this is probably a Copia-related element. However, I'd like to annotate every element at the lineage level (sire, hydra, retrofit, oryco, and so on), and this mixed annotation of the domains does not help with that. Has anyone else faced this problem? Do you have any idea how to assign a single lineage to each element? Should I do a preliminary filtering based on the e-value (6th column), the domain length, or something else?
Thank you very much for any advice.
seq13 LTRharvest repeat_region 95640 101118 . - . ID=repeat_region2
seq13 LTRharvest target_site_duplication 95640 95644 . - . Parent=repeat_region2
seq13 LTRharvest inverted_repeat 95645 95646 . - . Parent=repeat_region2
seq13 LTRharvest LTR_retrotransposon 95645 101113 . - . ID=LTR_retrotransposon2
seq13 LTRharvest long_terminal_repeat 95645 96356 . - . Parent=LTR_retrotransposon2
seq13 LTRdigest protein_match 96438 96816 2.80E-19 - . name=RNaseH_pseudovirus
seq13 LTRdigest protein_match 96438 96834 6.90E-35 - . name=RNaseH_hydra
seq13 LTRdigest protein_match 96438 96837 2.20E-42 - . name=RNaseH_copia
seq13 LTRdigest protein_match 96438 96837 0 - . name=RNaseH_oryco
seq13 LTRdigest protein_match 96438 96837 2.20E-41 - . name=RNaseH_retrofit
seq13 LTRdigest protein_match 96438 96837 3.41E-43 - . name=RNaseH_pCretro
seq13 LTRdigest protein_match 96438 96837 0 - . name=RNaseH_sire
seq13 LTRdigest protein_match 96438 96840 0 - . name=RNaseH_tork
seq13 LTRdigest protein_match 96489 96759 1.50E-06 - . name=RNaseH_codi_II
seq13 LTRdigest protein_match 97143 97848 0 - . name=RT_copia
seq13 LTRdigest protein_match 97143 97878 0 - . name=RT_pCretro
seq13 LTRdigest protein_match 97143 97878 0 - . name=RT_hydra
seq13 LTRdigest protein_match 97143 97878 0 - . name=RT_sire
seq13 LTRdigest protein_match 97143 97878 0 - . name=RT_tork
seq13 LTRdigest protein_match 97167 97878 4.80E-41 - . name=RT_pseudovirus
seq13 LTRdigest protein_match 97218 97878 0 - . name=RT_oryco
seq13 LTRdigest protein_match 97227 97878 0 - . name=RT_retrofit
seq13 LTRdigest protein_match 98442 99084 0 - . name=INT_tork
seq13 LTRdigest protein_match 98460 98874 6.60E-30 - . name=INT_pCretro
seq13 LTRdigest protein_match 98472 99084 0 - . name=INT_copia
seq13 LTRdigest protein_match 98475 99084 0 - . name=INT_retrofit
seq13 LTRdigest protein_match 98484 99084 0 - . name=INT_oryco
seq13 LTRdigest protein_match 98511 98730 4.30E-12 - . name=INT_hydra
seq13 LTRdigest protein_match 98511 98997 4.00E-25 - . name=INT_sire
seq13 LTRdigest protein_match 98532 98880 1.30E-11 - . name=INT_pseudovirus
seq13 LTRdigest protein_match 99231 99474 0 - . name=AP_tork
seq13 LTRdigest protein_match 99240 99474 2.70E-06 - . name=AP_oryco
seq13 LTRdigest protein_match 99246 99474 1.30E-07 - . name=AP_retrofit
seq13 LTRdigest protein_match 99609 99957 2.90E-08 - . name=GAG_copia
seq13 LTRdigest protein_match 99609 100329 0 - . name=GAG_tork
seq13 LTRharvest long_terminal_repeat 100402 101113 . - . Parent=LTR_retrotransposon2
seq13 LTRharvest inverted_repeat 96355 96356 . - . Parent=repeat_region2
seq13 LTRharvest inverted_repeat 100402 100403 . - . Parent=repeat_region2
seq13 LTRharvest inverted_repeat 101112 101113 . - . Parent=repeat_region2
seq13 LTRharvest target_site_duplication 101114 101118 . - . Parent=repeat_region2
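
To make the filtering idea concrete, here is a minimal Python sketch of what I have in mind. It is only a heuristic under several assumptions, not an established method. It assumes that the HMM names follow the DOMAIN_lineage pattern seen above, that the 6th column holds the e-value, and that all lines passed in belong to one element (grouping by element is left out). For each domain it keeps the hit with the lowest e-value, breaks ties (for example several hits reported as 0) by the longest match, and then counts how many domains each lineage "wins". The function name `lineage_votes` and the `max_evalue` cutoff of 1e-10 are my own arbitrary choices.

```python
import re
from collections import Counter, defaultdict

# Assumption: the attribute column carries name=DOMAIN_lineage, as in the output above.
NAME_RE = re.compile(r"name=([^;\s]+)")


def lineage_votes(gff_lines, max_evalue=1e-10):
    """Count, per lineage, the domains in which it gives the best (or joint-best) hit.

    gff_lines: GFF lines of ONE element (whitespace- or tab-separated).
    max_evalue: hypothetical cutoff; hits with a larger e-value are ignored.
    """
    hits = defaultdict(list)  # domain -> [(evalue, -length, lineage), ...]
    for line in gff_lines:
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "protein_match":
            continue
        match = NAME_RE.search(fields[8])
        if match is None or "_" not in match.group(1):
            continue
        # Split on the first underscore only, so "RNaseH_codi_II" -> ("RNaseH", "codi_II").
        domain, lineage = match.group(1).split("_", 1)
        evalue = float(fields[5])
        if evalue > max_evalue:
            continue
        length = int(fields[4]) - int(fields[3]) + 1
        hits[domain].append((evalue, -length, lineage))

    votes = Counter()
    for domain_hits in hits.values():
        # Lowest e-value first; among equal e-values, the longest match.
        best = min(domain_hits)[:2]
        for evalue, neg_length, lineage in domain_hits:
            if (evalue, neg_length) == best:
                votes[lineage] += 1  # joint-best hits each get a vote
    return votes


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for lineage, count in lineage_votes(handle).most_common():
            print(lineage, count)
```

On the element shown above, this heuristic would favor tork, which is the best or joint-best hit in all five domains (RNaseH, RT, INT, AP, GAG). I am not sure how reliable such a vote is when several lineages tie at an e-value of 0, which is part of my question.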