Hello All,
I have a bed file containing annotations for transposable elements that were generated from an old genome assembly. However, I have a new assembly which I believe to be superior and would like to use the old annotations to obtain TE coordinates from the new assembly.
What I tried:
Extracted the TE sequences from the old genome as FASTA, then mapped them to the new genome with bowtie2 and kept the aligned sequences (--al).
The issue is that the resulting FASTA contains the old contig names rather than the contig names of the new genome.
Is there a different way to go about this to arrive at my goal without beginning annotation with repeatmasker from scratch?
I tried RepeatMasker earlier, but the output didn't look right to me, and I'd rather just fetch the TEs from the new genome.
Thanks in advance.
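A note on the contig names: the file written by --al is just the input sequences that aligned, so it keeps their original (old) names. The new-assembly coordinates live in the SAM output, where the reference-name column refers to the new contigs. Below is a minimal sketch, not a tested pipeline, of turning SAM records into BED intervals on the new assembly. The function name `sam_to_bed` is made up for illustration, and it assumes each TE sequence aligns as one contiguous hit, which may not hold for long or fragmented elements.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume positions on the reference (the new assembly).
REF_CONSUMING = {"M", "D", "N", "=", "X"}


def sam_to_bed(sam_lines):
    """Yield (contig, start, end, name, mapq, strand) for each mapped SAM record.

    Coordinates are converted from SAM (1-based) to BED (0-based, half-open).
    """
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag, contig = fields[0], int(fields[1]), fields[2]
        pos, mapq, cigar = int(fields[3]), fields[4], fields[5]
        if flag & 4 or contig == "*" or cigar == "*":  # unmapped
            continue
        ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                      if op in REF_CONSUMING)
        start = pos - 1
        strand = "-" if flag & 16 else "+"
        yield (contig, start, start + ref_len, name, mapq, strand)
```

Each tuple can be written out tab-separated as one BED line; the first field is the new contig name taken from the alignment, not from the input FASTA.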
Why would you not start from scratch with repeatmasker? It will be the most accurate way to mask repeats in your genome.
Simply mapping the 'old' ones onto the new assembly will almost certainly be sub-optimal (you will likely miss TEs that were not yet present in the old assembly). You can reuse the TE library you have from the previous assembly, so there is no need to rebuild the library itself.
Thanks for your input. I did try to repeat-mask the new genome, but the resulting output appears very shallow, with only a few hits. I will try to create a library from the annotated TEs; hopefully that will give better output.
How did you do the repeat masking then? Which library did you use?
I used a public library of TEs from different plants and animals. It was a lib that was already available on my work computer so I am not sure exactly how it was obtained. Anyway I am running repeat masker right now with the annotated TEs from the old genome as lib.
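For anyone wanting to build such a library from the old annotations, here is a rough sketch of pulling the annotated intervals out of the old genome as FASTA records, which could then be passed to RepeatMasker with its -lib option. The helper names (`read_fasta`, `revcomp`, `bed_to_library`) and the record naming scheme are assumptions for illustration only. Note that a library made of every individual TE copy is redundant, so clustering it first may be worthwhile.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep only the ID before any whitespace
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def bed_to_library(bed_lines, genome):
    """Yield one FASTA record per BED interval (0-based, half-open)."""
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        contig, start, end = f[0], int(f[1]), int(f[2])
        name = f[3] if len(f) > 3 else "TE"
        strand = f[5] if len(f) > 5 else "+"
        seq = genome[contig][start:end]
        if strand == "-":
            seq = revcomp(seq)
        # Hypothetical naming scheme: TE name plus its old coordinates.
        yield f">{name}_{contig}_{start}_{end}\n{seq}\n"
```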
Align your assemblies to each other using an aligner like lastz (LINK). You could also use blat if you are sure the assemblies are very similar. Once you find the corresponding hits, transfer your annotations. If you can transform the BED to GFF (Galaxy has tools for that), you could also use Liftoff, which wraps all of this in a pipeline: https://github.com/agshumate/Liftoff
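If you would rather do the BED-to-GFF conversion yourself, a minimal sketch could look like the following. The function name `bed_to_gff3`, the source label, the default feature type, and the `TE<n>` ID scheme are all placeholders I picked for illustration, not anything Liftoff prescribes. The main thing to get right is the coordinate shift: BED starts are 0-based, GFF3 starts are 1-based.

```python
from urllib.parse import quote


def bed_to_gff3(bed_lines, feature_type="mobile_genetic_element",
                source="old_assembly"):
    """Yield GFF3 lines for BED intervals, giving each feature a unique ID."""
    yield "##gff-version 3"
    n = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        n += 1
        contig, start, end = f[0], int(f[1]), int(f[2])
        name = f[3] if len(f) > 3 and f[3] else f"TE{n}"
        score = f[4] if len(f) > 4 else "."
        strand = f[5] if len(f) > 5 and f[5] in ("+", "-") else "."
        # Percent-encode characters that are reserved in GFF3 attributes.
        attributes = f"ID=TE{n};Name={quote(name)}"
        yield "\t".join([contig, source, feature_type,
                         str(start + 1), str(end),  # 0-based -> 1-based start
                         score, strand, ".", attributes])
```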
Hey Phillip. I tried to use Liftoff since RepeatMasker takes forever (although it's currently running). However, the run terminated with an error saying:
See my command below and let me know if I am missing something. I followed the Liftoff documentation.
Command
I assumed what it wants is the types of TEs so I provided it with a file containing a list with the option -f TYPES (file name)
but I still get the same error.
What does your GFF file look like? It needs to follow this format.
I believe that is the required format.
These include
You don't have any that match this list. You may want to try mobile_genetic_element.
Many thanks. I changed every entry in the third column to mobile_genetic_element and provided this type in a file (TYPES) with the -f option. It is currently running.
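For reference, the column-three change described here can be sketched as below: one helper to see which feature types the GFF currently contains, and one to overwrite them with a single type. The names `feature_types` and `retype` are made up for this example. The types file for -f would then be a plain text file listing that one type.

```python
from collections import Counter


def feature_types(gff_lines):
    """Count the values found in column 3 (feature type) of a GFF file."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) >= 9:
            counts[f[2]] += 1
    return counts


def retype(gff_lines, new_type="mobile_genetic_element"):
    """Yield GFF lines with column 3 replaced; other lines pass through unchanged."""
    for line in gff_lines:
        f = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(f) < 9:
            yield line.rstrip("\n")
        else:
            f[2] = new_type
            yield "\t".join(f)
```

Running `feature_types` before and after is a quick way to confirm that every feature now carries the type listed in the file.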
Thanks again.
So I thought that at the end of the run I'd get the new TEs, but the run ended with the following lines:
It generated a bunch of files, and I am not sure which one contains the annotated TEs. Can you help?
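Once you have identified the lifted-over GFF3 (as far as I know this is the file named with Liftoff's -o option, or standard output if none was given, but please check the documentation for your version), getting back to BED coordinates on the new assembly is a small conversion. The sketch below is only an illustration; `gff3_to_bed` is a made-up name, and it assumes each feature has a Name or ID attribute.

```python
def gff3_to_bed(gff_lines, feature_type=None):
    """Yield BED6 lines from GFF3 features, optionally keeping one feature type."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9:
            continue
        if feature_type is not None and f[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        name = attrs.get("Name") or attrs.get("ID") or "."
        score = f[5] if f[5] != "." else "0"
        # GFF3 starts are 1-based; BED starts are 0-based.
        yield "\t".join([f[0], str(int(f[3]) - 1), f[4], name, score, f[6]])
```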