I have ChIP-seq (H3K9me3) and RNA-seq data that I wish to map to transposons. I have identified the transposons in my non-model species genome using RepeatModeler and RepeatMasker. The output of RepeatModeler confuses me though - I have, for example, multiple LTR/Gypsy families in the consensi.fa.classified file (annotated as rnd-x_family-x; where x is a number). What does this mean?
How can I obtain consensus sequences for LTRs or for specific families of LTRs? In the RepeatModeler consensi.fa.classified file I have sequences named rnd-x_family-x - what does this mean? Can I simply map to these sequences?
RepeatModeler finds the repeated sequences in the genome. It gives each family an identifier (your rnd-x_family-x) and compares it to Repbase. If the sequence has a hit, it is annotated with the classification of that hit.
The consensi.fa file already contains the consensus sequence of each family. RepeatModeler uses the copies of a family to build its consensus. Also, you can't get a single consensus for all LTRs, as they are too different from one another.
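To see which families belong to which class, you could group the headers of consensi.fa.classified. This is only a rough sketch: it assumes each header looks like `>rnd-1_family-23#LTR/Gypsy`, with optional extra text after a space, and the function name and the example records are made up for illustration.

```python
from collections import defaultdict


def group_families_by_class(fasta_lines):
    """Map each classification (e.g. 'LTR/Gypsy') to the family IDs carrying it."""
    groups = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split()
        if not parts:
            continue  # empty header, nothing to parse
        # Assumed header shape: family ID, '#', then the classification.
        family, _, repeat_class = parts[0].partition("#")
        groups[repeat_class or "Unknown"].append(family)
    return dict(groups)


# Hypothetical records; in practice pass the lines of an opened FASTA file.
example = [
    ">rnd-1_family-23#LTR/Gypsy ( hypothetical description )",
    "ACGTACGT",
    ">rnd-1_family-40#LTR/Gypsy",
    "TTGACA",
    ">rnd-4_family-112#LINE/L1",
    "GGCATT",
]
groups = group_families_by_class(example)
for repeat_class, families in groups.items():
    print(repeat_class, families)
```

Each LTR/Gypsy entry stays a separate family with its own consensus; the grouping only tells you which ones share a classification.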
I don't get this question.
I would not advise mapping directly to the consensus sequences. Instead, map the ChIP-seq reads to the genome, locate your transposable elements on the genome (with RepeatMasker, using consensi.fa as the library), and check whether any peaks fall on TEs.
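The last step, checking whether peaks fall on TEs, comes down to an interval overlap. Here is a minimal sketch, assuming peaks and repeat annotations have already been loaded as (chrom, start, end, name) tuples with half-open coordinates; the function name and all the coordinates are hypothetical. For real data a dedicated tool such as bedtools intersect will be much faster than this nested loop.

```python
def overlapping_peaks(peaks, repeats):
    """Return (peak_name, repeat_name) pairs whose intervals overlap.

    Both inputs are lists of (chrom, start, end, name) tuples using
    half-open coordinates: start is included, end is excluded.
    """
    # Index the repeats by chromosome so each peak is only compared
    # against repeats on the same sequence.
    by_chrom = {}
    for chrom, start, end, name in repeats:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = []
    for chrom, p_start, p_end, p_name in peaks:
        for r_start, r_end, r_name in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if p_start < r_end and r_start < p_end:
                hits.append((p_name, r_name))
    return hits


# Made-up coordinates for illustration only.
peaks = [
    ("scaffold_1", 100, 300, "peak_1"),
    ("scaffold_1", 5000, 5200, "peak_2"),
    ("scaffold_2", 50, 80, "peak_3"),
]
repeats = [
    ("scaffold_1", 250, 900, "rnd-1_family-23"),
    ("scaffold_2", 10, 60, "rnd-4_family-112"),
]
hits = overlapping_peaks(peaks, repeats)
print(hits)
```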
For example, say I want to compare H3K9me3 density between LTRs and LINEs. From the RepeatMasker GFF output, could I look at the total reads that map to annotated LTRs versus LINEs? These would be reads normalised to background (input).
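One possible way to express that normalisation is a log2 ratio of the ChIP read fraction to the input read fraction for each class. This is a sketch under assumptions, not an established pipeline: read counts per class are assumed to have been obtained already, and the pseudocount, the function name, and all the numbers are invented for illustration.

```python
import math


def log2_enrichment(chip_counts, input_counts, chip_total, input_total, pseudocount=1.0):
    """Log2 ratio of ChIP to input read fractions for each repeat class.

    chip_counts / input_counts: reads overlapping each class.
    chip_total / input_total: total mapped reads in each library,
    used to correct for sequencing depth.
    """
    result = {}
    for repeat_class, chip_reads in chip_counts.items():
        # The pseudocount avoids division by zero for classes with no input reads.
        chip_frac = (chip_reads + pseudocount) / chip_total
        input_frac = (input_counts.get(repeat_class, 0) + pseudocount) / input_total
        result[repeat_class] = math.log2(chip_frac / input_frac)
    return result


# Invented counts for illustration only.
chip_counts = {"LTR": 4000, "LINE": 1000}
input_counts = {"LTR": 1000, "LINE": 1000}
enrichment = log2_enrichment(
    chip_counts, input_counts, chip_total=1_000_000, input_total=1_000_000
)
for repeat_class, value in enrichment.items():
    print(repeat_class, round(value, 2))
```

A value above zero means the class holds a larger share of ChIP reads than of input reads. Reads that map to many repeat copies need a deliberate choice (discard, pick one location, or split the weight), and that choice will affect these counts.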
I am not a ChIP-seq specialist, but my personal approach would be to do the peak calling and then compare the peaks to the RepeatMasker GFF.