Hello, biostars!
I'm trying to get a .gff file from Tandem Repeats Finder (TRF) output.
Since TRF can't do that itself, I found the TRAP tool, which can create .gff files. However, TRAP creates one .gff file per contig (OK, they can be joined with the 'cat' command).
The main problem is the content of the .gff lines:
. TRF satellite 1 72 144 + . note "satellite sequence" "TRF parameters 2 7 7 80 10 50 2000" "repeat unit size = 2" "copy number = 36.0" "predicted by Tandem Repeats Finder 4.07b" ; label "satellite" ; rpt_type "tandem" ; rpt_unit "TA" ; color 9
. TRF satellite 452 512 69 + . note "satellite sequence" "TRF parameters 2 7 7 80 10 50 2000" "repeat unit size = 14" "copy number = 4.7" "predicted by Tandem Repeats Finder 4.07b" ; label "satellite" ; rpt_type "tandem" ; rpt_unit "TTCTCCATTAATTA" ; color 9
. TRF satellite 453 498 74 + . note "satellite sequence" "TRF parameters 2 7 7 80 10 50 2000" "repeat unit size = 23" "copy number = 2.0" "predicted by Tandem Repeats Finder 4.07b" ; label "satellite" ; rpt_type "tandem" ; rpt_unit "TCTCCATTAATAATTCTCCATTA" ; color 9
Instead of the sequence name, the first column contains only '.' (followed by the source, TRF), so it is impossible to sort this .gff file by sequence.
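One possible workaround is to patch the missing sequence name before concatenating. The Python sketch below assumes, hypothetically, that each TRAP output file is named after its contig (e.g. `contig_1.gff`); if TRAP names its files differently, the line that sets `seqid` has to be adapted. It fills column 1 with that name, then sorts all records by sequence name and start coordinate so the merged file can be sorted and intersected like any other .gff file. The function names are illustrative only.

```python
import sys
from pathlib import Path


def fix_lines(seqid, lines):
    """Replace the '.' in column 1 with seqid; return lists of 9 GFF fields."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # Split on any whitespace (tabs or spaces), at most 8 times, so the
        # free-text attribute column (column 9) stays in one piece.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # not a feature line
        fields[0] = seqid
        rows.append(fields)
    return rows


def merge_trap_files(paths):
    """Fix every per-contig file, then sort all records by (seqid, start)."""
    rows = []
    for path in paths:
        # Hypothetical convention: the file name stem is the contig name.
        seqid = Path(path).stem
        with open(path) as handle:
            rows.extend(fix_lines(seqid, handle))
    rows.sort(key=lambda r: (r[0], int(r[3])))
    return rows


if __name__ == "__main__":
    for row in merge_trap_files(sys.argv[1:]):
        print("\t".join(row))
```

Usage would be something like `python fix_trap_gff.py trap_out/*.gff > trf_merged.gff`, where the script name and directory are placeholders. The output is tab-separated, so the attribute column keeps its internal spaces.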
Is there a tool that produces a 'normal' .gff file, or a script that generates one from TRF output?
PS: RepeatMasker produces .gff files, but my aim is to combine the output of several tools, so I'm going to intersect two or three .gff files.
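Alternatively, the TRAP step could be skipped by converting TRF's own table output. The sketch below assumes TRF was run with the `-d` flag, which writes a `.dat` file in which each `Sequence:` line names a sequence and each repeat line has 15 whitespace-separated columns (start, end, period, copy number, consensus size, % matches, % indels, score, A, C, G, T, entropy, consensus, repeat sequence). The function name, the `tandem_repeat` feature type and the attribute keys are illustrative choices, not a format defined by TRF.

```python
import sys


def trf_dat_to_gff(lines):
    """Convert TRF .dat lines (written with `-d`) into GFF3 lines."""
    out = ["##gff-version 3"]
    seqid = None
    n = 0
    for line in lines:
        line = line.strip()
        if line.startswith("Sequence:"):
            # e.g. "Sequence: contig_1 some description" -> "contig_1"
            parts = line.split()
            seqid = parts[1] if len(parts) > 1 else None
            continue
        fields = line.split()
        # Repeat records have 15 columns and start with two integers;
        # the program header and "Parameters:" lines do not.
        if (seqid is None or len(fields) != 15
                or not fields[0].isdigit() or not fields[1].isdigit()):
            continue
        start, end, period, copies = fields[0], fields[1], fields[2], fields[3]
        score, consensus = fields[7], fields[13]
        n += 1
        attrs = f"ID=TRF_{n};period={period};copies={copies};consensus={consensus}"
        out.append("\t".join(
            [seqid, "TRF", "tandem_repeat", start, end, score, "+", ".", attrs]))
    return out


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        print("\n".join(trf_dat_to_gff(handle)))
```

Because every record carries the real sequence name in column 1, the result can be sorted and intersected with the RepeatMasker .gff like any other GFF file. If TRF is run in a different output mode (e.g. `-ngs`), the column layout differs and this parser would need changes.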
Thank you! Very useful, I'll try it.
Excuse me, I have run into the same problem. Where can I find your Python code? Thank you!