I just ran repseek on a roughly 100 kb BAC sequence and got this output.
Distant.dir 27938 36273 1434 1433 6901 28288-36623-148-2.00 94.491 1240.45 2.00 2 1.00
Distant.dir 47964 55552 2765 2771 4823 48291-55879-127-2.00 97.367 2483.76 2.00 2 1.00
DAWGPAWS includes a script that converts repseek output to GFF3 format, so I ran it and got the following.
$ ./cnv_repseek2gff.pl < repseek.out
Expecting input from STDIN
seq repseek direct_repeat 27938 29372 1240.45 + . repseek1:dir
seq repseek direct_repeat 36273 37706 1240.45 + . repseek1:dir
seq repseek direct_repeat 47964 50729 2483.76 + . repseek2:dir
seq repseek direct_repeat 55552 58323 2483.76 + . repseek2:dir
OK, this format is much more familiar. However, the last column (attributes) is not valid GFF3, which expects semicolon-separated tag=value pairs there. The script also creates two lines of GFF3 for each line in the repseek output. How are these pairs of features related, and what is the proper way to represent that relationship in GFF3?
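To make the pairing concrete, here is a minimal Python sketch of one possible encoding. It is not the DAWGPAWS script and not necessarily the recommended answer. Judging only from the output above, each repseek line seems to describe two copies of one repeat: columns 2 and 3 are the start positions, columns 4 and 5 are the copy lengths, and column 9 is the score, with each end written as start + length. The function name `repseek_to_gff3`, the default `seqid`, and the choice to give both copies the same `ID` are my assumptions. The GFF3 specification does allow a feature that exists at several locations to be written on multiple lines sharing one ID.

```python
def repseek_to_gff3(lines, seqid="seq"):
    """Yield two tab-separated GFF3 lines (one per copy) for each repseek hit.

    Hypothetical helper: the column meanings are inferred from the output
    shown in this post, not taken from the repseek documentation.
    """
    hit = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or unrelated lines
        hit += 1
        kind = fields[0].split(".")[-1]  # "dir" in the examples above
        start1, start2, len1, len2 = (int(x) for x in fields[1:5])
        score = fields[8]  # kept as text to preserve the original formatting
        # Assumption: both copies share one ID so a parser can pair them.
        attrs = f"ID=repseek{hit};Name=repseek{hit}:{kind}"
        for start, length in ((start1, len1), (start2, len2)):
            # end = start + length reproduces the coordinates shown above
            yield "\t".join([seqid, "repseek", "direct_repeat", str(start),
                             str(start + length), score, "+", ".", attrs])


# The two repseek lines from this post, used as sample input.
SAMPLE = """\
Distant.dir 27938 36273 1434 1433 6901 28288-36623-148-2.00 94.491 1240.45 2.00 2 1.00
Distant.dir 47964 55552 2765 2771 4823 48291-55879-127-2.00 97.367 2483.76 2.00 2 1.00
"""

print("##gff-version 3")
for row in repseek_to_gff3(SAMPLE.splitlines()):
    print(row)
```

Only direct repeats appear in the post, so the sketch hard-codes the `direct_repeat` type and the `+` strand. Other repseek hit types would need their own handling.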
That's a nice way to indicate the relationship, given what's available in GFF.