I have a multi-FASTA file and want to extract the gene coding regions with Glimmer's multi-extract. I have already run the glimmer3 script and got two files: a .predict and a .detail. But when I try to run multi-extract, it just gives me an error. Its usage message asks for this:
USAGE: multi-extract [options] <sequence-file> <coords>
Read multi-fasta-format <sequence-file> and extract from it the
subsequences specified by <coords>. By default, <coords>
is the name of a file containing lines of the form
<id> <tag> <start> <stop> [<frame>] ...
<id> is the identifier for the subsequence
<tag> is the tag of the sequence in <sequence-file> from which
to extract the entry
The glimmer3 package itself doesn't say where the <coords> file is supposed to come from, so I assume it is the .predict file. (A BioLinux website suggested that the long-orfs output would do, but long-orfs doesn't seem to work with multi-FASTA input: it only extracts the ORFs from the first contig in my file.) The problem is that the .predict file doesn't have the right structure. For a start, it doesn't even include an <id> column. It looks something like this:
>contig-7
orf00002 1741 461
orf00003 3381 1747
>Wcontig-7000023
>Wcontig-11112
orf00001 426 2648
orf00002 2710 4581
orf00003 4569 5480
orf00004 6990 6133
orf00006 9180 7108
orf00007 10201 9209
orf00008 11663 10203
orf00009 12489 11680
orf00010 13153 12473
orf00011 14382 13225
orf00013 14715 15968
orf00014 19868 16410
>Wcontig-1674000002
orf00001 2995 637
orf00002 2497 1166
orf00003 2984 2529
Does anybody know whether I'm doing something terribly wrong, or do I have to preprocess the file so that it meets multi-extract's expected format?
Hi, have you solved this problem? I ran into the same one.
What did you do next?
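One possible way forward, as a minimal sketch rather than an official Glimmer recipe: the .predict file shown above groups ORF lines under a ">contig" header, while the usage text asks for one "<id> <tag> <start> <stop>" line per ORF. The Python below carries the most recent header down onto each ORF line as the <tag>. The function name predict_to_coords is made up for illustration, and prefixing the ORF name with the contig name (so that orf00001 from different contigs stays distinguishable) is my own assumption, not something multi-extract is documented to require. I have not checked this against every glimmer3 version.

```python
def predict_to_coords(predict_lines):
    """Reshape glimmer3 .predict lines into '<id> <tag> <start> <stop>' lines.

    Hypothetical helper: assumes header lines start with '>' and ORF lines
    begin with an ORF name followed by start and stop coordinates.
    """
    coords = []
    tag = None  # name of the contig the current ORF lines belong to
    for line in predict_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word after '>' as the sequence tag.
            header = line[1:].split()
            tag = header[0] if header else None
            continue
        fields = line.split()
        if tag is None or len(fields) < 3:
            continue  # skip anything that is not a recognisable ORF line
        orf_id, start, stop = fields[:3]
        # Assumption: make the id unique by prefixing it with the contig tag.
        coords.append(f"{tag}_{orf_id} {tag} {start} {stop}")
    return coords


# Small demonstration using lines from the .predict excerpt in the question.
sample = """>contig-7
orf00002 1741 461
orf00003 3381 1747
>Wcontig-7000023
>Wcontig-11112
orf00001 426 2648
""".splitlines()

for row in predict_to_coords(sample):
    print(row)
```

Contigs with no predicted ORFs (like Wcontig-7000023 above) simply produce no lines. To use it on a real file, you could open the .predict file, pass the file handle to predict_to_coords, write the returned lines to a new file, and give that file to multi-extract as the <coords> argument.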