bedtools getfasta error: Error: malformed GFF entry at line 15. Coordinate detected that is < 1
23 months ago
Simone ▴ 10

I'm trying to use bedtools getfasta to extract regions of a genome based on coordinates in an annotation file. I keep getting an error message saying that line 15 of my BED file is malformed:

Error: malformed GFF entry at line 15. Coordinate detected that is < 1. Exiting.
srun: error: cn68: task 0: Exited with exit code 1

but the file looks like this (line 15):

JAJFZI010040149.1   Complete    1671.3  0   4952    -   2371at8457  1421    https://www.orthodb.org/v10?query=2371at8457    Upstream transcription factor family member 3

BED files are zero-based, I believe, but bedtools is reading it as a GFF file. The file has a .BED extension, so I'm not sure why it's being detected as GFF...

BED GFF bedtools • 1.2k views
23 months ago

Your file IS a GTF/GFF file, whatever its extension. Most Unix programs don't care about the suffix. GTF/GFF coordinates are one-based, so a GTF file cannot have a start coordinate (4th column) of 0.


It's not a GFF file, it's a TSV annotation file, output from BUSCO. The annotations are coordinates for BUSCO genes. I've tried using BEDOPS to convert it to a BED or GFF but it doesn't work.


But it's interpreted AS a GFF/GTF file because bedtools sees that columns 4 and 5 are integers. Move columns 4 and 5 to columns 2 and 3 to create a BED file.
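
If it helps, the column move can be sketched in Python. This is only a rough sketch, and it assumes things the thread doesn't confirm: that the file is tab-separated, that the columns follow the order of line 15 above (sequence, status, score, start, end, strand, BUSCO ID, ...), and that the start coordinate is already zero-based as the BED format expects. The function names `busco_row_to_bed` and `convert` are made up for illustration.

```python
def busco_row_to_bed(fields):
    """Turn one split TSV row into a BED6 row, or None if it has no usable coordinates.

    Assumed column order (taken from the example line in the question):
    sequence, status, score, start, end, strand, BUSCO ID, ...
    """
    if len(fields) < 7:
        return None  # e.g. rows without coordinates
    seq, _status, score, start, end, strand, name = fields[:7]
    if not (start.isdigit() and end.isdigit()):
        return None  # coordinates must be non-negative integers
    if int(start) > int(end):
        start, end = end, start  # guard: BED needs start <= end
    # BED6 order: chrom, start, end, name, score, strand.
    # The score is copied as-is; replace it if a downstream tool needs an integer.
    return [seq, start, end, name, score, strand]


def convert(lines):
    """Convert an iterable of TSV lines to a list of tab-joined BED6 lines."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        bed = busco_row_to_bed(line.split("\t"))
        if bed is not None:
            out.append("\t".join(bed))
    return out
```

Calling `convert()` on the open TSV file and writing the returned lines to a new file should give something bedtools reads as BED, since column 2 and 3 are now the integer coordinates. Whether the BUSCO start values really are zero-based is worth checking against a known gene before trusting the extracted sequences.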


Ah I see, thank you so much for your help! I'll use an awk one-liner to switch the coordinate columns to 2 and 3. Apparently the newest version of BUSCO (v5.4) outputs a GFF file now (probably for this reason), but all of my genomes were run on v5.2 so I'm having to work around it.
