24 months ago
Simone
I'm trying to use bedtools to extract regions of a genome based on coordinates in an annotation file. I keep getting error messages that line 15 of my BED file has a start/end position where start > end:
Error: malformed GFF entry at line 15. Coordinate detected that is < 1. Exiting.
srun: error: cn68: task 0: Exited with exit code 1
but the file looks like this (line 15):
JAJFZI010040149.1 Complete 1671.3 0 4952 - 2371at8457 1421 https://www.orthodb.org/v10?query=2371at8457 Upstream transcription factor family member 3
BED files are zero-based, I believe, but bedtools is reading it as a GFF file. It has a .BED file extension, so I'm not sure why it's detecting a GFF...
It's not a GFF file, it's a TSV annotation file, output from BUSCO. The annotations are coordinates for BUSCO genes. I've tried using BEDOPS to convert it to a BED or GFF but it doesn't work.
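But it's being parsed as a GFF/GTF file, because bedtools sees that columns 4 and 5 are integers. GFF coordinates are 1-based, so the start of 0 on line 15 triggers the "Coordinate detected that is < 1" error. Move columns 4 and 5 to columns 2 and 3 to create a BED file.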
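For illustration, here is a minimal Python sketch of that column swap. It assumes the file is tab-separated and laid out like the line quoted above (sequence, status, score, start, end, strand, ...), and that the start coordinates are already zero-based, which the 0 on line 15 suggests but does not prove. The function name `to_bed_line` is made up for this example; it is not part of bedtools or BUSCO.

```python
def to_bed_line(line: str) -> str:
    """Move columns 4 and 5 (start, end) to columns 2 and 3.

    Assumes a tab-separated line with at least five columns, laid out
    like the example in this thread. All other columns keep their
    relative order, so the strand stays in column 6.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    # sequence, start, end, then the two columns that were in between,
    # then everything after the original end column
    reordered = [fields[0], fields[3], fields[4], fields[1], fields[2]] + fields[5:]
    return "\t".join(reordered)


# Line 15 from the question, rebuilt with explicit tabs
example = "\t".join([
    "JAJFZI010040149.1", "Complete", "1671.3", "0", "4952", "-",
    "2371at8457", "1421",
    "https://www.orthodb.org/v10?query=2371at8457",
    "Upstream transcription factor family member 3",
])
print(to_bed_line(example))
```

After the swap, columns 1 to 3 hold sequence, start and end, which is the minimum a BED file needs. Header or comment lines, and rows without coordinates, would have to be filtered out before calling the function.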
Ah I see, thank you so much for your help! I'll use an awk one-liner to switch the coordinate columns to 2 and 3. Apparently the newest version of BUSCO (v5.4) outputs a GFF file now (probably for this reason), but all of my genomes were run on v5.2 so I'm having to work around it.