Hello,
I am trying to convert a GTF file to a BED file.
gtf2bed < Homo_sapiens.GRCh38.109.gene.gtf > Homo_sapiens.GRCh38.109.gene.bed
I get the following error:
Warning: If your Wiggle data is a significant portion of available system memory, use the --max-mem and --sort-tmpdir options, or use --do-not-sort to disable post-conversion sorting. See --help for more information.
Warning: Potentially missing gene or transcript ID from GTF attributes (malformed GTF at line [1]?)
My original gtf file looks like this:
However, my bed file looks like this:
I was expecting something along these lines -
7 127588344 127588498 ENST00000000233 0 +
11 64305577 64305736 ENST00000000442 0 +
11 64307167 64307179 ENST00000000442 0 +
12 2794952 2795139 ENST00000001008 0 +
2 37231665 37231705 ENST00000002125 0 +
I am having the same issue with the original Ensembl 109 GTF file: the same errors and no clean BED file.
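The second warning suggests that some records lack a gene_id or transcript_id attribute. One plausible cause, offered here as an assumption rather than a confirmed diagnosis, is that gene-level records in a gene-only GTF carry no transcript_id. Below is a minimal Python sketch to check which records are affected. The function name find_missing_ids and the example records are hypothetical and not taken from the thread.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def find_missing_ids(lines, required=("gene_id", "transcript_id")):
    """Return (line_number, missing_keys) for records lacking a required attribute."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9:
            problems.append((number, ["<fewer than 9 columns>"]))
            continue
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        missing = [key for key in required if key not in keys]
        if missing:
            problems.append((number, missing))
    return problems

# Illustrative records (made up for this example).
example = [
    '#!genome-build GRCh38\n',
    '1\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "A";\n',
    '1\thavana\ttranscript\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
print(find_missing_ids(example))
```

To run it on a real file, pass an open file handle, for example find_missing_ids(open("Homo_sapiens.GRCh38.109.gene.gtf")).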
Can you give AGAT a try: https://agat.readthedocs.io/en/latest/gff_to_bed.html
The examples from the tool you use are correct; it is BED12 format. Just use cut to limit the output to the first 6 columns if you don't need the rest.
Hmm, be careful: actually no, it is not a BED12 format! The format created by gff2bed from BEDOPS is a peculiar, fanciful format. It does not follow the BED specification. Up to the 6th column it could be considered BED, but beyond that column it becomes something original.
Refer to the documentation for the correct answer: https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/gtf2bed.html#column-mapping
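Keeping only the first six columns, which is what cut -f1-6 does on tab-delimited input, can be sketched in Python as follows. The helper name first_six_columns and the example row are hypothetical; the row only imitates the general shape of a converted record and is not real gtf2bed output.

```python
def first_six_columns(lines):
    """Yield each tab-delimited line truncated to its first six fields."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[:6])

# Illustrative row (made up for this example).
example = ['chr1\t99\t200\tG1\t.\t+\thavana\tgene\t.\tgene_id "G1";\n']
print(list(first_six_columns(example)))
```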
Although I gave the example of a gene gtf, what I actually want is the transcript ID.
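If the goal is a plain six-column BED with the transcript ID in column 4, one option is to build it directly from the GTF. The sketch below is only an illustration under stated assumptions: it keeps one feature type (exon by default, which is a guess based on the expected output showing several rows per transcript), writes 0 as the score, and converts the 1-based inclusive GTF start to the 0-based BED start. The function name gtf_to_bed6 is hypothetical.

```python
import re

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')

def gtf_to_bed6(lines, feature="exon"):
    """Convert GTF records of one feature type to BED6 rows keyed by transcript ID."""
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue  # skip malformed records and other feature types
        match = TRANSCRIPT_RE.search(fields[8])
        if match is None:
            continue  # e.g. gene-level records without a transcript_id
        start = int(fields[3]) - 1  # GTF is 1-based inclusive, BED is 0-based half-open
        end = fields[4]             # the end coordinate is unchanged
        rows.append("\t".join([fields[0], str(start), end, match.group(1), "0", fields[6]]))
    return rows

# Illustrative records (made up for this example).
example = [
    '7\thavana\tgene\t101\t250\t.\t+\t.\tgene_id "G1";\n',
    '7\thavana\texon\t101\t250\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
print(gtf_to_bed6(example))
```

Note that this requires a GTF that still contains transcript-level or exon-level records; a gene-only file has no transcript IDs to extract.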
Even if no official specification exists for BED, my view was to use the definition from whoever created the format. As far as I know, that was UCSC in the early 2000s (1998?). So to me, what BEDOPS is doing is misleading. At least they have nice documentation about how they do their own "bed".
As they say in the BEDOPS publication, "BEDOPS supports a relaxed variation of the BED specification". They could have created their own format; perhaps it would have been less misleading for users.
Um, there is a specification for BED: https://github.com/samtools/hts-specs/blob/master/BEDv1.pdf
And the BEDOPS documentation does refer to UCSC columns. I know this not only because it is directly from the link I provided above, but because I was the person who wrote said documentation.
Thank you for the link, this is really nice. I don't know if there is a specific way to make a specification official (written in a publication, advertised by a big group as for GTF2.2, managed by a consortium as for GFF3, or held under the umbrella of a tool that stands out in the field, as you show for BED), but as long as it is well described, findable, etc. (FAIR), it is good (for me). Thank you for this work. I like the way flexibility is built into the format via BEDn+m!
To come back to BEDOPS, and following your recent specification (from what I see it is from ~2021), gtf2bed gives a BED6+5 format as output.
P.S.: We have to keep in mind that a BED12 is different from a BED6+6 :)
I wrote this documentation a long time ago. Whatever, you're wrong and being unpleasant about it. Have a nice day.
Sorry, I did not mean to be unpleasant, really. I will read your spec more carefully.