Given a BED file (BED12), what is the fastest tool (or available tools) to convert it to GTF or GFF3?
This is probably a duplicate of the following questions:
How To Convert Bed Format To Gtf?
How to convert original BED file to a GTF ?
Converting different annotation file formats (GTF/GFF/BED) to each other
How to change scaffold.fasta file or scaffold.bed file to GTF file?
Converting from BED to SAF/GFF
However, 1) all of them are outdated, 2) none produces a complete GTF/GFF file (with the gene_id attribute), 3) none includes a benchmark, and 4) none provides an ordered list of options. Here is such a list:
A high-performance BED-to-GTF converter written in Rust from https://github.com/alejandrogzi/bed2gtf.
Usage: bed2gtf[EXE] --bed/-b <BED> --isoforms/-i <ISOFORMS> --output/-o <OUTPUT> --threads/-t <THREADS>
where:
--bed <BED>: a .bed file
--isoforms <ISOFORMS>: a tab-delimited file
--output <OUTPUT>: path to output file (*.gtf)
The isoforms file specification:
a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):
> cat isoforms.txt
ENSG00000198888 ENST00000361390
ENSG00000198763 ENST00000361453
ENSG00000198804 ENST00000361624
ENSG00000188868 ENST00000595977
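Since every transcript in the .bed file has to appear in the isoforms file, it can help to check the two files against each other before converting. The following is a minimal Python sketch, not part of bed2gtf. It assumes a tab-delimited BED file with the transcript name in column 4, and an isoforms file with the gene in column 1 and the transcript in column 2, as in the example above. The function name `missing_transcripts` is hypothetical.

```python
from pathlib import Path


def missing_transcripts(bed_path, isoforms_path):
    """Return BED transcript names (column 4) that are absent from the isoforms file."""
    known = set()
    for line in Path(isoforms_path).read_text().splitlines():
        fields = line.split()
        # Assumed layout: gene in column 1, transcript in column 2.
        if len(fields) >= 2:
            known.add(fields[1])

    missing = []
    for line in Path(bed_path).read_text().splitlines():
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        name = line.split("\t")[3]
        if name not in known:
            missing.append(name)
    return missing
```

An empty list means every transcript in the BED file has a gene assigned; any names returned would need to be added to the isoforms file first.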
A Rust BED-to-GFF3 translator that runs in parallel from https://github.com/alejandrogzi/bed2gff.
Usage: bed2gff[EXE] --bed/-b <BED> --isoforms/-i <ISOFORMS> --output/-o <OUTPUT> --threads/-t <THREADS>
where:
--bed <BED>: a .bed file
--isoforms <ISOFORMS>: a tab-delimited file
--output <OUTPUT>: path to output file (*.gff)
The isoforms file specification:
a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):
> cat isoforms.txt
ENSG00000198888 ENST00000361390
ENSG00000198763 ENST00000361453
ENSG00000198804 ENST00000361624
ENSG00000188868 ENST00000595977
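Any BED12 converter has to translate the BED blocks into exon features. As an illustration of that coordinate step only (this is not the code of bed2gtf or bed2gff), the sketch below turns one tab-delimited BED12 line into 1-based, inclusive exon intervals, the convention used by both GTF and GFF3. The function name `bed12_exons` is hypothetical.

```python
def bed12_exons(line):
    """Return (chrom, strand, name, exons) with exons as 1-based inclusive (start, end)."""
    f = line.rstrip("\n").split("\t")
    chrom, chrom_start, name, strand = f[0], int(f[1]), f[3], f[5]
    # blockSizes (column 11) and blockStarts (column 12) are comma-separated
    # and may carry a trailing comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    exons = []
    for size, offset in zip(sizes, starts):
        start0 = chrom_start + offset  # 0-based start, BED convention
        exons.append((start0 + 1, start0 + size))  # 1-based inclusive interval
    return chrom, strand, name, exons
```

BED intervals are 0-based and half-open, so only the start shifts by one; the end coordinate stays the same number.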
UCSC offers a fast way to convert BED into GTF files through KentUtils or specific binaries using:
bedToGenePred in.bed /dev/stdout | genePredToGtf file /dev/stdin out.gtf
You can install these tools with bioconda, or download them here. The gene_id attribute is only obtained when using refTables (a format specified in UCSC's web browser); for a more elaborate answer, see Obtaining Ucsc Tables Via Ftp And Converting Them To Proper Gff3 Via Genepredtogtf?.
Other scripts/tools that do NOT produce a complete GTF file (they lack gene_id attributes) are:
kscript from https://github.com/holgerbrandl/kscript:
kscript https://git.io/vbJ4B my.bed > my.gtf
bed2gtf (Python) from https://github.com/pfurio/bed2gtf:
python bed2gtf [options] <mandatory>
Considering only the options that produce gene_id attributes, bed2gtf and bed2gff are ~3-4 seconds faster than UCSC's C binaries. More detailed instructions for these tools are available in the linked sources.
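To reproduce a timing comparison like this on your own data, a small harness along these lines could be used. It is a sketch, not the benchmark behind the numbers above: the function name `time_command` and the file names in the comments are hypothetical, and the command lines are the ones listed in this answer.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run a shell command several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, shell=True, check=True)
        best = min(best, time.perf_counter() - start)
    return best


# Example calls (file names are placeholders):
# time_command("bed2gtf --bed in.bed --isoforms isoforms.txt --output out.gtf --threads 4")
# time_command("bedToGenePred in.bed /dev/stdout | genePredToGtf file /dev/stdin out.gtf")
```

Taking the best of several runs reduces the effect of disk caching and other background noise on a difference of only a few seconds.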
thanks for making these great tools