I downloaded the hg38 GTF file from UCSC and converted it to BED format using convert2bed:
convert2bed --input=gtf --output=bed --do-not-sort < hg38.ncbiRefSeq.gtf > hg38.ncbiRefSeq.bed
Then I sorted hg38.ncbiRefSeq.bed with sort-bed.
Here are the first three lines of my sorted hg38 BED file:
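convert2bed sorts its output with sort-bed unless `--do-not-sort` is passed, so the conversion and the sort could likely be done in one step. A minimal sketch, where the helper name `gtf_to_sorted_bed` is hypothetical and the output file name simply reuses the one from the bedmap command further down:

```shell
# Hypothetical helper: convert a GTF file to BED and let convert2bed apply
# its default sort-bed ordering (no --do-not-sort flag is passed).
gtf_to_sorted_bed() {
    convert2bed --input=gtf --output=bed < "$1"
}

# Usage:
# gtf_to_sorted_bed hg38.ncbiRefSeq.gtf > hg38.ncbiRefSeq_sort.bed
```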
chr1 11873 12227 DDX11L1 . + ncbiRefSeq exon . gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "1"; exon_id "NR_046018.2.1"; gene_name "DDX11L1";
chr1 11873 14409 DDX11L1 . + ncbiRefSeq transcript . gene_id "DDX11L1"; transcript_id "NR_046018.2"; gene_name "DDX11L1";
chr1 12612 12721 DDX11L1 . + ncbiRefSeq exon . gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "2"; exon_id "NR_046018.2.2"; gene_name "DDX11L1";
Is this the expected format from the convert2bed tool?
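If only gene names are needed, the annotation file can be reduced to one row per transcript before mapping. A minimal sketch, assuming (as in the three lines above) that column 8 holds the GTF feature type and column 4 the gene_id value, and that the file is tab-delimited; the helper name `transcripts_only` and the output file name are hypothetical:

```shell
# Hypothetical helper: keep only transcript-level rows and reduce them to
# chrom, start, end and name (column 4). Column 8 is the GTF feature type.
# Filtering does not reorder lines, so sorted input stays sorted.
transcripts_only() {
    awk 'BEGIN { FS = OFS = "\t" } $8 == "transcript" { print $1, $2, $3, $4 }' "$@"
}

# Usage:
# transcripts_only hg38.ncbiRefSeq_sort.bed > hg38.ncbiRefSeq_transcripts.bed
```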
The BED file I want to annotate with gene names has 100 rows and looks like this:
chr1 233914000 233915000 35
chr1 181582000 181583000 42
chr1 215193000 215194000 51
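bedmap expects both of its inputs to be in sort-bed order, and the three rows above are not in ascending start order. A minimal sketch of a check, assuming sort-bed's order can be approximated by a byte-wise sort on chromosome followed by a numeric sort on start and then end; the helper name `bed_is_sorted` and the file `bins_sorted.bed` are hypothetical:

```shell
# Hypothetical helper: exit 0 if the BED file is ordered by chromosome
# (byte-wise), then numeric start, then numeric end; non-zero otherwise.
# This only approximates sort-bed's ordering.
bed_is_sorted() {
    LC_ALL=C sort -c -s -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}

# Usage:
# bed_is_sorted H3K27ac_top_100_ranked_bins.bed ||
#     sort-bed H3K27ac_top_100_ranked_bins.bed > bins_sorted.bed
```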
When I ran bedmap using this command:
bedmap --echo --echo-map-id --delim '\t' \
hg38.ncbiRefSeq_sort.bed H3K27ac_top_100_ranked_bins.bed > top_100_ranked_bins_annotated.bed
It produced a file with 4,131,447 rows, but I expected 100. What I want is my input BED file with gene names added to each row.
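For comparison, a sketch of the same call with the two files in the other order. bedmap prints one output line per element of the first (reference) file, so with the annotation file first the rows reported come from the annotation file rather than from the bins; with the bins first it should report one line per bin, with an empty name field where nothing overlaps. `--echo-map-id-uniq` replaces `--echo-map-id` so that a gene name shared by several overlapping rows is listed once. The helper name `annotate_bins` and the file `bins_sorted.bed` are hypothetical, and both inputs are assumed to be sorted with sort-bed:

```shell
# Hypothetical helper: annotate each bin with the unique names of
# overlapping annotation rows. bedmap emits one line per element of the
# FIRST (reference) file, so the bins go first and the annotations second.
annotate_bins() {
    bins_sorted=$1   # bins, sorted with sort-bed
    genes_sorted=$2  # annotation BED, sorted with sort-bed
    bedmap --echo --echo-map-id-uniq --delim '\t' "$bins_sorted" "$genes_sorted"
}

# Usage:
# annotate_bins bins_sorted.bed hg38.ncbiRefSeq_sort.bed > top_100_ranked_bins_annotated.bed
```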
Is there something wrong with the original GTF file, or did the GTF-to-BED conversion fail? If the GTF file is wrong, how do I get a correct one?