Question

bedmap annotation of a BED file with a GTF converted to BED produces far too many results

4.7 years ago

Tawny ▴ 180

I downloaded the hg38 GTF file from UCSC and converted it to BED format using convert2bed:

convert2bed --input=gtf --output=bed --do-not-sort < hg38.ncbiRefSeq.gtf > hg38.ncbiRefSeq.bed

I then sorted hg38.ncbiRefSeq.bed with sort-bed (the sorted file is the hg38.ncbiRefSeq_sort.bed used in the bedmap command further down).

Here are the first 3 lines from my sorted hg38 bed file:

chr1    11873   12227   DDX11L1 .       +       ncbiRefSeq      exon    .       gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "1"; exon_id "NR_046018.2.1"; gene_name "DDX11L1";
chr1    11873   14409   DDX11L1 .       +       ncbiRefSeq      transcript      .       gene_id "DDX11L1"; transcript_id "NR_046018.2";  gene_name "DDX11L1";
chr1    12612   12721   DDX11L1 .       +       ncbiRefSeq      exon    .       gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "2"; exon_id "NR_046018.2.2"; gene_name "DDX11L1";

Is this the expected format from the convert2bed tool?
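One way to see what the converted file contains is to tally the feature type, which sits in column 8 of the lines above. This is a minimal sketch on a toy file built from those three records, with the attributes column shortened. `sample.bed` is a hypothetical name used only for illustration; on the real data the same `cut | sort | uniq -c` pipeline would be pointed at the sorted BED file.

```shell
# Build a small tab-delimited sample from the three records shown above.
# sample.bed is a hypothetical file name; the attributes column is shortened.
printf 'chr1\t11873\t12227\tDDX11L1\t.\t+\tncbiRefSeq\texon\t.\tgene_id "DDX11L1";\n' > sample.bed
printf 'chr1\t11873\t14409\tDDX11L1\t.\t+\tncbiRefSeq\ttranscript\t.\tgene_id "DDX11L1";\n' >> sample.bed
printf 'chr1\t12612\t12721\tDDX11L1\t.\t+\tncbiRefSeq\texon\t.\tgene_id "DDX11L1";\n' >> sample.bed

# Column 8 holds the GTF feature type (exon, transcript, ...).
# Count how many records there are of each type and print them as type=count.
feature_counts=$(cut -f8 sample.bed | sort | uniq -c | awk '{print $2 "=" $1}')
echo "$feature_counts"
```

If the full file shows several records per gene (one per exon, transcript, and so on), that would be consistent with a GTF that describes every feature of every transcript rather than one line per gene.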

The BED file that I want to annotate with gene names has 100 rows and looks like this:

chr1    233914000    233915000  35
chr1    181582000    181583000  42
chr1    215193000    215194000  51

When I ran bedmap using this command:

bedmap --echo --echo-map-id --delim '\t' \
hg38.ncbiRefSeq_sort.bed H3K27ac_top_100_ranked_bins.bed > top_100_ranked_bins_annotated.bed

It produced a file with 4,131,447 rows. I was expecting to get back 100 rows. What I want is my input bed file with gene names added to it.
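A quick diagnostic, sketched here on toy files, is to compare the row count of each input with the row count of the output. bedmap reports one output line per element of the first file on its command line (the reference file), so the output count should line up with that file. The file names and contents below are hypothetical stand-ins, not the real data, and `count_rows` is a helper introduced only for this example.

```shell
# Toy stand-ins for the two inputs (hypothetical names and contents).
printf 'chr1\t11873\t12227\tDDX11L1\nchr1\t11873\t14409\tDDX11L1\nchr1\t12612\t12721\tDDX11L1\n' > toy_annotation.bed
printf 'chr1\t12000\t13000\t35\n' > toy_bins.bed

# Count rows portably (awk avoids the padding some wc implementations add).
count_rows() { awk 'END { print NR }' "$1"; }

annotation_rows=$(count_rows toy_annotation.bed)
bins_rows=$(count_rows toy_bins.bed)

echo "annotation rows: $annotation_rows"
echo "bins rows:       $bins_rows"
# On the real data, run count_rows on the bedmap output as well and check
# which of the two input counts it matches.
```

On the real files, the same helper could be run on hg38.ncbiRefSeq_sort.bed, H3K27ac_top_100_ranked_bins.bed, and top_100_ranked_bins_annotated.bed.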

Is there something wrong with the original GTF file? Did the GTF-to-BED conversion fail? If the original GTF file is wrong, how do I get the correct one?

bedmap bed GTF • 1.2k views
