Dear biostars community,
Do you have any ideas on how to "trick" htseq-count into treating a BED file of genomic coordinates as a GTF file of gene annotations, so that I can get counts per genomic interval instead of counts per actual gene?
I have tried editing the BED file to bring it up to 9 columns (like a standard GTF), but it's possible that my feature label (in column 3) and gene_id label (in column 9) are incorrect; I just added these labels arbitrarily. With this "fake" GTF file, I use a BAM file of paired-end RNA-seq reads aligned with STAR.
This is what my "fake" GTF file looks like:
chr1 . exon 0 10000 . - 0 gene_id "1";
chr1 . exon 10000 20000 . - 0 gene_id "2";
chr1 . exon 20000 30000 . - 0 gene_id "3";
htseq-count outputs this error and aborts:
Error occured when processing GFF file (line 1 of file *.gtf):
start too small
[Exception type: IndexError, raised in _HTSeq.pyx:376]
I assume that the 0-based start is the problem here. How should I edit these coordinates so that each interval still spans the same region? Should the start be 1-based? (Until now I thought that the 0-based start and 1-based end of the BED format were compatible with htseq.)
Thank you so much for your help!
It would be a lot easier with featureCounts.
Thank you for the suggestion. Since I'm already very familiar with htseq and have made a GTF file in the format it expects, I would prefer to keep using htseq. I would like to know how to get around the error the program raises: specifically, why is a 0-based start not compatible, and how do I fix it?
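GTF is 1-based with inclusive end coordinates, whereas a BED start is 0-based. Convert the 0-based starts to 1-based by adding 1 and leave the ends unchanged; each interval then spans the same region. That should do the trick.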
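As a rough illustration of that conversion, here is a minimal Python sketch. The function name `bed_to_gtf_line`, the default `exon` feature label, and the use of a running number as `gene_id` are assumptions made for this example, not anything htseq requires. It also assumes a plain BED file with at least 3 columns; when no strand column is present it writes `.`, in which case htseq-count probably needs to be run in unstranded mode.

```python
def bed_to_gtf_line(bed_line, index=1, feature="exon"):
    """Turn one BED line into one tab-separated GTF line.

    BED: 0-based start, end not included.
    GTF: 1-based start, end included.
    So only the start changes (start + 1); the end stays as it is.
    """
    fields = bed_line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])

    # Hypothetical choice: use the BED name column if present,
    # otherwise fall back to a running number as the gene_id.
    name = fields[3] if len(fields) > 3 else str(index)
    # Use the BED strand column if present, otherwise "." (no strand).
    strand = fields[5] if len(fields) > 5 else "."

    gtf_start = start + 1  # the 0-based to 1-based shift
    attributes = f'gene_id "{name}";'
    return "\t".join(
        [chrom, ".", feature, str(gtf_start), str(end), ".", strand, ".", attributes]
    )


def bed_to_gtf(bed_lines):
    """Convert an iterable of BED lines, skipping blanks and header lines."""
    out = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        out.append(bed_to_gtf_line(line, index=len(out) + 1))
    return out
```

With the intervals from the question, `bed_to_gtf(["chr1\t0\t10000", "chr1\t10000\t20000"])` gives GTF intervals 1-10000 and 10001-20000, which cover exactly the same bases as the original BED intervals without overlapping.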
Thank you for your help!
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!