My aim is to count reads aligned to introns of human BRCA1. I am using HTSeq-count. I made a GTF file by obtaining bed file of intronic regions (Hg38) from UCSC table browser. I converted BED file to GTF based on Ensembl data format.
Bed file:
17 43045802 43047642 NM_007294.3_intron_0_0_chr17_43045803_r 0 -
17 43047703 43049120 NM_007294.3_intron_1_0_chr17_43047704_r 0 -
17 43049194 43051062 NM_007294.3_intron_2_0_chr17_43049195_r 0 -
17 43051117 43057051 NM_007294.3_intron_3_0_chr17_43051118_r 0 -
17 43057135 43063332 NM_007294.3_intron_4_0_chr17_43057136_r 0 -
17 43063373 43063873 NM_007294.3_intron_5_0_chr17_43063374_r 0 -
17 43063951 43067607 NM_007294.3_intron_6_0_chr17_43063952_r 0 -
17 43067695 43070927 NM_007294.3_intron_7_0_chr17_43067696_r 0 -
17 43071238 43074330 NM_007294.3_intron_8_0_chr17_43071239_r 0 -
GTF file:
17 Refseq intron 43045802 43047642 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_0";
17 Refseq intron 43047703 43049120 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_1";
17 Refseq intron 43049194 43051062 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_2";
17 Refseq intron 43051117 43057051 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_3";
17 Refseq intron 43057135 43063332 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_4";
17 Refseq intron 43063373 43063873 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_5";
17 Refseq intron 43063951 43067607 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_6";
17 Refseq intron 43067695 43070927 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_7";
17 Refseq intron 43071238 43074330 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_8";
There are multiple transcripts for this gene. If I count reads aligned to introns of NM_007294, I get the result.
NM_007294_intron_0 28
NM_007294_intron_1 21
NM_007294_intron_2 31
NM_007294_intron_3 22
NM_007294_intron_4 30
NM_007294_intron_5 10
NM_007294_intron_6 96
NM_007294_intron_7 73
NM_007294_intron_8 42
Now when I run HTSeq on same bam file giving GTF file for all BRCA transcript, I cannot get the same result:
NM_007294_intron_0 0
NM_007294_intron_1 0
NM_007294_intron_10 0
NM_007294_intron_11 0
NM_007294_intron_12 0
NM_007294_intron_13 0
NM_007294_intron_14 0
NM_007294_intron_15 0
NM_007294_intron_16 0
NM_007294_intron_17 0
NM_007294_intron_18 0
NM_007294_intron_19 0
NM_007294_intron_2 0
NM_007294_intron_20 0
NM_007294_intron_21 0
NM_007294_intron_3 0
NM_007294_intron_4 0
NM_007294_intron_5 0
NM_007294_intron_6 0
NM_007294_intron_7 0
NM_007294_intron_8 0
NM_007294_intron_9 0
NM_007297_intron_0 0
NM_007297_intron_1 0
NM_007297_intron_10 0
NM_007297_intron_11 0
NM_007297_intron_12 0
NM_007297_intron_13 0
NM_007297_intron_14 0
NM_007297_intron_15 0
NM_007297_intron_16 0
NM_007297_intron_17 0
NM_007297_intron_18 0
NM_007297_intron_19 0
NM_007297_intron_2 0
NM_007297_intron_20 0
NM_007297_intron_3 0
NM_007297_intron_4 0
NM_007297_intron_5 0
NM_007297_intron_6 0
NM_007297_intron_7 0
NM_007297_intron_8 0
NM_007297_intron_9 0
NM_007298_intron_0 0
NM_007298_intron_1 0
NM_007298_intron_10 0
NM_007298_intron_11 0
NM_007298_intron_12 0
NM_007298_intron_13 0
NM_007298_intron_14 0
NM_007298_intron_15 0
NM_007298_intron_16 0
NM_007298_intron_17 0
NM_007298_intron_18 0
NM_007298_intron_19 0
NM_007298_intron_2 0
NM_007298_intron_20 0
NM_007298_intron_3 0
NM_007298_intron_4 0
NM_007298_intron_5 0
NM_007298_intron_6 0
NM_007298_intron_7 0
NM_007298_intron_8 0
NM_007298_intron_9 0
NM_007299_intron_0 0
NM_007299_intron_1 0
NM_007299_intron_10 0
NM_007299_intron_11 0
NM_007299_intron_12 0
NM_007299_intron_13 0
NM_007299_intron_14 0
NM_007299_intron_15 0
NM_007299_intron_16 0
NM_007299_intron_17 0
NM_007299_intron_18 0
NM_007299_intron_19 0
NM_007299_intron_2 0
NM_007299_intron_20 0
NM_007299_intron_3 0
NM_007299_intron_4 0
NM_007299_intron_5 0
NM_007299_intron_6 0
NM_007299_intron_7 0
NM_007299_intron_8 0
NM_007299_intron_9 0
NM_007300_intron_0 0
NM_007300_intron_1 0
NM_007300_intron_10 0
NM_007300_intron_11 0
NM_007300_intron_12 0
NM_007300_intron_13 0
NM_007300_intron_14 0
NM_007300_intron_15 0
NM_007300_intron_16 0
NM_007300_intron_17 0
NM_007300_intron_18 0
NM_007300_intron_19 0
NM_007300_intron_2 0
NM_007300_intron_20 0
NM_007300_intron_21 0
NM_007300_intron_22 0
NM_007300_intron_3 0
NM_007300_intron_4 0
NM_007300_intron_5 0
NM_007300_intron_6 0
NM_007300_intron_7 0
NM_007300_intron_8 0
NM_007300_intron_9 0
How is this possible. whe I give a gtf file with one transcript, I get counts, however when GTF file with all transcripts of BRCA1 the resulting counts are 0.
It cannot be chromosome mapping issue (chr17 vs. 17) or any other issue that I can think of.
Can anyone help understand this ambiguous behavior of HTSeq.
htseq-count -f bam -r name -s reverse -t intron -i intron_number myBAM.bam introns_NM_007294.gtf
---- WORKS.
htseq-count -f bam -r name -s reverse -t intron -i intron_number myBAM.bam introns_all_transcripts.gtf
--- Doesn't WORK.
Thanks Adrian.
GTF file:
17 Refseq intron 43045802 43047642 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_0";
17 Refseq intron 43047703 43049120 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_1";
17 Refseq intron 43049194 43051062 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_2";
17 Refseq intron 43051117 43057051 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_3";
17 Refseq intron 43057135 43063332 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_4";
17 Refseq intron 43063373 43063873 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_5";
17 Refseq intron 43063951 43067607 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_6";
17 Refseq intron 43067695 43070927 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_7";
17 Refseq intron 43071238 43074330 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_8";
17 Refseq intron 43074521 43076487 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_9";
17 Refseq intron 43076614 43082403 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_10";
17 Refseq intron 43082575 43090943 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_11";
17 Refseq intron 43091032 43091434 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_12";
17 Refseq intron 43094860 43095845 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_13";
17 Refseq intron 43095922 43097243 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_14";
17 Refseq intron 43097289 43099774 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_15";
17 Refseq intron 43099880 43104121 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_16";
17 Refseq intron 43104261 43104867 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_17";
17 Refseq intron 43104956 43106455 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_18";
17 Refseq intron 43106533 43115725 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_19";
17 Refseq intron 43115779 43124016 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_20";
17 Refseq intron 43124115 43125270 . - . transcript_id "NM_007294"; intron_number "NM_007294_intron_21";
17 Refseq intron 43045802 43047642 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_0";
17 Refseq intron 43047703 43049120 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_1";
17 Refseq intron 43049194 43051062 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_2";
17 Refseq intron 43051117 43057051 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_3";
17 Refseq intron 43057135 43063332 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_4";
17 Refseq intron 43063373 43063873 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_5";
17 Refseq intron 43063951 43067607 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_6";
17 Refseq intron 43067695 43070927 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_7";
17 Refseq intron 43071238 43074330 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_8";
17 Refseq intron 43074521 43076487 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_9";
17 Refseq intron 43076614 43082403 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_10";
17 Refseq intron 43082575 43090943 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_11";
17 Refseq intron 43091032 43091434 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_12";
17 Refseq intron 43094860 43095845 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_13";
17 Refseq intron 43095922 43097243 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_14";
17 Refseq intron 43097289 43099774 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_15";
17 Refseq intron 43099880 43104121 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_16";
17 Refseq intron 43104261 43104867 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_17";
17 Refseq intron 43104956 43106455 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_18";
17 Refseq intron 43106533 43124016 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_19";
17 Refseq intron 43124115 43125276 . - . transcript_id "NM_007297"; intron_number "NM_007297_intron_20";
17 Refseq intron 43045802 43047642 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_0";
17 Refseq intron 43047703 43049120 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_1";
17 Refseq intron 43049194 43051062 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_2";
17 Refseq intron 43051117 43057051 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_3";
17 Refseq intron 43057135 43063332 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_4";
17 Refseq intron 43063373 43063873 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_5";
17 Refseq intron 43063951 43067607 . - . transcript_id "NM_007298"; intron_number "NM_007298_intron_6";
Thanks genomax for very clean formatting my question. I could not figure out how to do it.Thanks Adrian
Counting programs do not count multimapping reads by default. Have you looked into that as a reason for getting 0 counts?
For future reference: You can use the formatting bar (especially the
code
option) by highlighting text you want to format and then clicking the button in red-box below.i'm also going for the "multimapping answer" of genomax