Hello,
I was wondering whether there is a way to use htseq-count to count the number of reads per gene (introns + exons), not only over the exonic regions of genes.
Thanks a lot!
Hi pernille.rainer
According to the htseq-count documentation, you can specify which feature type the counting is done on:
-t <feature type>, --type=<feature type>
feature type (3rd column in GFF file) to be used, all features of other type are ignored (default, suitable for RNA-Seq analysis using an Ensembl GTF file: exon)
Thus, if you use "-t gene", you should get all reads within a gene locus (frankly, I haven't tested it).
Note that in this scenario overlapping features will occur more often, so more reads will be counted as "ambiguous".
Cheers, Michael
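To illustrate what switching `-t` from `exon` to `gene` changes, here is a minimal pure-Python sketch. It is not HTSeq's implementation: the toy annotation and reads are made up, each read is reduced to a single position, strand is ignored, and the `__no_feature` / `__ambiguous` labels are only modelled on htseq-count's special counters.

```python
import re
from collections import Counter

# Hypothetical toy annotation: geneA and geneB overlap between 450 and 500
# (GTF coordinates are 1-based and inclusive).
GTF_LINES = [
    'chr1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "geneA";',
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA";',
    'chr1\ttoy\texon\t400\t500\t.\t+\t.\tgene_id "geneA";',
    'chr1\ttoy\tgene\t450\t800\t.\t+\t.\tgene_id "geneB";',
    'chr1\ttoy\texon\t700\t800\t.\t+\t.\tgene_id "geneB";',
]

def load_features(lines, feature_type):
    """Keep only rows whose 3rd column equals feature_type (the role of -t)."""
    features = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        match = re.search(r'gene_id "([^"]+)"', cols[8])
        if match:
            features.append((cols[0], int(cols[3]), int(cols[4]), match.group(1)))
    return features

def count_reads(features, reads):
    """Assign each (chrom, position) read to one gene, or to a special counter."""
    counts = Counter()
    for chrom, pos in reads:
        genes = {gid for c, s, e, gid in features if c == chrom and s <= pos <= e}
        if not genes:
            counts["__no_feature"] += 1
        elif len(genes) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[genes.pop()] += 1
    return counts

# Hypothetical reads: exonic (150), intronic in geneA (300),
# inside the geneA/geneB overlap (480), exonic in geneB (750).
READS = [("chr1", 150), ("chr1", 300), ("chr1", 480), ("chr1", 750)]

print(dict(count_reads(load_features(GTF_LINES, "exon"), READS)))
# {'geneA': 2, '__no_feature': 1, 'geneB': 1}
print(dict(count_reads(load_features(GTF_LINES, "gene"), READS)))
# {'geneA': 2, '__ambiguous': 1, 'geneB': 1}
```

With `exon`, the intronic read at 300 is lost; with `gene` it is counted for geneA, but the read at 480, where the two loci overlap, becomes ambiguous.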
It all comes down to which GTF you use as input: if your GTF has intervals spanning the entire gene, or has introns and exons as separate features, that will work.
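If an annotation only has exon rows, intron intervals can in principle be derived from the gaps between consecutive exons of each transcript. The sketch below assumes every exon row carries `gene_id` and `transcript_id` attributes; the `derived` source and `intron` feature name are hypothetical labels chosen here, not something an Ensembl GTF provides.

```python
import re
from collections import defaultdict

def derive_introns(gtf_lines):
    """Return GTF-style 'intron' rows built from gaps between a transcript's exons."""
    # (chrom, strand, gene_id, transcript_id) -> list of (start, end) exon spans
    exons = defaultdict(list)
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        gene = re.search(r'gene_id "([^"]+)"', cols[8])
        tx = re.search(r'transcript_id "([^"]+)"', cols[8])
        if gene and tx:
            key = (cols[0], cols[6], gene.group(1), tx.group(1))
            exons[key].append((int(cols[3]), int(cols[4])))

    introns = []
    for (chrom, strand, gene_id, tx_id), spans in exons.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start - prev_end > 1:  # skip exons that abut each other
                attrs = f'gene_id "{gene_id}"; transcript_id "{tx_id}";'
                introns.append("\t".join([
                    chrom, "derived", "intron",
                    str(prev_end + 1), str(next_start - 1),  # 1-based, inclusive
                    ".", strand, ".", attrs,
                ]))
    return introns
```

Introns of one isoform can overlap exons of another isoform of the same gene, so counting on the full `gene` rows is usually the simpler way to get introns + exons together.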
That's what I thought, thank you :) But I don't know much about all that: where can I find a GTF containing entire genes or introns? I tried Ensembl but was not very successful.
Thanks!
Please use ADD COMMENT to reply to earlier answers, so that the thread remains logically structured and easy to follow.
The information you want is already included in a standard GTF file: look at the third column wherever it says 'gene'. You can use the -t flag in htseq-count to specify which feature type from that third column you want to use for counting.
Are you sure about that? Usually there are "gene" and "transcript" entries.
I just had a look, and at least the GTFs of GRCh38.81 and GRCh37.82 contain both 'gene' and 'transcript' feature types.
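One way to check which feature types a given GTF offers for `-t` is to tally its third column. A minimal sketch, assuming a tab-separated GTF (optionally gzip-compressed) whose header lines start with `#`; the file name in the comment is a hypothetical local path.

```python
import gzip
from collections import Counter

def count_feature_types(lines):
    """Tally the 3rd (feature) column of GTF lines, skipping '#' header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:
            counts[cols[2]] += 1
    return counts

def feature_types_in_file(path):
    """Open a plain or .gz GTF in text mode and tally its feature types."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_feature_types(handle)

# Example (hypothetical local path):
# print(feature_types_in_file("Homo_sapiens.GRCh38.81.gtf.gz"))
```

Any name that shows up in this tally, such as `gene`, is a candidate value for htseq-count's `-t` option.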