Hi all, I want to use htseq-count to count reads. I plan to compute read counts for both exons and genes and compare them later. If I have understood correctly, exon counts are used to find differential expression between alternatively spliced transcripts, which may not be visible in gene counts. If so, which feature type should I use?
My GFF file looks like this:
- gene 3503760 3523716 . - . ID=gene7;
- mRNA 3503760 3523716 . - . ID=rna12;Parent=gene7;
- exon 3515344 3515454 . - . ID=id60;Parent=rna12;
- CDS ID=cds9;Parent=rna12
- CDS ID=cds9;Parent=rna12
- CDS ID=cds9;Parent=rna12
- mRNA 3503760 3523659 . - . ID=rna13;Parent=gene7;
- exon 3523589 3523659 . - . ID=id63;Parent=rna13;
- exon
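A minimal Python sketch of how the Parent attributes in a GFF3 like the one above link exons to transcripts and transcripts to genes. It assumes standard 9-column GFF3 records; the seqid `chr1` and source `.` are hypothetical placeholders because those columns are missing from the excerpt, and the CDS lines are left out because their coordinates are missing. In this excerpt an exon's Parent is an mRNA ID (e.g. rna12), not the gene, so grouping exons by Parent yields transcript-level features and a second lookup is needed to reach gene7.

```python
# Hypothetical 9-column GFF3 records rebuilt from the excerpt above;
# seqid "chr1" and source "." are placeholders, not from the original file.
GFF3_LINES = [
    "chr1\t.\tgene\t3503760\t3523716\t.\t-\t.\tID=gene7;",
    "chr1\t.\tmRNA\t3503760\t3523716\t.\t-\t.\tID=rna12;Parent=gene7;",
    "chr1\t.\texon\t3515344\t3515454\t.\t-\t.\tID=id60;Parent=rna12;",
    "chr1\t.\tmRNA\t3503760\t3523659\t.\t-\t.\tID=rna13;Parent=gene7;",
    "chr1\t.\texon\t3523589\t3523659\t.\t-\t.\tID=id63;Parent=rna13;",
]


def parse_attributes(field):
    """Split a GFF3 column-9 string like 'ID=x;Parent=y;' into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = value
    return attrs


def exon_hierarchy(lines):
    """Map each exon ID to (transcript ID, gene ID) by following Parent links.

    Simplification: a multi-valued Parent (e.g. 'Parent=a,b') is kept as one string.
    """
    parent_of = {}  # feature ID -> Parent ID
    exons = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip comments and malformed lines
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"]
        if ftype == "exon":
            exons.append(attrs["ID"])
    result = {}
    for exon_id in exons:
        transcript = parent_of[exon_id]   # exon -> mRNA
        gene = parent_of.get(transcript)  # mRNA -> gene
        result[exon_id] = (transcript, gene)
    return result


for exon_id, (tx, gene) in exon_hierarchy(GFF3_LINES).items():
    print(exon_id, tx, gene)
```

By the same logic, pairing exon features with the ID attribute would give one feature per exon (id60, id63, ...), while pairing them with Parent would presumably group them per transcript (rna12, rna13); neither pairing reaches gene7 directly in this file.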
If the 'mRNA' features represent alternatively spliced transcripts of gene7, is it OK to use --type mRNA and -i ID in htseq-count, or should I use --type exon and -i ID? And why?
To calculate the raw counts for genes, are the options --type gene and -i ID correct?
Thanks in advance for your suggestions.
Using mRNA will count the annotated isoforms; to look for new isoforms you can use StringTie. By default htseq-count counts genes, so yes, option 2 is fine.
According to the manual (https://htseq.readthedocs.io/en/release_0.11.1/count.html), by default it counts exon features when a GTF file is provided.
(-t <feature type>, --type=<feature type>)
However, I want to understand the significance of using --type exon over --type mRNA.
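One way to see the difference is a simplified re-implementation of the overlap rule behind htseq-count's default "union" mode: collect every feature ID that overlaps any aligned part of a read; zero IDs means no feature, more than one means ambiguous. This Python sketch ignores strand, alignment quality and paired-end handling, and the reads are hypothetical. The coordinates come from the excerpt above, which is incomplete, so the numbers are only illustrative. Because an mRNA feature spans the whole transcript including introns, the two overlapping mRNAs of gene7 make every read in the locus ambiguous, and intronic reads are counted as if they were transcript reads; exon features avoid both problems.

```python
def count_union(reads, features):
    """Simplified htseq-count-style 'union' counting (strand and MAPQ ignored).

    reads: list of reads, each a list of (start, end) aligned blocks, 1-based inclusive
    features: list of (feature_id, start, end) on the same chromosome
    """
    counts = {"__no_feature": 0, "__ambiguous": 0}
    for blocks in reads:
        hit_ids = set()
        for b_start, b_end in blocks:
            for fid, f_start, f_end in features:
                if b_start <= f_end and f_start <= b_end:  # intervals overlap
                    hit_ids.add(fid)
        if not hit_ids:
            counts["__no_feature"] += 1
        elif len(hit_ids) > 1:
            counts["__ambiguous"] += 1
        else:
            fid = hit_ids.pop()
            counts[fid] = counts.get(fid, 0) + 1
    return counts


# mRNA features: full transcript spans from the GFF excerpt (introns included)
MRNA_FEATURES = [("rna12", 3503760, 3523716), ("rna13", 3503760, 3523659)]
# Exon features labelled by their Parent transcript ID
EXON_FEATURES = [("rna12", 3515344, 3515454), ("rna13", 3523589, 3523659)]

# Hypothetical 50 bp single-block reads
READS = [
    [(3510000, 3510049)],  # intronic
    [(3515350, 3515399)],  # inside exon id60 (rna12)
    [(3523600, 3523649)],  # inside exon id63 (rna13)
]

print(count_union(READS, MRNA_FEATURES))  # all three reads ambiguous
print(count_union(READS, EXON_FEATURES))  # one no_feature, one each for rna12/rna13
```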
To count genes (complete, all annotated exons):
To count mRNAs (if a gene has two isoforms, it will count two mRNAs):
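Counting per gene versus per mRNA also changes how reads in shared exons are treated. This Python sketch uses a hypothetical gene "geneA" whose two isoforms share their first exon (coordinates invented, not from the GFF above) and applies the same union-style overlap rule, once labelling exons by gene and once by transcript. It is a simplification of what an overlap-based counter does, not htseq-count's actual code.

```python
from collections import Counter

# Hypothetical gene with two isoforms sharing the exon 100-200 (1-based inclusive)
EXONS = [
    # (exon_start, exon_end, transcript_id, gene_id)
    (100, 200, "txA1", "geneA"),
    (300, 400, "txA1", "geneA"),
    (100, 200, "txA2", "geneA"),
    (500, 600, "txA2", "geneA"),
]

READS = [(150, 199), (320, 369), (520, 569)]  # hypothetical single-block reads


def assign(read, exons, level):
    """Union-style assignment of one read; level is 'transcript' or 'gene'."""
    start, end = read
    ids = {
        tx if level == "transcript" else gene
        for e_start, e_end, tx, gene in exons
        if start <= e_end and e_start <= end  # read overlaps this exon
    }
    if not ids:
        return "__no_feature"
    return ids.pop() if len(ids) == 1 else "__ambiguous"


gene_counts = Counter(assign(r, EXONS, "gene") for r in READS)
tx_counts = Counter(assign(r, EXONS, "transcript") for r in READS)
print(dict(gene_counts))  # {'geneA': 3}
print(dict(tx_counts))    # {'__ambiguous': 1, 'txA1': 1, 'txA2': 1}
```

Under these assumptions the read in the shared exon counts toward the gene but is ambiguous at the transcript level, so per-mRNA counts from overlap-based counting can miss reads that gene-level counts keep.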