Question

Proper HTSeq usage on bacterial genome. Don't quite understand -t


17 months ago

SushiRoll ▴ 140

Hi everyone,

I'm trying to run HTSeq on a group of BAM files generated by aligning Illumina RNA-seq reads to a reference genome. The reference genome is the highest-quality sequence available and was downloaded from NCBI RefSeq. I want to generate a count table with HTSeq for later differential expression (DE) analysis, as follows:

   htseq-count -f bam -t gene -i gene alignment.bam reference.gtf

What confuses me is that, according to the manual, -t defaults to "exon", but there is no "exon" in the third column of my GTF file. There is, however, "exon" in my GFF file. I see that GTF and GFF are sometimes used interchangeably despite having different structures, which confuses me a little.
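A quick way to see which feature types an annotation file actually offers for -t is to tally its third column. The following is a minimal sketch using only the Python standard library; the function name `count_feature_types` and the example lines are made up for illustration and are not taken from the poster's files.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally the feature type (column 3) of tab-separated GTF/GFF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GTF/GFF record has nine tab-separated columns.
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Hypothetical annotation lines, for illustration only.
example_lines = [
    "# made-up header comment\n",
    "chr1\tRefSeq\tgene\t1\t900\t.\t+\t.\tgene_id \"b0001\";\n",
    "chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tgene_id \"b0001\";\n",
    "chr1\tRefSeq\tgene\t1000\t1500\t.\t-\t.\tgene_id \"b0002\";\n",
]

feature_counts = count_feature_types(example_lines)
for feature, n in sorted(feature_counts.items()):
    print(feature, n)
```

On a real file, the same function could be called on an open file handle, for example `count_feature_types(open("reference.gtf"))`, where the file name is the one used in the command above.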

Another question: should I sort the BAM files? The manual discusses sorting, but I read here https://www.echemi.com/community/running-htseq-count-over-bam-files_mjart2205182203_481.html that newer versions of HTSeq don't require sorted input.

Thanks for any help!

RNAseq HTSeq • 720 views


score 2 · Accepted Answer · 2023-10-19


17 months ago

Shred ★ 1.6k

Most bacterial gene products are encoded by a single contiguous sequence, with no splicing. You could either rename the feature type in the third column to "exon" or pass a different feature type with -t (CDS?).
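The renaming option can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function name `relabel_feature` and the sample records are hypothetical, and the annotation is assumed to be plain tab-separated GTF/GFF with the feature type in column 3.

```python
def relabel_feature(lines, old="gene", new="exon"):
    """Return GTF/GFF lines with feature type `old` (column 3) renamed to `new`."""
    relabeled = []
    for line in lines:
        # Leave header/comment lines untouched.
        if line.startswith("#"):
            relabeled.append(line)
            continue
        fields = line.split("\t")
        # Only touch well-formed records whose feature type matches `old`.
        if len(fields) >= 9 and fields[2] == old:
            fields[2] = new
        relabeled.append("\t".join(fields))
    return relabeled


# Hypothetical records, for illustration only.
sample = [
    "# made-up header comment\n",
    "chr1\tRefSeq\tgene\t1\t900\t.\t+\t.\tgene_id \"b0001\";\n",
    "chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tgene_id \"b0001\";\n",
]

converted = relabel_feature(sample)
print("".join(converted), end="")
```

Passing the feature type directly, as the command in the question already does with `-t gene`, avoids editing the annotation at all. Rewriting the file is only needed if a downstream tool insists on "exon".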

Regarding the last question: according to the docs, there's no need to sort. That said, sorting the alignment before counting often results in better performance.

Similar issue raised here



Hi Shred,

Thanks a lot for your answer, concise and clear. I tried using "CDS" but it somehow failed. I'm aware that "exon" doesn't really make much sense in the bacterial world. My confusion arose because the docs suggest that the annotation file should be a .gtf and talk about "exon", while "exon" actually appears in the .gff annotation but not in the .gtf. I wasn't sure whether to use the recommended GTF with some other feature type such as "gene", or to switch to the GFF and use "exon". I will now do as you suggested, replace "gene" with "exon" in the GTF file, and see what happens. The GitHub link was also useful.

Thanks.

ADD REPLY • link 17 months ago by SushiRoll ▴ 140