Proper HTSeq usage on bacterial genome. Don't quite understand --t
13 months ago
SushiRoll ▴ 140

Hi everyone,

I'm trying to run HTSeq on a group of BAM files generated by aligning Illumina RNA-seq reads to a reference genome. The reference genome is the highest-quality sequence available and was downloaded from NCBI RefSeq. I want to generate a count table with HTSeq for later differential expression (DE) analysis, as follows:

    htseq-count -f bam -t gene -i gene alignment.bam reference.gtf

What confuses me is that, according to the manual, -t defaults to "exon", but there is no "exon" in the third column of my GTF file. On the other hand, my GFF file does have "exon" entries. I also see that GTF and GFF are sometimes used interchangeably even though their structures differ, which confuses me a little.
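Since -t has to match a value that actually occurs in the third column of the annotation, it can help to tabulate that column first. The following is a minimal sketch; the helper name `count_feature_types` is made up for illustration, and it assumes a tab-separated GTF/GFF file whose comment lines start with `#`.

```shell
# Count how often each feature type (column 3) occurs in an annotation file.
# Comment lines starting with '#' are skipped; the most frequent type is listed first.
count_feature_types() {
    grep -v '^#' "$1" | cut -f3 | sort | uniq -c | sort -k1,1nr
}

# Usage, with the file name from the question:
# count_feature_types reference.gtf
```

Any value that shows up in this listing (for example "gene" or "CDS") is a candidate for -t, provided the attribute named with -i is present on those rows.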

Another question: should I sort the BAM files? The manual discusses sorting, but I read here https://www.echemi.com/community/running-htseq-count-over-bam-files_mjart2205182203_481.html that newer versions of HTSeq don't require sorted input.

Thanks for any help!

RNAseq HTSeq • 550 views
13 months ago
Shred ★ 1.5k

Most bacterial gene products are encoded by a single construct, with no splicing. You could proceed by renaming the feature type in the third column to "exon", or pass another feature type to -t (CDS?).
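The renaming option could be sketched with awk as below. This is only an illustration under assumptions: the helper name `rename_feature` and the output file name are hypothetical, and the annotation is assumed to be tab-separated with `#` comment lines. Only the third column is changed; all other lines are passed through untouched.

```shell
# Rewrite one feature type (column 3) to another, writing the result to stdout.
# $1 = old feature type, $2 = new feature type, $3 = annotation file
rename_feature() {
    awk -F'\t' -v OFS='\t' -v old="$1" -v new="$2" \
        '/^#/ { print; next } $3 == old { $3 = new } { print }' "$3"
}

# Usage (the output file name is hypothetical):
# rename_feature gene exon reference.gtf > reference.exon.gtf
```

Writing to a new file keeps the original annotation intact, so the unmodified file can still be used with `-t gene` for comparison.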

Regarding the last question: according to the docs, there is no need to sort the BAM files. That said, sorting the alignment before counting often results in better performance.

Similar issue raised here

Hi Shred,

Thanks a lot for your answer, concise and clear. I tried using "CDS", but it failed for some reason. I'm aware that "exon" doesn't make much sense in the bacterial world. My confusion arose because the docs suggest that the annotation file should be a .gtf and refer to "exon", while "exon" actually appears in the .gff annotation but not in the .gtf. I wasn't sure whether to use the recommended GTF with some other feature type such as "gene", or to switch to the GFF and use "exon". I will now do as you suggested, replace "gene" with "exon" in the GTF file, and see what happens. The GitHub link was also useful.

Thanks.
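No error message is given for the failed "CDS" attempt, so the cause is unknown. One thing worth checking, as an assumption rather than a diagnosis, is whether rows of the chosen feature type carry the attribute passed to -i in their ninth column. A small sketch, with the hypothetical helper name `show_attributes`:

```shell
# Print the attribute column (column 9) for the first few rows of one feature type.
# $1 = feature type (e.g. CDS), $2 = annotation file, $3 = number of rows (default 5)
show_attributes() {
    awk -F'\t' -v feat="$1" -v n="${3:-5}" \
        '!/^#/ && $3 == feat { print $9; if (++seen >= n) exit }' "$2"
}

# Usage, with the file name from the question:
# show_attributes CDS reference.gtf
# show_attributes gene reference.gtf 3
```

If the attribute used with -i does not appear in the printed lines for a given feature type, that combination of -t and -i is unlikely to work.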
