HTSeq count issue: reads not assigned to features
11 weeks ago

Hi all,

I am trying to perform an RNA-seq analysis. I built an index with HISAT2, aligned my paired-end reads to it, and am now trying to count reads with HTSeq.

Unfortunately, HTSeq produces the following summary when counting:

__no_feature              6280013
__ambiguous                     0
__too_low_aQual           4434361
__not_aligned             4758201
__alignment_not_unique    3366408
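To see at a glance where the reads went, the htseq-count output can be split into reads assigned to features and the special `__` counters. This is a minimal sketch, assuming the usual two-column htseq-count output (feature name, then count) in which special counters start with `__`; the function name `summarize_counts` is hypothetical. Note that in paired-end mode htseq-count counts read pairs, so the sum is a count of processed records rather than individual reads.

```python
def summarize_counts(lines):
    """Split htseq-count output into assigned counts and special (__*) counters."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Last whitespace-separated token is the count; everything before it is the name.
        name, count = line.rsplit(None, 1)
        if name.startswith("__"):
            special[name] = int(count)
        else:
            assigned += int(count)
    total = assigned + sum(special.values())
    return assigned, special, total


# The summary from this post (no per-feature lines shown, so nothing was assigned).
example = """__no_feature\t6280013
__ambiguous\t0
__too_low_aQual\t4434361
__not_aligned\t4758201
__alignment_not_unique\t3366408"""

assigned, special, total = summarize_counts(example.splitlines())
print(assigned, total)  # prints: 0 18838983
```

With a real run you would pass the lines of the counts file, e.g. `open("counts.txt")` (hypothetical file name). A large `__no_feature` share with zero assigned reads means the reads aligned, but never onto an annotated feature.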

What I have tried so far to troubleshoot this:

  • Created a new index from a different reference (the transcriptome instead of the genome).
  • Used either the .gff or the .gtf file for counting.
  • Sorted the .sam file before counting.
  • Set --stranded=no and --nonunique=all.

None of these changes produced correct counts.

At this point I am not sure what to do. I suspect something is wrong with how I set up my index, perhaps that it does not match the .gtf file; however, I downloaded both from the same source, so I am not sure.

Thanks

Tags: htseq hisat2 rna-seq

Have you examined the alignments with IGV or a similar genome browser? Are the reads aligning in the correct place (e.g., under exons)? Also try featureCounts to see whether the result is any different.


Yes, I have looked at them in a genome browser and they look OK, so I am confused about why HTSeq cannot count them. I also tried featureCounts as you suggested, but it counted 0 reads.

Answer by MolGeek, 11 weeks ago

Have you checked whether the chromosome names are written the same way in the BAM files and in the GTF/GFF? For example, if the BAM files name the chromosomes chr1, chr2, ... while the GTF file uses 1, 2, 3, ..., the counting will fail because no read lands on a sequence name that the annotation knows about.
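One quick way to test this is to compare the reference names in the alignment header with the sequence names in the annotation. This is a minimal sketch, assuming a SAM file whose header contains `@SQ` lines with `SN:` tags and a tab-separated GTF whose first column is the sequence name; the function names and the tiny example records are hypothetical.

```python
def sam_reference_names(sam_lines):
    """Collect reference sequence names from @SQ header lines (SN: tag)."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_sequence_names(gtf_lines):
    """Collect the seqname (first column) of every GTF feature line."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


# Hypothetical example: the BAM/SAM says "chr1", the GTF says "1".
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
gtf = ['1\tRefSeq\texon\t1\t100\t.\t+\t.\tgene_id "g1";']

shared = sam_reference_names(sam) & gtf_sequence_names(gtf)
print(shared)  # prints: set()  (no names in common, so nothing can be counted)
```

For a BAM file you would first view the header as text (e.g. with `samtools view -H`) and pass those lines in; an empty intersection points to a naming or reference mismatch.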


From the original post:

"I have downloaded these from the same source."

So a naming mismatch seems unlikely.


I don't see chromosome names in my .gtf file, and I thought this would not matter because I am using gene ID as the ID attribute (not position). Is that not correct?
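On that question: in a GTF file the first column is always the sequence (chromosome or scaffold) name, and a read is matched to a feature by that name plus the start/end coordinates. The `gene_id` attribute (chosen in htseq-count with `-i`/`--idattr`) only labels which gene a matched read is counted toward. This is a minimal sketch of parsing one GTF line; the record and its names (`NC_EXAMPLE.1`, `LOC_example`, `XM_EXAMPLE.1`) are hypothetical, and `overlaps` is a simplified single-position check, not htseq-count's actual overlap logic (which works on aligned blocks and overlap modes).

```python
import re

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_gtf_line(line):
    """Parse one tab-separated GTF feature line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes look like: gene_id "X"; transcript_id "Y";
    record["attributes"] = dict(re.findall(r'(\S+) "([^"]*)"', record["attributes"]))
    return record


def overlaps(record, ref_name, pos):
    """Simplified check: same sequence name and position inside [start, end]."""
    return record["seqname"] == ref_name and record["start"] <= pos <= record["end"]


line = ('NC_EXAMPLE.1\tGnomon\texon\t100\t500\t.\t+\t.\t'
        'gene_id "LOC_example"; transcript_id "XM_EXAMPLE.1";')
rec = parse_gtf_line(line)

# Same coordinate, but only the read placed on the genomic sequence can match.
print(overlaps(rec, "NC_EXAMPLE.1", 200), overlaps(rec, "XM_EXAMPLE.1", 200))  # prints: True False
```

The second call illustrates why reads aligned to transcript sequences are never assigned: their reference name does not appear in the GTF's first column, no matter which ID attribute is used.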


I'm not sure how to attach the file itself, but here are screenshots of the .gtf file and of the first lines of the .sam file after alignment:

[Screenshot: the .gtf file]

[Screenshot: first lines of the .sam file]


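It appears that you aligned the data to a transcriptome file and not to the genome: the XM_* accession numbers and their lengths in the SAM header show that. This is not going to work, because the GTF coordinates refer to genomic sequences, not to individual transcripts.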

You will need to align your data to the genome NC_037638 and then count using the GTF.

If you want to quantify against the transcripts instead, you will need a program such as Salmon.
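The XM_* diagnosis can be checked directly from the alignment header. This is a minimal sketch, assuming NCBI RefSeq-style reference names, where the NC_, NT_ and NW_ prefixes denote genomic sequences (chromosomes, contigs, scaffolds) and NM_, NR_, XM_ and XR_ denote transcripts; the function names and example header lines are hypothetical, and it only inspects `@SQ` lines, so it complements rather than replaces looking at the alignments.

```python
GENOMIC_PREFIXES = ("NC_", "NT_", "NW_")             # chromosomes, contigs, scaffolds
TRANSCRIPT_PREFIXES = ("NM_", "NR_", "XM_", "XR_")   # curated and predicted RNAs


def classify_reference(name):
    """Classify a RefSeq-style sequence name by its accession prefix."""
    if name.startswith(GENOMIC_PREFIXES):
        return "genomic"
    if name.startswith(TRANSCRIPT_PREFIXES):
        return "transcript"
    return "unknown"


def reference_kinds(sam_header_lines):
    """Count how many @SQ references look genomic vs. transcript."""
    counts = {"genomic": 0, "transcript": 0, "unknown": 0}
    for line in sam_header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t"):
                if field.startswith("SN:"):
                    counts[classify_reference(field[3:])] += 1
    return counts


# Hypothetical header from an index built on a transcriptome FASTA.
header = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:XM_EXAMPLE1.1\tLN:1500",
    "@SQ\tSN:XR_EXAMPLE2.1\tLN:900",
]
print(reference_kinds(header))  # prints: {'genomic': 0, 'transcript': 2, 'unknown': 0}
```

If nearly all references come back as transcripts, the index was built from transcript sequences: rebuild the HISAT2 index from the genome FASTA before counting with the GTF, or quantify the transcripts with a tool like Salmon.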


Thank you for your help. I appreciate it.
