I have been counting HISAT2-aligned RNA-seq reads with featureCounts, and most of my aligned reads fall into the Unassigned_Ambiguity category, meaning they overlap multiple meta-features. Is this normal with 3' mRNA sequencing? Does it necessarily mean that my reads cover multiple genes, or could it be that multiple meta-features cover the exact same genomic region?
Also, I would like to know which reads behave this way. Is there a way for featureCounts to report them? Thank you very much.
Assigned 2030063
Unassigned_Unmapped 7166618
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 1302741
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_NoFeatures 583039
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 7609437
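To put the summary above in proportion, here is a minimal Python sketch that turns the non-zero rows into fractions. The numbers are copied from the summary; the variable and function names are only illustrative, and treating "everything except Unassigned_Unmapped" as the mapped total is an assumption about how the rows should be grouped.

```python
# Counts copied from the featureCounts summary above (rows with 0 omitted).
summary = {
    "Assigned": 2030063,
    "Unassigned_Unmapped": 7166618,
    "Unassigned_MultiMapping": 1302741,
    "Unassigned_NoFeatures": 583039,
    "Unassigned_Ambiguity": 7609437,
}

total = sum(summary.values())
# Assumption: every row except Unassigned_Unmapped counts as "mapped".
mapped = total - summary["Unassigned_Unmapped"]


def fraction(category, denominator):
    """Share of `denominator` that falls into `category`."""
    return summary[category] / denominator


for name, count in summary.items():
    print(f"{name:<26}{count:>10}  {count / total:6.1%} of all rows")

print(
    "Unassigned_Ambiguity among mapped: "
    f"{fraction('Unassigned_Ambiguity', mapped):.1%}"
)
```

With these numbers, roughly two thirds of the mapped reads are ambiguous, far more than the assigned ones.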
Hi Michael,
Thank you so much for your answer. Our library should be negative-strand specific, and we can re-label our BAM to make that explicit for featureCounts. Which other parameters should I change? Also, I'm not very familiar with htseq-count: does it take overlapping features into account? Thank you so much!
Hi DVA,
You're right, featureCounts' strandedness parameter is not mentioned among the parameters listed on the command line. According to the webpage:
Thus, you should use -s 2 (reversely stranded).
In htseq-count the parameter name is the same (-s or --stranded=), and the options are no, yes, and reverse. There, too, you should use reverse.
[EDIT] The -s parameter is actually documented in the command-line help; I had simply scrolled past it in my byobu session.
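As a rough sketch of the advice above, the following Python helpers build the two command lines with the reverse-strand setting, and also request per-read assignment results from featureCounts via -R CORE, which is one way to see which reads end up as Unassigned_Ambiguity. The file names (aligned.bam, annotation.gtf, counts.txt) and the function names are hypothetical. The parser assumes that the per-read file is tab-delimited with the read name in the first column and the assignment status in the second, so please check that against the output of your Subread version.

```python
import shlex


def featurecounts_cmd(bam, gtf, out, strand=2, per_read=True):
    """Build a featureCounts command line as an argument list."""
    cmd = ["featureCounts", "-a", gtf, "-o", out, "-s", str(strand)]
    if per_read:
        # Ask for per-read assignment results in CORE format.
        cmd += ["-R", "CORE"]
    cmd.append(bam)
    return cmd


def htseq_cmd(bam, gtf, stranded="reverse"):
    """Build the corresponding htseq-count command line."""
    return ["htseq-count", "-f", "bam", "-s", stranded, bam, gtf]


def ambiguous_reads(lines):
    """Return names of reads whose status is Unassigned_Ambiguity.

    Assumes tab-delimited lines: read name, status, then further columns.
    """
    names = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == "Unassigned_Ambiguity":
            names.append(fields[0])
    return names


# Hypothetical file names, shown only to illustrate the resulting commands.
print(shlex.join(featurecounts_cmd("aligned.bam", "annotation.gtf", "counts.txt")))
print(shlex.join(htseq_cmd("aligned.bam", "annotation.gtf")))
```

The commands are only printed here, not executed; they could be passed to subprocess.run once the paths point at real files.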
Hi Michael, thank you very much for all the help. I opened another post (C: Which annotation file to use) when I realized that the choice of annotation file also makes a difference in my case. However, as the discussion has gone on, it has come to overlap more and more with this post. I'm sorry about that. Nonetheless, thanks a lot for your help.