Question

High Unassigned Ambiguity in Feature Counts

0

Entering edit mode

7.3 years ago

DVA ▴ 640

I have been analyzing HISAT2 aligned RNA sequencing reads in feature counts and it seems most of my aligned reads are falling into the Unassigned Ambiguity category, meaning they overlap multiple meta-features. Is this normal with 3' mRNA sequencing? Does this necessarily mean that my reads cover multiple genes or could it be that multiple meta-features cover the exact same genomic region?

Also, I would like to know what are the reads behaving this way - is there a way feature counts can label this out? Thank you very much.

Assigned    2030063
Unassigned_Unmapped 7166618
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 1302741
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_NoFeatures   583039
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    7609437

feature count • 7.5k views

ADD COMMENT • link updated 7.3 years ago by michael.ante ★ 4.0k • written 7.3 years ago by DVA ▴ 640

score 3 · Accepted Answer · 2018-04-30

3

Entering edit mode

7.3 years ago

michael.ante ★ 4.0k

Hi DVA,

In many genomes, you have overlapping genes on different strands. Without correct orientation, the reads will not be assigned without ambiguity.

Is your 3' seq protocol strand specific? If so, you should run featureCount with the corresponding parameter. You can run e.g. infer_experiment from RSeQC to check the strandedness.

You can also re-run your analysis with htseq-count.

Cheers,

Michael

ADD COMMENT • link 7.3 years ago by michael.ante ★ 4.0k

0

Entering edit mode

Hi Michael,

Thank you so much for your answer. Our library should be negative strand specific and we can re-label our bam to emphasize that for feature count... What other parameters should I alter please? Also, I'm not very familiar with htseq-count, does it considers the overlapped features? Thank you so much!

ADD REPLY • link 7.3 years ago by DVA ▴ 640

1

Entering edit mode

Hi DVA,

You're right featurecounts' starndedness parameter is not mentioned in the listed parameter on the CL. According to the webpage :

Perform strand-specific read counting (use '-s 2' if reversely stranded):

featureCounts -s 1 -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam

Thus, you should use -s 2.

In htseq-count the parameter name is the same (-s or --stranded=) and the options are no, yes, and reverse. Here you also should use reverse.

[EDIT] The -s parameter is actually documented in the command line help. I was wrongly scrolling in my byobu session

ADD REPLY • link 7.3 years ago by michael.ante ★ 4.0k

0

Entering edit mode

Hi Michael, thank you very much for all the help. I opened another post when I also realized the annotation files make a difference for my case. C: Which annotation file to use However, as the discussion goes, there are more overlap with this post. I'm sorry about that. Nonetheless, thanks a lot for your help.

ADD REPLY • link 7.3 years ago by DVA ▴ 640