I am running Qualimap bamqc on some human RNA-seq samples and I am seeing very high N content in my reads. I have done proper QC of my data, so I am not sure why such high N content is being shown in the BAM. Can anyone advise why I am seeing this high N content and how to interpret it?
How could you have 435% Ns? The percentages for A, T, G, and C already add up to almost 100%.
It seems like something is wrong with these numbers.
Yes, I also found it very strange. I am not sure whether it is an error in the tool or something else.
What did your QC include?
Generally there should be no Ns in data you receive from sequencing nowadays, since the technical aspects are well worked out. Ns indicate some sort of issue (hardware, software, or libraries) with the run. ~12 billion Ns seems rather high.
I have trimmed low-quality bases and adapter content, and then checked read quality with FastQC to make sure all the reads used downstream for alignment are of good quality.
This is my FastQC report for this sample.
FastQC has another plot that shows the base composition of the reads (Per base N content), which would show you whether you have Ns.
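If you want to double-check the FastQC plot yourself, a minimal sketch like the following counts literal N bases per read position straight from the FASTQ. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the script name `fastq_n.py` is hypothetical.

```python
import gzip
import sys
from collections import Counter


def n_content_per_position(fastq_lines):
    """Return (N count per position, read count per position, total Ns, total bases)."""
    n_at = Counter()
    reads_at = Counter()
    total_n = 0
    total_bases = 0
    for i, line in enumerate(fastq_lines):
        # Assumes 4-line records: the sequence is the 2nd line of each record.
        if i % 4 != 1:
            continue
        seq = line.strip().upper()
        total_bases += len(seq)
        for pos, base in enumerate(seq):
            reads_at[pos] += 1
            if base == "N":
                n_at[pos] += 1
                total_n += 1
    return n_at, reads_at, total_n, total_bases


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_at, reads_at, total_n, total_bases = n_content_per_position(fh)
    pct = 100 * total_n / max(total_bases, 1)
    print(f"N bases: {total_n} of {total_bases} ({pct:.4f}%)")
    # Percentage of reads with an N at each (1-based) position.
    for pos in sorted(reads_at):
        print(pos + 1, f"{100 * n_at[pos] / reads_at[pos]:.3f}%")
```

Usage: `python fastq_n.py sample_trimmed.fastq.gz`. If this reports close to 0% Ns, the Ns Qualimap reports are not literal bases in your reads.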
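There are no significant Ns in the reads (the plot is at the top), so the Ns in Qualimap must be coming from the CIGAR strings, as you predicted. The CIGAR N operation marks a skipped region of the reference, such as an intron spanned by a spliced RNA-seq alignment.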
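One way to test the CIGAR explanation is to sum the CIGAR operation lengths over the BAM and compare the skipped (N) total with the aligned (M/=/X) total. Because a single spliced read such as `50M2000N50M` skips 2000 reference bases while aligning only 100, the skipped total can easily exceed the aligned total, which would be consistent with a figure like 435%. This is a minimal sketch, assuming SAM text piped in from `samtools view`; it does not reproduce Qualimap's own calculation, and the script name `cigar_n.py` is hypothetical.

```python
import re
import sys
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_n_stats(sam_lines):
    """Sum CIGAR op lengths and count literal N bases in SEQ for mapped records."""
    ops = Counter()
    seq_n = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[5] == "*":  # malformed line or no CIGAR
            continue
        for length, op in CIGAR_OP.findall(fields[5]):
            ops[op] += int(length)
        seq_n += fields[9].upper().count("N")  # literal Ns in the read sequence
    return ops, seq_n


def skipped_to_aligned_pct(ops):
    """CIGAR N (skipped reference) bases as a percentage of aligned bases."""
    aligned = ops["M"] + ops["="] + ops["X"]
    return 100 * ops["N"] / aligned if aligned else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    src = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    ops, seq_n = alignment_n_stats(src)
    print("literal N bases in SEQ:", seq_n)
    print("CIGAR N (skipped) bases:", ops["N"])
    print(f"skipped / aligned: {skipped_to_aligned_pct(ops):.1f}%")
```

Usage: `samtools view -F 0x904 sample.bam | python cigar_n.py -` (the `-F 0x904` filter drops unmapped, secondary, and supplementary records). A near-zero literal N count combined with a large CIGAR N total would support the idea that Qualimap's N figure reflects spliced alignments rather than ambiguous bases.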
Thank you so much for the help.