Question

Qualimap bamqc reports very high N%


15 months ago

Priyanka


I am running Qualimap bamqc on some human RNA-seq samples, and it reports a very high N content in my reads. I have done proper QC of my data, so I am not sure why the BAM shows such a high N content. Can anyone advise why I am seeing this and how to interpret it?

This is my chromosome-wise coverage.

15 months ago by Priyanka

How could you have 435% Ns? The percentages for A, T, G and C already add up to almost 100%.

It seems like something is wrong with these numbers.
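
To see how a figure above 100% could arise without any ambiguous bases at all, here is a minimal sketch of one *hypothetical* metric: the total length of N operations in the CIGAR strings divided by the number of aligned read bases. Qualimap's actual formula is not given in this thread; `hypothetical_n_percent` and the example CIGARs are made up purely for illustration.

```python
import re

# CIGAR strings are runs of <length><operation>; "N" = skipped reference region.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def hypothetical_n_percent(cigars):
    """Hypothetical metric (NOT Qualimap's documented formula): total length
    of N operations divided by read bases aligned with M/=/X, as a percentage."""
    skipped = 0
    aligned = 0
    for cigar in cigars:
        for length, op in CIGAR_OP.findall(cigar):
            if op == "N":
                skipped += int(length)
            elif op in "M=X":
                aligned += int(length)
    return 100.0 * skipped / aligned if aligned else 0.0


# Two spliced 100-base reads and one unspliced read (made-up CIGARs).
reads = ["50M300N50M", "60M100N40M", "100M"]
print(f"{hypothetical_n_percent(reads):.1f}%")  # prints 133.3%
```

Because intron lengths are unrelated to read length, any ratio of this kind can easily exceed 100% while the A/T/G/C percentages still sum to about 100%.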

15 months ago by Istvan Albert

Yes, I also found it very strange. I am not sure whether it is an error in the tool or something else.

15 months ago by Priyanka

I have done proper qc of my data

What did that include?

Generally there should be no Ns in data you receive from sequencing nowadays, since the technical aspects are well worked out. Ns indicate some sort of issue (hardware, software or libraries) with the run. ~12 billion Ns seems rather high.

15 months ago by GenoMax

I trimmed low-quality bases and adapter content, then checked read quality with FastQC to make sure all the reads used downstream for alignment were of good quality.


This is my FastQC report for this sample.

15 months ago by Priyanka

FastQC has another plot that shows the sequence composition of the reads; it would show you whether you have Ns.
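
A quick way to double-check this outside FastQC is to compute the fraction of reads with an N at each position, which is what FastQC's "Per base N content" module plots. This is a minimal sketch assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); `per_position_n_fraction` is a hypothetical helper, not FastQC's own code.

```python
from collections import defaultdict
from io import StringIO


def per_position_n_fraction(fastq_handle):
    """Fraction of reads with an 'N' at each 1-based position.

    Minimal 4-line FASTQ reader; assumes no wrapped sequence lines.
    """
    n_counts = defaultdict(int)
    totals = defaultdict(int)
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of every 4-line record
            continue
        for pos, base in enumerate(line.strip().upper(), start=1):
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / totals[pos] for pos in sorted(totals)}


# Two made-up reads; only the first has an N (at position 4).
example = StringIO(
    "@r1\nACGN\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
)
print(per_position_n_fraction(example))  # {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.5}
```

For a gzipped file, `gzip.open(path, "rt")` can be passed as the handle instead of the `StringIO` object.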

15 months ago by Istvan Albert

There are no significant Ns (the plot is at the top), so the Ns in Qualimap must be coming from the CIGAR string, as you predicted.

15 months ago by GenoMax

Thank you so much for the help.

15 months ago by Priyanka

Answer 1 · 2023-08-25

15 months ago

Istvan Albert

Perhaps this Qualimap tool counts the Ns in the CIGAR string.

Depending on the aligner, the CIGAR string can contain Ns that indicate a spliced alignment (the aligner puts an N operation over the skipped intronic region).

In that case the N is pretty much meaningless in the context of quality.
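
The CIGAR string is a run of `<length><operation>` pairs as defined in the SAM specification, where `N` means "skipped region from the reference". A minimal parsing sketch (the helper names and the example CIGAR are hypothetical) shows how a spliced read, such as one reported by STAR, carries a large N operation even though none of its bases are ambiguous:

```python
import re

CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")  # whole-string validity check
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")    # one <length><op> pair


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return []
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def intron_skips(cigar):
    """Lengths of N operations: reference bases skipped by a spliced alignment."""
    return [n for n, op in parse_cigar(cigar) if op == "N"]


def reference_span(cigar):
    """Reference bases spanned by the alignment (M, D, N, = and X consume the reference)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


cigar = "35M1200N65M"  # made-up spliced 100-base read
print(parse_cigar(cigar))     # [(35, 'M'), (1200, 'N'), (65, 'M')]
print(intron_skips(cigar))    # [1200]
print(reference_span(cigar))  # 1300
```

Only 100 read bases are present here; the other 1200 positions are intron that the read jumps over, so counting them as "N bases" says nothing about sequencing quality.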

15 months ago by Istvan Albert

I used the STAR aligner for mapping. Does that have anything to do with this N%?

15 months ago by Priyanka

As I mentioned before, the letter N can stand for two different things:

an N in the FASTQ file (an ambiguous base call),
or an N in the CIGAR string, which marks a skipped intronic region.

The first kind of N is a problem; the second kind is not.

You most likely have the second kind of Ns.
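
To tell the two kinds of N apart for a given read, you can look at the SEQ and CIGAR fields separately in the text SAM output (for example from `samtools view`). A minimal sketch, assuming one tab-separated SAM alignment line with the 11 mandatory fields; `classify_ns` and the record below are made up for illustration:

```python
import re


def classify_ns(sam_line):
    """Return (ambiguous_bases, skipped_ref_bases) for one SAM alignment line.

    ambiguous_bases: 'N' calls in SEQ (column 10), the kind that signals a
    sequencing or quality problem.
    skipped_ref_bases: total length of N operations in CIGAR (column 6), which
    mark spliced alignments across introns and are expected in RNA-seq.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    ambiguous = 0 if seq == "*" else seq.upper().count("N")
    skipped = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op == "N"
    )
    return ambiguous, skipped


# Made-up spliced record: 4 bases, a 500 bp intron, then 4 more bases.
record = "\t".join([
    "read1", "0", "chr1", "10000", "255", "4M500N4M",
    "*", "0", "0", "ACGTACGT", "IIIIIIII",
])
print(classify_ns(record))  # (0, 500)
```

A clean spliced read gives `(0, some large number)`: no ambiguous bases, only intron skips.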

15 months ago by Istvan Albert

Thank you so much for the help. It is whole-transcriptome bulk RNA-seq, so it would not be unexpected to have intronic reads.

15 months ago by Priyanka