Hey, guys! I'm processing some whole-genome sequencing data, and some of the reports show a "Per base N content" failure. How does this failure affect SNV calling, SV calling, and downstream analysis? And what should I do to pin down the exact cause? Thank you very much!
What's the nature of this data? It looks like it was processed already?
Your N content spikes specifically around 120 bp, with a corresponding decrease in sequence quality at that position. If this isn't expected, you can take a look at the reads to see what is at that position.
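If it helps, here is a minimal Python sketch for doing that check by hand. It assumes standard 4-line FASTQ records, and the function names are made up for illustration, so treat it as a starting point rather than a finished tool.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def iter_sequences(lines):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def n_fraction_by_position(lines):
    """Return {1-based position: fraction of reads with N at that position}."""
    n_counts = Counter()
    totals = Counter()
    for seq in iter_sequences(lines):
        for pos, base in enumerate(seq, start=1):
            totals[pos] += 1
            if base.upper() == "N":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / totals[pos] for pos in sorted(totals)}


def reads_with_n_at(lines, position, limit=5):
    """Collect up to `limit` read sequences that have an N at `position`."""
    hits = []
    for seq in iter_sequences(lines):
        if len(seq) >= position and seq[position - 1].upper() == "N":
            hits.append(seq)
            if len(hits) >= limit:
                break
    return hits
```

To use it, open a file with `open_fastq(...)` and pass the handle to `n_fraction_by_position` to see where the N fraction peaks, then reopen the file and call `reads_with_n_at(handle, 120)` to eyeball a few of the affected reads (120 being the approximate position from the plot).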
What does the rest of the QC look like?
Thank you very much for your reply! You are right. The data was processed with fastp. But after comparing the FastQC results before and after fastp, I didn't find any differences.
Here are all the FastQC results before fastp. Following the reply below, I will show the graphs with MultiQC. There are 15 samples from the same batch of sequencing data, and they all have the same issue.
Here are all the FastQC results after fastp.
Not all samples are affected. There are two clear groups, affected and unaffected, in both the mean quality scores and the per-sequence quality scores.
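One rough way to list which samples fall into which group is to read the `summary.txt` that FastQC writes for each sample (tab-separated: status, module name, file name). The sketch below assumes you have concatenated those files into one iterable of lines; the function name is hypothetical.

```python
from collections import defaultdict


def group_by_module_status(summary_lines, module="Per base N content"):
    """Map each status (PASS/WARN/FAIL) to the files reporting it for `module`."""
    groups = defaultdict(list)
    for line in summary_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        status, name, filename = parts
        if name == module:
            groups[status].append(filename)
    return dict(groups)
```

Passing a different `module` value, such as "Per sequence quality scores", gives the same split for the other plots, which makes it easy to check whether the same samples are flagged each time.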