24 days ago
michelafrancesconi9
I have paired-end files from a shotgun metagenomics analysis (251 bp reads). Before starting with MetaPhlAn, I ran FastQC and FastQ Screen to check the quality of my files. I used KneadData to remove the human reads, and now that is OK. I also noticed that all my files fail the “Per Base Sequence Content” check. Is this a problem? (All the other checks are OK.)
Do I also need to remove overlapping reads between R1 and R2? How can I do it?
Thanks, Michela
For the “Per Base Sequence Content”, it is really hard to pinpoint what the actual problem is without looking at the graph itself. Is it localised within a certain region of the read (the leading or trailing end)? Also, I don't think it is harmful to keep overlapping R1/R2 reads; the overlap just tells you that your fragments are rather short.
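To make the "short fragments" point concrete, here is a minimal Python sketch of how an R1/R2 overlap relates to fragment length. It is only an illustration: the function names and the `min_overlap` default are my own, it uses exact matching only, and it assumes the fragment is at least as long as each read. Dedicated read-merging tools tolerate mismatches and use base qualities, so treat this as a way to see the idea rather than something to run on real data.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def overlap_length(r1, r2, min_overlap=10):
    """Length of the longest suffix of R1 that exactly matches a prefix of
    the reverse-complemented R2, or 0 if none reaches min_overlap.
    min_overlap is an arbitrary illustrative default."""
    rc2 = reverse_complement(r2)
    for n in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        if r1[-n:] == rc2[:n]:
            return n
    return 0


def fragment_length(r1, r2, min_overlap=10):
    """Implied fragment length when the mates overlap, else None."""
    n = overlap_length(r1, r2, min_overlap)
    return len(r1) + len(r2) - n if n else None
```

With 2 x 251 bp reads, any fragment shorter than 502 bp gives overlapping mates, so overlap on its own is expected for many libraries and is not a sign of a problem.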
Thank you for the answer! The problem is always located in the first 10-15 bp.
Please see: https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/
That bias in the first 10-15 bases is not a problem in terms of alignments etc. "Failing" a test in FastQC is not always a deal breaker. You need to consider the results in the context of your experiment.
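If you want to check for yourself where the bias sits, a rough Python sketch like the one below tabulates base fractions per position and lists the positions where A and T, or G and C, differ by more than a threshold. The function names are mine, and the 0.20 default reflects my understanding of the FastQC documentation (failure when the A/T or G/C difference exceeds 20% at any position), so treat that number as an assumption. This is not how FastQC is implemented, just the same idea in a few lines.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4) from FASTQ lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def per_base_fractions(reads, n_positions=20):
    """Fraction of A, C, G, T at each of the first n_positions bases."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in reads:
        for i, base in enumerate(seq[:n_positions].upper()):
            if base in "ACGT":  # ignore N and other symbols
                counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


def biased_positions(fractions, threshold=0.20):
    """1-based positions where |A-T| or |G-C| exceeds the threshold.
    The 0.20 default is an assumption based on FastQC's documented rule."""
    return [
        i + 1
        for i, f in enumerate(fractions)
        if abs(f["A"] - f["T"]) > threshold or abs(f["G"] - f["C"]) > threshold
    ]
```

Usage would be along the lines of `biased_positions(per_base_fractions(fastq_sequences(open("sample_R1.fastq"))))`, where the file name is just a placeholder. If only positions in roughly the 1-15 range come back, that matches the random-priming bias described in the linked article.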