Do I have to remove overlapping reads from paired-end data before MetaPhlAn?

24 days ago

michelafrancesconi9 ▴ 20

I have paired-end files from a shotgun metagenomics analysis (251 bp reads). Before starting with MetaPhlAn, I ran FastQC and fastq_screen to check the quality of my files. I used KneadData to remove the human reads, and that now looks fine. I also noticed that all my files fail the “Per Base Sequence Content” check. Is this a problem? (All the other checks are OK.)

Do I also have to remove overlapping reads between R1 and R2? If so, how can I do it?

Thanks, Michela

overlapping paired_ends metaphaln • 309 views

updated 23 days ago by GenoMax 148k • written 24 days ago by michelafrancesconi9 ▴ 20

For the “Per Base Sequence Content” failure, it is really hard to pinpoint the actual problem without looking at the graph itself. Is it localised within a certain region of the read (leading or trailing)? Also, I don't think it is harmful to keep overlapping R1/R2 reads; the overlap just tells you that your fragments are rather short.
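To put a number on "rather short", here is a minimal sketch of the arithmetic. It assumes both mates are sequenced to the full 251 bp (or to the end of the fragment if it is shorter) with no adapter or quality trimming. `overlap_length` is a hypothetical helper written for illustration, not part of any tool mentioned in this thread.

```python
def overlap_length(fragment_len: int, read_len: int = 251) -> int:
    """Number of bases covered by both R1 and R2 for a single fragment.

    Assumption: each mate is read_len long, or as long as the fragment
    if the fragment is shorter than the read. No trimming is modelled.
    """
    r1 = min(read_len, fragment_len)
    r2 = min(read_len, fragment_len)
    # The mates start at opposite ends, so they overlap only when their
    # combined length exceeds the fragment length.
    return max(0, r1 + r2 - fragment_len)


# Illustrative fragment lengths (made up, not taken from real data).
for frag in (600, 502, 400, 251, 150):
    print(f"fragment {frag} bp -> overlap {overlap_length(frag)} bp")
```

Under these assumptions, 2 x 251 bp mates overlap whenever the fragment is shorter than 502 bp, and a fragment of 251 bp or less is covered entirely by both mates.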

24 days ago by biofalconch ★ 1.3k

Thank you for the answer! The problem is always located in the first 10-15 bp.

24 days ago by michelafrancesconi9 ▴ 20

Please see: https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/

That bias in the first 10-15 bases is not a problem in terms of alignments etc. "Failing" a test in FastQC is not always a deal breaker. You need to consider the results in the context of your experiment.
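If you want to see for yourself how far into the reads the bias extends, you can tabulate base composition per position. The sketch below is not FastQC's own calculation; `per_base_content` and `max_spread` are hypothetical helpers, the input is assumed to be a plain iterable of read sequences (FASTQ parsing is left out), and the reads shown are made-up toy data.

```python
from collections import Counter


def per_base_content(reads, n_positions=20):
    """Fraction of A/C/G/T at each of the first n_positions of the reads.

    Bases other than A, C, G, T (for example N) are ignored when
    computing the fractions.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in reads:
        for i, base in enumerate(seq[:n_positions].upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        table.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return table


def max_spread(row):
    """Largest difference between any two base fractions at one position."""
    return max(row.values()) - min(row.values())


# Toy reads: position 0 is always A (strongly biased), position 3 is even.
toy_reads = ["ACGT", "ACGA", "ACGC", "ACGG"]
for pos, row in enumerate(per_base_content(toy_reads, n_positions=4), start=1):
    print(pos, row, round(max_spread(row), 2))
```

A large spread confined to the first positions, flattening out afterwards, would match the pattern described in the linked article.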

23 days ago by GenoMax 148k
