Duplicate reads due to multiple db in kneaddata

8 weeks ago

noorehuma • 0

I'm using KneadData to perform quality control on my FASTQ files. My samples are contaminated with sequences from multiple genomes, so I passed multiple reference-db arguments (three genomes) in my kneaddata command. As a result I got three different clean outputs, which I concatenated. After running kraken2 I saw that there were 200% more recovered reads than raw reads.

One possible explanation is that concatenating the clean outputs from the three genomes duplicated the non-contaminated reads, because the same non-contaminated reads appear in each of the three clean outputs.
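One way to check this explanation is to compare the total number of records in the concatenated file with the number of distinct read IDs. The following is a minimal sketch, assuming standard 4-line FASTQ records and that the read ID is the first whitespace-delimited token of the header line; the function names are hypothetical, not part of KneadData or kraken2.

```python
from itertools import islice


def read_ids(lines):
    """Yield the read ID of each 4-line FASTQ record in `lines`."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header = record[0].strip()
        # Assumption: the ID is the first token after the leading '@'.
        yield header[1:].split()[0]


def count_total_and_unique(lines):
    """Return (total records, distinct read IDs) for FASTQ lines."""
    total = 0
    seen = set()
    for rid in read_ids(lines):
        total += 1
        seen.add(rid)
    return total, len(seen)


# Toy data: read r1 passed all three filters, read r2 passed two of them,
# so the concatenated file holds five records for two distinct reads.
toy = []
for rid in ["r1", "r2", "r1", "r2", "r1"]:
    toy += [f"@{rid} extra\n", "ACGT\n", "+\n", "IIII\n"]

total, unique = count_total_and_unique(toy)
print(f"{total} records, {unique} distinct read IDs")
```

For a real file, the same function can be fed an open handle, for example `gzip.open(path, "rt")` for a gzipped FASTQ. A total far above the number of distinct IDs would be consistent with the duplication described above.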

Do you have any pointers on how I should deal with contamination from multiple genomes in my raw samples, or on how to remove duplicate reads?

Thank you so much.

multiple duplicate genomes kneaddata reads • 266 views

updated 8 weeks ago by GenoMax 148k • written 8 weeks ago by noorehuma • 0

> how to remove duplicate reads.

dedupe.sh from the BBMap suite should help there. Something like:

dedupe.sh -Xmx5g in=input.fq.gz out=output.fq.gz
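Note that dedupe.sh matches reads by sequence, and that deduplicating the concatenated file keeps every read that survived at least one filter. If each clean output was filtered against only one genome (an assumption based on the question), a read from genome A would still be present in the outputs for genomes B and C. An alternative is to keep only the reads whose ID appears in all of the clean outputs. The following is a rough sketch of that idea, assuming single-end, 4-line FASTQ records with identical read IDs across the files; the function names are hypothetical, and paired-end data would need the mates handled together.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (read_id, record) for each 4-line FASTQ record in `lines`."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if len(rec) < 4:
            return  # end of input (or a truncated final record)
        # Assumption: the ID is the first token after the leading '@'.
        yield rec[0][1:].split()[0], rec


def keep_reads_in_all(first, *others):
    """Yield records from `first` whose ID occurs in every input, once each."""
    shared = None
    for lines in others:
        ids = {rid for rid, _ in fastq_records(lines)}
        shared = ids if shared is None else shared & ids
    emitted = set()
    for rid, rec in fastq_records(first):
        in_all = shared is None or rid in shared
        if in_all and rid not in emitted:
            emitted.add(rid)
            yield rec
```

Each input is read only once, so open file handles can be passed directly, one per clean output, and the yielded records written out with `"\n".join(rec)`. Only the read IDs of the other files are held in memory, not their sequences.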

8 weeks ago by GenoMax 148k