Duplicate reads due to multiple db in kneaddata
8 weeks ago
noorehuma • 0

I'm using KneadData for quality control of my FASTQ files. My samples are contaminated with several genomes, so I passed multiple --reference-db options in my kneaddata command (3 genomes). As a result I got 3 different clean outputs, which I concatenated. After running Kraken2, I saw that my recovered reads were 200% more than the raw reads.

One possible explanation is that each of the three clean outputs contains the reads that did not match that genome, so concatenating them duplicated the uncontaminated reads.

Do you have any pointers on how I should deal with contamination from multiple genomes in my raw samples, or on how to remove duplicate reads?

Thank you so much.

multiple duplicate genomes kneaddata reads • 266 views

how to remove duplicate reads.

dedupe.sh from the BBMap suite should help there. Something like:

dedupe.sh -Xmx5g in=input.fq.gz out=output.fq.gz 
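
One caveat, assuming the three clean files are per-genome outputs as the question suggests: deduplicating the concatenation keeps every read that was clean against at least one genome, so reads flagged by only one or two of the references would still be in the result. A rough alternative is to keep only the reads whose ID appears in all three clean files. Below is a minimal Python sketch of that idea. The function names `read_fastq` and `reads_clean_in_all` are made up for illustration, it assumes plain 4-line FASTQ records with read IDs that are unchanged between the files, and it holds everything in memory, so it is only meant to show the logic.

```python
from itertools import islice

def read_fastq(lines):
    """Yield (read_id, record) pairs from an iterable of FASTQ lines.

    Assumes strict 4-line records (header, sequence, '+', qualities).
    """
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if len(rec) < 4:
            return
        # Read ID = header up to the first whitespace, without the leading '@'
        read_id = rec[0].strip().split()[0][1:]
        yield read_id, [line.rstrip("\n") for line in rec]

def reads_clean_in_all(per_db_outputs):
    """Return records whose ID is present in every per-genome clean output.

    Each read is returned once, in the order of the first input.
    """
    outputs = [dict(read_fastq(lines)) for lines in per_db_outputs]
    shared = set(outputs[0]).intersection(*(set(o) for o in outputs[1:]))
    return [rec for rid, rec in outputs[0].items() if rid in shared]

def fq(*ids):
    """Build a toy FASTQ (list of lines) for the given read IDs."""
    return [line for i in ids for line in (f"@{i}\n", "ACGT\n", "+\n", "IIII\n")]

# Toy example: 4 raw reads (r1..r4); each genome flags a different read.
clean_a = fq("r1", "r2", "r3")   # r4 matched genome A
clean_b = fq("r1", "r2", "r4")   # r3 matched genome B
clean_c = fq("r1", "r3", "r4")   # r2 matched genome C

concatenated = clean_a + clean_b + clean_c
print(len(concatenated) // 4)     # records after concatenation
for rec in reads_clean_in_all([clean_a, clean_b, clean_c]):
    print("\n".join(rec))
```

In the toy data the concatenation holds 9 records although there were only 4 raw reads, and only r1 is clean against all three genomes. For real files, pass open file handles (for example from `gzip.open(path, "rt")`) instead of the lists. Paired-end data would need the same filtering applied consistently to both mates.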