HarperReed, 3 days ago
Hello everyone, should I use the dedup option when filtering my reads with fastp, given that my FASTQ files were generated with Nanopore sequencing technology?
-D, --dedup enable deduplication to drop the duplicated reads/pairs
Thank you in advance.
Do you expect sequence duplication because of the type of data (are these amplicon reads)? Depending on the length of your reads, this could require significant memory/compute resources. Finally, why do you want to do this?
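To make the memory point concrete, here is a minimal Python sketch of exact-sequence deduplication. It assumes that "duplicate" means "identical sequence", which is a simplification and not fastp's actual implementation. Every distinct read adds an entry to the `seen` set, so memory grows with the number of distinct reads, and with read length if raw sequences are stored. `read_fastq` and `dedup_exact` are hypothetical helpers written for illustration.

```python
import hashlib
from typing import Iterator, List, Tuple

# (header, sequence, plus line, quality); assumes plain 4-line FASTQ records
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from a list of lines, ignoring a trailing partial record."""
    for i in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, plus, qual = (l.rstrip("\n") for l in lines[i:i + 4])
        yield header, seq, plus, qual


def dedup_exact(records, use_digest: bool = True):
    """Keep the first record seen for each distinct sequence.

    Storing raw sequences costs memory proportional to read length
    (kilobases for long amplicons); storing a fixed-size SHA-1 digest
    keeps each entry small but still needs one entry per distinct read.
    """
    seen = set()
    kept = []
    for rec in records:
        seq = rec[1]
        key = hashlib.sha1(seq.encode()).digest() if use_digest else seq
        if key in seen:
            continue  # identical sequence already kept: drop this read
        seen.add(key)
        kept.append(rec)
    return kept, len(seen)


reads = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "ACGTACGA", "+", "IIIIIIII",
]
kept, n_unique = dedup_exact(read_fastq(reads))
print([h for h, *_ in kept], n_unique)  # ['@r1', '@r3'] 2
```

Under exact matching, a single base difference (as in `r3`) makes a read unique. With error-prone long reads, exact duplicates may therefore be uncommon, even when many reads come from the same amplicon.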
It's amplicon sequencing of 1.5 to 3 kb amplicons. The primary aim is to detect variants.
You may want to try a workflow for this purpose provided by Nanopore: https://github.com/epi2me-labs/wf-amplicon
Thank you!
But in my case, should I use this option or not? I don't really have time to test a new tool, especially since I'm required to use specific tools.
If the tools you plan to use include a step that marks duplicates after alignment, then you don't need to do this upfront.
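As a rough illustration of what alignment-based duplicate marking does, here is a simplified Python sketch. It groups reads by reference, 5' alignment position and strand, then flags every read in a group except the highest-quality one. It is loosely modeled on the idea behind tools such as Picard MarkDuplicates, not on their actual logic. The `Alignment` fields and `mark_duplicates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Alignment:
    """Simplified stand-in for an aligned read (hypothetical fields)."""
    name: str
    chrom: str
    start: int        # leftmost reference position of the alignment
    end: int          # rightmost reference position of the alignment
    reverse: bool     # True if aligned to the reverse strand
    mean_qual: float  # summary base quality used to pick the kept read


def mark_duplicates(alns: List[Alignment]) -> Dict[str, bool]:
    """Return {read name: is_duplicate} using (chrom, 5' position, strand) groups."""
    groups: Dict[Tuple[str, int, bool], List[Alignment]] = {}
    for a in alns:
        # The 5' end is the leftmost position on the forward strand
        # and the rightmost position on the reverse strand.
        five_prime = a.end if a.reverse else a.start
        groups.setdefault((a.chrom, five_prime, a.reverse), []).append(a)

    is_dup: Dict[str, bool] = {}
    for members in groups.values():
        best = max(members, key=lambda a: a.mean_qual)  # representative read
        for a in members:
            is_dup[a.name] = a is not best
    return is_dup


alns = [
    Alignment("r1", "amp1", 100, 1600, False, 12.0),
    Alignment("r2", "amp1", 100, 1598, False, 14.5),
    Alignment("r3", "amp1", 105, 1600, True, 11.0),
]
print(mark_duplicates(alns))  # {'r1': True, 'r2': False, 'r3': False}
```

Amplicon reads start and end at the primer sites by design, so many of them share the same 5' position. A purely position-based rule like this one would therefore flag most reads of an amplicon as duplicates, as happens with `r1` and `r2` here.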
As I said above, depending on the length of your amplicons (and the amount of data), fastp may need significant compute resources to do the deduplication. You could give it a try and see if it works with Nanopore data. BTW, there is a fastplong version meant for long-read data, but it does not have the -D option implemented as of now (https://github.com/OpenGene/fastplong?tab=readme-ov-file#all-options ).