2.9 years ago
nishimalhotra2612
Does fastp remove duplicated reads or not? I have been trying to remove duplicate reads using -D or --dedup, but it's not working. Can someone tell me if this option is still available?
That does not tell us in what way it is not working. How did you determine that?
As long as you are using v0.22 or above, that feature should be available: https://github.com/OpenGene/fastp#deduplication
I used this command:
fastp -i SRR13684098_1.fastq.gz -I SRR13684098_2.fastq.gz -o 098_1.fastq.gz -O 098_2.fastq.gz -D
and it reports: undefined short option: -D
As far as I know, this is how features are usually enabled, but sorry, maybe I am wrong.
What is your version of fastp? Can you see the -D option in fastp --help?
fastp 0.23.2
No, I can't see it. I think it's the latest version, and even on GitHub there is an option for removing duplicates.
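As a quick sanity check, a version string like the "fastp 0.23.2" reported here can be compared against the v0.22 minimum quoted earlier in the thread. This is a minimal Python sketch; the function names and the MIN_DEDUP_VERSION constant are hypothetical, and it assumes the version is printed in the "fastp X.Y.Z" form shown above.

```python
import re
from typing import Tuple

# Minimum version quoted in the thread for -D/--dedup support
# (taken from the thread, not verified against fastp's changelog here).
MIN_DEDUP_VERSION = (0, 22, 0)


def parse_fastp_version(text: str) -> Tuple[int, ...]:
    """Extract a (major, minor, patch) tuple from a string such as 'fastp 0.23.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_dedup(version_text: str) -> bool:
    """True if the reported version is at least MIN_DEDUP_VERSION (tuple comparison)."""
    return parse_fastp_version(version_text) >= MIN_DEDUP_VERSION


print(supports_dedup("fastp 0.23.2"))  # True
print(supports_dedup("fastp 0.20.1"))  # False
```

A passing check only says that the release number is new enough; it does not inspect which fastp binary is actually being run, so looking for -D in the fastp --help output remains the direct test.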
Use clumpify.sh then: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates.
Hey, thanks for the suggestion, but I figured it out: since it's RNA-seq data, there is no need to remove duplicates, as I might lose some important data.
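The trade-off behind that decision can be illustrated with a minimal sketch of sequence-only deduplication of read pairs. This is not how fastp or clumpify.sh are implemented; it simply keeps the first occurrence of each identical (R1, R2) sequence pair in memory, and the names used are hypothetical. In RNA-seq, highly expressed transcripts naturally produce many fragments with identical sequences, so a filter that looks only at bases can discard genuine reads.

```python
from typing import Iterable, Iterator, Set, Tuple

# A read pair is represented here as (read1_sequence, read2_sequence).
ReadPair = Tuple[str, str]


def dedup_exact_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each pair the first time its exact (R1, R2) sequences are seen.

    Pairs are compared purely on their bases, so two independent fragments
    that happen to have identical sequences are also collapsed.
    """
    seen: Set[ReadPair] = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


reads = [
    ("ACGTACGT", "TTGCAACG"),  # fragment from a highly expressed transcript
    ("ACGTACGT", "TTGCAACG"),  # identical pair: dropped even if it is a real molecule
    ("GGCATTAC", "CCATGGTA"),
]
print(list(dedup_exact_pairs(reads)))
# [('ACGTACGT', 'TTGCAACG'), ('GGCATTAC', 'CCATGGTA')]
```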
Does clumpify.sh remove duplicates based on UMIs, i.e., keep a consensus or the best sequence from a group of sequences with the same UMI?
No.
clumpify.sh only uses the read sequence. If your UMI is still part of that sequence, it will dedupe those reads, but you should not count on it using the UMI specifically.
If you want to use the UMI specifically, try umi-tools or fastp.
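To make the distinction concrete, here is a minimal sketch of UMI-aware collapsing, assuming the UMI has already been extracted from each read into its own field (the Read class and the function names are hypothetical). It groups reads by exact UMI and keeps the read with the highest mean Phred quality. The last two example reads have identical sequences but different UMIs, so both survive, whereas a sequence-only filter would collapse them. Dedicated tools such as UMI-tools do considerably more, for example grouping by alignment position and tolerating sequencing errors in the UMI, so treat this only as an illustration of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Read:
    umi: str       # hypothetical: UMI already moved out of the read sequence
    sequence: str
    quality: str   # Phred+33 encoded quality string, same length as sequence


def mean_phred(quality: str) -> float:
    """Mean Phred score of a Phred+33 quality string."""
    return sum(ord(ch) - 33 for ch in quality) / len(quality)


def best_read_per_umi(reads: Iterable[Read]) -> List[Read]:
    """Group reads by exact UMI and keep the highest mean-quality read per group."""
    groups: Dict[str, List[Read]] = defaultdict(list)
    for read in reads:
        groups[read.umi].append(read)
    # max() returns the first read on ties; groups keep first-seen UMI order.
    return [max(group, key=lambda r: mean_phred(r.quality)) for group in groups.values()]


reads = [
    Read("AACCGG", "ACGTACGT", "IIIIIIII"),  # mean Phred 40
    Read("AACCGG", "ACGTACGA", "IIII####"),  # same UMI, mean Phred 21
    Read("TTGGCA", "ACGTACGT", "IIIIIIII"),  # same sequence as the first, different UMI
]
for read in best_read_per_umi(reads):
    print(read.umi, read.sequence)
# AACCGG ACGTACGT
# TTGGCA ACGTACGT
```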