Hello to the Galaxy community,
I was wondering what the quickest and simplest way is to extract unique reads by comparing two fastq files.
What I have: 2 fastq files with sequences and their quality scores.
What I want: one fastq file that contains only the reads that are seen in the first fastq file, but not in the second.
What would be the best way to go about it? Each file has 42M reads.
Thank you in advance for all of the help.
Erika
This is the Biostars community :-)
dedupe.sh from the BBMap package.
I second @genomax2's recommendation for dedupe.sh. I totally forgot that it's a sequence-based (as opposed to alignment-based) deduplicator. I would definitely try this tool first.
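To illustrate what sequence-based filtering means for the original question, here is a minimal Python sketch. It is not how dedupe.sh works internally; it is only an exact-match comparison under some assumptions: both files are standard 4-line-per-record FASTQ, reads are compared by sequence only (headers and qualities are ignored), and all sequences from the second file fit in memory. The function and file names are hypothetical. With 42M reads per file, the in-memory set may need several gigabytes of RAM, so a dedicated tool is likely the better choice at that scale.

```python
import gzip
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:  # end of file, or a truncated trailing record
            return
        yield tuple(record)


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def reads_only_in_first(first_path, second_path, out_path):
    """Write reads whose sequence occurs in the first file but not the second.

    Assumption: exact sequence match defines "the same read".
    Returns the number of records written.
    """
    # Collect every sequence present in the second file.
    with open_text(second_path) as second:
        seen = {seq for _, seq, _, _ in read_fastq(second)}

    kept = 0
    with open_text(first_path) as first, open(out_path, "w") as out:
        for header, seq, plus, qual in read_fastq(first):
            if seq in seen:
                continue
            seen.add(seq)  # also collapses duplicates within the first file
            out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept
```

A call such as `reads_only_in_first("first.fastq", "second.fastq", "unique.fastq")` would write the filtered reads to `unique.fastq`. If duplicates within the first file should be kept, the `seen.add(seq)` line can be dropped.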
Thanks to all!
harold.smith.tarheel, your explanation was very detailed and helpful, I'll see what I can get done with the data that I am working with.
Please use ADD COMMENT/ADD REPLY when responding to existing comments. This keeps threads logically organized.