Question

Is there a way to identify replicates and retain their indices in fastq records?

0

4.7 years ago

scolobs • 0

I have used ShortRead and Biostrings to read a FASTQ file, and used tables() to identify duplicates. However, the result does not let me recover the indices of the duplicated reads, which I need for further analysis. This is what I have tried:

library(ShortRead)
tb <- tables(readFastq(fastqfile))

Running an *apply loop over the reads would take forever to finish.

Thanks for any pointers

fastq Biostring Shortread • 621 views

updated 4.7 years ago by GenoMax 153k • written 4.7 years ago by scolobs • 0

score 1 · Answer 1 · 2020-11-16

1

4.7 years ago

GenoMax 153k

You can use clumpify.sh from the BBMap suite to mark sequence duplicates. See the in-line help and this post for more information: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files
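If you would rather stay in R, here is a minimal sketch of a vectorised alternative. It is not part of clumpify.sh and is only one possible approach. It assumes the read sequences are available as a plain character vector; the toy vector `seqs` below is made up for illustration, and with ShortRead it could presumably be obtained as shown in the comment. Only base R functions (`split`, `duplicated`, `match`) are used, so no *apply loop over the reads is needed.

```r
# Toy input (hypothetical). With ShortRead this could be, for example:
#   seqs <- as.character(sread(readFastq(fastqfile)))
seqs <- c("ACGT", "TTGA", "ACGT", "GGCC", "TTGA", "ACGT")

# Group the read indices by sequence: one list element per distinct sequence.
idx_by_seq <- split(seq_along(seqs), seqs)

# Keep only the sequences that occur more than once.
dup_groups <- idx_by_seq[lengths(idx_by_seq) > 1]
# dup_groups$ACGT is 1 3 6 and dup_groups$TTGA is 2 5

# Indices of reads that repeat an earlier read (first occurrences excluded).
dup_idx <- which(duplicated(seqs))
# dup_idx is 3 5 6

# For every read, the index of the first read with the same sequence.
first_idx <- match(seqs, seqs)
# first_idx is 1 2 1 4 2 1
```

`dup_groups` keeps the indices of every copy together, which is usually what is needed for downstream analysis, while `dup_idx` is convenient if you only want to drop the repeats.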
