Is there a way to identify replicates and retain their indices in fastq records?
4.2 years ago
scolobs • 0

I have used ShortRead and Biostrings to read a fastq file, and used tables() to identify duplicates. However, the result does not include the indices of the duplicated reads, which I need for further analysis. This is what I have tried:

library(ShortRead)
tb = tables(readFastq(fastqfile))

Running an *apply loop over the reads to recover the indices would take forever to finish.

Thanks for any pointers

fastq Biostring Shortread • 545 views
4.2 years ago
GenoMax 148k

You can use clumpify.sh from the BBMap suite to mark sequence duplicates. See the in-line help and this post for more information: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files
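
If staying inside R is preferable, the indices can probably be recovered without an *apply loop by grouping read positions by sequence. The following is only a minimal sketch in base R: the toy `seqs` vector is a made-up stand-in for the real reads, and the commented-out line that extracts them with `sread()` is an assumption about how the data was loaded, not something tested against the original file.

```r
# Toy stand-in for the reads. With ShortRead this would presumably be:
#   seqs <- as.character(sread(readFastq(fastqfile)))
seqs <- c("ACGT", "TTGA", "ACGT", "GGCC", "TTGA", "ACGT")

# Group read indices by identical sequence in one vectorized pass.
idx_by_seq <- split(seq_along(seqs), seqs)

# Keep only the sequences that occur more than once.
dup_groups <- idx_by_seq[lengths(idx_by_seq) > 1]

# Per-read view: index of the first read carrying the same sequence,
# and a flag for every read that repeats an earlier one.
first_idx <- match(seqs, seqs)
is_dup <- duplicated(seqs)
```

Here `dup_groups` is a named list with one entry per duplicated sequence, each holding the positions of the matching reads in the original fastq order, so it can be used directly to subset the reads or their qualities. For very large files, converting every read to a character string may use a lot of memory.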
