Clumpify: issues with consensus mode (reverse complement of R1)
7.4 years ago

Hi,

I tried Clumpify to extract a consensus sequence from a FASTQ file. To test it, I first ran it on a small FASTQ file containing two read pairs with identical sequences:

R1:

@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/1
CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT
+
FFFFFFFGGGGGGGGGGGGHGGGGGGHHHHHHHHHGGGFHGGGGGGGGGHGHHHHHHHGGGGGGGGHHHHHHHHHHHHHHHHHGHGGGGGGGGGGGGGGGGG
@M00991:78:000000000-AP8FW:1:1107:22337:16138_CAGAAGTA/1
CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT
+
BFFFFFFGGGGGGGGGGGGHGGGGGGHHHHHHHHHGGGGHGGGGGGGGGHHHHHHHHHGGGGGGGGHHHHHEHHHHHHHHHHHGHGGGGGGGGGGGGGGGGG

R2:

@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGFGGGGGHHHHHGHHHHHHHHHHHGHHGGGGEEHHHHHHHFHGGGGGGGGGGGGGGHHHFHHHHHGGGGGGHHHHHHHHGGGGGHH
@M00991:78:000000000-AP8FW:1:1107:22337:16138_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGEGGGGHHHHHHGHHHHHHHHHHHHHHGGGGEGFHHHHGHHHGGGGGGGGGGGGGGHHHHHHHHHGGGGGHHGHHHHHHGGGCGHH

Then I perform clumpify on it :

clumpify.sh qin=33 in=R1.fastq in2=R2.fastq out=R1.dedup.fastq out2=R2.dedup.fastq dedupe=t dupesubs=0 consensus=t

Here are the results:

R1:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/1
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
GGGGGGGGGGGGGGGGGHGHHHHHHHHHHHHHHHHHGGGGGGGGHHHHHHHGHGGGGGGGGGHFGGGHHHHHHHHHGGGGGGHGGGGGGGGGGGGFFFFFFF

R2:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGFGGGGGHHHHHGHHHHHHHHHHHGHHGGGGEEHHHHHHHFHGGGGGGGGGGGGGGHHHFHHHHHGGGGGGHHHHHHHHGGGGGHH
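One quick way to check this is to compare the sequences directly. The short Python sketch below uses only the standard library. The sequences are copied from the records above, and the quality strings are left out. It tests whether the consensus R1 is the exact reverse complement of the input R1, and whether R2 came through unchanged.

```python
# Sequences copied from the test input and the consensus output above.
IN_R1 = "CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT"
IN_R2 = "AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG"
OUT_R1 = "AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG"
OUT_R2 = "AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG"

# Translation table: complement each base (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


r1_flipped = OUT_R1 == revcomp(IN_R1)
r2_unchanged = OUT_R2 == IN_R2
print("consensus R1 is the reverse complement of input R1:", r1_flipped)
print("consensus R2 is identical to input R2:", r2_unchanged)
```

With these inputs both checks print True. In this test the insert is exactly as long as the reads, so R1 and R2 overlap completely. That is why the flipped R1 is also identical to the R2 sequence.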

So R2 is OK, but R1 seems to be reverse complemented. Why? Is there a parameter I am missing?

Thanks

You seem to have found a bug. Irrespective of what rcomp= is set to (t/f), the final result is the same as yours. I tried both separate R1/R2 files and interleaved reads. It only seems to happen in consensus=t mode.

Tagging: Brian Bushnell

7.4 years ago

Clumpify can currently only produce a consensus of clumps, not of sets of duplicate reads or read pairs. Thus, it is currently impossible to do what many people want to do, which is "deduplicate my reads, but instead of keeping the best representative pair of each set of duplicates, create a consensus from each set of duplicates".

The consensus operation was written prior to deduplication, and for a different goal: genome assembly. Once reads are formed into clumps (assuming unpaired reads), each clump is flattened into a single consensus sequence that spans multiple overlapping reads (and thus is usually longer than any single read). Adding the "consensus" flag automatically sets rcomp to true to minimize the number of clumps (which fit my original goal, but I will examine changing that). After the consensus operation, the original orientation is lost, because presumably reads of different orientations went into the clump.
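To illustrate why strand information disappears once reverse-complement grouping is on, here is a toy Python sketch. It is NOT Clumpify's actual algorithm. The canonical-orientation rule is a hypothetical simplification: it keeps whichever of a read and its reverse complement sorts first alphabetically, so a read and its reverse complement land in the same group. Any consensus built from that group is then in the canonical orientation, and each read's original strand survives only if the "flipped" flag is recorded separately.

```python
# Toy illustration only: NOT Clumpify's algorithm, just a demonstration of how
# putting reads into a canonical orientation discards their original strand.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> tuple[str, bool]:
    """Return (canonical_seq, flipped).

    Hypothetical rule: keep whichever of seq and its reverse complement sorts
    first alphabetically, so a read and its reverse complement share one key.
    """
    rc = revcomp(seq)
    if rc < seq:
        return rc, True
    return seq, False


def group_reads(reads):
    """Group (name, seq) pairs by canonical sequence, a crude stand-in for clumping."""
    groups: dict[str, list[tuple[str, bool]]] = {}
    for name, seq in reads:
        key, flipped = canonical(seq)
        groups.setdefault(key, []).append((name, flipped))
    return groups


if __name__ == "__main__":
    # read_b is the reverse complement of read_a, so both end up in one group.
    reads = [("read_a", "TTGCA"), ("read_b", "TGCAA")]
    for key, members in group_reads(reads).items():
        print(key, members)
```

Running it prints `TGCAA [('read_a', True), ('read_b', False)]`: both reads collapse into one group keyed on `TGCAA`, and only the flag shows that `read_a` was flipped.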

So, unfortunately, you can't currently use Clumpify the way you want to use it. I may put something in to catch the combination of consensus+dedupe or consensus+paired reads and exit with a warning, because those are not really supported right now. But I do plan to add the ability to produce consensus output from sets of duplicate reads at some point.

Thanks, Brian, for the answer. I'll try to write my own function. It should not be too complicated, as each group of reads I want to collapse into a consensus has the same length and the reads should be very similar (at most 3 mismatches). I'll post my solution when it's ready.
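A minimal sketch of such a function in Python. It assumes the reads of one duplicate set are already grouped, all have the same length, are in the same orientation, and use Phred+33 qualities (matching qin=33 above). The quality-weighted vote and the rule for the output quality are simple choices of my own. This is not how Clumpify computes its consensus.

```python
from collections import defaultdict


def phred(ch: str, offset: int = 33) -> int:
    """Convert one FASTQ quality character to a Phred score."""
    return ord(ch) - offset


def consensus(records, offset=33):
    """Build a consensus (seq, qual) from equal-length, same-orientation reads.

    records: list of (seq, qual) tuples. At each position the base with the
    highest summed Phred quality wins (ties broken by base letter so the result
    is deterministic). The output quality is the highest single quality among
    the reads that support the chosen base.
    """
    if not records:
        raise ValueError("need at least one read")
    length = len(records[0][0])
    if any(len(s) != length or len(q) != length for s, q in records):
        raise ValueError("all reads and quality strings must have the same length")

    out_seq, out_qual = [], []
    for i in range(length):
        score = defaultdict(int)   # summed quality per base at position i
        best_q = defaultdict(int)  # best single quality per base at position i
        for seq, qual in records:
            q = phred(qual[i], offset)
            score[seq[i]] += q
            best_q[seq[i]] = max(best_q[seq[i]], q)
        base = max(score, key=lambda b: (score[b], b))
        out_seq.append(base)
        out_qual.append(chr(best_q[base] + offset))
    return "".join(out_seq), "".join(out_qual)


if __name__ == "__main__":
    # One high-quality T outvotes two low-quality A calls at the last position.
    reads = [("ACGT", "IIII"), ("ACGA", "III#"), ("ACGA", "III#")]
    print(consensus(reads))
```

In the usage example the last position has one T at quality 40 and two A calls at quality 2. The quality-weighted vote picks T, so the call is `('ACGT', 'IIII')`, whereas a plain majority vote would have chosen A.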

Maybe you should mention this behavior (consensus of clumps, not of duplicate reads) in Clumpify's help. BTW, nice tool ;)