Hi,
I tried clumpify to extract a consensus sequence from a fastq file. To test clumpify, I first ran it on a test fastq file containing two read pairs with identical sequences:
R1:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/1
CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT
+
FFFFFFFGGGGGGGGGGGGHGGGGGGHHHHHHHHHGGGFHGGGGGGGGGHGHHHHHHHGGGGGGGGHHHHHHHHHHHHHHHHHGHGGGGGGGGGGGGGGGGG
@M00991:78:000000000-AP8FW:1:1107:22337:16138_CAGAAGTA/1
CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATTGATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT
+
BFFFFFFGGGGGGGGGGGGHGGGGGGHHHHHHHHHGGGGHGGGGGGGGGHHHHHHHHHGGGGGGGGHHHHHEHHHHHHHHHHHGHGGGGGGGGGGGGGGGGG
R2:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGFGGGGGHHHHHGHHHHHHHHHHHGHHGGGGEEHHHHHHHFHGGGGGGGGGGGGGGHHHFHHHHHGGGGGGHHHHHHHHGGGGGHH
@M00991:78:000000000-AP8FW:1:1107:22337:16138_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGEGGGGHHHHHHGHHHHHHHHHHHHHHGGGGEGFHHHHGHHHGGGGGGGGGGGGGGHHHHHHHHHGGGGGHHGHHHHHHGGGCGHH
Then I ran clumpify on it:
clumpify.sh qin=33 in=R1.fastq in2=R2.fastq out=R1.dedup.fastq out2=R2.dedup.fastq dedupe=t dupesubs=0 consensus=t
Here are the results:
R1:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/1
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
GGGGGGGGGGGGGGGGGHGHHHHHHHHHHHHHHHHHGGGGGGGGHHHHHHHGHGGGGGGGGGHFGGGHHHHHHHHHGGGGGGHGGGGGGGGGGGGFFFFFFF
R2:
@M00991:78:000000000-AP8FW:1:1109:25120:10273_CAGAAGTA/2
AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGATCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG
+
HHHHGGGGGGGGGGGGGFGGGGGHHHHHGHHHHHHHHHHHGHHGGGGEEHHHHHHHFHGGGGGGGGGGGGGGHHHFHHHHHGGGGGGHHHHHHHHGGGGGHH
So R2 is OK, but the R1 output seems to be the reverse complement of the R1 input (it is now identical to R2). Why? Is there a parameter that I missed?
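The reverse-complement observation can be checked with a few lines of Python. This is only a minimal sketch: the helper name `reverse_complement` and the variable names are made up for illustration and are not part of clumpify or BBTools. The sequences are copied from the reads above.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (handles A, C, G, T, N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    # Complement every base, then reverse the string.
    return seq.translate(complement)[::-1]


# R1 sequence from the input file (both input reads carry the same sequence).
r1_in = (
    "CTCTAGCGGCCAGGAGAGACCGGCAAACAATTGGGGGCTCGTCCGGGATT"
    "GATCACCCCGGAACCCTAACAATCCTCTGGACCCACCCCCTCGGCGGCGTTT"
)

# R1 sequence written by clumpify with consensus=t.
r1_out = (
    "AAACGCCGCCGAGGGGGTGGGTCCAGAGGATTGTTAGGGTTCCGGGGTGA"
    "TCAATCCCGGACGAGCCCCCAATTGTTTGCCGGTCTCTCCTGGCCGCTAGAG"
)

# Does the output R1 match the input R1, or its reverse complement?
print("output R1 == input R1:", r1_out == r1_in)
print("output R1 == revcomp(input R1):", r1_out == reverse_complement(r1_in))
```

Comparing the output against both orientations makes it easy to tell a strand flip apart from any other change to the read.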
Thanks
You seem to have found a bug. Irrespective of what rcomp= is set to (t/f), the final result is the same as yours. I tried both separate R1/R2 files and interleaved reads. It only seems to happen in consensus=t mode.
Tagging: Brian Bushnell