Hi! I used cutadapt on my PE reads, and then FASTX to collapse duplicate reads in my data. Then I proceeded to run bowtie2 - turns out that there was a different number of reads in R1 than in R2, so bowtie2 could not run at all! I assume this was not due to FASTX, but rather to cutadapt (I specified that reads which became too short after trimming should be removed). So my questions are:
- Am I right in my assumption regarding cutadapt?
- Is there any way to get the two files back in sync without compromising the data? (I was not sure whether simply padding one file with reads would be OK or not.)
Thanks for the help!
If you are happy with the end result then you could use repair.sh from BBMap to re-sync your files.

Errr... why would it not be fastx_collapser? If one of a pair is a duplicate and the other isn't, you'll lose a read from one file but not the other.
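As a sketch of the repair.sh suggestion above (the file names are placeholders for your own data):

```shell
# Re-pair two FASTQ files that have fallen out of sync.
# Reads whose mate is missing are written to the singletons file
# rather than silently dropped.
repair.sh in1=R1.trimmed.fastq in2=R2.trimmed.fastq \
    out1=R1.fixed.fastq out2=R2.fixed.fastq \
    outs=singletons.fastq
```

The out1/out2 files can then be fed to bowtie2 as a matched pair.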
Cutadapt has a paired end mode. As far as I can tell fastx_collapser doesn't.
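To avoid the desynchronization in the first place, cutadapt's paired-end mode trims both files together and applies filtering decisions to the pair as a unit, so a too-short read removes both mates. A minimal sketch (the adapter sequences and length cutoff are placeholders):

```shell
# -a / -A: adapter sequences for R1 / R2 respectively.
# -m 20: by default this discards the whole pair if either read
# falls below 20 bp after trimming, keeping R1 and R2 in sync.
cutadapt -a ADAPTER_FWD -A ADAPTER_REV -m 20 \
    -o out.R1.fastq -p out.R2.fastq \
    in.R1.fastq in.R2.fastq
```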
While fastx_collapser doesn't seem to support paired-end processing, the tally tool apparently does the same thing and does accept paired data.
If we are talking about other read de-duplicators, then dedupe.sh from BBMap would be a great candidate as well.
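For completeness, a sketch of a dedupe.sh run; the file names are placeholders, and the flags reflect my reading of the BBMap docs, so check them against your installed version:

```shell
# Remove exact duplicates; with paired input, pairs are treated
# as a unit, so the mates stay synchronized.
# ac=f turns off containment absorption (exact duplicates only).
dedupe.sh in1=R1.fastq in2=R2.fastq out=deduped.fastq ac=f
```

Note that for paired input the output is written interleaved, so you may want to split it again (e.g. with reformat.sh from the same suite) before mapping.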