Hi community!
I have paired-end amplicon sequences for a batch of samples, with very little overlap (<10%).
Conceptually, I was wondering if it makes sense to join the forward and reverse reads to generate a single read for downstream processing, instead of interleaving/merging them to get the overlapping sequence, since merging isn't the best solution in this particular case?
Or perhaps concatenating R1 and R2 and reading them in as a single read?
Thanks!
Or perhaps just keeping them as two separate paired-end reads? Why would you want to merge or join them?
The idea is to call OTUs on them, so I'm trying to figure out the best way to make use of the forward and reverse reads, since the overlap is minimal. For now, I'm leaning towards just using the forward reads, since their quality is pretty okay in comparison, but I was wondering whether, conceptually, it makes sense to join the two.
No, in my opinion it would not make sense to just concatenate the forward and reverse reads. That has to do with the downstream analyses. If you BLAST the concatenated sequence, there is a chance that you don't get the right biological hit, which is a must in this kind of study. Have you already tried to merge them to see how good or bad the result is?
Thanks, that makes sense and is along the lines of what I was thinking! If by merging the reads you mean checking the overlap, then yes, I already did that and it's minimal. I haven't tried joining them yet.
Check out 'PANDAseq'.
"PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence."
You can also find many other similar tools on the web.
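For anyone unsure about the terminology used in this thread, here is a minimal Python sketch of the difference between merging a read pair on its overlap and simply joining (concatenating) it. It is a toy illustration only, not what PANDAseq or any other tool actually does: real mergers use quality scores and tolerate mismatches, whereas this one requires an exact match. The function names, the `min_overlap` threshold, the `N` spacer and the example sequence are all made up for the illustration.

```python
# Toy illustration: exact-match overlap only, no quality scores, no mismatches.
# All names, defaults and the example sequence below are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2_rc, min_overlap=10):
    """Length of the longest suffix of r1 that exactly equals a prefix of r2_rc.

    Returns 0 if no overlap of at least min_overlap bases is found.
    """
    shortest = max(min_overlap, 1)
    longest = min(len(r1), len(r2_rc))
    for length in range(longest, shortest - 1, -1):
        if r1[-length:] == r2_rc[:length]:
            return length
    return 0


def merge_pair(r1, r2, min_overlap=10):
    """Merge R1 and R2 on their overlap; return None if they do not overlap."""
    r2_rc = reverse_complement(r2)  # put R2 in the same orientation as R1
    overlap = find_overlap(r1, r2_rc, min_overlap)
    if overlap == 0:
        return None
    return r1 + r2_rc[overlap:]


def join_pair(r1, r2, spacer="N" * 10):
    """Concatenate R1 and reverse-complemented R2 with a spacer, ignoring any overlap."""
    return r1 + spacer + reverse_complement(r2)


# Made-up 50 bp "amplicon" with two 30 bp reads that overlap by 10 bp.
amplicon = "ATGGCGTACCTGAAGTCTAGGCATTCGAACGTTAGCCATGGACTTGCAAC"
r1 = amplicon[:30]
r2 = reverse_complement(amplicon[20:])

merged = merge_pair(r1, r2)   # reconstructs the amplicon from the overlap
joined = join_pair(r1, r2)    # longer than the amplicon: overlap is duplicated

print(merged == amplicon)
print(len(joined), len(amplicon))
```

Note that the joined sequence does not exist in the organism as written: it contains a spacer and, when the reads do overlap, a duplicated stretch. That is one way to see why a concatenated read can give misleading hits in an alignment-based search, as mentioned above.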