I am supposed to analyze some metabarcoding reads, but the forward and reverse reads cannot be merged because they do not overlap. I was told this is because the amplified sequence is too long, so the forward and reverse reads do not extend far enough to overlap adequately. My question is: what would be the problem with using the forward reads alone, as if they were single-end? My first thought is that they are too short to be a legitimate barcode sequence for identifying taxa, but I'm not sure. The marker is COI. I gather from Meusnier et al. (2008) that a 95% species-identification success rate was obtained with 250-bp mini-barcodes, and my forward reads are that long. However, this particular region on its own has not been tested for specificity. How does that affect its reliability? Thank you for your input.
Thank you, those are some really good ideas. I would guess, then, that the fact that my sequences are shorter than the full mini-barcode would be reflected in a higher BLAST e-value? But as long as it is still below the cutoff, it should be fine.
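As a rough illustration of the e-value question: BLAST's expect value can be written in terms of the bit score S' as E = m · n · 2^(-S'), where m is the query length and n the database length. BLAST actually uses effective lengths, which this sketch ignores. A shorter read has a smaller m, which on its own lowers E, but it also gives a shorter alignment and therefore a lower bit score, which raises E exponentially. So in practice the shorter read usually ends up with the higher e-value. The database size, bit scores and cutoff below are hypothetical numbers chosen only for illustration.

```python
def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """Simplified expect value E = m * n * 2**(-S').

    Ignores BLAST's effective-length (edge-effect) corrections, so this only
    approximates the value BLAST reports.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical inputs: a 1e9-bp reference database, and a 250-bp read vs. a
# 650-bp read from the same specimen. The bit scores are made up.
DB_LEN = 1_000_000_000
e_short = evalue_from_bitscore(bit_score=400.0, query_len=250, db_len=DB_LEN)
e_full = evalue_from_bitscore(bit_score=900.0, query_len=650, db_len=DB_LEN)

CUTOFF = 1e-50  # hypothetical e-value cutoff
print(f"250-bp read: E = {e_short:.2e}, passes cutoff: {e_short < CUTOFF}")
print(f"650-bp read: E = {e_full:.2e}, passes cutoff: {e_full < CUTOFF}")
```

With these made-up numbers the shorter read has a much larger e-value, yet both stay below the cutoff, which is the situation described above.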
No worries, and yes =) If you use only read one, you would also expect the full read to match the reference database, since it is amplicon data (chimeras being the exception). Because all reads in an Illumina run are sequenced to the same nominal length, you can also use standard metabarcoding pipelines to process read one or read two on its own. The same goes for concatenating the reads with Ns inserted between them; however, check how the algorithm handles ambiguous bases (I would assume most will have trouble with them).
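A minimal sketch of the checks mentioned above, assuming plain sequence strings rather than full FASTQ records. It joins a non-overlapping read pair by reverse-complementing R2 and inserting a run of Ns, counts the ambiguous bases a downstream tool would then have to deal with, and flags BLAST hits whose alignment does not span (nearly) the whole read, as you would expect for amplicon data. The function names, the 10-N spacer, the 95% coverage threshold, and the BLAST+ tabular column order (set with `-outfmt "6 qseqid sseqid pident length qstart qend qlen evalue bitscore"`) are all hypothetical choices for illustration.

```python
# Complement table for A/C/G/T/N in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def join_with_ns(r1: str, r2: str, n_pad: int = 10) -> str:
    # R2 is read from the opposite strand, so reverse-complement it before
    # appending it to R1; the Ns stand in for the unsequenced middle part.
    # n_pad=10 is an arbitrary, hypothetical spacer length.
    return r1 + "N" * n_pad + reverse_complement(r2)


def count_ambiguous(seq: str) -> int:
    # Anything other than A/C/G/T (e.g. the inserted Ns) counts as ambiguous.
    return sum(1 for base in seq.upper() if base not in "ACGT")


# Assumed BLAST+ tabular layout (tab-separated):
# qseqid sseqid pident length qstart qend qlen evalue bitscore
def query_coverage(blast_line: str) -> float:
    fields = blast_line.rstrip("\n").split("\t")
    qstart, qend, qlen = int(fields[4]), int(fields[5]), int(fields[6])
    return (abs(qend - qstart) + 1) / qlen


def is_full_length_hit(blast_line: str, min_cov: float = 0.95) -> bool:
    # For amplicon reads, a hit covering only part of the read is suspicious
    # (e.g. a possible chimera); min_cov=0.95 is a hypothetical threshold.
    return query_coverage(blast_line) >= min_cov


if __name__ == "__main__":
    joined = join_with_ns("ACGTTGCA", "TTTTCCGG", n_pad=4)
    print(joined, count_ambiguous(joined))
    hit = "read1\tref1\t99.2\t240\t1\t240\t250\t1e-120\t430"
    print(query_coverage(hit), is_full_length_hit(hit))
```

Running it prints the joined read `ACGTTGCANNNNCCGGAAAA` with 4 ambiguous bases, and a coverage of 0.96 for the example hit, which passes the 95% threshold.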