I am working with paired-end Illumina amplicon sequences of a marker gene. The gene is found in both bacteria and archaea, but the archaeal version is too long for the paired-end reads to overlap.
My goal is to analyze the composition of microbes carrying the chosen marker gene. For the downstream analysis I have therefore chosen to merge the bacterial read pairs (as these do overlap) and to treat the archaeal reads as single-end. My approach is to take the representative sequences from FunGene (https://github.com/rdpstaff/fungene_pipeline/tree/master/resources) and align my reads to them, in order to identify which reads are archaeal and to determine their orientation.
I was thinking of cutting a segment (approx. 100 bp) from each end of the representative sequences, and then aligning equal-length sequences from my data, together with their reverse complements, to those segments. Is this a feasible strategy?
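
To make the idea concrete, here is a minimal Python sketch of the preparation step only (not the alignment itself). The function names `reverse_complement`, `reference_ends` and `read_candidates` are hypothetical, and I am assuming that "complement" should mean reverse complement and that the first 100 bp of each read is the part to compare:

```python
# Complement table for DNA bases, including IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reference_ends(ref: str, n: int = 100) -> tuple[str, str]:
    """Cut the first and last n bases out of a representative sequence."""
    if len(ref) < 2 * n:
        raise ValueError("reference is too short to yield two separate ends")
    return ref[:n], ref[-n:]


def read_candidates(read: str, n: int = 100) -> dict[str, str]:
    """Return the first n bases of a read in both possible orientations.

    The forward candidate would be compared with the 5' end of the
    reference, the reverse-complemented one with the 3' end.
    """
    prefix = read[:n]
    return {
        "forward": prefix,
        "reverse_complement": reverse_complement(prefix),
    }
```

Each candidate would then be aligned against the matching reference end with whatever aligner is chosen, and the orientation that scores better would be kept.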