A bit of background: I have been given thousands of reads in FASTQ format containing sequences (16S and 18S ITS) sequenced by Illumina. The adapter indices have already been removed.
The plan is to separate the bacterial and fungal sequences and then BLAST them to determine community composition. Although the adapter indices aren't present, the Euk sequences should contain an Illumina bottom primer sequence as well as the ITSF and ITS2 primer sequences, in both orientations.
What would be a good method to extract the reads that contain these primer sequences, taking into account that some reads will not match exactly, so a degree of error has to be tolerated?
Any help would be greatly appreciated!
I would use the SILVAngs pipeline to sort out the bacterial and Euk seqs. Just an idea.
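If you only want to pull out the reads that carry a primer, allowing a few mismatches, a small script may be enough before going to a full pipeline. Below is a minimal sketch in plain Python, not a tested production tool. It slides each primer (and its reverse complement) along the read and accepts a window with at most `max_mismatches` substitutions. Assumptions on my side: the primer `ACGTTGCAAGGC` is a made-up placeholder (substitute your real ITSF, ITS2 and Illumina primer sequences), the function names are my own, the FASTQ records are the plain four-line kind, and only substitutions are tolerated, not insertions or deletions.

```python
import io

# Complement table for reverse-complementing a primer (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_approx(read, primer, max_mismatches):
    """True if primer occurs somewhere in read with at most
    max_mismatches substitutions (no indels are considered)."""
    k = len(primer)
    for start in range(len(read) - k + 1):
        mismatches = 0
        for a, b in zip(read[start:start + k], primer):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window is too different, try the next one
        else:
            return True  # window finished without exceeding the limit
    return False


def has_primer(seq, primers, max_mismatches=2):
    """Check a read against every primer in both orientations."""
    seq = seq.upper()
    for primer in primers:
        for query in (primer, reverse_complement(primer)):
            if contains_approx(seq, query, max_mismatches):
                return True
    return False


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


# Placeholder primer, NOT a real ITS primer: replace with your own.
PRIMERS = ["ACGTTGCAAGGC"]

# Tiny in-memory FASTQ standing in for a real file opened with open(...).
demo = io.StringIO(
    "@read1\nTTTTACGTTGCTAGGCAAAA\n+\nIIIIIIIIIIIIIIIIIIII\n"   # one mismatch
    "@read2\nGGGCCTTGCAACGTCC\n+\nIIIIIIIIIIIIIIII\n"           # reverse complement
    "@read3\nAAAAAAAAAAAAAAAAAAAA\n+\nIIIIIIIIIIIIIIIIIIII\n"   # no primer
)

matched = [hdr for hdr, seq, _ in read_fastq(demo) if has_primer(seq, PRIMERS)]
```

For real data you would open the FASTQ file instead of the `io.StringIO` demo and write the matching records to a new file. This is slow on very large files, and if your reads have indels near the primer, a dedicated error-tolerant trimmer such as cutadapt is probably the better route.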
Just thinking out aloud. These are not the solutions: