5 hours ago
Chandini
Hello everyone.
I am trying to automate a 16S metagenome analysis workflow so that the user needs to provide nothing but the FASTQ files. The analysis requires primer sequences for cutadapt, for the qiime2 classification, and for the downstream analysis, and I would like to detect them automatically so that the user does not have to supply them as input. The experiment is 16S rRNA amplicon sequencing on Illumina.
Does anyone know of a tool that can identify the forward and reverse primer sequences (building a consensus with IUPAC codes) from FASTQ files, even when the primer lengths are not known?
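A minimal sketch of the consensus idea in Python, assuming each read starts directly with the primer (no heterogeneity spacers or leftover adapter at the 5' end) and that all reads come from a single amplicon. It tallies the bases seen at each position across the reads, keeps every base observed in at least `min_freq` of them, and converts that set into an IUPAC code. Because the primer length is unknown, it stops after `max_n_run` consecutive fully degenerate (N) positions, on the assumption that such positions mark the variable sequence downstream of the primer. The function names and thresholds are illustrative, not an existing tool, and conserved 16S sequence immediately after the primer can make the consensus run longer than the real primer.

```python
import gzip
from collections import Counter

# IUPAC code for every non-empty set of nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def read_fastq_seqs(path, limit=10000):
    """Yield up to `limit` sequences from a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= limit:
                break
            if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
                yield line.strip()


def iupac_consensus(seqs, max_len=40, min_freq=0.1, max_n_run=2):
    """Build an IUPAC consensus of the 5' end of the reads.

    Stops after `max_n_run` consecutive fully degenerate (N) positions,
    used here as a crude sign that the primer has ended.
    """
    seqs = [s.upper() for s in seqs]
    codes = []
    n_run = 0
    for pos in range(max_len):
        counts = Counter(s[pos] for s in seqs if len(s) > pos and s[pos] in "ACGT")
        total = sum(counts.values())
        if total == 0:
            break
        # Keep every base seen in at least min_freq of the reads (always keep the top one).
        kept = {b for b, c in counts.items() if c / total >= min_freq}
        kept.add(counts.most_common(1)[0][0])
        code = IUPAC[frozenset(kept)]
        codes.append(code)
        n_run = n_run + 1 if code == "N" else 0
        if n_run >= max_n_run:
            break
    # Drop the trailing N positions that triggered the stop.
    return "".join(codes).rstrip("N")
```

Run it separately on each mate, e.g. `iupac_consensus(read_fastq_seqs("sample_R1.fastq.gz"))` for the forward primer and the same call on the R2 file for the reverse primer (the file names are hypothetical). The `min_freq` threshold trades sensitivity to real degeneracy against noise from sequencing errors.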
Would running a tool such as FastQC work? It should give you an indication of adapters, overrepresented sequences, etc.
Alternatively, many read-trimming tools have built-in adapter detection; perhaps they can also fish out the primer sequences?
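In the same spirit as an overrepresented-sequence report, a quick check is to count the most frequent exact read prefixes of length `k`. If the reads start with a degenerate primer, the top prefixes tend to be its variants, and increasing `k` until the top fractions drop gives a rough hint of where the primer ends. `top_prefixes` is a hypothetical helper for illustration, not part of FastQC or any trimming tool.

```python
from collections import Counter


def top_prefixes(seqs, k=20, n=5):
    """Return the n most common length-k read prefixes and their fraction of reads.

    Reads shorter than k are ignored.
    """
    counts = Counter(s[:k] for s in seqs if len(s) >= k)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(prefix, count / total) for prefix, count in counts.most_common(n)]
```

Comparing the output for, say, `k` from 10 to 30 on a few thousand reads from each mate shows whether a handful of prefixes dominate, which is what one would expect from primer-containing amplicon reads.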
You can start with a list of known primer pairs (e.g. from Wikipedia and PMC8544895) and run cutadapt on your FASTQ files with each pair. Store the cutadapt output in region-specific folders (such as v3v4 or v6-v8) and write the reads that do not contain the pair to a temporary folder (using the `--untrimmed-output` and `--untrimmed-paired-output` options), so that these untrimmed pairs can be used as input for the next known primer pair. If a substantial fraction of reads matches none of the known primers, you can investigate them manually and then add your findings to the list of primers.