Identifying primer sequences from raw FASTQ files
5 hours ago
Chandini • 0

Hello everyone.

I am trying to automate a 16S metagenome analysis workflow so that the user needs to provide nothing but the FASTQ files. The analysis requires primer sequences for cutadapt, QIIME 2 classification and the downstream analysis, and I would like the workflow to work these out itself so that the user does not have to supply them as input. The experiment is 16S rRNA amplicon sequencing on Illumina.

Does anyone know of any tools that can identify the forward and reverse primer sequences (building a consensus with IUPAC codes) from FASTQ files, even when the primer lengths are not known?

primers • 73 views

Would running a tool such as FastQC work? It should give you an indication of adapters, overrepresented sequences, and so on.

Alternatively, many read-trimming tools have built-in "adapter" recognition functionality; perhaps they can also fish out the primer sequences?
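
If neither of those gives a usable answer, a rough do-it-yourself version of the same idea might look like the sketch below. It assumes the reads start directly with the primer (no heterogeneity spacers or leftover adapter at the 5' end), and it simply tallies the bases seen at each of the first `length` positions and collapses them into IUPAC codes. The function names and the `min_fraction` cutoff are made up for illustration, and the sketch does not decide where the primer ends; that still has to be judged, for example from where the per-position diversity jumps.

```python
from collections import Counter

# IUPAC nucleotide codes, keyed by the set of bases each code stands for
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def read_prefixes(fastq_lines, length):
    """Yield the first `length` bases of every record in a 4-line FASTQ stream."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            seq = line.strip().upper()
            if len(seq) >= length:
                yield seq[:length]


def iupac_consensus(prefixes, min_fraction=0.1):
    """Per-position consensus; a base counts if it is seen in >= min_fraction of reads."""
    columns = None
    for prefix in prefixes:
        if columns is None:
            columns = [Counter() for _ in prefix]
        for counter, base in zip(columns, prefix):
            counter[base] += 1
    if columns is None:  # no reads long enough
        return ""
    consensus = []
    for counter in columns:
        total = sum(counter.values())
        kept = frozenset(
            base for base, n in counter.items()
            if base in "ACGT" and n / total >= min_fraction
        )
        # Positions with no confident base (e.g. all N calls) fall back to N
        consensus.append(IUPAC.get(kept, "N"))
    return "".join(consensus)
```

To try it on a real file, something like `iupac_consensus(read_prefixes(gzip.open("R1.fastq.gz", "rt"), 25))` should work, with 25 being an arbitrary guess at an upper bound on the primer length. Running it separately on R1 and R2 would give candidates for the forward and reverse primers.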


You can start with a list of known primer pairs (e.g. from Wikipedia and PMC8544895) and run cutadapt on your FASTQ files with each combination. Store the cutadapt output in region-specific folders (like v3v4, v6-v8) and write the reads that do not contain the pair to a temporary folder (using the --untrimmed-output and --untrimmed-paired-output options), so that the untrimmed pairs can be used as input for the next known primer pair.

If a substantial number of reads do not contain any of the known primers, you can investigate them manually and then add your findings to the list of primers.
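
As a rough sketch of that cascade, the Python below only builds the cutadapt command lines, one per primer pair, feeding the untrimmed reads of each step into the next. The folder layout, the `build_cascade` name and the two primer pairs in `PRIMER_PAIRS` are illustrative assumptions, not a vetted list; check the sequences against your own sources before relying on them.

```python
from pathlib import Path

# Illustrative primer pairs only (region -> (forward, reverse)); verify before use
PRIMER_PAIRS = {
    "v3v4": ("CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC"),
    "v4": ("GTGYCAGCMGCCGCGGTAA", "GGACTACNVGGGTWTCTAAT"),
}


def build_cascade(primer_pairs, r1, r2, outdir):
    """Return one cutadapt argument list per primer pair.

    Reads trimmed by a pair go to <outdir>/<region>/; pairs left untrimmed go to
    <outdir>/untrimmed/<region>/ and become the input of the next primer pair.
    """
    outdir = Path(outdir)
    commands = []
    for region, (fwd, rev) in primer_pairs.items():
        region_dir = outdir / region
        untrimmed_dir = outdir / "untrimmed" / region
        un1 = untrimmed_dir / "R1.fastq.gz"
        un2 = untrimmed_dir / "R2.fastq.gz"
        commands.append([
            "cutadapt",
            "-g", fwd,   # 5' primer expected on read 1
            "-G", rev,   # 5' primer expected on read 2
            "-o", str(region_dir / "R1.fastq.gz"),
            "-p", str(region_dir / "R2.fastq.gz"),
            "--untrimmed-output", str(un1),
            "--untrimmed-paired-output", str(un2),
            str(r1), str(r2),
        ])
        # Whatever this pair did not match is the input for the next pair
        r1, r2 = un1, un2
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, after creating the output folders. Whatever is left in the last `untrimmed` folder is the set of reads to inspect manually.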
