Hi there,
I am analysing 16S rRNA datasets with the mothur toolkit.
The Illumina adaptor overhang sequences have already been removed from the fastq files, but the primer sequences are still present.
Could you advise whether the primer sequences should be removed from the fastq files before creating contigs?
Also, does primer removal need to be taken into account when creating the customized reference alignment that will be used by align.seqs?
Many thanks,
Tom
I'd agree with that, although my reasoning is a bit different. Traditionally, you target the conserved regions, so there's close to no variation in that part of the sequence anyway (apart from the designed ambiguities). What you don't know is how much of the sample you failed to amplify because your primers didn't match in the first place.
Out of curiosity, is this personal experience or can you link to some systematic observations?
Yes, they are also non-informative because they are all the same anyway. As you said, a primer usually targets a conserved region, but even a conserved region contains differences that ambiguity codes cannot fully cover. In other words, the primer is not always identical to the conserved region it binds to. A mismatch can be tolerated, and the primer still binds, depending on the annealing temperature. When you sequence the product, you read the primer sequence and not the original sequence of that conserved region. In practice this reason does not matter much, because the fact that the primer positions are all the same is reason enough to trim them off, but I wanted to try to explain it.
If you need a reference, I think you can use this paper:
Rita Sipos, Anna J. Székely, Márton Palatinszky, Sára Révész, Károly Márialigeti, Marcell Nikolausz, Effect of primer mismatch, annealing temperature and PCR cycle number on 16S rRNA gene-targetting bacterial community analysis, FEMS Microbiology Ecology, Volume 60, Issue 2, May 2007, Pages 341–350, https://doi.org/10.1111/j.1574-6941.2007.00283.x
https://academic.oup.com/femsec/article/60/2/341/584515
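To make the mismatch point concrete, here is a minimal Python sketch that counts how many positions of a template site are not covered by a primer's IUPAC ambiguity codes. The function name `count_mismatches` and both template sites are made up for illustration, and the primer shown is only an example, not necessarily the one used in this thread.

```python
# IUPAC nucleotide codes mapped to the plain bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(primer.upper(), site.upper())
    )


# Example primer with one ambiguity code (M = A or C); illustrative only.
primer = "GTGCCAGCMGCCGCGGTAA"
# Hypothetical template sites: the first is covered by the ambiguity code,
# the second carries a T where the primer only allows A or C.
covered_site = "GTGCCAGCAGCCGCGGTAA"
mismatched_site = "GTGCCAGCTGCCGCGGTAA"

print(count_mismatches(primer, covered_site))     # 0
print(count_mismatches(primer, mismatched_site))  # 1
```

If the second template still amplifies, the read will show the primer's base at that position rather than the template's T, which is why the primer region carries no information about the sample.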
Awesome, thank you - I learned something!
Thanks for this comment.
Further to this question: after removing the 5' primers from both the R1 and R2 reads in the fastq files and assembling the reads, primer-like sequences still appear at the 5' and 3' ends of the merged contigs. This is because each read was sequenced through to the other end of the targeted 16S rRNA region.
Do these primer-like sequences also need to be removed from the fasta files, or is it OK to leave them there?
Thanks,
You need to remove both primers.
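As a rough illustration of what removing both primers from a merged contig means, here is a minimal Python sketch. It is not mothur's own implementation; the function names, the `max_diffs` parameter, and the toy sequences are all assumptions made for the example. It expects the forward primer at the 5' end of the contig and the reverse complement of the reverse primer at the 3' end.

```python
# IUPAC nucleotide codes mapped to the plain bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles ambiguity codes (e.g. R <-> Y, K <-> M).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly ambiguous) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer, segment):
    """Count positions where the segment base is not allowed by the primer code."""
    return sum(base not in IUPAC.get(code, "") for code, base in zip(primer, segment))


def trim_both_primers(contig, fwd_primer, rev_primer, max_diffs=0):
    """Strip the forward primer from the 5' end and the reverse-complemented
    reverse primer from the 3' end. Returns None if either primer is not
    found at its expected end within max_diffs mismatches."""
    contig = contig.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    if len(contig) < len(fwd) + len(rev_rc):
        return None
    head = contig[:len(fwd)]
    tail = contig[len(contig) - len(rev_rc):]
    if mismatches(fwd, head) > max_diffs or mismatches(rev_rc, tail) > max_diffs:
        return None
    return contig[len(fwd):len(contig) - len(rev_rc)]


# Toy example: hypothetical 5-base primers flanking an 8-base insert.
fwd_primer = "ACGTM"   # M = A or C
rev_primer = "GGATY"   # its reverse complement is RATCC (R = A or G)
contig = "ACGTA" + "TTTTGGGG" + "GATCC"
print(trim_both_primers(contig, fwd_primer, rev_primer))  # TTTTGGGG
```

Returning `None` when a primer is missing mirrors the common practice of discarding sequences in which the expected primers cannot be found, rather than keeping them untrimmed.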