2.9 years ago
Wendy Lorena
I ran Trimmomatic and pointed it at a file containing all known Illumina adapter sequences, including these for small RNA:
>Illumina Small RNA v1.5 3p Adapter
ATCTCGTATGCCGTCTTCTGCTTG
>Illumina RNA 3p Adapter (RA3)
TGGAATTCTCGGGTGCCAAGG
>Illumina RNA 5p Adapter (RA5)
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 3p RNA Adapter
TCGTATGCCGTCTTCTGCTTGT
The command removed all the adapters except the Illumina Small RNA 3' adapter. Could someone please help me fix this problem?
Please provide some additional information. For instance, why do you say that the Illumina Small RNA 3' adapters were not trimmed? Was this specific adapter used during library preparation? Is it contaminating your samples? Have you used FastQC or a similar tool to assess that?
Hello Carlo, after trimming I ran a quality check with FastQC, and the adapter content report shows that the Illumina Small RNA 3' adapter is still present. For this reason I assume it was not trimmed, even though I gave Trimmomatic the sequence that the Illumina manual lists for the Small RNA 3' adapter.
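One way to double-check the FastQC report independently is to count how many trimmed reads still contain the start of the adapter. Below is a minimal sketch, assuming plain 4-line FASTQ records and exact matching on the first 12 bases of the adapter; the helper names, the 12-base seed length, and the demo records are made up for illustration and are not what FastQC or Trimmomatic do internally.

```python
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip().upper()


def fraction_with_adapter(lines: Iterable[str], adapter: str, seed_len: int = 12) -> float:
    """Fraction of reads that contain the first `seed_len` bases of `adapter`."""
    seed = adapter[:seed_len].upper()
    total = hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if seed in seq:
            hits += 1
    return hits / total if total else 0.0


# Illumina RNA 3p Adapter (RA3), copied from the adapter list in the question
RA3 = "TGGAATTCTCGGGTGCCAAGG"

# Two made-up FASTQ records, for illustration only
demo = [
    "@read1", "ACGTACGT" + RA3 + "AAAA", "+", "I" * 33,
    "@read2", "ACGTACGTACGTACGTACGT", "+", "I" * 20,
]
print(fraction_with_adapter(demo, RA3))
```

For a real file, the same function could be fed an open file handle, for example `open("trimmed.fastq")` (a hypothetical filename), or `gzip.open(..., "rt")` for compressed data. Exact matching will miss adapter copies that contain sequencing errors, so the result is a lower bound.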
If this is small RNA data then having ~275 bp reads is very odd. Generally one would need only 50 bp reads.
They are amplicons of the V4 region of the 16S gene.
Are you following a standard protocol for 16S or trying to roll something of your own? I am not sure why small RNA adapters are involved in 16S amplicons.
I'm a bit confused too, because I have never seen such an adapter contamination pattern: the small RNA 3' adapter is 25 nt long, yet it covers about 125 nt of the read... Just thinking out loud here, but could it be that serial ligation happened during adapter ligation because of a lack of ddC-3' (which blocks self-ligation) on the adapter?
But perhaps the explanation is more trivial. Can you show the code you used for trimming the reads?
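The serial-ligation idea could be tested directly on the reads: if adapters were ligated to each other, several copies should sit end to end within a single read. Here is a rough sketch, assuming exact matches only (so copies with sequencing errors are missed); the function names and the example read are invented for illustration.

```python
from typing import List


def adapter_starts(read: str, adapter: str) -> List[int]:
    """Return 0-based start positions of non-overlapping exact adapter copies."""
    starts = []
    pos = read.find(adapter)
    while pos != -1:
        starts.append(pos)
        # Resume the search right after the copy that was just found
        pos = read.find(adapter, pos + len(adapter))
    return starts


def max_tandem_copies(read: str, adapter: str) -> int:
    """Length of the longest run of adapter copies that sit directly end to end."""
    best = run = 0
    previous = None
    for start in adapter_starts(read, adapter):
        if previous is not None and start == previous + len(adapter):
            run += 1  # this copy begins exactly where the previous one ended
        else:
            run = 1   # a gap (or the first copy) starts a new run
        best = max(best, run)
        previous = start
    return best


RA3 = "TGGAATTCTCGGGTGCCAAGG"  # RA3 sequence from the question

# Made-up read with three back-to-back adapter copies
example_read = "ACGT" + RA3 * 3 + "TT"
print(adapter_starts(example_read, RA3))
print(max_tandem_copies(example_read, RA3))
```

If most contaminated reads show only a single copy, concatenated adapters are probably not the explanation for the long stretch reported by FastQC.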
Clear. The command I used was this:
The adapters.fa file contained all known Illumina adapters (including the small RNA 3' adapter).
In other words, could it have been an error in the sequencing process?
No, I don't think the sequencing itself is the problem, but there is definitely something weird in those results... Perhaps you could check with whoever provided the data or made the sequencing libraries whether the adapter contamination makes sense?
Keeping this issue in mind, perhaps it is possible to find a workaround. What is the next step in your analysis? Assembly or mapping? It is probably safe to skip adapter trimming before mapping (but not before assembly!), since most modern aligners use soft-clipping and can ignore the parts of a read (i.e., adapters) that do not map to the reference.
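To see whether an aligner really did soft-clip the adapter portion, one could look at the CIGAR strings in the resulting SAM records, where soft-clipped bases are recorded with the `S` operation. A small sketch, assuming standard SAM CIGAR syntax; the function name and the example CIGAR strings are made up.

```python
import re
from typing import Tuple

# One CIGAR operation: a length followed by an operation letter
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside the soft clips, so ignore them here
    core = [item for item in ops if item[1] != "H"]
    if not core:
        return (0, 0)  # e.g. "*" for an unmapped read
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (leading, trailing)


# Hypothetical example: 150 aligned bases followed by 125 clipped bases
print(soft_clipped("150M125S"))
```

A large trailing clip on many reads, roughly matching the length of the adapter stretch, would suggest the aligner is setting the contaminating sequence aside as described.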
The next step is the removal of the uncalled bases, specifically the "N" ones. Could you suggest a command or program that would help me achieve this?
Thank you!
It would not be advisable to remove N calls in the middle of reads. If you have too many of them, there may be an issue with the sequencing. Otherwise, aligners may be able to handle them during the normal course of alignment by treating them as mismatches.
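Rather than deleting N calls from inside reads, a gentler approach is to measure how many Ns each read carries and set aside whole reads that exceed some limit. A minimal sketch follows; the 5% default threshold, the function names, and the example reads are assumptions for illustration, not a recommended cutoff.

```python
from typing import Iterable, List, Tuple


def n_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are N calls."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def split_by_n_content(
    reads: Iterable[str], max_n_fraction: float = 0.05
) -> Tuple[List[str], List[str]]:
    """Split reads into (kept, dropped) without editing any read's bases."""
    kept, dropped = [], []
    for seq in reads:
        if n_fraction(seq) <= max_n_fraction:
            kept.append(seq)
        else:
            dropped.append(seq)
    return kept, dropped


# Made-up reads, for illustration only
reads = ["ACGTACGTAC", "ACNNNNNTAC", "NACGTACGTA"]
kept, dropped = split_by_n_content(reads, max_n_fraction=0.1)
print(len(kept), len(dropped))
```

If a large share of reads ends up in `dropped`, that would point to the sequencing issue mentioned above rather than something to fix by editing the reads.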