We are running a Ribo-seq analysis on some FASTQ files and using fastp for adapter identification and removal, but we have run into a problem. fastp removes the adapter only when the adapter sequence is passed as a command-line argument. It does not detect and remove the adapter on its own, and reports that no adapter was detected. The SRA file used, the adapter sequence, and the commands are as follows:
One way to probe the adapter content is to slice and group the ends of the reads. I do this often as a quick sanity check. It is a simple way to detect possible systematic contamination that starts at a given coordinate. Get a dataset (251 bp long reads):
fastq-dump -X 100000 SRR519926 --split-files
For example to see if there are 30 bp long common sequences starting at base 210 you could do a
A little eyeballing tells us that most of the sequences overlap and appear to be different substrings of a much longer adapter sequence. For a more "proper" solution, extract and align the ends to see if these sequences overlap.
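For anyone who prefers a script, the slice-and-group check described above could be sketched in Python roughly as follows. This is only an illustration, not the command used in this answer: the function names are made up for the example, and it assumes a plain four-line-per-record FASTQ file with unwrapped sequences.

```python
from collections import Counter
from itertools import islice


def read_sequences(fastq_lines):
    """Yield the sequence line of each 4-line FASTQ record.

    `fastq_lines` is any iterable of lines, for example an open file handle.
    Assumes the sequences are not wrapped over several lines.
    """
    # Line 2 of every 4-line record (index 1, 5, 9, ...) holds the bases.
    for line in islice(fastq_lines, 1, None, 4):
        yield line.strip()


def count_slices(sequences, start, length):
    """Count the substrings of `length` bases beginning at 1-based `start`."""
    counts = Counter()
    for seq in sequences:
        piece = seq[start - 1:start - 1 + length]
        if len(piece) == length:  # skip reads too short to cover the window
            counts[piece] += 1
    return counts
```

With an open handle on the downloaded FASTQ file, count_slices(read_sequences(handle), start=210, length=30).most_common(10) would list the ten most frequent 30 bp slices starting at base 210. A handful of slices dominating the counts is the hint to look for.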
Have you considered the possibility that the data are already trimmed? This may be especially true if the reads are not all of equal length.
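A quick way to check the read lengths is to tabulate them. The sketch below is a minimal example under the assumption of a plain four-line-per-record FASTQ file; the function name is hypothetical. A single length for every read suggests, but does not prove, that the reads have not been trimmed.

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Map each read length to the number of reads that have it.

    `fastq_lines` is any iterable of FASTQ lines, for example an open file.
    """
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts
```

If the result has only one key, all reads share that length; several different lengths would be consistent with earlier trimming.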
As an alternative you could try bbduk.sh from the BBMap suite with the literal=your_adapter_seq option. A guide for BBduk is available here.

Thanks for your suggestion. Actually, all the reads in the file are 51 nucleotides long, so I do not think the data have already been adapter-trimmed (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR810103). Also, I want to use the automatic adapter detection feature of fastp so that I can generalize my workflow to any Ribo-seq data set, so I want fastp to detect and remove any adapters that are present automatically.
You can try trim_galore; it should do what you need.
Hi,
I'm running into the same problem. The paper didn't provide the adapter sequence. Is there an effective tool to detect adapters automatically? Could you help me solve this problem?
Thank you very much!
If you can find out what kit was used for preparing the libraries then it may be simple to use the adapter for that kit. Your data need not necessarily have adapter sequences so unless you are planning to do de novo assemblies you could let aligners take care of any adapter sequences by soft-clipping at time of alignment.

Run fastqc to see whether and which adapter there is, then google for the sequence and provide it to the tool.
https://emea.support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html
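If you already have a few candidate sequences, you can also count how many reads contain each one before handing it to the trimmer. The sketch below is illustrative only: the function name is made up, and the three 12 to 13 bp prefixes are the common Illumina, Nextera and small RNA adapter starts. Ribo-seq libraries often use kit-specific linkers, so treat the list as an assumption and extend it with whatever the library kit documents.

```python
# Commonly checked adapter prefixes; an assumed starting list, not exhaustive.
CANDIDATE_ADAPTERS = {
    "Illumina universal": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "Illumina small RNA": "TGGAATTCTCGG",
}


def adapter_hit_fractions(sequences, adapters=CANDIDATE_ADAPTERS):
    """Return the fraction of reads containing each candidate adapter."""
    reads = list(sequences)
    total = len(reads)
    fractions = {}
    for name, adapter in adapters.items():
        # Exact substring match only; sequencing errors are not tolerated.
        hits = sum(1 for read in reads if adapter in read)
        fractions[name] = hits / total if total else 0.0
    return fractions
```

A candidate found in a large share of the reads is worth passing to the trimming tool explicitly; fractions near zero for every candidate mean either that there is no adapter or that it is not in the list.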
Thanks! The fastqc result shows no adapter. fastqc didn't work.
If it shows no adapter then there are none and you do not need any trimming. If (unlikely) there is an adapter that fastqc does not recognize then it should still show up as an overrepresented sequence. If there are neither overrepresented sequences nor detected adapters then the data are good to be aligned right away without further processing, I'd say.
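To illustrate what an overrepresented-sequence check amounts to, here is a rough Python sketch. It is not FastQC's implementation: the function name is hypothetical, and it simply counts identical full-length reads and reports those above a fraction of the total. The default of 0.1% is borrowed from the threshold FastQC documents for its overrepresented sequences module.

```python
from collections import Counter


def overrepresented(sequences, min_fraction=0.001):
    """List (sequence, count, fraction) for reads above `min_fraction`.

    Simplified sketch: counts exact duplicates of whole reads only.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()  # most frequent first
        if n / total > min_fraction
    ]
```

In Ribo-seq data the top hits are not necessarily adapters, so any sequence reported this way still needs to be looked up before it is trimmed.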
fastp does have an automatic adapter detection option (I believe this is its default behavior).