2.8 years ago
pavelasquezv
Hi all,
I would like to perform quality control on multiple FASTQ folders using Trimmomatic. I am going to work with thousands of SRA files that I will download from NCBI. However, each experiment may have used specific adapters depending on the sequencing equipment. Therefore, I need to specify the adapter type for each experiment in the ILLUMINACLIP parameter, which prevents me from trimming all the SRA files in a single loop. Do you have any suggestions for running Trimmomatic on all SRAs while checking for any adapter?
This is the code:
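# Download, convert and trim every accession listed in listaSE (one per line).
# Note: ILLUMINACLIP expects the path to an adapter FASTA file
# (e.g. TruSeq3-SE.fa in Trimmomatic's adapters directory).
while read -r l; do
    prefetch -p "$l"
    fasterq-dump "$l" -p -e "$NSLOTS"
    java -jar /Storage/progs/Trimmomatic-0.38/trimmomatic-0.38.jar SE -threads "$NSLOTS" \
        "$l"*fastq \
        "${l}_SR.fastq" \
        HEADCROP:12 ILLUMINACLIP:TruSeq3-SE:2:30:10 \
        SLIDINGWINDOW:5:20 LEADING:3 TRAILING:3 MINLEN:40
done < listaSE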
You could use an adapter file that contains multiple commercial adapters. While there is some chance of a bit of over-trimming, it should be negligible. bbduk.sh includes such a file in its resources directory that you could use here.
Thanks for your answer, GenoMax. Is there no way to do this with the Trimmomatic program?
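For readers who want to stay with Trimmomatic, the combined-adapter idea above can be sketched in shell. This is only a minimal sketch under assumptions: `merge_adapters` and `trim_cmd` are hypothetical helper names, the adapter files are assumed to sit in an `adapters` directory next to the jar, and `trim_cmd` only prints the command so it can be inspected before anything is run. It reuses the trimming steps from the question.

```shell
#!/usr/bin/env bash
# Sketch: build one adapter FASTA from several files and pass it to ILLUMINACLIP.

# merge_adapters OUT.fa IN1.fa [IN2.fa ...]
# Concatenates the input FASTA files into OUT.fa. awk drops blank lines and
# ends every line with a newline, so records from a file that lacks a
# trailing newline do not fuse with the next file.
merge_adapters() {
    local out=$1
    shift
    awk 'NF' "$@" > "$out"
}

# trim_cmd ACCESSION ADAPTERS.fa
# Prints (does not run) a single-end Trimmomatic command for one accession.
# The default jar path is the one from the question; override TRIMMOMATIC_JAR
# if yours differs.
trim_cmd() {
    local acc=$1 adapters=$2
    local jar=${TRIMMOMATIC_JAR:-/Storage/progs/Trimmomatic-0.38/trimmomatic-0.38.jar}
    echo java -jar "$jar" SE -threads "${NSLOTS:-1}" \
        "${acc}.fastq" "${acc}_SR.fastq" \
        HEADCROP:12 "ILLUMINACLIP:${adapters}:2:30:10" \
        SLIDINGWINDOW:5:20 LEADING:3 TRAILING:3 MINLEN:40
}

# Example (assumed paths):
#   merge_adapters all_adapters.fa /Storage/progs/Trimmomatic-0.38/adapters/*.fa
#   while read -r l; do trim_cmd "$l" all_adapters.fa; done < listaSE
```

As noted above, scanning for more adapters than a library actually contains may over-trim slightly, so it is worth checking a few samples before running thousands.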
Not unless you can find a way to include which adapter file you want to use for each sample in your listaSE file, since that is the only input for your script. If you only have SRR* accessions, then they alone won't tell you which adapter to scan for.
Hi GenoMax, I hope you are well. I am trying to trim with bbduk.sh but I think I have a problem with the adapters. Do you have a suggestion to remove the adapters or trim the first and last 12 base pairs? Many thanks!
This is the code:
You do not need to trim the initial 12 bp. That pattern is commonly seen in RNAseq libraries due to random primers not being so random. There is a blog post from authors of FastQC on this topic: https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/ (NOTE: SSL cert on the site has expired once again but that site is safe, I pinged author of FastQC to fix the SSL cert, this should be done by tomorrow, if you want to wait until then to visit the link above).
Thanks a lot, my friend! Now I can continue with more confidence. You are the best, GenoMax! Kind regards, Alex
As a remark, if you are going to process such a large number of samples, do yourself a favor and first spend time learning a workflow manager such as Nextflow or Snakemake, which allows caching and resuming pipeline operations. The pipeline you build will likely do much more than trimming, and for a large sample size you need some parallelization, e.g. on a cluster. Workflow managers are tailored for this, and they make pipelines more robust and feasible.
Many thanks for your suggestion ATpoint!
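To make "caching and resuming" concrete, here is a minimal shell sketch of the idea. It is an illustration only, not a substitute for Nextflow or Snakemake: `run_once`, the `DONE_DIR` variable and the `.done` marker files are hypothetical names invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: skip samples that already finished, so a restarted loop resumes.

# run_once ACCESSION COMMAND [ARGS...]
# Runs COMMAND only if ACCESSION has no ".done" marker yet. The marker is
# written only when COMMAND succeeds, so failed samples are retried on the
# next run and finished ones are skipped.
run_once() {
    local acc=$1
    shift
    local marker="${DONE_DIR:-.}/${acc}.done"
    if [ -e "$marker" ]; then
        echo "skip $acc (already done)"
        return 0
    fi
    "$@" && touch "$marker"
}

# Example (process_sample is a hypothetical function wrapping the
# download and trimming steps for one accession):
#   while read -r l; do run_once "$l" process_sample "$l"; done < listaSE
```

A workflow manager provides the same bookkeeping per step, plus parallel execution and cluster submission, which is why it scales better than hand-written marker files.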
Hi ATpoint, I am very interested in using Nextflow or Snakemake, but I don't have any idea how to do that. I am new to bioinformatics. Could you please give me an idea of how to build a pipeline in Nextflow with the following script?
Sorry, but I am not going to do your coding for you. Both Nextflow and Snakemake have lots of tutorial material at their websites that you can use to learn from.
Sorry, my friend, I am trying but it is very difficult for me. But I will get there. Many thanks again for your suggestion! All the best!