Hi,
I am attempting to trim paired-end Illumina RNA-seq data that I downloaded from NCBI in SRA format. I converted the .sra file into two .fastq files using fastq-dump, then ran FastQC on both files, which indicated that adapters/primers were present:
#Sequence Count Percentage Possible Source
CGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCG 1108938 10.945848187304692 Illumina Paired End PCR Primer 2 (100% over 31bp)
CAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACC 85099 0.839975485456754 Illumina Paired End PCR Primer 2 (100% over 36bp)
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGT 29724 0.2933927699469625 Illumina Paired End PCR Primer 2 (100% over 50bp)
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATATCGTATGCCGT 23964 0.23653829696571824 Illumina Paired End PCR Primer 2 (98% over 50bp)
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAGAGCGGTTCAG 10698 0.10559533888079009 Illumina Paired End PCR Primer 2 (97% over 36bp)
I have been told to use Trimmomatic, but I am struggling to get it to work. Could someone please show me how to remove only the above sequences (no quality trimming etc.)? That is, how do I run only the ILLUMINACLIP step, and how do I specify a custom list of adapters/primers to be trimmed?
Also, will Trimmomatic work on 454 data? If not, what would be a suitable alternative?
This was my previous attempt:
java -jar /exports/software/trimmomatic/trimmomatic-0.32/trimmomatic-0.32.jar PE \
-threads 24 \
-phred33 \
-trimlog trimlog.txt \
SRR1505105_1.fastq SRR1505105_2.fastq \
output1P.fa output1U.fa output2P.fa output2U.fa \
ILLUMINACLIP:~/lewis_adapters.fa
This did not work and produced the following error:
TrimmomaticPE: Started with arguments: -threads 24 -phred33 -trimlog trimlog.txt SRR1505105_1.fastq SRR1505105_2.fastq output1P.fa output1U.fa output2P.fa output2U.fa ILLUMINACLIP:~/lstevens_adaptors.fa
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.makeIlluminaClippingTrimmer(IlluminaClippingTrimmer.java:53)
at org.usadellab.trimmomatic.trim.TrimmerFactory.makeTrimmer(TrimmerFactory.java:27)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:495)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:35)
Many thanks,
Lewis
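A note on the error above, for anyone who hits the same stack trace: the ArrayIndexOutOfBoundsException raised in IlluminaClippingTrimmer is consistent with ILLUMINACLIP being given only a file name. The step expects colon-separated fields: the adapter FASTA, then seed mismatches, palindrome clip threshold and simple clip threshold. The started-with-arguments line also shows that the shell did not expand `~` after `ILLUMINACLIP:`, so the path would not have resolved either. Below is a minimal sketch of an adapter-only run under those assumptions. The jar path and file names are copied from the question; the values 2:30:10 are the ones commonly shown in the Trimmomatic manual and are only an assumed starting point, not tuned for this dataset; the .fastq output names are my substitution, since the outputs are FASTQ rather than FASTA.

```shell
#!/bin/sh
# Sketch only: adapter clipping with no quality-trimming steps.
# Paths and file names are taken from the question above.

jar=/exports/software/trimmomatic/trimmomatic-0.32/trimmomatic-0.32.jar

# Use $HOME rather than ~ : the shell does not expand ~ in the middle of a word.
adapters="$HOME/lewis_adapters.fa"

# ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
# 2:30:10 is an assumed starting point, not a tuned setting.
clip="ILLUMINACLIP:${adapters}:2:30:10"

# Build the command as the positional parameters so it can be inspected first.
set -- java -jar "$jar" PE \
    -threads 24 -phred33 -trimlog trimlog.txt \
    SRR1505105_1.fastq SRR1505105_2.fastq \
    output1P.fastq output1U.fastq output2P.fastq output2U.fastq \
    "$clip"

# Print the command; set RUN=1 in the environment to actually execute it.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Because ILLUMINACLIP is the only step listed, no quality trimming is applied; steps such as SLIDINGWINDOW or MINLEN would have to be added explicitly after it.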
Hello to anyone new to bioinformatics who is reading this to try to solve a problem trimming with Trimmomatic. I wish I had read this post a week ago and saved myself many headaches. Please save yourself time and use trim-galore (which uses cutadapt as part of the process), or just cutadapt, or another alternative like bbtrim, which is supposed to work well too. I did finally get Trimmomatic to work, after many hours and eventually help from an IT person. It was a big waste of time: it ran VERY slowly (even on a high-powered compute system) and it did not give consistent output. When I ran the trimmed reads through FastQC afterwards, a lot of the adapters had not been removed.
I then ran everything using trim-galore (installed with Miniconda 3 from the bioconda channel). It ran VERY fast and gave very clean reads that were also trimmed properly for quality and had already been run through FastQC (you can choose that option easily). I also found out that it is easy to add an additional unique adapter sequence to trim out at the same time: I ran trim-galore on a different dataset (produced by a flavor of RAD-seq) and was able to get rid of both the regular Illumina adapters and the unique adapter.
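For readers who want to try the trim-galore route described above, here is a minimal sketch of a paired-end run with FastQC enabled. The input file names are borrowed from the original question purely as placeholders, and the conda command in the comment assumes the bioconda package is named trim-galore (the executable itself is trim_galore). Note that, unlike the adapter-only run asked about in the question, trim-galore also quality-trims by default. An explicit adapter sequence can be supplied with -a (and -a2 for read 2 in paired-end mode); check `trim_galore --help` for how your version handles more than one adapter.

```shell
#!/bin/sh
# Sketch only: paired-end adapter and quality trimming with Trim Galore.
# Assumed install step (package name as I understand it on bioconda):
#   conda install -c bioconda trim-galore

# Placeholder inputs, borrowed from the question above.
r1=SRR1505105_1.fastq
r2=SRR1505105_2.fastq

# --paired   : keep read pairs in sync
# --illumina : trim the standard Illumina adapter instead of auto-detecting
# --fastqc   : run FastQC on the trimmed output
set -- trim_galore --paired --illumina --fastqc "$r1" "$r2"

# Print the command; set RUN=1 in the environment to actually execute it.
preview="$*"
echo "$preview"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```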