Entering edit mode
5.8 years ago
Farah
▴
80
Hello,
I performed trimmomatic for two pairs of my *sortmerna_1.fq.gz and *sortmerna_2.fq.gz files as follows:
cd /scratch/fs/trimmomatic
find /scratch/fs/sortmerna -name "*sortmerna_[12].fq.gz" | sort | head -n 4 | while read FW_READ
do
read RV_READ
FILEBASE=$(basename "${FW_READ%_1.fq.gz}")
trimmomatic PE -threads 16 -phred64 "$FW_READ" "$RV_READ" \
"$FILEBASE-trimmomatic_1.fq.gz" "$FILEBASE-trimmomatic-unpaired_1.fq.gz" \
"$FILEBASE-trimmomatic_2.fq.gz" "$FILEBASE-trimmomatic-unpaired_2.fq.gz" \
ILLUMINACLIP:"/home/local/software/biobuilds/2017.11/libexec/trimmomatic-0.36/adapters/TruSeq3-PE-2.fa":2:30:10 \
SLIDINGWINDOW:5:20 MINLEN:50 2> "$FILEBASE-timmomatic.log"
done
It produced 10 files (each file with size of 1) including *-sortmerna-timmomatic.log , *-sortmerna-trimmomatic-unpaired_1.fq.gz , *-sortmerna-trimmomatic-unpaired_2.fq.gz , *-sortmerna-trimmomatic_1.fq.gz, *-sortmerna-trimmomatic_2.fq.gz for each pair.
But, when I performed fastqc, as below, on the produced trimmomatic files, It failed to do that.
fastqc -o /scratch/fs/qa/trimmomatic -t 4 /scratch/fs/trimmomatic/*trimmomatic_[12].fq.gz
Failure message is as bellow:
Started analysis of TM_17-sortmerna-trimmomatic_1.fq.gz
Analysis complete for TM _17-sortmerna-trimmomatic_1.fq.gz
Started analysis of TM_17-sortmerna-trimmomatic_2.fq.gz
Analysis complete for TM_17-sortmerna-trimmomatic_2.fq.gz
Failed to process file TM_17-sortmerna-trimmomatic_1.fq.gz
Failed to process file TM_17-sortmerna-trimmomatic_2.fq.gz
java.lang.ArrayIndexOutOfBoundsException: -1
at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:100)
at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:184)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
at java.lang.Thread.run(Thread.java:745)
Would you please advise me what I did wrong? and how to fix it? Thank you very much.
Is that 1 byte? That means you trimming did not work at all.
Yes, each file is 1 kb. So, if the trimming did not work, would you please let me know why it did not work? what I did wrong in that scripts? Thanks a lot.
Please use
ADD COMMENT/ADD REPLY
when responding to existing posts to keep threads logically organized.SUBMIT ANSWER
is for new answers to original question.As for why the trimming did not work I suggest you start looking into the log file
$FILEBASE-timmomatic.log
for clues first.Note: If you have data of recent vintage then it most certainly is not going to be
-phred64
. It should bephread+33
.Sure. Thanks. As you suggested, I first looked into $FILEBASE-timmomatic.log which is showing the following:
From the above, I do not know what is the problem. Also, the data was obtained on 2015. I am not sure whether I should use -phred64 or phread+33.
May I kindly ask for your advise. Thank you.
It looks like all of your reads are getting discarded. Are they shorter than 50 bp? If they are then you should remove
MINLEN:50
as well. You can also remove-phred64
while you are at it.I will also recommend that you try
bbduk.sh
from BBMap suite, as an alternative (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/ ) . Simple to understand options, easy to use and fast.Thanks. The sequence length is 101 bp.
Run without
-phred64
and let us know what happens. It would be highly unlikely that your entire dataset only has adapter dimers etc.Thank you for your advise. I ran without -phred64. It could finish the analysis only for one pair of my *18-sortmerna_1.fq.gz and *18-sortmerna_2.fq.gz files. While another pair remains without trimmomatic.
I then ran fastqc to check trimmomatic output. It looks good in "per base sequence quality", but now basic statistic section of fastqc shows the sequence length of 50-101 bp, while for my raw fq.gz data and also sortmerna-filtered data it is 101 bp. Is it a potential problem or it is normal?
Also, FastQC shows Encoding of my reads as: Sanger / Illumina 1.9. Which I do not know it should be -phred64 or phred+33.
At this step and condition, would you please guide me how can I improve it to do trimmomatic for all of my files (and not only for one pair).
Many thanks for your help.
Does it mean there is an error or there is no output?
That is expected because now your reads have been trimmed so they are no longer all of equal original length. Since you had
minlen:50
in there reads that went below that size were dropped from output.So
-phred33
is correct encoding as I had expected.Thanks a lot for your great help. The size of my standard error output (stderr) from my job submission is zero. So, It did not produce any error. But. it just successfully performed trimmomatic for one pair of my fq.gz files, and there is no output for another pair (I mean, in the output file, there are only five generated files which all belong to the first pair of my files), while the scripts is a loop that should perform it for all of my fq.gz files.
Many thanks to genomax for the helpful guide. I could successfully run trimmomatic for all of my samples. Thanks again.
Try running your loop with
echo
in front of trimmomatic. It will show you how the command would look like. See if that's correct.As you suggested, I put echo in front of trimmomatic as below:
....
But it just produced log files, no other files.