Hi all,
I am totally new in the bioinformatic analysis. I am working on a project that looks at DGE among different time treatments. Besides, there is no reference genome (meaning that I need a de novo assembly step).
So far, after struggling and navigating many website, I started using my terminal on MacBook and was able to do FastQC for the 118 reads (PE), I then followed this with MultiQC and both worked fine for me using the terminal. I am trying to go one step at a time. So I wanted to use trimmomatic to trim the reads but I failed!
I used the following command for one pair of reads and it worked but I do not know how to do this for the whole reads!
java -jar trimmomatic-0.39.jar PE -phred33 $i\1_R1_001.fastq.gz $i\1_R2_001.fastq.gz $i\1_R1_paired.fq.gz $i\1_R1_unpaired.fq.gz $i\1_R2_paired.fq.gz $i\1_R2_unpaired.fq.gz ILLUMINACLIP:contams_forward_rev.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
I tried adding * but it did not work.
java -jar trimmomatic-0.39.jar PE -phred33 -threads 4 /Users/mohamedhabib/Desktop/RNAseq/*_R1_001.fastq.gz *_R1_001.fastq.gz out_*_R1_001_P.fastq.gz out_*_R1_001_U.fastq.gz out_*_R2_001_P.fastq.gz *_R2_001_U.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:40:15
I will appreciate your help and advice on this specific topic and generally for further steps of analysis. Is there a template script (containing all the steps) that I can modify and use for my analysis?
Thanks
There is something weird about your commands: what are the
\
s doing after the variable names?I just copied it as it is from another thread and modified it to my own data
What thread did you copy it from, what changes did you make and what was the logic behind each of those changes?
here: Trimmomatic job script to run on multiple pair end read file
That answers 1 out of the 3 questions I asked.
I basically changed the file names to make sure this command is working. I picked one forward and reverse read pair and processed it as below: it worked but when I tried to trim all the samples (I thought maybe by adding a * to catch all the files!!) but it failed.
java -jar trimmomatic-0.39.jar PE -phred33 $i\1_R1_001.fastq.gz $i\1_R2_001.fastq.gz $i\1_R1_paired.fq.gz $i\1_R1_unpaired.fq.gz $i\1_R2_paired.fq.gz $i\1_R2_unpaired.fq.gz ILLUMINACLIP:contams_forward_rev.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
I hope that make sense because I am not an expert!
You need your command to run once per pair of files, not combine all pairs and then run just once. Read the thread you're trying to emulate and understand the loop part of it. Modify your code after you understand the loop.
I assume you are referring to 118 samlpes (and not reads). You can use a
for
loop to do this. See example here: Trimmomatic job script to run on multiple pair end read fileI used the following:
and I got:
Honestly, I think most RNASeq aligners are fairly robust to adapters. You could probably get away with not trimming at all
What names do your fastq files have? That first part of my answer will need to be adjusted to account for the differences in your file names so the
for
loop works right.Here is an example: 19144R-03-01_S75_L004_R2_001.fastq.gz
Use following:
If the commands look correct then remove
echo
and>> cmd_file
parts from the command to actually run the trimming.