Trimmomatic on a Linux system
2.8 years ago
Pegasus ▴ 120

Hi all,

I am trying to remove adapters and clean my gzipped RNA-seq FASTQ files using Trimmomatic on a Linux system (a supercomputer server).

I followed the steps for paired-end reads explained in the manual (http://www.usadellab.org/cms/?page=trimmomatic):

java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36

Since Trimmomatic echoed the arguments back, I assume it loaded successfully before starting the process:

TrimmomaticPE: Started with arguments:
 T1R1-f.fastq.gz T1R1-r.fastq.gz T1R1-f_paired.fq.gz T1R1-f_unpaired.fq.gz T1R1-r_paired.fq.gz T1R1-r_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36
java.io.FileNotFoundException: /*****/project/****/md/RNA/rna_data-bacteria/TruSeq3-PE.fa (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:213)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:155)
    at org.usadellab.trimmomatic.fasta.FastaParser.parse(FastaParser.java:54)
    at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.loadSequences(IlluminaClippingTrimmer.java:110)
    at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.makeIlluminaClippingTrimmer(IlluminaClippingTrimmer.java:71)
    at org.usadellab.trimmomatic.trim.TrimmerFactory.makeTrimmer(TrimmerFactory.java:32)
    at org.usadellab.trimmomatic.Trimmomatic.createTrimmers(Trimmomatic.java:59)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:552)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
Quality encoding detected as phred33

After that, nothing further happens; the terminal shows only blank space.

Thank you in advance.

Illumina sequencing RNA-Seq Trimmomatic • 4.3k views
2.8 years ago
GenoMax 148k

java.io.FileNotFoundException: /*****/project/****/md/RNA/rna_data-bacteria/TruSeq3-PE.fa (No such file or directory)

I assume you redacted the actual names using *, but the program cannot find the adapter sequence file at that location. Provide the full path to the file in your command, if you know where it is.

For example:

ILLUMINACLIP:/this_is_dir/with_the_file/TruSeq3-PE.fa:2:30:10:2:True
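
One way to apply this is to resolve the adapter file to an absolute path and stop early if it is missing. The following is only a sketch: the helper name `illuminaclip_arg` is hypothetical, and the clipping parameters are copied unchanged from the command in the question.

```shell
# Hypothetical helper: turn an adapter file path into an ILLUMINACLIP step
# that uses an absolute path, or fail early if the file does not exist.
illuminaclip_arg() {
    adapters=$1
    if [ ! -f "$adapters" ]; then
        echo "adapter file not found: $adapters" >&2
        return 1
    fi
    # Resolve the containing directory to an absolute path.
    dir=$(cd "$(dirname "$adapters")" && pwd)
    # Clipping parameters copied unchanged from the original command.
    printf 'ILLUMINACLIP:%s/%s:2:30:10:2:True\n' "$dir" "$(basename "$adapters")"
}

# Example use (commented out so the sketch has no side effects):
# clip=$(illuminaclip_arg TruSeq3-PE.fa) || exit 1
# then pass "$clip" in place of the ILLUMINACLIP step of the command.
```

The Trimmomatic download usually includes an `adapters` directory next to the jar; whether the file lives there on this particular server is an assumption to confirm with `ls` before relying on it.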

Hi GenoMax,

Thank you for your answer.

I have two groups of gzipped RNA-seq files. FastQC, run before trimming, showed:

1- Nextera adapters

2- Illumina Universal adapters

The example above is for those from group 1.

I uploaded the Nextera adapter FASTA file into the same directory as my *.gz files, then ran this command:

$ java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.39.jar PE T1R2-F.fastq.gz T1R2-R.fastq.gz T1R2-F_paired.fq.gz T1R2F_unpaired.fq.gz T1R2-R_paired.fq.gz T1R2-R_unpaired.fq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
TrimmomaticPE: Started with arguments:
 T1R2-F.fastq.gz T1R2-R.fastq.gz T1R2-F_paired.fq.gz T1R2F_unpaired.fq.gz T1R2-R_paired.fq.gz T1R2-R_unpaired.fq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36

Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33.
However, this time the terminal showed nothing after the phred33 line, so I had to exit and try the process again and again.

Thank you


If the program is running, i.e. your system prompt has not returned, then let the trimming complete. It could take some time depending on the size of your files. You can open a new terminal (logging in to the remote server, if needed) and go to this directory. A listing should show growing files with the names you chose above. A process-monitoring tool like top or htop should show the Trimmomatic process actively running and consuming CPU cycles.
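
The "growing files" check can also be scripted from the second terminal. This is a minimal sketch, not part of Trimmomatic: the function name `is_growing` and the default 5-second interval are illustrative choices.

```shell
# Hypothetical helper: exit 0 if FILE grew during INTERVAL seconds,
# exit 1 if its size did not increase, exit 2 if FILE does not exist.
is_growing() {
    file=$1
    interval=${2:-5}
    [ -f "$file" ] || return 2
    # $(( ... )) strips any padding that wc adds around the byte count.
    before=$(($(wc -c < "$file")))
    sleep "$interval"
    after=$(($(wc -c < "$file")))
    [ "$after" -gt "$before" ]
}

# Example use, run in the output directory while trimming is in progress:
# is_growing T1R2-F_paired.fq.gz 10 && echo "still being written"
```

A file that does not grow within one short interval is not proof of a hang, since compressed output may be flushed in bursts; repeating the check or watching top gives a more reliable picture.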


Many thanks, GenoMax. Yes, it was a matter of time.

I got this result after 10 min:

 Input Read Pairs: 17587061 Both Surviving: 17490730 (99.45%) Forward Only Surviving: 429 (0.00%) Reverse Only Surviving: 1981 (0.01%) Dropped: 93921 (0.53%)
TrimmomaticPE: Completed successfully
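
The counts in this summary can be cross-checked with a little arithmetic: pairs where both reads survive, forward-only and reverse-only survivors, and dropped pairs should add up to the input pairs. The sketch below assumes the summary line has exactly the layout shown above; the function name `summarize_trim_log` is made up for illustration.

```shell
# Hypothetical helper: read Trimmomatic PE output on stdin and recompute
# the percentages from the raw counts on the "Input Read Pairs" line.
summarize_trim_log() {
    awk '/Input Read Pairs:/ {
        total = $4; both = $7; fwd = $12; rev = $17; dropped = $20
        accounted = both + fwd + rev + dropped
        printf "total=%d both=%.2f%% dropped=%.2f%% accounted=%s\n",
            total, 100 * both / total, 100 * dropped / total,
            (accounted == total ? "yes" : "no")
    }'
}

# Example use, assuming the run's terminal output was saved to trim.log:
# summarize_trim_log < trim.log
# For the summary above this prints:
# total=17587061 both=99.45% dropped=0.53% accounted=yes
```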

GenoMax, as far as I know, supercomputers (servers) are very fast, so is it normal that some bacterial RNA-seq *.gz files took more than 1 h to process? Is there any modification to the command line that can accelerate the trimming process?
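
For reference, the Trimmomatic command line accepts a `-threads` option right after `PE`. Whether it shortens the run depends on the cores actually allocated to the job and on the state of the server, so the sketch below only prints a command for inspection rather than running it. The helper name, the `TRIMMOMATIC_JAR` variable, and the output naming scheme are assumptions for illustration.

```shell
# Hypothetical helper: print (not run) a paired-end Trimmomatic command
# that requests an explicit number of threads.
trimmomatic_pe_cmd() {
    threads=$1
    fwd=$2
    rev=$3
    adapters=$4
    # TRIMMOMATIC_JAR is an assumed variable; default to the jar name used above.
    jar=${TRIMMOMATIC_JAR:-trimmomatic-0.39.jar}
    # Derive output names by stripping the .fastq.gz suffix from the inputs.
    f=${fwd%.fastq.gz}
    r=${rev%.fastq.gz}
    printf 'java -jar %s PE -threads %s %s %s ' "$jar" "$threads" "$fwd" "$rev"
    printf '%s_paired.fq.gz %s_unpaired.fq.gz ' "$f" "$f"
    printf '%s_paired.fq.gz %s_unpaired.fq.gz ' "$r" "$r"
    printf 'ILLUMINACLIP:%s:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36\n' "$adapters"
}

# Example use:
# trimmomatic_pe_cmd 4 T1R2-F.fastq.gz T1R2-R.fastq.gz NexteraPE-PE.fa
```

On a shared cluster the thread count should not exceed the cores requested from the scheduler; asking for more than were allocated is unlikely to help.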


Hi GenoMax,

Sorry for my late reply. It was actually a problem with the server itself, as I learned from a notification I received later.
