Hi all!
I downloaded some SRA datasets using ascp as recommended (https://www.ncbi.nlm.nih.gov/books/NBK158899/) and then ran fastq-dump on the downloaded .sra files:
fastq-dump --gzip --split-files file.sra
and got two files (file_1.fastq.gz, file_2.fastq.gz), each 7.4 GB, as output.
The first thing I did after this was to check read quality with FastQC:
As you can see, the whiskers go down to ~15, so I wanted to discard the low-quality reads using Trimmomatic. I was not sure whether the adapters had already been removed from the reads, so I just added the standard ILLUMINACLIP step and the other recommended options:
java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz ../file_2_clean.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50
Multiple cores found: Using 4 threads
Input Read Pairs: 73338588 Both Surviving: 53742129 (73,28%) Forward Only Surviving: 4076266 (5,56%) Reverse Only Surviving: 7872077 (10,73%) Dropped: 7648116 (10,43%)
TrimmomaticPE: Completed successfully
Finally, I got the "clean" reads: file_1_clean.fastq.gz (4.8 GB) and file_2_clean.fastq.gz (0.4 GB).
I don't know why there is such a huge difference. Is it possible that the second read of each pair is that bad? After this, I checked the read quality again with FastQC. file_1_clean.fastq.gz looks OK, I think, but file_2_clean.fastq.gz looks really strange and not really "clean".
Does anyone know what happened here?
Thanks in advance!
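You need four output files, two for each input file. In PE mode, Trimmomatic writes reads that survive together with their mate and reads that survive without their mate (unpaired; the *_discarded files in the command below) to separate files, so your command should look like this: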
java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz file_1_discarded.fastq.gz ../file_2_clean.fastq.gz file_2_discarded.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50
In your command, the second output name is taken as the file for unpaired forward reads, so Trimmomatic wrote the forward reads whose mate was dropped into file_2_clean.fastq.gz. That is why that file is so small.
I hope my suggestion works for you.
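As an alternative to typing all four output names by hand, Trimmomatic's PE mode also accepts -baseout, which derives the four names (suffixes _1P, _1U, _2P, _2U) from a single base name. The following is only a minimal sketch, assuming the jar and the adapter file sit in the current directory as in the commands above; the function name trim_pe and its three arguments are placeholders and not part of Trimmomatic:

```shell
# Hypothetical wrapper: run Trimmomatic PE with -baseout so that the four
# output files (paired and unpaired for each mate) are named automatically.
# Usage: trim_pe <forward.fastq.gz> <reverse.fastq.gz> <base_name.fastq.gz>
trim_pe() {
    if [ "$#" -ne 3 ]; then
        echo "usage: trim_pe <forward> <reverse> <base_name>" >&2
        return 1
    fi
    # Options are given before the input files; the trimming steps are the
    # same ones used in the thread.
    java -jar trimmomatic-0.36.jar PE -phred33 \
        -baseout "$3" \
        "$1" "$2" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 \
        SLIDINGWINDOW:4:18 MINLEN:50
}
```

Called as trim_pe ../file_1.fastq.gz ../file_2.fastq.gz ../file_clean.fastq.gz, this should leave the paired reads in the _1P and _2P files and the unpaired ones in _1U and _2U, which removes the risk of miscounting the positional output arguments.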
Yes, I think you're right. I completely overlooked this in the manual. How stupid... I will test again and tell you!
Thanks!
That bit makes the size of the R2_clean file (0.4G) suspicious. Looks like the file may have got corrupted in the process. Have you tried to repeat the trimming?
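Whether a file is actually corrupted can be tested directly rather than guessed from its size. A small sketch, assuming standard gzip-compressed FASTQ with four lines per read; the helper names count_reads and check_fastq_gz are made up for this example:

```shell
# Count the reads in a gzip-compressed FASTQ file (4 lines per record).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Report whether a gzip file decompresses cleanly and, if so, its read count.
check_fastq_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "$1: gzip OK, $(count_reads "$1") reads"
    else
        echo "$1: gzip integrity check FAILED"
        return 1
    fi
}
```

Running check_fastq_gz on ../file_1_clean.fastq.gz and ../file_2_clean.fastq.gz would show whether both files are intact. If they are, but the read counts differ, the two files cannot be a matched set of paired reads, which points to the command line rather than to corruption.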
I repeated the trimming with other parameters as well (e.g. SLIDINGWINDOW:4:15 or SLIDINGWINDOW:4:20), but the results look similar.
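A quick consistency check on the numbers already posted may help explain why changing SLIDINGWINDOW does not alter the picture. The sketch below only re-uses the counts from the Trimmomatic summary and the two file sizes reported in the question. It assumes that the second positional output of PE mode receives the unpaired forward reads, and that file size scales roughly with read count, which is only approximately true for compressed files:

```shell
# Read-pair counts copied from the Trimmomatic summary in the question.
input=73338588
both=53742129
fwd_only=4076266
rev_only=7872077
dropped=7648116

# The four categories should add up to the number of input pairs.
total=$(( both + fwd_only + rev_only + dropped ))
echo "sum of categories: $total (input pairs: $input)"

# Ratio of forward-only survivors to pairs where both mates survived,
# next to the ratio of the two reported file sizes (0.4 GB vs 4.8 GB).
read_ratio=$(LC_ALL=C awk -v u="$fwd_only" -v p="$both" 'BEGIN { printf "%.3f", u / p }')
size_ratio=$(LC_ALL=C awk 'BEGIN { printf "%.3f", 0.4 / 4.8 }')
echo "forward-only / both-surviving reads: $read_ratio"
echo "file_2_clean / file_1_clean size:    $size_ratio"
```

The two ratios come out close to each other (roughly 0.08 in both cases). That is consistent with file_2_clean.fastq.gz holding the forward-only survivors rather than the reverse reads, although it is not proof on its own.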
As a side note: FastQC reports the presence of adapters. Since you're uploading FastQC screenshots, you should also see the adapter content in the same output. This only holds for the standard adapters, which are the ones most often used; if a different adapter was used for some reason, you won't see it there. That shouldn't be the case here, though.
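The adapter result can also be read without opening the HTML report, because FastQC writes a tab-separated summary.txt (status, module name, file name) into its output archive. A minimal sketch; the helper name adapter_status is made up, and the archive and folder names in the usage note assume FastQC's default naming for an input called file_1.fastq.gz:

```shell
# Print the FastQC status (PASS/WARN/FAIL) of the "Adapter Content" module.
# Expects the contents of a FastQC summary.txt on standard input.
adapter_status() {
    awk -F '\t' '$2 == "Adapter Content" { print $1 }'
}
```

With the default output names this could be used as: unzip -p file_1_fastqc.zip file_1_fastqc/summary.txt | adapter_status. A PASS only covers the adapter sequences FastQC knows about, as noted above.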