FastQC report analysis
20 months ago

Hello, I am a student and I got a homework assignment where I have to analyze data. I have to run FastQC on all samples, then remove adapters, trim low-quality bases, remove reads shorter than 20 bp, and compare the results. I got quite strange results. To remove Illumina adapters and reads shorter than 20 bp, I used this command:

trim_galore --length 20 --illumina --fastqc filename.fastq.gz

But the results look the same to me, so I don't know whether it is my fault or a problem with the data.

This is the link to the Google Drive folder with the FastQC reports (untrimmed and trimmed).

I just can't understand why I got worse data after trimming and how I could improve it.

trim_galore fastqc • 1.7k views
20 months ago
ntsopoul ▴ 60

Hi there,

It's great that you are learning to do these things, and welcome to the bioinformatics community.

In my opinion, both reports show good quality, and the data could be used for downstream processing. It also seems that there was hardly any adapter sequence before you did the trimming, so if you want to use the data for RNA-seq analysis, trimming is not strictly necessary. Since Trim Galore detects the adapter type automatically, you do not need to specify --illumina. If you are using Trim Galore for RNA-seq analysis, I would recommend setting the --stringency parameter to 3, which ensures that at least 3 bases must overlap with the adapter before anything is cut. Otherwise, with the default of 1, Trim Galore will cut even if only a single base matches (this very strict default can be useful for bisulfite-seq), and you might end up with some short reads.

If you did paired-end sequencing, you need to specify that you have paired files (Read 1 and Read 2). Here is an example of how I do the trimming for RNA-seq data:

trim_galore --paired file_R1.fastq.gz file_R2.fastq.gz -q 20 --fastqc  --stringency 3

I guess you say the trimmed report is worse because the length distribution is more heterogeneous. This is because, for some reads, Trim Galore found sequence that might have been adapter, or natural sequence that looks like adapter, and trimmed those reads to a shorter length. That's why you should require a longer adapter overlap (--stringency 3) than the default of 1. However, it is still "good quality" and you can proceed.
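
To make the overlap rule concrete, here is a toy sketch in plain shell. It is not how Trim Galore works internally (the real trimming is done by Cutadapt, which also tolerates mismatches); it only does exact matching against the 13-base Illumina adapter prefix. The function name trim_3prime and the example reads are made up for illustration.

```shell
#!/bin/sh
# Toy model of the --stringency overlap rule (exact matches only).
ADAPTER="AGATCGGAAGAGC"   # adapter prefix that --illumina trims

# trim_3prime READ MIN_OVERLAP   (MIN_OVERLAP must be >= 1)
# Removes the adapter, or the longest adapter prefix of at least MIN_OVERLAP
# bases found at the 3' end of READ; prints READ unchanged otherwise.
trim_3prime() {
  local read_seq="$1" min_overlap="$2" n prefix
  # Full adapter inside the read: keep only what precedes its first occurrence.
  case $read_seq in
    *"$ADAPTER"*) printf '%s\n' "${read_seq%%"$ADAPTER"*}"; return ;;
  esac
  # Otherwise test ever shorter adapter prefixes against the end of the read.
  n=$(( ${#ADAPTER} - 1 ))
  while [ "$n" -ge "$min_overlap" ]; do
    prefix=$(printf '%s' "$ADAPTER" | cut -c "1-$n")
    case $read_seq in
      *"$prefix") printf '%s\n' "${read_seq%"$prefix"}"; return ;;
    esac
    n=$(( n - 1 ))
  done
  printf '%s\n' "$read_seq"
}

trim_3prime ACGTACGTTTA 1   # a single matching base (the final A) is cut
trim_3prime ACGTACGTTTA 3   # fewer than 3 matching bases: read is kept whole
trim_3prime ACGTACGTAGA 3   # the 3' end AGA matches the adapter start: cut
```

With an overlap of 1, roughly every fourth read loses its last base just by chance, which is enough to make the length distribution plot look ragged even when no real adapter is present.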

See here for more details: https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md


Thanks for your answer, it really helped me. I added the --illumina option because, when I checked the MultiQC report for all of my samples, it showed that there are some Illumina universal adapters. But since they make up only about 0.26%, could it just be a small MultiQC error? [image: MultiQC adapters]
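
One rough way to cross-check a small adapter percentage like this is to count the reads that contain the adapter prefix directly in the FASTQ file. The sketch below assumes a gzipped four-line-per-record FASTQ and exact matches only, so its number will not agree exactly with the FastQC/MultiQC adapter content plot, which is computed differently. The function name adapter_percent is made up for illustration.

```shell
#!/bin/sh
# Count reads containing the adapter prefix in a gzipped FASTQ file.
# Assumes 4 lines per record (sequence on every 2nd line of 4).
adapter_percent() {
  local fastq_gz="$1" adapter="${2:-AGATCGGAAGAGC}"
  gzip -dc "$fastq_gz" |
    awk -v a="$adapter" '
      NR % 4 == 2 { total++; if (index($0, a)) hits++ }
      END {
        if (total) printf "%d/%d reads (%.2f%%)\n", hits, total, 100 * hits / total
        else print "no reads found"
      }'
}
```

Calling `adapter_percent filename.fastq.gz` before trimming gives a quick sanity check. If the fraction is well below 1%, the small signal in the report is most likely real but harmless rather than a tool error.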


I think you are fine, no worries. Will you go ahead and align the FASTQ files to a genome? Do you know how?


Yeah, after trimming and FastQC I will have to generate MultiQC plots and then map the reads to the reference genome. As far as I know, for eukaryotes I should use HISAT2 to generate BAM files.
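
A minimal sketch of those remaining steps, for single-end data, might look like the following. It assumes multiqc, hisat2 and samtools are installed, that a HISAT2 index has already been built, and that the trimmed reads follow the Trim Galore naming scheme (filename_trimmed.fq.gz). The index name genome_index, the output names and the DRY_RUN switch are placeholders made up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: MultiQC summary, then HISAT2 alignment and a sorted, indexed BAM.
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; set to 0 to run them

run() {
  # Print the command; execute it only when DRY_RUN is 0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

# align_sample SAMPLE INDEX [THREADS]
align_sample() {
  local sample="$1" index="$2" threads="${3:-4}"
  # -U = unpaired reads; use -1/-2 instead for paired-end files.
  run hisat2 -p "$threads" -x "$index" -U "${sample}_trimmed.fq.gz" -S "${sample}.sam"
  run samtools sort -@ "$threads" -o "${sample}.sorted.bam" "${sample}.sam"
  run samtools index "${sample}.sorted.bam"
}

# Collect all FastQC and Trim Galore reports in the current directory.
run multiqc . -o multiqc_report
align_sample filename genome_index
```

Writing an intermediate SAM file keeps the example simple. Piping hisat2 straight into samtools sort saves disk space once the commands are known to work.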
