Question

How to improve FASTQ quality based on FastQC output?

0

6.6 years ago

Angelique ▴ 10

Good morning,

I ran FastQC on a FASTQ file (results: https://drive.google.com/open?id=1SzMSjaKuOdFL62r-ouJeCwslHN7Ndp_Q). The results are fine on the main points but not on others: the duplication level is really high (81%), some k-mers are enriched, and the GC content is high too. I don't know how to improve the quality of the file. I think I should trim, but I don't know where. Thank you for your advice.

RNA-Seq • 2.2k views

ADD COMMENT • link 6.6 years ago by Angelique ▴ 10

1

I see that you tagged the topic rna-seq. Which sequencing kit did you use?

Your reads (single-end, I presume) are 50 bp long, is that correct? Or did you cut all the graphs?

~~The nucleotides in the first 13 bases of your reads are not evenly distributed. Maybe try removing them with Trimmomatic and re-running FastQC on the output.~~

ADD REPLY • link 6.6 years ago by Bastien Hervé 5.9k

2

This is RNA-seq data. The uneven nucleotide distribution at the beginning of the reads is characteristic of this kind of library and does not require trimming.
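For anyone who wants to look at this bias outside FastQC, here is a minimal Python sketch that tabulates base frequencies at each of the first read positions. It is not FastQC's own code: the function names `read_fastq_sequences` and `per_position_composition` are made up for illustration, and it assumes an uncompressed FASTQ with exactly four lines per record. In RNA-seq libraries the first dozen or so positions typically show skewed frequencies (commonly attributed to random hexamer priming), and the profile flattens afterwards.

```python
from collections import Counter
from typing import Dict, Iterable, List


def read_fastq_sequences(lines: Iterable[str]) -> List[str]:
    """Return the sequence line of every 4-line FASTQ record.

    Assumption: plain, well-formed FASTQ (header, sequence, '+', qualities).
    """
    stripped = [line.rstrip("\n") for line in lines]
    return [stripped[i] for i in range(1, len(stripped), 4)]


def per_position_composition(
    seqs: Iterable[str], max_pos: int = 15
) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T/N observed at each of the first `max_pos` positions."""
    counts = [Counter() for _ in range(max_pos)]
    for seq in seqs:
        for i, base in enumerate(seq[:max_pos]):
            counts[i][base] += 1

    composition = []
    for c in counts:
        total = sum(c.values())
        # Positions with no data (reads shorter than max_pos) report 0.0.
        composition.append(
            {b: (c[b] / total if total else 0.0) for b in "ACGTN"}
        )
    return composition


if __name__ == "__main__":
    # Toy records only; replace with open("reads.fastq") for real data
    # ("reads.fastq" is a placeholder file name).
    toy = [
        "@r1\n", "ACGT\n", "+\n", "IIII\n",
        "@r2\n", "ACGA\n", "+\n", "IIII\n",
    ]
    for pos, freqs in enumerate(per_position_composition(read_fastq_sequences(toy), 4), 1):
        print(pos, freqs)
```

If the skew is confined to the first positions and the rest of the read is flat, that matches the pattern described above and is not a reason to trim.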

ADD REPLY • link 6.6 years ago by GenoMax 147k

1

OK! When I saw that the reads were 50 bases long, I wasn't sure this was an RNA-seq analysis. Thanks for the info.

ADD REPLY • link 6.6 years ago by Bastien Hervé 5.9k

1

First, you need to clarify what you sequenced on the NGS platform and, second, what the aim of your project is, because all of these parameters need to be handled carefully according to your requirements. For instance, RNA-seq data have a high duplication rate, amplicon sequencing can have abnormal GC content, etc.
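To make the duplication point concrete, here is a small Python sketch showing how a few highly expressed transcripts can push the duplicate fraction up on their own. It uses a simplified exact-match count, not FastQC's own estimator, and the names `duplication_fraction` and `most_common_sequences` are hypothetical helpers for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplication_fraction(seqs: Iterable[str]) -> float:
    """Fraction of reads that are extra copies of a sequence already seen.

    Simplification: exact string matches over all reads held in memory.
    """
    reads = list(seqs)
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)


def most_common_sequences(seqs: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """The `n` most frequent read sequences with their counts."""
    return Counter(seqs).most_common(n)


if __name__ == "__main__":
    # Toy data: one abundant "transcript" dominates the library.
    reads = ["AAAA"] * 8 + ["CCCC", "GGGG"]
    print(round(duplication_fraction(reads), 2))
    print(most_common_sequences(reads, 1))
```

In an expression experiment these repeated reads largely reflect abundant transcripts, which is why a high duplication level by itself is expected for RNA-seq.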

ADD REPLY • link 6.6 years ago by Tm ★ 1.1k

0

I am working with a public RNA-seq data set (from https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP091947; the reads are already all cut to 50 bp) and I want to perform a differential expression analysis with these data. The samples are human hepatocytes, sequenced paired-end on an Illumina HiSeq 2000.

ADD REPLY • link 6.6 years ago by Angelique ▴ 10

1

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

I am going to suggest that you proceed with alignment and downstream analysis as is. Manipulating this data is likely not going to lead to "improvement". STAR/DESeq2 (or salmon) would be the way to go.

ADD REPLY • link 6.6 years ago by GenoMax 147k

0

Sorry, I am new to the forum and to RNA-seq analysis. Thank you for all your answers. So the FASTQ file is OK for an RNA-seq experiment even if the first eleven bases look odd?

ADD REPLY • link 6.6 years ago by Angelique ▴ 10

4

Yes, it should be fine. Please see this blog post by Dr. Simon Andrews (author of FastQC). You may also want to read some of the other FastQC-related posts to understand the other tests it performs.