6.6 years ago
Angelique
Good morning,
I ran FastQC on a FASTQ file (results: https://drive.google.com/open?id=1SzMSjaKuOdFL62r-ouJeCwslHN7Ndp_Q). The results are fine for the main modules but not for others: the duplication level is very high (81%), some k-mers are enriched, and the GC content is high too. I don't know how to improve the quality of the file. I think I should trim, but I don't know where. Thank you for your advice.
I see that you tagged rna-seq in the topic. Which sequencing kit did you use? You have reads (I presume single-end) of length 50 bp, is that correct? Or did you cut all the graphs?
The first 13 bases of your data do not have an even nucleotide distribution. Maybe try removing them with Trimmomatic and re-running FastQC on the output.
This is RNA-seq data. The nucleotide distribution at the beginning of the reads is characteristic and does not require trimming.
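For anyone who wants to look at the per-position base composition directly rather than through the FastQC plot, here is a minimal Python sketch. It assumes a plain, uncompressed 4-line-per-record FASTQ; the function names and the tiny in-memory example are made up for illustration, and this is not how FastQC itself is implemented.

```python
from collections import Counter
from io import StringIO


def read_sequences(handle):
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def per_position_composition(sequences, n_positions=13):
    """Return one dict per read position with the fraction of A, C, G, T and N."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    fractions = []
    for counter in counts:
        total = sum(counter.values())
        fractions.append(
            {b: (counter[b] / total if total else 0.0) for b in "ACGTN"}
        )
    return fractions


# Tiny made-up FASTQ, only to show the call. For real data, pass an open
# file handle instead (the file name would be your own, e.g. a hypothetical
# "reads.fastq").
example_fastq = StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGA\n+\nIIII\n"
    "@r3\nATGC\n+\nIIII\n"
    "@r4\nGCGT\n+\nIIII\n"
)
composition = per_position_composition(read_sequences(example_fastq), n_positions=4)
for pos, frac in enumerate(composition, start=1):
    print(pos, {b: round(f, 2) for b, f in frac.items()})
```

With real RNA-seq reads, uneven fractions over roughly the first dozen positions followed by a flat profile would match the pattern discussed here.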
OK! When I saw reads 50 bases long, I wasn't sure about the RNA-seq analysis. Thanks for the info.
First, you need to clarify what you sequenced on the NGS platform and, second, what the aim of your project is, because all these parameters need to be handled carefully according to your requirements. For instance, RNA-seq data have a high duplication rate, amplicon sequencing can have abnormal GC content, etc.
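To make the two metrics mentioned here concrete, the following is a simplified Python sketch of a duplication percentage and an overall GC percentage. It is an assumption-laden illustration: FastQC estimates duplication from a subset of the reads, so its numbers will not match this exact count, and the function names and example reads are invented.

```python
from collections import Counter


def duplication_percent(sequences):
    """Percent of reads that are extra copies of an already-seen sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct sequence contributes one "original"; the rest are duplicates.
    return 100.0 * (total - len(counts)) / total


def gc_percent(sequences):
    """Percent of all bases that are G or C."""
    gc = sum(seq.count("G") + seq.count("C") for seq in sequences)
    total = sum(len(seq) for seq in sequences)
    return 100.0 * gc / total if total else 0.0


# Made-up reads: "ACGT" occurs three times, so 2 of the 5 reads are duplicates.
reads = ["ACGT", "ACGT", "ACGT", "GGCC", "ATAT"]
dup = duplication_percent(reads)
gc = gc_percent(reads)
print(f"duplication: {dup:.1f}%  GC: {gc:.1f}%")
```

In RNA-seq, highly expressed transcripts produce many identical reads, which is why a high value from a count like this is expected and is not by itself a sign of a bad library.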
I am working with a public RNA-seq data set (from https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP091947 ; the reads are already all cut to 50 bp) and I want to perform a differential expression analysis with this data. It was sequenced on an Illumina HiSeq 2000, paired-end, from human hepatocytes.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
I am going to suggest that you proceed with alignment and downstream analysis as is. Manipulating this data is not likely to lead to "improvement". STAR/DESeq2 (or salmon) would be the way to go.
Sorry, I am new to the forum and to RNA-seq analysis... Thank you for all your answers. So the FASTQ file is OK for an RNA-seq experiment even if the first eleven bases look weird?
Yes, it should be fine. Please see this blog post by Dr. Simon Andrews (author of FastQC). You may also want to read some of the other FastQC-related posts to understand the other tests it performs.
OK, thanks a lot for your help!