Hello,
I have some RNA-seq data. I ran FastQC and MultiQC on it and I have some questions about the output and how to proceed. I checked the per-base quality scores and the mean for my samples is higher than 35, which seems to be good. However, when I look at the individual FastQC reports, I noticed that even though the per base sequence quality module passes, the error bars are really big. For example, the mean could be 38 but the error bar reaches down to 24. Should I be concerned about that?
Also, I am getting two peaks in the per sequence GC content plot. From what I have read, this could indicate some kind of contamination. I am planning to use STAR to align the data and then check the GC content of the mapped and unmapped reads. I am also planning to BLAST some of the overrepresented sequences. Is there anything else I can do to identify the source of contamination and check whether it interferes with the mapping?
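On the error bars: as far as I know, the whiskers in the FastQC per base sequence quality plot mark the 10th and 90th percentiles rather than a standard deviation, so a whisker at 24 would mean that roughly 10% of reads are at or below that quality at that position. A minimal sketch for pulling those percentiles out of the `fastqc_data.txt` file inside the FastQC zip is below. The function names, the cutoff of 20, and the example values are my own assumptions for illustration, not anything FastQC prescribes.

```python
# Sketch: read the "Per base sequence quality" module from the text of a
# fastqc_data.txt file and list positions whose 10th percentile is low.
# Function names and the threshold are hypothetical choices.

def parse_per_base_quality(fastqc_data_text):
    """Return the module's rows as dicts keyed by the column names."""
    rows = []
    header = None
    in_module = False
    for line in fastqc_data_text.splitlines():
        if line.startswith(">>Per base sequence quality"):
            in_module = True
            continue
        if in_module and line.startswith(">>END_MODULE"):
            break
        if not in_module or not line.strip():
            continue
        if line.startswith("#"):
            # Column header line, e.g. "#Base<TAB>Mean<TAB>Median..."
            header = line.lstrip("#").split("\t")
            continue
        fields = line.split("\t")
        row = {"Base": fields[0]}  # may be a range such as "10-14"
        for name, value in zip(header[1:], fields[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def positions_below(rows, threshold=20.0, column="10th Percentile"):
    """Return base positions whose chosen column is below the threshold."""
    return [row["Base"] for row in rows if row[column] < threshold]


# Made-up example in the fastqc_data.txt layout (not real data).
EXAMPLE = "\n".join([
    ">>Per base sequence quality\tpass",
    "#Base\tMean\tMedian\tLower Quartile\tUpper Quartile"
    "\t10th Percentile\t90th Percentile",
    "1\t38.0\t39.0\t37.0\t40.0\t24.0\t40.0",
    "2\t37.5\t39.0\t36.0\t40.0\t18.0\t40.0",
    ">>END_MODULE",
])

if __name__ == "__main__":
    rows = parse_per_base_quality(EXAMPLE)
    print(positions_below(rows, threshold=20.0))
```

With a real report you would pass the contents of `fastqc_data.txt` instead of `EXAMPLE`. If only a few positions dip, and mostly toward the 3' end, that is usually less worrying than a low 10th percentile across the whole read.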
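For comparing GC content between mapped and unmapped reads, one option is to bin the per-read GC percentage yourself and look at the two histograms side by side. The sketch below is only an illustration: it assumes plain 4-line FASTQ records (optionally gzipped), and the function names and the 5% bin width are hypothetical choices of mine.

```python
# Sketch: per-read GC content histogram from FASTQ sequences.
# Assumes 4-line FASTQ records; names and bin width are hypothetical.
import gzip
from collections import Counter


def gc_percent(seq):
    """GC percentage over called bases (A, C, G, T); None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(sequences, bin_width=5):
    """Count reads per GC bin; each key is the lower edge of its bin."""
    hist = Counter()
    for seq in sequences:
        gc = gc_percent(seq)
        if gc is None:
            continue
        # Put reads with exactly 100% GC into the last bin.
        lower_edge = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        hist[lower_edge] += 1
    return dict(sorted(hist.items()))


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:
                yield line.strip()


if __name__ == "__main__":
    # Tiny made-up reads, just to show the output shape.
    print(gc_histogram(["ATAT", "GGCC", "ATGC"]))
```

Running `gc_histogram(read_fastq_sequences(path))` once on the unmapped reads and once on reads extracted from the alignments should show which set carries the second peak.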
Thank you
Thank you for replying. I added the images to the initial post. I BLASTed the overrepresented sequences and they seem to be lncRNA and a specific enhancer. Only one sequence seemed to be rRNA. I get around 50% alignment with kallisto. I am planning to align with STAR now so I can check the GC content of the mapped and unmapped reads.
I noticed that there is an rRNA sequence among the overrepresented sequences for all my samples. Its GC content is around 69%. Could that be what is causing the issue? I aligned my data with STAR, and when I run FastQC on the aligned BAM files the peak still seems to be there. However, I just realized that this rRNA is not in the GTF file from Ensembl, and in general it does not seem to have an Ensembl ID. Does that mean that it is not the rRNA causing the issue?
If your BAM files contain unaligned reads, then that is expected. It is likely that the rRNA is causing the extra peak in the GC distribution. Unless you are working with rRNA, you are not going to use those counts/reads.
I think that my BAM file does not contain unaligned reads because I used --outReadsUnmapped in the STAR command.
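One way to check this directly instead of inferring it from the options: unmapped records carry the 0x4 bit in the SAM FLAG field, so they can simply be counted. As far as I know, in STAR `--outReadsUnmapped` controls whether unmapped reads are written to separate files, while `--outSAMunmapped` controls whether they are kept inside the SAM/BAM, so it is worth verifying. The sketch below works on SAM text, for example the output of `samtools view -h` on the BAM; the function name and the example records are made up for illustration.

```python
# Sketch: count mapped vs. unmapped records in SAM text using FLAG bit 0x4.
# The function name and example records are hypothetical.

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def count_mapped_unmapped(sam_lines):
    """Return (mapped_records, unmapped_records) for an iterable of SAM lines.

    Counts alignment records, not unique reads: a multi-mapping read
    with several records is counted once per record.
    """
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & UNMAPPED_FLAG:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up SAM records: one header line, one mapped read, one unmapped read.
EXAMPLE_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]

if __name__ == "__main__":
    print(count_mapped_unmapped(EXAMPLE_SAM))
```

To use it on real data, pass `sys.stdin` to `count_mapped_unmapped` and pipe the SAM text in. If the unmapped count is zero, the GC peak seen by FastQC on the BAM comes from aligned reads.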