Heyya,
I'm trying to figure out quality control for RNA-seq. I ran FastQC on the first batch, and for the same sample the forward reads look the same on every lane. The same goes for the reverse reads. Is this normal?
Plus it looks kind of like this:
Per base and per sequence quality scores look alright.
I read all over the internet but those problems seem to be too specific.
What does that mean? If you have very good quality data Q scores may be pegged very high.
I mean the "per sequence GC content" plot for L001_R1 looks identical to the ones for L002_R1, L003_R1 and L004_R1. Ok, so that looks alright then? What about the two peaks? I read that's a sign of contamination, but the quality scores are good even then?
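For anyone who wants to compare lanes outside of FastQC, here is a minimal sketch of how a per-read GC histogram could be computed. It assumes plain 4-line FASTQ records; the function names (`gc_percent`, `read_sequences`, `gc_histogram`) are made up for illustration, and this is not how FastQC itself is implemented.

```python
from collections import Counter


def gc_percent(seq):
    """Return the GC percentage of one read, ignoring N calls."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def gc_histogram(fastq_lines):
    """Count reads per rounded GC percentage (keys between 0 and 100)."""
    return Counter(round(gc_percent(s)) for s in read_sequences(fastq_lines))


# Tiny in-memory example; real input would be the lines of a FASTQ file.
example = [
    "@read1", "GGCC", "+", "IIII",
    "@read2", "ATAT", "+", "IIII",
    "@read3", "ATGC", "+", "IIII",
]
histogram = gc_histogram(example)
for gc in sorted(histogram):
    print(gc, histogram[gc])
```

For a gzipped file, the lines could come from `gzip.open(path, "rt")`. Running this once per lane and comparing the histograms would show whether the two peaks sit at the same GC values in every lane, which is what you would expect if the lanes are just splits of the same library.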
You should first scan/trim this data before re-running FastQC. Post that result here. As @swbarnes2 says you likely have something odd going on here.
Hi GenoMax,
I have run Trimmomatic (operation: SLIDINGWINDOW, number of bases to average across: 4, average quality required: 20) on some of my samples and run FastQC again. The per sequence GC content is not much different. Do you have any suggestions? Or how should I approach this issue?
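To make clear what those two numbers do, here is a simplified sketch of sliding-window quality trimming with a window of 4 and a required average of 20. It assumes Phred+33 quality encoding; the helper names are hypothetical, and Trimmomatic's actual boundary handling may differ from this illustration.

```python
def phred33_to_scores(quality_string):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(char) - 33 for char in quality_string]


def sliding_window_trim(seq, quals, window=4, min_avg=20):
    """Cut the read at the start of the first window whose mean quality
    is below min_avg. Reads shorter than the window are returned unchanged."""
    for start in range(max(len(quals) - window + 1, 0)):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


# Example: six good bases followed by four poor ones.
trimmed_seq, trimmed_quals = sliding_window_trim(
    "ACGTACGTAC", [30] * 6 + [10] * 4
)
print(trimmed_seq, trimmed_quals)
```

Note that this kind of trimming only removes low-quality bases. It does not remove reads from another source, so a GC distribution that stays the same after trimming is not surprising if base quality was already good.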
I'll be honest. The prognosis doesn't look that good to me (biostar link)
I don't quite understand why you have this result, but at this point go ahead and start your alignments. You don't need to worry about splicing, so things should be simpler. Let us know what % alignment you get. If the alignments look wonky, then it is possible that you have some kind of contamination. But we will get to that later.
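Most aligners print an alignment summary themselves, but if you only have SAM output, a rough sketch of the % alignment calculation could look like this. It assumes uncompressed SAM text lines and uses the standard SAM flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the function name `alignment_rate` is made up for illustration.

```python
def alignment_rate(sam_lines):
    """Return the percentage of primary records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary records
        total += 1
        if not flag & 0x4:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


# Tiny in-memory example with truncated SAM records.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100",
    "r2\t4\t*\t0",
    "r3\t16\tchr1\t250",
    "r1\t256\tchr1\t900",
]
print(round(alignment_rate(example_sam), 1))
```

For paired-end data this counts each mate as its own record, so the number is per read rather than per pair.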
I'm thinking of asking the sequencer provider for the adapters he used. Would that be a good idea? Maybe that's the problem. I checked in here but who knows. This was a source of inspiration.
What would you do with the adapter sequence? Again, are you absolutely sure that deviation from that theoretical blue line (which was probably calculated for DNA from eukaryotic species) really is a problem for your sample?