Help interpreting FastQC quality reports for Illumina RNA-seq reads
eclark28, 9.1 years ago

I have attached some images below. I am relatively new to the Galaxy suite and need some help understanding whether the data I have are usable. The data were sequenced previously, and I was given them to learn RNA-seq. I ran FastQC on them through Galaxy; attached are images for 7 of the files. I want to know whether the data are of good quality. Thank you for any help.

https://www.dropbox.com/s/v8853f5b6csu1gs/BiostarVar.001.jpg?dl=0

Tags: illumina, RNA-Seq, galaxy, quality, fastqc
stolarek.ir, 9.1 years ago

Well, FastQC doesn't actually filter anything. It just raises a flag for each module when a statistic crosses a threshold. So, basically: red (fail) is bad, orange (warn) needs a closer look, and green (pass) is good.
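To make the flags concrete, here is a minimal sketch of how a pass/warn/fail call for the per base quality module can be computed. It assumes Phred+33 quality strings (standard for current Illumina data) and the default limits described in the FastQC documentation for "Per base sequence quality": fail if any position has a lower quartile below 5 or a median below 20, warn if below 10 or 25. FastQC itself bins positions on long reads and computes its quantiles its own way, so this only approximates its output; `phred_scores` and `per_base_flag` are hypothetical names.

```python
import statistics


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def per_base_flag(quals: list[str]) -> str:
    """Approximate FastQC's 'Per base sequence quality' pass/warn/fail call.

    Uses the default limits from the FastQC documentation:
    fail if any position has a lower quartile < 5 or a median < 20,
    warn if any position has a lower quartile < 10 or a median < 25.
    """
    decoded = [phred_scores(q) for q in quals]
    max_len = max(len(d) for d in decoded)
    status = "pass"
    for pos in range(max_len):
        # All quality scores observed at this position (shorter reads skipped).
        column = [d[pos] for d in decoded if len(d) > pos]
        median = statistics.median(column)
        # statistics.quantiles needs at least two values on older Pythons.
        lower_q = statistics.quantiles(column, n=4)[0] if len(column) > 1 else column[0]
        if lower_q < 5 or median < 20:
            return "fail"
        if lower_q < 10 or median < 25:
            status = "warn"
    return status


if __name__ == "__main__":
    print(per_base_flag(["IIIIIIII", "IIIIII55", "IIIII#55"]))
```

A single bad position is enough to fail the whole module, which is why one low-quality tail can turn the whole plot red.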

You haven't actually posted the images of those statistics. To give a real overview, you should share the quality plots from FastQC.

If you want to actually filter the data, say at quality >= 30, use something like the AdapterRemoval software. It will split your data into discarded and truncated reads; the truncated ones are the reads that passed the quality (or other) filters. You really shouldn't include very low-quality data in your analysis.
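A minimal sketch of the "keep reads at quality >= 30" idea in plain Python, not AdapterRemoval. It assumes Phred+33 encoding and a simple rule on each read's mean quality; AdapterRemoval itself trims adapters and low-quality ends rather than applying this exact rule, so treat this only as an illustration of splitting reads into kept and discarded sets. `read_fastq`, `mean_quality` and `split_by_quality` are hypothetical helpers.

```python
import io
from typing import Iterable, Iterator, TextIO

Record = tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def split_by_quality(
    records: Iterable[Record], min_mean: float = 30.0
) -> tuple[list[Record], list[Record]]:
    """Split reads into (kept, discarded) on mean quality >= min_mean."""
    kept: list[Record] = []
    discarded: list[Record] = []
    for rec in records:
        if mean_quality(rec[2]) >= min_mean:
            kept.append(rec)
        else:
            discarded.append(rec)
    return kept, discarded


if __name__ == "__main__":
    demo = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n"
    kept, discarded = split_by_quality(read_fastq(io.StringIO(demo)))
    print(len(kept), len(discarded))
```

A mean-quality cutoff is the crudest option; trimming low-quality ends first usually rescues more usable sequence than discarding whole reads.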

Search online or check RNA-seq papers to see how others filtered their data.


Thanks for the advice. Here are the generated statistics. I mainly want to know which categories I should pay closest attention to in the QC report.

https://www.dropbox.com/sh/la8z7freytw2sdm/AAABDazSm-9yR0jCf09l1p4_a?dl=0


Well, actually every category is important, but deviations are interpreted differently depending on the application and the data source. So let's stick with your initial quality question. As I said, FastQC just shows you the results; you still need to do the filtering yourself. If you are a beginner bioinformatician, learn to use command-line Linux software (like AdapterRemoval). If that's not an option, hire someone. I don't think it's a good idea to assume that you "kind of know what you are doing".
So, on to interpreting the results. I only checked the first file. The per sequence quality plot shows how many of your sequences have the mean quality score shown on the x axis. Basically, reads with a mean of >= 30 should be used; the rest are probably safe to discard (mostly, since I don't know your application). The per base quality plot shows the quality distribution at each position (in layman's terms: take base 1 of all the reads and draw a box plot, then base 2 of all the reads, draw a box plot, and so on). You can see that towards the end of the reads the mean base quality tends to drop and the spread of quality values gets much larger. The conclusion is that you want your data to be consistent, so it might be a good idea to trim off the last couple of bases.
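Both plots can be reproduced roughly from the raw quality strings. The sketch below assumes Phred+33 encoding, rounds each read's mean quality to the nearest integer for the per sequence histogram (a simplification, not necessarily what FastQC does), and uses a hypothetical heuristic for "trim off the last couple of bases": keep positions up to the first one whose median quality falls below 30. All function names are hypothetical; a dedicated trimming tool is the better choice for real data.

```python
import statistics
from collections import Counter


def phred(qual: str, offset: int = 33) -> list[int]:
    """Decode a Phred+33 quality string (encoding assumed)."""
    return [ord(c) - offset for c in qual]


def per_sequence_quality(quals: list[str]) -> Counter:
    """Number of reads at each (rounded) mean quality, i.e. the data behind a
    'Per sequence quality scores' style plot."""
    return Counter(round(statistics.mean(phred(q))) for q in quals if q)


def suggest_trim_length(quals: list[str], min_median: float = 30) -> int:
    """Hypothetical heuristic: the number of leading positions to keep, stopping
    at the first position whose median quality drops below min_median."""
    decoded = [phred(q) for q in quals]
    max_len = max(len(d) for d in decoded)
    for pos in range(max_len):
        column = [d[pos] for d in decoded if len(d) > pos]
        if statistics.median(column) < min_median:
            return pos
    return max_len


def trim_reads(
    records: list[tuple[str, str, str]], length: int
) -> list[tuple[str, str, str]]:
    """Cut sequence and quality of every (header, seq, qual) record to `length`."""
    return [(h, s[:length], q[:length]) for h, s, q in records]


if __name__ == "__main__":
    reads = [("@r1", "ACGTA", "IIII5"), ("@r2", "ACGTC", "IIII#")]
    quals = [q for _, _, q in reads]
    print(per_sequence_quality(quals))
    keep = suggest_trim_length(quals)
    print(keep, trim_reads(reads, keep))
```

Trimming every read to the same length keeps the data consistent, at the cost of also cutting good bases from reads whose tails were fine.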
