Quality control & Processing of RNA seq data
9.9 years ago
David_emir ▴ 500

Hello All,

I have raw RNA-seq data. Please let me know of any protocol available for QC of the data. It would be great if you could post code: I am new to programming and bioinformatics, so it would be a great help to me.

Thanks a ton !!!

-Khaliq

quality-control RNA-Seq • 4.3k views
9.9 years ago

Check out the Picard suite for various quality metrics; CollectRnaSeqMetrics should cover most of your needs.

PS

The FASTX-Toolkit can be used for quality trimming, etc. It can also be accessed via Galaxy.

9.9 years ago

You can generate a QC report using FastQC. It is Java-based and has a user-friendly interface, so you don't need any programming skills.
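Since code was asked for: below is a minimal Python sketch of the kind of per-position quality summary a QC report shows. It is not FastQC's implementation, just an illustration. It assumes uncompressed 4-line FASTQ records with Phred+33 quality encoding, and the function names (parse_fastq, mean_quality_per_position) are made up for this example.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict 4-line records (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def mean_quality_per_position(records, offset=33):
    """Mean Phred score at each read position (Phred+33 assumed)."""
    columns = {}
    for _name, _seq, qual in records:
        for i, ch in enumerate(qual):
            columns.setdefault(i, []).append(ord(ch) - offset)
    return [mean(columns[i]) for i in sorted(columns)]


# Two toy reads; with a real file you would pass open("reads.fastq") instead.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "II5#",
]
per_position = mean_quality_per_position(parse_fastq(example))
print(per_position)
```

A drop in mean quality toward the 3' end of the reads is the usual pattern that motivates trimming.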


Thanks a lot Marina, it's a great help! FastQC helps me QC the raw data, but it can't actually remove the adapters or trim bases with low Phred quality. How do I do that?


If you are not very familiar with coding, use the FASTX tools from Galaxy (http://hannonlab.cshl.edu/fastx_toolkit/galaxy.html), or install the toolkit on your machine.

9.9 years ago

Alongside FastQC, common trimmers are Trimmomatic, trim_galore (this is actually slow, but it works), and skewer, to name just a few. Some people also like using PRINSEQ, though I have no experience with it.
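To make it concrete what these trimmers do, here is a rough Python sketch of the two basic operations: clipping an adapter and trimming low-quality bases from the 3' end. It is a simplification and not how any of the tools above is implemented (real trimmers tolerate mismatches and typically use sliding windows or similar). The function names, the thresholds, and the adapter string are assumptions for illustration; Phred+33 encoding is assumed.

```python
def clip_adapter(seq, qual, adapter, min_overlap=3):
    """Remove an exact adapter match and everything after it.

    Also removes a partial adapter (a prefix of at least min_overlap bases)
    sitting at the 3' end of the read. Exact matching only.
    """
    pos = seq.find(adapter)
    if pos == -1:
        # Look for the longest adapter prefix that the read ends with.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_low_quality_tail(seq, qual, min_phred=20, offset=33):
    """Drop bases from the 3' end until one reaches min_phred."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]


adapter = "AGATCGGAAGAGC"  # example adapter; use the one from your library prep
read_seq = "ACGTACGTAGATCGG"   # 8 bases of insert followed by 7 adapter bases
read_qual = "IIIII5##IIIIIII"

clipped_seq, clipped_qual = clip_adapter(read_seq, read_qual, adapter)
trimmed_seq, trimmed_qual = trim_low_quality_tail(clipped_seq, clipped_qual)
print(clipped_seq, trimmed_seq)
```

For real data I would still use one of the dedicated tools; this is only meant to show the logic.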

I should note that I generally suggest people avoid the fastx toolkit. It's fine if you have single-end data (probably the most common), but causes no end of problems (and posts here) when people try to use it with paired-end data.
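A small Python sketch of why paired-end data causes trouble with tools that treat each file separately: if reads are dropped from each file independently, the two files no longer list mates in the same order. The pair-aware version below keeps or drops both mates together. The function name filter_pairs and the length cutoff are assumptions for illustration, not taken from any particular tool.

```python
def filter_pairs(reads_1, reads_2, min_length=20):
    """Keep a pair only if both mates pass the length filter.

    Both inputs are lists of (name, sequence) in the same order.
    """
    kept_1, kept_2 = [], []
    for (name_1, seq_1), (name_2, seq_2) in zip(reads_1, reads_2):
        if name_1 != name_2:
            raise ValueError(f"mates out of sync: {name_1} vs {name_2}")
        if len(seq_1) >= min_length and len(seq_2) >= min_length:
            kept_1.append((name_1, seq_1))
            kept_2.append((name_2, seq_2))
    return kept_1, kept_2


# Toy reads after trimming: mate 1 of "b" and mate 2 of "c" became too short.
reads_1 = [("a", "ACGTACGT"), ("b", "AC"), ("c", "ACGTACGT")]
reads_2 = [("a", "ACGTACGT"), ("b", "ACGTACGT"), ("c", "ACG")]

# Filtering each file on its own leaves the files out of sync ...
alone_1 = [name for name, seq in reads_1 if len(seq) >= 5]
alone_2 = [name for name, seq in reads_2 if len(seq) >= 5]

# ... while filtering pair-wise keeps mates matched.
paired_1, paired_2 = filter_pairs(reads_1, reads_2, min_length=5)
print(alone_1, alone_2, [name for name, _ in paired_1])
```

Aligners that expect paired input generally assume the n-th read in each file belongs to the same fragment, which is why the out-of-sync case breaks things downstream.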

9.9 years ago

Picard, FastQC, etc. are nice for evaluating your data prior to the mapping procedure.

I would also analyze your mapped data with programs such as Qualimap (which you can use before and after the mapping), Statmap or similar.

9.9 years ago
Michele Busby ★ 2.2k

RNA-SeQC is also good.

The most common modes of failure are incomplete removal of rRNA (i.e. all your sequence is ribosomal; <~10% is normal; shows up as high alignment to rRNA), not enough input causing low complexity (shows up as a high duplication rate, which can be calculated but also seen in IGV: change the settings to show duplicates), contamination (shows up as a low alignment rate), and sequencing your adapters rather than your RNA (also shows up as a low alignment rate).
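As a rough illustration of "duplication rate can be calculated", here is a Python sketch that counts exact duplicate sequences among raw reads. This is a simplification: tools that mark duplicates in aligned data work from mapping positions rather than raw sequence, so numbers will differ. The function name duplication_rate is made up for this example.

```python
from collections import Counter


def duplication_rate(sequences):
    """Fraction of reads that are exact copies of an earlier read."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    counts = Counter(sequences)
    duplicates = len(sequences) - len(counts)  # reads beyond the first copy
    return duplicates / len(sequences)


toy_reads = ["ACGT", "ACGT", "ACGT", "TTGA"]
rate = duplication_rate(toy_reads)
print(rate)
```

What counts as "high" depends on the library and sequencing depth, so compare samples from the same experiment rather than relying on a fixed cutoff.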

http://bam.iobio.io/ is BY FAR the easiest way to get your first quick and dirty stats.
