Hi,
I'm working on RNA-Seq data and wanted to start looking at alternative splicing events. Anyone have good advice/ideas to do that? I read that DEXSeq works well.
Edit > I'm working on human and bovine.
Thanks in advance,
N.
Such a big question. There are many tools of several categories that might be relevant to this problem.
There are also many tools that are usually used for straightforward differential expression but that, if run the right way, can still yield results informative about alternative expression of isoforms. These include edgeR, DESeq, HTSeq, DEGseq, sSeq, etc.
Note that placing each tool in one of three categories is an over-simplification. Some span across the three activities and some are components of a workflow generated by a single research group. Overall the area is a bit of a wild west. More tools are being developed constantly and you will find aspects of all of them that leave you wanting something better. The problem is not a simple one and is an area of active research.
The intro section of the ALEXA-seq website has a summary of some relevant background reading and also contains a now out-of-date review of RNA splicing tools.
The RNA-Seq Blog has a great list of relevant resources here.
Here is a recent review: Integrative analysis of many RNA-seq datasets to study alternative splicing
We recently published a paper, "Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud", that covers this topic in some detail and describes many relevant tools in the Supplementary Tables. This resource is maintained in GitHub here and has a corresponding hands-on tutorial: RNA-seq analysis tutorial. The list of tools can be found here.
Finally here are some relevant posts from BioStar and SeqAnswers:
I just came across this tool in Bioinformatics but haven't tried it yet: SplicingCompass: differential splicing detection using RNA-Seq data (link: http://bioinformatics.oxfordjournals.org/content/early/2013/02/28/bioinformatics.btt101.short). Hope it helps.
MATS is my favorite tool for splicing events (works with or without replicates):
http://rnaseq-mats.sourceforge.net/
MISO is another popular option:
Hi Charles, I am trying to use MATS for the first time. It seems that you have used it quite a bit. What do they mean when they say that the read length, or the length of each read, should be the same? I have two conditions and multiple replicates for each condition. I trimmed low-quality bases and removed reads shorter than 40 nt. My original reads were 75 nt, and right now I have reads whose lengths range from 40 to 75 nt. Can they be used? Or do I need to crop them to a fixed length before I can feed them to MATS?
Also, can it accept BAM files from the latest version of TopHat? Or does it have to be the older version that they ask you to download if you also want to align your reads?
I think they mean that you would want to trim everything to 40 bp. If that is the case, you should receive an error if you try to use the mixed reads.
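To make the "trim everything to one length" idea concrete, here is a minimal Python sketch that crops reads from the 3' end to a fixed length and drops anything shorter. The function name `crop_fastq_records` is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). It is not part of MATS, and a dedicated trimming tool would be the more usual choice in a real pipeline.

```python
def crop_fastq_records(lines, length):
    """Yield (header, seq, plus, qual) tuples cropped to exactly `length` bases.

    `lines` is any iterable of FASTQ lines (e.g. an open file handle).
    Reads shorter than `length` are dropped, so every surviving read
    has the same length. Assumes 4-line records; an incomplete trailing
    record is silently ignored.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same generator four times groups the lines into records.
    for header, seq, plus, qual in zip(*[stripped] * 4):
        if len(seq) < length:
            continue  # too short to crop to the target length
        yield header, seq[:length], plus, qual[:length]


if __name__ == "__main__":
    example = [
        "@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n",
        "@read2\n", "ACGT\n", "+\n", "IIII\n",
    ]
    for record in crop_fastq_records(example, 8):
        print("\n".join(record))
```

With the 40-75 nt reads described in the question, calling this with `length=40` would keep every read but throw away up to 35 bases per read, which is the trade-off discussed next.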
However, more importantly, I have found longer reads to be necessary to get good splicing event results. For example, I have not found it useful for 40 bp single-end reads, and I have only gotten good results with 100 bp paired-end reads. It is possible that the paired-end requirement is more important than the length. Hopefully that is the case, because trimming is sometimes unavoidable: I know that very poor quality reads will negatively affect your TopHat alignment (for example, I had a HiSeq dataset that tried to push for 140 bp reads, but the last 40 bp had so many problems that I needed to trim them to 100 bp to get good results).
The latest version should be OK - I know it works with TopHat2, but I don't remember the exact version number that I have tried.
Thanks Charles. I have single-end read data, which I assume won't be as helpful as paired-end. Also, my reads were 75 bp and I trimmed them to 60 bp. I hope I can get some decent results. The good thing is that I have at least 6 replicates for each of the two samples that I am comparing, but I am not sure how much having replicates helps for MATS analysis. Thanks a lot again.
I also have a question regarding read lengths in the MATS tool. I aligned adapter-trimmed paired-end reads using STAR and used sorted BAM files as input to MATS for the analysis of splicing events. Originally, each mate was 76 nt long, but after trimming it can be of any length. Given that I provide the average insert length (r1 and r2) and the corresponding sd1 and sd2 values to MATS, does the read length still need to be the same across samples and replicates? If so, why does MATS require r1, r2 and sd1, sd2?
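For reference, a rough way to estimate a mean and standard deviation like the r1/sd1 values mentioned above is to summarise the TLEN column of the alignments. The sketch below is only that: `template_length_stats` is a hypothetical helper, it reads SAM text lines (for example the output of `samtools view -h`), and it ignores two caveats. TLEN from a spliced aligner spans introns, so pairs crossing a junction inflate the estimate, and whether MATS expects the full fragment length or the inner distance between mates should be checked against its documentation.

```python
import statistics


def template_length_stats(sam_lines):
    """Return (mean, sample standard deviation) of observed template lengths.

    Only properly paired primary alignments with a positive TLEN are used,
    so each read pair is counted once (the mate carries the negative value).
    """
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        properly_paired = bool(flag & 0x2)
        secondary_or_supplementary = bool(flag & 0x900)
        if properly_paired and not secondary_or_supplementary and tlen > 0:
            lengths.append(tlen)
    if len(lengths) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.mean(lengths), statistics.stdev(lengths)


if __name__ == "__main__":
    import sys

    # Example use: samtools view -h sample.bam | python this_script.py
    mean, sd = template_length_stats(sys.stdin)
    print(f"mean={mean:.1f} sd={sd:.1f}")
```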
Yes - MATS will work without replicates.
If you do have replicates, JunctionSeq is also a relatively new option that I currently rank as my top choice for splicing analysis.
Do you think a workflow like this is good?
Any advice or ideas?
I think you need to read a little bit more on the topic because you are mixing things up. For instance, it doesn't make any sense to perform a de novo assembly when you have the reference genome (human and cow) and can use a reference-based assembly (Cufflinks) instead.
Hi NicoBxl,
I am planning to do a similar workflow to yours, but on an animal with no reference genome. I performed de novo assembly with both Trinity and Velvet/Oases. I also tried a Cufflinks assembly based on the draft genome of this animal, which we just obtained.
I just want to know which software you used for merging the assemblies (step 4) and re-aligning reads to the merged assembly (step 5).
Thank you in advance!
Phuong.
Have you tried TopHat? It works on RNA-Seq data and can identify splicing events. The only drawback is that you need a reference genome. Which species are you studying?
In which species are you working? Do you have a reference genome for it?
You could give a try to the flux capacitor package: http://flux.sammeth.net/capacitor.html
Hi, you can try OLego as a splice-aware aligner and Quantas for isoform prediction. Here is the link: http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation
I just pushed an update of my R package IsoformSwitchAnalyzeR to Bioconductor which introduces a module for alternative splicing.
For individual splice sites the already suggested tools might be better - but for a genome wide analysis of splicing it is very convenient to frame it as a comparison of isoforms that are switching since it allows for easy interpretation and statistical analysis. For examples of what it can do see the alternative splicing part of the vignette here.
As a bonus, IsoformSwitchAnalyzeR also allows you to identify and analyze isoform switches with predicted functional consequences (both for individual genes and genome wide), meaning it will help you figure out what the consequences of the identified alternative splicing are.
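To illustrate what "isoforms that are switching" means in the simplest terms, here is a toy Python sketch. It is not the package's code or API (the package itself is in R and adds proper statistics). The function names, the 0.1 threshold, and the example abundances are all made up for illustration. The idea is just to express each isoform as a fraction of its gene's total expression and look for isoforms whose fraction moves in opposite directions between two conditions.

```python
def isoform_fractions(expression):
    """Map isoform -> fraction of the gene's total expression.

    `expression` is a dict of isoform name -> abundance (e.g. TPM)
    for the isoforms of ONE gene in ONE condition.
    """
    total = sum(expression.values())
    if total == 0:
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


def find_switch(expr_a, expr_b, min_change=0.1):
    """Return (increased, decreased) isoform lists between conditions A and B.

    An isoform is reported when its fraction changes by at least
    `min_change` (a hypothetical cutoff). A switch in this toy sense
    needs at least one isoform in each list.
    """
    frac_a = isoform_fractions(expr_a)
    frac_b = isoform_fractions(expr_b)
    increased, decreased = [], []
    for isoform in sorted(set(frac_a) | set(frac_b)):
        change = frac_b.get(isoform, 0.0) - frac_a.get(isoform, 0.0)
        if change >= min_change:
            increased.append(isoform)
        elif change <= -min_change:
            decreased.append(isoform)
    return increased, decreased


if __name__ == "__main__":
    condition_a = {"iso1": 90.0, "iso2": 10.0}
    condition_b = {"iso1": 30.0, "iso2": 70.0}
    up, down = find_switch(condition_a, condition_b)
    print("increased usage:", up)
    print("decreased usage:", down)
```

Note that total gene expression is the same (100) in both toy conditions, so a gene-level differential expression test would see nothing here, while the isoform-level view picks up the change.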
I have done RNA-seq of Nextera data with BBMap on prokaryotes (which do not generally have differential splicing), which worked well, and RNA-seq of non-Nextera data on human (and various other organisms), which also worked well. It's pretty robust to most noise-inducing factors like error rate, intron length, insert size, and so forth. Note that I am BBMap's author. But it's pretty easy to use, as it autodetects the insert size in a splice-aware manner.
Please note that there are two kinds of Nextera libraries - normal (fragment) and LMP. LMP libraries require completely different processing from normal libraries, and they are the ones that would be expected to have a very broad size distribution. Normal Nextera libraries are not really very different from randomly-fragmented libraries, aside from a frequency bias in the first ~10 bases.
Thanks for the info. I would also like to add the Dream6 alternative splicing challenge website (http://www.the-dream-project.org/challenges/dream6-alternative-splicing-challenge). It's a little old but provides good background, some standard files, and a scoring metric. However, does anyone know the winning result of this challenge? I couldn't find it on the web. Thanks in advance.
Concerning Trinity, there is a recent paper in Nature Protocols that might help people trying to use this tool for their project.
mark for later use~_~