Recommended Tools For Alternative Splicing Detection From RNA-Seq Data
10
34
11.7 years ago

Hi,

I'm working on RNA-Seq data and want to start looking at alternative splicing events. Does anyone have good advice or ideas on how to do that? I read that DEXSeq works well.

Edit: I'm working on human and bovine data.

Thanks in advance,
N.

alternative-splicing splicing rna-seq • 64k views
ADD COMMENT
105
11.7 years ago

Such a big question. There are many tools of several categories that might be relevant to this problem.

  1. Aligners capable of identifying splice sites from sequence reads (aka splice-aware aligners). These include: TopHat, MapSplice, SpliceMap, HMMsplicer, GSNAP, STAR, RUM, SOAPsplice, HISAT, etc. I saw a nice poster at AGBT13 that indicated that STAR performs very well compared to several competitors.
  2. Transcriptome assemblers that either perform de novo assembly of transcripts from sequence reads or do so with the help of a reference assembly (and perhaps even guided by known transcript annotations). These include: Cufflinks, Scripture, Trinity, Trans-ABySS, GRIT, etc. Of these, Cufflinks is probably the easiest to use, while Trinity and Trans-ABySS seem to yield impressive results in the hands of certain groups (particularly those that developed them...).
  3. Alternative expression tools that seek to identify isoform expression differences between two or more conditions. These include: Cuffdiff, ALEXA-seq, MISO, SplicingCompass, Flux Capacitor, JuncBASE, DEXSeq, MATS, SpliceR, FineSplice, ARH-seq, etc.

There are also many tools that are usually considered for straight differential expression but, if run the right way, might still yield results informative about alternative expression of isoforms. These include: edgeR, DESeq, HTSeq, DEGseq, sSeq, etc.

Note that placing each tool in one of three categories is an over-simplification. Some span across the three activities and some are components of a workflow generated by a single research group. Overall the area is a bit of a wild west. More tools are being developed constantly and you will find aspects of all of them that leave you wanting something better. The problem is not a simple one and is an area of active research.

The intro section of the ALEXA-seq website has a summary of some relevant background reading and also contains a now out-of-date review of RNA splicing tools.

The RNA-Seq Blog has a great list of relevant resources here.

Here is a recent review: Integrative analysis of many RNA-seq datasets to study alternative splicing

We recently published a paper, "Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud", that covers this topic in some detail and describes many relevant tools in the Supplementary Tables. This resource is maintained in GitHub here and has a corresponding hands-on tutorial: RNA-seq analysis tutorial. The list of tools can be found here.

Finally here are some relevant posts from BioStar and SeqAnswers:

ADD COMMENT
2

Thanks for the info. I'd also like to add the DREAM6 alternative splicing challenge website (http://www.the-dream-project.org/challenges/dream6-alternative-splicing-challenge). It's a little old but provides good background, some standard files, and a scoring metric. However, does anyone know the winning result of this challenge? I couldn't find it on the web. Thanks in advance.

ADD REPLY
1

Concerning Trinity, there is a recent paper in Nature Protocols that might help people trying to use this tool for their project.

ADD REPLY
0

mark for later use~_~

ADD REPLY
5
11.7 years ago
henryvuong ▴ 810

I just came across this tool in Bioinformatics but haven't tried it yet: "SplicingCompass: differential splicing detection using RNA-Seq data" (link: http://bioinformatics.oxfordjournals.org/content/early/2013/02/28/bioinformatics.btt101.short). Hope it helps.

ADD COMMENT
5
10.6 years ago

MATS is my favorite tool for splicing events (works with or without replicates):

http://rnaseq-mats.sourceforge.net/

MISO is another popular option:

http://genes.mit.edu/burgelab/miso/

ADD COMMENT
0

Hi Charles, I am trying to use MATS for the first time. It seems that you have used it quite a bit. What do they mean when they say that the read length (the length of each read) should be the same? I have two conditions and multiple replicates for each condition. I trimmed low-quality bases and removed reads shorter than 40 nt. My original reads were 75 nt, and I now have reads ranging from 40 to 75 nt in length. Can they be used, or do I need to crop them to a fixed length before I can feed them to MATS?

Also, can it accept BAM files from the latest version of TopHat? Or does it have to be the older version that they ask you to download if you also want to align your reads?

ADD REPLY
1

I think they mean that you would want to trim everything to 40 bp. If that is the case, you should receive an error if you try to use the mixed reads.

However, more importantly, I have found longer reads to be necessary to get good splicing event results. For example, I have not found it useful for 40 bp single-end reads, and I have only gotten good results with 100 bp paired-end reads. It is possible that the paired end requirement is more important than the length. Hopefully that is the case - for example, I know that very poor quality reads will negatively affect your TopHat alignment (for example, I had a HiSeq dataset that tried to push for 140 bp reads, but the last 40 bp had so many problems that I needed to trim them to 100 bp to get good results).

The latest version should be OK - I know it works with TopHat2, but I don't remember the exact version number that I have tried.
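For the fixed-length question above, here is a minimal Python sketch of cropping reads to one length and dropping anything shorter. It is not part of MATS or TopHat; the function names `read_fastq` and `crop_to_fixed_length` are hypothetical, and it assumes plain four-line FASTQ records.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records from an iterable of lines."""
    buffer: List[str] = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            yield (buffer[0], buffer[1], buffer[2], buffer[3])
            buffer = []


def crop_to_fixed_length(
    records: Iterable[FastqRecord], length: int
) -> Iterator[FastqRecord]:
    """Drop reads shorter than `length`; crop the rest to exactly `length`."""
    for header, seq, sep, qual in records:
        if len(seq) < length:
            continue  # too short to reach the fixed length
        # Crop sequence and quality string together so they stay in sync.
        yield (header, seq[:length], sep, qual[:length])


if __name__ == "__main__":
    demo = [
        "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
        "@read2", "ACGT", "+", "IIII",
    ]
    for record in crop_to_fixed_length(read_fastq(demo), length=6):
        print("\n".join(record))
```

With paired-end data, a read dropped from one mate file would also have to be dropped from the other, otherwise the two files fall out of sync; this sketch does not handle that.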

ADD REPLY
0

Thanks Charles. I have single-end read data, which I assume won't be as helpful as paired-end. Also, my reads were 75 bp and I trimmed them to 60 bp. I hope I can get some decent results. The good thing is that I have at least 6 replicates for my two samples that I am comparing, but I'm not sure how much having replicates helps for MATS analysis. Thanks a lot again.

ADD REPLY
0

I also have a question regarding read lengths in the MATS tool. I aligned adapter-trimmed paired-end reads using STAR and used sorted BAM files as input to MATS for the analysis of splicing events. Originally, each mate was 76 nt long, but after trimming it can be of any length. Given that I provide the average insert length (r1 and r2) and the corresponding sd1 and sd2 values to MATS, does the read length still need to be the same across samples and replicates? If so, why does MATS require r1, r2 and sd1, sd2?

ADD REPLY
0

I would recommend checking with the developer.

You could also just try specifying 76 bp and see what happens - I think you might get an error, but it has been a while.
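If it helps to check the insert-size numbers themselves, here is a rough Python sketch that estimates a mean and standard deviation from the TLEN field (column 9) of SAM-format text, for example the output of `samtools view`. The function names are hypothetical, and how these values map onto the r1/r2 and sd1/sd2 options in MATS is an assumption to verify against the MATS documentation.

```python
import statistics
from typing import Iterable, List, Tuple


def template_lengths(sam_lines: Iterable[str]) -> List[int]:
    """Collect one positive TLEN per properly paired fragment."""
    lengths: List[int] = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        properly_paired = bool(flag & 0x2)
        secondary_or_supplementary = bool(flag & 0x900)
        # TLEN is positive for the leftmost mate and negative for the other,
        # so keeping only positive values counts each fragment once.
        if properly_paired and not secondary_or_supplementary and tlen > 0:
            lengths.append(tlen)
    return lengths


def mean_and_sd(values: List[int]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation (needs >= 2 values)."""
    return statistics.mean(values), statistics.stdev(values)
```

One caveat: TLEN is measured on the genome, so a pair that spans an intron will look much longer than the actual fragment. For spliced RNA-Seq alignments the raw numbers are probably inflated unless such pairs are filtered out first.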

ADD REPLY
0

Are you sure that MATS works without replicates?

ADD REPLY
0

Yes - MATS will work without replicates.

If you do have replicates, JunctionSeq is also a relatively new option that I currently rank as my top choice for splicing analysis.

ADD REPLY
3
11.7 years ago

Do you think a workflow like this is good?

  1. Alignment with STAR
  2. Reference-based assembly: Cufflinks
  3. De novo assembly: Trinity
  4. Merge the assemblies
  5. Re-align reads to the merged assembly
  6. Infer isoform expression (Cuffdiff or RSEM)
  7. Alternative splicing analysis: DEXSeq

Any advice or ideas?

ADD COMMENT
1

I think you need to read a little bit more on the topic because you are mixing things up. For instance, it doesn't make any sense to perform a de novo assembly when you have a reference genome (human and cow) and can use a reference-based assembly (Cufflinks) instead.

ADD REPLY
1

De novo assembly will be useful for another aspect of my project (fusion).

ADD REPLY
0

Hi NicoBxl,

I am planning a workflow similar to yours, but on an animal with no established reference. I performed de novo assembly with both Trinity and Velvet/Oases. I also tried a Cufflinks assembly based on the draft genome of this animal that we just obtained.

I just want to know which software you used for merging the assemblies (step 4) and for re-aligning reads to the merged assembly (step 5).

Thank you in advance!

Phuong.

ADD REPLY
2
11.7 years ago

Have you tried TopHat? It works on RNA-Seq data and can identify splicing events. The only drawback is that you need a reference genome. Which species are you studying?

ADD COMMENT
2
11.7 years ago
Biojl ★ 1.7k

Which species are you working on? Do you have a reference genome for it?

You could give the Flux Capacitor package a try: http://flux.sammeth.net/capacitor.html

ADD COMMENT
2
8.6 years ago
Lalit ▴ 30

Hi, you can try OLego as a splice aligner and Quantas for isoform prediction. Here is the link: http://zhanglab.c2b2.columbia.edu/index.php/Quantas_Documentation

ADD COMMENT
1
6.6 years ago

I just pushed an update of my R package IsoformSwitchAnalyzeR to Bioconductor which introduces a module for alternative splicing.

For individual splice sites the already suggested tools might be better - but for a genome wide analysis of splicing it is very convenient to frame it as a comparison of isoforms that are switching since it allows for easy interpretation and statistical analysis. For examples of what it can do see the alternative splicing part of the vignette here.

As a bonus, IsoformSwitchAnalyzeR also allows you to identify and analyze isoform switches with predicted functional consequences (both for individual genes and genome-wide), meaning it will help you figure out what the result of the identified alternative splicing is.
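To make the "comparison of isoforms that are switching" idea concrete, here is a small Python sketch that compares each isoform's share of its gene's expression between two conditions. It is not IsoformSwitchAnalyzeR's code or API (that package is written in R), it does no statistical testing, and the function names and expression values are made up for illustration.

```python
from typing import Dict


def isoform_fractions(expression: Dict[str, float]) -> Dict[str, float]:
    """Return each isoform's share of the gene's total expression."""
    total = sum(expression.values())
    if total == 0:
        # Gene not expressed: no meaningful fractions, report zeros.
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


def fraction_changes(
    condition1: Dict[str, float], condition2: Dict[str, float]
) -> Dict[str, float]:
    """Change in isoform fraction (condition2 minus condition1) per isoform."""
    fractions1 = isoform_fractions(condition1)
    fractions2 = isoform_fractions(condition2)
    isoforms = sorted(set(fractions1) | set(fractions2))
    return {
        isoform: fractions2.get(isoform, 0.0) - fractions1.get(isoform, 0.0)
        for isoform in isoforms
    }


if __name__ == "__main__":
    # Hypothetical expression values for one gene with two isoforms.
    control = {"isoform_A": 80.0, "isoform_B": 20.0}
    treated = {"isoform_A": 30.0, "isoform_B": 70.0}
    for isoform, change in fraction_changes(control, treated).items():
        print(f"{isoform}: {change:+.2f}")
```

In this toy case the dominant isoform flips from A to B while total gene expression stays the same, which is the kind of change a gene-level differential expression test would miss.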

ADD COMMENT
0
8.7 years ago
garyhokawai ▴ 30
Very informative post. I just wonder if anyone has done alternative splicing analysis with a library prepared with the Nextera kit, which has a broad fragment size distribution. Would such a diverse library affect the algorithms? Any recommendations for analyzing this kind of library?
ADD COMMENT
1

I have run BBMap on Nextera RNA-seq data from prokaryotes (which do not generally have differential splicing), which worked well, and on non-Nextera RNA-seq data from human (and various other organisms), which also worked well. It's pretty robust to most noise-inducing factors like error rate, intron length, insert size, and so forth. Note that I am BBMap's author. But it's pretty easy to use, as it autodetects the insert size in a splice-aware manner.

Please note that there are two kinds of Nextera libraries - normal (fragment) and LMP. LMP libraries require completely different processing from normal libraries, and they are the ones that would be expected to have a very broad size distribution. Normal Nextera libraries are not really very different from randomly-fragmented libraries, aside from a frequency bias in the first ~10 bases.

ADD REPLY
