Given junction.bed files from TopHat, how can I define exon splicing events, for example skipped exons, constitutive exons, or skipped junctions?
That's what rMATS is for, though it'll take the BAM file instead of the junction.bed file, which in my opinion is essentially worthless.
Also, stop using TopHat. Use something better, like STAR or even HISAT2.
Hi NBS:
If you provide
... then perhaps someone will be able or inclined to take your challenge.
In my experience, these terms are not consistently defined in the literature so it would be a mistake to try and assume what you really want.
For example, I've never heard of "skipped junctions" as a kind of "exon splicing event".
Similarly, "constitutive exon" is a label given to an exon which appears in every (known) isoform of a gene. But it is not a name for an "exon splicing event".
So you really have to be quite specific in what you are asking for.
That said, I expect that however you frame the question, you will find that knowing just the locations of (putative) introns, as provided by a junction.bed file, will prove insufficient to answer it. This is because these files don't tell you where the surrounding exons begin and end. They just tell you where the introns are.
Nonetheless: you might think along these lines:
Consider your junctions.bed file(s) as a directed graph(s) with each line in the file representing an 'edge' connecting a 'donor' with an 'acceptor' site (where the sites are integers being the chromosomal coordinate).
Then split it up into a set of its 'connected components'.
Then relabel each connected component, replacing each chromosomal coordinate with its rank in the sorted list of the chromosomal coordinates in that component.
Then each unique graph might correspond to a "kind" of exon splicing event.
Example
(ignoring strand and chromosome for simplicity) given these junctions as input directed graph:
1100 1200
1100 1300
2100 2200
2100 2300
3100 3400
3200 3400
The connected components would be
1100 1200
1100 1300
2100 2200
2100 2300
3100 3400
3200 3400
which would be relabeled as
1 2
1 3
1 2
1 3
1 3
2 3
Now, you might decide that [[1,2],[1,3]] is the canonical motif for an alternative acceptor event (of which we have 2), and [[1,3],[2,3]] is the canonical motif for an alternative donor event (of which we have 1).
BUT, remember, you don't know where the surrounding exons end, so you might well be making a mistake in so doing.
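A minimal Python sketch of the steps above, using the junctions listed in the connected components. The function names (`connected_components`, `rank_relabel`, `motif_counts`) are made up for illustration, strand and chromosome are ignored as in the example, and components are treated as weakly connected (edge direction is ignored when grouping):

```python
from collections import defaultdict


def connected_components(junctions):
    """Group (donor, acceptor) pairs into weakly connected components."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every junction joins its donor and acceptor into one component.
    for donor, acceptor in junctions:
        parent[find(donor)] = find(acceptor)

    groups = defaultdict(list)
    for donor, acceptor in junctions:
        groups[find(donor)].append((donor, acceptor))
    return [sorted(edges) for edges in groups.values()]


def rank_relabel(component):
    """Replace each coordinate by its 1-based rank within the component."""
    coords = sorted({c for edge in component for c in edge})
    rank = {c: i + 1 for i, c in enumerate(coords)}
    return tuple((rank[d], rank[a]) for d, a in component)


def motif_counts(junctions):
    """Count how often each relabeled component (motif) occurs."""
    counts = defaultdict(int)
    for component in connected_components(junctions):
        counts[rank_relabel(component)] += 1
    return dict(counts)


junctions = [
    (1100, 1200), (1100, 1300),
    (2100, 2200), (2100, 2300),
    (3100, 3400), (3200, 3400),
]
counts = motif_counts(junctions)
for motif, n in counts.items():
    print(motif, n)
```

With real data you would first split the junctions by chromosome and strand and run this on each group separately.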
If you know and represent the extent of the surrounding exons (as might be inferred from RNA-Seq coverage, or might be given as known in a GTF file), this kind of approach extends nicely. It is a little trickier, but doable.
FWIW: I still wonder if these categories are really biologically meaningful. Many different schemes have been devised to classify them (a good review appears in: A General Definition and Nomenclature for Alternative Splicing Events) but less interesting work has substantiated that these classes are biologically relevant, my prior efforts notwithstanding. I would appreciate being educated contrariwise here... For instance: do we know that different RBPs control switching between A3SS (Alternative 3' Splice Sites) than control switching between, say, MXE (Mutually Exclusive Exons)? That would be interesting!
@Malcom.cook thank you for the elaborate information. I lost you when you started talking about directed graphs; I will be reading up on this to make complete sense of it. From what you have explained in the example, let me assume this (the exon_skipped column was derived by running https://regtools.readthedocs.io/en/latest/commands/junctions-annotate/ on BAM files):
start end exon_skipped
1100 1200 0
1100 1300 1
Using DEXSeq, it can be seen that the reads mapping to junction 1100 1300 differ significantly between two conditions, which means that the previous exon (1100 1200) was skipped:
------A1100(5')------1200(3')B---------1300(3')C--------
Correct me if I am wrong in considering B a skipped exon.
Hi, sorry about the link. Can you try this: https://regtools.readthedocs.io/en/latest/
So how can BAM files be used to look for splicing events? Could you explain the algorithm for detecting splicing events?
Please see the rMATS paper.
Thanks @Devon Ryan, I am aware of rMATS. I am picking up someone's work midway, and I need to annotate junctions that were found to be differentially expressed using DEXSeq. I only have a set of junctions now, with no access to the FASTQ or BAM files.
Hmm, I'm sure there's something for this but it's not something I've ever needed to do. For the most part, what you're seeing is just changes in isoform usage.
Yes, I guess I need to write something myself.