Is there an online resource or downloadable dataset that would give me data on splice site usage from published RNA-seq experiments? Ideally, I would like to answer questions like "which spliceform is dominant in which tissue" and similar.
I have seen online data that gives multi-tissue expression profiles for multiple RefSeq entries per gene; one example is [http://medicalgenomics.org]. However, for all the instances I checked, the intensities were almost the same across isoforms. I guess that the maintainers count a lot of non-discriminatory reads for all isoforms, thus blurring the differences. Or am I missing something?
In response to some comments/answers, let me be more precise:
I am currently interested in human transcripts, but I might be interested in zebrafish next week, who knows. I am well aware of the fact that I can download raw RNA-seq data from GEO and elsewhere, but I was hoping that somebody has done such analyses before, since they address a rather common problem. There are several data sources out there that provide abundance data for more than one isoform per gene. However, since the bulk of the reads cannot be uniquely assigned to one isoform, I was hoping for an analysis focusing on those reads that do allow this distinction to be made. It was of course imprecise (or even wrong) of me to talk about "splice forms", since the usual short reads can at best tell us about the usage of one particular splice site.
You can simply download any RNA-seq dataset (the FASTQ files) and then process the data yourself using one or more of HISAT2 / StringTie (alignment and transcript-level abundance), DEXSeq (differential exon usage), and rMATS (alternative splicing events). This would give you information on the expression of different splice isoforms.
Online repositories where RNA-seq is commonly stored include:
I do not know anything about the Medicalgenomics website. For specific queries, I would contact them directly.
In addition to that, if you are interested in such things, you can also look at the read coverage of exon junctions, for example with an R package such as "spliceSites".
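To make the junction-coverage idea a bit more concrete, here is a minimal sketch in Python (not the "spliceSites" API). It assumes you have already extracted, per tissue, the number of reads spanning each exon junction; the `junction_usage` function, the coordinates, and the counts below are all made up for illustration. For every donor site it computes the fraction of junction reads going to each acceptor, which only uses reads that actually discriminate between splice sites.

```python
from collections import defaultdict


def junction_usage(junction_counts):
    """Relative usage of each junction among all junctions sharing its donor site.

    junction_counts maps (chrom, donor, acceptor) to the number of reads
    spanning that junction. Returns a dict with the same keys and, as values,
    the fraction of the donor's junction reads that use that acceptor.
    """
    # Total junction reads per donor site
    totals = defaultdict(int)
    for (chrom, donor, _acceptor), n in junction_counts.items():
        totals[(chrom, donor)] += n

    usage = {}
    for (chrom, donor, acceptor), n in junction_counts.items():
        total = totals[(chrom, donor)]
        if total > 0:  # skip donors without any supporting reads
            usage[(chrom, donor, acceptor)] = n / total
    return usage


# Hypothetical junction read counts for one donor with two alternative acceptors
liver = {("chr1", 1000, 2000): 90, ("chr1", 1000, 3000): 10}
brain = {("chr1", 1000, 2000): 20, ("chr1", 1000, 3000): 60}

for tissue, counts in (("liver", liver), ("brain", brain)):
    for junction, fraction in sorted(junction_usage(counts).items()):
        print(tissue, junction, round(fraction, 2))
```

In this toy data the first acceptor dominates in "liver" and the second in "brain". With real data you would want a minimum read count per donor before trusting the fractions.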