Question

Splice site usage from RNA-seq data

0

5.8 years ago

Suicyte ▴ 10

Is there an online resource or downloadable dataset that would give me data on splice site usage from published RNA-seq experiments? Ideally, I would like to answer questions like "which spliceform is dominant in which tissue" and similar.

I have seen online data that gives multi-tissue expression profiles for multiple RefSeq entries per gene; one example is [http://medicalgenomics.org]. However, for all the instances I checked, the intensities were almost the same across isoforms. I guess that the maintainers count a lot of non-discriminatory reads for all isoforms, thus blurring the differences. Or am I missing something?

In response to some comments/answers, let me be more precise:

I am currently interested in human transcripts, but I might be interested in zebrafish next week, who knows. I am well aware that I can download raw RNA-seq data from GEO and elsewhere, but I was hoping that somebody has done such analyses before, since they address a rather common problem. There are several data sources out there that provide abundance data for more than one isoform per gene. However, since the bulk of the reads cannot be uniquely assigned to one isoform, I was hoping for an analysis that focuses on the reads that do allow this distinction. It was of course imprecise (or even wrong) of me to talk about "splice forms", since the usual short reads can at best tell you about the usage of one particular splice site.

splicing RNA-Seq database • 1.4k views

ADD COMMENT • link 5.8 years ago by Suicyte ▴ 10

0

You can simply download any RNA-seq dataset (the FASTQ files) and then process the data with one or more of HISAT2/StringTie, DEXSeq, and rMATS. This would give you information on the expression of different splice isoforms.

Online repositories where RNA-seq is commonly stored include:

SRA
GEO
EGA
ArrayExpress

I do not know anything about the Medicalgenomics website. For specific queries, I would contact them directly.

ADD REPLY • link 5.8 years ago by Kevin Blighe 88k

0

In addition to that, if you are interested in such things, you can also look at the coverage of exon junctions, using R packages such as "spliceSites".
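To make "coverage of exon junctions" concrete, here is a minimal Python sketch (not the spliceSites package, just an illustration of the idea) that counts junction-spanning reads from SAM records by looking for `N` operations in the CIGAR string. The function names and the toy records are made up for this example; for real data you would normally use a dedicated tool or library rather than parsing SAM text by hand.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive,
    for one alignment starting at reference position `pos`."""
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped region on the reference, i.e. an intron
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


def count_junctions(sam_lines):
    """Count reads supporting each (chrom, intron_start, intron_end)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        if cigar == "*":  # unaligned read, no CIGAR available
            continue
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


# Toy SAM records (hypothetical reads, only the first six columns).
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M1000N50M",
    "r2\t0\tchr1\t120\t60\t30M1000N70M",
    "r3\t0\tchr1\t100\t60\t100M",
]
junction_counts = count_junctions(toy_sam)
```

In this toy input, `r1` and `r2` both skip the same intron (chr1:150-1149), while `r3` is unspliced and contributes nothing.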

ADD REPLY • link 5.8 years ago by caggtaagtat ★ 1.9k

score 3 · Accepted Answer · 2019-02-15

What you are asking is not trivial. A very large proportion of genes have multiple isoforms that all contribute to expression, so in many cases there will be no clearly dominant one. The answer also depends on which organism you are interested in (human vs. non-human) and on what you mean by "spliceform". I'll try to answer all combinations:

If you are interested in human and you are referring to a specific splice junction, the best resource I know of is probably ASCOT, for which all human data (published until a few years ago) have been reprocessed. Alternatively, the Recount2 database can give you the raw junction counts.

If you are interested in human and you are referring to a specific isoform, the better option is probably GTEx, where you can search for gene/transcript expression across all human tissues.

If you are interested in non-human organisms, I am not aware of any resource that has re-analyzed all the data, so there you would probably need to, as @Kevin suggests, download and process the data yourself (quite easy these days). With regard to how to analyse alternative splicing in such data, please refer to this answer for considerations and tools.
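Once you have raw junction counts per tissue (whether from a resource such as Recount2 or from your own processing), the "which splice site is used where" question reduces to simple arithmetic. The following is a minimal Python sketch under stated assumptions: the input layout (a dict keyed by tissue, donor coordinate, and acceptor coordinate), the function names, and all numbers are hypothetical and are not the format of any of the databases mentioned above.

```python
from collections import defaultdict


def acceptor_usage(junction_counts):
    """For each (tissue, donor), return the fraction of junction reads
    that go to each acceptor.

    junction_counts: {(tissue, donor, acceptor): read_count}
    returns:         {(tissue, donor): {acceptor: fraction}}
    """
    totals = defaultdict(int)
    for (tissue, donor, _acceptor), n in junction_counts.items():
        totals[(tissue, donor)] += n

    usage = defaultdict(dict)
    for (tissue, donor, acceptor), n in junction_counts.items():
        total = totals[(tissue, donor)]
        if total > 0:  # skip donors with no supporting reads
            usage[(tissue, donor)][acceptor] = n / total
    return dict(usage)


def dominant_acceptor(usage):
    """Pick the most-used acceptor for each (tissue, donor)."""
    return {key: max(fractions, key=fractions.get)
            for key, fractions in usage.items()}


# Made-up counts for one donor site (position 1000) with two
# competing acceptors (positions 2000 and 3000) in two tissues.
toy_counts = {
    ("liver", 1000, 2000): 90,
    ("liver", 1000, 3000): 10,
    ("brain", 1000, 2000): 20,
    ("brain", 1000, 3000): 60,
}
usage = acceptor_usage(toy_counts)
dominant = dominant_acceptor(usage)
```

Only junction-spanning reads enter this calculation, which is what keeps reads shared between isoforms from blurring the picture. With low counts the fractions are noisy, so some minimum read threshold per donor would be sensible in practice.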