I'm quite new to bioinformatics and I was wondering about the following.
Is there a way to obtain information about which transcripts are expressed in which tissues (including RPKM values)? I already obtained this information at the gene level from the GTEx portal (--> Download --> "Gene RPKM"), and I would basically like the same file structure, but at the transcript level instead of the gene level.
Does anybody know how to do that?
Thank you very much in advance and best regards.
PS: I already tried downloading the file "Transcript RPKM" from GTEx. However, it does not seem to provide the right information, or maybe I just don't understand it.
As of the V6 data release, the transcript isoform values are in this file: GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz
Decompressed, that file is ~15 GB.
That means there are 8559 columns in the file.
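For anyone who wants to check the column count themselves without decompressing the whole file, here is a minimal sketch in Python. It reads only the first line of the gzipped file and counts the fields. The function name `count_columns` is made up for illustration, and the sketch assumes the file is tab-separated with its header on the first line, which you should verify. If the file starts with a few annotation columns before the sample columns (again an assumption, check the header), that would account for the difference between the column count and the number of samples.

```python
import gzip


def count_columns(path, sep="\t"):
    """Count the fields in the first line of a gzipped text file.

    Only the first line is read, so this stays fast even for a
    very large file. Assumes the header is on the first line.
    """
    with gzip.open(path, "rt") as handle:
        header = handle.readline().rstrip("\n")
    return len(header.split(sep))


# Example call (file name taken from the post above):
# print(count_columns("GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz"))
```

If the first line turns out to be a version or dimension line rather than the header, skip it with an extra `handle.readline()` before counting.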
And the snippet on the GTEx site about the V6 data release says this:
2015-10-19 V6 Data Released The GTEx Portal has been updated to data
release V6 (dbGaP accession phs000424.v6.p1). In this release the
number of genotyped donors has increased to 450 and the number of
RNA-seq samples to 8555 across 51 tissue sites and 2 cell lines,
giving sufficient power to detect eQTLs in 44 tissues. Full gene and
isoform expression datasets are available for download. Genotypes and
RNA-seq bam files are available via dbGaP.
You're absolutely right, it does add up, so the information I'm looking for is basically contained in this file. Unfortunately in this file, the information is not sorted neatly, so I cannot work with it in its current form. What I would need is the same format as the following: