To access GTEx data, you first need to register on their website.
On the data download page, the files you need are:
- GTEx_Data_V6_Annotations_SampleAttributesDS.txt -> table with tissue of origin and site of each sample
- GTEx_Analysis_v6_RNA-seq_RNA-SeQCv1.1.8_gene_rpkm.gct.gz -> RPKM expression for every sample/gene (note: it's a big file)
For the boxplot, you will have to adapt the following code. It uses dplyr, tidyr and ggplot2. Hope you are familiar with these libraries :-)
# preparing data
> library(dplyr)
> library(tidyr)
> library(ggplot2)
> samples = read.delim('GTEx_Data_V6_Annotations_SampleAttributesDS.txt', sep='\t') %>%
    select(SAMPID, primary.tissue=SMTS, tissue=SMTSD)
> # the file name and the number of numeric columns (2921, one per sample) must match the release you downloaded
> gtex = read.table('GTEx_Analysis_2014-01-17_RNA-seq_RNA-SeQCv1.1.8_gene_rpkm.gct', skip=2, colClasses=c('character', 'character', rep('numeric', 2921)), stringsAsFactors=FALSE, header=TRUE) # go take a coffee!
> # reshape to long format; read.table turned '-' in the sample IDs into '.', so undo that before joining
> gtex.bysample = gtex %>%
    gather(SAMPID, expression, -Name, -Description) %>%
    mutate(SAMPID=gsub('\\.', '-', SAMPID)) %>%
    left_join(samples, by='SAMPID')
At this point you will have a data frame with one row for every gene/sample combination:
> gtex.bysample
               Name Description                  SAMPID expression primary.tissue      tissue
1 ENSG00000223972.4     DDX11L1 GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
2 ENSG00000227232.4      WASH7P GTEX-N7MS-0007-SM-2D7W1    2.95098          Blood Whole Blood
3 ENSG00000243485.2  MIR1302-11 GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
4 ENSG00000237613.2     FAM138A GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
5 ENSG00000268020.2      OR4G4P GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
6 ENSG00000240361.1     OR4G11P GTEX-N7MS-0007-SM-2D7W1    0.00000          Blood Whole Blood
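If you would rather not hard-code the number of sample columns in the read.table call, one option is to read it from the file itself. This is only a sketch, and it assumes the file follows the usual GCT layout (a version line, then a line with the number of genes and the number of samples, then the header). The toy file and the names gct_path, n_samples, col_classes and toy are made up for illustration, they are not part of GTEx:

```r
# Toy GCT file (made-up values): version line, dimensions line, header, data rows.
gct_path <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",
  "2\t3",
  "Name\tDescription\tGTEX-A-1\tGTEX-A-2\tGTEX-B-1",
  "ENSG1.1\tGENE1\t0\t1.5\t2",
  "ENSG2.1\tGENE2\t3\t0\t4.25"
), gct_path)

# Assumption: the second line holds "<number of genes><TAB><number of samples>".
dims <- scan(gct_path, what = integer(), skip = 1, nlines = 1, quiet = TRUE)
n_samples <- dims[2]

# Two character columns (Name, Description), then one numeric column per sample.
col_classes <- c("character", "character", rep("numeric", n_samples))

# Same call shape as for the real file, with the column count taken from the header.
toy <- read.table(gct_path, skip = 2, header = TRUE, sep = "\t",
                  colClasses = col_classes, stringsAsFactors = FALSE)
```

With the real file you would point gct_path at the .gct you downloaded and keep the rest unchanged. Note that read.table still converts the '-' in the sample IDs to '.', so the gsub step is still needed afterwards.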
You can filter by your gene and tissues of interest:
> runx3.blood_breast = gtex.bysample %>% filter(Description=='RUNX3', primary.tissue %in% c("Blood", "Breast"))
You can plot it with ggplot2:
> runx3.blood_breast %>%
    ggplot(aes(x=primary.tissue, y=expression)) +
    geom_boxplot()
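If you only need one gene, reshaping the whole table is more work than necessary. A lighter variant is to filter the wide table to that gene first and reshape only that row. Here is a minimal sketch on toy data; gtex_toy, samples_toy and runx3_toy are hypothetical stand-ins with made-up values, and I have not timed this on the full GTEx file:

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the wide expression table and the sample annotation (made-up values).
gtex_toy <- data.frame(
  Name = c("ENSG_A.1", "ENSG_B.1"),
  Description = c("RUNX3", "OTHER"),
  GTEX.X.1 = c(5.0, 0.1),
  GTEX.Y.1 = c(1.0, 0.2),
  stringsAsFactors = FALSE
)
samples_toy <- data.frame(
  SAMPID = c("GTEX-X-1", "GTEX-Y-1"),
  primary.tissue = c("Blood", "Breast"),
  stringsAsFactors = FALSE
)

runx3_toy <- gtex_toy %>%
  filter(Description == "RUNX3") %>%                  # keep one gene before reshaping
  gather(SAMPID, expression, -Name, -Description) %>% # one row per sample for that gene
  mutate(SAMPID = gsub("\\.", "-", SAMPID)) %>%       # restore '-' in the sample IDs
  left_join(samples_toy, by = "SAMPID")               # attach the tissue annotation
```

The result has the same columns as the filtered long table, so the same ggplot call should work on it.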
Alternatively, in NCG we also provide a summary of gene expression for cancer genes:
Thank you Giovanni M Dall'Olio
Thanks Giovanni.
Is there a typo in the line that builds gtex.bysample, where "expdata" should be "gtex"? Also, this step takes forever.
@Giovanni M Dall'Olio
How can I generate genotype-specific plots for an rsID,
say rs1998081 or rs2567619, as done in this