Explain fig. 5 c of "The impact of rare variation on gene expression across tissues" doi:10.1038/nature24267
7.1 years ago

In

"The impact of rare variation on gene expression across tissues" Nature 550, 239–243 (12 October 2017) doi:10.1038/nature24267

As far as I understand, the authors used the GTEx data to build a predictive model (RIVER) that predicts the consequences of a set of variants on gene expression.

Can you please explain Figure 5c, "Performance of RIVER for prioritizing functional regulatory variants", to me?

http://www.nature.com/nature/journal/v550/n7675/full/nature24267.html#f5


Distribution of RIVER scores (shades of blue) as a function of expression and genomic annotation scores. The distributions of variant categories across expression and genomic annotation scores are shown as histograms aligned opposite the corresponding axes.

I don't understand how I should read that figure. What is the Y axis? What are the red/orange circles in the figure?


I am still trying to understand it, but the bioRxiv version has a better legend for the same figure: "Distribution of RIVER scores (shades of blue) as a function of scores from genomic annotation or gene expression alone. Pathogenic SNVs annotated in ClinVar are shown in red if they were likely regulatory (nonsense, splice-site, or synonymous) and orange otherwise (missense). The distributions of variant categories across absolute median Z-scores and predictions from genomic annotation are shown as histograms aligned opposite the corresponding axes."

https://www.biorxiv.org/content/biorxiv/early/2016/09/09/074443.full.pdf


Pathogenic SNVs annotated in ClinVar are shown in:

  • red if they were likely regulatory (nonsense, splice site, or synonymous)
  • orange otherwise (missense)

@aditi.qamra @cpad0112 thanks for explaining the colored dots! :-) I still don't get the figure as a whole. How should I read it? Why is it interesting?


Here's my quick attempt, again from the bioRxiv version: Although RIVER was trained in an unsupervised manner, the learned model prioritized variants that were supported by both extreme expression levels for a nearby gene and genomic annotations suggestive of potential impact (Fig. 5c). Rather than using a heuristic or manual approach, RIVER automatically learns the relationship between genomic annotations and changes in gene expression from data to provide a coherent estimate of the probability of regulatory impact.
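
To make "supported by both" concrete, here is a toy sketch of how a score can combine the two kinds of evidence with Bayes' rule. This is not the authors' implementation: the function name, the two-term form (an annotation-based prior multiplied by an expression likelihood) and every number below are assumptions chosen only for illustration.

```python
def posterior_regulatory(prior_from_annotation, p_outlier_if_functional,
                         p_outlier_if_not, is_outlier):
    """Toy Bayes update (hypothetical parameters, not RIVER's fitted model).

    prior_from_annotation: probability of regulatory impact from annotations alone.
    p_outlier_if_functional / p_outlier_if_not: assumed chance of seeing an
    expression outlier when the variant is / is not functional.
    """
    if is_outlier:
        like_fr, like_not = p_outlier_if_functional, p_outlier_if_not
    else:
        like_fr, like_not = 1 - p_outlier_if_functional, 1 - p_outlier_if_not
    numerator = prior_from_annotation * like_fr
    denominator = numerator + (1 - prior_from_annotation) * like_not
    return numerator / denominator


# Assumed illustrative values: outliers are 12x more likely for functional variants.
low_prior_outlier = posterior_regulatory(0.1, 0.6, 0.05, True)
low_prior_normal = posterior_regulatory(0.1, 0.6, 0.05, False)
high_prior_outlier = posterior_regulatory(0.5, 0.6, 0.05, True)
print(low_prior_normal, low_prior_outlier, high_prior_outlier)
```

In this toy setting the combined score is highest only when the annotation-based prior and the expression evidence are both high, which is one way to picture the darkest region of the figure.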

So variants with both a more extreme expression level and a higher RIVER (G only) score will be prioritised. According to their code, outliers are those with a median Z-score >= 2 (line 96 of https://github.com/joed3/GTExV6PRareVariation/blob/master/call_outliers/call_outliers_medz.R), so you start seeing more blue around that value (?). And no, it's not interesting or clear.
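
The outlier threshold mentioned above can be sketched roughly as follows. This is a minimal Python illustration, not a port of the authors' R script: the function name and the example Z-scores are made up, and taking the absolute value of the median is an assumption based on the "absolute median Z-scores" wording in the bioRxiv legend.

```python
from statistics import median


def is_expression_outlier(z_scores_by_tissue, threshold=2.0):
    """Flag an individual-gene pair as an outlier when the absolute value of
    its median Z-score across tissues reaches the threshold (assumed rule)."""
    return abs(median(z_scores_by_tissue)) >= threshold


# Hypothetical per-tissue Z-scores for three individual-gene pairs.
over_expressed = [2.5, 3.0, 1.9]
under_expressed = [-2.1, -2.4, -3.0]
single_tissue_spike = [0.1, 2.9, -0.5]

for z in (over_expressed, under_expressed, single_tissue_spike):
    print(z, is_expression_outlier(z))
```

Using the median rather than the mean means a single extreme tissue is not enough to call an outlier; the expression has to be consistently extreme across tissues.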

If you really want to dig deeper, here's the code for the figure: https://github.com/joed3/GTExV6PRareVariation/blob/master/paper_figures/figure5c.R :)

P.S. Please correct me if I'm wrong, which I very well might be :)

  • Figure 5a is self-explanatory: it shows the RIVER model.
  • Figure 5b compares the predictive power of two methods: RIVER (which integrates genomic and transcriptomic information) and a method using genomic annotations only. My understanding is that gene expression is being predicted.

To my understanding, the authors are trying to show that the RIVER model (genomic and transcriptomic information integrated) predicts the outcome better than methods that use genomic annotations alone.
