Question

GTEx: normalizing counts by genotype

7.1 years ago

Bioninbo • 0

Hello,

I want to analyze the GTEx dataset, but I am not interested in the impact of genotypes on phenotypes (RNA-Seq data).
In the latest release of the GTEx data, more than 170,000 cis-eQTLs have been detected. Therefore a large amount of the variation in expression can be accounted for by genetic loci.

I was wondering whether anyone has considered normalizing transcript abundance by the genotype of the donors? My aim would be to obtain a matrix of gene counts normalized by genotype.

Thanks a lot for any input!

gTEX genotype eQTL • 2.4k views

score 0 · Answer 1 · 2017-12-07

7.1 years ago

cindy.perscheid ▴ 100

Hi,

could you specify what you mean by "normalized by genotypes" here? As far as I understand, you have gene counts on the one hand, and a genotype (with variants, ergo non-numerical?) on the other hand. What would be your intention in doing this?

Best,

Cindy

score 0 · Answer 2 · 2017-12-07

7.1 years ago

Bioninbo • 0

Hi Cindy,

Thanks for your answer! Let me clarify.

What I suggest is to use the eQTL information for a given tissue to normalize each sample individually.

For instance, if variant V is an eQTL that increases the expression of gene G, could we quantify the increase in expression of G for people bearing V (e.g. 50%)? This would be estimated across all samples (individuals) for a given tissue.

We would then use this number to normalize the count matrix of each sample (individual), during or after its creation, for instance by dividing the total count of reads mapped to gene G by 1.5 for samples bearing variant V. (As a further refinement, one could try to estimate the number of different transcripts of each gene and normalize that number instead.) This is probably not the best way to normalize, but I hope it clarifies my thought. My question is: does anyone know of studies/methods that have tried such an approach?
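To make the idea concrete, here is a minimal sketch for one gene and one variant, assuming log2-transformed expression and an additive allele dosage (0, 1 or 2 copies of V). The function name `remove_eqtl_effect` and the toy numbers are invented for illustration and are not taken from GTEx. Dividing carrier counts by the estimated fold change is the same as subtracting the fitted genotype term on the log scale.

```python
import numpy as np

def remove_eqtl_effect(log_expr, dosage):
    """Regress the log expression of one gene on the dosage of one variant.

    Returns the genotype-adjusted log expression and the estimated
    per-allele effect (slope) on the log scale.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    dosage = np.asarray(dosage, dtype=float)
    # Design matrix: intercept + allele dosage.
    X = np.column_stack([np.ones_like(dosage), dosage])
    coef, *_ = np.linalg.lstsq(X, log_expr, rcond=None)
    beta = coef[1]
    # Subtract only the genotype term so the baseline level is preserved.
    adjusted = log_expr - beta * dosage
    return adjusted, beta

# Toy data (invented): carriers of V have exactly 1.5x the expression of G.
counts = np.array([100.0, 110.0, 90.0, 150.0, 165.0, 135.0])
dosage = np.array([0, 0, 0, 1, 1, 1])

adjusted_log, beta = remove_eqtl_effect(np.log2(counts), dosage)
fold_change = 2 ** beta            # estimated increase per copy of V
adjusted_counts = 2 ** adjusted_log  # back on the count scale
```

In real data the estimate would of course be noisy, and a gene can have several eQTLs, so this only illustrates the single-variant case.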

Best, Jerome

What do you plan to do with the normalized data? The notion of "correcting" an analysis for the effect of a specific genotype is not uncommon, and this can be done by adding the genotype as a covariate in your regression model. I'm sure it is technically possible to do this correction / "normalization" upfront on the count matrix, as one may do for batch effects, but I am less sure about that approach.
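As a rough illustration of the covariate approach, the sketch below fits an ordinary least-squares model on simulated log2 expression, with the variable of interest and the eQTL allele dosage as predictors. All names and effect sizes (`group`, `dosage`, 1.0, 0.6) are assumptions made up for the example; a real analysis would more likely use a dedicated count model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical variable of interest (two groups) and eQTL allele dosage.
group = rng.integers(0, 2, size=n)
dosage = rng.integers(0, 3, size=n)

# Simulated log2 expression: baseline 5, group effect 1.0,
# per-allele genotype effect 0.6, plus a little noise.
log_expr = 5.0 + 1.0 * group + 0.6 * dosage + rng.normal(0.0, 0.1, size=n)

# Genotype enters the model as one more column of the design matrix,
# so the group effect is estimated while adjusting for it.
X = np.column_stack([np.ones(n), group, dosage])
coef, *_ = np.linalg.lstsq(X, log_expr, rcond=None)
intercept, group_effect, genotype_effect = coef
```

The count matrix itself is left untouched here; the genotype effect is absorbed by its own coefficient instead of being removed beforehand.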

ADD REPLY • link 7.1 years ago by christopher medway ▴ 460

Hi mbyvcm and thanks for the input!

I checked the thread "What is the simple way to remove known batch effect from RNA-seq data ?" and I think I understand what you suggest. I want to explore the GTEx data, in particular the expression of genes/alleles in different tissues and for different types of individuals. I thought that correcting for the effect of eQTLs could improve the accuracy of all analyses (even though it means tools like DESeq2 are not available for normalization afterwards, other tools such as YARN/qsmooth are better adapted to multi-tissue RNA-Seq normalization). This should be done upfront so that subsequent analyses are correct. Maybe a new tool, similar to ComBat, could be created for that purpose? I guess it would be quite computationally intensive and would need some optimization.

ADD REPLY • link 7.1 years ago by Bioninbo • 0