Hello,
I want to analyze the GTEx dataset, but I am not interested in the impact of genotypes on phenotypes (RNA-Seq data).
In the latest release of the GTEx data, more than 170,000 cis-eQTLs have been detected. Therefore, a large amount of variation can be accounted for by genetic loci.
I was wondering whether anyone has considered normalizing transcript abundance by the genotype of the donors. My aim would be to obtain a matrix of gene counts normalized by genotype.
Thanks a lot for any input!
What do you plan to do with the normalized data? The notion of "correcting" an analysis for the effect of a specific genotype is not uncommon, and this can be done by adding the genotype as a covariate in your regression model. I'm sure it is technically possible to do this correction / "normalization" upfront on the count matrix, as one may do for batch effects, but I am less sure about that approach.
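To illustrate the covariate idea, here is a minimal sketch for a single gene, assuming log-scale expression, a binary variable of interest, and the dosage (0/1/2) of one known cis-eQTL SNP. The function name `fit_with_genotype_covariate` and the simulated data are made up for illustration; a real analysis would use a proper count model and more covariates.

```python
import numpy as np

def fit_with_genotype_covariate(expr, condition, dosage):
    """Ordinary least squares fit of expr ~ intercept + condition + dosage.

    Returns the coefficients [intercept, condition_effect, genotype_effect].
    All names here are hypothetical, chosen for this sketch only.
    """
    expr = np.asarray(expr, dtype=float)
    # Design matrix: intercept, variable of interest, eQTL SNP dosage.
    X = np.column_stack([np.ones_like(expr), condition, dosage])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return beta

# Simulated toy data (not GTEx): one gene, 200 donors.
rng = np.random.default_rng(0)
n = 200
dosage = rng.integers(0, 3, size=n).astype(float)     # 0, 1 or 2 alternate alleles
condition = rng.integers(0, 2, size=n).astype(float)  # e.g. two groups of donors
expr = 5.0 + 1.0 * condition + 0.8 * dosage + rng.normal(0, 0.1, size=n)

beta = fit_with_genotype_covariate(expr, condition, dosage)
print("condition effect:", beta[1], "genotype effect:", beta[2])
```

The condition effect is estimated jointly with the genotype effect, so the expression matrix itself is left untouched.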
Hi mbyvcm and thanks for the input!
I checked the thread "What is the simple way to remove known batch effect from RNA-seq data?" and I think I understand what you suggest. I want to explore the GTEx data, and in particular the expression of genes/alleles in different tissues and for different types of individuals. I thought that correcting for the effect of eQTLs could improve the accuracy of all analyses (even though it means tools like DESeq2 are not available for normalization afterwards, but other tools such as YARN/qsmooth are better adapted to multi-tissue RNA-Seq normalization). This should be done upfront so that subsequent analyses are correct. Maybe a new tool, similar to ComBat, could be created for that purpose? I guess that would be quite computationally intensive and would need some optimization.
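For what it's worth, a naive version of the upfront correction could look like the sketch below. It assumes a genes x samples matrix of log-transformed expression and, for each gene, the dosage of a single lead cis-eQTL SNP; the function name `remove_genotype_effect` and the toy data are hypothetical. This is plain per-gene linear regression, not ComBat and not an existing tool, and it carries the same risk as upfront batch correction if genotype is correlated with the variables studied later.

```python
import numpy as np

def remove_genotype_effect(log_expr, dosage):
    """Subtract the per-gene linear effect of one SNP dosage.

    log_expr, dosage: arrays of shape (genes, samples).
    The gene mean is preserved; only the genotype-associated part is removed.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    dosage = np.asarray(dosage, dtype=float)
    d_centered = dosage - dosage.mean(axis=1, keepdims=True)
    y_centered = log_expr - log_expr.mean(axis=1, keepdims=True)
    var = (d_centered ** 2).sum(axis=1, keepdims=True)
    cov = (d_centered * y_centered).sum(axis=1, keepdims=True)
    # Per-gene regression slope; left at 0 when the SNP does not vary.
    slope = np.divide(cov, var, out=np.zeros_like(cov), where=var > 0)
    return log_expr - slope * d_centered

# Simulated toy data (not GTEx): 3 genes, 50 donors.
rng = np.random.default_rng(1)
dosage = rng.integers(0, 3, size=(3, 50)).astype(float)
dosage[2, :] = 1.0  # a SNP with no variation among these donors
log_expr = 4.0 + 0.5 * dosage + rng.normal(0, 0.2, size=(3, 50))

corrected = remove_genotype_effect(log_expr, dosage)
```

Because all genes are handled with array operations, one SNP per gene is cheap to process; the cost would grow mainly if many SNPs per gene or per-tissue models were needed.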