Thank you!! TVP is exactly the kind of thing I'm looking for. I've already gathered lots of GWAS studies, but RNA expression is a whole other dimension to approach it from, and it removes the variables associated with SNPs (for example, the way haplotypes and other SNP-SNP interactions influence the p-values of SNP-disease association studies, making it less clear-cut whether the SNP by itself contributes to the disease). Thanks for telling me about the different sources you use to create the scores; this will help me greatly.
I'd be pretty interested to find out more details about how you use these different sources. I'm especially interested in the text mining: I found a tool called BeFree, which is built specifically for mining medical data from scientific publications. I'm very interested in RNA expression data too, since it tells you not just whether the gene or protein levels are upregulated or downregulated, but also that it's happening at the transcription level (as opposed to some biochemical process that's interfering at the protein level).
Thanks again!
EDIT: A question that came to mind. When you say you can get the RNA expression, how is this data obtained? Is it through measuring the levels of mRNA in the tissue? Or do you mean quantifying the levels of the translated protein?
And another slightly less relevant thing I'm wondering about both of these: is it also possible to detect modified mRNA or proteins, i.e. modifications caused by variants present in the gene? This would be a really useful variable for determining how the different SNPs in the gene alter the structure of the gene product, and would help us better understand the correlation between gene and disease. For example, a small peptide like oxytocin has a very limited number of SNPs, so we could gather a list of protein symmetries (important for QSAR-related stuff) as well as the altered amino acid sequences, calculate a score based on the proportions of these altered proteins, and then run GWAS-type studies using this score rather than just focusing on the presence of a genotype. Maybe the technology isn't there yet, but with small peptides I don't see why not: we know what each SNP does to the structure, so using computational software we could find all the possible permutations based on the known SNPs, run further computational jobs to determine their symmetry, charge distribution, lipophilicity etc., and use QSAR analysis to determine how effectively these modified versions of the protein will interact with the target.
I have a feeling science isn't quite there yet, though. It's brilliant work you're doing with this database! The world needs more of this.
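To make the "all possible permutations" idea concrete, here is a minimal Python sketch that enumerates every combination of substitutions in a short peptide. The oxytocin sequence (CYIQNCPLG) is real, but the substitution table and the function name `enumerate_variants` are purely hypothetical, made up for illustration and not taken from any SNP database.

```python
from itertools import product

def enumerate_variants(reference, substitutions):
    """Return every sequence obtainable by combining the given substitutions.

    `substitutions` maps a 0-based position to a list of alternative residues.
    The unmodified reference sequence is included in the result.
    """
    positions = sorted(substitutions)
    # At each variable position, the choices are the reference residue plus the alternatives.
    choices = [[reference[p]] + list(substitutions[p]) for p in positions]
    variants = []
    for combo in product(*choices):
        seq = list(reference)
        for pos, residue in zip(positions, combo):
            seq[pos] = residue
        variants.append("".join(seq))
    return variants

oxytocin = "CYIQNCPLG"
# Hypothetical substitutions, for illustration only (not real variant data).
hypothetical_subs = {1: ["F"], 7: ["I", "V"]}
variants = enumerate_variants(oxytocin, hypothetical_subs)
for seq in variants:
    print(seq)
```

Each resulting sequence could then be handed to whatever structure or QSAR software you prefer; the number of sequences grows multiplicatively with the number of variable positions, which is why this is only practical for small peptides.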
Please use ADD COMMENT/ADD REPLY when responding to existing posts. This helps keep the threads logically organized.
Great to know you've found what you were looking for. Why not check Open Targets: a platform for therapeutic target identification and validation, published online two days ago? That will give you more details on how we integrate all that information in one place, plus the scoring. For details on the text mining, have a look at Literature Evidence in Open Targets – a target validation platform. We mine biomedical research papers and look for the co-occurrence of a target and a disease in the same sentence. For the RNA expression data (both baseline and differential), we get the data from Expression Atlas. The data is curated from both microarray and RNA-seq experiments deposited in ArrayExpress, and it can include expression profiles for proteins too, such as Human immunochemistry data on 83 different normal cell types from 44 tissue types from the Human Protein Atlas project. Please check their help for more.
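As a rough illustration of what sentence-level co-occurrence means, here is a toy Python sketch. It is not the actual Open Targets pipeline: the function name `cooccurrences`, the naive sentence splitter, and the placeholder gene and disease names are all assumptions made up for the example.

```python
import re
from collections import Counter

def cooccurrences(text, targets, diseases):
    """Count (target, disease) pairs that are mentioned in the same sentence."""
    counts = Counter()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        lowered = sentence.lower()
        # Case-insensitive whole-word matching of each term.
        found_targets = [t for t in targets
                         if re.search(rf"\b{re.escape(t.lower())}\b", lowered)]
        found_diseases = [d for d in diseases
                          if re.search(rf"\b{re.escape(d.lower())}\b", lowered)]
        for t in found_targets:
            for d in found_diseases:
                counts[(t, d)] += 1
    return counts

# Made-up text with placeholder names, for illustration only.
abstract = (
    "GENEA expression was altered in patients with disease alpha. "
    "GENEB is a widely studied gene. "
    "Variants in GENEB were associated with disease beta."
)
pairs = cooccurrences(abstract, ["GENEA", "GENEB"], ["disease alpha", "disease beta"])
for (target, disease), n in pairs.items():
    print(target, disease, n)
```

A real system would need proper entity recognition (synonyms, abbreviations, ambiguous gene symbols) rather than plain string matching, but the counting step is essentially this.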
"As for the slightly less relevant thing", we'd like to map variants to protein structure, perhaps highlight if they are in or between protein domais. At the moment we classify the effect of the variant on genes and transcripts e.g. missense variants (would change the amino acid), stop gain (truncated protein), etc. At any rate, I will pass your words on QSAR and protein symmetries to the team.