I asked this today on ResearchGate, but I haven't received any response there... Maybe you folks can help me out.
I am a little disappointed here. After studying a lot of data from these sites, I simply can't find a way to compare the data between them.
My goal is to compare tumor data (cell line or patient samples) and normal tissue data for a particular protein. These sites provide RNA-seq and immunohistochemistry (in the case of HPA) data.
From my research, I just can't compare their data, because RPKM and FPKM are relative quantification measures and aren't comparable in most situations. In addition, CCLE uses RMA!! Man!!!
So, how do people make use of the data in these databases? And can I somehow compare the data? Do you know of a paper that has used normal vs. tumor data obtained from these databases and made a valid comparison?
I'm about to lose my mind. Why isn't there a single standard unit for RNA-seq? This is very confusing.
What is the correct way of using GTEx, CCLE and HPA data?
Thanks for the response. However, I think I didn't get the last point right. To my knowledge, I can't compare RNA-seq data from HPA with RNA-seq data from CCLE... or with RNA-seq data from TCGA... because they come from different labs and should therefore have different biases... Am I understanding this correctly? How do I get around that?
I think (and I could be wrong) that only microarray samples from CCLE are available. You cannot compare microarray data from CCLE with RNA-Seq data from TCGA, because they have different statistical distributions and their units are not the same. Moreover, if RNA-Seq data from CCLE is available, then you can compare it directly with TCGA datasets; however, you need to check PCA plots and see whether the samples cluster by genotype or by batch (where a batch, in this case, is a consortium). If they cluster by batch, then you need to do batch correction.
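To make the PCA check a bit more concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an assumption for illustration: the expression matrix is synthetic (not real TCGA or CCLE data), the labels `batch` and `tissue` are made up, and the helper names `pca_scores` and `separation` are hypothetical. It assumes the values are already on a log scale and in the same unit for every sample. In practice you would usually plot PC1 against PC2 and color the points by consortium; here a simple number (the share of PC1 variance explained by a grouping) stands in for eyeballing the plot.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """PCA via SVD. expr is a samples x genes matrix of log-scale expression."""
    centered = expr - expr.mean(axis=0)              # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)            # variance fraction per PC
    return scores, explained[:n_components]

def separation(scores_1d, labels):
    """Share (0..1) of the variance along one PC that a grouping explains."""
    labels = np.asarray(labels)
    grand = scores_1d.mean()
    total = np.sum((scores_1d - grand) ** 2)
    between = sum(
        np.sum(labels == g) * (scores_1d[labels == g].mean() - grand) ** 2
        for g in np.unique(labels)
    )
    return between / total

# Synthetic toy data (assumption): 12 samples, 200 genes, two "consortia".
rng = np.random.default_rng(0)
n_genes = 200
batch = np.array(["TCGA"] * 6 + ["CCLE"] * 6)
tissue = np.array(["tumor", "normal"] * 6)
expr = rng.normal(5.0, 1.0, size=(12, n_genes))
expr[batch == "CCLE"] += 3.0            # strong consortium-wide shift
expr[tissue == "tumor", :20] += 1.0     # weaker biological signal in 20 genes

scores, explained = pca_scores(expr)
batch_r2 = separation(scores[:, 0], batch)
tissue_r2 = separation(scores[:, 0], tissue)
print("PC1 variance explained by batch: ", round(float(batch_r2), 3))
print("PC1 variance explained by tissue:", round(float(tissue_r2), 3))
```

If PC1 is explained almost entirely by the consortium label, as in this toy data, the samples are clustering by batch and would need batch correction (for example with a dedicated method such as ComBat) before any comparison. One caveat: if the batch and the biology coincide completely, say all tumors from one consortium and all normals from another, a batch correction cannot tell the two effects apart.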
Now I get it! Thanks. Would you have any references on that so I could study this matter? As I'm new to computational biology, my knowledge of clustering is still shallow. I have never heard of PCA plots before... Thank you again!
I would highly recommend searching for these topics on Biostars. Many of these questions have been asked and answered repeatedly. IMHO, it's one of the best resources for computational biology.
Thank you again! I'm gonna search more on the topics and read these references! Thank you very much!