I have the TCGA-READ RNA-seq data obtained from the GDC Portal.
The samples are of two types, "primary tumor" and "solid tissue normal", collected from patients. A solid tissue normal sample is normal-appearing tissue taken adjacent to the primary tumor. Hence, it is not necessarily healthy tissue, because it still comes from an individual who has a tumor, so it would be incorrect to label such samples as normal.
For the binary classification problem, I need tumor samples and normal samples from diseased and healthy individuals, respectively.
I may be wrong, but I think the normal samples in any TCGA dataset, whether of the blood-derived normal or solid tissue normal sample type, are not collected from disease-free individuals.
Can anyone suggest where or how I can get normal samples?
If there is a website with normal samples, how do I match its genes to those in my tumor data?
Any suggestions are highly appreciated. Thanks.
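On the gene-matching part of the question, a minimal sketch follows. It assumes both datasets identify genes by Ensembl gene ID, with the GDC files using versioned IDs (e.g. `ENSG00000141510.16`) and the external dataset possibly using unversioned ones. The helper names `strip_version` and `match_genes` are hypothetical, and datasets keyed by gene symbols would first need an ID mapping step not shown here.

```python
def strip_version(gene_id: str) -> str:
    """Drop an Ensembl version suffix, e.g. 'ENSG00000141510.16' -> 'ENSG00000141510'."""
    return gene_id.split(".", 1)[0]


def match_genes(tumor_ids, other_ids):
    """Return the sorted unversioned gene IDs present in both datasets.

    Note: distinct versioned IDs that share a base ID collapse into one entry,
    so check for duplicates before using this to align expression rows.
    """
    tumor = {strip_version(g) for g in tumor_ids}
    other = {strip_version(g) for g in other_ids}
    return sorted(tumor & other)


# Hypothetical IDs, for illustration only.
print(match_genes(["ENSG00000141510.16", "ENSG00000012048.20"],
                  ["ENSG00000141510", "ENSG00000000003"]))
```

Both expression matrices would then be restricted to, and ordered by, the shared list before any joint analysis.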
If you want to put them into the same statistical analysis, then you are restricted to those TCGA normals. Any independent dataset is a completely different experiment, so batch effects, rather than a true biological effect, would dominate any result you generate.
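Staying within TCGA, the tumor and adjacent-normal groups can be separated from the sample barcodes. The sketch below relies on the TCGA barcode convention that the first two digits of the fourth dash-separated field give the sample type (01 to 09 tumor, 10 to 19 normal, e.g. 01 = primary solid tumor, 11 = solid tissue normal). The barcodes in the usage line are made up, and the helper names are hypothetical.

```python
def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample type, e.g. 11 for 'TCGA-XX-0001-11A-...'."""
    return int(barcode.split("-")[3][:2])


def split_tumor_normal(barcodes):
    """Split barcodes into (tumor, normal) lists; control samples (20-29) are skipped."""
    tumor, normal = [], []
    for bc in barcodes:
        code = sample_type_code(bc)
        if 1 <= code <= 9:
            tumor.append(bc)
        elif 10 <= code <= 19:
            normal.append(bc)
    return tumor, normal


# Hypothetical barcodes, for illustration only.
tumor, normal = split_tumor_normal(
    ["TCGA-XX-0001-01A-11R-A000-07", "TCGA-XX-0001-11A-01R-A000-07"]
)
```

Keeping both groups from the same project and pipeline is what avoids the cross-experiment batch effects described above, although the "normal" group is still adjacent tissue from tumor patients.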
Thanks for the suggestion.