Hi,
I am new to statistics and bioinformatics and have a very basic question about sample size. If I use TCGA data and select 600 lung cancer samples for expression analysis, along with 60 adjacent control samples (paired data), will I get errors (false positives) in the results? Kindly explain.
Are your 600 lung cancer samples sequencing data or microarray data?
You can try generating a PCA plot or a heatmap from the FPKM/RPKM or count file (if this is sequencing data).
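To make the PCA step concrete, here is a rough sketch in Python with NumPy. It assumes you already have a genes x samples matrix of FPKM/RPKM or normalized counts. The function name pca_scores and the toy data are made up for illustration and do not come from TCGA. The scores it returns are what you would scatter-plot as PC1 vs PC2.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """PCA of a genes x samples expression matrix (hypothetical helper).

    Returns (scores, explained), where scores has one row per sample.
    """
    # log2(x + 1) keeps a few very highly expressed genes from dominating
    logged = np.log2(np.asarray(expr, dtype=float) + 1.0)
    # Put samples in rows and center each gene at zero
    X = logged.T
    X = X - X.mean(axis=0)
    # SVD of the centered matrix gives the principal components
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Toy data (assumption, not real TCGA values): 200 genes, 6 tumor + 3 control
rng = np.random.default_rng(0)
n_genes = 200
base = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, 1))
tumor = base * rng.lognormal(0.0, 0.1, size=(n_genes, 6))
control = base * rng.lognormal(0.0, 0.1, size=(n_genes, 3))
tumor[:40] *= 4.0  # pretend the first 40 genes are up-regulated in tumor
expr = np.hstack([tumor, control])
labels = ["tumor"] * 6 + ["control"] * 3

scores, explained = pca_scores(expr)
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label:8s} PC1={pc1:8.2f} PC2={pc2:8.2f}")
```

If tumor and control separate along PC1, as they should in this toy example, the main source of variation is the biology you care about. Samples sitting far away from their own group are worth a closer look before any differential expression test.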
After generating the PCA plot, you can select a group of samples, take their average, treat that average as a single sample, and compare it with your control.
This should result in fewer false positives.
You can also check the abundance of housekeeping genes across the 600 tumor samples and the controls; there should not be much variation.
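The housekeeping check could look something like the sketch below. It only uses the Python standard library. GAPDH and ACTB are commonly used housekeeping genes, but GENE_X, the expression values, and the function name housekeeping_report are invented for illustration. What counts as "too much" variation is a judgment call, so no cutoff is hard-coded.

```python
import math
import statistics

def housekeeping_report(tumor, control, genes):
    """Summarize how stable each gene is across tumor and control samples.

    tumor/control: dict mapping gene -> list of expression values (e.g. FPKM).
    Returns dict gene -> (cv, log2_diff) computed on log2(x + 1) values.
    """
    report = {}
    for gene in genes:
        t = [math.log2(v + 1.0) for v in tumor[gene]]
        c = [math.log2(v + 1.0) for v in control[gene]]
        all_values = t + c
        # Coefficient of variation across every sample, tumor and control
        cv = statistics.stdev(all_values) / statistics.mean(all_values)
        # Difference of group means on the log2 scale (roughly a log2 fold change)
        log2_diff = statistics.mean(t) - statistics.mean(c)
        report[gene] = (cv, log2_diff)
    return report

# Made-up values, not real TCGA measurements
tumor = {
    "GAPDH": [1500.0, 1620.0, 1480.0, 1550.0],
    "ACTB": [2100.0, 1980.0, 2050.0, 2200.0],
    "GENE_X": [400.0, 380.0, 420.0, 410.0],  # hypothetical non-housekeeping gene
}
control = {
    "GAPDH": [1530.0, 1490.0],
    "ACTB": [2080.0, 2010.0],
    "GENE_X": [100.0, 95.0],
}

report = housekeeping_report(tumor, control, ["GAPDH", "ACTB", "GENE_X"])
for gene, (cv, log2_diff) in report.items():
    print(f"{gene:7s} CV={cv:.3f} log2(tumor/control)={log2_diff:+.2f}")
```

Housekeeping genes should show a small CV and a log2 difference close to zero. If they shift as much as GENE_X does here, normalization or batch effects are a more likely explanation than biology.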