6.1 years ago
sugus
Hi there,
I am wondering how to remove batch effects from the segmented_scna data downloaded from the TCGA PANCANA project. An example of the data format is as follows:
Sample Chromosome Start End Num_Probes Segment_Mean
TCGA-KL-8323-11A-01D-2308-01 1 3218610 104558357 58272 0.0026
TCGA-KL-8323-11A-01D-2308-01 1 104561488 104573702 10 -0.6372
TCGA-KL-8323-11A-01D-2308-01 1 104579877 179610058 27754 0.0041
TCGA-KL-8323-11A-01D-2308-01 1 179621932 179622081 2 -1.6956
TCGA-KL-8323-11A-01D-2308-01 1 179623244 247813706 43114 0.0043
TCGA-KL-8323-11A-01D-2308-01 2 484222 242476062 131310 0.006
TCGA-KL-8323-11A-01D-2308-01 3 2212571 197538677 106379 0.0022
TCGA-KL-8323-11A-01D-2308-01 4 1053934 71781186 38527 0.0048
TCGA-KL-8323-11A-01D-2308-01 4 71781554 71782247 2 -2.2184
I am trying to remove batch effects across tumor types, but I am not sure whether the Segment_Mean values can be treated like gene expression values and corrected with ComBat.
If not, could anyone give me some suggestions?
Many thanks in advance!
Why do you believe there is a batch effect?
Because it is a pan-cancer analysis, so there may be a batch effect.
If you are unsure whether a batch effect exists in the first place, you should not blindly assume that one does. Doing so could lead you to over-adjust the data to the point of eliminating any interesting clinical signal it contains. Indeed, copy number profiles vary among cancer types, and also with grade and stage. How would you distinguish technical from biological variability in this context?
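A first sanity check on that question is whether batch and cancer type can be separated at all. The sketch below is only an illustration: it assumes the plate field of the TCGA aliquot barcode (the sixth dash-separated field, `2308` in the sample shown in the question) is a reasonable proxy for batch, and the cancer-type labels and the second and third barcodes are made up for the example.

```python
from collections import defaultdict

def plate_id(barcode):
    """Return the plate field (6th dash-separated field) of a TCGA aliquot barcode."""
    return barcode.split("-")[5]

def plates_by_cancer_type(samples):
    """Map each plate ID to the set of cancer types observed on that plate."""
    table = defaultdict(set)
    for barcode, cancer_type in samples:
        table[plate_id(barcode)].add(cancer_type)
    return dict(table)

def is_fully_confounded(samples):
    """True when every plate holds exactly one cancer type."""
    return all(len(types) == 1 for types in plates_by_cancer_type(samples).values())

# Hypothetical annotation: only the first barcode comes from the question;
# the other barcodes and all cancer-type labels are illustrative.
samples = [
    ("TCGA-KL-8323-11A-01D-2308-01", "KICH"),
    ("TCGA-KL-8324-01A-11D-2308-01", "KICH"),
    ("TCGA-AA-0001-01A-01D-1111-01", "COAD"),
]

print(plates_by_cancer_type(samples))
print("fully confounded:", is_fully_confounded(samples))
```

If every plate contains a single cancer type, any "batch correction" across plates would also remove the differences between cancer types, which is exactly the over-adjustment described above.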
You could just include 'CancerType' as a covariate in whatever statistical model you are fitting, and proceed from there.
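To make the covariate suggestion concrete, here is a minimal sketch, not a recommended pipeline. It assumes you have already reduced the segments to one copy-number value per sample (for example the Segment_Mean over a region of interest) and want to test a variable of interest, here a hypothetical 0/1 `group`, while adjusting for cancer type. The data are synthetic, and the ordinary least squares fit is written in plain Python only to keep the example self-contained; in practice you would use a proper modeling package.

```python
def design_matrix(variable, cancer_types):
    """Rows of [1, variable, dummies...]; the first sorted cancer type is the reference."""
    levels = sorted(set(cancer_types))
    rows = []
    for x, ct in zip(variable, cancer_types):
        dummies = [1.0 if ct == lvl else 0.0 for lvl in levels[1:]]
        rows.append([1.0, float(x)] + dummies)
    names = ["Intercept", "variable"] + ["CancerType=" + lvl for lvl in levels[1:]]
    return rows, names

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Synthetic data: y = 0.1 + 0.5 * group + 0.3 * [CancerType == KICH], no noise.
cancer_types = ["KICH", "KICH", "KICH", "COAD", "COAD", "COAD"]
group = [0, 1, 0, 1, 0, 1]          # hypothetical variable of interest
y = [0.4, 0.9, 0.4, 0.6, 0.1, 0.6]  # hypothetical per-sample copy-number values

rows, names = design_matrix(group, cancer_types)
coef = fit_ols(rows, y)
for name, value in zip(names, coef):
    print(name, round(value, 6))
```

The coefficient on `variable` is the effect of the variable of interest after accounting for cancer type, so the between-cancer differences are modeled explicitly instead of being stripped out of the data beforehand.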