PCA result and batch effect?
3.2 years ago
k0stasmp

Hello,

I am processing a data frame of about 55,000 genes (TPM values; I have no access to the raw data) and 400 samples. After removing the zero-variance genes, I run a PCA on the samples to detect outliers. I consistently see two distinct populations of samples. I have tried log2-transforming and centering/scaling the data, but the effect remains. I then filtered the samples by race and sex, and the split did not change. Is this behaviour expected? Could it be a batch effect?
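A minimal sketch of the pipeline described above, in R, assuming a genes × samples TPM matrix. The data here are simulated, and the names `tpm` and `meta` (with a `sex` column) are hypothetical placeholders for your own objects. The pseudocount of 1 is an assumption, needed because TPM contains zeros. Note that `prcomp()` treats rows as observations, so the matrix is transposed to put samples in rows.

```r
set.seed(1)

# Simulated stand-in for the real data (hypothetical): 1000 genes x 40 samples
n_genes <- 1000
n_samples <- 40
tpm <- matrix(rexp(n_genes * n_samples, rate = 0.1), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))
tpm[1:50, ] <- 0  # 50 genes that are zero everywhere (zero variance)

# Drop zero-variance genes: scaling a constant column is undefined
gene_var <- apply(tpm, 1, var)
tpm_filt <- tpm[gene_var > 0, ]

# log2 transform with a pseudocount of 1 because TPM contains zeros
log_tpm <- log2(tpm_filt + 1)

# prcomp() expects observations (samples) in rows, so transpose
pca <- prcomp(t(log_tpm), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:5], 3))

# Hypothetical sample metadata; color PC1 vs PC2 by a covariate
meta <- data.frame(sex = rep(c("F", "M"), length.out = n_samples))
plot(pca$x[, 1], pca$x[, 2], col = as.integer(factor(meta$sex)), pch = 19,
     xlab = "PC1", ylab = "PC2")
```

Coloring by each available covariate in turn (sex, race, and any technical variables) is a quick way to see which one, if any, lines up with the two groups.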

I have also uploaded the dendrogram of my data, generated with:

sampleTree = hclust(dist(n13_pca_scz_min), method = "average");

I drew the red line, which gives a cluster of about 240 samples (everything below the line). Is it correct to proceed with the WGCNA analysis using only these samples?
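One way to reproduce the "everything below the red line" step in base R is `cutree()` with a height. The sketch below uses simulated data with two sample groups; `datExpr` (samples in rows, as WGCNA expects) and the cut height of 30 are hypothetical. Note that `dist()` computes distances between rows, so the input must have samples as rows for this to cluster samples rather than genes. This only shows the mechanics; whether discarding the other samples is justified is a separate question.

```r
set.seed(2)

# Simulated samples x genes matrix with two groups of samples (hypothetical)
grp_a <- matrix(rnorm(30 * 200, mean = 0), nrow = 30)
grp_b <- matrix(rnorm(10 * 200, mean = 3), nrow = 10)
datExpr <- rbind(grp_a, grp_b)
rownames(datExpr) <- paste0("sample", seq_len(nrow(datExpr)))

# dist() measures distances between rows, i.e. between samples here
sampleTree <- hclust(dist(datExpr), method = "average")

# Hypothetical cut height standing in for the red line on the dendrogram
cutHeight <- 30
clusters <- cutree(sampleTree, h = cutHeight)
print(table(clusters))

# Keep the largest cluster (the samples "below the line")
keep_cluster <- as.integer(names(which.max(table(clusters))))
datExpr_kept <- datExpr[clusters == keep_cluster, ]
print(dim(datExpr_kept))
```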

Thank you,

Costas

[Figure: PCA plot]

[Figure: Dendrogram]

3.2 years ago

I'd highly recommend trying an eigencorplot from PCAtools to determine which variable is driving that difference. You can then account for it or subset samples as appropriate. I wouldn't toss one population or the other; instead, analyze them side by side and compare how similar the results from each are. There could be interesting biology there (or not), but you won't know if you don't look.
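The idea behind an eigencor plot is to correlate each principal component with each metadata variable and look for the variable that tracks the PC separating the two groups. The sketch below does that with base R `cor()` rather than with PCAtools itself; the metadata columns (`batch`, `sex`, `age`) and the simulated expression matrix are hypothetical, and the choice of Spearman correlation is an assumption.

```r
set.seed(3)
n <- 60

# Hypothetical metadata: a technical batch plus biological covariates,
# all coded numerically so they can be correlated with PC scores
meta <- data.frame(
  batch = rep(c(0, 1), each = n / 2),    # e.g. two sequencing runs
  sex   = rep(c(0, 1), times = n / 2),   # balanced across batches
  age   = round(runif(n, 20, 70))
)

# Simulated samples x genes matrix in which half the genes shift with batch
expr <- matrix(rnorm(n * 300), nrow = n)
expr[, 1:150] <- expr[, 1:150] + 2 * meta$batch

pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Correlate the first few PCs with every metadata variable
n_pcs <- 5
pc_cor <- cor(pca$x[, 1:n_pcs], meta, method = "spearman")
print(round(pc_cor, 2))

# Variable most strongly associated with PC1
top_var_pc1 <- names(which.max(abs(pc_cor["PC1", ])))
print(top_var_pc1)
```

PCAtools' `eigencorplot()` displays this kind of PC-by-variable correlation matrix as a heatmap, which is easier to scan when there are many covariates.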

3.2 years ago
Chris Dean

Batch effects are an important source of variation in all types of NGS studies.

Although race and sex do not appear to be a major source of variation in your data, there may be technical factors that are causing this variation (i.e., a batch effect).

If you have access to these factors, e.g.,

  • The technician who processed each sample (if there were multiple technicians)
  • The sequencing batch (if there were multiple sequencing runs)
  • The date of sample collection (if samples were collected on different dates)
  • Others

you can color your samples by those and see if they reveal any additional insights. If so, you can account for that variation in your statistical analysis.
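Once a factor that explains the split has been found, a minimal sketch of both steps (coloring the PCA by the factor, then removing its additive effect) might look like this. The factor `batch`, its levels, and the simulated data are hypothetical. The correction fits `expr ~ batch` for all genes at once and keeps the residuals plus each gene's mean; this is similar in spirit to limma's `removeBatchEffect()`, but it is a plain `lm()` sketch, not that function.

```r
set.seed(4)
n <- 40

# Hypothetical technical factor, e.g. two sequencing runs
batch <- factor(rep(c("run1", "run2"), each = n / 2))

# Simulated samples x genes log-expression with an additive batch shift
expr <- matrix(rnorm(n * 100), nrow = n)
expr[batch == "run2", ] <- expr[batch == "run2", ] + 3

# 1) Color the PCA by the candidate factor to see if it explains the split
pca_before <- prcomp(expr)
plot(pca_before$x[, 1], pca_before$x[, 2], col = as.integer(batch), pch = 19,
     xlab = "PC1", ylab = "PC2")

# 2) Remove the additive batch effect gene by gene: lm() with a matrix
#    response fits expr[, j] ~ batch for every column j at once
fit <- lm(expr ~ batch)
gene_means <- matrix(colMeans(expr), nrow = n, ncol = ncol(expr), byrow = TRUE)
expr_corrected <- residuals(fit) + gene_means

# Absolute correlation of PC1 with batch, before and after correction
pca_after <- prcomp(expr_corrected)
batch_cor <- c(before = abs(cor(pca_before$x[, 1], as.numeric(batch))),
               after  = abs(cor(pca_after$x[, 1], as.numeric(batch))))
print(round(batch_cor, 3))
```

Corrected values like these are mainly useful for visualization or as input to methods such as WGCNA; for differential expression it is usually preferable to include the batch term in the model design instead. If batch is confounded with a biological variable, removing it will also remove that biology.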

Hope this helps!
