5 weeks ago
Meghan.T
I have RNA-seq data from two very similar conditions (the monomer and the dimer of a treatment). Although other in vitro tests show some differences between them, the samples do not cluster well by condition when I perform PCA. The result looks like:
I looked through the top 20 genes contributing to PC1, but they were irrelevant. Do you have any suggestions for improving this plot? Should I remove some genes? If so, should I also remove them for the DGE analysis?
Was the experiment performed in two batches? PC1 captures unwanted variation here, so this could simply be a batch effect.
Unfortunately, all of the samples were processed in the same batch. However, we used Takara's TCR RNA-seq library prep kit. Essentially, it collects 10,000 T cells, PCR-amplifies them, and then sequences them. Since T cells are very heterogeneous, this could be the reason. In that case, do you have any suggestions?
Still, PC1 captures unwanted variation, for whatever reason. Define all samples left of PC1 = 0 as batch1 and all samples to the right as batch2, and include that factor in your design. You can use limma's removeBatchEffect on the log2-scale normalized counts to explore the effect of including this batch information in the design.
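To make the idea concrete, here is a rough NumPy sketch of the pseudo-batch approach. It is not limma: `remove_pseudo_batch` is a hypothetical helper that only shifts each batch to the overall per-gene mean, a simplified stand-in for what removeBatchEffect does with a single batch factor. The toy matrix, the effect sizes, and all names are made up for illustration.

```python
import numpy as np

def pc_scores(logexpr):
    """PCA sample scores (samples x components) for a genes x samples matrix."""
    centered = logexpr - logexpr.mean(axis=1, keepdims=True)  # center each gene
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u * s  # column 0 holds the PC1 coordinate of every sample

def remove_pseudo_batch(logexpr, batch):
    """Shift each batch so its per-gene mean equals the overall per-gene mean."""
    corrected = logexpr.astype(float).copy()
    gene_mean = logexpr.mean(axis=1, keepdims=True)
    for level in np.unique(batch):
        cols = batch == level
        corrected[:, cols] -= logexpr[:, cols].mean(axis=1, keepdims=True) - gene_mean
    return corrected

# Toy data (invented): 200 genes x 8 samples on a log2 scale
rng = np.random.default_rng(0)
condition = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # e.g. monomer vs dimer
nuisance = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # hidden unwanted factor
logexpr = rng.normal(8.0, 0.1, size=(200, 8))
logexpr[:100] += 3.0 * nuisance    # strong unwanted effect on 100 genes
logexpr[180:] += 1.0 * condition   # weaker condition effect on 20 genes

# Samples left of PC1 = 0 become batch 0, samples to the right batch 1
batch = (pc_scores(logexpr)[:, 0] > 0).astype(int)
corrected = remove_pseudo_batch(logexpr, batch)
corrected_scores = pc_scores(corrected)

print("pseudo-batch:", batch)
print("PC1 after correction:", np.round(corrected_scores[:, 0], 2))
```

The corrected matrix is only meant for exploration, such as re-plotting the PCA. For the DGE test itself, the batch factor would normally go into the design on the uncorrected counts, as suggested above.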
I'd also look at PC3 and onwards to see if any of those capture your expected variation between groups.
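One way to check the later components without eyeballing every pairwise plot is to compute, for each PC, how much of the variance in its scores is explained by the group label. The sketch below uses NumPy on an invented toy matrix. The helper names `pca` and `group_r2` are hypothetical, and it assumes there are more genes than samples.

```python
import numpy as np

def pca(logexpr):
    """Sample scores and variance fractions for a genes x samples matrix."""
    centered = logexpr - logexpr.mean(axis=1, keepdims=True)  # center each gene
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    keep = logexpr.shape[1] - 1  # centering leaves n_samples - 1 components
    scores = (u * s)[:, :keep]
    explained = s[:keep] ** 2 / np.sum(s[:keep] ** 2)
    return scores, explained

def group_r2(scores, groups):
    """Fraction of each PC's score variance explained by the group labels."""
    total = np.sum(scores ** 2, axis=0)  # scores are already mean-centered
    between = np.zeros(scores.shape[1])
    for level in np.unique(groups):
        members = scores[groups == level]
        between += len(members) * members.mean(axis=0) ** 2
    return between / total

# Toy data (invented): two unwanted factors dominate, condition is weaker
rng = np.random.default_rng(1)
condition = np.array([0, 1, 0, 1, 0, 1, 0, 1])
factor_a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
factor_b = np.array([0, 0, 1, 1, 0, 0, 1, 1])
logexpr = rng.normal(8.0, 0.1, size=(200, 8))
logexpr[:100] += 3.0 * factor_a
logexpr[100:150] += 2.0 * factor_b
logexpr[180:] += 1.0 * condition

scores, explained = pca(logexpr)
r2 = group_r2(scores, condition)
for i, (var, fit) in enumerate(zip(explained, r2), start=1):
    print(f"PC{i}: variance fraction {var:.3f}, R^2 with condition {fit:.3f}")
```

A component with a high R^2 against the condition is the one to plot, even if it explains only a small share of the total variance.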