4.3 years ago
tanya_fiskur
Hello everyone.
I am working with a brain-sample transcriptome that seems to be contaminated with muscle tissue. It was not my fault, and I cannot redo the sequencing. I want to run a differential expression analysis, but, of course, some of the "differentially expressed" genes are just a reflection of the contamination. They are especially pronounced in the GO analysis. Is it somehow possible to remove all, or at least the definitely muscle-related, genes from the list?
Thank you.
How did you determine that the brain transcriptome was contaminated by muscle tissue? Based on GO and expression levels? Or during data preparation for the analysis? Have you run PCA or MDS on your samples to see how they correlate?
Yes, by GO and by the abundance of muscle myosin and troponin genes among the differentially expressed genes. On the MDS plot there are signs of a batch effect (it looks like this: https://ibb.co/yPWsRFz ). Actually, the samples seem to be contaminated quite equally: when I compare the normalized counts of troponin and myosin, the contamination does not seem to be confined to one or two samples.
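The comparison of troponin and myosin counts across samples can be made a little more systematic. The following is only a minimal sketch: the marker list, the function names, and the toy counts are all illustrative assumptions, not values from this dataset. It averages log2(count + 1) over a few muscle markers per sample and reports how much that score varies between samples.

```python
import math
from statistics import mean, pstdev

# Assumption: a hand-picked list of muscle marker symbols; adapt to your organism.
MUSCLE_MARKERS = {"TNNT1", "TNNT3", "MYH1", "MYH2"}


def muscle_score(norm_counts, markers=MUSCLE_MARKERS):
    """Per-sample mean of log2(count + 1) over the marker genes that are present.

    norm_counts maps gene -> {sample: normalized count}.
    """
    present = [g for g in norm_counts if g in markers]
    if not present:
        raise ValueError("none of the marker genes are in the count table")
    samples = list(norm_counts[present[0]].keys())
    return {
        s: mean(math.log2(norm_counts[g][s] + 1) for g in present)
        for s in samples
    }


def coefficient_of_variation(scores):
    """Spread of the score across samples, relative to its mean."""
    values = list(scores.values())
    return pstdev(values) / mean(values)


# Toy numbers, made up purely to show the call pattern.
toy_counts = {
    "TNNT1": {"s1": 255, "s2": 255, "s3": 1023},
    "MYH1": {"s1": 63, "s2": 63, "s3": 255},
    "SNAP25": {"s1": 4000, "s2": 3900, "s3": 4100},  # not a marker, ignored
}
toy_scores = muscle_score(toy_counts)
toy_cv = coefficient_of_variation(toy_scores)
```

A small coefficient of variation would be consistent with roughly equal contamination across samples. A large one suggests that the score could be tried as a covariate in the design, although whether that is appropriate depends on how the contamination relates to the conditions being compared.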
You are right, the plot clearly shows that the samples do not correlate well (especially the samples labeled 14 weeks and 8.5 weeks). Is it possible that some samples (14 weeks and 8.5 weeks) were mislabeled?
I would cluster the DEGs to separate them and assign them to brain or muscle, using the following approaches:
http://cbdm.hms.harvard.edu/LabMembersPges/SD.html
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1536-8
Please keep in mind, on the other hand, that even genes that genuinely belong to the brain sample may not be up-regulated in the sampled condition, or in any condition.
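As a rough illustration of assigning DEGs to a "muscle" or "kept" group, here is a minimal sketch. It is not the method of either linked resource: it simply flags a gene if it is in a user-supplied set of known muscle genes, or if its expression profile across samples correlates strongly with a muscle marker profile. The function names, the cutoff of 0.9, and the toy values are assumptions chosen for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either profile is flat."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def split_degs(deg_expr, marker_profile, known_muscle=frozenset(), r_cutoff=0.9):
    """Split DEGs into (likely muscle-derived, kept).

    deg_expr maps gene -> list of expression values, one per sample,
    in the same sample order as marker_profile.
    """
    muscle, kept = [], []
    for gene, profile in deg_expr.items():
        if gene in known_muscle or pearson(profile, marker_profile) >= r_cutoff:
            muscle.append(gene)
        else:
            kept.append(gene)
    return muscle, kept


# Toy values, made up to show the call pattern only.
marker_profile = [1.0, 2.0, 3.0, 4.0]  # e.g. a per-sample muscle marker score
deg_expr = {
    "MYH1": [5.0, 5.0, 5.0, 5.0],    # flat, but listed as a known muscle gene
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # tracks the marker profile
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # runs opposite to it
}
muscle, kept = split_degs(deg_expr, marker_profile, known_muscle={"MYH1"})
```

Two caveats apply. If all samples are contaminated to a similar degree, the marker profile is nearly flat and the correlation step carries little information, so the known-gene set does most of the work. And a gene flagged here may still be genuinely expressed in brain, so the "muscle" list is better reviewed than discarded blindly.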
It's not clear how many replicates you have, what "conditions" you want to compare in the differential analysis, or whether both "conditions" are contaminated. Did you try plotting a PCA of the samples? How do the contaminated samples behave?