Hi,
I am trying to remove the batch effect from my RNA-Seq data. I hope someone can help me determine whether my correction has worked.
Using ComBat: I used ComBat to correct for batch. The resulting MDS plot gives significantly better grouping of the samples by condition. However, ComBat requires log-transformed data, so I need to reverse-transform the data to use it in DESeq. I can do that, but after doing that and running the differential analysis, is there a way to compare my ComBat-corrected data with the uncorrected data? (Just wondering, because I need to reverse the log transform.)
I can also use svaseq, which gives me surrogate variables to model the RNA-Seq data:
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
design(ddssva) <- ~ SV1 + SV2 + condition
But how do I visualise the correction?
I tried following this: http://genomicsclass.github.io/book/pages/adjusting_with_factor_analysis.html
Have you looked into the DESeq2 tutorial, which also covers batch effect analysis with svaseq? There is a piece of code missing from it. Follow the links below: Link 1
Link 2 (with limma)
Link 3 shows how to create the corrected matrix of genes with normalized, batch-corrected expression values. This is what you need to plot in a PCA/MDS etc. to see whether the batch confounders have been addressed. One thing is missing there: the tutorial shows how to use surrogate variables in a multi-factor linear model, but for the correction itself you need to apply the SVs to the entire expression set, correct/fit/normalize the data on a TPM/FPKM/log-CPM scale, and then make the overall dimension-reduction plot.
Code you need:
The last two links are answers in threads where I have addressed similar issues earlier. Good luck!
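The visualization step described in the answer above could be sketched roughly as follows. This is not the code from the linked threads, only a minimal illustration under stated assumptions: the dataset is simulated with DESeq2's makeExampleDESeqDataSet, the design has a single `condition` column, the number of surrogate variables is fixed at 2, and log2(normalized count + 1) stands in for whichever log-scale transform you prefer. The corrected matrix is meant for plotting only.

```r
library(DESeq2)
library(sva)
library(limma)

# Simulated stand-in for a real dataset (assumption: one 'condition' factor)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds <- estimateSizeFactors(dds)

# Normalized counts, dropping very low-count genes before svaseq
norm_counts <- counts(dds, normalized = TRUE)
norm_counts <- norm_counts[rowMeans(norm_counts) > 1, ]

mod  <- model.matrix(~ condition, colData(dds))  # full model
mod0 <- model.matrix(~ 1, colData(dds))          # null model
svseq <- svaseq(norm_counts, mod, mod0, n.sv = 2)  # n.sv = 2 is an assumption

# Log-scale matrix used for visualization only
log_expr <- log2(norm_counts + 1)

# Regress out the surrogate variables while protecting the condition effect
corrected <- removeBatchEffect(log_expr, covariates = svseq$sv, design = mod)

# PCA before and after correction, colored by condition
pca_before <- prcomp(t(log_expr))
pca_after  <- prcomp(t(corrected))
par(mfrow = c(1, 2))
plot(pca_before$x[, 1:2], col = as.integer(dds$condition), pch = 19, main = "Uncorrected")
plot(pca_after$x[, 1:2],  col = as.integer(dds$condition), pch = 19, main = "SV-corrected")
```

Comparing the two panels side by side is one way to judge whether samples group by condition more cleanly after the correction.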
Thanks for your response. I did look at those links, thanks again for posting.
I have used svaseq to get the surrogate variables and used them as covariates in removeBatchEffect in limma. Then I used the resulting data to get a PCA plot. For DE, I included the SVs as covariates in the model. So the second part of my question is addressed.
When I used ComBat to remove the batch effect, the MDS plot looked much better. I would like to use this data for DE. However, this is log-transformed data, which I am not able to transform back into counts to make a DGEList for edgeR or a DESeqDataSet for DESeq2. If you can suggest something, that would be great.
Well, what you view in the MDS plot after correction is already batch corrected and normalized. So you have already addressed the issue for the entire gene set. But this is not what you feed into a model matrix for a linear fit with any differential expression method. What you do instead is take the surrogate variables (n.sv of them) and use them as covariates in the limma/edgeR/DESeq2 model for a multifactorial linear model fit. So you are still using your counts, but the model is given the confounders as factors, so that batch effects do not drive or distort the genes called as differentially expressed. The batch-corrected data plotted in MDS/PCA is only used for visualization.
For DE, I would not recommend using DESeq normalization inside limma-voom. They are different methods with different normalizations, and each handles batch and surrogate variables in its own way. That is one reason edgeR has its own batch correction method. However, svaseq from sva can be combined with any of these methods. So just extract the SVs and use them downstream in your model matrix, which normalizes your initial count data and adjusts for the batch effects with the SVs as covariates in the design matrix.
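A minimal sketch of that DE step with DESeq2, assuming simulated data from makeExampleDESeqDataSet, a single `condition` factor, and two surrogate variables (all placeholders for your own data and choices). Raw counts go into the model; the SVs enter only through the design formula, so nothing has to be transformed back from the log scale.

```r
library(DESeq2)
library(sva)

# Simulated stand-in for real raw counts (assumption)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds <- estimateSizeFactors(dds)

# Estimate surrogate variables from normalized counts
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 1, ]
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)  # n.sv = 2 is an assumption

# Add the SVs as covariates; the count matrix itself is left untouched
ddssva <- dds
ddssva$SV1 <- svseq$sv[, 1]
ddssva$SV2 <- svseq$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition

# Fit the model on raw counts and test the condition effect
ddssva <- DESeq(ddssva)
res <- results(ddssva)  # last design variable, i.e. condition
summary(res)
```

Keeping `condition` as the last term in the design means the default call to results() reports the condition contrast, adjusted for SV1 and SV2.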
In theory, this is what you are doing: address the batch effect via ComBat/sva/RUVSeq etc., view the result on adjusted log-transformed data, then extract the SVs and use them as factors in the DE model matrix together with the count data for a standard DE analysis with edgeR/limma-voom/DESeq2. Check the links below for how this is done in limma. For DESeq2, the thread I posted earlier should be enough for you to figure out the code for the analysis. You should also read some batch correction papers and follow gitbook tutorials on RNA-Seq analysis like this.
Link 1 for limma
Link 2 should give an idea of how to use edgeR and limma-voom with batch correction
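For the limma-voom route, a rough sketch under similar assumptions (simulated negative binomial counts, two conditions with hypothetical labels "ctrl" and "trt", two surrogate variables) might look like this. It is an illustration only, not the code from the linked threads.

```r
library(edgeR)
library(limma)
library(sva)

# Simulated counts: 1000 genes x 8 samples (placeholder for real data)
set.seed(1)
counts <- matrix(rnbinom(1000 * 8, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:8)))
condition <- factor(rep(c("ctrl", "trt"), each = 4))

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)

# Full and null models for surrogate variable estimation
mod  <- model.matrix(~ condition)
mod0 <- mod[, 1, drop = FALSE]  # intercept only
svobj <- svaseq(cpm(dge), mod, mod0, n.sv = 2)  # n.sv = 2 is an assumption

# Append the SVs to the design matrix as extra covariates
design <- cbind(mod, svobj$sv)
colnames(design) <- c("Intercept", "conditiontrt", "SV1", "SV2")

# voom on the counts, then a standard limma fit
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))
top <- topTable(fit, coef = "conditiontrt", number = 10)
print(top)
```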
P.S.: If answers help you solve your problem, accept them as answers or upvote them so that the thread is also useful to others.