I am working on RNA-sequencing data from multiple cell lines that all carry the same deletion. For example, I have deletion and wild-type samples for iPSCs, neurons, and neural progenitor cells: 18 samples in total (3 cases and 3 controls per cell type).
What are your thoughts on pooling all of the data into a single DESeq2 analysis, with cell line included as a covariate?
I have analysed each cell line separately and looked at the correlation between their Z test statistics. I found little to no correlation between cell lines, which makes me think I shouldn't combine the data. I would really appreciate any insight or input from people with more experience.
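For reference, the cross-line comparison described above can be sketched as a Spearman (rank) correlation of the per-gene test statistics from two separate runs. This is a minimal, stdlib-only sketch; it assumes the statistics were taken from the `stat` column of each DESeq2 `results()` table and already aligned to the same gene order. The numbers are invented. In practice `scipy.stats.spearmanr` or R's `cor(x, y, method = "spearman")` does the same job.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene statistics for five genes in two cell lines.
z_ipsc = [2.1, -0.3, 4.5, 0.8, -1.9]
z_neuron = [1.7, 0.2, 3.9, -0.5, -2.2]
print(round(spearman(z_ipsc, z_neuron), 3))
```

A rank correlation is used here so that a few genes with very large statistics do not dominate the comparison.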
As with everything in bioinformatics, if you have time, try both approaches. I am not sure that a simple correlation analysis is sufficient to conclude that cell-lines should be analysed together (or not).
I have worked on cell lines a lot within the past year. Including 'line' as a covariate can help to adjust for the cross-line differences that may exist. However, if your lines come from disparate tissues, such as CNS and skin, the differences may be too large for the model to account for.
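To make "line as a covariate" concrete, here is a sketch of the design matrix behind an additive model `~ line + condition` for the 18-sample layout in the question. It assumes treatment (reference-level) coding, which is what R's `model.matrix` uses by default; choosing `iPSC` and `WT` as the reference levels and the R-style column names are illustrative assumptions.

```python
from itertools import product

lines = ["iPSC", "NPC", "neuron"]   # first level = reference (assumed)
conditions = ["WT", "del"]          # first level = reference (assumed)

# Sample sheet: 3 replicates for every line x condition combination (18 samples).
samples = [
    {"line": ln, "condition": cond, "rep": r}
    for ln, cond, r in product(lines, conditions, range(1, 4))
]


def design_row(sample):
    """Intercept, one 0/1 indicator per non-reference line, one for the deletion."""
    return [
        1,
        int(sample["line"] == "NPC"),
        int(sample["line"] == "neuron"),
        int(sample["condition"] == "del"),
    ]


columns = ["Intercept", "lineNPC", "lineneuron", "conditiondel"]
design = [design_row(s) for s in samples]

print(columns)
for s, row in zip(samples, design):
    print(f'{s["line"]:>6} {s["condition"]:>3} rep{s["rep"]}', row)
```

In DESeq2 this corresponds to `design = ~ line + condition`; with `condition` last, `results()` reports the deletion effect by default. Note that this additive model assumes the deletion has the same effect in every line. Adding a `line:condition` interaction term would allow the effect to differ between lines, which may be relevant given the weak cross-line correlation described above.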
Also, consider the following:
if you normalise the lines separately, the end results are not fully comparable across lines (certainly not the expression levels), because the lines will not have been normalised together. If you do choose to normalise them separately, you could still perform a meta-analysis at the end.
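The normalisation point can be illustrated with median-of-ratios size factors, the DESeq2 default. This is a simplified sketch, not DESeq2's code, and the counts are invented so that line B has ten times the depth of line A. Computed jointly, the size factors capture that 10x difference. Computed per line, each line's factors are centred on 1 within that line, so the two lines end up on independent scales.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: one list per gene, holding that gene's count in each sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        log_geo_mean = sum(log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            # Ratio of this sample's count to the gene's geometric mean.
            ratios[j].append(exp(log(c) - log_geo_mean))
    return [median(r) for r in ratios]


# Hypothetical counts: samples 0-1 from line A, samples 2-3 from line B,
# with line B at roughly 10x the sequencing depth of line A.
counts = [
    [10, 20, 100, 200],
    [50, 100, 500, 1000],
    [30, 60, 300, 600],
]

joint = size_factors(counts)
line_a = size_factors([g[:2] for g in counts])
line_b = size_factors([g[2:] for g in counts])
print("joint: ", [round(s, 3) for s in joint])
print("line A:", [round(s, 3) for s in line_a])
print("line B:", [round(s, 3) for s in line_b])
```

After separate normalisation, line A and line B receive identical size factors even though their raw depths differ tenfold, which is why normalised expression levels are not directly comparable across separately processed lines.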
Thanks a lot, Kevin! Is there a methodology you would suggest for the meta-analysis? I was thinking of an inverse-variance weighted meta-analysis, but I'm unsure whether to use the overall variance of the Z-statistic or the per-gene standard error of the logFC.
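For the inverse-variance option, the standard fixed-effect formulation works gene by gene: each line's estimate is weighted by the inverse of that gene's own squared standard error, rather than by one overall variance. The sketch below assumes the inputs are each separate run's `log2FoldChange` and `lfcSE` values for a single gene; the numbers are invented. It also returns Cochran's Q as a rough heterogeneity check, because a fixed-effect pooled estimate can be misleading when the lines genuinely disagree. A random-effects model is the usual alternative in that case.

```python
from math import erfc, sqrt


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance combination for ONE gene.

    betas: that gene's log2 fold change in each cell line.
    ses:   the matching standard errors (e.g. lfcSE) in each cell line.
    Returns (pooled log2FC, pooled SE, z, two-sided p, Cochran's Q).
    """
    weights = [1 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1 / w_sum)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = erfc(abs(z) / sqrt(2))
    # Cochran's Q: large values mean the lines disagree more than their SEs allow.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q


# Hypothetical values for one gene in iPSC, NPC and neurons.
print(inverse_variance_meta([0.8, 1.1, 0.2], [0.3, 0.4, 0.5]))
```

Applied across all genes, the pooled p-values would still need multiple-testing correction (e.g. Benjamini-Hochberg), just as in a single DESeq2 run.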
Also, if I had different mutations (all implicated in the same disease) in different genes, but in the same cell line, should I use mutation as a covariate? Sorry for all the questions; I'm fairly new to all this.
Meta-analysis is not quite my area, but the program that comes to mind is RankProd (in R). I don't know which specific method(s) it implements, though.
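The rank product idea itself is simple to sketch. For each cell line, rank genes by log2 fold change (rank 1 = most up-regulated), then take each gene's geometric mean rank across lines. Genes that rank near the top in every line get a small rank product. This is a minimal illustration only, with invented data. It is not the package's implementation: there are no permutation-based p-values and no special tie handling.

```python
from math import prod


def ranks_descending(values):
    """Rank 1 = largest value (most up-regulated); ties broken by gene order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def rank_product(per_line_lfc):
    """per_line_lfc: one list of per-gene log2FCs per cell line, same gene order.

    Returns each gene's geometric mean rank across the lines.
    """
    k = len(per_line_lfc)
    rank_lists = [ranks_descending(lfc) for lfc in per_line_lfc]
    n_genes = len(per_line_lfc[0])
    return [prod(rl[g] for rl in rank_lists) ** (1 / k) for g in range(n_genes)]


# Hypothetical log2FCs for three genes in iPSC, neurons and NPC.
lfc = [
    [2.0, 0.1, -1.0],
    [1.5, 0.3, -0.5],
    [1.8, -0.2, -1.2],
]
print(rank_product(lfc))
```

Because only ranks enter the statistic, it does not require the lines to be normalised together, which is one reason rank-based combination is attractive when lines are analysed separately.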
I cannot really comment on the mutation part; it seems relevant to only one cell line. If you want to test differential expression across the mutation states, then you will have to include mutation in the design formula anyway.