Question

RNAseq cell line as a covariate

5.3 years ago

gradstudentNew ▴ 50

Hi all,

I am working on some RNA sequencing data from multiple cell lines, all carrying the same deletion. For example, I have deletion and wildtype samples for iPSCs, neurons, and neural progenitor cells, 18 samples in total (3 cases and 3 controls per cell type).

I was wondering what your thoughts were on using cell line as a covariate in DESeq2 and aggregating all the data together?

I have analysed the cell lines separately and looked at the correlation between the Z test statistics, and found little to no correlation between cell lines, which makes me think I shouldn't combine all the data. Still, I would really appreciate any insight or input from experts.

Thank you so much!

rnaseq deseq2 cellline transcriptome • 1.2k views

Updated 5.3 years ago by Kevin Blighe 89k • written 5.3 years ago by gradstudentNew ▴ 50

score 2 · Answer 1 · 2020-02-26

5.3 years ago

Kevin Blighe 89k

As with everything in bioinformatics, if you have time, try both approaches. I am not sure that a simple correlation analysis is sufficient to conclude that cell-lines should be analysed together (or not).

I have worked on cell-lines a lot within the past year. Including 'line' as a covariate can help to adjust for the cross-line differences that may exist. However, if your lines are from disparate tissues, like CNS tissues and skin tissues, then the difference may be too large for the model to account for.
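To make the covariate idea concrete, here is a minimal sketch of the columns that an additive design such as `~ line + condition` implies. It is plain Python rather than DESeq2 itself, and the helper `design_matrix` and the sample labels are assumptions made up for illustration; they only mirror the 3 lines x (3 wildtype + 3 deletion) layout described in the question.

```python
# Toy sketch: treatment-coded columns implied by an additive design
# such as `~ line + condition`. Labels are hypothetical.

def design_matrix(lines, conditions, ref_line, ref_condition):
    """Return (header, rows): an intercept, one dummy column per
    non-reference line, and one dummy column per non-reference condition."""
    line_levels = [l for l in sorted(set(lines)) if l != ref_line]
    cond_levels = [c for c in sorted(set(conditions)) if c != ref_condition]
    header = (["Intercept"]
              + [f"line_{l}" for l in line_levels]
              + [f"condition_{c}" for c in cond_levels])
    rows = []
    for l, c in zip(lines, conditions):
        row = ([1]
               + [int(l == lv) for lv in line_levels]
               + [int(c == cv) for cv in cond_levels])
        rows.append(row)
    return header, rows

# 3 cell types x (3 wildtype + 3 deletion) = 18 samples
lines = [l for l in ("iPSC", "NPC", "neuron") for _ in range(6)]
conditions = ["WT", "WT", "WT", "del", "del", "del"] * 3

header, rows = design_matrix(lines, conditions,
                             ref_line="iPSC", ref_condition="WT")
print(header)
for line, cond, row in zip(lines, conditions, rows):
    print(line, cond, row)
```

The line columns absorb baseline expression differences between cell types, while the single `condition_del` column estimates one deletion effect shared by all lines. That shared effect is an assumption of the additive design, so it is worth checking against how similar the per-line results actually look.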

Also, consider the following: if you normalise the lines separately, then the end results are not quite cross-comparable, and certainly not the expression levels, because the lines will not have been normalised together. If you do choose to normalise them separately, you could still do a meta-analysis at the end.
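The point about separate normalisation can be illustrated with a small sketch of the median-of-ratios idea behind DESeq2's `estimateSizeFactors`. This is a simplified re-implementation in plain Python, not the package's code, and the counts are invented toy values: sample `b` is exactly 2x sample `a`, and sample `c` is exactly 4x.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.
    counts: dict mapping sample name -> list of raw counts (same gene order).
    Genes with a zero count in any sample are skipped."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Per-gene log geometric mean across the samples that were passed in
    log_geo_means = []
    for g in range(n_genes):
        vals = [counts[s][g] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means.append(sum(math.log(v) for v in vals) / len(vals))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo_means[g]
                      for g in range(n_genes)
                      if log_geo_means[g] is not None]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Invented toy counts for four genes
toy = {
    "a": [100, 200, 50, 400],
    "b": [200, 400, 100, 800],
    "c": [400, 800, 200, 1600],
}

joint = size_factors(toy)                                  # all samples together
separate = size_factors({k: toy[k] for k in ("a", "b")})   # subset only

print("joint:   ", joint)
print("separate:", separate)
```

Because the reference (the per-gene geometric mean) is built from whichever samples are passed in, sample `a` gets a size factor of 0.5 when normalised with all three samples but about 0.71 when normalised with `b` alone. Its normalised counts therefore land on a different scale in the two cases.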

In conclusion: no right or wrong answer here.

Kevin


Thanks a lot Kevin! Is there any methodology you would suggest for meta-analyzing? I was thinking of using an inverse-variance weighted meta-analysis but I'm unsure whether I would use the overall variance of the Z-statistic or if I should use the logFC SE and find the variance for each gene.
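For reference, the per-gene option could look like the following. It is only a minimal fixed-effect sketch, assuming each cell line was analysed separately and produced a log2 fold change with its standard error for every gene (the `log2FoldChange` and `lfcSE` columns of a DESeq2 results table). The function name `ivw_meta` and the numbers are hypothetical.

```python
import math

def ivw_meta(estimates):
    """Fixed-effect inverse-variance weighted meta-analysis for ONE gene.
    estimates: list of (log2FC, SE) pairs, one pair per cell line."""
    weights = [1.0 / se ** 2 for _, se in estimates]   # w_i = 1 / SE_i^2
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal p-value
    return pooled, pooled_se, z, p

# Hypothetical per-line results for a single gene: (log2FC, lfcSE)
gene_x = [(1.2, 0.4), (0.8, 0.4), (1.0, 0.2)]

pooled, pooled_se, z, p = ivw_meta(gene_x)
print(f"pooled log2FC = {pooled:.3f}, SE = {pooled_se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

Here the weights come from each gene's own standard error rather than from one overall variance, so precisely estimated lines count for more. A fixed-effect model assumes the true effect is the same in every line; if the lines really do disagree, a random-effects model may be the more appropriate choice.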

Also, if I had different mutations (all implicated in the same disease) in different genes, but the same cell line, should I use mutation as a covariate? Sorry for all the questions, I'm fairly new to all this.

Reply • 5.3 years ago by gradstudentNew ▴ 50

Meta-analysis is not quite my area, but the program that comes to mind is RankProd (an R package). I don't know which specific methods are implemented in RankProd, though.

I cannot really comment on the mutation part. It seems like it is only relevant to one cell-line. If you are interested in performing differential expression across the mutation states, then you will have to include it in the design formula anyway.

Reply • 5.3 years ago by Kevin Blighe 89k