Question

Too many RNAseq samples. What to do?

0

4.3 years ago

english.server ▴ 300

Dear Biostarsers!

I wanted to obtain TCGA vs GTEx differentially expressed genes using either DESeq2 or edgeR. However, I cannot use Amazon, Galaxy failed to do the job, and my laptop doesn't have enough RAM for the computation. I thought using non-Bayesian techniques might solve my problem. Should I give it a try? Which non-Bayesian R packages are commonly used? Alternatively, is it possible to filter the genes to keep only the top 20% or so most variable ones without prior estimates? I am starting my analysis with normalized TCGA+GTEx count data. Thanks in advance.

RNA-Seq deseq2 edger • 1.2k views

4.3 years ago by english.server ▴ 300

2

I don't think that these methods use much more RAM than the genes x samples count matrix itself, so I'm not sure that the choice of method is going to make that much of a difference. 20,000 individuals x 20,000 genes is just a big matrix. Can I ask why you want to do this? Are you aware that comparing TCGA to GTEx is mainly likely to leave you with batch effects rather than biologically differential genes?
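To put a rough number on "just a big matrix", here is a back-of-the-envelope sketch. It assumes a dense matrix stored as 8-byte doubles (how R stores numeric values) or 4-byte integers (how R stores integer counts). The `ASSUMED_WORKING_COPIES` factor is purely a guess for illustration, not a measured property of DESeq2, edgeR or limma.

```python
def matrix_gib(n_genes, n_samples, bytes_per_value=8):
    """Approximate size in GiB of a dense n_genes x n_samples matrix."""
    return n_genes * n_samples * bytes_per_value / 2**30


# The 20,000 x 20,000 case mentioned above.
dense_gib = matrix_gib(20_000, 20_000)     # stored as 8-byte doubles
int_gib = matrix_gib(20_000, 20_000, 4)    # stored as 4-byte integers

# Assumption: the analysis holds a handful of matrix-sized objects at once
# (normalized values, fitted values, and so on). The factor is a guess.
ASSUMED_WORKING_COPIES = 4
peak_gib = dense_gib * ASSUMED_WORKING_COPIES

print(f"as doubles:  {dense_gib:.2f} GiB")   # as doubles:  2.98 GiB
print(f"as integers: {int_gib:.2f} GiB")     # as integers: 1.49 GiB
print(f"rough peak with {ASSUMED_WORKING_COPIES} copies: {peak_gib:.1f} GiB")
```

So the matrix alone is around 3 GiB as doubles, and a few working copies would exceed the RAM of many laptops whichever package is used.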

4.3 years ago by i.sudbery 20k

0

Thanks. It would be a great help if I could estimate the amount of RAM my work needs. I am going to compare within-TCGA DEGs with TCGA vs GTEx DEGs; i.e., first compare cancerous vs normal tissue within the TCGA data, and then compare the TCGA cancerous tissue with GTEx. As for the batch effects: no, I actually wasn't aware! Thanks a lot.

4.3 years ago by english.server ▴ 300

0

My guess is that you are going to find that the within-TCGA DEGs are very different from the TCGA vs GTEx DEGs, and many of those differences will probably be because GTEx was prepared by different people, using different protocols, on different days from TCGA. But it will be interesting to find out.

4.3 years ago by i.sudbery 20k

2

I wanted to obtain TCGA vs GTEx differentially expressed genes using either DESeq2 or edgeR ... I am starting my analysis with normalized TCGA+GTEx count data

You should be starting with the raw counts for those packages.

If you are concerned about computational resources, the limma-voom or limma-trend pipeline should be less intensive. See this discussion: https://support.bioconductor.org/p/112573/
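On the filtering idea from the question (keeping only the top 20% or so most variable genes), here is a minimal sketch of how that could be done without loading the whole matrix into RAM. It assumes a tab-separated file with one gene per row and one sample per column, and it ranks genes by the variance of log2(x + 1). The file layout, the log transform, and the names `read_rows` and `top_variable_genes` are all illustrative assumptions, not part of any of the packages mentioned.

```python
import csv
import heapq
import io
import math
import statistics


def read_rows(handle):
    """Yield (gene_id, values) one gene at a time from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line of sample names
    for fields in reader:
        yield fields[0], [float(x) for x in fields[1:]]


def top_variable_genes(rows, fraction=0.2):
    """Return the ids of the most variable genes, highest variance first."""
    scored = []
    for gene_id, values in rows:
        # Assumption: rank on log2(x + 1) so highly expressed genes do not
        # dominate purely because of their scale.
        logged = [math.log2(v + 1.0) for v in values]
        scored.append((statistics.pvariance(logged), gene_id))
    keep = max(1, round(len(scored) * fraction))
    return [gene_id for _, gene_id in heapq.nlargest(keep, scored)]


# Toy table standing in for a real file opened with open(path, newline="").
toy = (
    "gene\ts1\ts2\ts3\ts4\n"
    "A\t10\t10\t10\t10\n"
    "B\t0\t100\t0\t100\n"
    "C\t5\t6\t5\t6\n"
    "D\t1\t1000\t1\t1000\n"
    "E\t50\t55\t45\t50\n"
)
top = top_variable_genes(read_rows(io.StringIO(toy)), fraction=0.2)
print(top)  # ['D']
```

Only one row of values plus one score per gene is held in memory at any time, so the footprint stays small even with thousands of samples. The ranking never looks at the group labels, which is the sense in which it needs no prior estimates.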

4.3 years ago by igor 13k

1

Not enough info to answer here. How are you trying to do this? DESeq2, edgeR, limma? How are you generating counts? What exactly is the issue(s) you're running into?