Question

Differential expression analysis with RNA-Seq samples that vary in depth

8.6 years ago

Satyajeet Khare ★ 1.6k

Biostars,

I am performing differential gene expression analysis between "control" and "treated" samples that differ 2-3 fold in sequencing depth (the control samples have one half to one third as many reads as the treated samples). When I perform DE analysis using the old Tuxedo protocol, I do not observe many differentially expressed genes, not even the genes that were used to validate the samples before sequencing.
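
To illustrate why raw counts can't be compared directly across libraries of different depth, here is a minimal Python sketch with made-up counts: a gene with identical expression in both samples gets three times the reads in a library sequenced three times deeper. Scaling by library size (counts per million, CPM) removes that difference. CPM is only the simplest total-count scaling, used here as an assumption for illustration; the DE tools discussed in this thread use their own, more robust normalizations.

```python
def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: scale each gene by the library's total count."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


# Hypothetical libraries: "treated" was sequenced 3x deeper than "control",
# but the underlying expression of every gene is identical.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {gene: 3 * c for gene, c in control.items()}

for gene in control:
    raw_fc = treated[gene] / control[gene]
    cpm_fc = cpm(treated)[gene] / cpm(control)[gene]
    print(f"{gene}: raw fold change {raw_fc:.2f}, CPM fold change {cpm_fc:.2f}")
```

Every gene shows a spurious 3-fold "change" in raw counts and no change after depth scaling.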

If I load bigWig files (which are relatively better normalized) for these samples into the genome browser, I can see the expected difference in reads on the genes of interest. To normalize the samples for sequencing depth, I am trying samtools view -s to subsample the .bam files to similar sizes. But these subsampled files aren't compatible with Cufflinks because they lack the EOF marker.
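
Two details about this step, as a hedged Python sketch. First, samtools view -s takes a value of the form INT.FRAC: the integer part seeds the random number generator and the fractional part is the proportion of templates kept; subsample_arg is a hypothetical helper that builds that value from read counts. Second, a BAM file normally ends with a fixed 28-byte empty BGZF block defined in the SAM/BAM specification; has_bgzf_eof is a hypothetical check for it. One possible cause of a missing marker (an assumption, not something diagnosed in this thread) is that samtools view wrote SAM text rather than BAM, e.g. because -b was omitted.

```python
import math
import os

# Empty BGZF block that the SAM/BAM specification defines as the BAM EOF marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file ends with the 28-byte BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def subsample_arg(target_reads: int, source_reads: int, seed: int = 42) -> str:
    """Build the INT.FRAC value for samtools view -s (seed, then fraction kept)."""
    if not 0 < target_reads < source_reads:
        raise ValueError("need 0 < target_reads < source_reads")
    # Truncate (rather than round) to 6 decimals so the fraction stays below 1.
    frac = math.floor(target_reads / source_reads * 1e6) / 1e6
    if frac == 0:
        raise ValueError("fraction too small to express with 6 decimals")
    # Drop the leading "0" of "0.xxxxxx" and prepend the seed.
    return f"{seed}" + f"{frac:.6f}"[1:]


# Hypothetical read counts: keep ~1/3 of a 30M-read library to match 10M reads.
print(subsample_arg(10_000_000, 30_000_000))  # 42.333333
```

Note that the answer below advises against subsampling at all, since it throws away reads that the DE tools can use.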

I am wondering whether such normalization is a good idea and, if so, how to get around this incompatibility with Cufflinks.

Thanks a lot for your help in advance!

Tags: RNA-Seq, Hisat2, Cufflinks, Depth of sequencing • 2.8k views

Written 8.6 years ago by Satyajeet Khare ★ 1.6k; updated 8.6 years ago by Devon Ryan 105k

I would expect htseq-count/featureCounts followed by DESeq2/edgeR/limma-voom to handle this difference in depth, but that's not what you asked for.
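
As a sketch of the first half of that workflow, the Python below combines per-sample htseq-count outputs (one gene_id<TAB>count line per gene, followed by summary rows such as __no_feature whose IDs start with "__") into a genes-by-samples count matrix, the kind of input DESeq2, edgeR and limma-voom expect. The function names are hypothetical; featureCounts output has a different layout (a comment line, a header and annotation columns) and would need its own parser.

```python
import csv


def read_htseq_counts(path: str) -> dict[str, int]:
    """Parse one htseq-count output file: gene_id<TAB>count per line.

    Summary rows (e.g. __no_feature, __ambiguous) start with '__' and are
    skipped so that only per-gene counts end up in the matrix.
    """
    counts: dict[str, int] = {}
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) != 2 or row[0].startswith("__"):
                continue
            counts[row[0]] = int(row[1])
    return counts


def build_count_matrix(files: dict[str, str]) -> dict[str, list[int]]:
    """Combine per-sample files into {gene_id: [count per sample]}.

    Sample order follows the order of `files`; a gene missing from a
    sample's file is given a count of 0.
    """
    per_sample = {sample: read_htseq_counts(path) for sample, path in files.items()}
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    return {g: [per_sample[s].get(g, 0) for s in files] for g in genes}
```

The raw (unnormalized) counts are what these R packages want; they estimate the depth differences themselves.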

The old tuxedo pipeline isn't considered "the best tool in the shed" anymore.

Reply by WouterDeCoster 48k, 8.6 years ago

Okay. To make things worse, there is only one sample per group (no replicates). Without replicates the dispersion cannot be estimated: edgeR cannot calculate the common dispersion, and hence the tagwise dispersion. There might be a way out, but what's the best option of the three?
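
To see why one sample per group is a dead end, here is a simplified Python sketch: under the negative-binomial model Var = μ + φμ², a method-of-moments estimate of the dispersion φ needs a sample variance, and a sample variance needs at least two replicates. This is not edgeR's estimator (edgeR uses likelihood-based estimation with moderation across genes); it only illustrates the degrees-of-freedom problem, and the function name is hypothetical.

```python
import statistics


def nb_dispersion_mom(counts: list[float]) -> float:
    """Method-of-moments NB dispersion for one gene in one condition.

    Uses Var = mu + phi * mu**2, so phi = (s2 - mu) / mu**2, where s2 is
    the sample variance of the (normalized) replicate counts. Negative
    estimates (less variance than Poisson) are truncated at 0.
    """
    # statistics.variance raises StatisticsError with fewer than two values:
    # with a single replicate there is no information about the spread.
    s2 = statistics.variance(counts)
    mu = statistics.mean(counts)
    if mu == 0:
        return 0.0
    return max((s2 - mu) / mu**2, 0.0)


# Three replicates: mean 20, sample variance 100 -> phi = (100 - 20) / 400 = 0.2
print(nb_dispersion_mom([10, 20, 30]))

# One sample per group: the dispersion is not estimable.
try:
    nb_dispersion_mom([15])
except statistics.StatisticsError as err:
    print("cannot estimate dispersion:", err)
```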

We can validate the differentially expressed genes that come out of the analysis, but biological replicates for RNA-Seq are not possible for now.

Thanks for the help!

Reply by Satyajeet Khare ★ 1.6k, 8.6 years ago

If you have unreplicated data, then all of the presented options are equally crappy. GPower is supposed to be slightly better, but honestly you'd be better off not wasting your time on this dataset.

Reply by Devon Ryan 105k, 8.6 years ago

Okay. Thanks a lot for all the help.

Reply by Satyajeet Khare ★ 1.6k, 8.6 years ago

Answer 1 (score 1) · 2017-01-13

8.6 years ago

Devon Ryan 105k

Cuffdiff performs an appropriate normalization internally (the same one as DESeq2, if I recall correctly), so please don't subsample. Having said that, as WouterDeCoster wrote, you're strongly encouraged not to use cufflinks/cuffdiff, but rather one of the standard R-based tools.
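
For reference, the DESeq-style "median-of-ratios" normalization can be sketched in a few lines of Python. Each sample's size factor is the median, over genes, of its count divided by that gene's geometric mean across samples, so a library sequenced three times deeper simply gets a three-times-larger size factor and no reads have to be thrown away. As far as I know this corresponds to cuffdiff's default "geometric" library normalization, but check the manual for your version; the function below is a hypothetical illustration, not DESeq2's actual R code.

```python
import math
import statistics


def size_factors(counts: list[list[int]]) -> list[float]:
    """Median-of-ratios size factors; counts[i][j] = count of gene i in sample j."""
    n_samples = len(counts[0])
    # A gene needs a nonzero count in every sample to have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to each gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Hypothetical data: every gene has 3x the reads in "treated" (column 1).
counts = [[10, 30], [20, 60], [5, 15], [100, 300]]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
print(sf)          # approx [0.577, 1.732], i.e. 1/sqrt(3) and sqrt(3)
print(normalized)  # both columns now agree for every gene
```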

Thank you! How about the new Tuxedo pipeline? Ballgown seems to rely on countMatrix. For small sample sizes (n < 4 per group), Ballgown recommends regularization using limma anyway.
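
The "regularization" that limma provides is empirical-Bayes variance moderation: each gene's residual variance s² (with d residual degrees of freedom) is pulled toward a prior variance s₀² (with d₀ prior degrees of freedom) estimated from all genes. Below is a minimal Python sketch of that posterior-variance formula, assuming d₀ and s₀² are already known (in limma they are estimated from the data); the function name is hypothetical. It also shows why moderation cannot rescue a design without replicates: with d = 0 the gene's own data contribute nothing.

```python
def moderated_variance(s2: float, d: float, s0_2: float, d0: float) -> float:
    """Posterior (moderated) variance as used in limma's moderated t-statistic.

    s2:   the gene's own residual variance, with d residual degrees of freedom
    s0_2: prior variance shared across genes, with d0 prior degrees of freedom
    """
    if d + d0 <= 0:
        raise ValueError("need d + d0 > 0")
    return (d0 * s0_2 + d * s2) / (d0 + d)


# Hypothetical prior: s0^2 = 1.0 with d0 = 4 prior degrees of freedom.
# The last case (d = 0, no replicates) just returns the prior variance.
for s2, d in [(4.0, 2), (4.0, 20), (0.0, 0)]:
    print(f"s2={s2}, d={d} -> moderated {moderated_variance(s2, d, 1.0, 4):.3f}")
```

With few residual degrees of freedom the estimate leans on the prior; with many, it approaches the gene's own variance.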

Best

Reply by Satyajeet Khare ★ 1.6k, 8.6 years ago

I've never used it, but given who wrote Ballgown it should be much better.

Reply by Devon Ryan 105k, 8.6 years ago