Question

Differential Expression Using Different Libraries (Truseq, Nextera)?

2

11.0 years ago

Nick ▴ 290

I have 3 replicates for each of 2 conditions (6 samples), sequenced using different library prep protocols (Nextera, TruSeq). Do I just need to use the library type as a blocking factor when defining the model for the differential expression analysis?

I am planning to use edgeR but, I reckon, the same logic would apply to DESeq, too.

rna-seq • 5.5k views

ADD COMMENT • link updated 11.0 years ago by Michele Busby ★ 2.2k • written 11.0 years ago by Nick ▴ 290

1

Cross-posted here (and as I mentioned there, using the library type as a factor would indeed be the normal solution).

ADD REPLY • link 11.0 years ago by Devon Ryan 104k

score 4 · Answer 1 · 2013-11-14

It is difficult to know specifically what will happen with two different protocols unless you sequence the same sample with both methods. We do that a lot with K562 samples but I don't seem to have a Nextera sample handy to check for you.

My guess is that if you prepared the exact same sample with Nextera and TruSeq you would get a higher variance than if you compared Nextera with Nextera. Some protocols are very close to one another, but others do not look like the same sample, e.g. http://michelebusby.tumblr.com/image/62718357939. Since the variance may go up with the TruSeq data included, you might not get much of an increase in power by adding the third replicate, and you could even lose power.

Scatter plots of each sample against every other sample would be the first place to look.
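
As a rough numeric companion to those scatter plots, here is a minimal sketch that computes the Pearson correlation of log-transformed counts for every pair of samples. The sample names and counts are made up purely for illustration, and `pairwise_correlations` is a hypothetical helper, not part of edgeR or any other package.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pairwise_correlations(samples):
    """Correlate log2(count + 1) profiles for every pair of samples."""
    logged = {name: [math.log2(c + 1) for c in counts]
              for name, counts in samples.items()}
    return {(a, b): pearson(logged[a], logged[b])
            for a, b in combinations(sorted(logged), 2)}

# Toy counts for five genes (invented numbers, not real data).
toy = {
    "nextera_1": [10, 200, 35, 0, 880],
    "nextera_2": [12, 180, 40, 1, 900],
    "truseq_1": [30, 150, 10, 5, 700],
}

for pair, r in pairwise_correlations(toy).items():
    print(pair, round(r, 3))
```

If the cross-protocol pairs correlate noticeably worse than the same-protocol pairs, that is a hint the library prep is adding variance. A single correlation number hides where the disagreement sits (for example, high-GC genes), so it supplements the plots rather than replacing them.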

There is usually bias in which genes show more variability between samples, i.e. it's non-random noise. Usually the high-GC genes vary more by protocol than the low-GC genes, and sometimes it is the short reads that bounce around more.

If you put all three samples through edgeR you might get a screwy variance fit. edgeR and DESeq both use a uniform or quasi-uniform variance calculation, which means they basically treat all genes at a given depth as having the same variance, and the call is then based on the difference in the means. The means might bounce around more for some genes, so I would expect you to be introducing some bias into what you are calling. Without an experiment looking at Nextera vs TruSeq, it is difficult to correct for that bias in downstream analyses.

I might devise a different design where you put the Nextera samples through edgeR by themselves and then confirm the direction of the calls against the TruSeq data separately. You'd have to think about the stats, but I think that would use all the information without introducing too much bias into your calls.
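
One way the "confirm the direction" step could look is sketched below, assuming you already have log fold changes for the genes called from the Nextera-only analysis and log fold changes computed separately from the TruSeq samples. The function name, the input format, and the gene names are all hypothetical; this only checks sign agreement and is not a statistical test.

```python
def direction_concordance(nextera_calls, truseq_logfc):
    """Compare the sign of each Nextera DE call with the TruSeq fold change.

    nextera_calls: dict of gene -> log fold change for genes called DE (Nextera only).
    truseq_logfc:  dict of gene -> log fold change estimated from the TruSeq samples.
    Genes missing from the TruSeq data, or with a zero fold change on either
    side, are reported as unchecked instead of being counted either way.
    """
    agree, disagree, unchecked = [], [], []
    for gene, lfc in nextera_calls.items():
        other = truseq_logfc.get(gene)
        if other is None or other == 0 or lfc == 0:
            unchecked.append(gene)
        elif (lfc > 0) == (other > 0):
            agree.append(gene)
        else:
            disagree.append(gene)
    checked = len(agree) + len(disagree)
    fraction = len(agree) / checked if checked else None
    return {"agree": agree, "disagree": disagree,
            "unchecked": unchecked, "fraction": fraction}

# Invented values, for illustration only.
result = direction_concordance(
    {"geneA": 2.0, "geneB": -1.5, "geneC": 1.0, "geneD": 0.8},
    {"geneA": 1.2, "geneB": 0.4, "geneC": 0.9},
)
print(result["fraction"], result["disagree"], result["unchecked"])
```

With only one or two TruSeq samples per condition, the TruSeq fold changes will be noisy, so some disagreement is expected even for real effects.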

Edit: Joshua Levin publishes this type of work a lot. If you look at his papers they give a good overview of what happens with different protocols, e.g. http://www.nature.com/nmeth/journal/v7/n9/abs/nmeth.1491.html http://www.nature.com/nmeth/journal/v10/n7/full/nmeth.2483.html

score 1 · Answer 2 · 2013-11-12

1

11.0 years ago

Rory Kirchner ▴ 10

How are the different preps distributed? Is one condition made with one kit and the other with another kit? Or are some of the replicates from each condition made with the different kits?

ADD COMMENT • link 11.0 years ago by Rory Kirchner ▴ 10

0

Each kit is used for an equal number of control and treatment samples, i.e. TruSeq for one treatment + one control, Nextera for 2 treatments and 2 controls.

ADD REPLY • link 11.0 years ago by Nick ▴ 290

0

Great, adding it as a blocking factor is the way to go then. I would look at an MDS plot of the samples and see how they cluster together; if it looks like the library prep isn't influencing the clustering at all, I'd also consider dropping it from further analyses. I'd also take a look at any genes that are called DE between the two library preps; you might be able to glean some information about how the different preps are affecting your experiment.
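
To illustrate why the layout matters for blocking, here is a small sketch that builds an additive design matrix (intercept + library prep + condition) for the layout described above and checks that the condition effect can be separated from the library effect. It uses plain Python rather than edgeR itself; `design_matrix` and `column_rank` are hypothetical helpers written for this example, and the "confounded" layout is an invented counterexample.

```python
from fractions import Fraction

def design_matrix(conditions, libraries,
                  ref_condition="control", ref_library="truseq"):
    """Rows of [intercept, library indicator, condition indicator]."""
    return [[1, int(lib != ref_library), int(cond != ref_condition)]
            for cond, lib in zip(conditions, libraries)]

def column_rank(matrix):
    """Matrix rank via Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column; it adds nothing to the rank
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Layout from the thread: TruSeq for 1 control + 1 treatment,
# Nextera for 2 controls + 2 treatments.
balanced = design_matrix(
    ["control", "treatment", "control", "control", "treatment", "treatment"],
    ["truseq", "truseq", "nextera", "nextera", "nextera", "nextera"],
)

# Invented counterexample: every control on one kit, every treatment on the other.
confounded = design_matrix(
    ["control"] * 3 + ["treatment"] * 3,
    ["truseq"] * 3 + ["nextera"] * 3,
)

print(column_rank(balanced))    # rank equals the 3 columns: effects are separable
print(column_rank(confounded))  # rank below 3: kit and condition are confounded
```

When the rank equals the number of columns, the model can estimate the condition effect while adjusting for the kit. When it does not, no blocking factor can rescue the design, which is why the question about how the preps were distributed had to be answered first.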

ADD REPLY • link 11.0 years ago by Rory Kirchner ▴ 10