Question

How to combine datasets from different sequencing platforms?

3.5 years ago

Juan Cordero ▴ 140

Hi,

I have two datasets:

- A commercial targeted RNASeq (HTG EdgeSeq) with ~1.5k coding genes. There is a single probe sequence (75 bp long) per RNA transcript. It contains samples considered as "treated".
- A public (whole) RNASeq with > 60k genes. It contains samples considered as "control".

I'd like to perform differential expression tests between them, but there are obviously several issues I'd have to deal with. I already have raw counts for each dataset, so these would be the starting point.

I thought I could do the following:

1. Subset the "big" dataset, selecting only the genes present in the targeted RNASeq.
2. Use the EDASeq package to correct for length effects with the function withinLaneNormalization, for each dataset independently. I assume this would normalize counts by length, once the gene lengths have been set differently for each dataset (the whole RNASeq would use the real gene lengths, whereas the targeted dataset would use 75 bp as the gene length).
3. Create a DGEList object for each dataset from the data generated with EDASeq.
4. Then cbind these two DGEList objects to combine them into a single one.
5. Perform differential expression analysis with edgeR as usual.
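
To make steps 1 to 4 concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the gene names, counts and lengths are made up, and simple reads-per-kilobase scaling is used as a stand-in for the length correction. It is not what EDASeq's withinLaneNormalization does internally.

```python
# Toy raw counts: gene -> per-sample counts. All names and numbers are made up.
targeted = {"GENE_A": [120, 90], "GENE_B": [30, 45]}  # "treated" samples
whole = {
    "GENE_A": [800, 950],
    "GENE_B": [200, 150],
    "GENE_C": [5000, 4200],
}  # "control" samples

# Assumed lengths in bp: real gene lengths for the whole RNASeq,
# a fixed 75 bp probe for every gene in the targeted panel.
whole_len = {"GENE_A": 2000, "GENE_B": 1000, "GENE_C": 3000}
PROBE_LEN = 75

# Step 1: keep only genes present in both datasets.
shared = sorted(set(targeted) & set(whole))


def per_kilobase(counts, length_bp):
    """Scale counts to reads per kilobase (simple stand-in for a length correction)."""
    return [c / (length_bp / 1000) for c in counts]


# Step 2: length-scale each dataset independently.
targeted_rpk = {g: per_kilobase(targeted[g], PROBE_LEN) for g in shared}
whole_rpk = {g: per_kilobase(whole[g], whole_len[g]) for g in shared}

# Steps 3-4: bind samples column-wise into one matrix, one group label per column.
combined = {g: targeted_rpk[g] + whole_rpk[g] for g in shared}
groups = ["treated"] * 2 + ["control"] * 2

for g in shared:
    print(g, [round(v, 1) for v in combined[g]])
```

Note that the values produced this way are no longer integer counts, which is worth keeping in mind because edgeR's models are built for raw counts.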

I have doubts about the validity of this procedure at several steps. For example, at step 1), wouldn't subsetting dramatically change the distribution of the data and affect the library sizes? At steps 3) and 4), is it possible to combine DGEList objects that had different gene length corrections applied through EDASeq? And at step 5), would the results be trustworthy?
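
The library-size concern from step 1 can be illustrated with a small sketch in plain Python. The gene names, counts and panel membership are made up; the point is only that counts per million (CPM) for the same genes differ depending on whether the library size is taken before or after subsetting.

```python
# Toy whole-transcriptome sample: gene -> raw count (made-up values).
sample = {"GENE_A": 800, "GENE_B": 200, "GENE_C": 5000, "GENE_D": 4000}
panel = {"GENE_A", "GENE_B"}  # hypothetical targeted panel


def cpm(counts, lib_size):
    """Counts per million for a given library size."""
    return {g: c / lib_size * 1e6 for g, c in counts.items()}


full_lib = sum(sample.values())  # library size before subsetting
subset = {g: c for g, c in sample.items() if g in panel}
subset_lib = sum(subset.values())  # library size after subsetting

cpm_full = cpm(subset, full_lib)  # panel genes scaled by the original library size
cpm_subset = cpm(subset, subset_lib)  # panel genes scaled by the recomputed size

print("library size:", full_lib, "->", subset_lib)
for g in sorted(subset):
    print(g, round(cpm_full[g]), round(cpm_subset[g]))
```

In edgeR this choice is made explicitly when subsetting a DGEList, through the keep.lib.sizes argument.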

I know of course that mixing different sequencing technologies won't yield the best results, but this is the data I possess at the moment.

Thanks

(NOT A DUPLICATE OF Combining two rnaseq platforms in one)

RNASeq Expression Differential Normalization • 1.2k views

updated 3.5 years ago by Jeremy Leipzig 22k • written 3.5 years ago by Juan Cordero ▴ 140

Combining datasets implies you have both control and treatment data from different sources.

3.5 years ago by Jeremy Leipzig 22k

score 2 · Answer 1 · 2021-05-31

3.5 years ago

swbarnes2 14k

RNASeq is sensitive to batch effects; you can't use controls from a totally different experiment carried out at a different lab, at a different time, by different people.
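
One way to see the problem is to write down the design matrix. The sketch below is plain Python with made-up sample layouts, not output from any real analysis. It shows that when every treated sample comes from one platform and every control from the other, the treatment and platform columns are identical, so the design has fewer independent columns than terms and the two effects cannot be separated.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Columns: intercept, treated (1/0), targeted platform (1/0).
# Confounded layout: all treated samples are targeted, all controls are whole RNASeq.
confounded = [[1, 1, 1], [1, 1, 1], [1, 0, 0], [1, 0, 0]]
# Hypothetical balanced layout: each platform contributes both groups.
balanced = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0]]

print("confounded rank:", matrix_rank(confounded))
print("balanced rank:", matrix_rank(balanced))
```

With three terms in the model, only the balanced layout reaches full rank; in the confounded layout any difference between groups could equally be a difference between platforms.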
