Question

Do varying amounts of input RNA for library preparation have an effect on differential gene expression analysis?

0

5.3 years ago

c.kohli • 0

Hey everyone!

I hope this question is allowed, as it sits at the intersection of wet lab and dry lab.

We prepared Illumina libraries with the RiboZero kit, but noticed afterwards that the input normalization was wrong, so different amounts of total RNA went into the library prep. Effectively, we added between 20 ng and 500 ng of total RNA as input to the library preparation because of the miscalculation.

I am wondering whether there will be an observable effect on the low-input samples when we perform differential gene expression analysis between the samples. Will we, for example, recover fewer genes? We are of course going to normalize the libraries and sequence them to the same depth. I think that even 20 ng should be enough to capture the diversity of the transcriptome, but I was hoping someone might have experience with samples below 100 ng input.

Thanks a lot in advance.

RNA-Seq sequencing • 2.4k views

ADD COMMENT • link updated 5.3 years ago by Kristoffer Vitting-Seerup ★ 4.1k • written 5.3 years ago by c.kohli • 0

0

Intuitively, I would argue that it will not make a notable difference, but the library with 20 ng will require slightly more PCR amplification and might therefore suffer a bit more from PCR bias. After normalization you can of course do quality control, e.g. with PCA or MDS plots, and check for signs of a batch effect. I would not expect one, especially if the RNA was of high quality (RIN), as the kits cover a wide range of input amounts. I cannot really speak for RNA-seq, but for ATAC-seq I have made libraries from the same cell populations of different donors, ranging from roughly 10,000 to 50,000 cells, with no sign of a cell-number-based batch effect at all. An effect might show up if you process the exact same specimen with different input amounts, but I assume that biological variation will dominate these slight differences in library prep. Still, I have no hard proof for my statement regarding RNA-seq.
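To make the PCA check concrete, here is a minimal sketch in Python with NumPy. It is only an illustration: the count matrix is simulated, the `input_ng` values are hypothetical, and `log_cpm` and `pca_scores` are helper names made up for this example. In a real analysis you would more likely use the transformation and plotting functions of your differential expression package.

```python
import numpy as np


def log_cpm(counts, prior=1.0):
    """Convert a genes x samples count matrix to log2 counts per million."""
    lib_sizes = counts.sum(axis=0, keepdims=True)
    return np.log2(counts / lib_sizes * 1e6 + prior)


def pca_scores(log_expr, n_components=2):
    """Return per-sample PC scores and the fraction of variance each PC explains."""
    # Center each gene across samples, then put samples in rows for the SVD.
    centered = (log_expr - log_expr.mean(axis=1, keepdims=True)).T
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained


# Simulated stand-in data: 500 genes x 8 samples (NOT real measurements).
rng = np.random.default_rng(0)
gene_means = rng.gamma(shape=2.0, scale=50.0, size=500)
counts = rng.poisson(np.tile(gene_means[:, None], (1, 8)))
input_ng = np.array([20, 500, 50, 200, 20, 500, 100, 300])  # hypothetical inputs

scores, explained = pca_scores(log_cpm(counts))
# A strong correlation between a leading PC and input amount would hint at a batch effect.
r = np.corrcoef(scores[:, 0], np.log10(input_ng))[0, 1]
print(f"PC1 explains {explained[0]:.1%} of variance; r(PC1, log10 input) = {r:.2f}")
```

If the low-input samples separate from the rest along one of the leading components, that is the kind of signal worth following up; if the samples group by donor or condition instead, the input amount is probably not the dominant source of variation.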

ADD REPLY • link 5.3 years ago by ATpoint 85k

0

Thanks for the input! Even without proof it is at least comforting to read :)

ADD REPLY • link 5.3 years ago by c.kohli • 0

score 0 · Answer 1 · 2019-08-23

0

5.3 years ago

Kristoffer Vitting-Seerup ★ 4.1k

The RNA input ~~definitively have an effect~~ does not have a large effect - see figures 4 and 5 of this article. ~~The question is whether you can normalize/model yourself out of it afterwards~~ I would, however, still recommend paying attention to it during EDA and checking whether any clustering appears, as RNA concentration does have an effect on the total number of reads sequenced per library. How are the low vs. high concentrations distributed across the two (or more) conditions you want to do differential expression for?

ADD COMMENT • link 5.3 years ago by Kristoffer Vitting-Seerup ★ 4.1k

1

How does this line from the abstract match your statement that it definitely has an effect?

Our comprehensive comparison results suggested that different cDNA library storage time, quantity of input RNA, and cryopreservation of cell samples did not significantly alter gene transcriptional expression profiles generated by RNA-seq experiments.

ADD REPLY • link 5.3 years ago by ATpoint 85k

0

Their lowest input is 100 ng, however, which really isn't that low.

ADD REPLY • link 5.3 years ago by c.kohli • 0

1

I might have been too fast in interpreting the results - the effects are quite small and I've updated my answer accordingly. Thanks for the feedback :-)

ADD REPLY • link 5.3 years ago by Kristoffer Vitting-Seerup ★ 4.1k

0

They are quite spread out: within the same condition, some samples got more input and some got less. However, the experiment uses human primary cells, so the variation will already be high, which makes me wonder whether a model would just attribute this to donor-to-donor variation.

ADD REPLY • link 5.3 years ago by c.kohli • 0

0

Is the experiment maximally confounded, e.g. one cell type always had 20 ng while the other always had 500 ng, or is the input mixed between the groups? If the latter, you can then explore by PCA whether the "low" samples show signs of a batch effect.
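A quick way to answer the confounding question is to cross-tabulate input level against condition. The sketch below is purely illustrative: the 100 ng cutoff, the sample annotations, and the function names `input_by_condition` and `is_fully_confounded` are assumptions made for this example, not anything from the experiment.

```python
from collections import Counter


def input_by_condition(conditions, input_ng, threshold_ng=100):
    """Count samples per (condition, input level); the threshold is an assumed cutoff."""
    table = Counter()
    for cond, ng in zip(conditions, input_ng):
        level = "low" if ng < threshold_ng else "high"
        table[(cond, level)] += 1
    return table


def is_fully_confounded(table):
    """True if each condition has a single input level and the levels differ across conditions."""
    levels_per_condition = {}
    for (cond, level), n in table.items():
        if n > 0:
            levels_per_condition.setdefault(cond, set()).add(level)
    if not levels_per_condition:
        return False
    one_level_each = all(len(lv) == 1 for lv in levels_per_condition.values())
    levels_differ = len(set().union(*levels_per_condition.values())) > 1
    return one_level_each and levels_differ


# Hypothetical annotation: two cell types with mixed input amounts.
conditions = ["typeA", "typeA", "typeA", "typeB", "typeB", "typeB"]
input_ng = [20, 250, 500, 50, 120, 400]

table = input_by_condition(conditions, input_ng)
for (cond, level), n in sorted(table.items()):
    print(cond, level, n)
print("fully confounded:", is_fully_confounded(table))
```

When the table shows both levels in every condition, the input amount can be separated from the condition effect; when it does not, no model can tell the two apart.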

ADD REPLY • link 5.3 years ago by ATpoint 85k

0

No, for each cell type some samples got more than 100 ng and some got less. So would you then regress out the effect observed in the PCA?

ADD REPLY • link 5.3 years ago by c.kohli • 0

0

If there is evidence for a batch effect, you could add the RNA input amount as a categorical covariate to the design of the differential analysis. For differential analysis this is (from what I understand) preferred over directly regressing the effect out of the counts, e.g. with removeBatchEffect from limma.
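To illustrate what "adding a covariate to the design" does, here is a small sketch for a single gene using ordinary least squares in NumPy. This is not how DESeq2, edgeR, or limma fit their models; there you would put the covariate into the design formula (something like `~ input + condition`). The data, effect sizes, and the names `design_matrix` and `fit_coefficients` are all made up for the example.

```python
import numpy as np


def design_matrix(condition, low_input=None):
    """Build a design matrix: intercept, condition, and optionally a low-input indicator."""
    cols = [np.ones(len(condition)), np.asarray(condition, dtype=float)]
    if low_input is not None:
        cols.append(np.asarray(low_input, dtype=float))
    return np.column_stack(cols)


def fit_coefficients(design, y):
    """Least-squares fit of y on the design; returns one coefficient per column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical layout: 4 control and 4 treated samples, low input unevenly spread.
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
low_input = np.array([1, 0, 1, 0, 0, 0, 1, 0])

# Simulated log-expression of one gene: true condition effect 1.0, input effect -0.8.
rng = np.random.default_rng(0)
y = 5.0 + 1.0 * condition - 0.8 * low_input + rng.normal(0.0, 0.1, size=8)

without = fit_coefficients(design_matrix(condition), y)
with_cov = fit_coefficients(design_matrix(condition, low_input), y)
print(f"condition effect without covariate: {without[1]:.2f}")
print(f"condition effect with covariate:    {with_cov[1]:.2f}")
```

Because the low-input samples are not evenly split between the groups in this toy layout, the model without the covariate folds part of the input effect into the condition estimate, while the model with the covariate estimates the two separately.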