Question

Replicates for RNA-seq from 1 cell line undergoing different treatments

6.9 years ago

azayob

Hi everyone! I am very new to RNA-seq analysis and am in quite a dilemma about designing my RNA-seq experiment, specifically the number of replicates required for each group of samples, given a tight budget.

Here is the design of my experiment: I am deriving different models of stem-like cells (cancer stem cells, CSCs) from the HepG2 cell line, 3 models in total. For each model, HepG2 cells are treated/cultured under a different culture condition to enrich for CSCs, so in the end there are 4 samples: 1 parental HepG2 and 3 CSC-enriched models derived from HepG2.

My question is: if I do three replicates of each of these treatments/models, are they considered biological replicates? How can they be biological replicates when they all come from the same cell line and differ only by treatment? Aren't they technical replicates? I have read that technical replicates are unnecessary for RNA-seq, but I am concerned about whether the data will be reliable and publishable. Then again, is it right to do triplicates for RNA-seq with these samples? Another concern is the budget: sequencing 12 samples (4 samples × 3 replicates) is not affordable for us.

Any thoughts? Sorry if my write-up is confusing; I would really appreciate input from experts on this matter. Thanks!!!

RNA-Seq Replicates Cell lines

updated 6.5 years ago by BIOTECH.DEEPTI911 • written 6.9 years ago by azayob

Hi all, I am also doing RNA-seq without replicates. I am confused because I got log2 fold-change values but no p-values. Is it acceptable to select DEGs based on fold change rather than on p-value? Please shed some light on this.
Dear azayob, as far as I know, a biological replicate means growing the cell line in three different flasks and sequencing each one individually.
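To make the fold-change question concrete, here is a minimal Python sketch of what an unreplicated comparison can give you for one gene: a log2 fold change of counts-per-million (CPM) values. The CPM normalisation, the pseudocount of 1, the function names and the gene counts are illustrative assumptions; DESeq2 and edgeR use their own normalisation and shrinkage, so their values will differ.

```python
import math


def cpm(count: int, library_size: int) -> float:
    """Counts per million: scale a raw read count by the library's total reads."""
    return count / library_size * 1e6


def log2_fold_change(count_treated: int, size_treated: int,
                     count_control: int, size_control: int,
                     pseudocount: float = 1.0) -> float:
    """log2 of the CPM ratio, with a pseudocount so zero counts don't give log(0)."""
    treated = cpm(count_treated, size_treated) + pseudocount
    control = cpm(count_control, size_control) + pseudocount
    return math.log2(treated / control)


# Hypothetical counts for one gene: ONE library per condition, no replicates.
lfc = log2_fold_change(count_treated=400, size_treated=20_000_000,
                       count_control=100, size_control=20_000_000)
print(f"log2FC = {lfc:.2f}")
# With one value per group there is no within-group spread to compare this
# difference against, so no test statistic (and no p-value) can be formed.
```

Ranking genes by this number is possible, but on its own it says nothing about whether a difference exceeds normal replicate-to-replicate variation.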

6.5 years ago by BIOTECH.DEEPTI911

6.9 years ago

i.sudbery

What counts as a technical replicate and what counts as a biological replicate is not fixed; it depends on what your statistical question is and on how far you want your conclusions to generalize.

For example, consider a single gene, gene A, and two cell types: parent and derived. I'll assume we are interested in a one-tailed result (gene A is higher in derived than in parent cells), but the following applies equally to the one-tailed "less than" case and to the two-tailed case.

If you take multiple aliquots of the same library as your replicates, then you are asking: "Are there definitely more reads from gene A in the library from the derived cells than in the library from the parent cells?" You do not need replicates for this question, because we know what the distribution of repeated sampling from the same library should be (to a good approximation, Poisson).

If you take multiple libraries prepared from cells in the same dish as your replicates, you are asking: "Do cells in this dish of derived cells definitely have more gene A than cells from that dish of parental cells?" We generally regard repeated library preps from the same source material as reproducible enough that this can be modelled in the same way as the previous case.

If you take your dish of derived cells, split it into three plates, grow them to confluence and then prepare libraries, you are asking: "Do cells from this derivation event definitely express more gene A than the parental line?"

If you take your dish of parental cells, split it into six dishes, grow them to confluence and then derive in three of those dishes, you are asking: "Do cells derived from this dish of parental cells definitely have more gene A than the cells that were not derived?"

If you take three separate sets of the parental cell line from different sources and derive each of them, you are asking: "Is gene A always upregulated when I derive cells from this cell line?" Here "different sources" could mean purchasing triplicate aliquots from ATCC, or defrosting separate aliquots from your own freezer.

If you take cell lines derived from three different people with the same cancer and perform the derivation on each one, you are asking: "Is gene A always upregulated when stem cells are derived from this cancer?"

If you take three different parental cell lines from different cancers and derive each one, you are asking: "Does deriving stem cells from cancer cell lines definitely lead to upregulation of gene A?"
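For the first scenario above (replicate aliquots of one library), the sampling distribution is known, so one library per condition is enough for that narrow question. A minimal sketch, assuming reads for gene A are Poisson-sampled with mean proportional to library size: the classical conditional (binomial) test for comparing two Poisson counts. The function names and counts are hypothetical, and this is not what DESeq2 or edgeR do; it only answers the library-level question, not any of the biological ones.

```python
from math import exp, lgamma, log


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def library_level_pvalue(reads_derived: int, total_derived: int,
                         reads_parent: int, total_parent: int) -> float:
    """One-tailed p-value for 'gene A takes a larger share of reads in the derived library'.

    Under H0 (same share in both libraries), conditioning on
    n = reads_derived + reads_parent makes the derived count
    Binomial(n, total_derived / (total_derived + total_parent)).
    """
    n = reads_derived + reads_parent
    p = total_derived / (total_derived + total_parent)
    # Upper tail: chance of at least this many derived reads if H0 holds.
    return sum(exp(binom_log_pmf(k, n, p)) for k in range(reads_derived, n + 1))


# Hypothetical counts: 60 vs 40 reads for gene A in equally sized libraries.
print(library_level_pvalue(60, 20_000_000, 40, 20_000_000))
```

A small p-value here only says the two libraries differ by more than sequencing noise; it carries no information about dish, derivation or cell-line variability.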

_Practical advice_

What you should do probably depends on how difficult and time-consuming the derivation is. The ideal compromise is probably to defrost three separate aliquots of HepG2 cells and perform the derivation of each model three times (once from each aliquot).

However, if the derivation is the sort of thing that will take you six months and $1000s, then there might be an argument for taking three aliquots of cells from the same derivation, growing them to confluence and using these as your replicates. Just remember that if you do this, the claim you can make about the result is "This is how RNA levels changed when we derived stem cells", NOT "This is how RNA levels change when stem cells are derived from HepG2 cells" (in general).

Replicates from the same cell line often show very little variation, so in the second case you are likely to get a large number of significant genes. That is, many genes DID change when you derived stem cells. However, that list will not necessarily be very reproducible if you repeated the derivation, and will be even less reproducible if you did the derivation on a different hepatic cancer cell line.
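The point about replicates from a single derivation can be illustrated with a small null simulation. A minimal sketch, assuming log-scale expression with made-up standard deviations for derivation-to-derivation variation (`sd_event`) and dish-to-dish variation (`sd_dish`), no true effect of derivation at all, and a plain two-sample t-test rather than the negative-binomial models used by DESeq2/edgeR. It compares replicates that share one derivation with replicates that each come from an independent derivation.

```python
import random
import statistics

N_REPS = 3           # replicates per group
T_CRIT_DF4 = 2.776   # two-sided 5% critical value of Student's t with 4 df (3 + 3 - 2)


def pooled_t(group1, group2):
    """Classic equal-variance two-sample t statistic."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / se


def false_positive_rate(shared_event, n_sims=2000, sd_event=0.5, sd_dish=0.1, seed=1):
    """Fraction of null simulations (no real effect) called significant at 5%.

    shared_event=True: all replicates in a group come from ONE derivation (or one
    parental stock), so they share a single event-level deviation.
    shared_event=False: each replicate has its own independent derivation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in ("parental", "derived"):
            if shared_event:
                event = rng.gauss(0, sd_event)
                reps = [event + rng.gauss(0, sd_dish) for _ in range(N_REPS)]
            else:
                reps = [rng.gauss(0, sd_event) + rng.gauss(0, sd_dish)
                        for _ in range(N_REPS)]
            groups.append(reps)
        if abs(pooled_t(groups[0], groups[1])) > T_CRIT_DF4:
            hits += 1
    return hits / n_sims


print("shared derivation:      ", false_positive_rate(shared_event=True))
print("independent derivations:", false_positive_rate(shared_event=False))
```

With these assumed numbers the shared-derivation rate comes out far above the nominal 5%, while the independent-derivation rate stays close to it: the one-off derivation effect ends up in the group difference but not in the replicate spread.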

5.2 years ago by i.sudbery

GenoMax · Accepted Answer · 2018-01-08

6.9 years ago

shawn.w.foley

You're correct that technical replicates are not necessary for RNA-seq; however, biological replicates are always necessary for statistical analysis. Without replicates you cannot get any p-values.

A technical replicate in RNA-seq, in my experience, refers to taking one tube of RNA and then making multiple sequencing libraries from that tube. This is to control for errors in pipetting/PCR/anything technical during the library prep procedure. The kits available nowadays are so reproducible that this really isn't necessary. Biological replicates, however, are needed.

If I were designing the experiment, I would take the parental HepG2 cells, split them into three separate plates, and then, when the plates reach the appropriate confluence, treat them to generate the CSC-enriched models. Although there is an original single plate of cells, each of these biological replicates has undergone the CSC enrichment separately, so any differences should be representative of the CSC-enrichment process (or of biological differences).
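Whichever way the replicates are made (separate aliquots, or separate plates split from one parental culture), it helps to record which replicate set each library came from. A minimal sketch of a sample sheet for the 4 samples × 3 replicates in the question; the condition and replicate labels are hypothetical. Keeping the replicate column means a tool such as DESeq2 could, if the replicate sets were processed as matched batches, account for it with a design like `~ replicate + condition`.

```python
import csv
import io

# Hypothetical labels for the design in the question.
CONDITIONS = ["parental", "CSC_model1", "CSC_model2", "CSC_model3"]
REPLICATES = ["rep1", "rep2", "rep3"]  # e.g. separate aliquots or separate plates


def build_sample_sheet():
    """One row per library: every condition once in every replicate set."""
    return [
        {"sample": f"{condition}_{rep}", "condition": condition, "replicate": rep}
        for rep in REPLICATES
        for condition in CONDITIONS
    ]


def to_csv(rows):
    """Serialise the sample sheet as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["sample", "condition", "replicate"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(build_sample_sheet()))
```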

You COULD get away with doing duplicates (8 samples instead of 12) if sequencing 12 samples is absolutely unaffordable; some programs, such as DESeq2 and edgeR, can perform analyses with fewer replicates, but it really is not advisable. You can pool (multiplex) your 12 samples and run them on a single sequencing lane to save money. You'll have lower depth, but you can always resequence and pool the raw FASTQ files if necessary.
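Pooling the raw FASTQ files after a re-run usually just means concatenating each sample's files from the different runs. A minimal Python sketch; the file names are hypothetical, and it assumes gzip-compressed FASTQ, which can be concatenated directly because a sequence of gzip streams is itself a valid gzip file. For paired-end data, R1 and R2 must be pooled separately and in the same run order.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def pool_fastq_gz(run_files, out_path):
    """Append gzipped FASTQ files byte-for-byte into one multi-member gzip file."""
    with open(out_path, "wb") as out:
        for path in run_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(fastq_gz):
    """A FASTQ record is 4 lines, so reads = lines // 4."""
    with gzip.open(fastq_gz, "rt") as fh:
        return sum(1 for _ in fh) // 4


# Demo with two tiny hypothetical runs for one sample.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    run1, run2 = tmp / "sampleA_run1_R1.fastq.gz", tmp / "sampleA_run2_R1.fastq.gz"
    with gzip.open(run1, "wt") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")
    with gzip.open(run2, "wt") as fh:
        fh.write("@read2\nTTGA\n+\nIIII\n")
    merged = tmp / "sampleA_R1.fastq.gz"
    pool_fastq_gz([run1, run2], merged)
    print(count_reads(merged), "reads in pooled file")
```

On the command line, `cat run1.fastq.gz run2.fastq.gz > pooled.fastq.gz` does the same thing.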

I hope all of this is clear, please let me know if you have further questions.

6.9 years ago by shawn.w.foley

Thank you so much for your answer.

You can pool your 12 samples and run them down a single sequencing lane to save money, you'll have lower depth but you can always resequence and pool the raw fastq files if necessary.

Actually, I do not understand this statement. Does it mean pooling all the samples into one lane? Or pooling the replicates of each control and treatment group into one, so that I would have 4 samples to sequence?

updated 6.9 years ago by GenoMax • written 6.9 years ago by azayob

The sentence as written means you pool all samples together and run them in one lane. If one lane doesn't give you enough reads, you can run more lanes as needed and pool the data.

6.9 years ago by GenoMax

A single lane of sequencing provides enough capacity for more than one sample. The number of samples you can put on a single lane depends on the model of sequencer used and the generation of sequencing chemistry, but almost no one these days would use a whole lane of sequencing for a single sample.

Assuming that you are not going to run the sequencing machine yourself but are using a sequencing service, the service provider will usually handle the steps necessary to multiplex samples on a single lane. In fact, if they have quoted you per-sample prices, they have almost certainly taken some amount of multiplexing into account.

I would usually advocate that more replicates are better than more reads, but 12 samples on a single lane (assuming HiSeq 2500 rapid-run chemistry) is almost certainly pushing it too far. I wouldn't normally recommend more than 6, with 8 being a stretch, for the human transcriptome.

6.9 years ago by i.sudbery

12 samples per lane should still give you over 10M reads per sample. It's not great, but certainly usable, depending on how much you care about low-abundance transcripts.

6.9 years ago by igor

Even if 10 million reads per sample is enough, putting 12 samples on a lane will give you 10 million reads per sample on average, not a guaranteed 10 million in every sample. You are likely to end up with at least one sample with an unusably small number of reads and will have to discard it (which may or may not be a problem: some samples replicated three times is better than no samples replicated three times).

6.9 years ago by i.sudbery

You should be able to get 250M (or 300M if you are lucky) reads per lane. That gives you a little leeway. Yes, you may not quantify very well and may lose some samples, but maybe not. It's certainly not an ideal setup; I just meant it's not completely unreasonable.
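The arithmetic behind this exchange can be sketched as follows. Assuming the ~250M reads per lane quoted here, the average is simply lane reads divided by the number of samples (about 20.8M for 12 samples, 31.25M for 8, 41.7M for 6), but, as noted above, the average is not the minimum. The sketch models unequal library pooling with a log-normal spread; the 30% coefficient of variation and the function name are illustrative assumptions, not measured values.

```python
import math
import random


def simulate_reads_per_sample(lane_reads=250_000_000, n_samples=12, cv=0.3, seed=0):
    """Split one lane's reads among multiplexed samples with uneven pooling.

    Each sample's share is drawn from a log-normal whose coefficient of variation
    is roughly `cv` (an ASSUMED pooling imbalance), then the shares are rescaled
    so that they add up to the whole lane.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(math.log(1 + cv ** 2))  # log-normal sigma giving that CV
    weights = [rng.lognormvariate(-sigma ** 2 / 2, sigma) for _ in range(n_samples)]
    total = sum(weights)
    return [lane_reads * w / total for w in weights]


for n in (6, 8, 12):
    reads = simulate_reads_per_sample(n_samples=n)
    print(f"{n:>2} samples: mean {sum(reads) / n / 1e6:.1f}M, "
          f"lowest {min(reads) / 1e6:.1f}M")
```

The lowest-yield sample, not the mean, is what decides whether every replicate remains usable.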

6.9 years ago by igor