Please advise on an experimental set-up for RNA-Seq
6.7 years ago
User 4014 ▴ 40

Dear Biostars experts

I am working on a fungal tree disease and want to do RNA-Seq for one of my ongoing experiments. The goal is to see how different tree genotypes (tolerant and susceptible) respond to different fungal genotypes (virulent and avirulent) and vice versa. I am thinking about the following set-up, which will land me a total of 72 samples for RNA-Seq.

  • 4 tree genotypes with 2 individuals each
  • 3 time points (0, 3 and 7 days after infection)
  • 3 fungal genotypes

At the moment it is possible to sequence all samples at 30 million reads/sample, but I am curious whether that is sufficient to extract the crucial information, especially for the fungal infection. It is also possible to reduce the number of tree genotypes and sequence deeper (50 million reads/sample). May I have your opinion on which strategy sounds better, please? The tree genome is ca. 800 Mb and the fungal one is ca. 65 Mb.
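For reference, the arithmetic behind the two options can be sketched as below. This is only a rough sketch: it assumes that the total number of reads stays fixed when going to 50 million reads/sample, which the post does not state, and the variable names are made up for illustration.

```python
# Design from the post: every combination is sequenced as its own sample.
tree_genotypes = 4
individuals_per_genotype = 2
time_points = 3
fungal_genotypes = 3

samples = tree_genotypes * individuals_per_genotype * time_points * fungal_genotypes
total_reads_m = samples * 30  # total reads in millions at 30 M reads/sample

# Assumption (not from the post): the total read budget stays the same
# when each sample is sequenced to 50 M reads.
samples_at_50m = total_reads_m // 50
samples_per_tree_genotype = individuals_per_genotype * time_points * fungal_genotypes
tree_genotypes_at_50m = samples_at_50m // samples_per_tree_genotype

print("samples at 30 M reads:", samples)
print("total reads (millions):", total_reads_m)
print("samples affordable at 50 M reads:", samples_at_50m)
print("complete tree genotypes at 50 M reads:", tree_genotypes_at_50m)
```

Under that assumption, the deeper option leaves room for roughly half of the tree genotypes.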

Thank you very much in advance and Happy Easter!

RNA-Seq next-gen • 2.3k views

What is the status of the genomes you are interested in? Reasonable/Better/Best in terms of quality/completion?


Hi genomax, Thanks for your quick reply. The tree genome is just a draft but the fungal one is quite okay.


But is the draft reasonably complete? Are you able to judge the size of the expected transcriptome? Are there ploidy issues that may need to be taken into account?


Sorry, I misunderstood your question. For the tree, its reference genome has 165x coverage. I did not have a chance to dive into it yet since I normally work on its fungal counterpart. For ploidy, I understand that it is hexaploid; so yes, there could be a ploidy issue.


which will land me a total of 72 samples for RNA-Seq.

From 3*3*4 = 36 and

4 tree genotypes with 2 individuals each

I understand you want to perform this huge experiment with only 2 biological replicates per treatment. Don't. Please increase the number of biological replicates. If you had a simple design with two treatments, I would say the bare minimum is 3 biological replicates (and I would hesitate, because I think 3 is already too few). For such a complex design, I would suggest 5 or 6 biological replicates as a minimum.

Sequencing depth is more critical for transcript-level expression analysis; if you will perform gene-level analysis, 20 million reads should already be sufficient. Increasing the number of biological replicates improves both analyses.
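To make the counting explicit, here is a small sketch that enumerates the treatment combinations and shows how the total number of samples grows with the number of biological replicates. The genotype labels are placeholders invented for the example; only the numbers of levels come from the original post.

```python
from itertools import product

# Placeholder labels; only the number of levels is taken from the post.
tree_genotypes = ["T1", "T2", "T3", "T4"]
fungal_genotypes = ["F1", "F2", "F3"]
time_points = [0, 3, 7]  # days after infection

# One "cell" is one tree genotype x fungal genotype x time point combination.
cells = list(product(tree_genotypes, fungal_genotypes, time_points))


def total_samples(replicates):
    """Number of libraries if every cell gets `replicates` biological replicates."""
    return len(cells) * replicates


print("treatment combinations:", len(cells))
for n in (2, 3, 5, 6):
    print(f"{n} replicates per combination -> {total_samples(n)} samples")
```

This is why cutting the number of combinations is usually the only realistic way to afford more replicates per combination.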


Thanks for your reply, h.mon. Sorry, I am quite new to RNA-Seq. Please correct me if I am wrong: I understand that each biological replicate should be sequenced individually, which is why I ended up with 72 samples?


Yes, each biological replicate should be sequenced individually. From your description of the experimental design, I understood you had two biological replicates per treatment, so 72 samples total.

I think you should have fewer treatments and more biological replicates per treatment. In fact, I agree with kpr: design an experiment with one comparison only. If you really want to keep your complex design, do it properly and really increase the number of biological replicates - and be prepared to sequence loads of data and spend loads of money.

Here are two manuscripts that deal with sample sizes for RNA-seq:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5817962/

Takeaway message from both: increase the number of biological replicates.


Thanks for your suggestions. I will increase the replication to three. However, I was wondering about using each individual for both treatment and control, since I expect each individual to have at least 5 twigs. I could then use only 6 individuals from each tree genotype for each fungal genotype (i.e., 3 individuals each at 0 and 5 days post inoculation, and for each individual, two twigs each will be inoculated with either a fungal genotype or the control). Do you think that is okay? I also posted the same question to kpr, but I would be interested in hearing your opinion as well.

Thanks again and happy Easter!


Since you are doing a time-series experiment, ideally you will need fungus-free and tree-free controls at each time point (or at least at the last time point) to be able to distinguish the effect of time from the effect of the inter-species interaction.


I'm a statistician who works with biological data. My advice is to simplify your experiment further. Do it with the appropriate number of replicates (absolute minimum 3, but I always recommend more), and focus on one thing at a time. Maybe keep the tree genotype constant, change only the fungal genotypes, and look at the results after x days (only one time point). Use biological reasoning to make these decisions.


As a rule of thumb the very minimal number of samples per group is 3, but if you can afford it it's better to go for 5. In a perfect world you would do 10 per group but that's rarely affordable.


Hi WouterDeCoster, Thanks for your reply. I wish I could, but in reality I am not sure I can afford even 3 individuals/genotype of tree/treatment. They are very difficult to prepare, and the trees I have at the moment were prepared 2 years ago. Do you see any problems if only 2 individuals/genotype are used?


There are a few problems with having just two replicates.

(1) Replication is a cornerstone of most statistical inference procedures. The purpose of replication is to estimate the variability within each group. A change in expression is only significant if the difference between the two groups is large compared to the variability within a group. We estimate the within-group variability from the samples, and the more samples the better. (We can't calculate variability with just one sample, and the variability of two just measures how different two individuals are from each other.)

(2) Another reason for having more samples is the risk of a failed sample at some point in the experiment, which is why most people say we need at least 4 or 5.

(3) In addition, in some experiments the level of expression varies greatly between individuals within the same group. In this case, 4 or 5 may still be insufficient. To truly capture the variability you'll need several more samples. Having only a few samples would likely provide misleading information.

(4) On a final note about sample sizes, you will also need to consider statistical power. The power of your tests (the ability to detect a real difference when there is one) increases as the sample size increases.

Check out the book "RNA-seq Data Analysis: A Practical Approach", it has a fair amount of information for just getting started. :)
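The point about power can be illustrated with a toy simulation. This is a sketch under strong assumptions that are not from the thread: one gene, normally distributed log-expression with the same spread in both groups, a true difference of 1.5 standard deviations, and a plain two-sample t-test at the 5% level. Real RNA-seq tools model counts differently, so treat the numbers as illustrative only. The function name and parameters are made up for the example.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for df = 2n - 2
# (standard table values for df = 2, 4, 8 and 18).
T_CRIT = {2: 4.303, 3: 2.776, 5: 2.306, 10: 2.101}


def simulated_power(n, effect=1.5, sims=2000, seed=1):
    """Fraction of simulated experiments in which a pooled two-sample
    t-test with n replicates per group detects the (true) effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]      # control group
        b = [rng.gauss(effect, 1.0) for _ in range(n)]   # treated group
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        se = math.sqrt(pooled_var * 2 / n)
        t = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(t) > T_CRIT[n]:
            hits += 1
    return hits / sims


power = {n: simulated_power(n) for n in T_CRIT}
for n, p in power.items():
    print(f"n = {n:2d} per group: estimated power {p:.2f}")
```

With the same true effect in every run, the only thing that changes between the rows is the number of replicates per group.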


Thanks a lot! I will read the book, but it will take some time before I understand it by heart. In the meantime, I hope that you do not mind me asking more questions.

I thought about increasing the number of replicates to three per treatment and reducing the time points to two (0 and 5 days post inoculation). However, I couldn't stop wondering about using each individual for both treatment and control, since I expect each individual to have at least 5 twigs. I could then use only 6 individuals from each tree genotype for each fungal genotype (i.e., 3 individuals each at 0 and 5 days post inoculation, and for each individual, two twigs each will be inoculated with either a fungal genotype or the control). Do you think that is okay?

Thanks again and happy Easter!


You are describing a matched-pairs design (specifically the type where study participants are measured twice, under two different conditions). There are definitely statistical tests we can do with a matched-pairs design, but I haven't seen it used for RNA-seq (I'm new to bioinformatics, so someone please correct me if I'm wrong). Most analysis methods that I have seen use the negative binomial model, which doesn't account for matched pairs, and just ignoring matched pairs when you have them normally leads to biased results. All that being said, I think I know a few statistical tests that would be appropriate :).

One other thing that might be problematic though, with this matched-pairs design it looks like you will do RNA-seq on the same tree at the same time, just different twigs. I may be wrong, but wouldn't we expect the RNA to be the same/similar for the whole organism?

Last comment. Again, my biology knowledge is limited. If you have a tree with no inoculation, and it continues to have no inoculation, I would think we wouldn't expect much to change. There is another type of matched-pairs experiment where we take repeated samples of the same subject/individual. We use the same organisms and compare the difference before and after treatment (a matched-pairs longitudinal study). This may be a good approach.
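As a toy illustration of why the pairing matters, the sketch below compares a paired t statistic with an unpaired (Welch) one on the same numbers. The values are made up for one hypothetical gene on a log2 scale, with three trees that differ a lot from each other but respond similarly to inoculation. Real RNA-seq tools work on counts and use other models; this only shows the general idea.

```python
import math
import statistics

# Made-up log2 expression of ONE gene in three trees (illustrative only).
control = [5.0, 7.0, 9.0]  # mock-inoculated twig of each tree
treated = [6.0, 8.1, 9.9]  # fungus-inoculated twig of the same tree

# Paired: work on within-tree differences, so tree-to-tree variation cancels.
diffs = [t - c for t, c in zip(treated, control)]
paired_t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Unpaired (Welch): treats the six twigs as if they were independent samples.
se = math.sqrt(
    statistics.variance(control) / len(control)
    + statistics.variance(treated) / len(treated)
)
unpaired_t = (statistics.mean(treated) - statistics.mean(control)) / se

print(f"paired t = {paired_t:.1f}, unpaired t = {unpaired_t:.2f}")
```

In this made-up example the mean difference is the same in both calculations; only the noise it is compared against changes.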


If you have a tree with no inoculation, and it continues to have no inoculation, I would think we wouldn't expect much to change.

Not necessarily. There could be normal changes in the tree unrelated to the infection. You would want to account for those during analysis in treated samples.

I may be wrong, but wouldn't we expect the RNA to be the same/similar for the whole organism?

No. That is why people are doing single-cell RNA seq.


I may be wrong, but wouldn't we expect the RNA to be the same/similar for the whole organism?

Even without single-cell RNA seq: expression varies a lot between tissues.

6.7 years ago
taoi2 ▴ 40

I think kpr's comment is a good suggestion. If you think the time-course aspect is important, I recommend sampling at a constant time interval: "0, 3 and 6 days after infection" instead of "0, 3 and 7 days after infection". In the field of pharmacological statistics, theoretical models for time-course experiments are well established. I recommend consulting statisticians in this field.
