Question

RNA-seq analysis of virally infected cells

0

6.2 years ago

Adrian Pelin ★ 2.6k

I have a human cancer cell line infected with 2 different vaccinia pox viruses (~200 kb dsDNA genome, 200 genes).

At 12 hours post infection I harvested my RNA and sent it for RNA-seq. It looks like 40% of reads are viral (60% host) for strain_A, while 60% of reads are viral (40% host) for strain_B.

This complicates my analysis, as I want to answer 2 different questions: 1) How are viral genes differentially expressed between strain_A and strain_B? 2) How does the host (human cancer cell) respond to either strain?

So far, to answer question #1 I mapped to viral genome only and to answer question #2 I mapped to human genome only.

It has been suggested that I map to both the host and virus genomes at the same time. This would lead to a different level of gene normalization.

What are your thoughts?

RNA-Seq virus host • 2.4k views

ADD COMMENT • link updated 6.0 years ago by Biostar 20 • written 6.2 years ago by Adrian Pelin ★ 2.6k

0

Just curious, but do you have multiple replicates for each strain?

ADD REPLY • link 6.2 years ago by spvensko ▴ 240

1

Duplicates, and multiple time points, but 12h is a more crucial one for us.

ADD REPLY • link 6.2 years ago by Adrian Pelin ★ 2.6k

score 4 · Accepted Answer · 2018-08-28

4

6.2 years ago

Carlo Yague 8.9k

Yes, it is a good idea to map to both the host and virus genomes at the same time. It is more efficient, and reads that can map to both the viral and human genomes will be identified and dealt with appropriately, for instance by setting the mapping quality to 0, which is usually interpreted as an "ambiguous" mapping.
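To make the mapping-quality point concrete, here is a minimal sketch of how reads from a combined host + virus alignment could be split by species while setting aside ambiguous ones. It assumes a plain-text SAM input, that viral contigs were renamed with a hypothetical `virus_` prefix when the combined reference was built, and that a MAPQ below 1 marks an ambiguous read. The MAPQ scale is aligner-dependent, so check your aligner's convention before choosing the threshold.

```python
from collections import Counter

def count_reads_by_species(sam_lines, viral_prefix="virus_", min_mapq=1):
    """Count primary alignments per species from SAM text lines.

    viral_prefix and min_mapq are illustrative assumptions, not fixed
    conventions: adapt them to your reference naming and aligner.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # SAM column 2: bitwise FLAG
        rname = fields[2]       # SAM column 3: reference name
        mapq = int(fields[4])   # SAM column 5: mapping quality
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4:
            counts["unmapped"] += 1
        elif mapq < min_mapq:
            counts["ambiguous"] += 1  # e.g. read fits host and virus equally well
        elif rname.startswith(viral_prefix):
            counts["virus"] += 1
        else:
            counts["host"] += 1
    return counts

# Toy SAM records (made up for illustration)
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tvirus_VACV\t200\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t300\t0\t50M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t256\tchr1\t400\t0\t50M\t*\t0\t0\t*\t*",
]
species_counts = count_reads_by_species(toy_sam)
print(dict(species_counts))
```

In practice you would more likely filter the BAM with your usual toolkit; the point is only that a single combined alignment lets you assign each read to one species or flag it as ambiguous in one pass.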

But mapping to both genomes doesn't mean that you have to take all genes into account when normalizing. Regardless of how you map, you have at least three options to normalize the expression of the viral genes. Choosing one option over another will depend on what you can safely assume:

1. Normalize the viral genes on the viral genes: To do that, you have to assume that there is no global change in viral gene expression between your conditions. This might be possible if the two strains are not too different in the context of infection.

2. Normalize the viral genes on the human genes: You have to assume that the viral load is the same in the two conditions (which might have been experimentally controlled) and that there is no global change in human gene expression between the two conditions.

3. Normalize the viral genes on both human and viral genes: You have to assume all of the above, or that any change in global gene expression in one species is balanced in the other species. I think that this is harder to assume so I would not recommend it.

(4. Normalize on a spike-in): You don't have to assume anything (except that the spike-in was properly mixed with the samples). But if your data is already generated and is "spike-in-less", then you have to choose another option.
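The options above differ only in which genes are used to compute the per-sample scaling factor. A minimal sketch, using simple total-count scaling over a chosen reference gene set (not the more robust estimators that DESeq2 or edgeR use) and made-up counts with hypothetical gene names (`HOST_*`, `VIR_*`, `SPIKE_1`):

```python
def size_factors(counts, reference_genes):
    """Total-count size factors computed on reference_genes only.

    counts: {sample: {gene: raw_count}}. Factors are scaled to average 1.
    """
    totals = {s: sum(genes[g] for g in reference_genes) for s, genes in counts.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {s: t / mean_total for s, t in totals.items()}

def normalize(counts, factors):
    """Divide every raw count by its sample's size factor."""
    return {s: {g: c / factors[s] for g, c in genes.items()} for s, genes in counts.items()}

# Toy counts (invented): strain_A is 40% viral, strain_B is 60% viral
counts = {
    "strain_A": {"HOST_1": 300, "HOST_2": 300, "VIR_1": 200, "VIR_2": 200, "SPIKE_1": 100},
    "strain_B": {"HOST_1": 200, "HOST_2": 200, "VIR_1": 300, "VIR_2": 300, "SPIKE_1": 100},
}
host_genes = ["HOST_1", "HOST_2"]
viral_genes = ["VIR_1", "VIR_2"]

sf_viral = size_factors(counts, viral_genes)               # option 1
sf_host = size_factors(counts, host_genes)                 # option 2
sf_both = size_factors(counts, host_genes + viral_genes)   # option 3
sf_spike = size_factors(counts, ["SPIKE_1"])               # option 4

viral_on_host = normalize(counts, sf_host)
for name, sf in [("viral", sf_viral), ("host", sf_host), ("both", sf_both), ("spike", sf_spike)]:
    print(name, sf)
```

With these toy numbers the same raw data yield opposite size factors under options 1 and 2, which is why the choice has to follow from what you can assume about the experiment. If you work in DESeq2, `estimateSizeFactors` has a `controlGenes` argument that serves the same purpose of restricting the genes used for size-factor estimation.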

ADD COMMENT • link 6.2 years ago by Carlo Yague 8.9k

0

What do you mean by global change? Do you mean that overall there is more viral gene expression in one condition than in another, for instance more viral reads?

ADD REPLY • link 6.2 years ago by Adrian Pelin ★ 2.6k

0

Yes. For instance, imagine that one of the viral strains expresses all its genes at a much higher level than the other strain (which could result in higher virulence). If you normalize on the viral gene expression, you will not be able to see that global increase. On the contrary, if only a few genes are differentially expressed, then you will be able to see it with that kind of normalization.
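A toy calculation may help here. The sketch below uses invented counts for ten hypothetical viral genes (`VIR_0` to `VIR_9`) and simple total-count scaling on the viral genes only. It compares a strain that doubles every viral gene with a strain that raises a single gene four-fold.

```python
def viral_fold_changes(strain_a, strain_b):
    """Fold change (B over A) per gene after scaling each sample
    by its own total viral count (normalization on viral genes)."""
    total_a = sum(strain_a.values())
    total_b = sum(strain_b.values())
    return {g: (strain_b[g] / total_b) / (strain_a[g] / total_a) for g in strain_a}

# Invented counts: ten viral genes, 100 reads each in the reference strain
baseline = {f"VIR_{i}": 100 for i in range(10)}

# Case 1: the other strain expresses every viral gene twice as much
global_up = {g: 2 * c for g, c in baseline.items()}

# Case 2: the other strain only raises one gene four-fold
one_gene_up = dict(baseline, VIR_0=400)

fc_global = viral_fold_changes(baseline, global_up)
fc_single = viral_fold_changes(baseline, one_gene_up)

print("global doubling:", {g: round(v, 2) for g, v in fc_global.items()})
print("single gene up: ", {g: round(v, 2) for g, v in fc_single.items()})
```

In the first case every normalized fold change comes out as 1, so the doubling is invisible. In the second case the changed gene still stands out (about 3-fold instead of 4-fold with this crude scaling), while the unchanged genes appear slightly decreased.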

Note that this is also a recurrent problem in cancer research: most cancers result in global misregulation of the transcriptome, with cancer cells often containing 2-3 times more RNA than healthy cells (1). Hence the need for spike-in normalization.

(1) Revisiting global gene expression analysis

ADD REPLY • link 6.2 years ago by Carlo Yague 8.9k

0

I do not mean to advertise products here, but when you say spike-in normalization, do you mean something like this: https://www.thermofisher.com/order/catalog/product/4456739 ?

Are there better, maybe cheaper options out there?

ADD REPLY • link 6.2 years ago by Adrian Pelin ★ 2.6k

1

This is certainly one option. I remember Devon Ryan discussing some of the issues with the ERCC spike-ins in this post. Other options are the sequin spike-ins or, more generally, adding a fixed amount of foreign RNA to the samples.

ADD REPLY • link 6.2 years ago by Carlo Yague 8.9k