RNA-seq to look at the Effect of Viral Infection
5 weeks ago
Nicholas • 0

Hello,

I am doing an RNA-seq analysis of virally infected cells. My sample table looks something like this

Sample  Dose  Hours
1       0     24
2       0.1   24
3       5     24
1       0     48
2       0.1   48
3       5     48

My question is about the effect of the virus on the analysis itself. In the dose 0 and dose 0.1 samples there are few to no viral transcripts, so almost all of the sequencing depth went to human genes. Whereas in the dose 5 samples, 50% or more of the reads map to viral genes. I am not interested in the viral genes at all, merely the effect of the virus on the host cell.

I used STAR to align to the human genome, so most of the transcripts that map should be human, except possibly for some which are similar.

But because half of the transcripts in the dose 5 samples are viral, I'm worried that the difference in effective sequencing depth on the human genome will create artificial differential expression, even after normalizing with the estimated size factors. On a PCA plot, the samples with more viral transcripts clearly separate out (depending on the viral species, the number of viral transcripts is sometimes independent of the initial viral dose).

When I do a basic contrast between dose 5 and dose 0, I get almost 30k genes as DE. That would be most protein-coding genes, or about half of all genes.

How can I be sure the difference is a biological one and not an artifact?

virus DESeq2 differential-gene-expression • 558 views

How many reads do the samples have in the end, after all filtering? Depth becomes a problem when counts are so low that a group has many zeros simply because of insufficient depth. In that case you would need to resequence. You can color your PCA by depth to see whether depth is a major driver of variation as well.
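
As a quick way to check depth against zeros, here is a minimal sketch in plain Python that tabulates the total counts and the number of zero-count genes per sample. The count table, gene names, and sample names are hypothetical toy values, not data from this thread; in practice you would export the raw count matrix from your own pipeline.

```python
# Hypothetical toy count matrix: one row per gene, one column per sample.
counts = {
    "geneA": [120, 60, 10],
    "geneB": [30, 14, 0],
    "geneC": [8, 3, 0],
    "geneD": [0, 0, 0],
}
samples = ["dose0", "dose0.1", "dose5"]  # assumed sample labels


def depth_and_zeros(counts, samples):
    """Return {sample: (total_counts, number_of_zero_count_genes)}."""
    summary = {}
    for j, name in enumerate(samples):
        column = [row[j] for row in counts.values()]
        total = sum(column)
        zeros = sum(1 for c in column if c == 0)
        summary[name] = (total, zeros)
    return summary


summary = depth_and_zeros(counts, samples)
for name, (total, zeros) in summary.items():
    print(f"{name}: depth={total}, zero-count genes={zeros}")
```

If the zero counts rise as depth falls, low depth is a plausible explanation for at least part of the separation.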


Thanks for getting back to me, ATpoint. My apologies for not responding earlier. I took some vacation and then had another project.

Anyway, the sample with the fewest uniquely mapped reads has about 9 million, and the one with the most has about 52 million. The PCA idea makes sense, but the depth correlates very strongly with the number of viral reads, so it's hard to tell whether the driver is depth or legitimate viral activity.

It's unclear to me how DESeq2 estimates size factors. Does it look for particular genes and normalize based on them?

When I examine the dds object, the number of genes with zero counts goes up with the size factor, which is the opposite of what I expected. Samples that had almost all human reads and a much higher size factor have far more genes with zero counts. Samples with fewer human reads had far fewer zeros. That seems very strange to me. But it does account for the differences we see in expression.


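DESeq2 uses the median-of-ratios method. For each gene it takes the geometric mean across all samples, divides each sample's count by that mean, and then uses the median of those ratios within a sample as that sample's size factor. In effect it finds the gene whose expression is in the "middle", assumes that gene is constant across all samples, and normalizes to make it constant. That assumption may not be valid in your samples.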
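
To make the size factor calculation concrete, here is a minimal sketch of a median-of-ratios computation in plain Python. It illustrates the idea only and is not DESeq2's actual code; the function name `size_factors` and the toy counts are made up for the example. Genes with a zero in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of gene rows, each a list of raw counts per sample.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Only genes with a positive count in every sample have a usable geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy human-only counts (hypothetical): the second sample has half the human depth.
toy_counts = [
    [100, 50],
    [200, 100],
    [40, 20],
    [10, 0],   # skipped: contains a zero
]
factors = size_factors(toy_counts)
print(factors)
```

In this toy case every usable gene is halved in the second sample, so the size factors differ by a factor of two and the depth difference is fully absorbed. The trouble starts when the infection shifts a large share of host genes in one direction, because then the median ratio no longer reflects depth alone. DESeq2's `estimateSizeFactors` also accepts a `controlGenes` argument if you have a set of genes you trust to be stable.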

17 days ago
Asaf 10k

I would initially map each sample to the viral genome, remove the reads that map to the virus and map the rest against the human genome. Also, how did you normalize the reads?
