Time series RNA-Seq with Time-Matched Controls
8.1 years ago
jmsyl.hong • 0

Hey guys,

I'm new to doing bioinformatics so bear with me.

My experimental design is as follows:

  • 4 timepoints (3d, 7d, 14d, 56d)
  • 2 conditions, each with time matched controls
  • For every condition, there are 5 replicates per time point and 3 replicates for each time-matched control

That is, condition 1 at 3d has n = 5, the control for condition 1 at 3d has n = 3, and so on.
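
The design above can be written out as a sample sheet, which is the table most DE packages expect alongside the count matrix. This is only a minimal sketch: the labels `cond1`, `cond2`, `treated`, and `control`, and the sample naming scheme, are assumptions made for illustration and are not from the original post.

```python
from itertools import product

TIMEPOINTS = ["3d", "7d", "14d", "56d"]
CONDITIONS = ["cond1", "cond2"]             # hypothetical labels
REPLICATES = {"treated": 5, "control": 3}   # replicates per condition and time point


def build_sample_sheet():
    """Return one dict per sequenced library, in a fixed order."""
    rows = []
    for condition, time, (group, n) in product(
        CONDITIONS, TIMEPOINTS, REPLICATES.items()
    ):
        for rep in range(1, n + 1):
            rows.append(
                {
                    # hypothetical naming scheme
                    "sample": f"{condition}_{group}_{time}_rep{rep}",
                    "condition": condition,
                    "group": group,
                    "time": time,
                }
            )
    return rows


samples = build_sample_sheet()
print(len(samples))  # 2 conditions x 4 time points x (5 + 3) = 64 libraries
```

Writing the table out like this makes it easy to check that every library is assigned to exactly one condition, group, and time point before any modelling starts.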

The core facility provided the complete Cufflinks output and a merged.gtf file produced by Cuffmerge.

How do I:

  1. Find DE genes between every time point of each condition and its time-matched control? Do I run edgeR separately at every time point, or is DESeq or Cuffdiff preferred?
  2. Find DE genes between the conditions (should I just filter for the genes that are DE relative to controls, as found in #1?)
  3. Find the biological pathways (GO/KEGG) that are distinct between the two conditions?
  4. Visualize these distinct networks (e.g. put them into figures)? With Cytoscape?

Package names would help a lot, I have some proficiency in R and Java.

Thanks so much in advance!

RNA-Seq • 3.4k views
8.1 years ago
Ron ★ 1.2k

1. Which package to use is really up to you. You can use either DESeq or edgeR. Check out this post on differential expression analysis: Rnaseq Differential Expression

2. No. Take all the timepoints of one condition versus all timepoints of the other condition, and then run the differential expression analysis again.

3. For pathway analysis, GSEA from the Broad Institute is one of the most widely used methods: http://software.broadinstitute.org/gsea/msigdb/ You can choose different signatures to test for enrichment in the comparison.

4. Several tools are available, such as Circos and Cytoscape. See also: Gene Network Construction... Web Based Tool

Thanks so much, Circos looks amazing. I played with Cytoscape a bit but found it really difficult to reproduce what you see in publication figures (they look amazing, but the default layouts in Cytoscape aren't that great). Also, is there a reason not to filter in 2? I assume I would be taking the output of 2 into steps 3 and 4 (i.e. what is the point of 1?)

Thanks again!

The differentially expressed genes depend on the comparisons you make. In 1), the reported genes are those that change at individual time points, whereas in 2) the reported genes are those that change between conditions irrespective of time point.
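
To see why filtering on the output of 1) before doing 2) can lose genes, it can help to treat each comparison as a set of gene IDs. The following is a toy sketch only: the gene IDs are hypothetical placeholders, and real lists would come from the DE results of each comparison.

```python
# Hypothetical gene IDs, for illustration only.
de_vs_control = {
    "cond1": {"geneA", "geneB", "geneC"},   # DE vs time-matched control, condition 1
    "cond2": {"geneB", "geneC", "geneD"},   # DE vs time-matched control, condition 2
}
de_between_conditions = {"geneA", "geneD", "geneE"}  # direct condition-vs-condition test

# Genes that respond in both conditions, or in only one of them.
shared = de_vs_control["cond1"] & de_vs_control["cond2"]
only_cond1 = de_vs_control["cond1"] - de_vs_control["cond2"]
only_cond2 = de_vs_control["cond2"] - de_vs_control["cond1"]

# Genes found by the direct comparison that a pre-filter on 1) would discard.
missed_by_filtering = de_between_conditions - (
    de_vs_control["cond1"] | de_vs_control["cond2"]
)

print(sorted(shared))               # ['geneB', 'geneC']
print(sorted(only_cond1))           # ['geneA']
print(sorted(only_cond2))           # ['geneD']
print(sorted(missed_by_filtering))  # ['geneE']
```

In this toy example `geneE` differs between the conditions without passing the versus-control test in either one, so it would be dropped by the filter.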

8.1 years ago

1) Find DE genes between every time point of each condition and its time-matched control? Do I run edgeR separately at every time point, or is DESeq or Cuffdiff preferred?

There are optimized workflows for time course experiments; a web search will turn up quite a few. Example: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments

2) Find DE genes between the conditions (should I just filter for the genes DE to controls as found in #1?)

You can test conditions by taking all timepoints for condition 1 versus all timepoints for condition 2, but specify the timepoint groups in the model for the differential expression analysis.
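
One way to read "specify the timepoint groups in the model" is an additive design with time as a covariate and condition as the factor of interest (in R formula notation, something like `~ time + condition`). The sketch below shows what the treatment-coded design matrix rows for such a model would look like. It is an illustration under assumptions: the level names are hypothetical, and in practice the DE package builds this matrix from the formula.

```python
TIME_LEVELS = ["3d", "7d", "14d", "56d"]   # first level is the reference
CONDITION_LEVELS = ["cond1", "cond2"]      # hypothetical labels; first is the reference

# Column names: intercept, one dummy per non-reference level of each factor.
COLUMNS = (
    ["intercept"]
    + [f"time_{t}" for t in TIME_LEVELS[1:]]
    + [f"condition_{c}" for c in CONDITION_LEVELS[1:]]
)


def design_row(time, condition):
    """Return the treatment-coded design row for one sample."""
    row = [1]  # intercept
    row += [int(time == level) for level in TIME_LEVELS[1:]]
    row += [int(condition == level) for level in CONDITION_LEVELS[1:]]
    return row


print(COLUMNS)
print(design_row("3d", "cond1"))   # [1, 0, 0, 0, 0]
print(design_row("14d", "cond2"))  # [1, 0, 1, 0, 1]
```

The coefficient on the last column is then the condition effect adjusted for time point. An additive model like this assumes the condition effect is the same at every time point; testing whether it changes over time would need an interaction term.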

3) Find the biological pathways (GO/KEGG) that are distinct between the two conditions?

You can use GSEA, or tools like Enrichr, to analyze overrepresented pathways.
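
For intuition, over-representation of a pathway in a DE gene list is commonly scored with a one-sided hypergeometric (Fisher-type) test. The sketch below is a generic version of that test, not the exact procedure of any particular tool, and GSEA itself uses a different, rank-based statistic. The function name and the example numbers are made up for illustration.

```python
from math import comb


def enrichment_pvalue(universe, pathway, de_total, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of genes tested overall
    pathway:  number of those genes annotated to the pathway
    de_total: number of DE genes
    overlap:  number of DE genes that are in the pathway
    """
    upper = min(pathway, de_total)
    numerator = sum(
        comb(pathway, k) * comb(universe - pathway, de_total - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(universe, de_total)


# Made-up numbers: 20000 genes tested, 100 in the pathway,
# 500 DE genes, 10 of which fall in the pathway.
p = enrichment_pvalue(universe=20000, pathway=100, de_total=500, overlap=10)
print(p)
```

When many pathways are tested, the resulting p-values still need multiple-testing correction.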

4) How do I then visualize these distinct networks (e.g. put them into figures), cytoscape?

Enrichr provides some basic networks. Cytoscape is another good option, although it probably takes some experience to get nice figures. You can also find some easy but informative visualizations (and code) in this post from Getting Genetics Done.

Thanks!

Would you still use DESeq2 for 2)?

I would use DESeq2, edgeR and/or limma-voom. The results and statistics are quite similar.

One more question: if I'm using the original .bam files, do I use the merged.gtf (from Cuffmerge) to get the raw counts, or the up-to-date rat genome GTF file from Ensembl (what are the differences)? And would I need to normalize or filter the raw counts before inputting them into DESeq2?

Interesting suggestion, and out of curiosity I would try both of the gtfs. I have no experience with cuffmerge but can't think of anything problematic right now with using the obtained gtf.

You absolutely shouldn't normalize your data prior to DESeq2, as it expects raw counts. Low-count genes are filtered out by default (independent filtering in results()), so that's also something you don't have to worry about.
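
The reason raw counts are expected is that DESeq2 estimates per-sample size factors itself, using a median-of-ratios approach. The sketch below illustrates that idea on a toy matrix. It is a simplified illustration, not DESeq2's actual code: the function name and the counts are made up, and it assumes at least one gene has nonzero counts in every sample.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    # Only genes with nonzero counts in all samples have a usable geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(log(c) for c in row) / len(row) for row in usable]

    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
raw = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],   # skipped: contains a zero
]
factors = size_factors(raw)
print(factors[1] / factors[0])  # close to 2: the depth difference is recovered
```

If the counts were already normalized for depth, the estimated factors would all be close to 1 and the count-based variance model would no longer match the data.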
