Question

In need of guidance: RNA-seq analysis

4.9 years ago

jonathanU ▴ 40

Background: I have been learning how to analyze RNA-seq data for my research lab; our model organism is Arabidopsis thaliana. I have ~2 years of experience working with R and feel comfortable learning new functions and packages. I can comfortably read and edit scripts written by others, but am still practicing writing my own. The RNA-seq steps I have completed so far were performed either in R or with a user-friendly online platform. I have only basic knowledge of Python.

I would like advice on how to proceed with my RNA-seq analysis. I have specific ideas in mind, but would appreciate receiving some direction along the way. Additionally, if something I describe is pure nonsense, I would appreciate someone correcting me.

Experimental design: four groups (genotypes), each with and without treatment, with three biological replicates per condition (twenty-four samples total, if my math is correct).
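The math checks out: 4 genotypes × 2 treatment states × 3 replicates = 24 samples. sleuth reads the design from a sample-to-covariates table that needs a `sample` column and a `path` column pointing at each kallisto output directory. A minimal Python sketch that generates such a table is below; the genotype names and the `kallisto/<sample>` directory layout are hypothetical placeholders.

```python
import csv
import io
from itertools import product

# Hypothetical labels: replace with your own genotypes and conditions.
GENOTYPES = ["Col0", "mutA", "mutB", "mutC"]
TREATMENTS = ["control", "treated"]
REPLICATES = [1, 2, 3]


def build_sample_sheet():
    """Return one row per sample for the 4 x 2 x 3 factorial design."""
    rows = []
    for genotype, treatment, rep in product(GENOTYPES, TREATMENTS, REPLICATES):
        sample = f"{genotype}_{treatment}_rep{rep}"
        rows.append({
            "sample": sample,
            "genotype": genotype,
            "treatment": treatment,
            # Assumed layout: one kallisto output directory per sample.
            "path": f"kallisto/{sample}",
        })
    return rows


def to_tsv(rows):
    """Serialize the rows as a tab-separated table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_sample_sheet()
print(len(rows))  # 24
print(to_tsv(rows).splitlines()[1])
```

The resulting file can be read into R and passed to sleuth as the sample-to-covariates table.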

Steps I have completed:

I have completed the initial steps: pseudo-aligning the reads to the transcriptome and quantifying transcripts, both with Kallisto.
I have also performed differential expression analysis using the Sleuth package in R.
For visualization of the results, I am using Integrative Genomics Viewer (IGV).
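Kallisto quantifies transcripts, while GO and AraCyc annotations are usually keyed by gene (AGI locus) IDs, so transcript-level values eventually need to be summarized per gene. A minimal Python sketch that sums TPM from kallisto's `abundance.tsv` per gene, assuming TAIR/Araport-style transcript IDs (e.g. `AT1G01020.2`) where the gene ID is everything before the last `.`. The input values are made up, and this is only for gene-level lookups or quick summaries, not a replacement for the differential expression model.

```python
import csv
import io
from collections import defaultdict


def gene_level_tpm(abundance_tsv: str) -> dict:
    """Sum transcript TPMs from a kallisto abundance.tsv into gene-level TPMs.

    Assumes TAIR/Araport-style transcript IDs such as AT1G01010.1, where the
    gene ID is the transcript ID with the trailing isoform suffix removed.
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    for row in reader:
        gene_id = row["target_id"].rsplit(".", 1)[0]
        totals[gene_id] += float(row["tpm"])
    return dict(totals)


# Toy input in kallisto's abundance.tsv column layout (values are made up).
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "AT1G01010.1\t1688\t1500\t100\t10.5\n"
    "AT1G01020.1\t1623\t1450\t50\t4.0\n"
    "AT1G01020.2\t1085\t900\t20\t1.5\n"
)
print(gene_level_tpm(example))  # {'AT1G01010': 10.5, 'AT1G01020': 5.5}
```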

Now that I have information about differential expression, I would like to make sense of the data (i.e., compare each genotype with versus without treatment, and compare the genotypes with each other after treatment). I have some ideas/questions that I will post below.
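The two kinds of comparison described here can be enumerated explicitly, which also shows how many tests will be run: 4 within-genotype treatment contrasts plus C(4, 2) = 6 pairwise genotype contrasts after treatment. A short Python sketch; the genotype names are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical labels: replace with your own genotypes.
GENOTYPES = ["Col0", "mutA", "mutB", "mutC"]


def planned_contrasts(genotypes):
    """Return (within, between) lists of (group_1, group_2) comparisons."""
    # 1) Treatment effect within each genotype: treated vs control.
    within = [(f"{g}_treated", f"{g}_control") for g in genotypes]
    # 2) Genotype differences after treatment: every pair of genotypes.
    between = [(f"{a}_treated", f"{b}_treated")
               for a, b in combinations(genotypes, 2)]
    return within, between


within, between = planned_contrasts(GENOTYPES)
print(len(within), len(between))  # 4 6
```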

I am familiar with Gene Set Enrichment Analysis using Gene Ontology terms, but I am also interested in what I believe is called pathway analysis. I have downloaded annotations for Arabidopsis metabolic pathways from AraCyc, and I'm sure there are other databases such as KEGG. Could someone point me in the direction of how to accomplish this?
After applying the pathway annotations, I would like to visualize the results in a way that helps me compare the different groups. I am not yet sure how to reach this point, but I believe this is the next step.
Also, is it common to use annotations from only a single database at a time, or can two annotation databases be combined? I have done a little reading on an algorithm known as SetRank, which exists for this purpose. However, I would like to learn the 'usual' methods before others.
Ultimately, I would like to be able to perform what I believe is called correlation network analysis. For example, I would like to be able to create a visual network showing how transcription factors are associated with metabolic pathways. I've read a few papers that have done this using Cytoscape - metabolic pathways are represented by circles and transcription factors by triangles. The closer the triangles are to the circles, the 'stronger' the association - in addition, lines running between the transcription factors and metabolic pathways are colored red or blue to represent positive or negative correlation. I do not currently understand how to produce the data required for input into Cytoscape, but I believe I am capable of learning given some direction.
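For pathway analysis with AraCyc (or KEGG) annotations, the simplest 'usual' method is an over-representation test: for each pathway, ask whether it contains more DE genes than expected by chance, given the background of all genes that were tested. A dependency-free Python sketch using the hypergeometric upper tail plus Benjamini-Hochberg correction is below. The gene IDs and pathway sets are toy placeholders; in practice the sets would come from the downloaded AraCyc file, mapped to the same gene IDs as the DE results. GSEA-style methods such as fgsea work differently: they use the full ranked gene list rather than a DE cutoff.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, successes n, draws N)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i)
               for i in range(k, upper + 1)) / comb(M, N)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pathway_enrichment(de_genes, universe, pathways):
    """One-sided over-representation test for each pathway.

    de_genes: set of DE gene IDs; universe: set of all tested gene IDs;
    pathways: dict mapping pathway name -> set of gene IDs.
    Returns [name, overlap, size, p, fdr] rows sorted by p-value.
    """
    de = de_genes & universe
    rows = []
    for name, members in pathways.items():
        members = members & universe
        overlap = len(de & members)
        p = hypergeom_sf(overlap, len(universe), len(members), len(de))
        rows.append([name, overlap, len(members), p])
    for row, q in zip(rows, bh_adjust([r[3] for r in rows])):
        row.append(q)
    return sorted(rows, key=lambda r: r[3])


# Toy data: 20 tested genes, 5 of them DE.
universe = {f"G{i}" for i in range(1, 21)}
de = {f"G{i}" for i in range(1, 6)}
pathways = {
    "pathway_A": {"G1", "G2", "G3", "G4"},
    "pathway_B": {f"G{i}" for i in range(10, 16)},
}
results = pathway_enrichment(de, universe, pathways)
for name, overlap, size, p, q in results:
    print(f"{name}\t{overlap}/{size}\tp={p:.3g}\tFDR={q:.3g}")
```

The resulting table (one row per pathway, per comparison) is also a convenient input for comparing groups visually, e.g. as a dot plot of FDR per pathway across contrasts.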

Most of what I have learned about analyzing RNA-seq data, I've learned from reading journal articles and applying them to my lab's data. However, the gaps in my knowledge and learning things out of order have left me without a sense of direction. I would be extremely grateful for any advice, resource links, or general clarification.

rna-seq rna RNA-Seq R • 1.7k views

updated 4.9 years ago by Barry Digby ★ 1.3k • written 4.9 years ago by jonathanU ▴ 40

score 3 · Answer 1 · 2020-02-13

Hi Jonathan,

GSEA is a form of pathway analysis; it depends how far you want to go. Here is an example of how to use fgsea (fast GSEA) by Stephen Turner: https://stephenturner.github.io/deseq-to-fgsea/#using_the_fgsea_package. For human data, I download gene set files (.gmt) here; I'm not sure where to find Arabidopsis files. Here is a further example of fgsea where I automate the output of enrichment plots. I also format expression matrices into GSEA format (GSEA being the full-scale, downloadable desktop GUI version) if you want to try that.
Enrichment plots are covered above, in step 1 of the third link. To make a circos plot showing your DE genes and the pathways they map to, have a look at my markdown file here. It's the last block of code; you might need to 'massage' your data into the correct format.
Not sure, sorry, but if you are combining two databases, they must share a common column or key to match on (for example, the same gene identifiers).
Here is a tutorial on how to prepare a file for Cytoscape: link (and its corresponding YouTube lecture). However, this lad tells you to filter the expression matrix down to the differentially expressed genes. This directly contradicts what Steve Horvath and Peter Langfelder advise in section 2 of their package's FAQ. Try filtering by highly variable genes instead?
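For the .gmt files and the 'common key' point above: a GMT file has one gene set per line, with tab-separated fields name, description, then the member genes (this is the format fgsea's `gmtPathways()` reads). If AraCyc and KEGG sets both use the same gene IDs (e.g. AGI codes), they can be pooled by prefixing each set name with its source so names cannot collide. A minimal Python sketch; the pathway names and gene lists are toy placeholders. Note that overlapping sets from two sources will give partly redundant results.

```python
def read_gmt(text):
    """Parse GMT text: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def combine_sources(**sources):
    """Pool gene-set collections that use the same gene IDs.

    Each set name is prefixed with its source so identical names from
    different databases stay distinct.
    """
    combined = {}
    for source, gene_sets in sources.items():
        for name, genes in gene_sets.items():
            combined[f"{source}:{name}"] = set(genes)
    return combined


def write_gmt(gene_sets):
    """Serialize gene sets back to GMT text ("NA" as the description)."""
    lines = [f"{name}\tNA\t" + "\t".join(sorted(genes))
             for name, genes in gene_sets.items()]
    return "\n".join(lines) + "\n"


# Toy inputs: both sources happen to use the same set name.
aracyc_gmt = "pathway_1\ttoy AraCyc set\tAT1G01010\tAT1G01020\n"
kegg_gmt = "pathway_1\ttoy KEGG set\tAT1G01020\tAT2G01030\n"
combined = combine_sources(AraCyc=read_gmt(aracyc_gmt), KEGG=read_gmt(kegg_gmt))
print(write_gmt(combined), end="")
```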

Here is a decent post on analyzing genes in Cytoscape. I was unaware that a lot of Cytoscape's functionality comes from apps you install through the GUI!
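A network like the one described in the question can be loaded into Cytoscape from a plain tab-delimited edge table: one row per edge with source and target columns, plus extra columns that become edge attributes (here the correlation and its sign, which can be mapped to red/blue edge color). A minimal Python sketch that follows the highly-variable-gene suggestion above rather than filtering to DE genes. It uses plain Pearson correlation with a hard threshold, which is a simplification of the soft-thresholded, module-based approach in Horvath and Langfelder's WGCNA. It also correlates individual genes; linking TFs to pathways would first need a per-pathway summary score. Gene names, values, `n` and `threshold` are toy placeholders.

```python
import csv
import io
import statistics
from itertools import combinations


def top_variable_genes(expr, n):
    """Keep the n genes with the highest variance across samples."""
    return sorted(expr, key=lambda g: statistics.pvariance(expr[g]),
                  reverse=True)[:n]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlation_edges(expr, genes, threshold):
    """Edges for gene pairs with |r| >= threshold; the sign is kept as an attribute."""
    edges = []
    for a, b in combinations(genes, 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3), "positive" if r > 0 else "negative"))
    return edges


def edges_to_tsv(edges):
    """Tab-delimited edge table suitable for importing as a network."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "correlation", "sign"])
    writer.writerows(edges)
    return buf.getvalue()


# Toy log-scale expression values across 6 samples.
expr = {
    "TF1":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneA": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # rises with TF1
    "geneB": [6.0, 5.1, 3.9, 3.0, 2.1, 0.9],   # falls with TF1
    "flat":  [5.0, 5.0, 5.1, 5.0, 4.9, 5.0],   # low variance, filtered out
}
genes = top_variable_genes(expr, 3)
edges = correlation_edges(expr, genes, threshold=0.9)
print(edges_to_tsv(edges), end="")
```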

P.S. All links above are compatible with DESeq2.

score 0 · Answer 2 · 2020-02-13

4.9 years ago

tothepoint ▴ 940

Explore the steps by yourself; you will be able to explore more. RNA Seq Analysis Pipeline
