Tools for finding gene clusters in RNA-seq differential expression data?
3
1
7.0 years ago
am ▴ 10

I have a list of differentially expressed genes and their gene counts obtained via RSEM/EBSeq. My raw data consists of RNA-seq reads from disease-free and relapse patients. When I make a heatmap, I can see clusters of over- or under-expressed genes within each condition (relapse vs. disease-free). What tools exist to identify these clusters of genes? Bioconductor's ConsensusClusterPlus is the only tool I'm familiar with. What other tools exist?

Note: I'm not seeking gene enrichment or over-representation tools, e.g. topGO, ConsensusPathDB, DAVID, or PANTHER.

RNA-Seq R rsem ebseq cluster • 5.8k views
2

If you have a large number of samples (more than 15 per condition), you can use WGCNA, as mentioned by others, to find modules of genes. If you don't have that many samples, I would suggest simple clustering methods such as hierarchical or k-means clustering. You can use methods such as the elbow method or the gap statistic to estimate a plausible number of clusters, then use that number as k for k-means.
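
To make the elbow idea concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption for illustration: the gene names and expression values are made up, each gene is z-scored before clustering, and the k-means uses a simple deterministic farthest-first seeding rather than what any particular package does. The gap statistic is not shown.

```python
from statistics import mean, pstdev

def zscore(row):
    """Scale one gene to mean 0 and SD 1 so clustering follows pattern, not magnitude."""
    m, s = mean(row), pstdev(row)  # assumes the gene is not constant (s > 0)
    return [(x - m) / s for x in row]

def sqdist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(rows, centers):
    """Index of the nearest center for every row."""
    return [min(range(len(centers)), key=lambda j: sqdist(r, centers[j])) for r in rows]

def kmeans(rows, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding.

    Returns (labels, within-cluster sum of squares).
    """
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sqdist(r, c) for c in centers)))
    for _ in range(iters):
        labels = assign(rows, centers)
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # keep the old center if a cluster happens to be empty
            updated.append([mean(col) for col in zip(*members)] if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(rows, centers)
    wss = sum(sqdist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wss

# Hypothetical log-scale expression: 3 disease-free samples, then 3 relapse samples.
expr = {
    "geneA": [2.0, 2.1, 1.9, 5.0, 5.2, 4.8],
    "geneB": [3.0, 3.2, 2.8, 6.1, 5.9, 6.0],
    "geneC": [1.0, 1.1, 0.9, 4.0, 4.1, 3.9],
    "geneD": [6.0, 6.1, 5.9, 2.0, 2.2, 1.8],
    "geneE": [5.0, 4.8, 5.2, 1.0, 1.1, 0.9],
    "geneF": [7.0, 7.1, 6.9, 3.0, 3.1, 2.9],
}
genes = list(expr)
scaled = [zscore(expr[g]) for g in genes]

# Elbow method: look for the k after which the within-cluster sum of squares stops dropping sharply.
for k in range(1, 5):
    labels, wss = kmeans(scaled, k)
    print(k, round(wss, 3), dict(zip(genes, labels)))
```

With this toy data the within-cluster sum of squares collapses between k = 1 and k = 2 and barely changes afterwards, so the elbow sits at k = 2 (one up-in-relapse cluster and one down-in-relapse cluster). Real data rarely gives such a clean bend.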

0

Hi and thanks,

I have 3 replicates for condition A and 3 for condition B.

2
7.0 years ago
biofalconch ★ 1.3k

WGCNA might do the trick for ya: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/

If you want to read about the basics, this paper might be useful: https://www.nature.com/articles/nbt1205-1499.pdf

Cheerio

1

This was a useful comment. Gets my up vote! Possibly could have been an answer.

0

Dear @biofalconch Hi,

I have done de novo assembly (Trinity) and DEG analysis using edgeR.

1- Can I use WGCNA for clustering or network analysis of my differentially expressed transcripts/genes?

2- Which Trinity or edgeR result files can I use as input for WGCNA? Should I create a separate file containing the DEG information for both Condition-1 and Condition-2?

Thank you in advance

2
7.0 years ago

Just off the top of my head:

Cluster identification is an interesting area and there exists no consistent and standardised way to do it. To be honest, simply generating a dendrogram and cutting the tree at a certain chosen height with cutree can be one of the most effective ways to do it, but this is obviously then biased because it's the human brain that's choosing the clusters indirectly via a height metric. That said, I cannot see your dendrogram and don't know how different these clusters you mention are.
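
As a rough illustration of cutting a tree at a chosen height, here is a small sketch in plain Python. It is a naive stand-in for the usual R workflow of hclust followed by cutree, not a reimplementation of either: it assumes Euclidean distance with average linkage, and the function name cut_tree and the five profiles are invented for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def cut_tree(rows, height):
    """Average-linkage agglomerative clustering, stopped at `height`.

    Merging continues while the closest pair of clusters is no farther apart
    than `height`, which gives the same groups as building the full dendrogram
    and cutting it at that height. Returns one cluster label per row.
    """
    clusters = [[i] for i in range(len(rows))]

    def avg_link(a, b):
        # mean pairwise distance between the members of two clusters
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_link(*pair))
        if avg_link(a, b) > height:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)

    # number clusters by their first member so labels are stable
    labels = [0] * len(rows)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical scaled profiles: two "up", two "down", one gene with an unrelated pattern.
profiles = [
    [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0],
    [-0.9, -1.1, -1.0, 1.1, 0.9, 1.0],
    [1.0, 1.0, 1.0, -1.0, -1.0, -1.0],
    [1.1, 0.9, 1.0, -0.9, -1.1, -1.0],
    [0.0, 1.0, -1.0, 0.0, 1.0, -1.0],
]

for h in (0.1, 1.0, 10.0):
    print(h, cut_tree(profiles, h))
```

The three heights give five singletons, three clusters, and one big cluster respectively, which is exactly the subjectivity described above: the data do not change, only the cut-off does.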

Also be aware that your choice of distance and linkage methods will ultimately affect how clusters are chosen in pretty much all of the methods that I list above. Not many people realise that, and thus the default of Euclidean distance with average linkage is usually chosen, even though these may not be suitable for all data types.
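
A tiny example of how much the distance metric matters, again as a sketch with made-up numbers. It compares Euclidean distance with a correlation-based distance (1 minus Pearson's r) on three hypothetical genes; the names and values are assumptions, and the effect of linkage is not shown.

```python
from math import dist, sqrt
from statistics import mean

def correlation_distance(a, b):
    """1 - Pearson r: 0 for identical shape, 2 for perfectly opposite shape."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1 - cov / sqrt(va * vb)

def nearest(name, profiles, metric):
    """Name of the gene closest to `name` under the given distance function."""
    others = (g for g in profiles if g != name)
    return min(others, key=lambda g: metric(profiles[name], profiles[g]))

# Hypothetical unscaled expression: 3 disease-free samples, then 3 relapse samples.
profiles = {
    "low_up": [1, 1, 1, 3, 3, 3],           # lowly expressed, up in relapse
    "high_up": [10, 10, 10, 30, 30, 30],    # highly expressed, same pattern
    "low_down": [3, 3, 3, 1, 1, 1],         # lowly expressed, opposite pattern
}

print(nearest("low_up", profiles, dist))                  # low_down
print(nearest("low_up", profiles, correlation_distance))  # high_up
```

On unscaled values, Euclidean distance pairs the two lowly expressed genes even though they move in opposite directions, while the correlation-based distance pairs the two genes with the same pattern. Which behaviour is appropriate depends on the data and on whether it has been scaled first.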

Kevin

1
7.0 years ago
Farbod ★ 3.4k

Dear @am, Hi and welcome to Biostars.

Please have a look at "Automatically Partitioning Genes into Expression Clusters" on the Trinity website. HTH.

~Best

