How to extract from original counts genes with > or equal 2 fold change and FDR <0.001?
1
0
Entering edit mode
7.1 years ago
Agamy89 • 0

I am analyzing RNA-Seq data in a time course dependent manner (5-time points). I set up three replicates for each time point, so total samples are 15. I want to apply this data to fuzzy soft clustering analysis to figure out the cluster genes change according to the time points and explore the enriched GO to each cluster. For doing that, I want to filter the data to focus the analysis on the highest expressed genes. I have removed the genes with zero counts as well as with low expression level.

to remove the genes with zero counts

raw_counts_wn1 <- raw_counts1[rowSums(raw_counts1) > 0, ]

to remove the low expressed genes

library(edgeR) library(HTSFilter) dge <- DGEList(raw_counts_wn1) dge <- calcNormFactors(dge, method = "TMM") dge <- estimateCommonDisp(dge) dge <- estimateTagwiseDisp(dge) filtData1 <- HTSFilter(dge)$filteredData

Before applying the filtered data to soft clustering analysis, I would like to filter the counts more to focus on the genes have ≥ 2 fold change and FDR (0.001). I don't know the right command to accomplish this target on Rstudio. I will be thankful for anyone offer me good suggestions to perform this filteration.

RNA-Seq R • 2.5k views
ADD COMMENT
0
Entering edit mode
7.1 years ago
Jake Warner ▴ 840

Hi,
First you would need to perform your test for fold change. In edgeR this is exactTest(). For time series data you could compare everything to t=0. Then you can subset the DE list for genes that meet your criteria using subset:
subset(exactResults, abs(logFC) >= 2 & FDR <= 0.001, select=c(logFC, FDR)).
Then you can use this list to filter your normalized count table before clustering. Beware though, you will miss a lot of genes which have perfectly clusterable profiles, but don't meet this strict cutoff. Since soft clustering is generally noise robust, you might want to rethink your filtering strategy.

ADD COMMENT
0
Entering edit mode

Hi, At first, I'd like to thank you so much for your suggestion. As far as I know, exactTest () need only two groups for making the comparison, but I still know in RNA-Seq analysis and R language handling. I have tried to perform the test for fold change as following exactResults <- exactTest(filtData1) but I got this error message Error in exactTest(filtData1) : At least one element of given pair is not a group. Groups are: 1 In this case, what is the problem? I may miss considering your comment regarding making the comparison to t=0 but I did not exactly know how to add it in exactTest() command.

ADD REPLY

Login before adding your answer.

Traffic: 1860 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6