Question

Cummerbund Volcano plot infinite value

1

Entering edit mode

7.5 years ago

niu.shengyong ▴ 70

I'm faced with a problem in the volcano plot of CummeRbund. As shown in enclosed file, you could see that there are plenty of infinite dots on the borders and also on the bottom. Is there any method to avoid this problem? Volcano plot

I've checked some pages, it doesn't work when I put "pseudocount=0.0001" argument in csVolcano function. Do you have any advice? Thanks!

Sincerely, Simon

RNA-Seq Cummerbund • 5.3k views

ADD COMMENT • link updated 7.1 years ago by matine • 0 • written 7.5 years ago by niu.shengyong ▴ 70

1

Entering edit mode

Can you show the code where you add the pseudocount? It is usually +1.0 and must be added before log normalising function.

Also print the matrix created and extract the infinite values --> trace those values back to the original input values before normalising --> maybe the input values are "NA" or something weird.

ADD REPLY • link 7.5 years ago by BioinfGuru ★ 2.1k

2

Entering edit mode

you can create the plot on your own. Just pull put the adj.Pval and the log2FC . Pull the values and plot the distribution first and see the value limits. Then it should not be a problem. Otherwise, you can always inflate with a constant counter for representation. use something like below. You have NA values for p.adjusted. So that's the problem. if it's a visualization you might not need them as well. Get rid of them and plot.

with(df, plot(log2FoldChange, -log10(padj), pch=20, main="Volcano plot",))
# Add colored points:
with(subset(df, padj <0.01 ), points(log2FoldChange, -log10(padj), pch=20, col="blue"))
with(subset(df, abs(log2FoldChange)>1.5), points(log2FoldChange, -log10(padj), pch=20, col="orange"))
with(subset(df, padj <0.01 & abs(log2FoldChange)>1.5), points(log2FoldChange, -log10(padj), pch=20, col="grey60"))

ADD REPLY • link 7.5 years ago by ivivek_ngs ★ 5.2k

2

Entering edit mode

First: extract the ID, adj.Pval and log2FC from cuff_data into a dataframe called "df"

Second: remove all lines where log2FC=NA

Third: follow the code vchris posted here.

ADD REPLY • link 7.5 years ago by BioinfGuru ★ 2.1k

0

Entering edit mode

Should I use the file "gene_exp.diff" and extract the columns of "gene_id", "gene", "sample1", "sample2", "log2(fold_change)", and " p_value"? Thanks!

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

1

Entering edit mode

Yes gene_exp.diff provides the gene differential FPKM after cufflinks tests differences in the summed FPKM of transcripts sharing each gene_id.

Take a look here for the structure of the gene_exp.diff file.

To reliably extract what you need requires knowledge of a programming language (perl, python) or terminal commands (cut -f). If you do not have that ability, then you can use a spreadsheet application like Excel but prepare to get EXCELLED!!

ADD REPLY • link 7.5 years ago by BioinfGuru ★ 2.1k

0

Entering edit mode

oh that's some handy code there.

ADD REPLY • link 7.5 years ago by BioinfGuru ★ 2.1k

0

Entering edit mode

I could delete infinite points successfully, but all the them become in grey color in the end. How could I solve this? Here is my codes:

table <-read.table("diff_list_subset.diff", header = TRUE)
df <- data.frame(table)
df <-df[Reduce('&', lapply(df, is.finite)),]
with(df, plot(df$log2.fold_change., -log10(df$p_value), pch=20, main="Volcano plot",))
with(subset(df, df$p_value < 0.01 ), points(df$log2.fold_change., -log10(df$p_value), 
pch=20, col="blue"))
with(subset(df, abs(df$log2.fold_change.)>1.5), points(df$log2.fold_change., -
log10(df$p_value), pch=20, col="orange"))
with(subset(df, df$p_value < 0.01 & abs(df$log2.fold_change.)>1.5), 
points(df$log2.fold_change., -log10(df$p_value), pch=20, col="grey60"))

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

0

Entering edit mode

Here is the plot: volcano plot

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

0

Entering edit mode

Sorry I'm pretty new in this area. How could I extract the ID, adj.Pval and log2FC from cuff_data into a dataframe called "df", and also remove all lines where log2FC=NA?　Appreciate it!

Simon

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

0

Entering edit mode

Here is my code:

cuff_data <- readCufflinks('diff_out/')
pdf(file="Volcano.pdf")
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, features=TRUE,   xlimits=c(-40,40))
dev.off()

How could I add this before normalization and log function?

Thanks!

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

0

Entering edit mode

Here is my code:

cuff_data <- readCufflinks('diff_out/')
pdf(file="Volcano.pdf")
csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, features=TRUE,   
xlimits=c(-40,40))
dev.off()

How could I add this before normalization and log function?

Thanks!

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

0

Entering edit mode

Please also include the code where "pseudocount=0.0001" is added.

ADD REPLY • link 7.5 years ago by BioinfGuru ★ 2.1k

0

Entering edit mode

csVolcano(genes(cuff_data), 'S63', 'S58',showSignificant=TRUE, alpha=0.05, pseudocount=0.0001, features=TRUE,  xlimits=c(-40,40))

Thanks!

ADD REPLY • link 7.5 years ago by niu.shengyong ▴ 70

score 0 · Answer 1 · 2017-06-05

0

Entering edit mode

7.5 years ago

niu.shengyong ▴ 70

Is the padj value the same as q value? Thanks!

ADD COMMENT • link 7.5 years ago by niu.shengyong ▴ 70

0

Entering edit mode

yes..."p adjusted value" = p value adjusted for multiple testing = q value

ADD REPLY • link 7.5 years ago by BioinfGuru ★ 2.1k