Question

How To Identify The Up- And Down-Regulated Genes From Cuffdiff

2

11.9 years ago

sridhar2bioinfo ▴ 20

Dear All,

How can I identify the up- and down-regulated genes among the differentially expressed genes? I obtained the list using the code from this reference: (write diffGenes/diffGeneIDs in a tab file).

In my diff_gene.txt file, most of the log2 fold change values are +inf and -inf. How do I separate the up- and down-regulated genes in this list?

Cheers

Sridhar

cuffdiff cummerbund • 12k views

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 11.9 years ago by sridhar2bioinfo ▴ 20

1

Hi, do you want to separate the genes, or just to know which is which? If you get +inf or -inf, then one of your conditions probably has a zero count, so the fold-change ratio has a 0 in its numerator or denominator and its log2 comes out as -inf or +inf. If I remember correctly, a gene with a positive log2 fold change is up-regulated from condition A to condition B (if the ratio is condition B / condition A), and a negative log2 fold change means down-regulation.

ADD REPLY • link 11.9 years ago by Sam ★ 4.8k

0

Thanks for the reply, Sam. Yes, I want to know which genes are up- and down-regulated. Do you mean that the +inf genes are up-regulated and the -inf genes are down-regulated?

ADD REPLY • link 11.9 years ago by sridhar2bioinfo ▴ 20

Ram · Answer 1 · 2013-10-24

4

11.8 years ago

seidel 11k

The gene expression ratio results of a cuffdiff analysis are found in the file gene_exp.diff. Each row of the file represents a gene measured under two conditions. There are three essential columns you need to answer your question: value_1, value_2, and log2(fold_change). The value_1 column contains FPKM values for each gene in sample 1. The value_2 column contains FPKM values for each gene in sample 2. The log2(fold_change) column represents the ratio of the two measurements as log2(sample_2/sample_1).

Since it is in log2 space, positive values mean enrichment in sample_2 over sample_1, and negative values mean enrichment in sample_1 over sample_2. As pointed out by Sam, because it's easy to have 0 reads for some genes in either sample, you will get +Inf and -Inf when taking a ratio that has 0 in either the numerator or denominator.

Thus the answer to your question is that in your list, the genes with positive values in log2(fold_change) are enriched in sample_2 (what you are referring to as up-regulated), and those with negative values are enriched in sample_1 relative to sample_2 (which you refer to as down-regulated). Keep in mind, however, that enrichment is an observation, whereas up- or down-regulation is a mechanism. You are observing enrichment, not a mechanism.
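As a rough illustration of the sign rule described above, here is a minimal Python sketch. It assumes a tab-separated table with the gene_exp.diff column names gene_id and log2(fold_change); the helper name classify and the four toy rows are made up for the example and are not taken from the original poster's data.

```python
import csv
import io


def classify(diff_text):
    """Split genes by the sign of log2(fold_change) = log2(sample_2/sample_1)."""
    enriched_in_sample_2, enriched_in_sample_1 = [], []
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    for row in reader:
        # float() also parses "inf", "+inf" and "-inf"
        lfc = float(row["log2(fold_change)"])
        if lfc > 0:
            enriched_in_sample_2.append(row["gene_id"])
        elif lfc < 0:
            enriched_in_sample_1.append(row["gene_id"])
        # lfc == 0 (or nan) falls into neither list
    return enriched_in_sample_2, enriched_in_sample_1


# Hypothetical toy table, only the columns needed here
example = (
    "gene_id\tvalue_1\tvalue_2\tlog2(fold_change)\n"
    "geneA\t10\t40\t2\n"
    "geneB\t8\t0\t-inf\n"
    "geneC\t0\t5\tinf\n"
    "geneD\t12\t12\t0\n"
)
up, down = classify(example)
print("enriched in sample_2:", up)
print("enriched in sample_1:", down)
```

Note that the infinite values are kept here and simply follow their sign; whether they should be kept is a separate decision.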

ADD COMMENT • link 11.8 years ago by seidel 11k

2

Yup, sridhar2bioinfo, seidel should have answered your question. If you want to pick out the up-regulated or the down-regulated genes with a command-line tool like awk:

Assuming log2(fold_change) is in column 3 (I am not familiar with cuffdiff output as I prefer using DESeq and edgeR; in a standard gene_exp.diff it is column 10, so use $10 there), you can do:

awk -F'\t' 'NR>1 && $3 !~ /inf/ {if($3+0 > 0){print $0 > "Up_Regulated.txt"}else if($3+0 < 0){print $0 > "Down_Regulated.txt"}}' InputFile

The NR>1 test skips the header line, and the /inf/ test drops the infinite values whether they are written as inf, +inf or -inf. You will then have Up_Regulated.txt, which contains all the rows with a positive finite log2 fold change, and Down_Regulated.txt, which contains all the rows with a negative one; the +inf and -inf genes are left out of both files.

ADD REPLY • link 11.8 years ago by Sam ★ 4.8k

0

Thank you, seidel and Sam, for your replies.

Could you please clarify whether I should keep the -inf and +inf log2 fold change values or remove them?

After running cuffdiff I used cummeRbund to identify the differentially expressed genes from the cuffdiff output file gene_exp.diff. The last column of this file, 'significant', contains either yes or no, which indicates whether the gene is significantly differentially expressed. I filtered for rows where 'significant' is 'yes', and most of the log2 fold change values in the result are +inf and -inf.

Thank you

ADD REPLY • link 11.8 years ago by sridhar2bioinfo ▴ 20

2

The significant column simply indicates which genes pass the significance cutoff; as far as I recall, it is based on the multiple-testing-corrected q_value (FDR, 0.05 by default) rather than on the raw p-value. Whether to remove or count the infinite values depends on the data itself, and on the question you are trying to answer. If I have a penny in my pocket, and you have none, I have infinitely more money than you. Does that make me rich? Examine how many reads are giving rise to your ratios, and think about the likelihood of getting that number of reads by chance (relative to 0). It's hard to evaluate significance when one of the values is 0, so be very skeptical. On the other hand, let's say you're comparing two different tissues: genes specifically expressed in one tissue and not the other may be expected to have this property (+Inf or -Inf), and thus may be what you are looking for. So the answer is: it depends.
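One way to act on the "penny in my pocket" point is to look at the non-zero value behind each infinite ratio. The sketch below is only an illustration: it uses the FPKM values from value_1 and value_2 as a stand-in for read support (gene_exp.diff reports FPKM, not raw read counts), and both the helper name zero_side and the min_fpkm threshold of 1.0 are arbitrary assumptions, not a recommended cutoff.

```python
def zero_side(value_1, value_2, min_fpkm=1.0):
    """Describe a gene whose log2(fold_change) is infinite.

    Returns (expressed_in, fpkm, is_weak), or None when exactly-one-zero
    does not hold (both values non-zero, or both zero).
    """
    if (value_1 == 0) == (value_2 == 0):
        return None
    if value_1 == 0:
        expressed_in, fpkm = "sample_2", value_2
    else:
        expressed_in, fpkm = "sample_1", value_1
    # is_weak flags the "one penny" case: expressed, but barely
    return expressed_in, fpkm, fpkm < min_fpkm


print(zero_side(0.0, 0.05))   # barely detected in sample_2 only
print(zero_side(250.0, 0.0))  # strongly expressed in sample_1 only
print(zero_side(3.0, 6.0))    # finite ratio, not an infinite case
```

Genes flagged as weak are the ones to be most skeptical about; the strongly expressed ones are better candidates for genuine condition-specific expression.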

ADD REPLY • link 11.8 years ago by seidel 11k

0

Thank you, seidel and Sam, for the replies and for clarifying my doubts. The explanation was very detailed.

ADD REPLY • link 11.8 years ago by sridhar2bioinfo ▴ 20

0

Sorry, a follow-up question:

How can I extract columns 2 and 3 only for rows where column 14 (significant) is yes and the comparison is between samples C1 and C2 (I have other samples in the rows further down), and write the results separately: one output for rows where column 10 < 0 and another for rows where column 10 > 0?

Thank you so much

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 9.5 years ago by zizigolu ★ 4.4k
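A possible approach to the last question, sketched in Python under the assumption that the file follows the standard gene_exp.diff layout, where columns 2, 3, 10 and 14 are gene_id, gene, log2(fold_change) and significant. The function name split_significant and the toy rows are hypothetical; only the sample names C1 and C2 come from the question.

```python
import csv
import io


def split_significant(diff_text, sample_1="C1", sample_2="C2"):
    """Return (negative, positive) lists of (gene_id, gene) pairs for
    significant rows of the sample_1 vs sample_2 comparison."""
    negative, positive = [], []
    for row in csv.DictReader(io.StringIO(diff_text), delimiter="\t"):
        if (row["sample_1"], row["sample_2"]) != (sample_1, sample_2):
            continue  # skip rows from other comparisons
        if row["significant"] != "yes":
            continue
        lfc = float(row["log2(fold_change)"])  # also parses inf / -inf
        pair = (row["gene_id"], row["gene"])
        if lfc < 0:
            negative.append(pair)
        elif lfc > 0:
            positive.append(pair)
    return negative, positive


# Hypothetical toy table with only the columns used above
toy = (
    "gene_id\tgene\tsample_1\tsample_2\tlog2(fold_change)\tsignificant\n"
    "XLOC_1\tg1\tC1\tC2\t-1.5\tyes\n"
    "XLOC_2\tg2\tC1\tC2\t2.0\tyes\n"
    "XLOC_3\tg3\tC1\tC2\t3.0\tno\n"
    "XLOC_4\tg4\tC1\tC3\t-2.0\tyes\n"
)
negative, positive = split_significant(toy)
print(negative)
print(positive)
```

For a real file, the text can be read with open(path).read() and each list written to its own output file; rows with infinite values are kept here and follow their sign.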