Question

Write Diffgenes/Diffgeneids In A Tab File

0

13.0 years ago

GPR ▴ 390

Hello, I am relatively new to CummeRbund. Can somebody tell me how to write a tab file containing diffGenes/diffGeneIDs? Thanks G.

cummerbund • 4.2k views

ADD COMMENT • link updated 2.5 years ago by Ram 45k • written 13.0 years ago by GPR ▴ 390

0

Sukhdeep..

Thanks for a wonderful script. It worked for me, but the output file includes non-significant genes. What am I doing wrong? I want the significant genes only.

Please also include in your answer how to replace the XLOC IDs with the real gene IDs. Thanks.

Fahim

ADD REPLY • link 10.3 years ago by mfahim ▴ 10

0

Hi Fahim,

You should either ask a new question or add a new comment; don't post these as answers unless what you are writing is a real answer.

To get gene IDs, you have to run cufflinks with -g and an appropriate GTF file.

XLOC IDs are the cufflinks locus IDs; they are mapped to the locus information in the provided GTF file to fetch the gene IDs.

http://seqanswers.com/forums/showthread.php?t=19079

Cheers

ADD REPLY • link updated 2.5 years ago by Ram 45k • written 10.3 years ago by Sukhi Singh 11k

Ram · Answer 1 · 2012-07-31

1

13.0 years ago

Sukhi Singh 11k

Using CummeRbund:

diff_genes=subset(diffData(genes(diff_data)),(significant=='yes'))

where diff_data is the object created by reading the cuffdiff output folder (diff_out) into R with readCufflinks.

Now write out diff_genes (the list of significant DE genes):

write.table(diff_genes,'diff_genes.txt',sep='\t',quote=FALSE,row.names=FALSE,col.names=TRUE)

Using awk in the terminal (in case you just need the list straight from the cuffdiff output, without any R manipulation):

awk '$14=="yes"' diff_out/gene_exp.diff > diff_genes.txt

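where diff_out is again the output folder containing the results of cuffdiff, and gene_exp.diff contains the list of genes tested for DE. In most cases the 14th column is the one that says whether the gene is significantly differentially expressed; if it is a different column in your file, replace the number 14 with that column's number. Note that this filter also drops the header line, because the header's 14th field is not "yes".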
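Since the position of the significant column can differ between files, one option is to look it up by name instead of hard-coding 14. The following is only a minimal sketch: it assumes a tab-separated file whose header row contains a column literally named significant, and the file demo_gene_exp.diff with its values is made up here purely for illustration. It also keeps the header line in the output.

```shell
# Build a tiny, made-up stand-in for diff_out/gene_exp.diff (tab-separated, with header).
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > demo_gene_exp.diff
printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-200\tC1\tC2\tOK\t10\t40\t2\t3.1\t0.0001\t0.001\tyes\n' >> demo_gene_exp.diff
printf 'XLOC_000002\tXLOC_000002\tgeneB\tchr1:300-400\tC1\tC2\tOK\t20\t21\t0.07\t0.2\t0.8\t0.9\tno\n' >> demo_gene_exp.diff

# Find the column named "significant" in the header, then keep the header
# plus every row where that column is "yes".
awk -F'\t' '
NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; print; next }
col && $col == "yes"
' demo_gene_exp.diff > demo_diff_genes.txt

cat demo_diff_genes.txt
```

To use it on real data, point awk at diff_out/gene_exp.diff instead of the demo file. If no column named significant is found, only the header is written.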

If you are only interested in the number of DE genes, then

awk '$14=="yes"' diff_out/gene_exp.diff | wc -l

Cheers

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 13.0 years ago by Sukhi Singh 11k

0

Sorry,

How can I extract columns 2 and 3 only for rows where column 14 (significant) is yes, and only for the comparison between samples C1 and C2 (I have other samples in lower rows)? I would also like the results written separately: one output for rows where column 10 < 0 and another for rows where column 10 > 0.

Thank you so much

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 9.5 years ago by zizigolu ★ 4.4k

1

sigGenes=subset(diffData(genes(cuff)),(significant=='yes'))

This gives you sigGenes as a data frame, which you can then subset however you like. I don't understand your question completely, but you can subset it by sample names (C1/C2 etc.) with

subGens=subset(sigGenes,sample_1=="C1" & sample_2=="C2")

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 9.5 years ago by Sukhi Singh 11k

0

I am thankful

Could you please show me how to do this with Unix commands?

Thanks again

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 9.5 years ago by zizigolu ★ 4.4k

1

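awk '$14=="yes"' gene_exp.diff > sigGenes
awk '$5=="C1" && $6=="C2"' sigGenes > subGenes

or combining them

awk '$5=="C1" && $6=="C2" && $14=="yes"' gene_exp.diff > subGenes

(In gene_exp.diff the sample names are in columns 5 and 6, sample_1 and sample_2; adjust the numbers if your file is laid out differently.)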
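For the earlier request (only columns 2 and 3, split by the sign of column 10), a rough sketch in awk could look like the following. It assumes the usual gene_exp.diff layout: gene_id in column 2, gene in column 3, sample_1 and sample_2 in columns 5 and 6, log2(fold_change) in column 10 and significant in column 14. The demo file, its values, and the output names up_genes.txt and down_genes.txt are made up for illustration.

```shell
# Made-up, tab-separated stand-in for gene_exp.diff (header plus four rows).
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > demo_gene_exp.diff
printf 'XLOC_000001\tXLOC_000001\tgeneA\tchr1:100-200\tC1\tC2\tOK\t10\t40\t2\t3.1\t0.0001\t0.001\tyes\n' >> demo_gene_exp.diff
printf 'XLOC_000002\tXLOC_000002\tgeneB\tchr1:300-400\tC1\tC2\tOK\t30\t10.6\t-1.5\t-2.8\t0.0002\t0.002\tyes\n' >> demo_gene_exp.diff
printf 'XLOC_000003\tXLOC_000003\tgeneC\tchr2:100-200\tC1\tC2\tOK\t20\t21\t0.07\t0.2\t0.8\t0.9\tno\n' >> demo_gene_exp.diff
printf 'XLOC_000004\tXLOC_000004\tgeneD\tchr2:300-400\tC1\tC3\tOK\t5\t40\t3\t3.5\t0.0001\t0.001\tyes\n' >> demo_gene_exp.diff

# Keep significant C1-vs-C2 rows and write columns 2 and 3 to one of two
# files depending on the sign of column 10. The literal strings inf / -inf
# are handled explicitly in case the fold change is reported that way.
awk -F'\t' -v OFS='\t' '
$5 == "C1" && $6 == "C2" && $14 == "yes" {
    if ($10 == "-inf" || $10 + 0 < 0)      print $2, $3 > "down_genes.txt"
    else if ($10 == "inf" || $10 + 0 > 0)  print $2, $3 > "up_genes.txt"
}' demo_gene_exp.diff

cat up_genes.txt down_genes.txt
```

Rows with a fold change of exactly 0 go to neither file. An output file is only created if at least one row matches it.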

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 9.5 years ago by Sukhi Singh 11k

0

thank you very much

ADD REPLY • link 9.5 years ago by zizigolu ★ 4.4k