Genes With List of GO Terms
22 months ago
cthangav ▴ 110

Hello,

I have a column of GO terms, and each GO term has a list of associated genes in the last column.

term id      term name                           genes in term
GO:0005737   cytoplasm                           Gene 1 | Gene 2 | Gene 3
GO:0032502   developmental process               Gene 1 | Gene 2 | Gene 3
GO:0048856   anatomical structure development    Gene 1 | Gene 2 | Gene 3

I want to change the format so that I have a column of genes, each followed by a list of its associated GO terms.

gene      GO terms
Gene 1    cytoplasm | developmental process | anatomical structure development
Gene 2    cytoplasm | developmental process | anatomical structure development
Gene 3    cytoplasm | developmental process | anatomical structure development

What is the best way to do this?


By 'what is the best way', I assume you have tried to do it or have some general ideas. If you share those, people will be able to help you by considering your approach.

22 months ago

Assuming tab-delimited input:

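# Skip the header line, print one gene<TAB>term-name pair per line,
# then sort by gene and let datamash collapse each gene's terms into a unique list.
awk -F '\t' '/^term/ {next} {N=split($3,a,/[ ]*[\|][ ]*/);for(i=1;i<=N;i++) printf("%s\t%s\n",a[i],$2);}'  input.txt |\
sort -t $'\t' -k1,1 |\
datamash groupby 1 unique 2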

Gene 1  anatomical structure development,cytoplasm,developmental process
Gene 2  anatomical structure development,cytoplasm,developmental process
Gene 3  anatomical structure development,cytoplasm,developmental process
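
If datamash is not installed, or you want the " | " separator from the question rather than commas, the same reshaping can be sketched in awk alone. This is only a minimal sketch under a few assumptions: the input is tab-delimited, the first line is a header, and the columns are term id, term name, genes in term. The function name invert_go_table is made up for illustration. Genes are printed in the order they first appear rather than sorted.

```shell
# Hypothetical helper: prints "gene<TAB>term | term | ..." for a tab-delimited
# file whose columns are: term id, term name, genes in term (header on line 1).
invert_go_table() {
    awk -F '\t' '
        NR == 1 { next }                         # skip the header row
        {
            n = split($3, genes, /[|]/)          # genes are pipe-separated
            for (i = 1; i <= n; i++) {
                g = genes[i]
                gsub(/^[ \t]+|[ \t]+$/, "", g)   # trim padding around the gene name
                if (g == "") continue            # ignore empty entries
                if ((g, $2) in seen) continue    # keep each term once per gene
                seen[g, $2] = 1
                if (g in terms) {
                    terms[g] = terms[g] " | " $2 # append to an existing list
                } else {
                    order[++count] = g           # remember first-seen order
                    terms[g] = $2
                }
            }
        }
        END {
            for (k = 1; k <= count; k++)
                printf "%s\t%s\n", order[k], terms[order[k]]
        }
    ' "$1"
}
```

It would be called as: invert_go_table input.txt > genes_to_terms.txt (the output file name is just an example). Trimming each gene name also guards against trailing spaces in the last column, which would otherwise make "Gene 3 " and "Gene 3" count as different genes.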