Propagating Gene Ontology labels to ancestors
1
0
Entering edit mode
7.9 years ago
kandoigaurav ▴ 150

I want to propagate the gene ontology annotations for every GO term to it's ancestor terms. To this end, I've created two tables, namely GeneOntology and OntologyAncestors.

The GeneOntology table has the following structure:

Gene Ontology
GeneA TermA
GeneB TermA
GeneA TermB
GeneC TermC

Similarly, the OntologyAncestors table has the following structure:

Ontology Ancestor
TermA AncestorA
TermA AncestorB
TermC AncestorA
TermC AncestorD

How can I obtain a table which has gene propagated to every ontology term's ancestor. Something like this:

Gene Ontology
GeneA TermA
GeneA AncestorA
GeneA AncestorB
GeneB TermA
GeneB AncestorA
GeneB AncestorB
GeneA TermB
GeneC TermC
GeneC AncestorA
GeneC AncestorD

I was planning to do matrix multiplication to get the results, but after converting these tables to matrix of 0's and 1's, the matrices become very big.

Gene Ontology Protein Function GO RNA-Seq • 2.4k views
ADD COMMENT
0
Entering edit mode

those tables are plaintext?

ADD REPLY
0
Entering edit mode

Yes, just two tab-separated columns as .txt for both.

ADD REPLY
2
Entering edit mode
7.9 years ago
Diego ▴ 50

Ok, I know this is not exactly what you need, but if you are using a linux machine this will give you an almost useful result:

join -t $'\t'   GeneOntology.txt OntologyAncestors.txt -a1 -1 2 -2 1 --header

This relational operation called join. The result is

Ontology    Gene    Ancestor
TermA   GeneA   AncestorA
TermA   GeneA   AncestorB
TermA   GeneB   AncestorA
TermA   GeneB   AncestorB
TermB   GeneA
TermC   GeneC   AncestorA
TermC   GeneC   AncestorD

Of course this isn't the view you wanted, but it's closer.

EDIT

I came up with a solution!

 join -t $'\t'   GeneOntology.txt OntologyAncestors.txt -a2 -1 2 -2 1 --header|sed 1d |cut --complement -f 1 -d $'\t' >> GeneOntology.txt

I'll explain step by step.

First you apply a right join

 join -t $'\t'   GeneOntology.txt OntologyAncestors.txt -a2 -1 2 -2 1 --header

This will leave you this output:

Ontology    Gene    Ancestor
TermA   GeneA   AncestorA
TermA   GeneA   AncestorB
TermA   GeneB   AncestorA
TermA   GeneB   AncestorB
TermC   GeneC   AncestorA
TermC   GeneC   AncestorD

Then you remove the headers with sed, ordering to (d)elete the line 1.

sed 1d

Then you remove the first column (the Ontology column):

cut --complement -f 1 -d $'\t'

The final result will be this

GeneA   AncestorA
GeneA   AncestorB
GeneB   AncestorA
GeneB   AncestorB
GeneC   AncestorA
GeneC   AncestorD

Then you just append this to the original GeneOntology.txt (probably you should consider backing up that file).

standard_output >> GeneOntology.txt

Your final file will be

Gene    Ontology
GeneA   TermA
GeneB   TermA
GeneA   TermB
GeneC   TermC
GeneA   AncestorA
GeneA   AncestorB
GeneB   AncestorA
GeneB   AncestorB
GeneC   AncestorA
GeneC   AncestorD

Now, if you want to sort the file you can simply:

echo -e "Gene\tOntology"> sorted.txt
sed 1d GeneOntology.txt|sort>>sorted.txt

Final result

Gene    Ontology
GeneA   AncestorA
GeneA   AncestorB
GeneA   TermA
GeneA   TermB
GeneB   AncestorA
GeneB   AncestorB
GeneB   TermA
GeneC   AncestorA
GeneC   AncestorD
GeneC   TermC
ADD COMMENT
0
Entering edit mode

Thank you @Diego for the solution.

ADD REPLY

Login before adding your answer.

Traffic: 2376 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6