Extract IDs and GO terms from an annotation GFF file, one GO term per line
4.6 years ago
Cecelia

I have a GFF file in which each line looks like this:

   scaffold1  maker   mRNA    1443962 1446567 .   +   .   ID=ASPAM00000002080;Parent=ASPAG00000001349;Dbxref=MetaCyc:PWY-7980,InterPro:IPR024034,InterPro:IPR005725,Gene3D:G3DSA:1.10.1140.10,Gene3D:G3DSA:2.40.50.100,Gene3D:G3DSA:2.40.30.20,Gene3D:G3DSA:3.40.50.300,TIGRFAM:TIGR01042,KEGG:00190+7.1.2.2,KEGG:00195+7.1.2.2,ProSitePatterns:PS00152,Pfam:PF16886,Pfam:PF00006,Hamap:MF_00309,CDD:cd01134,CDD:cd18119;Name=vhaa;Ontology_term=GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961;_AED=0.10;_QI=0|0|0|1|1|1|8|0|613;_eAED=0.10;makerName=maker-scaffold3-augustus-gene-14.11-mRNA-1;product=V-type proton ATPase catalytic subunit A;uniprot_id=Q2TJ56
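The ninth (last) column of a GFF3 line holds the attributes as tag=value pairs separated by ";", with multiple values for one tag separated by ",". A minimal Python sketch of parsing that column into a dictionary, assuming a tab-separated, spec-conformant file in which commas and semicolons inside values are percent-encoded; parse_gff3_attributes is a hypothetical helper name, not part of any library:

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> Dict[str, List[str]]:
    """Split a GFF3 attribute column into {tag: [value, ...]}."""
    attrs: Dict[str, List[str]] = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        # Split multi-valued tags on ',' first, then undo percent-encoding
        # (e.g. '%2C' -> ','), so an escaped comma stays inside one value.
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


line = ("scaffold1\tmaker\tmRNA\t1443962\t1446567\t.\t+\t.\t"
        "ID=ASPAM00000002080;Parent=ASPAG00000001349;Name=vhaa;"
        "Ontology_term=GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961;"
        "_AED=0.10;product=V-type proton ATPase catalytic subunit A")
attrs = parse_gff3_attributes(line.split("\t")[8])
print(attrs["ID"][0], attrs["Ontology_term"])
```

Some pipelines write free-text tags (such as product) with unescaped commas; splitting every value on "," is then too aggressive for those tags, although it is the intended behaviour for a multi-valued tag like Ontology_term.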

Here I only want to extract the ID and GO-term fields and split them into several rows: the first column is the ID and the second is a GO accession. So far I have only managed to extract the ID and GO terms on a single line, like this:

ASPAM00000002080 GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961

The desired output looks like this:

ASPAM00000002080    GO:0005524
ASPAM00000002080    GO:0046034
ASPAM00000002080    GO:1902600
ASPAM00000002080    GO:0033180
ASPAM00000002080    GO:0046961

Is there an easy way to do this?

Thx in advance, C
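If the single-line "ID GO:a,GO:b,..." output has already been produced, turning it into one GO term per row is just a split on the comma. A minimal Python sketch, assuming whitespace between the ID and the GO list and no spaces inside the list; explode_go_line is a hypothetical helper name:

```python
from typing import List, Tuple


def explode_go_line(line: str) -> List[Tuple[str, str]]:
    """Turn 'ID GO:a,GO:b' into [('ID', 'GO:a'), ('ID', 'GO:b')]."""
    fields = line.split()
    if len(fields) < 2:
        return []  # an ID with no GO list: nothing to emit
    record_id, go_list = fields[0], fields[1]
    # Skip empty items, e.g. produced by a trailing comma.
    return [(record_id, go) for go in go_list.split(",") if go]


row = "ASPAM00000002080 GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961"
for record_id, go in explode_go_line(row):
    print(f"{record_id}\t{go}")  # one tab-separated ID/GO pair per line
```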


Dear Cecelia,

I am trying to obtain the same intermediate result you got, namely:

ASPAM00000002080 GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961

from the GFF3 file. I would be delighted if you could share how you did this.

Thanks

Best

AG
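One possible way to get that intermediate line, sketched in Python rather than taken from the original poster: pull the ID and Ontology_term values out of column 9 with two regular expressions. This assumes a tab-separated GFF3 file whose values contain no raw ";"; id_with_go is a hypothetical name.

```python
import re

# Assumes ID and Ontology_term values contain no raw ';' (GFF3 requires it to be
# percent-encoded inside values). Anchoring on '^' or ';' keeps a tag such as
# 'Parent_ID=' from being mistaken for 'ID='.
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")
GO_RE = re.compile(r"(?:^|;)Ontology_term=([^;]+)")


def id_with_go(gff_line: str):
    """Return 'ID GO:a,GO:b' for a feature line, or None if either tag is missing."""
    cols = gff_line.rstrip("\n").split("\t")
    if gff_line.startswith("#") or len(cols) < 9:
        return None  # directive/comment or not a feature line
    id_m, go_m = ID_RE.search(cols[8]), GO_RE.search(cols[8])
    if id_m is None or go_m is None:
        return None
    return f"{id_m.group(1)} {go_m.group(1)}"


example = ("scaffold1\tmaker\tmRNA\t1443962\t1446567\t.\t+\t.\t"
           "ID=ASPAM00000002080;Parent=ASPAG00000001349;Name=vhaa;"
           "Ontology_term=GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961;_AED=0.10")
print(id_with_go(example))
# prints: ASPAM00000002080 GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961
```

To process a whole file, call id_with_go on each line of an open file handle and print the results that are not None; features without an Ontology_term attribute (typically the gene lines) are simply skipped.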

4.6 years ago
JC

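This can be done with a Perl one-liner. Note the "next unless" guard: Perl does not reset $1 and $2 when a match fails, so without it a line with no Ontology_term attribute could print the previous record's GO terms again. Using [^;]+ instead of \w+ and .+?; also lets IDs containing "-" or "." match, and handles Ontology_term when it is the last attribute:

$ perl -lne 'next unless /ID=([^;]+);.*Ontology_term=([^;]+)/; my ($id, $terms) = ($1, $2); print "$id\t$_" for split /,/, $terms' < data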
ASPAM00000002080        GO:0005524
ASPAM00000002080        GO:0046034
ASPAM00000002080        GO:1902600
ASPAM00000002080        GO:0033180
ASPAM00000002080        GO:0046961
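For comparison, a Python sketch of the same ID/GO extraction over a whole file. It is not JC's code; it assumes a tab-separated GFF3 file, stops at a ##FASTA directive (after which a GFF3 file carries sequence rather than features), parses each line independently so features without GO terms produce nothing, and drops exact duplicate pairs. go_pairs and write_pairs are hypothetical names.

```python
import csv
import io
import sys
from typing import Iterable, Iterator, Optional, TextIO, Tuple


def go_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (ID, GO accession) pair per GO term found in GFF3 lines."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the rest of a GFF3 file is sequence, not features
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        # Each line is parsed on its own, so nothing carries over between lines.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "ID" in attrs and "Ontology_term" in attrs:
            for go in attrs["Ontology_term"].split(","):
                yield attrs["ID"], go


def write_pairs(lines: Iterable[str], out: Optional[TextIO] = None) -> None:
    """Write unique ID/GO pairs as tab-separated rows, in first-seen order."""
    target = out if out is not None else sys.stdout
    writer = csv.writer(target, delimiter="\t", lineterminator="\n")
    seen = set()
    for pair in go_pairs(lines):
        if pair not in seen:
            seen.add(pair)
            writer.writerow(pair)


demo = io.StringIO(
    "##gff-version 3\n"
    "scaffold1\tmaker\tgene\t1443962\t1446567\t.\t+\t.\tID=ASPAG00000001349;Name=vhaa\n"
    "scaffold1\tmaker\tmRNA\t1443962\t1446567\t.\t+\t.\t"
    "ID=ASPAM00000002080;Parent=ASPAG00000001349;"
    "Ontology_term=GO:0005524,GO:0046034,GO:1902600,GO:0033180,GO:0046961;_AED=0.10\n"
)
write_pairs(demo)
```

On a real file, something like opening a placeholder path such as annotation.gff3 and passing the handle to write_pairs would print the table to standard output. Keeping first-seen order rather than sorting means the rows follow the order of the input file.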