Question

Is there any script for retrieving GO entries from PFAM GO entries

0

Entering edit mode

10.3 years ago

vahapel ▴ 210

Dear All,

Recently, I did de novo RNA-seq experiment for a non-model fish species and assembled transcripts for functional annotations. Before GO annotation, I blasted my assembled transcripts (1e-10) to InterProScan 5 and got PFAM IDs (File 1). Then, I downloaded GO annotations for each PFAM entries (from here: http://www.geneontology.org/page/download-mappings) and got PFAM IDs with corresponding GO mappings (File 2). I want to compare these two files and save the PFAM IDs with GO annotations for my file 1 entries:

File 1: (my file for analyses, as you notice that PF00015 and PF00016 are not absent in my file)

PF00001
PF00002
PF00003
PF00004
PF00005
PF00006
PF00007
PF00008
PF00009
PF00010
PF00011
PF00012
PF00013
PF00014
PF00017
PF00018

File 2: (downloaded file from the database PFAM IDs with GO annotations)

PF00001    GO:0004930
PF00001    GO:0007186
PF00001    GO:0016021
PF00002    GO:0004930
PF00002    GO:0007186
PF00002    GO:0016021
PF00003    GO:0004930
PF00003    GO:0007186
PF00003    GO:0016021
PF00004    GO:0005524
PF00005    GO:0005524
PF00005    GO:0016887
PF00006    GO:0005524
PF00008    GO:0005515
PF00009    GO:0003924
PF00009    GO:0005525
PF00010    GO:0046983
PF00013    GO:0003723
PF00014    GO:0004867
PF00015    GO:0004871
PF00015    GO:0007165
PF00015    GO:0016020
PF00016    GO:0000287
PF00017    GO:0005515
PF00018    GO:0005515

My question is that is there any perl or phyton script for comparing these two files and generating an output including File 1 PFAM IDs with GO annotations.

Thank you all in advance

Best regards

-vahap-

rna-seq • 3.2k views

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by vahapel ▴ 210

Ram · Answer 1 · 2014-07-21

1

Entering edit mode

10.3 years ago

pld 5.1k

Here's an approach:

import sys
import csv

def add_safe(dict, key, val):
    try:
        dict[key].append(val)

    except KeyError:
        dict[key] = [val]

def __main__():
    infi  = sys.argv[1]
    mapfi = sys.argv[2]
    outfi = sys.argv[3]

    with open(infi, 'rb') as fi:
        raw_pfams = [x[0] for x in list(csv.reader(fi, delimiter='\t'))]

    with open(mapfi, 'rb') as fi:
        raw_map = list(csv.reader(fi, delimiter='\t'))[:-1]

    id_map = dict()
    for id in raw_map:
        add_safe(id_map, id[0], id[1])

    with open(outfi, 'w') as fi:
        for id in raw_pfams:
            try:
                fi.write(id + '\t' + ';'.join(id_map[id]) + '\n')
            except KeyError:
                print 'ERROR: {} not found in reference map!'.format(id)

__main__()

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by pld 5.1k

0

Entering edit mode

Dear joe.cornish826,

Many thanks for the prompt reply. But, unfortunately, I could not execute this script :( , because I am newbie ! Do you give me a bit additional info for how to execute it ? Many thanks again.

Best regards,

-vahap-

ADD REPLY • link 10.3 years ago by vahapel ▴ 210

0

Entering edit mode

Usage is as follows, assuming you save the script as pfam-to-go.py

python pfam-to-go.py <file 1 from your post> <file 2 from your post> <desired output file>

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by pld 5.1k

Ram · Answer 2 · 2014-08-01

You should be able to do this with a simple join:

join <(sort -nk1.3,1.7 file1) <(sort -nk1.3,1.7 file2)

The sort is just to be safe, and perhaps this command could even be shortened/improved. Note that you can get at these mappings pretty easy with HMMER2GO without developing custom scripts/wrappers. Assuming you have a file of sequences to search for domains, you can find matches with:

hmmer2go search -i genes_orfs.faa -d Pfam-A.hmm

and then you can map GO terms to your matches:

hmmer2go mapterms -i genes_orfs_Pfam-A.tblout -p pfam2go -o genes_orfs_Pfam-A_GO.tsv

I didn't provide much description about what is going on with those commands, but hopefully that helps (feel free to ask questions!).