Compiling a UniProt txt file into a table of AC (accession number) and MOD_RES
9.0 years ago
ahmedakhokhar ▴ 150

Dear all,

I am not very familiar with Python and am trying to retrieve data from a UniProt text file (test1) that looks like this:

ID   YSH1_YEAST              Reviewed;         779 AA.
AC   Q06224; D6VYS4;
DT   10-JAN-2006, integrated into UniProtKB/Swiss-Prot
DT   01-NOV-1996, sequence version 1.
..
FT   METAL       184    184       Zinc 1. {ECO:0000250}.
FT   METAL       184    184       Zinc 2. {ECO:0000250}.
FT   METAL       430    430       Zinc 2. {ECO:0000250}.
FT   MOD_RES     517    517       Phosphoserine; by ATM or ATR.
FT                                {ECO:0000244|PubMed:18407956}.
FT   MUTAGEN      37     37       D->N: Loss of endonuclease activity.
.
.

So far I can retrieve the MOD_RES and AC values separately, using these snippets:

import re

# Print the primary accession from each AC line.
with open('test1') as test:
    for line in test:
        m = re.match(r'^AC\s+(\w+)', line)
        if m:
            print(m.group(1))

Output:

Q06224
P16521

# Print the position and description of each MOD_RES feature line.
# (The original pattern contained '\MOD_RES'; '\M' is an invalid escape in Python 3.)
regex = re.compile(r'^FT\s+MOD_RES\s+\w+\s+\w+\s+\w.+')
with open('test1') as testfile:
    for line in testfile:
        if regex.match(line):
            print(line[23:48])

Output:

517       Phosphoserine;
2       N-acetylserine
187       N6,N6,N6-trime
196       N6,N6,N6-trime

The goal is to get each AC and its modified residues (MOD_RES) into a tab-separated table. If an AC has more than one MOD_RES, the AC should be repeated on every row, like this:

AC  MOD_RES
Q06224  517    517       Phosphoserine
P04524  75    75       Phosphoserine
Q06224  57    57       Phosphoserine
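
One way to combine the two snippets is to remember the accession of the current entry while scanning for MOD_RES lines, so each modification is emitted together with its AC. The following is a minimal sketch, not a full UniProt parser. It assumes the older column-based FT layout shown above (key, start, end, description on one line), keeps only the first (primary) accession of each entry, and trims the description at the first ';' or evidence tag. The function name `mod_res_rows` and the output column names are made up for illustration.

```python
import re

# Old-style feature line as in the excerpt above: FT, key, start, end, text.
# Assumption: positions are plain integers; lines with '?' or '<1' are skipped.
MOD_RES = re.compile(r"^FT\s{3}MOD_RES\s+(\d+)\s+(\d+)\s+(.*)$")


def mod_res_rows(lines):
    """Yield (accession, start, end, modification) for every MOD_RES line."""
    accession = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("ID "):
            # A new entry starts: forget the previous accession.
            accession = None
        elif line.startswith("AC ") and accession is None:
            # Keep only the primary accession (first item on the first AC line).
            accession = line[5:].split(";")[0].strip()
        else:
            m = MOD_RES.match(line)
            if m and accession is not None:
                start, end, text = m.groups()
                text = text.split("{ECO")[0]          # drop evidence tags
                name = text.split(";")[0].strip().rstrip(".")
                yield accession, start, end, name


sample = """\
ID   YSH1_YEAST              Reviewed;         779 AA.
AC   Q06224; D6VYS4;
FT   METAL       430    430       Zinc 2. {ECO:0000250}.
FT   MOD_RES     517    517       Phosphoserine; by ATM or ATR.
FT                                {ECO:0000244|PubMed:18407956}.
//
"""

rows = list(mod_res_rows(sample.splitlines()))
print("AC\tstart\tend\tMOD_RES")
for row in rows:
    print("\t".join(row))
```

To run it on the real file, the same generator can be fed an open file handle, for example `with open('test1') as handle: rows = list(mod_res_rows(handle))`. Because the accession is stored per entry, an AC with several MOD_RES lines is repeated once per modification.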
python uniprot • 1.8k views

Step 1: Use Bio.SwissProt (part of Biopython) to parse the input. If you don't, this is just generic text processing, and not many people will help you out with that kind of code.
