8.2 years ago
User 6777
Hi all, I have an IUPRED-GLOBPROT flat-text result file as my input. Part of the file looks like this:
# IUPred
# Copyright (c) Zsuzsanna Dosztanyi, 2005
#
# Z. Dosztanyi, V. Csizmok, P. Tompa and I. Simon
# J. Mol. Biol. (2005) 347, 827-839.
#
#
# Prediction output
# NC_179987
Number of globular domains: 1
globular domain 1. 1 - 112
>NC_179987
MSSKQEISKK IISLLNTLPK EKLKHYSSFK DSQIKRFSDL QKVNQISEQD LKLQYIALKN
LCNDKYKRYY ELDDKLLRPK GNPHYYERLM NEINGEKKEN LFSALRTVVF GK
# IUPred
# Copyright (c) Zsuzsanna Dosztanyi, 2005
#
# Z. Dosztanyi, V. Csizmok, P. Tompa and I. Simon
# J. Mol. Biol. (2005) 347, 827-839.
#
#
# Prediction output
# 68476204
Number of globular domains: 0
>68476204
dledaydkfa iydkvdngsg geeqqpeldp nvnynevtde epseeessed ssddffedep
pkkd
# IUPred
# Copyright (c) Zsuzsanna Dosztanyi, 2005
#
# Z. Dosztanyi, V. Csizmok, P. Tompa and I. Simon
# J. Mol. Biol. (2005) 347, 827-839.
#
#
# Prediction output
# 684723624
Number of globular domains: 3
globular domain 1. 267 - 307
globular domain 2. 765 - 829
globular domain 3. 1141 - 1197
>684723624
msetkeapkp tkqesqgilk kltsgdtwvs pfrsqaseed pkkkinlykq fkesnkiehi
kv..
# Copyright (c) Zsuzsanna Dosztanyi, 2005
...
...
From this, I want to extract the start-end positions from the lines starting with "globular domain" for each RefSeq/GI ID. The ID appears on the ">" line below the "globular domain" lines and on the "#" line above the "Number of globular domains:" line. For the above input, the output should be:
NC_179987: 1 - 112
684723624: 267 - 307, 765 - 829, 1141 - 1197
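One way to express this mapping, assuming every record contains a `# Prediction output` line immediately followed by a `# <id>` line (as in the sample above), is to remember the most recent ID line and attach each following "globular domain" line to it. This is a sketch under that assumption; `parse_globprot` and `DOMAIN_RE` are hypothetical names, not part of IUPred or GlobProt.

```python
import re

# Matches lines such as "globular domain 3. 1141 - 1197" and captures start/end.
DOMAIN_RE = re.compile(r"^globular domain \d+\.\s+(\d+)\s*-\s*(\d+)")


def parse_globprot(lines):
    """Map each record ID to a list of "start - end" strings.

    Hypothetical helper: assumes every record has a "# Prediction output"
    line immediately followed by a "# <id>" line, as in the sample file.
    """
    domains = {}
    current_id = None
    expect_id = False
    for line in lines:
        line = line.rstrip()
        if line == "# Prediction output":
            expect_id = True  # the next "#" line carries the record ID
        elif expect_id and line.startswith("#"):
            current_id = line.lstrip("#").strip()
            domains[current_id] = []
            expect_id = False
        elif current_id is not None:
            m = DOMAIN_RE.match(line)
            if m:
                domains[current_id].append("{} - {}".format(m.group(1), m.group(2)))
    return domains


sample = """# Prediction output
# NC_179987
Number of globular domains: 1
globular domain 1. 1 - 112
>NC_179987
# Prediction output
# 68476204
Number of globular domains: 0
>68476204
# Prediction output
# 684723624
Number of globular domains: 3
globular domain 1. 267 - 307
globular domain 2. 765 - 829
globular domain 3. 1141 - 1197
>684723624
""".splitlines()

for rec_id, ranges in parse_globprot(sample).items():
    if ranges:  # records with zero globular domains are skipped
        print("{}: {}".format(rec_id, ", ".join(ranges)))
```

On the real file, `parse_globprot` can be passed the open file object directly (`with open("input.txt") as f: ...`). Records keep their file order because plain dicts preserve insertion order in Python 3.7+.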
I have tried:
```python
with open("input.txt") as f:
    first_time = True
    for line in f:
        line = line.rstrip()
        if line.startswith(">"):
            if not first_time:
                if start_ends:
                    print("{}: {}".format(header, ", ".join(start_ends)))
            else:
                first_time = False
            header = line.lstrip(">")
            start_ends = []
        elif len(line.split()) == 6 and "".join(line.split()[3:]).isnumeric():
            start_ends.append("{}-{}".format(line.split()[3], line.split()[5]))
if start_ends:
    print("{}: {}".format(header, ", ".join(start_ends)))
```
But I could not get any output.
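As far as I can tell, two things stop that loop from printing anything. For a line like `globular domain 1. 1 - 112`, `"".join(line.split()[3:])` is `"1-112"`, which is not numeric because of the `-`, so the `elif` branch never fires and `start_ends` stays empty. Also, in this file the "globular domain" lines come *before* the ">" header of the same record, so resetting `start_ends` at the ">" line would discard them even if they matched. A minimal sketch of a corrected loop, assuming the layout shown in the sample (`domain_report` is a hypothetical name):

```python
def domain_report(lines):
    """Return "id: start - end, ..." strings; a corrected version of the loop above."""
    report = []
    start_ends = []  # ranges seen since the previous ">" header
    for line in lines:
        line = line.rstrip()
        if line.startswith("globular domain"):
            # "globular domain 1. 1 - 112".split() -> [..., '1.', '1', '-', '112']
            fields = line.split()
            start_ends.append("{} - {}".format(fields[3], fields[5]))
        elif line.startswith(">"):
            # The ">" header follows its own domain lines, so flush the ranges here.
            if start_ends:
                report.append("{}: {}".format(line.lstrip(">"), ", ".join(start_ends)))
            start_ends = []
    return report


sample = [
    "Number of globular domains: 1",
    "globular domain 1. 1 - 112",
    ">NC_179987",
    "Number of globular domains: 0",
    ">68476204",
]
print("\n".join(domain_report(sample)))
```

To run it on the real file, pass the open file object: `with open("input.txt") as f: print("\n".join(domain_report(f)))`. Because the ranges are flushed at each ">" line, records with zero domains (such as 68476204) produce no output line, which matches the expected output.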
Is this a different question from the one you just got an answer for, or are you trying to come up with a Python solution for the same problem?
Thanks for the reply. It's a different file, generated from an IUPred GlobProt result; previously, the output was generated by a different program. I have tried it in Python, but this script yields no output.
I originally only looked at the expected output, but I see the difference now.