Python newbie: only the else statement is printed
8.0 years ago
mdrnao • 0

I'm a complete beginner, so I'm sorry if this is obvious!

I have a file that is a long list of entries, each either name | +/- or IG_name | 0:

S1      +
IG_S1   0
S2      -
IG_S3   0
S3      +
S4      -
dnaA    +
IG_dnaA 0

Every entry that starts with IG_ has a corresponding name entry. I want to add that name's + or - to the IG_ line.

The data are gene names and strand information; IG = intergenic region. Basically, I want to know which strand each intergenic region is on.

What I want:

open file
if starts with IG_*
    find the line with *
    print("IG_" and the line it found)
else 
    print(line)
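The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, assuming whitespace-separated two-column lines and that the plain name for every IG_ entry appears somewhere in the same mixed file (so the second file is not strictly needed). Because IG_S3 comes before S3 in the example, the sketch reads all lines first and looks strands up afterwards instead of searching while reading. The function name annotate_ig and the '?' placeholder for an IG_ name with no matching gene are hypothetical choices.

```python
import sys


def annotate_ig(lines):
    """Return the lines with each IG_ entry's strand (+/-) appended.

    Assumes whitespace-separated 'name value' columns (hypothetical format).
    """
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines

    # Pass 1: remember the strand of every plain (non-IG_) name.
    strand_of = {name: strand for name, strand in rows if not name.startswith("IG_")}

    # Pass 2: keep plain lines as they are, append the looked-up strand to IG_ lines.
    out = []
    for name, value in rows:
        if name.startswith("IG_"):
            gene = name[len("IG_"):]  # "IG_S1" -> "S1"
            out.append(f"{name}\t{value}\t{strand_of.get(gene, '?')}")
        else:
            out.append(f"{name}\t{value}")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as mixed:
        for row in annotate_ig(mixed):
            print(row)
```

Run as python annotate.py mixed.txt (both names hypothetical); since the whole list is held in memory, this suits files of ordinary size.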

What I have:

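import re
import sys

with open(sys.argv[2]) as geneInfo:  # second argument: names with strands only
    with open(sys.argv[1]) as origin:  # first argument: mixed list with IG_ lines

        for line in origin:
            if line.startswith("IG_"):
                name = line.split("_")[1]  # text after "IG_", e.g. "S1   0\n"
                nname = name[:-3]  # meant to drop the strand column and the newline
                for newline in geneInfo:
                    if re.match(nname, newline):
                        print("IG_" + newline)
            else:
                print(line)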

where origin is the mixed list and geneInfo contains only the names, not the IG_ names.

With this code, I end up with a list containing only the lines printed by the else branch:

S1  +

S2  -

S3  +

S4  -

dnaA    +

My problem is that I don't even know what to search for to find out what's wrong!
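One likely contributor: geneInfo is a file object, and looping over a file object consumes it. The inner for newline in geneInfo loop reads the file to the end the first time an IG_ line is seen, so every later IG_ line searches an already-exhausted file and finds nothing. (Whether even the first IG_ line matches depends on name[:-3] and re.match seeing exactly the same whitespace in both files, so that may be a second issue.) The blank lines in the output come from print(line): each line still ends in "\n", and print adds another. A small demonstration, using io.StringIO with made-up contents as a stand-in for the opened file:

```python
import io

# Stand-in for the opened geneInfo file (hypothetical contents).
gene_info = io.StringIO("S1\t+\nS2\t-\nS3\t+\n")

first_pass = [line for line in gene_info]   # reads every line
second_pass = [line for line in gene_info]  # the iterator is already at the end

print(len(first_pass), len(second_pass))  # 3 0

gene_info.seek(0)  # rewind to the start so the file can be read again
print(len(list(gene_info)))  # 3

# Each line keeps its trailing newline, so print(line) adds a blank line.
print(repr(first_pass[0]))  # 'S1\t+\n'
```

Rewinding with seek(0) before each inner loop would work, but reading geneInfo into a list or dictionary once, before the outer loop, avoids re-reading the file for every IG_ line.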


What 2 files do you start with?


"where origin is the mixed list and geneInfo has only the names not IG_names"

So origin is the first example, and geneInfo has everything except the ones which start with IG.


What others are saying is that you should show just a few lines of each input file, then show the exact command you use to run the script. These two ingredients are necessary to troubleshoot.


Sorry, I should have made my other file clearer! The second file looks like this:

S1  +
S2  -
S3  +
S4  -
dnaA    +
8.0 years ago
Zaag ▴ 870

You need to read the file twice, and there is a nested loop, so there are probably better ways to do this, but here is one:

NIG = []  # [name, strand] pairs for every non-IG_ line
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if not line.startswith('IG_'):
            name, strand = line.split()
            NIG.append([name, strand])

with open('ig_inout.txt') as f:
    for line in f:
        line = line.strip()  # also removes the newline, so print() adds no blank lines

        if line.startswith('IG_'):
            # print the IG_ line followed by the strand of the matching name
            [print(line,  i[1]) for i in NIG if i[0] == line.split()[0].split('_')[1] ]
        else:
            print(line)

Still, it gives me this output:

S1      +
IG_S1   0 +
S2      -
IG_S3   0 +
S3      +
S4      -
dnaA    +
IG_dnaA 0 +

Excellent! Thank you!

Would you mind explaining the second loop?

[print(line,  i[1]) for i in NIG if i[0] == line.split()[0].split('_')[1] ]

this is confusing me!


You can write it like this:

for i in NIG:
    if i[0] == line.split()[0].split('_')[1]:
        print(line,  i[1])

In words:

for each entry without IG_:
    if NAME is the same as the part of IG_NAME after the underscore:
        print the line of the file and the + or the -
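To unpack the condition, here is the chained expression evaluated one step at a time on the line IG_S1   0. The NIG contents below are hypothetical, mirroring the example files:

```python
line = "IG_S1   0"

fields = line.split()       # ['IG_S1', '0']  (split on any whitespace)
ig_name = fields[0]         # 'IG_S1'
parts = ig_name.split('_')  # ['IG', 'S1']
gene = parts[1]             # 'S1'

# The same lookup written as a plain loop, with hypothetical NIG contents:
NIG = [['S1', '+'], ['S2', '-'], ['S3', '+']]
for i in NIG:
    if i[0] == gene:
        print(line, i[1])   # prints: IG_S1   0 +
```

If a gene name itself contains an underscore (say IG_abc_1), split('_')[1] returns only 'abc'; split('_', 1)[1] keeps 'abc_1'.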

Hi Zaag, why not use a dictionary? I reused most of your code: (untested!)

stranddict = {}
with open('input.txt') as f:
    for line in f:
        if not line.startswith('IG_'):
            name, strand = line.split()  # split on tabs or spaces
            stranddict[name] = strand

with open('ig_inout.txt') as f:
    for line in f:
        line = line.rstrip('\n')  # keep print() from adding blank lines
        if line.startswith('IG_'):
            ig_name = line.split()[0]
            print(ig_name + "\t" + stranddict[ig_name.replace("IG_", "")])
        else:
            print(line)
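One behavioral difference from the list version is worth knowing: if an IG_ name has no matching entry in input.txt, Zaag's list comprehension prints nothing for that line (the IG_ line silently disappears), whereas stranddict[...] raises KeyError and stops the script. A minimal illustration with hypothetical dictionary contents; dict.get with a default such as '?' is one option if you would rather keep going:

```python
stranddict = {'S1': '+', 'S2': '-'}  # hypothetical contents

print(stranddict['S1'])           # +
print(stranddict.get('S9', '?'))  # ? (no KeyError for a missing gene)

try:
    stranddict['S9']
except KeyError as err:
    print('missing gene:', err)   # missing gene: 'S9'
```

Which behavior is better depends on whether a missing gene should be treated as an error in the input files.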