Hi there, I have two files. File 1 looks like this:
NP_208181.1
NP_220259.1
NP_224629.1
WP_232131
WP_3432434
WP_2441241221
File 2 looks like this:
NP_208181.1,GCF_000008525.1
NP_212206.1,GCF_000008685.2
NP_213866.1,GCF_000008625.1
NP_219784.1,GCF_000008725.1
NP_220151.1,GCF_000008725.1
NP_220259.1,GCF_000008725.1
NP_224628.1,GCF_000008745.1
NP_224629.1,GCF_000008745.1
NP_224939.1,GCF_000008745.1
My goal is to find which IDs in file 1 also appear in file 2. Here we can see that NP_208181.1, NP_220259.1 and NP_224629.1 can be found in file 2, each followed by a GCF accession. I wrote a small script like this:
import re
with open("file1") as ID, open("file2") as data:
    for line1, line2 in zip(ID, data):
        if line1 in line2:
            print(line1)
However, the output was blank, which does not make sense to me. Does anyone know why? How should I modify this script?
Without testing it, I think the problem is that zip pairs the lines up from each file, so it's only comparing line 1 of file 1 with line 1 of file 2, then line 2 with line 2, etc. You'll need two loops for this to work as you've got it: an outer one over file 1 and an inner one over file 2, so that every ID is checked against every line.
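A minimal sketch of that two-loop version, untested against your real files. The function name find_shared_ids is made up for illustration. It also guards against two things that could leave the output blank even with nested loops: each line read from a file keeps its trailing newline (so "NP_208181.1\n" is never a substring of "NP_208181.1,GCF_000008525.1\n"), and a file object can only be iterated once, so an inner loop over the open file finds nothing after the first pass.

```python
def find_shared_ids(id_lines, data_lines):
    """Return the IDs from id_lines that occur in at least one line of data_lines."""
    # Materialise file 2 once; a file object is exhausted after one pass,
    # so it cannot be re-read inside the outer loop.
    records = [line.strip() for line in data_lines]

    shared = []
    for raw_id in id_lines:
        wanted = raw_id.strip()  # drop the trailing newline
        if not wanted:
            continue  # skip blank lines
        for record in records:
            if wanted in record:  # substring test, as in the original script
                shared.append(wanted)
                break  # one hit is enough for this ID
    return shared


# Small demo using a few lines from the question instead of real files.
file1 = ["NP_208181.1\n", "NP_220259.1\n", "WP_232131\n"]
file2 = [
    "NP_208181.1,GCF_000008525.1\n",
    "NP_220259.1,GCF_000008725.1\n",
    "NP_224628.1,GCF_000008745.1\n",
]
for match in find_shared_ids(file1, file2):
    print(match)
```

With the real files you could pass the open file objects directly, e.g. find_shared_ids(ID, data) inside your existing with block, since both arguments only need to be iterables of lines.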
I'd look into using the any and all Python built-in functions though, they may help here. If you're not bothered about using Python specifically, you could do this in a single line (sort of) with grep:
Hi, thanks for the correction, but I tried it and the output is still blank. Here is my new code:
I believe this answer is important: A: Technical question about python "to find the strings"
what about comm?
Fixed some duff logic in my answer, it should work for your case now.
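For what it's worth, here is a rough sketch of how the any built-in mentioned in the answer could be used, assuming the ID is always the first comma-separated field in file 2. The function names shared_ids_exact and has_any_shared_id are hypothetical. Unlike a substring test, this compares whole IDs, so a shorter ID that happens to be a prefix of a longer one would not count as a match.

```python
def first_column(data_lines):
    """Collect the first comma-separated field of every non-blank line."""
    return {line.split(",", 1)[0].strip() for line in data_lines if line.strip()}


def shared_ids_exact(id_lines, data_lines):
    """Return the IDs from id_lines that exactly match a first-column ID."""
    known = first_column(data_lines)
    stripped = (line.strip() for line in id_lines)
    return [wanted for wanted in stripped if wanted in known]


def has_any_shared_id(id_lines, data_lines):
    """True if at least one ID from id_lines appears in the first column."""
    known = first_column(data_lines)
    return any(line.strip() in known for line in id_lines)
```

Building the set once means each ID lookup is a hash check rather than a scan over every line of file 2, which may matter if the files are large.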