Python: count how many lines have a specific word
9.7 years ago
Illinu ▴ 110

I don't know what's wrong with this code. I want to count, from a BLAST report, how many hits correspond to a specific species (the keyword). My idea is to loop through each line and, when the keyword is found, add 1 to the count and move on to the next line, because the species name is sometimes present more than once in the subject name.

#!/usr/bin/python
import sys
      ##usage: python filterbyword.py file keyword
file = open(sys.argv[1],'r')
keyword = sys.argv[2]
count = 0
for line in file:
    while True:
        if keyword not in line:
            continue
        else:
            break
    count = count + 1
print count

what's wrong with `grep` ?

I was gonna ask that (along with why not awk for more advanced grepping), but maybe OP wishes to add this functionality as a module to existing code?

OMG you are right! How didn't I think about grep. Shame on me!!

9.7 years ago

The problem is, specifically, that your while loop is inside the for loop.

So if the keyword is not in the line, then your script keeps iterating through the while loop, and it appears to "hang".

All you have to do is take out the while loop and test directly:

for line in file:
    if keyword in line:
        count = count + 1
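
For reference, here is a minimal self-contained sketch of the corrected script, written for Python 3. The function name `count_matching_lines` is made up for illustration and is not part of the original post; the command-line usage follows the `file keyword` convention from the question.

```python
#!/usr/bin/env python3
"""Count the lines of a file that contain a keyword (once per line)."""
import sys


def count_matching_lines(path, keyword):
    # Hypothetical helper name, not part of the original script.
    count = 0
    with open(path) as handle:
        for line in handle:
            # `in` is a yes/no substring test, so a line adds at most 1.
            if keyword in line:
                count += 1
    return count


if __name__ == "__main__":
    # usage: python filterbyword.py file keyword
    if len(sys.argv) != 3:
        print("usage: python filterbyword.py file keyword", file=sys.stderr)
    else:
        print(count_matching_lines(sys.argv[1], sys.argv[2]))
```

Opening the file in a `with` block closes it automatically once the loop finishes.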

Ok, thanks, I did this but the problem is I don't know if it is adding up each time the word is in the line or if it would add up only once and jump to the next line.

The test should only be done once per line, but you can verify this by trying it out with test input that contains multiple instances of a keyword on a line, and seeing if the final count matches what you expect.
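
As a sketch of such a check, assuming made-up BLAST-like lines (the sample text and variable names are illustrative only): the first line mentions the keyword twice, yet it contributes only 1 to the per-line count. `str.count` is included purely for contrast, since it tallies occurrences rather than lines.

```python
# Made-up test input; the first line contains the keyword twice.
lines = [
    "hit1 Escherichia coli K-12 | Escherichia coli plasmid\n",
    "hit2 Salmonella enterica\n",
    "hit3 Escherichia coli O157\n",
]
keyword = "Escherichia coli"

lines_with_keyword = 0
total_occurrences = 0
for line in lines:
    if keyword in line:
        lines_with_keyword += 1  # at most once per line
    total_occurrences += line.count(keyword)  # every occurrence

print(lines_with_keyword)  # 2
print(total_occurrences)   # 3
```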

Exactly as Alex says. Your whole mini-script can even just be compacted into:

import sys

with open(sys.argv[1]) as f:
    count = sum(sys.argv[2] in line for line in f)
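
The one-liner works because a test like `keyword in line` evaluates to `True` or `False`, and Python's `bool` is a subclass of `int`, so `sum` adds 1 for each matching line. A small sketch, using `io.StringIO` as a stand-in for the opened file (the sample text is invented):

```python
import io

# io.StringIO stands in for the real file; the content is invented.
f = io.StringIO("alpha beta alpha\ngamma\nalpha\n")
keyword = "alpha"

# Each `keyword in line` is True (1) or False (0), whatever the number
# of occurrences in that line, so the sum is the number of matching lines.
count = sum(keyword in line for line in f)
print(count)  # 2
```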
9.7 years ago

What is happening with this script? If it is not looping again on the next search, try the seek() method, something like file.seek(0).

(Haven't tested it) but for line in file should iterate through the file.

Why would you seek to the start of a file on each iteration?
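
To illustrate when `seek(0)` actually matters, here is a sketch with `io.StringIO` standing in for the file (invented content). A file object is exhausted after one full pass, so rewinding is only needed before a second pass over the same handle, not on every iteration.

```python
import io

handle = io.StringIO("line with word\nanother line\nword again\n")

first_pass = sum("word" in line for line in handle)

# The handle is now at end of file, so a second loop sees nothing.
second_pass = sum("word" in line for line in handle)

# Rewind once, before the new pass, to read the lines again.
handle.seek(0)
third_pass = sum("word" in line for line in handle)

print(first_pass, second_pass, third_pass)  # 2 0 2
```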

9.7 years ago

In your script the continue sits inside while True, so it restarts the while loop instead of moving on to the next line, and a line without the keyword never finishes. Maybe you want:

#!/usr/bin/python
import sys
      ##usage: python filterbyword.py file keyword
file = open(sys.argv[1],'r')
keyword = sys.argv[2]
count = 0
for line in file:
    if keyword not in line:
        continue    # skip to the next line of the file
    count += 1
print(count)

Or even simpler:

for line in file:
    if keyword in line:
        count += 1
print(count)

The simpler one is what I was using but then I got confused about whether there would be an addition for each time the word appears in the line. I don't want that, I want to have one count for each line with the word, not a count for each time the word is in the file.

Just try it. The test if keyword in line only checks whether the keyword occurs in the line; it doesn't count how many times the keyword is found in the line.
