Question

Extract the targeted text using Python


7.4 years ago

horsedog ▴ 60

Hi, I'm a beginner in Python and have a very basic question about extracting targeted text. I have thousands of strings like this:

>ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]

I only need WP_070076791.1, so I wrote a script in Python:

data = open("data.fasta").read()
import re
for line in data:
    start = line.startswith(">ref|")
    end = line.endswith("| ")
    number = re.search(r'start(.*?)end',line)
print(number)

But it just prints "None". Does anybody have an idea why?
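For reference, here is a minimal sketch of what the regex in the question was presumably aiming for. It assumes every wanted header starts with `>ref|` and that the ID ends at the next `|`. Two things explain the "None": the pattern `r'start(.*?)end'` matches the literal words "start" and "end", not the variables of those names, and iterating over the string returned by `.read()` yields single characters, not lines. The function name `extract_ids` and the sample sequence line are hypothetical.

```python
import re

# Capture everything between '>ref|' and the next '|' at the start of a line.
# The pipes are escaped because '|' means "or" in a regex.
HEADER_ID = re.compile(r'^>ref\|([^|]+)\|')


def extract_ids(lines):
    """Return the accession from each '>ref|...|' header line; skip other lines."""
    ids = []
    for line in lines:
        match = HEADER_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


example = [
    ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]\n",
    "MSTKLLVIGAGG\n",  # hypothetical sequence line, ignored by the regex
]
print(extract_ids(example))  # ['WP_070076791.1']
```

To run it on a real file, pass the open file object, e.g. `with open("data.fasta") as f: print(extract_ids(f))`, since iterating over a file yields lines rather than characters.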


updated 7.4 years ago by WouterDeCoster 47k • written 7.4 years ago by horsedog ▴ 60


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

7.4 years ago by WouterDeCoster 47k

score 1 · Answer 1 · 2017-10-27

Do you only need the field that sits in the same position as WP_070076791.1 in each FASTA header?

with open('data.fasta', 'r') as f:
    for line in f:
        # header lines start with '>'; the ID is the second '|'-separated field
        if line.startswith('>'):
            print(line.strip().split('|')[1])

If this isn't an assignment and you can use other tools:

grep -e '^>' data.fasta | cut -f 2 -d '|'

score 1 · Answer 2 · 2017-10-27

If you don't need to use Python, you can use grep with awk:

$ grep '^>' data.fasta | awk -v FS="|" '{ print $2; }' > result.txt

If you have to use Python:

#!/usr/bin/env python

import sys

for line in sys.stdin:
    if line.startswith('>'):
        line = line.strip()
        elems = line.split('|')
        sys.stdout.write("%s\n" % (elems[1]))

You could use it like so:

$ ./filter.py < data.fasta > result.txt

score 0 · Answer 3 · 2017-10-27


7.4 years ago

Pierre Lindenbaum 165k

start = line.startswith(">ref|")

startswith returns a boolean, not an index/integer: https://docs.python.org/2/library/stdtypes.html

I think you're looking for find.
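A minimal sketch of the find-based approach, assuming the ID sits between `>ref|` and the next `|`. `str.find` returns the index where a substring begins, or -1 if it is absent, so slicing between the two indices yields the ID. The function name `extract_with_find` is hypothetical. Note that `find` locates the prefix anywhere in the line, not only at the start.

```python
def extract_with_find(line, prefix=">ref|"):
    """Return the text between prefix and the next '|', or None if either is missing."""
    start = line.find(prefix)       # index where the prefix begins, or -1
    if start == -1:
        return None
    start += len(prefix)            # move past the prefix itself
    end = line.find("|", start)     # index of the next '|' after the ID, or -1
    if end == -1:
        return None
    return line[start:end]


header = ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]"
print(extract_with_find(header))  # WP_070076791.1
```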
