Extract the targeted text using Python
7.2 years ago
horsedog ▴ 60

Hi, I'm a beginner in Python, and I have a very basic question about extracting targeted text. I have thousands of strings like this:

>ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]

Here I only need WP_070076791.1, so I wrote a script in Python:

data = open("data.fasta").read()
import re
for line in data:
 start = line.startswith(">ref|")
 end = line.endswith("| ")
 number = re.search(r'start(.*?)end',line)
print(number)

But it prints "None". Does anybody have an idea why?

python • 1.6k views

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

7.2 years ago
st.ph.n ★ 2.7k

Do you only need the field of each FASTA header that sits in the same position as WP_070076791.1?

with open('data.fasta', 'r') as f:
    for line in f:
        # Header lines start with '>'; the ID is the second '|'-separated field
        if line.startswith('>'):
            print(line.strip().split('|')[1])

If this isn't an assignment and you can use other tools:

grep -e '>' data.fasta | cut -f 2 -d '|'
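
Since the original attempt used re, a regex version may also be useful. This is only a minimal sketch: the names HEADER_RE and extract_accession are hypothetical, and it assumes every header of interest follows the ">ref|ID| description" layout from the question.

```python
import re

# Matches ">ref|<ID>|" at the start of a header line and captures the ID.
# Assumption: headers look like ">ref|WP_070076791.1| description".
HEADER_RE = re.compile(r'^>ref\|([^|]+)\|')


def extract_accession(line):
    """Return the ID from a header line, or None if the line does not match."""
    match = HEADER_RE.match(line)
    return match.group(1) if match else None


header = ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]"
print(extract_accession(header))
```

Note that the pipe has to be escaped as \| in the pattern, because an unescaped | means "or" in a regular expression.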
7.2 years ago

If you don't need to use Python, you can use grep with awk:

$ grep '^>' data.fasta | awk -v FS="|" '{ print $2; }' > result.txt

If you have to use Python:

#!/usr/bin/env python

import sys

for line in sys.stdin:
    if line.startswith('>'):
        line = line.strip()
        elems = line.split('|')
        sys.stdout.write("%s\n" % (elems[1]))

You could use it like so:

$ ./filter.py < data.fasta > result.txt
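
If some headers might not contain a '|' at all, indexing the second field directly would raise an IndexError. A slightly more defensive variant could look like the sketch below. The helper name second_field is hypothetical, and skipping such headers (rather than reporting them) is an assumption about what you want.

```python
def second_field(line, sep='|'):
    """Return the second sep-delimited field of a header line, or None.

    None is returned for non-header lines and for headers with no separator.
    """
    if not line.startswith('>'):
        return None
    elems = line.strip().split(sep)
    return elems[1] if len(elems) > 1 else None


lines = [
    ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]\n",
    "MKVLAAGIVG\n",          # sequence line, ignored
    ">header_without_pipes\n",  # header with no '|', skipped
]
ids = [second_field(l) for l in lines if second_field(l) is not None]
print(ids)
```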
7.2 years ago
start = line.startswith(">ref|")

startswith returns a boolean, not an index/integer: https://docs.python.org/2/library/stdtypes.html

I think you're looking for find.
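
To illustrate, here is a rough sketch of how str.find could be used for this. The function name accession_with_find and its parameters are hypothetical, not something from the original script. Two other things are worth noting about that script: the pattern r'start(.*?)end' searches for the literal words "start" and "end" rather than the variables, and looping over the result of read() walks through the file one character at a time, not one line at a time.

```python
def accession_with_find(line, prefix=">ref|", terminator="|"):
    """Return the text between prefix and the next terminator, or None."""
    start = line.find(prefix)        # index of the prefix, or -1 if absent
    if start == -1:
        return None
    start += len(prefix)             # move past the prefix itself
    end = line.find(terminator, start)  # first terminator after the prefix
    if end == -1:
        return None
    return line[start:end]


header = ">ref|WP_070076791.1| iron-sulfur protein [Acinetobacter proteolyticus]"
print(accession_with_find(header))
```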
