Question

Removing gap using array

7.0 years ago

skjobs1234 ▴ 40

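#!/usr/bin/python

import sys

inp = sys.argv[1]
output = sys.argv[2]

# Find the largest number of all-gap lines in any single record.
count = 0
tempcount = 0
with open(inp, 'r') as file:
    for word in file:
        if word[0] == '>':
            if count < tempcount:
                count = tempcount
            tempcount = 0
        elif set(word) == {'-', '\n'}:
            # Counts whole lines made only of '-', not individual gap characters.
            tempcount = tempcount + 1
if count < tempcount:
    count = tempcount

with open(inp, 'r') as file2:
    words = file2.read().split('>')[1:]

# Drop the first `count` sequence lines of every record (index 0 is the header).
lines = []  # renamed from `list` so the built-in is not shadowed
start = count + 1
for fa in words:
    fasta = fa.split('\n')
    lines = lines + ['>' + fasta[0]] + fasta[start:]
lines = [line + '\n' for line in lines if line]
with open(output, 'w') as out:
    out.writelines(lines)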

I would like to remove gaps character by character, but this script removes them line by line. My objective is to remove a gap only when it is a run of more than 20 gap characters (-) in the template sequences.

software error alignment sequence • 1.6k views

ADD COMMENT • link updated 6.9 years ago by Biostar 20 • written 7.0 years ago by skjobs1234 ▴ 40

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

This happens because 'for word in file' iterates over the file line by line, so you are counting lines rather than gap characters.

Maybe you could try this way:

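remove all the '\n' characters inside each FASTA sequence, so that a gap wrapped across lines becomes one continuous run
write a regex that matches a run of more than 20 '-', such as -{21,} (note that --------------------[-]* also matches a run of exactly 20)
find all the matches and delete them.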
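
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested replacement for the original script: the function name strip_long_gaps, the width parameter and its default of 60 are hypothetical, and it assumes that "more than 20 gaps" means a run of at least 21 consecutive '-' characters. Like the original script, it splits records on '>', so it assumes that '>' never appears inside a header.

```python
import re

# Assumption: "more than 20 gaps" means a run of at least 21 consecutive '-'.
LONG_GAP = re.compile(r"-{21,}")


def strip_long_gaps(fasta_text, width=60):
    """Return FASTA text with every run of more than 20 '-' removed.

    `width` is a hypothetical re-wrapping width for the output sequence lines.
    """
    out = []
    for record in fasta_text.split(">")[1:]:
        header, _, body = record.partition("\n")
        # Step 1: join the wrapped lines so a gap spanning lines is one run.
        seq = "".join(body.split())
        # Steps 2 and 3: find every long gap run and delete it.
        seq = LONG_GAP.sub("", seq)
        out.append(">" + header)
        # Re-wrap the cleaned sequence at a fixed width.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

To use it on files, read the input with open(inp).read(), pass the text to strip_long_gaps, and write the returned string to the output file. Shorter gaps (20 or fewer '-') are left untouched.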

ADD REPLY • link 7.0 years ago by shoujun.gu ▴ 380

Could you please modify this script?

ADD REPLY • link 7.0 years ago by skjobs1234 ▴ 40

I have tried, but I am not getting a solution.

ADD REPLY • link 7.0 years ago by skjobs1234 ▴ 40

Hello skjobs1234!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/47508097/

This is typically not recommended, as it risks annoying people in both communities.

ADD REPLY • link 7.0 years ago by Pierre Lindenbaum 164k