Remove sequence by coordinates with Biopython
4.8 years ago
Chvatil ▴ 130

Hello,

I have a FASTA file, sequence.fasta, containing a sequence such as:

>sequence1
AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG

which I load with:

from Bio import SeqIO
record_dict = SeqIO.to_dict(SeqIO.parse("sequence.fasta", "fasta"))

I know how to select a specific part of this sequence by its coordinates:

print(record_dict["sequence1"].seq[coordinate_start:coordinate_end])
print(record_dict["sequence1"].seq[3:7])

and I get :

CCCG

But what if I would like to remove this part from the sequence

>sequence1 
AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG

and get

>sequence1 
AAAGGTTTAAACCCGGGTTTGGGTTTGGG

Does someone have an idea?

Thanks for your help

Here is a better example:

ACCGCTTTGAATCCGAGCTAG
           ---- ----

I want to remove two parts:

TCCG and GCTA, which correspond to the coordinates

11:14 and 16:19 (0-based, with the end position included)

In the end, I would like to remove both and get:

>seq
ACCGCTTTGAAAG
biopython fasta • 1.4k views

If it were a string I'd say

print(record_dict["sequence1"].seq[:3] + record_dict["sequence1"].seq[7:])

But I don't know if + works for SeqIO records.
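For what it's worth, Biopython's Seq objects support slicing and + much like Python strings, so the same expression should carry over. Here is a minimal sketch on a plain string, so that it runs without Biopython installed. The helper name remove_span is made up for illustration and is not part of Biopython.

```python
def remove_span(seq, start, end):
    """Return seq without seq[start:end] (end is exclusive, as in Python slicing)."""
    return seq[:start] + seq[end:]


# Plain string standing in for record_dict["sequence1"].seq (an assumption
# made only to keep the sketch self-contained).
sequence = "AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG"

removed = sequence[3:7]                # the part that gets cut out
trimmed = remove_span(sequence, 3, 7)  # everything else, joined back together

print(removed)
print(trimmed)
```

Note that the slice 3:7 covers four bases, positions 3 to 6, because the end position is excluded.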


I see what you mean, but this is an easy example. In the real data I can have thousands of coordinates, so I added another example to show you.


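Assuming to_remove is a sorted list of non-overlapping (start, end) tuples for the regions you wish to remove, with end exclusive as in Python slicing:

# Each removed region starts where a kept piece ends, and vice versa.
ends, starts = zip(*to_remove)

# Everything before the first removed region.
final_seq = record_dict["sequence1"].seq[:ends[0]]

# The pieces between consecutive removed regions.
for start, end in zip(starts[:-1], ends[1:]):
    final_seq += record_dict["sequence1"].seq[start:end]

# Everything after the last removed region.
final_seq += record_dict["sequence1"].seq[starts[-1]:]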

Again assuming you can just add seq records like that.
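With thousands of coordinates, it may be cheaper to collect the kept pieces and join them once. The sketch below does that on a plain string so it runs without Biopython. The function name remove_intervals and its inclusive_end flag are hypothetical, introduced only for this example. It assumes 0-based coordinates and sorts them itself.

```python
def remove_intervals(seq, intervals, inclusive_end=False):
    """Return seq as a str with every (start, end) interval cut out.

    Coordinates are 0-based. With inclusive_end=True the end position is
    removed as well (as in the 11:14 / 16:19 example); otherwise end is
    exclusive, as in Python slicing.
    """
    kept = []
    cursor = 0  # first position not yet kept or removed
    for start, end in sorted(intervals):
        if inclusive_end:
            end += 1
        if start > cursor:
            kept.append(str(seq[cursor:start]))
        # max() keeps the cursor moving forward if intervals overlap.
        cursor = max(cursor, end)
    kept.append(str(seq[cursor:]))
    return "".join(kept)


result = remove_intervals(
    "ACCGCTTTGAATCCGAGCTAG", [(11, 14), (16, 19)], inclusive_end=True
)
print(result)
```

The function returns a plain string. If a Seq object is needed afterwards, the result could be wrapped with Seq(result) from Bio.Seq.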


Thank you for your help.
