Suppose your excel file looks like this: (ideally just remove columns other than ids and start/end values - and save as text tab-delimited (my_coords.txt). As Pierre said, don't use excel)
id1 34 4000
id2 45 3156
id3 33 3764
And your fasta looks like this (if you have a multi-line fasta, linearize it):
>id1
sequence
>id2
sequence
>id3
#!/usr/bin/env python
with open('my_coords.txt', 'r') as f1:
pos = {}
for line in f1:
pos[line.strip().split('\t')[0]] = (int(line.strip().split('\t')[1]), int(line.strip().split('\t')[2]))
with open('my_fasta.fasta', 'r') as f2:
seqs = {}
for line in f2:
if line.startswith('>'):
seqs[line.strip().split('>')[1]] = next(f).strip()
with open('my_fasta_trimmed.fasta', 'w') as out:
for i in seqs:
out.write('>' + i, '\n', seqs[pos[i][0]:pos[i][1]])
Condensed, write directly to output:
#!/usr/bin/env python
with open('my_coords.txt', 'r') as f1:
pos = {}
for line in f1:
pos[line.strip().split('\t')[0]] = (int(line.strip().split('\t')[1]), int(line.strip().split('\t')[2]))
with open('my_fasta.fasta', 'r') as f2:
with open('my_fasta_trimmed.fasta', 'w') as out:
for line in f2:
if line.startswith('>'):
out.write(line.strip(), '\n', next(f).strip()[pos[line.strip().split('>')[1]][0]:pos[line.stripI().split('>')[1]][1])
and please, don't use excel.
did you search this site for a similar question ?
Are you asking someone to do it for you in python?