Extract a list of positions from a FASTA file with Biopython

0


7.5 years ago

s.i.lipworth • 0

I have a list of positions of interest, e.g.:

I want to extract the base call at these positions from a fasta file using biopython. This is what I have tried:

query_dic = {}
with open(line) as pos_file:
    for x in pos_file:
        for seq_record in SeqIO.parse(query_file, "fasta"):
            nuc = seq_record[x]
            query_dic[x] = nuc
The error message says 'invalid index' - what is wrong?

biopython python • 3.6k views

ADD COMMENT • link 7.5 years ago by s.i.lipworth • 0

2


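Steps:

Read the positions into a list of integers. A raw line from the positions file is a string, and indexing a record with a string is what triggers the 'invalid index' error.

Iterate over the FASTA records:

from Bio import SeqIO

for seq_record in SeqIO.parse(query_file, "fasta"):
    for x in positions:
        # x is assumed to be 1-based; Python indexing is 0-based
        base = seq_record.seq[x - 1]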
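Put together, the two steps might look like the sketch below. It assumes the positions file holds one 1-based integer per line; the helper names (read_positions, bases_at, extract_from_fasta) are made up for illustration and are not part of Biopython.

```python
def read_positions(lines):
    """Parse one 1-based position per line, skipping blank lines."""
    # int() tolerates surrounding whitespace, including the trailing newline
    return [int(line) for line in lines if line.strip()]


def bases_at(sequence, positions):
    """Map each 1-based position to the base found there."""
    return {pos: sequence[pos - 1] for pos in positions}


def extract_from_fasta(fasta_path, positions_path):
    """Return {record id: {position: base}} for every record in the FASTA file."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import SeqIO

    with open(positions_path) as handle:
        positions = read_positions(handle)
    return {
        record.id: bases_at(record.seq, positions)
        for record in SeqIO.parse(fasta_path, "fasta")
    }
```

Reading the positions once, before looping over the records, also avoids re-parsing the FASTA file for every position as the code in the question does. A position beyond the end of a sequence raises IndexError here; whether to skip or report such positions is left open.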

ADD REPLY • link 7.5 years ago by shenwei356 8.7k

0


First, select the correct chromosome (i.e. the right FASTA record); then extract the base from that record's sequence.

ADD REPLY • link 7.5 years ago by Ben ▴ 60

0


Does your FASTA file have one sequence in it, or many?

If one, you only need to open the FASTA file once, and you should use SeqIO.read for that.

If many, you need to know which sequence each of the values x refers to. Perhaps SeqIO.index would be useful here for loading the relevant record from a multiple sequence FASTA file?
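As a rough sketch of both cases: SeqIO.read returns the single record (and raises ValueError if the file does not contain exactly one), while SeqIO.index gives dictionary-style access by record id. The "record_id position" line format and the function names below are assumptions made for illustration, since the thread does not say how positions are tied to sequences.

```python
def parse_targets(lines):
    """Parse hypothetical 'record_id<whitespace>position' lines into (id, int) pairs."""
    targets = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record_id, pos = line.split()
        targets.append((record_id, int(pos)))
    return targets


def lookup(sequences, targets):
    """sequences maps record id -> sequence; positions are assumed 1-based."""
    return {(rid, pos): sequences[rid][pos - 1] for rid, pos in targets}


def lookup_single(fasta_path, positions):
    """One-sequence FASTA: load the only record with SeqIO.read."""
    from Bio import SeqIO

    record = SeqIO.read(fasta_path, "fasta")
    return {pos: record.seq[pos - 1] for pos in positions}


def lookup_indexed(fasta_path, targets):
    """Multi-sequence FASTA: fetch each record by id through SeqIO.index."""
    from Bio import SeqIO

    index = SeqIO.index(fasta_path, "fasta")
    try:
        return {(rid, pos): index[rid].seq[pos - 1] for rid, pos in targets}
    finally:
        index.close()  # release the underlying file handle
```

SeqIO.index parses a record only when it is requested, which helps when only a few sequences in a large file are needed.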

ADD REPLY • link 7.4 years ago by Peter 6.0k
