extract list of positions from fasta file biopython
7.5 years ago

I have a list of positions of interest, e.g.:

10
20
1000
4000000

I want to extract the base call at each of these positions from a FASTA file using Biopython. This is what I have tried:

    query_dic = {}
    with open(line) as pos_file:
        for x in pos_file:
            for seq_record in SeqIO.parse(query_file, "fasta"):
                nuc = seq_record[x]
                query_dic[x] = nuc

The error message says 'invalid index'. What is wrong?
biopython python • 3.6k views

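Steps:

  1. Read the positions into a list of integers. Lines read from a file are strings (with a trailing newline), and indexing a record with a string is what raises 'invalid index'.
  2. Iterate over the FASTA records:

    for seq_record in SeqIO.parse(query_file, "fasta"):
        for x in positions:
            # positions are 1-based, Python indexing is 0-based
            base = seq_record.seq[x - 1]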
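
Putting the two steps together, a minimal sketch might look like the following. The helper names `read_positions` and `bases_at` are made up for illustration, the toy data stands in for the real position and FASTA files, and positions are assumed to be 1-based; positions past the end of a record are silently skipped here, which may or may not be what you want.

```python
from io import StringIO

from Bio import SeqIO


def read_positions(handle):
    """Parse one 1-based position per line, skipping blank lines."""
    # int() tolerates the trailing newline that each line carries
    return [int(line) for line in handle if line.strip()]


def bases_at(fasta_handle, positions):
    """Return {record id: {position: base}} for 1-based positions."""
    result = {}
    for seq_record in SeqIO.parse(fasta_handle, "fasta"):
        seq = seq_record.seq
        # Convert each 1-based position to a 0-based index; skip out-of-range ones
        result[seq_record.id] = {
            x: seq[x - 1] for x in positions if 1 <= x <= len(seq)
        }
    return result


# Toy data standing in for the real files (assumption, not from the question)
positions = read_positions(StringIO("1\n3\n8\n"))
fasta = StringIO(">chr_demo\nACGTAC\n")
print(bases_at(fasta, positions))
```

With real files, pass open file handles (or, for `SeqIO.parse`, a filename) instead of the `StringIO` objects.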
    

First, select the right chromosome (that is, the right FASTA record); then extract the base from that record's sequence.


Does your FASTA file have one sequence in it, or many?

If one, you only need to open the FASTA file once, and you should use SeqIO.read for that.
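
For the single-sequence case, a rough sketch could be as follows. The record name, sequence, and positions are invented toy data, and positions are again assumed to be 1-based.

```python
from io import StringIO

from Bio import SeqIO

# Toy single-record FASTA standing in for the real file (assumption)
fasta = StringIO(">genome_demo\nACGTACGTAC\n")

# SeqIO.read raises ValueError unless the input holds exactly one record
record = SeqIO.read(fasta, "fasta")

positions = [1, 4, 10]  # hypothetical 1-based positions
query_dic = {x: record.seq[x - 1] for x in positions}
print(query_dic)
```

The file is parsed once, outside any loop over positions, so the cost does not grow with the number of positions.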

If many, you need to know which sequence each of the values x refers to. Perhaps SeqIO.index would be useful here for loading the relevant record from a multiple sequence FASTA file?
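
For the multi-sequence case, something along these lines might work, assuming each position comes paired with a record ID. The `targets` list, the record names, and the temporary file are all made up for illustration; `SeqIO.index` needs a filename rather than an open handle, hence the temporary file.

```python
import os
import tempfile

from Bio import SeqIO

# Write a toy multi-record FASTA to disk (assumption: stands in for the real file)
fasta_text = ">chr1\nACGTACGT\n>chr2\nTTGACCA\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(fasta_text)
    fasta_path = tmp.name

# Hypothetical (record id, 1-based position) pairs
targets = [("chr1", 2), ("chr2", 3), ("chr2", 7)]

# SeqIO.index gives dictionary-like access and parses a record only on lookup
record_index = SeqIO.index(fasta_path, "fasta")
try:
    calls = {
        (name, pos): record_index[name].seq[pos - 1] for name, pos in targets
    }
finally:
    record_index.close()
    os.remove(fasta_path)

print(calls)
```

Each lookup re-parses the requested record from disk, so for many positions on the same large record it is probably worth fetching the record once and reusing it.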
