Question

Biopython intron-exon boundary nucleotide frequencies

2.6 years ago

ash.maladec • 0

Hello, I have two different arrays, each consisting of 10-nucleotide sequences spanning an intron-exon junction and an exon-intron junction, which I extracted from a GenBank file. The arrays look something like this:

Array1 = [Seq("ATTCATTCGG"), Seq("GGCTAGATTG"), Seq("CATGTAATGC")]

How do I calculate the frequency of each nucleotide in each position of the junction?

python arrays Biopython • 660 views

updated 2.6 years ago by Joe 21k • written 2.6 years ago by ash.maladec • 0

score 0 · Answer 1 · 2022-05-24

Eukaryote genomics is not an area of expertise for me, so apologies if I miss some subtlety with the exons/introns here, but if the task is as simple as it appears, you should be able to do something like this:

from collections import Counter
from Bio.Seq import Seq

Array1 = [Seq("ATTCATTCGG"), Seq("GGCTAGATTG"), Seq("CATGTAATGC")]

for i in zip(*Array1):
    print(Counter(i))

Result:

Counter({'A': 1, 'G': 1, 'C': 1})
Counter({'T': 1, 'G': 1, 'A': 1})
Counter({'T': 2, 'C': 1})
Counter({'C': 1, 'T': 1, 'G': 1})
Counter({'A': 2, 'T': 1})
Counter({'T': 1, 'G': 1, 'A': 1})
Counter({'A': 2, 'T': 1})
Counter({'T': 2, 'C': 1})
Counter({'G': 2, 'T': 1})
Counter({'G': 2, 'C': 1})

This is just the printed representation. If you want to use the dict-like objects produced by Counter, you can collect them in a list, for example with [Counter(col) for col in zip(*Array1)].
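
Since the question asks for frequencies rather than raw counts, here is a minimal sketch of one way to turn the per-position counts into proportions. The names junctions and position_frequencies are made up for illustration, and plain strings stand in for the Seq objects (Seq objects iterate character by character in the same way, so they should drop in unchanged). It assumes all sequences have the same length, because zip stops at the shortest one.

```python
from collections import Counter

# Plain strings stand in for the Seq objects from the question.
junctions = ["ATTCATTCGG", "GGCTAGATTG", "CATGTAATGC"]


def position_frequencies(seqs, alphabet="ACGT"):
    """Return one dict per position mapping each nucleotide to its relative frequency."""
    freqs = []
    # zip(*seqs) yields one tuple per column, i.e. per junction position.
    for column in zip(*seqs):
        counts = Counter(column)
        total = len(column)
        # Counter returns 0 for missing keys, so absent nucleotides get 0.0.
        freqs.append({nt: counts[nt] / total for nt in alphabet})
    return freqs


frequencies = position_frequencies(junctions)
for pos, freq in enumerate(frequencies, start=1):
    print(pos, freq)
```

Each entry of frequencies is a regular dict, so frequencies[2]["T"] gives the proportion of T at the third position. Characters outside the chosen alphabet (such as N) still count towards the total but are not reported, so extend the alphabet if your sequences contain them.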