Biopython intron-exon boundary nucleotide frequencies
2.6 years ago

Hello, I have two arrays of 10-nucleotide sequences, one spanning intron-exon junctions and one spanning exon-intron junctions, which I extracted from a GenBank file. The arrays look something like this:

Array1 = [Seq("ATTCATTCGG"), Seq("GGCTAGATTG"), Seq("CATGTAATGC")]

How do I calculate the frequency of each nucleotide at each position of the junction?

2.6 years ago
Joe 21k

Eukaryote genomics is not an area of expertise for me, so apologies if I miss some subtlety with the exons/introns here, but if the task is as simple as it appears, you should be able to do something like this:

from collections import Counter
from Bio.Seq import Seq

Array1 = [Seq("ATTCATTCGG"), Seq("GGCTAGATTG"), Seq("CATGTAATGC")]

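# zip(*Array1) yields one tuple per position, holding the base found at that
# position in each sequence; Counter tallies how often each base occurs
for i in zip(*Array1):
    print(Counter(i))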

Result:

Counter({'A': 1, 'G': 1, 'C': 1})
Counter({'T': 1, 'G': 1, 'A': 1})
Counter({'T': 2, 'C': 1})
Counter({'C': 1, 'T': 1, 'G': 1})
Counter({'A': 2, 'T': 1})
Counter({'T': 1, 'G': 1, 'A': 1})
Counter({'A': 2, 'T': 1})
Counter({'T': 2, 'C': 1})
Counter({'G': 2, 'T': 1})
Counter({'G': 2, 'C': 1})

This is just the printed representation. If you want to work with the dicts produced by Counter, you can collect them in a list (or similar) instead of printing them.
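Since the question asks for frequencies rather than raw counts, here is a minimal sketch that turns each position's counts into proportions over a fixed A/C/G/T alphabet. `position_frequencies` is a hypothetical helper name, not a Biopython function. It assumes every sequence in an array has the same length and raises an error otherwise, because `zip` silently stops at the shortest sequence. Plain strings are used so the sketch runs without Biopython; `Seq` objects iterate character by character the same way, so `Array1` (or the exon-intron array) can be passed in directly.

```python
from collections import Counter


def position_frequencies(seqs, alphabet="ACGT"):
    """Return one {base: proportion} dict per position (hypothetical helper)."""
    if not seqs:
        raise ValueError("no sequences given")
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        # zip() would silently truncate to the shortest sequence, so refuse instead
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    n = len(seqs)
    table = []
    for column in zip(*seqs):
        counts = Counter(column)
        # bases outside the alphabet (e.g. N) count towards n but are not reported
        table.append({base: counts[base] / n for base in alphabet})
    return table


array1 = ["ATTCATTCGG", "GGCTAGATTG", "CATGTAATGC"]
for pos, freqs in enumerate(position_frequencies(array1), start=1):
    print(pos, freqs)
```

Reporting every base of the alphabet at every position (including zeros) gives a rectangular table, which is convenient if you later want to compare the intron-exon and exon-intron arrays position by position or build a sequence logo.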
