Recording, for each position, how many A/T/C/G (the keys) appear in the input list, in Python?
8.6 years ago
Kevin_Smith ▴ 10

I have a list of k-mers. I want to write a function that takes k-mers of the same length (5) and generates a PFM. The list looks like ["TTTTA","TTTGA"]. That is, for each position, record how many A/T/C/G you see in the input list. So far, I have this:

def make_pfm(sites, k=5):
    pmf1 = []
    pfm = {"A":[1]*k, "T":[1]*k, "C":[1]*k, "G":[1]*k}
    return pmf1

The output that I'm looking for is like:

{'A': [1, 1, 1, 1, 2], 'C': [1, 2, 1, 1, 1], 'T': [2, 1, 2, 1, 1], 'G': [1, 1, 1, 2, 1]}

I would very much appreciate any help. Thanks!

gene sequence alignment blast sequencing • 1.6k views

This sounds a lot like homework...

A = sum([x.count('A') for x in list_of_kmers])
C = sum([x.count('C') for x in list_of_kmers])
G = sum([x.count('G') for x in list_of_kmers])
T = sum([x.count('T') for x in list_of_kmers])

Also, what is PFM? I'm going to guess that my Google result of "Pelvic Floor Muscle" is not accurate.
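Note that the sums above are totals over the whole list, so the positional information the question asks for is lost. A minimal sketch of a per-position tally, assuming all k-mers have the same length (the names `kmers` and `column_counts` are illustrative and not from the thread):

```python
from collections import Counter

kmers = ["TTTTA", "TTTGA"]  # example list from the question

# zip(*kmers) yields one tuple per position (column) across all k-mers
column_counts = [Counter(column) for column in zip(*kmers)]

# Whole-list total, as in the snippet above, for comparison
total_T = sum(kmer.count("T") for kmer in kmers)

print(column_counts[3])  # counts at the fourth position only
print(total_T)           # count of T over every position combined
```

Here `column_counts` holds one `Counter` per position, whereas `total_T` is a single number for the entire list.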


I tried the code below, but the output doesn't make sense. I see: {'A': [1, 1, 1, 1, 1, 1, 1], 'C': [1, 1, 1, 1, 1, 1, 1], 'T': [1, 1, 1, 1, 1, 1, 1], 'G': [1, 1, 1, 2, 1, 1, 1][:0] or [1, 1, 1, 1, 1, 1, 1], (6482, 4427, 4626, 6087): 'AGTCCG\n'} PFM is just what I call the Python dictionary, not pelvic floor muscle lol.

What do you think is wrong? Thanks!

def make_pfm(list_of_kmers, k=5):
    pfm = {"A":[1]*k, "T":[1]*k, "C":[1]*k, "G":[1]*k}
    for x in list_of_kmers:
        A = sum([x.count('A') for x in list_of_kmers])
        C = sum([x.count('C') for x in list_of_kmers])
        G = sum([x.count('G') for x in list_of_kmers])
        T = sum([x.count('T') for x in list_of_kmers])
        pfm[A, C, G, T] = x
    return pfm

print make_pfm(list_of_kmers, k=7)
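In the attempt above, each `sum(...)` runs over the entire list, so `A`, `C`, `G` and `T` are whole-list totals, and `pfm[A, C, G, T] = x` adds a new tuple key instead of updating the per-position lists. That would explain the `(6482, 4427, 4626, 6087): 'AGTCCG\n'` entry and the untouched lists of ones. The trailing `\n` also suggests the k-mers were read from a file without stripping newlines. One possible fix, sketched for Python 3 and keeping the initial count of 1 from the original code (treated here as a pseudocount, which is an assumption):

```python
def make_pfm(list_of_kmers, k=5):
    # Start every position at 1, as in the original attempt (pseudocount)
    pfm = {base: [1] * k for base in "ACGT"}
    for kmer in list_of_kmers:
        kmer = kmer.strip()  # drop a trailing newline if read from a file
        if len(kmer) != k:
            continue  # assumption: skip k-mers of unexpected length
        for position, base in enumerate(kmer):
            if base in pfm:  # assumption: ignore non-ACGT symbols such as N
                pfm[base][position] += 1
    return pfm

print(make_pfm(["TTTTA", "TTTGA"]))
```

For the two example k-mers from the question this prints `{'A': [1, 1, 1, 1, 3], 'C': [1, 1, 1, 1, 1], 'G': [1, 1, 1, 2, 1], 'T': [3, 3, 3, 2, 1]}`. Skipping k-mers whose length differs from `k` is only one way to handle them; raising an error may be preferable if mixed lengths indicate a problem in the input.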
