Generate Random DNA Sequence Data With Equal Base Frequencies in Python
8.0 years ago
elisheva ▴ 120

Hello everybody! I have two questions. 1. I need to generate a random DNA sequence of length 20 kb with equal base frequencies in Python. I tried this function:

from random import choice

def dna(length):
    DNA = ""
    for i in range(length):
        DNA += choice('atcg')  # each base is drawn independently
    return DNA

But it doesn't return equal frequencies for all the bases. Is there any way to do that (nothing too complicated)?

2. I have to calculate the frequency of each base in a given file, but the file is huge, so I have to split it. How can I split the file, pass each piece to a function that calculates the frequencies (I've already written it), and get back the true overall frequencies?

Thanks!!!


How did you assess that your function didn't return equal frequencies?

What is "huge" in your file? Does it contain one enormous sequence or multiple sequences? How is your function written?


About the second question: my file contains only one sequence (a human chromosome). It doesn't matter how my function is written; the problem is how to split the file correctly. But anyway, this is my function:

def bases_freq(dna_seq_file):
    # dna_seq_file is the sequence as a string
    freq = {}  # Create an empty dictionary
    nuc = ['a', 't', 'c', 'g']  # All the nucleotides
    # Compute the frequency of each nucleotide in the sequence
    for base in nuc:
        freq[base] = dna_seq_file.count(base) * 1.0 / len(dna_seq_file)
    freq['gc'] = freq['g'] + freq['c']  # Add GC content
    return freq

Thank you so much! Can anybody help me with the second question?
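
One possible approach to the second question, offered as a sketch rather than a definitive answer: read the file in fixed-size chunks, accumulate raw base counts across the chunks, and divide by the total length only once at the end. Averaging per-chunk frequencies instead would bias the result whenever the chunks differ in size. The function name `bases_freq_from_file` and the `chunk_size` default are hypothetical, and the code assumes Python 3 and a plain-text file that holds only sequence characters and line breaks (no FASTA header line). It also lowercases the input, which is an assumption about how upper-case bases should be treated.

```python
from collections import Counter


def bases_freq_from_file(path, chunk_size=1024 * 1024):
    """Base frequencies for a large sequence file, read one chunk at a time.

    Hypothetical helper: assumes a plain-text file containing only sequence
    characters and line breaks (no FASTA header line).
    """
    counts = Counter()
    total = 0
    with open(path) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty string means end of file
                break
            # Drop line breaks and fold case before counting (assumption)
            chunk = chunk.replace("\n", "").replace("\r", "").lower()
            counts.update(chunk)  # add this chunk's raw character counts
            total += len(chunk)
    if total == 0:
        raise ValueError("no sequence characters found")
    # Divide once at the end so unequal chunk sizes cannot bias the result
    freq = {base: counts[base] / total for base in "atcg"}
    freq["gc"] = freq["g"] + freq["c"]
    return freq
```

As in the `bases_freq` function above, the denominator is the full sequence length, so characters such as `n` lower the reported frequencies of the four bases.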


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread remains logically structured and easy to follow.

8.0 years ago
Steven Lakin ★ 1.8k

"Random DNA sequence" and "equal base frequency" are two different concepts. Drawing each base independently gives frequencies that are equal only in expectation, so the counts in any one sequence will differ slightly. If you definitely want equal base frequencies but in a randomized order, generate a string with 5000 A, 5000 C, 5000 G, and 5000 T, and then randomly shuffle it using the random module:

import random

dna_list = list('ACGT' * 5000)  # 5000 of each base, 20,000 in total
random.shuffle(dna_list)  # shuffles the list in place
result = ''.join(dna_list)
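
The same idea can be wrapped in a function and checked with `collections.Counter`. This is a minimal sketch: the function name `shuffled_dna` and its `seed` parameter are hypothetical additions for illustration, the length is assumed to be a multiple of 4, and "20 kb" is taken to mean 20,000 bases.

```python
import random
from collections import Counter


def shuffled_dna(length, seed=None):
    """Return a randomly ordered DNA string with exactly length/4 of each base.

    Hypothetical helper: the name and the seed parameter are illustrative.
    """
    if length % 4 != 0:
        raise ValueError("length must be a multiple of 4 for exactly equal counts")
    rng = random.Random(seed)  # local generator; seed only for reproducibility
    bases = list("ACGT" * (length // 4))
    rng.shuffle(bases)  # shuffles the list in place
    return "".join(bases)


seq = shuffled_dna(20000, seed=1)
counts = Counter(seq)
print(counts["A"], counts["C"], counts["G"], counts["T"])  # 5000 5000 5000 5000
```

Passing a `seed` makes the output repeatable, which can help when the sequence is used as test data; leaving it as `None` gives a different ordering on each run.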