How to read a DNA sequence for Motif analysis more efficiently?
0
0
Entering edit mode
8.7 years ago
auryndb ▴ 70

I wrote a code in python to read DNA sequences and do a motif alignment on them but I'm looking for a more efficient way to do this. See below if you can help:

handle = open("a.fas.txt", "r")
a = handle.readlines()[1:]
a = ''.join([x.strip() for x in a])
with open("Output.txt", "w") as text_file:
    text_file.write(a)

f = 0
z = 100
b = ''
while f < len(a):
    b += a[f:z]+'\n'
    f += 1
    z += 1
with open("2.txt", "w") as runner_mtfs:
   runner_mtfs.write(b)

I want to do a bunch of analysis on each line of b, but I don't know of any more efficient way to do this. The out put file is more than 500 megabytes. Any suggestions, the first file is just a DNA sequence, and it the first line of code I'm joining all the lines together, and I'm departing 100 base pairs every time so I could do analysis on it.

python • 2.2k views
ADD COMMENT
0
Entering edit mode

Python is pretty slow, particularly at tasks involving I/O and string processing. You'd get a huge speedup using C to process arrays of strings and running an analysis on the substrings within memory (if possible).

ADD REPLY
0
Entering edit mode

So you are generating 100bp fragments from the initial string, with a sliding window of 1bp, to find motifs?

You might be interested in ACGTrie.

It's written in python, but does most of it's work in C arrays using either ctypes / Numpy / CFFI. It's pretty fast if you can use pypy. The end result is table/trie that contains all possible DNA substrings and their counts in a very space efficient format. You could also use a kmer tool for 100bp kmers. That might be the more established path, since ACGTrie is only a proof of concept.

ADD REPLY
0
Entering edit mode

Is there a specific reason you are not using something like the MEME suite (http://meme-suite.org/) for motif finding? In my experience, both the web submit and local version were fast and easy to use.

ADD REPLY

Login before adding your answer.

Traffic: 2333 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6