Question

MEME and MEME-ChIP for Large Datasets


11.8 years ago

ashwini ▴ 100

Hi, I am having problems running MEME on a large dataset (17 MB). The input FASTA file contains:
- 6623 sequences
- maximum sequence length: 9925
- minimum sequence length: 367

The MEME command line is as follows:

    meme in.fa -nmotifs 1 -o out_dir -maxsize 150000000
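The summary figures above can be recomputed from the FASTA file with a short Python sketch. It also reports the total number of residues, which is the quantity MEME's -maxsize option limits (the maximum dataset size in characters, per the MEME documentation). The parser and function names are hypothetical, and the sketch assumes a plain, uncompressed FASTA file.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fasta_stats(records: List[Tuple[str, str]]) -> Dict[str, int]:
    """Number of sequences, min/max length and total residues (what -maxsize limits)."""
    lengths = [len(seq) for _, seq in records]
    return {
        "n_sequences": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "total_residues": sum(lengths),
    }


def summarize_file(path: str) -> Dict[str, int]:
    """Hypothetical helper: read a FASTA file from disk and summarize it."""
    with open(path) as handle:
        return fasta_stats(read_fasta(handle))
```

For example, summarize_file("in.fa")["total_residues"] can be compared against the -maxsize value before launching MEME.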

However, MEME halts after a certain point without any error message.

So I tried MEME-ChIP, which is designed for large datasets. MEME-ChIP works well with default parameters, where it takes 600 random sequences and trims them to the central 100 positions (100 bp). But this is not the outcome I need, so I used the following parameters:

    -ccut 0 -nmeme 6623 -meme-nmotifs 1

Even this fails with the following error message:

    Dataset too large (> 100000). Rerun with larger -maxsize

But there is no option in meme-chip to set -maxsize.
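The arithmetic explains why the defaults work and the modified run does not. With the defaults, 600 sequences × 100 bp = 60,000 characters, below the 100,000 limit quoted in the error message. With -ccut 0 -nmeme 6623 the whole sequences are kept, and even if every sequence were at the minimum length of 367 bp that is at least 6623 × 367 = 2,430,641 characters. The Python sketch below is a simplified model of the two options as described in the question, not MEME-ChIP's own code: the function names are hypothetical, MEME-ChIP's actual sequence selection may differ, and treating a width of 0 as "no trimming" is an assumption mirroring -ccut 0.

```python
import random
from typing import List

# Limit quoted in the MEME-ChIP error message in the question
MEME_DEFAULT_MAXSIZE = 100_000


def central_window(seq: str, width: int) -> str:
    """Central `width` positions of seq; width <= 0 (assumed to mirror -ccut 0) keeps it whole."""
    if width <= 0 or len(seq) <= width:
        return seq
    start = (len(seq) - width) // 2
    return seq[start:start + width]


def prepare_meme_input(seqs: List[str], n: int, width: int, seed: int = 0) -> List[str]:
    """Randomly pick up to n sequences (like -nmeme), then trim each (like -ccut)."""
    rng = random.Random(seed)
    chosen = list(seqs) if len(seqs) <= n else rng.sample(seqs, n)
    return [central_window(s, width) for s in chosen]


def dataset_size(seqs: List[str]) -> int:
    """Total characters in the dataset, the quantity compared against -maxsize."""
    return sum(len(s) for s in seqs)


# Hypothetical input: 6623 sequences, all at the minimum length (367 bp) from the
# question, which gives a lower bound on the real dataset size.
seqs = ["ACGT" * 91 + "ACG"] * 6623

default_size = dataset_size(prepare_meme_input(seqs, 600, 100))  # 600 * 100
full_size = dataset_size(prepare_meme_input(seqs, 6623, 0))  # 6623 * 367
print(default_size, default_size <= MEME_DEFAULT_MAXSIZE)  # 60000 True
print(full_size, full_size <= MEME_DEFAULT_MAXSIZE)  # 2430641 False
```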

Any input that helps me take this forward would be much appreciated.

meme • 5.4k views

updated 11.8 years ago by Ian • written 11.8 years ago by ashwini

score 1 · Answer 1 · 2013-10-23


11.8 years ago

Anthony Mathelier ▴ 910

MEME-ChIP runs MEME on a dataset that is expected to be small, which is why you cannot set the -maxsize argument.

For such a large dataset, I would recommend you use RSAT instead (http://rsat.ulb.ac.be/).

Hope it helps.


score 0 · Answer 2 · 2013-10-23

MEME works well with a well-defined set of sequences, e.g. the 600 most significant ChIP-seq summit regions, each comprising 100 bp centred on the peak summit. For a much larger motif analysis, such as the one you are attempting, Weeder gives a fast and informative answer. It enumerates and counts words in your sequences and compares those counts with background counts from regions upstream of genes. Words of 6 and 8 bp take a trivial amount of time to run, but 12-mers take exponentially longer.
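The word-counting idea can be illustrated with a minimal Python sketch: count every exact k-mer in the input sequences, count the same words in a background set, and rank words by a pseudocount-smoothed frequency ratio. This is not Weeder's algorithm (Weeder also tolerates mismatches and uses its own precomputed background frequencies); the function names and the scoring are illustrative assumptions. The space of possible DNA words grows as 4^k (4,096 for 6-mers, 65,536 for 8-mers, 16,777,216 for 12-mers), which is one reason enumeration over long words gets expensive; this exact-count sketch only touches words that actually occur, so it does not reproduce Weeder's cost profile.

```python
from collections import Counter
from typing import Iterable, List, Tuple

DNA = frozenset("ACGT")


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count every exact k-mer, skipping words that contain N or other non-ACGT symbols."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if DNA.issuperset(word):
                counts[word] += 1
    return counts


def enriched_words(fg: List[str], bg: List[str], k: int, top: int = 5,
                   pseudo: float = 1.0) -> List[Tuple[str, float]]:
    """Rank foreground words by observed frequency / smoothed background frequency."""
    fg_counts = count_kmers(fg, k)
    bg_counts = count_kmers(bg, k)
    fg_total = sum(fg_counts.values()) or 1
    bg_total = sum(bg_counts.values())
    scored = []
    for word, n in fg_counts.items():
        fg_freq = n / fg_total
        # add-pseudocount smoothing over all 4**k possible words
        bg_freq = (bg_counts[word] + pseudo) / (bg_total + pseudo * 4 ** k)
        scored.append((word, fg_freq / bg_freq))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


for k in (6, 8, 12):
    print(k, 4 ** k)  # number of possible DNA words of length k
```

For example, enriched_words(peak_seqs, upstream_seqs, 8) would rank 8-mers, where peak_seqs and upstream_seqs are hypothetical lists of foreground and background sequences.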