Question

Tools for ChIP-seq-Scale De Novo Motif Finding on Unaligned Sequences?

13.5 years ago

2184687-1231-83- ★ 5.1k

Following up on this question: http://biostar.stackexchange.com/questions/598/tools-for-chipseq-scale-motif-finding

I've got a large number of unaligned eukaryotic regulatory sequences, and I want to do de novo motif discovery on them. These unaligned regulatory sequences have already been filtered from reads that have no mapping or that wouldn't make a peak.

I've seen that most tools require aligned sequences and/or only search for a list of predefined motifs.

In its simplest form, what I am looking for is a program that reads file.fa, where file.fa contains ~1M regulatory sequences of 50-200 bp, and produces motif predictions without needing to align them to a reference or scan for known motifs.

Does anybody know of a tool that can handle this amount of unaligned FASTA sequences and do de novo motif discovery?

chip-seq motif denovo • 4.3k views

updated 13.4 years ago by Dataminer ★ 2.8k • written 13.5 years ago by 2184687-1231-83- ★ 5.1k

How large were your ChIP fragments, and how far did you sequence in? As ChIP-seq sequences from the end of your fragment inwards, do you think the unaligned reads will even have the potential regulatory motifs contained within them?

13.5 years ago by Aaron Statham ★ 1.1k

On prokaryotic or eukaryotic data?

13.5 years ago by Pasta ★ 1.3k

These unaligned regulatory sequences have already been filtered from reads that have no mapping or that wouldn't make a peak, so most of the data with no potential has already been filtered out.

13.5 years ago by 2184687-1231-83- ★ 5.1k

It's eukaryotic.

13.5 years ago by 2184687-1231-83- ★ 5.1k

score 3 · Answer 1 · 2011-06-16

13.5 years ago

Amyemilie ▴ 30

Hi,

I'm using GimmeMotifs, a de novo motif prediction pipeline especially suited for ChIP-seq datasets.

It's free and easy to install and run. I also think it is the most precise tool available online.

Good luck :).

http://www.ncmls.eu/bioinfo/gimmemotifs/

This looks interesting, thanks. How robust are its predictions?

13.5 years ago by Alastair Kerr 5.3k

score 1 · Answer 2 · 2011-06-16

13.5 years ago

Mikael Huss 4.8k

I don't quite see why there would be an issue with unaligned reads, as most de novo motif finding algorithms accept FASTA input.

You could try CisFinder or ChIPMunk. The already proposed GimmeMotifs seems nice too.
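
To illustrate the point that alignment isn't required, here is a minimal Python sketch of one naive de novo approach, k-mer enrichment against a shuffled background, that works directly on the sequences in a FASTA file. It is a toy under stated assumptions, not how CisFinder, ChIPMunk or GimmeMotifs work: it counts canonical k-mers (a k-mer and its reverse complement merged) once per sequence, compares those counts against mononucleotide-shuffled copies of the same sequences, and reports the k-mers with the highest foreground/background ratio. All function names (`read_fasta`, `enriched_kmers`, `enriched_kmers_from_file`, etc.) are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

ACGT = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield upper-cased sequences from FASTA lines; headers start with '>'."""
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks).upper()
            chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count in how many sequences each canonical k-mer occurs (once per sequence)."""
    counts: Counter = Counter()
    for s in seqs:
        seen = set()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if ACGT.issuperset(kmer):  # skip k-mers containing N or other symbols
                seen.add(canonical(kmer))
        counts.update(seen)
    return counts


def shuffle_seq(s: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: keeps base composition, destroys order."""
    chars = list(s)
    rng.shuffle(chars)
    return "".join(chars)


def enriched_kmers(seqs: List[str], k: int = 8, top: int = 10,
                   seed: int = 0) -> List[Tuple[str, int, float]]:
    """Rank k-mers by foreground/background occurrence ratio (pseudocount 1)."""
    rng = random.Random(seed)
    fg = count_kmers(seqs, k)
    bg = count_kmers((shuffle_seq(s, rng) for s in seqs), k)
    scored = [((n + 1) / (bg[kmer] + 1), n, kmer) for kmer, n in fg.items()]
    scored.sort(reverse=True)
    return [(kmer, n, round(ratio, 2)) for ratio, n, kmer in scored[:top]]


def enriched_kmers_from_file(path: str, k: int = 8, top: int = 10):
    """Hypothetical convenience wrapper: load all sequences from a FASTA file."""
    with open(path) as fh:
        seqs = list(read_fasta(fh))
    return enriched_kmers(seqs, k=k, top=top)
```

Something like `enriched_kmers_from_file("file.fa", k=8)` would return `(kmer, sequences_containing_it, ratio)` tuples. In pure Python this will be slow on ~1M sequences, and a mononucleotide shuffle is a weak background (dinucleotide-preserving shuffles or genomic background sequences are common alternatives); at best the top k-mers would serve as seeds for a proper motif model such as a PWM.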

In its simplest form, what I am looking for is a program that reads file.fa, where file.fa contains ~1M regulatory sequences of 50-200 bp, and produces motif predictions without needing to align them to a reference or scan for known motifs. Would CisFinder or ChIPMunk work like that?

13.4 years ago by 2184687-1231-83- ★ 5.1k

Yes, although 1 million is a lot. The most I have tried with CisFinder was about 100,000 sequences, which worked well.
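
If a tool struggles with ~1M sequences, one workaround, assuming a random subset is representative of the full set, is to subsample down to roughly the 100,000 sequences mentioned above. Below is a minimal Python sketch using reservoir sampling (Algorithm R), which reads the FASTA file in a single pass and keeps only the sampled records in memory. The names `fasta_records`, `reservoir_sample` and `subsample_fasta` are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple, TypeVar

Record = Tuple[str, str]  # (header without '>', sequence)
T = TypeVar("T")


def fasta_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; lines before the first header are ignored."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(items: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Uniformly sample up to n items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < n:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds: item kept with probability n/(i+1)
            if j < n:
                reservoir[j] = item
    return reservoir


def subsample_fasta(in_path: str, out_path: str, n: int = 100_000,
                    seed: int = 0) -> int:
    """Write a random subset of n FASTA records to out_path; return how many were written."""
    with open(in_path) as fh:
        sample = reservoir_sample(fasta_records(fh), n, seed)
    with open(out_path, "w") as out:
        for header, seq in sample:
            out.write(f">{header}\n{seq}\n")
    return len(sample)
```

For example, `subsample_fasta("file.fa", "subset.fa")` would write 100,000 randomly chosen records. Motifs present in only a small fraction of the sequences may become too rare to detect after subsampling, so this trades sensitivity for tractability.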

13.4 years ago by Mikael Huss 4.8k

score 1 · Answer 3 · 2012-03-30

12.7 years ago

Dataminer ★ 2.8k

Try GimmeMotifs; it is one of the best in the business, and Emilie did an internship working on it. Good luck!
