Question

Best Practices For Using Fimo For Motif Scanning

5

11.0 years ago

UnivStudent ▴ 440

Hi everyone,

I'm having a bit of trouble working out how to use FIMO to find TFBSs. My questions are:

How do you decide on good thresholds for PWM matching (--thresh)?
How much do the backgrounds matter? Should they be calculated from the sequences you're submitting?
How does FIMO handle masked FASTA files? Is it better to submit hard- or soft-masked files?

Any other tips on good usage of this program would be much appreciated, as I'm finding the documentation quite vague.

meme • 7.8k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 11.0 years ago by UnivStudent ▴ 440

Ram · Accepted Answer · 2013-11-19

3

11.0 years ago

Ying W ★ 4.3k

My response is probably not that great, but since nobody else is answering this question:

You probably want a q-value cutoff of something like < 0.05 or < 0.01, since those cutoffs are pretty standard (note that --thresh applies to p-values by default; pass --qv-thresh to apply it to q-values instead).
Backgrounds matter a lot; they should be calculated from the sequences that you are looking for motifs in.
How are you masking the files: with upper and lower case, or with Xs/Ns? You should submit the sequence that you want to look for motifs in. If you are not interested in motifs in repetitive regions, then you should mask them out (with Ns).
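If your file is soft-masked (lowercase repeats) and you want to hard-mask it before scanning, the conversion is simple. Here is a minimal Python sketch, assuming that lowercase letters mark the masked regions and that lines starting with ">" are headers; the function name hard_mask is just for illustration and is not part of FIMO or the MEME Suite:

```python
def hard_mask(fasta_text: str) -> str:
    """Replace lowercase (soft-masked) bases with N; header lines are kept as-is."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # FASTA header: do not touch it, even if it contains lowercase text
            out.append(line)
        else:
            out.append("".join("N" if ch.islower() else ch for ch in line))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy record, made up for this example
    example = ">region1\nACGTacgtACGT\n"
    print(hard_mask(example), end="")
```

Check how your own masking tool encodes repeats before relying on this, since some tools already write Ns or Xs.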

Lastly, have you tried using HOMER? The documentation on it is quite extensive and you might find it easier to use.

ADD COMMENT • link 11.0 years ago by Ying W ★ 4.3k

0

Should you be using genome-wide averages? And does it calculate the background automatically from the sequence files?
Currently I have soft-masked (lowercase) files, but maybe I should consider hard masking.

Also, I'll look into HOMER; the main reason I started trying to use FIMO is that it seems to be the status quo in the literature for this type of thing.

ADD REPLY • link 11.0 years ago by UnivStudent ▴ 440

0

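For 1), I am assuming you are asking about the background. You should not be using genome-wide averages; instead, use a background computed from your sequence file. The FIMO page describes the background file: --bgfile takes a background model file, which you can generate from your sequences with the MEME Suite's fasta-get-markov utility. Note that the keyword form --bgfile motif-file uses the background frequencies stored in the motif file, not ones computed from your sequences.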
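To make the idea concrete: a zero-order background is just the base frequencies of the sequences you are scanning. The Python sketch below illustrates the calculation; it is not a replacement for the MEME Suite's own tooling. The function names, the both_strands option, and the "letter frequency" output layout are my assumptions, so check the output against the background file format in the MEME Suite documentation before passing it to --bgfile.

```python
from collections import Counter


def zero_order_background(fasta_text, both_strands=True):
    """Return A/C/G/T frequencies for the sequences in a FASTA string.

    Ns and other ambiguity codes are ignored. With both_strands=True the
    counts are pooled with their reverse complement, so A == T and C == G.
    """
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip header lines
        counts.update(ch for ch in line.upper() if ch in "ACGT")
    if both_strands:
        at = counts["A"] + counts["T"]
        cg = counts["C"] + counts["G"]
        counts = Counter({"A": at, "T": at, "C": cg, "G": cg})
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: counts[base] / total for base in "ACGT"}


def format_background(freqs):
    """Render frequencies as one 'letter frequency' pair per line (assumed layout)."""
    lines = ["# order 0"]
    lines += [f"{base} {freqs[base]:.6f}" for base in "ACGT"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequences, made up for this example
    fasta = ">peak1\nGGCGCCATGC\n>peak2\nCCGGNNTTAA\n"
    print(format_background(zero_order_background(fasta)), end="")
```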

The MEME Suite might have been the status quo a couple of years ago, but I believe more labs are now using HOMER. Each tool has its niche, though.

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 11.0 years ago by Ying W ★ 4.3k

0

But wouldn't the sequence file contain subsequences that match the motif you are searching for? This would result in background frequency estimates that are skewed towards sequences containing motifs, which would increase the false discovery rate of motif searches.

I didn't quite follow what is wrong with using the whole genome to give you the background. If you are searching for motifs in non-coding regions in, say, the human genome, > 97% is non-coding. I doubt the 3% that is coding would skew your background estimates very much.

Even if one did not want to use the whole genome, perhaps non-coding regions near your regions of interest would provide a less biased background estimate than the FASTA file containing the sequences of interest.

Am I missing something here?

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Vishaka Datta ▴ 100

2

What I meant by that comment is that sequence context will often differ between the regions that transcription factors bind and the whole genome. See the auto-normalization point here: http://homer.salk.edu/homer/motif/index.html
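A rough way to see why this matters: PWM scanners score each position as a log-odds ratio of the motif probability to the background probability, so the same motif scores differently against a GC-rich background than against a genome-wide one. The Python sketch below uses made-up numbers purely for illustration; gc_fraction and log_odds are hypothetical helper names, not functions from FIMO or HOMER.

```python
import math


def gc_fraction(seq):
    """Fraction of G/C among the unambiguous (A/C/G/T) bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def log_odds(motif_prob, background_prob):
    """Per-position log2 odds of a base under the motif versus the background."""
    return math.log2(motif_prob / background_prob)


if __name__ == "__main__":
    # Toy sequences, made up for this example
    bound_regions = "GCGCGGCCATGCGC"
    genome_sample = "ATATTAGCATTAAT"
    print("GC in bound regions:", gc_fraction(bound_regions))
    print("GC in genome sample:", gc_fraction(genome_sample))

    # Hypothetical motif position where G has probability 0.8
    p_g = 0.8
    print("score with uniform background (G = 0.25):", log_odds(p_g, 0.25))
    print("score with GC-rich background (G = 0.35):", log_odds(p_g, 0.35))
```

With a GC-rich background, a G at that position contributes less to the score, which is the kind of shift that a genome-wide average would hide.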

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Ying W ★ 4.3k

0

Ah, now I see what you mean. Thanks :)

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Vishaka Datta ▴ 100