11.1 years ago
UnivStudent
Hi everyone,
I'm having a bit of trouble figuring out how to use FIMO to discover TFBSs. My questions are:
- How do you decide on good thresholds for PWM matching (--thresh)?
- How much does the background matter? Should it be calculated from the sequences you're submitting?
- How does FIMO handle masked FASTA files? Is it better to submit hard- or soft-masked files?
Any other tips on using this program well would be much appreciated, as I'm finding the documentation quite vague.
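One rough way to reason about `--thresh`: FIMO applies it as a per-position p-value cutoff (1e-4 by default), so the background model alone should produce roughly threshold × (number of positions scanned) matches. The sketch below does that arithmetic for a set of sequences scanned on both strands. The function names are hypothetical, the estimate is approximate, and it ignores the q-values FIMO also reports.

```python
def positions_scanned(seq_len: int, motif_width: int, both_strands: bool = True) -> int:
    """Number of motif start positions tested in one sequence."""
    per_strand = max(seq_len - motif_width + 1, 0)  # no positions if motif is longer than sequence
    return per_strand * (2 if both_strands else 1)


def expected_chance_hits(seq_lengths, motif_width, p_thresh=1e-4, both_strands=True):
    """Approximate matches expected at p < p_thresh under the background model alone."""
    total = sum(positions_scanned(n, motif_width, both_strands) for n in seq_lengths)
    return total * p_thresh


if __name__ == "__main__":
    lengths = [500] * 1000  # e.g. 1,000 peaks of 500 bp
    for p in (1e-4, 1e-5, 1e-6):
        print(f"p < {p:g}: ~{expected_chance_hits(lengths, 10, p):.1f} chance hits")
```

With 1,000 sequences of 500 bp and a width-10 motif, that works out to about 98 chance hits at the default 1e-4, which is why a stricter cutoff (or filtering on q-value) is often worth considering for larger sequence sets.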
Also, I'll look into HOMER. The main reason I started using FIMO is that it seems to be the status quo in the literature for this kind of analysis.
For 1), I am assuming you are asking about the background. You should not be using genome-wide averages; instead, use a background computed from the sequence file. The FIMO page describes the background file option; you can use the command
--bgfile motif-file
to generate the background from the sequence file.
The MEME suite might have been the status quo a couple of years ago, but I believe more labs are now using HOMER. That said, each tool has its niche.
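For the background itself, the MEME suite ships `fasta-get-markov`, which writes a background model from a FASTA file. To show what such a file contains, here is a minimal zero-order sketch. It assumes the MEME background format is one `letter frequency` pair per line with `#` comment lines, and it counts both strands, skips `N`, and treats lowercase (soft-masked) bases like uppercase. The function names are hypothetical and are not part of the MEME suite.

```python
import io
from collections import Counter
from typing import Dict, Iterable, Iterator

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record's sequence as a single string."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def zero_order_background(seqs: Iterable[str], both_strands: bool = True) -> Dict[str, float]:
    """Base frequencies over A/C/G/T; N and other ambiguity codes are skipped."""
    counts = Counter()
    for seq in seqs:
        for base in seq.upper():  # soft-masked lowercase counted like uppercase
            if base in COMPLEMENT:
                counts[base] += 1
                if both_strands:
                    counts[COMPLEMENT[base]] += 1  # add the reverse-strand base
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {b: counts[b] / total for b in "ACGT"}


def to_meme_bg(freqs: Dict[str, float]) -> str:
    """Render frequencies as a zero-order background file (assumed MEME format)."""
    return "\n".join(["# order 0"] + [f"{b} {freqs[b]:.4e}" for b in "ACGT"]) + "\n"


if __name__ == "__main__":
    fasta = io.StringIO(">peak1\nACGTacgtNN\n>peak2\nGGGCCC\n")
    print(to_meme_bg(zero_order_background(read_fasta(fasta))), end="")
```

For real data, `fasta-get-markov` is the safer choice, since it also supports higher-order Markov models.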
But wouldn't the sequence file contain subsequences that match the motif you are searching for? That would give background frequency estimates skewed towards sequences containing motifs, which would increase the false discovery rate of motif searches.
I didn't quite follow what is wrong with using the whole genome to estimate the background. If you are searching for motifs in non-coding regions of, say, the human genome, over 97% of the genome is non-coding, so I doubt the roughly 3% that is coding would skew your background estimates very much.
Even if one did not want to use the whole genome, looking at non-coding regions near your regions of interest might give a less biased background estimate than the FASTA file containing the sequences of interest.
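A minimal sketch of the "nearby regions" idea: for each region of interest, take same-width windows a fixed distance upstream and downstream. The `Interval` type, the `gap` default and the function name are hypothetical choices for illustration. The sketch does not filter out exons or other regions of interest, which you would still need to do separately.

```python
from typing import Dict, List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def flanking_background(
    regions: List[Interval],
    gap: int = 1000,
    chrom_sizes: Optional[Dict[str, int]] = None,
) -> List[Interval]:
    """Return same-width windows `gap` bp upstream and downstream of each region.

    Upstream windows that would start before position 0 are dropped, as are
    downstream windows past the chromosome end (only when `chrom_sizes` is given).
    Exons and other target regions are NOT removed here.
    """
    background = []
    for r in regions:
        width = r.end - r.start
        # Upstream window ends `gap` bp before the region starts.
        up_end = r.start - gap
        up_start = up_end - width
        if up_start >= 0:
            background.append(Interval(r.chrom, up_start, up_end))
        # Downstream window starts `gap` bp after the region ends.
        down_start = r.end + gap
        down_end = down_start + width
        limit = chrom_sizes.get(r.chrom) if chrom_sizes else None
        if limit is None or down_end <= limit:
            background.append(Interval(r.chrom, down_start, down_end))
    return background


if __name__ == "__main__":
    peaks = [Interval("chr1", 5000, 5200), Interval("chr1", 500, 700)]
    for iv in flanking_background(peaks, gap=1000):
        print(f"{iv.chrom}\t{iv.start}\t{iv.end}")  # BED-style lines
```

The resulting intervals could be converted to sequence with a tool such as `bedtools getfasta` before estimating a background from them.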
Am I missing something here?
What I meant by that comment is that sequence context (GC content, for example) often differs between the regions that transcription factors bind and the genome as a whole. See the auto-normalization point here: http://homer.salk.edu/homer/motif/index.html
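One simple way to control for that difference is to bin the target sequences by GC fraction and then sample background sequences with the same bin distribution. The sketch below does plain GC matching. It is not HOMER's own procedure, and the function names, bin count and seed are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import List


def gc_fraction(seq: str) -> float:
    """G+C fraction over A/C/G/T only (N and other codes ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def gc_bin(seq: str, n_bins: int = 10) -> int:
    """Map a GC fraction in [0, 1] to a bin index in [0, n_bins - 1]."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def gc_matched_background(
    targets: List[str], candidates: List[str], per_target: int = 1, n_bins: int = 10, seed: int = 0
) -> List[str]:
    """Sample candidates so their GC-bin counts follow the targets' (capped by availability)."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for c in candidates:
        pool[gc_bin(c, n_bins)].append(c)
    wanted = Counter(gc_bin(t, n_bins) for t in targets)
    chosen = []
    for b, n in sorted(wanted.items()):
        available = pool.get(b, [])
        k = min(n * per_target, len(available))  # cannot draw more than the bin holds
        chosen.extend(rng.sample(available, k))
    return chosen
```

Bins with too few candidates simply contribute fewer sequences here. A real pipeline would likely want a much larger candidate pool, such as the flanking or genome-wide regions discussed above.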
Ah, now I see what you mean. Thanks :)