7.8 years ago
Lila M
Hi everybody,
This is the first time I have tried to look for TATA boxes in a gene list. I used HOMER (findMotifs.pl) to do that, but I don't know how to read the results. Is there a way to identify only the TATA boxes for a given gene list? Ideally, I would like to know what fraction of the genes in my list have a TATA box.
Thank you very much in advance!
Could you clarify what you mean by "gene list"? If it is a FASTA file with the sequences of the genes of your organism, like a transcriptome, you won't find TATA boxes (or you may find only false positives), because the TATA box usually lies upstream of the transcription start site.
As HOMER accepts a list of Ensembl gene IDs, that is what I am using at the moment.
Did you use the -find <motif file> option?
No. How can I create this "motif file"?
Carefully read the documentation at the link you sent me: http://homer.ucsd.edu/homer/microarray/index.html
The section "Finding Instances of Specific Motifs" explains what you need.
Also: http://homer.ucsd.edu/homer/motif/creatingCustomMotifs.html
Yes, I did, but for me it is not very intuitive. What I am trying to do is download the TATA motif and use it as the <motif file>. Could that work?
Thank you!
I can't tell unless you paste what is inside the motif_TATA.motif...
I assume you have to specify one motif name per line. Didn't the second link help with that?
Yes, and I assume that is the same, right?
Was this created with seq2profile.pl?
seq2profile.pl <consensus> [# mismatches] [name] > output.motif
e.g. seq2profile.pl TATA 0 ets > output.motif
I don't think that is necessary. If I downloaded the matrix from HOMER (pasted above), it should be recognized.
So please show us the output that you're not able to understand, and we can see if one of us can! ;)
The output of the result! In it I can see the sequence, but it is not exactly the TATA sequence (I don't know if HOMER only reports the most similar one), so how can I be sure that the genes reported in the output have TATA boxes?
These kinds of predictions usually come with an E-value or a probability score. In this case, you have a LOD score (logarithm of the odds) associated with every line, in field number 6, "MotifScore". The documentation you linked says, at some point:
"Motif Score (log odds score of the motif matrix, higher scores are better matches)"
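To make "log odds score of the motif matrix" concrete, here is a minimal Python sketch of how such a score can be computed for one candidate site. It assumes the common definition (the sum over positions of the natural log of the matrix probability divided by a uniform 0.25 background), which may differ in detail from what HOMER does internally. The 4-position matrix is a made-up illustration, not HOMER's TATA-box motif, and log_odds and best_hit are hypothetical helper names.

```python
import math

# Hypothetical matrix for illustration only; NOT HOMER's TATA-box motif.
# One row per motif position, with probabilities for A, C, G, T.
MATRIX = [
    [0.05, 0.05, 0.05, 0.85],  # mostly T
    [0.85, 0.05, 0.05, 0.05],  # mostly A
    [0.05, 0.05, 0.05, 0.85],  # mostly T
    [0.85, 0.05, 0.05, 0.05],  # mostly A
]
BACKGROUND = 0.25  # assumption: uniform background frequency for every base
INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def log_odds(site, matrix=MATRIX, background=BACKGROUND):
    """Sum over positions of ln(P(base at position) / background)."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif length")
    return sum(
        math.log(row[INDEX[base]] / background)
        for row, base in zip(matrix, site.upper())
    )


def best_hit(sequence, matrix=MATRIX):
    """Return (score, offset, site) for the best window on the given strand."""
    width = len(matrix)
    windows = [
        (log_odds(sequence[i:i + width], matrix), i, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


if __name__ == "__main__":
    for site in ("TATA", "TATC", "CCCC"):
        print(site, round(log_odds(site), 3))
    print(best_hit("GGTATAGG"))
```

A site that matches the most probable base at every position gets the highest possible score, and each mismatch lowers it. This is why a reported site can differ from the exact consensus and still be listed. The sketch scans one strand only and does not handle ambiguous bases such as N.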
A good approach would be to plot all the LOD scores and look at the distribution to infer which ones are good and which ones are not.
You are right, but for me "higher scores are better matches" is not very informative. What is considered higher or lower? How can I set a proper cut-off? In my opinion there is not much information, and it is a bit complicated to be sure about the result...
Plot them > see the distribution > see what others do > decide your thresholds.
There is no good threshold, said once a lab guru. :)
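The "plot them, then decide" workflow can be sketched in Python as below. It assumes the -find output was saved as a tab-separated file with the gene ID in the first field and the score in the sixth field ("MotifScore", as mentioned above). The function names, the example rows, and the cutoff are all hypothetical, and the cutoff is something to choose after looking at the distribution, not a recommended value.

```python
import csv
import math
from collections import Counter


def best_score_per_gene(lines, id_field=0, score_field=5):
    """Map each gene ID to its highest motif score (tab-separated input)."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        try:
            score = float(row[score_field])
        except (IndexError, ValueError):
            continue  # skips a header line and malformed rows
        gene = row[id_field]
        best[gene] = max(score, best.get(gene, float("-inf")))
    return best


def score_histogram(scores, bin_width=1.0):
    """Count scores per bin; keys are the lower edges of the bins."""
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


def fraction_with_motif(best, n_genes_in_list, cutoff):
    """Fraction of the submitted genes with at least one site >= cutoff."""
    hits = sum(1 for score in best.values() if score >= cutoff)
    return hits / n_genes_in_list


if __name__ == "__main__":
    # Made-up rows for illustration; real input would be the saved output file,
    # e.g. open("tata_hits.txt") where the file name is hypothetical.
    example = [
        "Gene\tOffset\tSequence\tMotif Name\tStrand\tMotifScore",
        "geneA\t-30\tTATAAAAG\tTATA-Box\t+\t9.5",
        "geneA\t-120\tTATATAAG\tTATA-Box\t-\t6.2",
        "geneB\t-28\tCTATAAAA\tTATA-Box\t+\t7.1",
    ]
    best = best_score_per_gene(example)
    print(score_histogram(best.values()))
    print(fraction_with_motif(best, n_genes_in_list=4, cutoff=7.0))
```

Dividing by the size of the submitted gene list, rather than by the number of genes that appear in the output, gives the frequency asked about in the original question, since genes without any reported site are absent from the output.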
What others do:
What Is The Lod Score Replication Threshold For Linkage Analysis?
http://www.bio.brandeis.edu/InterpGenes/Project/align16.htm
https://www.mun.ca/biology/scarr/LOD_analysis.html
Thank you very much for the information! :)