Hi all!
I'm struggling to learn MEME, and I have a question about it.
I have 6 promoter sequences related to one protein from Arabidopsis. I put them in a FASTA file and ran MEME on it to find conserved motifs.
MEME found a motif, and since it is a conserved promoter motif, I thought it might have some genetic function or role, so I ran it through BLASTn. But here is the problem:
the motif contains nucleotide symbols other than A, T, G, and C (M, H, W, and so on), so BLAST rejected the input file. Afterwards I tried 2 things:
- MEME also reports the exact motif site in each of the 6 sequences, written with only A, T, G, and C, so I put those 6 motif sequences into a single FASTA file and BLASTed them all at once. The file looks like this:
    > motif seq from LOC_1
    ATCGCGCTAGTCT
    > motif seq from LOC_2
    CTCGTAGTAGCT
    > motif seq from LOC_3
  BLAST returned many results for each sequence, pointing to many different functions.
- I ran the 6 motif sequences through AME (in the MEME Suite) against the JASPAR Redundant database, since I'd like to know their functions across all species in the database. It's still running, so I don't know yet whether what I'm doing is right...
If there is a better way to find out the function of a motif, please let me know.
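The M, H, W letters are standard IUPAC ambiguity codes (for example M = A or C, H = A, C or T, W = A or T), so one degenerate consensus stands for a whole set of plain sequences. A minimal sketch of that expansion in Python; the consensus "MHWACGT" is made up for illustration and is not the motif from this post.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes mapped to the bases each allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_consensus(consensus: str) -> list[str]:
    """Return every plain A/C/G/T sequence matched by an IUPAC consensus."""
    choices = [IUPAC[base] for base in consensus.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical consensus: M (2 options) * H (3) * W (2) = 12 sequences.
    variants = expand_consensus("MHWACGT")
    print(len(variants))
    print(variants)
```

Each ambiguous position multiplies the number of concrete sequences, which is one reason a degenerate motif does not correspond to a single BLAST query.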
Thank you
Hi jared.andrews07. Thank you for your reply.
I have one question about using BLAST in my situation. Is the reason you don't recommend BLAST that it returns too much information? Or are motifs too degenerate to search with BLAST?
Yes, motifs are degenerate, sometimes very much so. What do you hope to gain by running them through BLAST? It's unlikely to tell you anything functional (i.e. what's binding it). If you want to find specific instances of the motif in specific regions, you can do that with the MEME Suite (I think FIMO is the tool) or HOMER.
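FIMO itself scores every position with the motif's position weight matrix and reports p-values. The sketch below is a much cruder stand-in, assuming you only have a consensus written with IUPAC codes: it turns the consensus into a regular expression and reports exact matches on both strands. The function names and the example sequence are hypothetical.

```python
import re

# IUPAC ambiguity code -> regular-expression character class.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus_hits(consensus: str, seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based start, strand, matched text) for exact consensus matches.

    The lookahead lets overlapping matches count. Minus-strand hits are
    reported with their start in forward-strand coordinates.
    """
    body = "".join(IUPAC_CLASS[b] for b in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    seq = seq.upper()
    k = len(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # A match at [s, s+k) on the reverse complement covers
        # [len(seq)-s-k, len(seq)-s) on the forward strand.
        hits.append((len(seq) - m.start() - k, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_consensus_hits("WACGT", "TTACGTAAACGTA"))
```

A palindromic consensus will be reported on both strands at the same position; real scanners also allow partial matches weighted by the matrix, which this exact-match sketch does not.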
What did your AME run yield? You can also try the HOCOMOCO database (also available from the menu in AME), as I think it has more motifs than JASPAR.
Since the motifs are conserved sequences, I thought that a sequence (or sequences) identical or very similar to the motif must also exist somewhere in other species' genomes, like the mouse genome or even the E. coli genome, and I can get some (actually, plenty of) results by putting the sequence into BLAST. What I expect is that the motif might sit in, for example, a mouse gene involved in hand development, or an E. coli gene for metabolizing a nutrient. This is what I want to know in the end.
But, as you said, there is no meaningful result... Could you explain why? I think I'm confused about the notion of a motif and the purpose of BLAST.
BTW, AME did show what I wanted to know. As for the database, I chose JASPAR because I want to know the functions of my input motifs in other species, and JASPAR is an integrated database covering many species, isn't it?
Hmm, I don't know how well conserved motifs for homologous proteins are between Arabidopsis and something as far removed as E. coli. BLAST just looks for conserved regions, and it's really not meant for 8-12 bp targets. It's good for finding regions of conserved sequence within or across organisms, which is particularly useful for coding sequences, full genes, etc. when searching for homologs. It has many other uses as well, but I don't think it's well suited to your task. I am sure the motifs you found could also be found in other genomes: they're only 8-15 bases long and are degenerate. Even if you use the absolute consensus sequence of each, you'll likely get at least a few hits on any genome just by chance.
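The "hits just by chance" point can be put in numbers with a back-of-the-envelope estimate: in a random genome with equal, independent base frequencies, one exact k-mer is expected about 2 × (G − k + 1) / 4^k times across both strands of a genome of length G. That uniform-random model is an assumption (real genomes are not random), and the genome sizes below are rough round figures used only for illustration. Under it, a single exact 8-mer is expected roughly 140 times in a ~4.6 Mb genome, and a 12-mer about 16 times in a ~135 Mb genome.

```python
def expected_chance_hits(k: int, genome_length: int, n_variants: int = 1,
                         both_strands: bool = True) -> float:
    """Expected number of exact motif matches arising by chance alone.

    Assumes a random genome with equal, independent base frequencies (0.25
    each). `n_variants` is how many plain A/C/G/T sequences a degenerate
    consensus stands for; palindromic double counting is ignored.
    """
    positions = max(genome_length - k + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions * n_variants / 4 ** k


if __name__ == "__main__":
    # Rough, illustrative genome sizes (approximate values).
    genomes = {"E. coli (~4.6 Mb)": 4_600_000, "Arabidopsis (~135 Mb)": 135_000_000}
    for name, size in genomes.items():
        for k in (8, 12, 15):
            print(f"{name}, {k}-mer: {expected_chance_hits(k, size):.2f}")
```

Degeneracy makes this worse: a consensus that expands to a dozen plain sequences multiplies the expected chance hits by about twelve.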
I'm still rather confused by your actual question: are you trying to find genes of similar function to your gene of interest in other organisms? Or are you trying to determine what might be regulating expression of your gene of interest in Arabidopsis? Motifs may help you with the latter question, but are unlikely to yield anything useful for the first one. Motifs are typically used to describe the sequence to which a transcription factor or DNA-binding protein binds. They're often derived from ChIP-seq experiments for the TF of interest, where the regions (usually defined by peak-calling algorithms) heavily bound by the TF are used to find conserved sequences that occur at a much higher rate than background. If the data is high quality, these experiments usually provide relatively high-confidence motifs for that given TF or protein.
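To make "motif" concrete: a motif is usually stored as a matrix of base counts or frequencies at each position of the aligned binding sites, not as a single sequence. A minimal sketch, assuming a handful of hypothetical, already-aligned sites of equal length and a uniform 0.25 background: it builds the count matrix, a log2-odds position weight matrix with a pseudocount, and the consensus. Real tools (MEME, HOMER) discover and align the sites themselves and use more careful background models.

```python
import math

BASES = "ACGT"


def count_matrix(sites: list[str]) -> list[dict[str, int]]:
    """Per-position base counts for equal-length, aligned A/C/G/T sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1
    return counts


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert counts to log2(probability / background), with a pseudocount."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm, kmer: str) -> float:
    """Sum of per-position log-odds scores for a k-mer of the motif's length."""
    return sum(col[base] for col, base in zip(pwm, kmer.upper()))


def consensus(counts) -> str:
    """Most frequent base at each position."""
    return "".join(max(col, key=col.get) for col in counts)


if __name__ == "__main__":
    sites = ["TACGTA", "AACGTA", "TACGTT", "TACGAA"]  # hypothetical sites
    counts = count_matrix(sites)
    pwm = log_odds_matrix(counts)
    print(consensus(counts), round(score(pwm, "TACGTA"), 2))
```

Scanning a promoter with such a matrix (scoring every window and keeping high scorers) is the basic idea behind tools like FIMO.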
These motifs are collected in databases like JASPAR. Users can then scan regions (like promoters or enhancers of differentially expressed genes) to find motifs that are enriched versus background, as you did with AME for your promoters. These tools will also compare these motifs to those databases of known motifs to tie them back to specific TFs or proteins that may bind those regions. This helps users tie together gene expression and the epigenomic landscape with specific transcriptional regulators.
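The "enriched versus background" idea can be sketched as a one-sided Fisher's exact test: count how many target sequences (e.g. your promoters) and how many control sequences contain at least one motif hit, then ask how surprising the target count would be if hits were spread at random. This is a simplified illustration assuming the hit counts are already in hand; AME offers several scoring and test options and handles details this sketch ignores. The counts in the usage line are hypothetical.

```python
from math import comb


def fisher_enrichment_p(target_hits: int, target_total: int,
                        control_hits: int, control_total: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of at least `target_hits` motif-containing sequences among
    the targets if all motif-containing sequences were distributed at random
    across targets and controls combined.
    """
    total = target_total + control_total
    all_hits = target_hits + control_hits
    upper = min(all_hits, target_total)
    tail = sum(
        comb(all_hits, k) * comb(total - all_hits, target_total - k)
        for k in range(target_hits, upper + 1)
    )
    return tail / comb(total, target_total)


if __name__ == "__main__":
    # Hypothetical: 5 of 6 promoters vs 10 of 60 shuffled controls have a hit.
    print(fisher_enrichment_p(5, 6, 10, 60))
```

With only 6 promoters, even a strong-looking difference gives limited statistical power, which is worth keeping in mind when reading enrichment results.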
JASPAR has many different collections of motifs, and yes, some of them do span several organisms.