Search for specific motif in MEME analysis
2.7 years ago
Laura ▴ 50

Hello!

I am looking into using the MEME Suite to answer some questions about VDR motifs in L1 genes. I can run MEME on my FASTA data with the web-based tool; the equivalent command would look something like:

meme L1HS_plus.fa -dna -nmotifs 3 -minw 6 -maxw 50

With this, the output contains 3 discovered motifs, like so:

motif_output

but it doesn't look like they contain the VDR motif, which I think should look more like:

expected_output

My question is: is there a way to search for a specific motif, like VDR? Can I search by name? Should I look for more motifs, or limit the size of the motif to 16? I am looking for variations in this motif, so I don't think I can just search for the sequence of bases in this example, and I'm not sure if it is always 16 bp in length?

Thank you!!

meme motif-analysis VDR • 1.1k views

If you know the VDR motif sequence, including its degenerate (ambiguous) positions, you can search the input sequences for it directly using tools such as seqkit.
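To make the idea of searching with a degenerate consensus concrete, here is a minimal Python sketch that expands IUPAC ambiguity codes into a regular expression and scans both strands. It is not how seqkit works internally, and the function names are made up for illustration. The pattern `RGKTSANNNRGKTSA` is only a placeholder for a DR3-type element (two hexamer half-sites separated by 3 bp); it is an assumption, not a vetted VDR consensus, so substitute whichever consensus you trust.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, also valid for IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_to_regex(pattern):
    """Turn a degenerate consensus into a regex string."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_motif(seq, pattern):
    """Return sorted (start, end, strand, match) tuples.

    Coordinates are 0-based, end-exclusive, on the given (+) strand.
    A '-' hit means the reverse complement of the pattern matched there.
    """
    seq = seq.upper()
    hits = []
    for strand, pat in (("+", pattern), ("-", revcomp(pattern))):
        # The lookahead lets overlapping occurrences be reported too
        rx = re.compile("(?=(" + iupac_to_regex(pat) + "))")
        for m in rx.finditer(seq):
            hits.append((m.start(), m.start() + len(pat), strand, m.group(1)))
    return sorted(hits)


# Placeholder consensus (assumption, see note above)
PLACEHOLDER_DR3 = "RGKTSANNNRGKTSA"

if __name__ == "__main__":
    example = "TTTAGGTCAGAGAGTTCATTT"
    for hit in find_motif(example, PLACEHOLDER_DR3):
        print(hit)
```

An exact consensus search like this misses weaker variants of the site, so if you care about variation in the motif, a position weight matrix scan is probably a better fit than a fixed pattern.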

2.7 years ago
Mensur Dlakic ★ 28k

I can't see how you got that result from the command listed. The motifs you have are exactly 50 bp, which should not happen just because the maximum width is set to 50. That makes me think that you are running this analysis on many similar GENE sequences. I am guessing that this needs to be done with PROMOTER sequences for those genes. So after collecting promoters rather than genes, this command may work:

meme L1HS_plus.fa -dna -nmotifs 3 -evt 0.01 -mod zoops -revcomp

Not sure why you are asking for 3 motifs if your interest is only in one motif. Setting the e-value threshold (-evt 0.01) should take care of that. If you know that there may be multiple motif occurrences per single sequence, I suggest you change -mod zoops to -mod anr. I don't think you need to set motif width as default values already cover a reasonable range.
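If you end up running this over several L1 subfamilies, it may help to assemble the command in a small script. The sketch below only builds the argument list from the options discussed above (`-evt`, `-mod zoops` or `-mod anr`, `-revcomp`); the function name and the `multiple_sites_per_seq` switch are hypothetical, and it assumes `meme` is on your PATH when you actually run it.

```python
import shlex


def build_meme_command(fasta, multiple_sites_per_seq=False, evt=0.01, nmotifs=3):
    """Build a MEME argument list using the options from this answer.

    multiple_sites_per_seq is a hypothetical switch: True selects
    '-mod anr' (any number of repetitions), False selects '-mod zoops'
    (zero or one occurrence per sequence).
    """
    model = "anr" if multiple_sites_per_seq else "zoops"
    return [
        "meme", fasta, "-dna",
        "-nmotifs", str(nmotifs),
        "-evt", str(evt),
        "-mod", model,
        "-revcomp",
    ]


if __name__ == "__main__":
    cmd = build_meme_command("L1HS_plus.fa")
    print(shlex.join(cmd))
    # To execute it for real (requires MEME to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than one string avoids quoting problems if a file name contains spaces.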


Thank you for your response, Mensur! Let me try to explain a bit better...

I am looking for VDR binding across various L1 subfamilies. I have a BED file with many regions, from which I extracted sequences in FASTA format using twoBitToFa. In this example, I was looking at the L1HS subfamily. I was incorrect to call them L1 genes, as they are full-length L1 instances.

I had asked for 3 motifs because I don't know what I am doing, ha! I know that in some of the instances there are single elements that are recurring, but I suppose that doesn't matter, since I am only looking for the VDR motif in these various L1 subfamilies.

I ran the code you provided and the output contains these motifs:

motif1 motif2 motif3

So, still showing 3 motifs with lengths of 50. Does this change or confirm your suspicions?

Thank you SO MUCH!


This looks like extracting motifs from conserved gene sequences, so it is not a surprise that you get long and perfectly conserved motifs. I suspect if you added -maxw 100 to the command line that you would easily get 3 motifs that are 100 bp long. That simply doesn't happen with promoter sequences, which is where the regulatory elements are (as opposed to coding elements). I don't think you will be able to get your motif without extracting only the promoter regions.
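One way to restrict the input to the regulatory end of each element is to trim every BED interval to a window at its 5' end before extracting sequence. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the strand is read from column 6 of a standard BED line (defaulting to `+` if absent), and the 900 bp default is a rough guess at the window size, not a value from this thread, so adjust it for your elements.

```python
def five_prime_window(bed_line, length=900):
    """Shrink a BED interval to its strand-aware 5' end.

    BED coordinates are 0-based and end-exclusive. The window never
    extends past the original interval. `length` is an assumed default.
    """
    fields = bed_line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    strand = fields[5] if len(fields) > 5 else "+"
    if strand == "-":
        # On the minus strand the 5' end is the larger coordinate
        new_start, new_end = max(start, end - length), end
    else:
        new_start, new_end = start, min(end, start + length)
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)


if __name__ == "__main__":
    # Made-up intervals, for illustration only
    for line in (
        "chr1\t1000\t7000\tL1HS_a\t0\t+",
        "chr1\t1000\t7000\tL1HS_b\t0\t-",
    ):
        print(five_prime_window(line))
```

The trimmed BED can then go through the same sequence-extraction step as before. Check how your extraction tool handles minus-strand intervals, since with `-revcomp` MEME searches both strands anyway.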


Got it. I know what I need to do. I'll report back! Thanks again.
