I have identified a motif from my ChIP-seq experiment and want to find the locations of its binding sites in my 10 genes of interest. What would be the best way or tool to do this?
Thanks
Convert the motif to a MEME representation, unless you have this already. Use this MEME matrix with FIMO to find hits or binding sites across the genome at a specified threshold. Use BEDOPS bedmap to map these binding sites to gene annotations, converted to BED with BEDOPS convert2bed, if necessary.
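To make the mapping step concrete, here is a minimal Python sketch of the interval overlap that a tool like bedmap performs. It is a stand-in for illustration, not bedmap itself: the function name `hits_per_gene` and the tuple layouts are hypothetical, and it assumes both genes and hits are already in BED-style 0-based, half-open coordinates on the same assembly.

```python
from collections import defaultdict


def hits_per_gene(genes, hits):
    """Return {gene_name: [hit, ...]} for motif hits overlapping each gene.

    genes: iterable of (chrom, start, end, name), 0-based half-open.
    hits:  iterable of (chrom, start, end, p_value), same convention.
    Both layouts are illustrative assumptions, not a tool's format.
    """
    # Group hits by chromosome so each gene only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, p_value in hits:
        by_chrom[chrom].append((start, end, p_value))

    result = {}
    for chrom, g_start, g_end, name in genes:
        # Two half-open intervals overlap when each starts before the other ends.
        result[name] = [
            (chrom, h_start, h_end, p)
            for h_start, h_end, p in by_chrom[chrom]
            if h_start < g_end and g_start < h_end
        ]
    return result
```

Each hit stays a separate record with its own coordinates and p-value, so the same matched sequence occurring at several locations simply shows up as several hits.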
Motif ID Alt ID Sequence Name Strand Start End p-value q-value Matched Sequence
2 16 - 52723 52751 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 17 - 78101 78129 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 17 - 100740 100768 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
Above are a few lines of FIMO output. It does not give any gene names or exact genomic coordinates: the coordinates shown are relative to the extracted FASTA sequences (2500 bp at the TSS). So what should be intersected? Intersecting on the matched sequence does not seem like an efficient or correct approach, because one sequence may be present at multiple locations, and then which p-value should I assign to that match? Thanks
Your FIMO output should look like this, I think:
http://meme-suite.org/doc/examples/fimo_example_output_files/fimo.txt
You can convert that style of output to sorted BED:
$ awk -F"\t" -v OFS="\t" '!/^#/ && NF >= 8 { print $3, $4, $5, $1, $8, $6 }' fimo.txt | sort-bed - > fimo.bed
Can you take another look and confirm?
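For anyone who prefers to do the conversion in a script, here is a minimal Python sketch of the same idea. The function name `fimo_to_bed` is hypothetical. It assumes the tab-delimited 10-column layout (motif_id, motif_alt_id, sequence_name, start, stop, strand, score, p-value, q-value, matched_sequence) and that FIMO start positions are 1-based and inclusive, so it subtracts 1 from the start to get a BED-style start. Please check both assumptions against your FIMO version.

```python
def fimo_to_bed(lines):
    """Convert FIMO text rows to sorted BED6-style rows (as strings).

    Output columns: sequence name, start, stop, motif id, p-value, strand.
    """
    rows = []
    for line in lines:
        # Split on tabs only, so an empty motif_alt_id does not shift columns.
        fields = line.rstrip("\n").split("\t")
        # Skip comments, blank lines, and header lines (non-numeric start).
        if line.startswith("#") or len(fields) < 8 or not fields[3].isdigit():
            continue
        motif_id, _alt, chrom, start, stop, strand, _score, p_value = fields[:8]
        # Assumption: FIMO start is 1-based inclusive; BED start is 0-based.
        rows.append((chrom, int(start) - 1, int(stop), motif_id, p_value, strand))
    # Sort by sequence name (lexicographic), then by start and stop (numeric).
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(x) for x in row) for row in rows]
```

Called as `fimo_to_bed(open("fimo.txt"))`, where the file name is only an example, it returns one tab-separated string per hit that can be written out as a BED file.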
# motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence
2 KI270742.1 15757 15785 - 49.5258 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 KI270755.1 21577 21605 - 49.5258 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 KI270714.1 22078 22106 - 49.5258 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 KI270719.1 22816 22844 - 49.5258 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 KI270746.1 34414 34442 - 49.5258 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
2 16 44269 44297 - 49.5258 2.77e-19 4.86e-14 CTCTGTCGCCCAGGCTGGAGTGCAGTGGC
This is what I am getting
You can use GimmeMotifs (disclaimer: I am the author). You will have to use a motif representation (positional frequency matrix) that looks like this:
>motif_name
0 0 0 1
0.2 0 0 0.8
Then you can use gimme scan to scan with this motif:
$ gimme scan sequences.fa -p motif.pwm -g hg19 -b
Here the -b argument specifies BED output. See the full documentation of gimme scan here.
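If all you have is a set of aligned binding-site sequences of equal length, a matrix in the format shown above can be derived by counting bases per position. The following is a minimal Python sketch, not part of GimmeMotifs: the function name `sites_to_pfm` is hypothetical, and the A, C, G, T column order and the tab separator are assumptions that should be checked against the gimme scan documentation.

```python
def sites_to_pfm(name, sites):
    """Build a positional frequency matrix from equal-length aligned sites.

    Returns text with a '>name' header and one row per motif position.
    Columns are in A, C, G, T order (an assumption about the expected order).
    Characters other than A/C/G/T are not counted, so such rows sum to < 1.
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")

    lines = [f">{name}"]
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        # Frequency of each base at this position across all sites.
        freqs = [column.count(base) / len(sites) for base in "ACGT"]
        lines.append("\t".join(f"{f:g}" for f in freqs))
    return "\n".join(lines) + "\n"
```

For example, the five sites TT, TA, TT, TT, TT give a first row of 0 0 0 1 and a second row of 0.2 0 0 0.8.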
Version 4.12.0, hg19 default settings
Hi,
What if I don't have a "positional frequency matrix"?