Identifying the orientation of a CTCF motif?
7.1 years ago
Sinji ★ 3.2k

I'm analyzing some CTCF ChIP-seq data, and I'm interested in recording the orientation of the CTCF sites, since orientation has been shown to play an important role in the underlying biology. I can't find any information on how to do this, even though it seems to be a fairly common analysis. Perhaps I'm just not using the right search terms. Any ideas?

7.1 years ago

Scan the peaks with a CTCF motif. The strand of each motif match should tell you the orientation of that CTCF site.

For instance, you can use this CTCF motif (save it as a tab-separated text file, e.g. CTCF.pwm):

>C2H2_ZF_Average_200
0.081779124449	0.816566257007	0.0503624700168	0.0512921485275
0.00454091560919	0.992667683465	0.000844143310774	0.00194725761473
0.729859139975	0.0190570790231	0.169685871266	0.0813979097358
0.03204130191	0.630845335167	0.323279272143	0.0138340907799
0.123093260475	0.494372096106	0.0637025611488	0.31883208227
0.901255804964	0.0197757515554	0.0514199924903	0.02754845099
0.0032323521056	0.00108383141845	0.992842879801	0.00284093667487
0.416975026239	0.006048776898	0.5707589234	0.00621727346346
0.0353963142494	0.0316137184991	0.579714716892	0.353275250359
0.00986220321585	0.00125463577494	0.985814739139	0.00306842187041
0.0950088577041	0.0355503655191	0.815560120364	0.053880656413
0.0980555406351	0.793094920278	0.0235874699945	0.0852620690928
0.362317695845	0.0268864366115	0.577387352297	0.0334085152465
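
As a quick sanity check of the orientation logic, here is a minimal Python sketch (not part of GimmeMotifs; the function names are hypothetical) that reads a PWM in the format above, takes the most probable base at each position as the consensus, and reverse-complements it. For the matrix above the consensus works out to CCACCAGGGGGCG, so a match reported on the - strand means that its reverse complement, CGCCCCCTGGTGG, is what you read 5' to 3' on the + strand of the reference.

```python
# Minimal sketch (not part of GimmeMotifs): read a PWM with a ">name" header
# followed by rows of A, C, G, T probabilities, then derive the consensus
# sequence and its reverse complement.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_pwm(text):
    """Return (name, rows); each row is a list of four floats (A, C, G, T)."""
    name, rows = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the first word of the header
        else:
            rows.append([float(x) for x in line.split()])
    return name, rows


def consensus(rows):
    """Most probable base at each motif position."""
    return "".join(BASES[row.index(max(row))] for row in rows)


def reverse_complement(seq):
    """The same site as read 5' to 3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]
```

Usage: `with open("CTCF.pwm") as fh: name, rows = read_pwm(fh.read())`, then `print(consensus(rows), reverse_complement(consensus(rows)))`.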

For scanning, you can try GimmeMotifs: with gimme scan you can scan your peaks using this motif. Replace hg38 with your genome of interest.

$ gimme scan CTCF_peaks.bed -p CTCF.pwm -g hg38 -b > CTCF_motifs.bed

This will report at most one match per peak, with an estimated FPR of 1% based on random genomic sequences. The strand column in the BED output will tell you the direction of the motif.
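
If you want a quick summary of how many sites fall in each orientation, a short Python sketch like the one below can tally the strand column. It assumes the usual BED6 column order (chrom, start, end, name, score, strand), so the strand is the sixth field, and it skips comment, track and browser lines; check the first lines of your own CTCF_motifs.bed, since the exact output layout may differ between GimmeMotifs versions.

```python
from collections import Counter


def strand_counts(bed_lines):
    """Count motif matches per strand, assuming BED6 with strand in column 6."""
    counts = Counter()
    for line in bed_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[5]] += 1
    return counts
```

Usage: `with open("CTCF_motifs.bed") as fh: print(strand_counts(fh))`.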


This is excellent, thank you very much!


Even though this is an old answer, I am using it for my own analysis. I want a final BED file in which the CTCF sites are colour-coded according to their motif orientation on the genome, but I keep running into problems with the command.
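
One way to get the colour coding, sketched in Python under the assumption that the gimme scan output is BED6 (strand in column 6): rewrite each match as a BED9 line whose itemRgb field is chosen from the strand, and add a track line with itemRgb="On" so the UCSC Genome Browser uses those colours. The colours, the track name and the helper name are arbitrary placeholders, and the score is set to 0 because UCSC expects an integer between 0 and 1000 in that field.

```python
# Hypothetical colour scheme: red for + strand motifs, blue for - strand motifs.
STRAND_COLOURS = {"+": "255,0,0", "-": "0,0,255"}


def to_coloured_bed(bed_lines, track_name="CTCF_motifs"):
    """Convert BED6 motif matches into BED9 lines coloured by strand."""
    out = [f'track name={track_name} itemRgb="On"']
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.rstrip("\n").split("\t")[:6]
        colour = STRAND_COLOURS.get(strand, "0,0,0")  # black if strand is unknown
        # BED9 columns: chrom start end name score strand thickStart thickEnd itemRgb
        out.append("\t".join([chrom, start, end, name, "0", strand, start, end, colour]))
    return out
```

Usage: `with open("CTCF_motifs.bed") as fh: lines = to_coloured_bed(fh)`, then write `"\n".join(lines) + "\n"` to a new file and upload it as a custom track.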

I am using this command: gimme scan MK_CTCF_From_Romina_hg38_c10.0_l245_g100_peaks -p CTCF.pwm -g hg38 -b > CTCF_motifs.bed

This is the structure of my BED file:

chr1:16100-16375
chr1:103922-104996
chr1:138811-139325
chr1:267382-268156
chr1:609167-610766
chr1:665825-666188
chr1:778686-779058
chr1:857890-858227
chr1:869737-870095
chr1:904592-904947

And this is my CTCF.pwm:

>CTCF_known1 CTCF_1 CTCF_jaspar_MA0139.1
Y 0.095290 0.318729 0.083242 0.502738
D 0.182913 0.158817 0.453450 0.204819
R 0.307777 0.053669 0.491785 0.146769
C 0.061336 0.876232 0.023001 0.039430
C 0.008762 0.989047 0.000000 0.002191
A 0.814896 0.014239 0.071194 0.099671
S 0.043812 0.578313 0.365827 0.012048
Y 0.117325 0.474781 0.052632 0.355263
A 0.933114 0.012061 0.035088 0.019737
G 0.005488 0.000000 0.991218 0.003293
R 0.365532 0.003293 0.621295 0.009879
K 0.059276 0.013172 0.553238 0.374314
G 0.013187 0.000000 0.978022 0.008791
G 0.061538 0.008791 0.851648 0.078022
C 0.114411 0.806381 0.005501 0.073707
R 0.409241 0.014301 0.557756 0.018702
S 0.090308 0.530837 0.338106 0.040749
Y 0.128855 0.354626 0.080396 0.436123
V 0.442731 0.199339 0.292952 0.064978
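
Note that this matrix has an extra first column of IUPAC consensus letters (Y, D, R, ...), whereas the motif in the answer above has only the four A, C, G, T probabilities per row. The traceback points at reading the input regions rather than the motif, so this is probably not what triggers the error; still, to match the four-column layout from the answer, a minimal Python sketch like this (a hypothetical helper, not a GimmeMotifs function) can strip the letter column:

```python
def strip_letter_column(pwm_text):
    """Drop a leading non-numeric column (e.g. an IUPAC consensus letter) from
    each matrix row, keeping the header and the four A, C, G, T values."""
    out = []
    for line in pwm_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith(">"):
            out.append(line.strip())  # keep the motif header unchanged
            continue
        if len(fields) == 5:
            try:
                float(fields[0])
            except ValueError:
                fields = fields[1:]  # first field is a letter, not a number
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

Usage: read the original file, pass its text to `strip_letter_column`, and write the result to a new .pwm file for `-p`.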

I get this error message:

Traceback (most recent call last):
  File "/Users/luca/anaconda3/envs/gimme/bin/gimme", line 513, in <module>
    args.func(args)
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/pwmscan.py", line 170, in pwmscan
    normalize=args.zscore,
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/commands/pwmscan.py", line 113, in command_scan
    fa = as_fasta(inputfile, genome)
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/gimmemotifs/utils.py", line 613, in as_fasta
    genome.track2fasta(seqs, tmpfa.name) 
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/genomepy/functions.py", line 466, in track2fasta
    track_type = get_track_type(track)
  File "/Users/luca/anaconda3/envs/gimme/lib/python3.6/site-packages/genomepy/functions.py", line 231, in get_track_type
    if isinstance(track, []):
TypeError: isinstance() arg 2 must be a type or tuple of types

Do you have any suggestions to sort this out?


Hello lu, your file doesn't look like standard BED format. You can check the BED format specification at https://genome.ucsc.edu/FAQ/FAQformat.html#format1, and you can see an example of how to use gimme scan at https://gimmemotifs.readthedocs.io/en/master/tutorials.html#scan-for-known-motifs
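
Following that suggestion, the chrom:start-end lines can be turned into a tab-separated three-column BED file. A minimal Python sketch, assuming each line holds exactly one region and that the coordinates can be copied unchanged (i.e. they are already 0-based and half-open like BED; if they came from a 1-based tool, subtract 1 from the start):

```python
import re

# One region per line, e.g. "chr1:16100-16375".
REGION = re.compile(r"^(\S+):(\d+)-(\d+)$")


def regions_to_bed(lines):
    """Convert 'chrom:start-end' strings into tab-separated BED3 lines."""
    bed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = REGION.match(line)
        if m is None:
            raise ValueError(f"not a chrom:start-end region: {line!r}")
        chrom, start, end = m.groups()
        bed.append(f"{chrom}\t{start}\t{end}")
    return bed
```

Usage: write `"\n".join(regions_to_bed(open(path))) + "\n"` to a file such as peaks.bed (a hypothetical name) and pass that file to gimme scan instead of the original region list.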


Hello yztxwd, thanks for your suggestions!


Hi, where can I find a mouse CTCF PWM file?


I found no mouse file on the JASPAR website. Is the CTCF binding motif the same in mouse as in human?
