Question

Transcription factor binding site prediction from a FASTA sequence

2.3 years ago

newbio • 0

Hi all, I have a 2 kb sequence that is the promoter region of a gene. I would like to pinpoint SMAD3 transcription factor binding sites in that sequence. How can I do that, and is it possible with JASPAR?

Thanks

transcription site jaspar • 1.0k views

ADD COMMENT • link updated 2.3 years ago by ATpoint 85k • written 2.3 years ago by newbio • 0

Hi, you can use Python to find all the SMAD3 binding sites in your query sequence. For example, if the SMAD3 binding site is 5'-GTCTAGAC-3', you can use the following quick-and-dirty Python code to get all exact matches:

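#!/usr/bin/env python3
from re import finditer

# Replace with your 2 kb promoter sequence (same letter case as the site below).
query = "your 2 kb DNA sequence"
smad3_binding_site = "GTCTAGAC"

# Print the (start, end) span and the matched text of every exact occurrence.
for match in finditer(smad3_binding_site, query):
    print(match.span(), match.group())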

Hope it helps.
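If the site you search for is not palindromic, exact matching on the forward strand alone will miss occurrences on the reverse strand. Below is a minimal sketch that extends the same regex idea to both strands and ignores letter case. The function names `reverse_complement` and `find_sites` and the toy sequence are made up for illustration; coordinates are 0-based, half-open, on the forward strand.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(query, site):
    """Return sorted (start, end, strand) tuples for exact matches on both strands."""
    query = query.upper()
    site = site.upper()
    hits = set()
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        # The zero-width lookahead (?=...) also reports overlapping occurrences.
        for m in re.finditer(f"(?={re.escape(pattern)})", query):
            hits.add((m.start(), m.start() + len(pattern), strand))
    return sorted(hits)

# Toy sequence for illustration only; substitute your own promoter sequence.
toy_query = "aaGTCTAGACttAGACgt"
for start, end, strand in find_sites(toy_query, "GTCT"):
    print(start, end, strand)
```

A palindromic site such as GTCTAGAC is reported once per strand at the same coordinates, because it reads identically on both strands.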

ADD REPLY • link 2.3 years ago by Prosad ▴ 30

score 3 · Answer 1 · 2022-08-12

2.3 years ago

rpolicastro 13k

JASPAR allows exporting motifs in MEME format, which you can use with FIMO from the MEME suite to look for individual motif occurrences in your sequence(s). Unlike exact string matching, FIMO takes into account the probability of each base occurring at each position of the motif, which matters because chromatin-binding proteins can be promiscuous with their binding sites.
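To make the idea of per-position base probabilities concrete, here is a rough sketch of log-odds scoring with a position probability matrix. This is not FIMO itself and not its exact algorithm: it scans only the forward strand and reports raw scores rather than p-values. The matrix `TOY_MOTIF` is invented for illustration and is not the SMAD3 matrix from JASPAR, and the uniform background and the threshold are assumptions.

```python
import math

# Hypothetical 4-position motif (NOT the real SMAD3 matrix from JASPAR):
# one dict of base probabilities per motif position.
TOY_MOTIF = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base

def log_odds(window, motif, background=BACKGROUND):
    """Sum of log2(motif probability / background probability) over the window."""
    return sum(math.log2(col[base] / background) for base, col in zip(window, motif))

def scan(sequence, motif, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set("ACGT"):  # skip windows containing N or other symbols
            score = log_odds(window, motif)
            if score >= threshold:
                hits.append((start, window, round(score, 2)))
    return hits

# Toy sequence and threshold, chosen only to show a strong and a weaker hit.
for hit in scan("AAGTCTAGGTCA", TOY_MOTIF, threshold=3.0):
    print(hit)
```

Here the imperfect window GTCA still passes the threshold with a lower score than GTCT, which is exactly the kind of site that exact string matching would miss.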

ADD COMMENT • link 2.3 years ago by rpolicastro 13k

There is now also a wrapper package for the MEME suite (including FIMO) in R/Bioconductor: https://bioconductor.org/packages/release/bioc/html/memes.html. You could also use the HOCOMOCO motif collection, which offers downloads of motifs directly in MEME format.
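If you want to inspect a downloaded MEME-format motif before handing it to FIMO, a small parser along these lines may help. It is only a sketch: it assumes the minimal MEME layout with a `MOTIF` line followed by a `letter-probability matrix:` header that contains `w= N` (with a space before the number), and the motif `toy_motif` is invented for illustration rather than taken from JASPAR or HOCOMOCO.

```python
# Made-up example in minimal MEME format; real files come from JASPAR or HOCOMOCO.
TOY_MEME = """MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF toy_motif
letter-probability matrix: alength= 4 w= 3 nsites= 20 E= 0
 0.05 0.05 0.85 0.05
 0.05 0.05 0.05 0.85
 0.05 0.85 0.05 0.05
"""

def parse_meme(text):
    """Return {motif_id: [[pA, pC, pG, pT], ...]} from minimal MEME-format text."""
    motifs = {}
    current = None
    rows_left = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF" and len(fields) > 1:
            current = fields[1]
            motifs[current] = []
            rows_left = 0
        elif fields[0] == "letter-probability" and current is not None:
            # Assumes the width is written as "w= N" with a space, as in the toy file.
            rows_left = int(fields[fields.index("w=") + 1])
        elif rows_left > 0:
            motifs[current].append([float(x) for x in fields])
            rows_left -= 1
    return motifs

for name, matrix in parse_meme(TOY_MEME).items():
    print(name, len(matrix), "positions")
```

Each row of the matrix holds the probabilities of A, C, G and T at one motif position, in the order given by the ALPHABET line.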

ADD REPLY • link 2.3 years ago by ATpoint 85k