Over- or under-represented motifs in coding regions

0

5.8 years ago

Gene_MMP8 ▴ 240

I have a set of di-, tri- and tetra-nucleotide motifs from the coding regions of the genome that are over-represented. Is there any way to establish the biological significance of this over-representation? Right now it is only statistically significant with respect to a null model (of random sequences). Just as TRANSFAC records the significance of short motifs in regulatory regions, is there an equivalent database for coding regions?
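For context, the kind of comparison against a random-sequence null model mentioned here can be sketched as below. This is only an illustration, not the analysis actually run: the function name `kmer_enrichment` is hypothetical, and the null model is assumed to be independent bases drawn at the sequence's own base frequencies (the post does not say which null was used).

```python
from collections import Counter
from math import prod


def kmer_enrichment(seq, k):
    """Observed/expected ratio per k-mer under an independent-base null.

    Assumption: the expected count of a k-mer is the number of windows
    times the product of the single-base frequencies of its letters.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return {}
    # Single-base frequencies estimated from the sequence itself.
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    # Count every overlapping window of length k.
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    ratios = {}
    for kmer, obs in observed.items():
        expected = n_windows * prod(base_freq[b] for b in kmer)
        ratios[kmer] = obs / expected
    return ratios
```

A ratio well above 1 flags a k-mer as over-represented relative to base composition alone. In coding sequence a codon-aware null (for example, shuffling synonymous codons) is probably a fairer baseline, since codon usage by itself skews short k-mer counts.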

R snp • 955 views

2

What are you trying to show? It seems to me that all you've found so far is the beginning of codon bias, which is already a well-known phenomenon.

ADD REPLY • link 5.8 years ago by Joe 21k

0

I have a list of mutations and the motifs are the bases flanking the mutation. I want to check whether over-representation of certain motifs influences the type of mutation.
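One way to frame that check is as a test of independence between flanking motif and mutation type. The sketch below is a rough illustration under assumptions: the input is taken to be a list of `(flanking_motif, mutation_type)` pairs, and the function name `chi_square_independence` and the labels in the usage note are hypothetical. It computes only the Pearson chi-square statistic and its degrees of freedom.

```python
from collections import Counter


def chi_square_independence(records):
    """Pearson chi-square statistic for motif vs. mutation type.

    records: iterable of (flanking_motif, mutation_type) pairs
             (an assumed input layout, not taken from the post).
    Returns (chi2, degrees_of_freedom).
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one (motif, mutation_type) pair")
    n = len(records)
    pair_counts = Counter(records)
    motif_totals = Counter(motif for motif, _ in records)
    type_totals = Counter(mut_type for _, mut_type in records)
    chi2 = 0.0
    for motif, motif_total in motif_totals.items():
        for mut_type, type_total in type_totals.items():
            # Expected count if motif and mutation type were independent.
            expected = motif_total * type_total / n
            observed = pair_counts.get((motif, mut_type), 0)
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(motif_totals) - 1) * (len(type_totals) - 1)
    return chi2, dof
```

For example, `chi_square_independence([("A_G", "C>T"), ("T_C", "C>A"), ...])` returns the statistic, which can then be turned into a p-value with a chi-square survival function such as `scipy.stats.chi2.sf(chi2, dof)`. With many motifs and sparse cells the chi-square approximation becomes unreliable, so a permutation test may be the safer choice.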

ADD REPLY • link 5.8 years ago by Gene_MMP8 ▴ 240

1

You should really have added the information about the mutations to your post.

The first thing you could consider is querying the motif sequences to see whether they map to insertion elements and/or inverted repeats. These are common mutation signatures, and you may first want to remove them from your dataset (or at least annotate them as such).
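The inverted-repeat part of that check can be roughed out without any external tools. This is a minimal sketch, assuming plain A/C/G/T sequence and exact-match arms. The names `reverse_complement` and `find_inverted_repeats`, and the defaults `arm_len=4` and `max_gap=10`, are illustrative choices only. Mapping against insertion elements would need a reference collection and an aligner, which is not covered here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len=4, max_gap=10):
    """Find exact inverted repeats: an arm followed, within max_gap bases,
    by its reverse complement.

    Returns a list of (left_start, right_start, left_arm) tuples.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm_len + 1):
        arm = seq[i:i + arm_len]
        target = reverse_complement(arm)
        first = i + arm_len  # right arm may start right after the left arm
        last = min(first + max_gap, len(seq) - arm_len)
        for j in range(first, last + 1):
            if seq[j:j + arm_len] == target:
                hits.append((i, j, arm))
    return hits
```

Running this over each mutation's flanking window (rather than over the short motif alone) gives a flag that can be used to annotate or filter records. A motif that equals its own `reverse_complement` is palindromic and could be annotated in the same pass.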

ADD REPLY • link 5.8 years ago by Joe 21k
