4.3 years ago
Gene_MMP8
I have a list of mutations of interest from the coding region in an experiment that I am performing. As input data I have the mutation position, base substitution type (C>T, A>G, etc.), chromosome, and gene name. I was curious to explore the sequences surrounding those mutation positions, so I extracted the raw nucleotide sequences 10 bases upstream and downstream of each mutation position and plotted sequence logos for them. This is the image.
- One thing to note from this image is that C and G nucleotides are highly conserved at the majority of the positions. How do I build a background model for this and argue that what I am seeing is significant rather than due to chance?
- I was also thinking about extracting motifs from the flanking nucleotides and checking whether certain sequence motifs are overrepresented around the mutations. Given that I am new to this field, is there a systematic way to do that?
Your seqlogo image shows the same proportion of each nucleotide at every location. If you want to get a 'conservation' score out of your region, you need to give it other species' sequences for context. That's going to be tricky to define. Why not just download a public conservation track for the region?
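To check numerically whether any position in the flanks differs from the overall base composition, one option is a per-position chi-square statistic against a background frequency table. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the flanking windows are assumed to be equal-length strings, and the background is assumed to come from control windows that you choose yourself (for example, windows around random positions in the coding regions of the same genes). The chi-square test is one possible choice here, not the only one.

```python
from collections import Counter

BASES = "ACGT"


def background_freqs(control_seqs):
    """Estimate base frequencies from control windows (assumed input)."""
    counts = Counter("".join(control_seqs))
    total = sum(counts[b] for b in BASES)  # ignore N and other symbols
    return {b: counts[b] / total for b in BASES}


def position_counts(seqs):
    """Count each base in each column of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]


def chi_square_per_position(seqs, background):
    """Chi-square statistic (3 degrees of freedom) for each column,
    comparing observed base counts with background expectations."""
    stats = []
    for counts in position_counts(seqs):
        n = sum(counts[b] for b in BASES)
        stat = 0.0
        for b in BASES:
            expected = n * background[b]
            if expected > 0:
                stat += (counts[b] - expected) ** 2 / expected
        stats.append(stat)
    return stats


if __name__ == "__main__":
    # Toy data only: column 1 is always C, the other columns are mixed.
    flanks = ["ACGT", "CCGA", "GCTT", "TCAC"]
    uniform = {b: 0.25 for b in BASES}
    for i, stat in enumerate(chi_square_per_position(flanks, uniform)):
        print(i, round(stat, 2))
```

With 3 degrees of freedom, a statistic above 7.815 corresponds to p < 0.05 for a single position. Testing 21 positions means some multiple-testing correction (for example Bonferroni) would be needed, and the mutated position itself will be skewed simply because of the substitution types you selected.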
I understand your point. Can you tell me a bit more about downloading a "public conservation track for the region"? Where can I find this?
I don't know your experimental design or species of interest, but as you'll see in the browser, this exon is missing in chimpanzee: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr12%3A110912989%2D110913241&hgsid=879063881_SpUoTYb5r8APW0GPAMijO7goYoUs