Question

Prioritizing TFBS SNPs

14.2 years ago

Jiny

I have selected 147 functional SNPs in a set of genes using Genomatix and tried to analyze their polymorphic status. 47 were polymorphic and located in TFBSs (transcription factor binding sites). Can anyone suggest bioinformatics methods for prioritizing these polymorphic SNPs, so that I can reduce the number of SNPs for further high-throughput genotyping?

Tags: snp, transcription, binding

What if we already have a TFBS (ChIP-seq) dataset? Can I use GATK?

13.9 years ago by Curiosity

score 3 · Answer 1 · 2011-05-27

Montgomery et al in "A survey of genomic properties for the detection of regulatory polymorphisms" report that "distance to transcription start site, local repetitive content, sequence conservation, minor and derived allele frequencies, and presence of a CpG island" have discriminatory potential for identifying rSNPs.
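One simple way to turn several such features into a single priority list is a rank-sum: rank the SNPs on each feature separately and sort by the total. The sketch below is a hypothetical illustration, not the method used by Montgomery et al.; the field names, the example values, and whether a larger value of each feature should count as "more likely regulatory" are all assumptions you would need to set for your own data.

```python
from dataclasses import dataclass


@dataclass
class SnpFeatures:
    # Hypothetical per-SNP annotations; names and units are illustrative only.
    snp_id: str
    tss_distance: int        # bp to the nearest transcription start site
    conservation: float      # e.g. a per-base conservation score
    repeat_fraction: float   # fraction of the local window that is repetitive


def rank_sum_priority(snps, directions):
    """Return SNP ids ordered by summed per-feature rank (lowest total first).

    `directions` maps a feature name to True if larger values should rank a
    SNP as more likely regulatory, or False if smaller values should.
    """
    totals = {s.snp_id: 0 for s in snps}
    for feature, larger_is_better in directions.items():
        ordered = sorted(snps, key=lambda s: getattr(s, feature),
                         reverse=larger_is_better)
        for rank, s in enumerate(ordered):
            totals[s.snp_id] += rank
    # Break ties on the id so the output order is deterministic.
    return sorted(totals, key=lambda sid: (totals[sid], sid))


if __name__ == "__main__":
    # Made-up values and assumed directions, for illustration only.
    snps = [
        SnpFeatures("rs_a", 50, 0.9, 0.1),
        SnpFeatures("rs_b", 5000, 0.2, 0.6),
        SnpFeatures("rs_c", 800, 0.7, 0.3),
    ]
    directions = {"tss_distance": False, "conservation": True,
                  "repeat_fraction": False}
    print(rank_sum_priority(snps, directions))  # ['rs_a', 'rs_c', 'rs_b']
```

A rank-sum avoids having to put features measured in different units (bp, scores, fractions) on a common scale, at the cost of ignoring how large the differences between SNPs are.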

Casey Bergman · Answer 2 · 2011-05-27

14.2 years ago

Dataminer

Have you tried MAPPER? It might solve your problem to an extent; in my case it did.


score 1 · Answer 3 · 2011-05-27

MAPPER is our tool of choice as well, as it uses both TRANSFAC and JASPAR motifs. Here's how we've analyzed SNPs with MAPPER:

Take a 41-bp segment of the genome with your SNP at position 21, that is, 20 bp of genomic sequence on either side of the SNP. I use 20 because the biggest models MAPPER uses are about 15 bp. Copy this segment, append the copy to the end of the original, and place an N between the two (I use the N as a spacer or punctuation mark). Put allele 1 at position 21 and allele 2 at position 63. You now have an 83-bp sequence (41 + 1 + 41) in the following format:

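(20 bp of genome, bases 1-20)-allele 1-(next 20 bp of genome, bases 22-41)-N-(20 bp of genome, bases 1-20)-allele 2-(next 20 bp of genome, bases 22-41)

In query coordinates these pieces occupy positions 1-20, 21, 22-41, 42 (the N), 43-62, 63 and 64-83.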

In this manner I can assay one sequence to cover both alleles. Other approaches will work as well, e.g. two queries, each with a different allele. Do as you wish.
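The query layout described above can be sketched as a small helper. This is a hypothetical function, not part of MAPPER; it assumes you have already extracted the 20 bp on each side of the SNP from the reference genome, and it only handles single-base alleles (indels would shift the positions).

```python
def build_mapper_query(upstream: str, downstream: str,
                       allele1: str, allele2: str, flank: int = 20) -> str:
    """Join both allele versions of a SNP window into one query, separated by N.

    `upstream` and `downstream` are the genomic bases immediately 5' and 3'
    of the SNP. With flank=20 the result is 83 bp, with allele 1 at 1-based
    position 21, the N spacer at 42 and allele 2 at 63.
    """
    if len(upstream) != flank or len(downstream) != flank:
        raise ValueError(f"each flank must be exactly {flank} bp")
    if len(allele1) != 1 or len(allele2) != 1:
        raise ValueError("this layout assumes single-base alleles")
    copy1 = upstream + allele1 + downstream   # 41 bp, allele at position 21
    copy2 = upstream + allele2 + downstream   # 41 bp, allele at position 21
    return copy1 + "N" + copy2                # 83 bp in total
```

For example, `build_mapper_query("A" * 20, "C" * 20, "G", "T")` returns an 83-character string with `G` at position 21 and `T` at position 63 (Python indices 20 and 62).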

Run MAPPER and save your results. I filter the results by score and E-value to retain only the most likely predictions.
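A minimal sketch of the score and E-value filter, assuming you have parsed your saved MAPPER results into simple records. The `Prediction` fields are hypothetical names (map them from whatever columns your export contains), and the thresholds are yours to choose; no specific cut-offs are given here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record: map these fields from the columns of your
    # saved MAPPER results; the names here are not MAPPER's own.
    tf_name: str
    score: float
    e_value: float


def filter_confident(predictions, min_score, max_e_value):
    """Keep predictions whose score is high enough and E-value low enough."""
    return [p for p in predictions
            if p.score >= min_score and p.e_value <= max_e_value]
```

Both conditions must hold, so a high-scoring hit with a poor E-value is dropped as well as a low-scoring one.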

I then look for allele-specific binding of transcription factors that are relevant to the phenotypes we're following. In practice, this means I delete predictions for plant and invertebrate TFs, and I also drop the many TFs that have no role in our research topics (e.g., obesity and diabetes). For me, a MAPPER prediction must encompass a position where a SNP allele sits in the query sequence, that is, position 21 or 63.
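The filtering steps above can be sketched as follows. This is an illustrative interpretation, not the poster's actual code: the `Site` record and its field names are hypothetical, TF names in any example are placeholders, and "allele-specific" is taken here to mean a TF whose predicted site overlaps only one of the two allele positions. Score differences between the two alleles are not compared.

```python
from dataclasses import dataclass

# SNP allele positions (1-based) in the 83-bp concatenated query.
ALLELE1_POS = 21
ALLELE2_POS = 63


@dataclass
class Site:
    # Hypothetical record for one MAPPER prediction; field names are illustrative.
    tf_name: str
    taxon: str    # e.g. "vertebrate", "plant", "invertebrate"
    start: int    # 1-based start in the query
    end: int      # 1-based inclusive end in the query


def covers(site, pos):
    """True if the predicted site spans query position `pos`."""
    return site.start <= pos <= site.end


def allele_specific_tfs(sites, relevant_tfs,
                        excluded_taxa=("plant", "invertebrate")):
    """Split relevant TFs by which allele copy their predicted site overlaps."""
    kept = [s for s in sites
            if s.taxon not in excluded_taxa and s.tf_name in relevant_tfs]
    on_a1 = {s.tf_name for s in kept if covers(s, ALLELE1_POS)}
    on_a2 = {s.tf_name for s in kept if covers(s, ALLELE2_POS)}
    return {
        "allele1_only": on_a1 - on_a2,
        "allele2_only": on_a2 - on_a1,
        "both": on_a1 & on_a2,
    }
```

TFs in the `allele1_only` and `allele2_only` sets are the candidates for allele-specific binding; predictions that overlap neither SNP position are discarded.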

I highly recommend this approach: it has given us many good associations, including several that show interactions with environmental factors that drive activation of the TFs predicted by MAPPER.

score 0 · Answer 4 · 2013-07-12

12.1 years ago

mulin0424.li

Combining genetic and epigenetic features from the recent ENCODE project, a tool named GWAS3D can help you quite a lot with regulatory SNP prioritization. Please visit this site: http://jjwanglab.org/gwas3d
