Hi all,
I have about 100 sequences from different transcription start sites, each with a SNP in it, and I would like to know whether the SNP affects a transcription factor binding site. These SNPs already influence the expression of these genes, so it would be interesting to show that they are the actual causal variants.
My first guess was to run both alleles through LASAGNA 2.0, using the JASPAR CORE matrices ("all vertebrates").
>seq_reference
CCATCTTGCGTCGCTCTTGCTTGAAGGCCG
>seq_alternative = higher expression
CCATCTTGCGTCGCTGTTGCTTGAAGGCCG
output:
seq_reference
Name               Sequence        Position (0-based)  Strand  Score  p-value  E-value
TFAP2A (MA0003.1)  GCCTTCAAG       19                  -       7.54   0.00085  0.0187
PBX1 (MA0070.1)    CCTTCAAGCAAG    15                  -       7.58   0.00065  0.0123
Pax6 (MA0069.1)    TTCAAGCAAGAGCG  11                  -       10.85  5.0E-5   0.00085
seq_alternative
Name               Sequence        Position (0-based)  Strand  Score  p-value  E-value
BRCA1 (MA0133.1)   GCAACAG         13                  -       6      0.001    0.0240
TFAP2A (MA0003.1)  GCCTTCAAG       19                  -       7.54   0.00085  0.0187
Pax6 (MA0069.1)    TTCAAGCAACAGCG  11                  -       9.78   0.0002   0.0034
But I don't see much difference between the two outputs, and I don't really understand how to interpret them. Does anyone know whether this is the right way to do it, or have other ideas? Or is this the right approach and this is just a bad example?
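To make the comparison concrete, here is a minimal Python sketch of how the two outputs could be read side by side: find the position where the alleles differ, then keep only the hits whose window covers that position and compare their scores. The hit lists are copied by hand from the output above; the function names (`snp_positions`, `hits_overlapping`) are just illustrative, and I am assuming the reported position is the 0-based start of the site on the forward strand (the reported minus-strand sequences are the reverse complements of those windows, so this seems consistent).

```python
ref = "CCATCTTGCGTCGCTCTTGCTTGAAGGCCG"
alt = "CCATCTTGCGTCGCTGTTGCTTGAAGGCCG"

# Hits copied from the LASAGNA output above:
# (name, matrix ID, 0-based start, site length, score)
ref_hits = [
    ("TFAP2A", "MA0003.1", 19, 9, 7.54),
    ("PBX1", "MA0070.1", 15, 12, 7.58),
    ("Pax6", "MA0069.1", 11, 14, 10.85),
]
alt_hits = [
    ("BRCA1", "MA0133.1", 13, 7, 6.0),
    ("TFAP2A", "MA0003.1", 19, 9, 7.54),
    ("Pax6", "MA0069.1", 11, 14, 9.78),
]


def snp_positions(a, b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def hits_overlapping(hits, pos):
    """Map matrix ID -> (name, score) for hits whose window covers pos.

    Assumption: 'start' is the 0-based start on the forward strand.
    """
    return {
        matrix_id: (name, score)
        for name, matrix_id, start, length, score in hits
        if start <= pos < start + length
    }


snp = snp_positions(ref, alt)[0]
ref_ov = hits_overlapping(ref_hits, snp)
alt_ov = hits_overlapping(alt_hits, snp)

# name -> (reference score, alternative score); None means "not reported"
summary = {}
for matrix_id in sorted(set(ref_ov) | set(alt_ov)):
    name = (ref_ov.get(matrix_id) or alt_ov.get(matrix_id))[0]
    ref_score = ref_ov[matrix_id][1] if matrix_id in ref_ov else None
    alt_score = alt_ov[matrix_id][1] if matrix_id in alt_ov else None
    summary[name] = (ref_score, alt_score)

print(f"SNP at 0-based position {snp}: {ref[snp]}>{alt[snp]}")
for name, (r, a) in summary.items():
    print(f"{name}: reference score = {r}, alternative score = {a}")
```

With the hits above, this flags Pax6 (10.85 vs 9.78), PBX1 (reported for the reference only) and BRCA1 (reported for the alternative only) as overlapping the SNP, while TFAP2A does not overlap it and is unchanged. A hit missing from one output presumably means it fell below the reporting threshold rather than that binding is absent; that is my assumption, not something the tool output states.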
Thanks for your reply! Can I then also see whether the binding site is disrupted, so that this SNP can be identified as the causal variant?