I have a naïve but complex question. I used RSAT to retrieve the 2000 bp sequence upstream of the TSS for 5 genes. I then ran FIMO on this FASTA file with a binding motif (identified in my experiment) to see where the motif's binding sites are. I know that the protein of interest binds very close to the TSS. I get the following results in the FIMO output:
| # motif_id | motif_alt_id | sequence_name | start | stop | strand | score | p-value | q-value | matched_sequence | Chr | start | End | Strand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | | D20_ENSG00000130164-LDLR-ENST00000557958 | 80 | 108 | + | 41.9286 | 1.14E-14 | 6.18E-10 | CTCTGCCACCCAGGCTGGAGTGCAATGGC | chr19 | 11102268 | 11104267 | D |
| 2 | | D102_ENSG00000161048-NAPEPLD-ENST00000425379 | 106 | 134 | - | 37.6735 | 4.50E-13 | 1.15E-08 | CTCTGTCACCCAGGCTGGAATACAGTGGC | chr7 | 103128761 | 103130760 | R |
| 2 | | D17_ENSG00000130164-LDLR-ENST00000558518 | 309 | 337 | - | 32.8367 | 1.19E-11 | 1.10E-07 | CTCTGTCACCCAGGCTGGAGCGCAGTGAC | chr19 | 11130163 | 11132162 | D |
In the table above, column 3 holds the gene/transcript name (trimmed from the default names because FIMO does not accept long names), and columns 4-6 give the start, end and strand of the motif match, respectively. The last four columns are the chromosome, start, end and strand of the 2000 bp upstream FASTA sequence that was used as input to FIMO. The problem is that I cannot figure out how to map columns 4 and 5 (start and end of the motif match) onto columns 12 and 13, which hold the genomic coordinates of the original FASTA sequence.
In row 1, the original FASTA sequence (column 14) and the motif match (column 6) are both on the forward strand. To place columns 4 and 5 within columns 12 and 13, should I simply compute 11102268 + 80 and 11104267 - 108? That does not give me the matched sequence, and the resulting interval is also longer than 29 bp. Similarly, in row 2, where both the motif match and the FASTA sequence are on the reverse strand, how should I map the match to the coordinates in the FASTA file?
Did you try using BLAT?
P.S. Actually, did you try Ctrl+F in a sequence editor? Sequence editors (e.g. APE) search for a sequence on either strand. Once you know the pattern, it will be easy in the future.
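If you would rather script that than use an editor, the same either-strand search can be sketched in a few lines of Python. This is only a minimal illustration: the names `reverse_complement` and `find_on_both_strands` are made up here, it does exact matching only (no mismatches, no IUPAC codes), and it reports 1-based inclusive positions on the sequence as given.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(sequence, pattern):
    """Find exact occurrences of pattern on either strand of sequence.

    Returns a sorted list of (start, end, strand) tuples with 1-based,
    inclusive positions. A '-' hit means the reverse complement of the
    pattern was found at that position.
    """
    sequence = sequence.upper()
    pattern = pattern.upper()
    hits = []
    for strand, query in (("+", pattern), ("-", reverse_complement(pattern))):
        pos = sequence.find(query)
        while pos != -1:
            hits.append((pos + 1, pos + len(query), strand))
            # Step by one so overlapping occurrences are also reported.
            pos = sequence.find(query, pos + 1)
    return sorted(hits)
```

Note that a palindromic pattern (one equal to its own reverse complement) will be reported once per strand at the same position.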
That output is similar to the FIMO output format as documented here, but not identical. How exactly did you produce it?
Did you use retrieve-seq to get your upstream sequences? What did the first line in that file look like?
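In the meantime, the offset arithmetic asked about in the question can be sketched as below. It rests on assumptions that this thread has not confirmed: that the FIMO start/stop values are 1-based, inclusive positions within the input sequence; that the Chr start/End columns are 1-based, inclusive genomic coordinates; and that `D`/`R` mean the FASTA record was taken from the direct strand or reverse-complemented from the reverse strand, so that position 1 of an `R` record sits at the genomic End. The function name `fimo_hit_to_genome` is made up for illustration.

```python
def fimo_hit_to_genome(region_start, region_end, region_strand,
                       hit_start, hit_stop, hit_strand):
    """Map a FIMO hit inside an upstream FASTA record to genomic coordinates.

    Assumes 1-based inclusive coordinates throughout, and that an 'R'
    record is the reverse complement of the genomic forward strand.
    Returns (genomic_start, genomic_end, genomic_strand).
    """
    if region_strand == "D":
        # Position 1 of the record is region_start, so shift by (pos - 1).
        g_start = region_start + hit_start - 1
        g_end = region_start + hit_stop - 1
        g_strand = hit_strand
    elif region_strand == "R":
        # Position 1 of the record is region_end, so count down from the end.
        g_start = region_end - hit_stop + 1
        g_end = region_end - hit_start + 1
        # The record itself is flipped, so the hit strand flips as well.
        g_strand = "-" if hit_strand == "+" else "+"
    else:
        raise ValueError("region_strand must be 'D' or 'R'")
    return g_start, g_end, g_strand


# Rows 1 and 2 of the table in the question.
row1 = fimo_hit_to_genome(11102268, 11104267, "D", 80, 108, "+")
row2 = fimo_hit_to_genome(103128761, 103130760, "R", 106, 134, "-")
```

Under those assumptions, row 1 maps to chr19:11102347-11102375 and row 2 to chr7:103130627-103130655, each 29 bp long. If the sequence extracted at those coordinates does not agree with matched_sequence (or its reverse complement), one of the assumptions above is off, which is why the first line of the FASTA file matters.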