Question

FIMO target coordinates in fasta file

7.5 years ago

rbronste ▴ 420

Is there a relatively quick and easy way to locate the following FIMO-reported coordinates in a FASTA file?

Could they also be output as actual chromosomal coordinates?

Thanks!

pattern name  sequence name  start  stop  strand  score    p-value   q-value  matched sequence
MA0432.1      mm10_dna       1140   1157  +       12.2477  1.44e-05  0.0464   ACCAGTGAGCAAGACCCC

fimo meme motif fasta • 2.4k views

ADD COMMENT • link updated 7.2 years ago by ATpoint 88k • written 7.5 years ago by rbronste ▴ 420

What's the question, again? The start and stop in the mm10_dna FASTA should give you the sequence that FIMO found.

ADD REPLY • link 7.5 years ago by Santosh Anand 5.8k
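To check that suggestion by hand, the reported start and stop can be used to slice the sequence directly. The following is only a minimal sketch: `fasta_slice` is a hypothetical helper name, and it assumes a FASTA file with a single record and 1-based, inclusive positions as FIMO reports them.

```shell
# Hypothetical helper: print bases START..STOP (1-based, inclusive) of a
# single-record FASTA file. Usage: fasta_slice FILE START STOP
fasta_slice() {
    awk -v s="$2" -v e="$3" '
        !/^>/ { gsub(/\r/, ""); seq = seq $0 }    # join wrapped sequence lines
        END   { print substr(seq, s, e - s + 1) } # extract the requested range
    ' "$1"
}
```

For the hit in the question this would be called as `fasta_slice mm10_dna.fa 1140 1157`, where the file name is an assumption.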

score 1 · Answer 1 · 2018-05-24

FIMO has the option --parse-genomic-coord, which looks for genomic coordinates in the header of each FASTA sequence in the format "chr:start-end" (1-based coordinates!). If they are found, the start and stop coordinates of the motif are reported relative to the genome rather than to the start of the FASTA sequence.
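The adjustment amounts to simple arithmetic. As a minimal sketch (the function name `to_genomic` and the example numbers are hypothetical, not part of FIMO), assuming both the header start and the motif position are 1-based:

```shell
# Hypothetical helper, not part of FIMO: map a 1-based position within a
# FASTA record to a genomic position, given the 1-based start in the header.
to_genomic() {
    header_start=$1   # 1-based genomic start taken from "chr:start-end"
    motif_pos=$2      # 1-based position within the record
    echo $(( header_start + motif_pos - 1 ))
}

# Illustrative record chr1:1001-2000 with a motif at positions 10-15:
to_genomic 1001 10   # motif start
to_genomic 1001 15   # motif stop
```

With these made-up inputs the motif would be placed at chr1:1010-1015.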

Given that you used BEDTools getfasta with a BED file as input (which has 0-based start coordinates) to get the DNA sequence in the first place, you would need to add 1 to the start coordinate in the header to make it 1-based, e.g.:

cat test.bed

chr3 187745371 187745782

## Get fasta sequence (header has 0-based coords):
bedtools getfasta -bed test.bed -fi hg38.fa > test.fa

>chr3:187745371-187745782
GTAGCAGCAGCAGCGGCGGCAGCAACAGCAATAATCACCTGGTGTCCGGCCTTTCCTAGAAACTTCTTGCATCACCACTTCTAAGAACCCCAGTTCTAAGAATCAACAGAGCTCAATTCTCGGAATTTGAGCTTCGGACTTTACCACTGCTACGTGGCAGGGGAGGACTTGGTGTCAGCTCTCCGAGATTTTTACTGCCCCTGGCCAACCAAAAGCCCTCAAAGCCACAAGATTTTTTCACTGGCCGGCATATTTCGAGGTCCTCATAAGCAGAGCGTCTCGGATTTGGAGGTTCCGGTTCGAGGCTCGAGGGGCCTGAAGGTGGCTCTCCCTCCCCGGGCCCAAGACGATGGTATGGCCTGCTCCGCCACCATCACGTGGGCTCCTCCTCTGTGACGTCGGCGCCTTCGC

## add +1 to start coord. of the header to make it 1-based:
awk -F ":" '{OFS=""; split($2,a,"-"); if(a[1]) print $1,":",a[1]+1,"-",a[2]; else print; }' test.fa > test_1.fa

>chr3:187745372-187745782

## then run fimo:
./fimo --parse-genomic-coord JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme.meme test_1.fa

chr3 fimo nucleotide_motif 187745423 187745436 52.1 + . Name=MA0463.1_chr3+;Alias=Bcl6;ID=MA0463.1-Bcl6-1-chr3;pvalue=6.2e-06;qvalue= 0.00489;sequence=TTTCCTAGAAACTT;

The coordinates of the motif are now relative to the genome rather than to the FASTA sequence itself.
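If FIMO has already been run without that option, the same shift can be applied to its tabular output afterwards. This is only a sketch under stated assumptions: `fimo_to_bed` is a hypothetical helper, the columns are in the order shown in the question, the sequence names look like "chr:start-end" with a 0-based start (as written by bedtools getfasta), and chromosome names contain no ":" or "-".

```shell
# Hypothetical helper: read FIMO tab-separated rows on stdin and write BED
# (0-based start, end-exclusive) with motif ID, score and strand.
# Assumed columns: pattern, sequence name, start, stop, strand, score, ...
fimo_to_bed() {
    awk -F '\t' '
        /^#/ || $3 !~ /^[0-9]+$/ { next }   # skip comments and the header row
        {
            split($2, loc, /[:-]/)          # loc[1]=chrom, loc[2]=0-based start
            printf "%s\t%d\t%d\t%s\t%s\t%s\n",
                   loc[1], loc[2] + $3 - 1, loc[2] + $4, $1, $6, $5
        }'
}

# Row from the question, with a made-up sequence name for illustration:
printf 'MA0432.1\tchr1:1000-3000\t1140\t1157\t+\t12.2477\t1.44e-05\t0.0464\tACCAGTGAGCAAGACCCC\n' |
    fimo_to_bed
# → chr1	2139	2157	MA0432.1	12.2477	+
```

Because BED is 0-based and end-exclusive, no +1 correction of the header is needed in this variant; the 1-based equivalent of the interval above is chr1:2140-2157.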