FIMO target coordinates in fasta file
6.9 years ago
rbronste ▴ 420

Is there a relatively quick and easy way to locate the following FIMO-reported coordinates in a fasta file?

Ideally the output would be an actual chromosomal coordinate.

Thanks!

pattern name   sequence name   start   stop    strand  score   p-value q-value matched sequence
MA0432.1        mm10_dna        **1140    1157**    +   12.2477 1.44e-05        0.0464  ACCAGTGAGCAAGACCCC

What's the question, again? The start and stop positions in the mm10_dna fasta should give you the sequence that FIMO found.
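To illustrate that point, here is a minimal Python sketch that pulls the matched sequence back out of a fasta record using the start and stop columns. It assumes FIMO's start/stop are 1-based and inclusive, and that a hit on the minus strand is reported as the reverse complement; the function name `extract_match` and the toy sequence are made up for this example.

```python
def extract_match(sequence: str, start: int, stop: int, strand: str = "+") -> str:
    """Return the subsequence for 1-based, inclusive start/stop coordinates.

    Assumption: a '-' strand hit corresponds to the reverse complement
    of the forward-strand slice.
    """
    # Convert 1-based inclusive coordinates to a Python half-open slice.
    sub = sequence[start - 1:stop]
    if strand == "-":
        complement = str.maketrans("ACGTacgt", "TGCAtgca")
        sub = sub.translate(complement)[::-1]
    return sub


# Toy record: 4 padding bases, then the 18-mer from the question, then 2 more.
toy = "NNNN" + "ACCAGTGAGCAAGACCCC" + "NN"
print(extract_match(toy, 5, 22))
```

With the real data you would pass the mm10_dna sequence with start=1140 and stop=1157; the result should equal the "matched sequence" column if the coordinate assumptions hold.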

6.5 years ago
ATpoint 85k

FIMO has the option --parse-genomic-coord, which looks for the genomic coordinates of the fasta string in the header of each sequence in the format "chr:start-end" (1-based coordinates!). If this is found, the start and stop coordinates of the motif are reported relative to this genomic position, rather than relative to the start of the fasta string.

Given that you used BEDtools getfasta with a BED file as input (which has 0-based start coordinates) to get the DNA sequence in the first place, you would need to add 1 to the start coordinate in the header to make it 1-based, e.g.:

cat test.bed

chr3 187745371 187745782

## Get fasta sequence (header has 0-based coords):
bedtools getfasta -bed test.bed -fi hg38.fa > test.fa

>chr3:187745371-187745782
GTAGCAGCAGCAGCGGCGGCAGCAACAGCAATAATCACCTGGTGTCCGGCCTTTCCTAGAAACTTCTTGCATCACCACTTCTAAGAACCCCAGTTCTAAGAATCAACAGAGCTCAATTCTCGGAATTTGAGCTTCGGACTTTACCACTGCTACGTGGCAGGGGAGGACTTGGTGTCAGCTCTCCGAGATTTTTACTGCCCCTGGCCAACCAAAAGCCCTCAAAGCCACAAGATTTTTTCACTGGCCGGCATATTTCGAGGTCCTCATAAGCAGAGCGTCTCGGATTTGGAGGTTCCGGTTCGAGGCTCGAGGGGCCTGAAGGTGGCTCTCCCTCCCCGGGCCCAAGACGATGGTATGGCCTGCTCCGCCACCATCACGTGGGCTCCTCCTCTGTGACGTCGGCGCCTTCGC

## add +1 to start coord. of the header to make it 1-based:
awk -F ":" '{OFS=""; split($2,a,"-"); if(a[1]) print $1,":",a[1]+1,"-",a[2]; else print; }' test.fa > test_1.fa

>chr3:187745372-187745782
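If awk is not your thing, the same header adjustment can be sketched in Python. This is only an illustration of the step above, assuming headers look like ">chr:start-end" as written by bedtools getfasta; `shift_header_to_one_based` and `HEADER_RE` are hypothetical names, and lines are expected without their trailing newline.

```python
import re

# Matches headers such as ">chr3:187745371-187745782" (optionally followed by more text).
HEADER_RE = re.compile(r"^>(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)(?P<rest>.*)$")


def shift_header_to_one_based(line: str) -> str:
    """Add 1 to the start coordinate of a '>chr:start-end' header line.

    Any line that does not match (sequence lines, other header styles)
    is returned unchanged.
    """
    match = HEADER_RE.match(line)
    if match is None:
        return line
    start = int(match["start"]) + 1
    return f">{match['chrom']}:{start}-{match['end']}{match['rest']}"


print(shift_header_to_one_based(">chr3:187745371-187745782"))
```

To convert a whole file, apply the function to each line after stripping the newline and write the results to a new fasta.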

## then run fimo:
./fimo --parse-genomic-coord JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme.meme test_1.fa

chr3 fimo nucleotide_motif 187745423 187745436 52.1 + . Name=MA0463.1_chr3+;Alias=Bcl6;ID=MA0463.1-Bcl6-1-chr3;pvalue=6.2e-06;qvalue= 0.00489;sequence=TTTCCTAGAAACTT;

The coordinates of the motif are now relative to the genome rather than to the fasta string itself.
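If you already have FIMO output with positions relative to the fasta string (as in the question), the same shift can be applied after the fact. Here is a minimal sketch of the arithmetic, assuming both the header start and the FIMO positions are 1-based and inclusive; `to_genomic` is a hypothetical helper, not part of FIMO.

```python
from typing import Tuple


def to_genomic(header_start: int, motif_start: int, motif_stop: int) -> Tuple[int, int]:
    """Shift 1-based positions within a fasta record to genomic coordinates.

    header_start is the 1-based genomic position of the first base of the record.
    """
    # Position 1 in the record corresponds to header_start in the genome.
    offset = header_start - 1
    return motif_start + offset, motif_stop + offset


# In the example above, TTTCCTAGAAACTT occupies bases 52-65 of the record,
# and the record starts at chr3:187745372 (1-based).
print(to_genomic(187745372, 52, 65))
```

For the example record this reproduces the 187745423 and 187745436 reported in the GFF line above.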
