3.5 years ago
Lila M
Hi there, I have nearly 1000 coordinates of regions of interest, and I would like to know whether they share any short (~4-10 bp) homology. I first tried motif discovery with HOMER, but the results are not what I expected, and I am not sure it is the best approach. My second plan is to extract the sequences for those regions with the UCSC Table Browser and then look for homology using BLAST. However, I don't know whether this is the best approach, or whether I can run 1000 sequences together in BLAST. Any advice would be great.
Thanks
What is the size range of these regions? Matches of ~4-10 bp can't really be called homology, as they may simply be present by chance (depending on the size of your sequences).
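To put a rough number on the "present by chance" point, here is a minimal sketch. It assumes uniform, independent bases (each of A/C/G/T with probability 0.25), looks at one strand only, and treats overlapping windows as independent, so the figures are only approximate. The function names are made up for illustration.

```python
def expected_chance_hits(k, seq_len, n_seqs=1):
    """Expected number of occurrences of one specific k-mer.

    Assumption: uniform, independent bases; single strand only.
    """
    windows = max(seq_len - k + 1, 0)  # number of k-length windows per sequence
    return n_seqs * windows * 0.25 ** k


def prob_at_least_one(k, seq_len):
    """Approximate probability that one specific k-mer occurs at least once.

    Assumption: windows are treated as independent, which is not strictly
    true for overlapping windows.
    """
    windows = max(seq_len - k + 1, 0)
    return 1.0 - (1.0 - 0.25 ** k) ** windows


if __name__ == "__main__":
    # Compare short and longer k-mers in a hypothetical 1000 bp region.
    for k in (4, 6, 8, 10):
        print(k, expected_chance_hits(k, 1000), prob_at_least_one(k, 1000))
```

Under these assumptions a given 4-mer is almost certain to turn up in a 1 kb region, while a given 10-mer is rare, which is why the length of your regions matters so much here.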
You could use one of the k-mer enumeration programs (jellyfish, or kmercountexact.sh from the BBMap suite) and look for k-mers shared among your sequences.
The length of the regions is very variable. Regarding your concern about the small size of the homologous stretches, you are right, and that is my concern too. Do you think it is better to use a k-mer enumeration program than BLAST? Also, what if I tried something longer than 10 bp?
Thanks!
If the regions are small, then BLAST may not work (I assume you have tried it already). If the regions are random, then a large enough number of sequences may not share a motif for it to be identified. Try the k-mer approach. It should work for sequences of any length.
Thank you for your advice. The regions are very random, some big, some small. I will try the k-mer approach then. Thanks!
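As a rough illustration of the k-mer approach suggested above, here is a minimal plain-Python sketch. It is not how jellyfish or kmercountexact.sh work internally. It counts, for each k-mer, how many of the input sequences contain it, and keeps the ones present in at least a chosen fraction of them. The names `shared_kmers` and `min_fraction` are made up for illustration, and treating a k-mer and its reverse complement as the same thing is an assumption that may or may not suit your regions.

```python
from collections import Counter

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def shared_kmers(sequences, k, min_fraction=0.5):
    """Map each canonical k-mer to the number of sequences that contain it,
    keeping only k-mers found in at least min_fraction of the sequences."""
    seen_in = Counter()
    for seq in sequences:
        seq = seq.upper()
        kmers = set()  # each sequence counts a k-mer at most once
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
                kmers.add(canonical(kmer))
        seen_in.update(kmers)
    threshold = min_fraction * len(sequences)
    return {kmer: n for kmer, n in seen_in.items() if n >= threshold}


if __name__ == "__main__":
    # Toy input; real sequences would come from a FASTA file of your regions.
    toy = ["TTGACGTCAAA", "CCGACGTCGG", "ATATATATAT"]
    print(shared_kmers(toy, k=6, min_fraction=0.6))
```

For 1000 regions of modest size this should be fine in memory. For much larger inputs the dedicated k-mer counters mentioned above are the better choice. Keep in mind that short k-mers will be shared by chance, so the counts only mean something when compared against a background such as shuffled or randomly sampled regions.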