I've been trying to use BLAT to align some sequences to the axolotl genome. I have all the required files and have finally got BLAT to actually run (I'm running it on my own Mac in Terminal), but this is the error I keep encountering:
Assertion failed: ((hit->tStart >> bucketShift) < bucketCount), function clumpHits, file genoFind.c, line 1806.
Does anyone have any idea what the issue could be? Thanks for any help.
What parameters or options are you passing to blat? Perhaps provide the full command you are running.
I'm running it at a very basic level. I should say I don't have a background in CS, so that might be why I'm having issues. Here's the command:
./blat Genome.fa Input.fa output.psl
Genome.fa is just the full axolotl genome, and Input.fa is just the sequences I'm trying to align.
What kind of sequences are in Input.fa? blat is meant to be used with sequences that are similar. It is not a tool meant to find distantly related sequences.
They are a bunch of sequences varying in length from 200 to 1000 bp, so could the differences in length be the issue?
No that should not be an issue. Are your input sequences in multi-fasta format?
I'm not sure what that means; it's a TRINITY transcriptome assembly of genes expressed during certain stages of axolotl development.
Edit: I should've Googled multi-FASTA before posting, my bad. Yes, it basically is in multi-FASTA format. Here's the kind of header each sequence is preceded by:
>TRINITY_DN20780_c0_g1_i1 len=639 path=[1367:0-311 1382:312-312 1381:313-313 1380:314-337 1379:338-462 1362:463-638] [-1, 1367, 1382, 1381, 1380, 1379, 1362, -2]
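For anyone following along: in multi-FASTA format each record starts with a > header line like the one above, followed by one or more sequence lines, and the next > line starts the next record. Below is a minimal Python sketch of reading those records and pulling the first few into a smaller test file. The function names read_fasta and head_fasta are made up for illustration; the sketch keeps everything after > as the header and writes each sequence back out on a single line.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file.

    The header is everything after '>' on the header line. Sequence lines
    are joined into one string; blank lines and any text before the first
    '>' are ignored.
    """
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def head_fasta(lines: Iterable[str], n: int) -> str:
    """Return the first n records as FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in islice(read_fasta(lines), n))
```

For example, open Input.fa for reading and a new test.fa for writing, then call dst.write(head_fasta(src, 3)) to get a three-sequence test file.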
Can you take 3-4 sequences from your file, put them in a new file, and see if you still get that error using this new file? It is possible that all that cruft (extra stuff) in the FASTA header may be causing a problem. If the search works, then there may be a specific sequence in your data that is causing that error. If it does not, then you may need to shorten those headers. But one step at a time.
I tried it with only one sequence and still got the same error. I then shortened the header so that it only contained the caret >, still the same error. For the heck of it, I decided to remove the header entirely, and that gave the following error:
Couldn't open TCAGCTTTGTTCTCCGCTGCCATTTTCTCATATTGTGTACGGATTTCCGCGATCGCTACA , No such file or directory
That is the first line of the sequence in Input.fa, not even the entire sequence. This is really perplexing me.
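A possible explanation for that "Couldn't open" message, offered as a guess rather than a confirmed diagnosis: BLAT's usage text (printed when you run blat with no arguments) says the database and query can each be a .fa, .nib or .2bit file, or a list of such files with one file name per line. A query file that doesn't start with a > header may therefore have been read as a list of file names, so the first sequence line was treated as a file to open. Here is a rough Python check that a file starts with a header and that its sequence lines contain only nucleotide codes. The function name fasta_problems is hypothetical, and the accepted character set is an assumption for this sketch, not BLAT's own rule.

```python
import re
from typing import Iterable, List

# Sequence characters this sketch accepts: IUPAC nucleotide codes in either
# case, plus '-' and '*'. This set is an assumption, not BLAT's own rule.
SEQ_LINE = re.compile(r"^[ACGTURYKMSWBDHVNacgturykmswbdhvn*\-]+$")


def fasta_problems(lines: Iterable[str], max_reports: int = 10) -> List[str]:
    """Return a list of problems found in FASTA text (empty list if none)."""
    problems: List[str] = []
    seen_header = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip()  # drop the newline and any trailing spaces
        if not line:
            continue  # blank lines are skipped by this check
        if line.startswith(">"):
            seen_header = True
            continue
        if not seen_header:
            # The first non-blank line is not a header, so this is not FASTA.
            return [f"line {lineno}: sequence data before any '>' header"]
        if not SEQ_LINE.match(line):
            problems.append(f"line {lineno}: unexpected characters in sequence line")
            if len(problems) >= max_reports:
                break
    if not seen_header:
        problems.append("no '>' header lines found")
    return problems
```

Passing an open Input.fa file handle to fasta_problems and printing the result gives an empty list for a clean file.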
Just ran a test
It should be just that simple.
Can you try running your test file with one sequence (with header) against itself? That would make sure your blat works properly.
Long headers are not a problem either.
Yup, that worked, which leads me to believe that something's wrong with the axolotl genome I'm using. For one thing, it is massive: I can't even open the FASTA file because it's 28 GB, so I'm not sure what it looks like. Maybe I'll try converting it to a 2bit file; I'm not sure how that would change things, but it's worth a shot.
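Since a 28 GB FASTA file is too big to open in a text editor, one way to see what it contains is to stream through it once and record each sequence's name and length. Below is a minimal Python sketch, assuming an uncompressed FASTA whose sequences are wrapped over many short lines (if a whole chromosome sits on one line, that line still has to be held in memory). The function names sequence_lengths and summarize are illustrative only.

```python
from typing import Iterable, List, Optional, Tuple


def sequence_lengths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (name, length) for every record in FASTA text, in file order.

    Only one line is held in memory at a time, so this stays light on a
    large file as long as the sequence is wrapped over many short lines.
    The name is the header text up to the first whitespace.
    """
    records: List[Tuple[str, int]] = []
    name: Optional[str] = None
    total = 0
    for line in lines:
        if line.startswith(">"):
            if name is not None:
                records.append((name, total))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            total = 0
        elif name is not None:
            total += len(line.strip())
    if name is not None:
        records.append((name, total))
    return records


def summarize(records: List[Tuple[str, int]], top: int = 10) -> str:
    """Report the record count, total bases, and the `top` longest records."""
    total = sum(n for _, n in records)
    out = [f"{len(records)} sequences, {total} bases in total"]
    for name, n in sorted(records, key=lambda r: r[1], reverse=True)[:top]:
        out.append(f"{name}\t{n}")
    return "\n".join(out)
```

For example, open Genome.fa and print summarize(sequence_lengths(fh)). On a file this size the single pass will still take a while, since every line has to be read.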
Worth a try. Let us know how that goes.
It did not work. I'm starting to feel like I'm running out of ideas; any further suggestions? Thanks for all the help so far! I really appreciate it.