Hello, I'm using blat to align genes from one genome to another. This works well for short sequences (<10 kb), but longer sequences run for more than a day with no sign of finishing. This is especially true for sequences of 35 kb or more, and some of the sequences are close to 200 kb.
Does anybody have suggestions for increasing the efficiency? I've thought about blat-ing 10 kb intervals of the genes, but that would pose problems if some intervals fail to map or fail to map uniquely. I've pasted below the code that I'm currently using to run blat given the target genome and sequence, requiring a minimum score of 90% of the sequence length and at least 97% identity. Thanks!
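# Length of the sequence in sequence.fa (the last record, if there are several)
f=$(awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' sequence.fa | tail -1)
# Minimum score: 90% of the sequence length (integer arithmetic)
a=$(( 9*f/10 ))
blat target.2bit sequence.fa psl/sequence.psl -tileSize=15 -minScore=$a -minIdentity=97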
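For reference, the 10 kb interval idea mentioned above could be sketched roughly as follows. This is only an illustration under some assumptions: `split_windows` is a hypothetical helper (not part of blat), and the `<id>:<start>-<end>` chunk naming is just one possible convention for tracing hits back to their position in the gene. It does nothing about intervals that fail to map or map non-uniquely.

```shell
# split_windows: hypothetical helper, not part of blat.
# Cuts every record of a FASTA file into consecutive fixed-size windows.
#   $1 = FASTA file, $2 = window size in bases
# Each chunk is named <id>:<start>-<end> with 1-based inclusive coordinates.
split_windows() {
    awk -v w="$2" '
        # Print the buffered sequence as windows of w bases, then reset it.
        function emit(   i, n, e) {
            n = length(seq)
            for (i = 1; i <= n; i += w) {
                e = (i + w - 1 < n) ? i + w - 1 : n
                print ">" id ":" i "-" e
                print substr(seq, i, w)
            }
            seq = ""
        }
        # New header: flush the previous record, keep only the first word as id.
        /^>/ { if (id != "") emit(); id = substr($1, 2); next }
        # Sequence line: append to the buffer (handles multi-line FASTA).
        { seq = seq $0 }
        END { if (id != "") emit() }
    ' "$1"
}

# Example (file names are placeholders):
#   split_windows sequence.fa 10000 > sequence.windows.fa
```

The resulting file could then be given to blat in place of sequence.fa, with the minimum score computed per window rather than per gene.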