Dear all,
I apologize for asking a very basic question.
Using local BLAST searches against a genome, I am trying to retrieve full-length homologous sequences of a gene of interest. Ultimately, I want to know the copy number of these genes and would also like to retrieve pseudogenes, if any. However, my tblastn searches (BLAST 2.2) against the downloaded genome assemblies (e.g., M. martensii: http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AYEL01#contigs) return several partial hits. I wonder whether this is because I am searching against a collection of contigs rather than against assembled scaffolds. Do I need to assemble these contigs into scaffolds before running BLAST, or are scaffold files available elsewhere? Alternatively, is there a tool that would help with this process?
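To see how fragmented the hits actually are, one option is to run tblastn with tabular output that also reports query and subject lengths, e.g. `tblastn -query protein.fa -db contigs -outfmt "6 std qlen slen" -out hits.tsv` (BLAST+ syntax; older `blastall`-based BLAST 2.2 releases use different options, and `protein.fa`, `contigs` and `hits.tsv` are placeholder names). The minimal Python sketch below, assuming exactly that 14-column layout, sums the merged query coverage per contig and across all contigs. The function names `merged_length` and `coverage_report` are hypothetical, not part of any BLAST tool.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by closed (start, end) intervals, overlaps merged."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coverage_report(tab_lines):
    """Return (per-contig coverage, combined coverage) of each query.

    Assumes tab-separated lines from -outfmt "6 std qlen slen":
    columns 7-8 are qstart/qend and column 13 is qlen (1-based).
    """
    per_contig = defaultdict(list)
    per_query = defaultdict(list)
    qlens = {}
    for line in tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qid, sid = f[0], f[1]
        qstart, qend = int(f[6]), int(f[7])
        interval = (min(qstart, qend), max(qstart, qend))
        per_contig[(qid, sid)].append(interval)
        per_query[qid].append(interval)
        qlens[qid] = int(f[12])
    contig_cov = {k: merged_length(v) / qlens[k[0]] for k, v in per_contig.items()}
    total_cov = {q: merged_length(v) / qlens[q] for q, v in per_query.items()}
    return contig_cov, total_cov
```

If each individual contig covers only part of the query but the combined coverage approaches 1.0, the gene is plausibly split across contigs. Because the gene may have several copies, the combined figure can also be inflated by paralogous hits on different contigs, so treat it as a hint rather than proof of one fragmented gene.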
Thank you very much in advance,
Regards,
Kartik
If the gene doesn't exist in full-length form in the FASTA file, then BLAST can't return it...
Hey, thanks for the quick reply. That is exactly why I wonder whether I need to assemble these contigs into scaffolds before running BLAST against them.
If you're only getting partial matches, then the answer is yes: you'll need to assemble things further :)
Of course, this all assumes that the gene of interest would align along its full length to the complete genome, if one existed. If that turns out not to be the case, there is nothing you can do but work with the partial hits.
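One rough way to tell whether a partial hit is cut short by the assembly (the contig simply ends) rather than by real divergence or a pseudogene is to check whether the alignment runs into a contig boundary. The heuristic sketch below again assumes tblastn output in the `-outfmt "6 std qlen slen"` layout; `hits_at_contig_ends`, `margin` (nucleotides) and `q_slack` (residues) are hypothetical names with arbitrary illustrative defaults.

```python
def hits_at_contig_ends(tab_lines, margin=30, q_slack=5):
    """Return (qseqid, sseqid, lo, hi, slen) for HSPs that are partial on the
    query AND whose subject span lies within `margin` bp of a contig end.

    Assumes -outfmt "6 std qlen slen": qstart/qend/sstart/send are
    columns 7-10, qlen and slen are columns 13-14 (1-based).
    """
    flagged = []
    for line in tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qid, sid = f[0], f[1]
        qstart, qend = int(f[6]), int(f[7])
        sstart, send = int(f[8]), int(f[9])
        qlen, slen = int(f[12]), int(f[13])
        # Partial on the query if the alignment stops well short of either end.
        query_partial = qstart > 1 + q_slack or qend < qlen - q_slack
        # For tblastn, sstart > send means a minus-strand hit; normalise it.
        lo, hi = min(sstart, send), max(sstart, send)
        near_edge = lo <= margin or hi >= slen - margin + 1
        if query_partial and near_edge:
            flagged.append((qid, sid, lo, hi, slen))
    return flagged
```

Flagged hits are candidates for joining by scaffolding, or for checking neighbouring contigs in a scaffold-level assembly if one is available. Partial hits that sit well inside a contig are more likely to reflect real divergence, a partial gene copy, or a pseudogene. This is only a heuristic: alignments often stop a few residues short for reasons unrelated to the assembly.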