I'm writing code to parse the XML output from a BLASTX search on NCBI's servers and, from that, fetch from NCBI the nucleotide sequence whose translated protein BLASTX reported as a hit. This is working: I track back from the protein ID to the original nucleotide record ID. But my original query sequence may be from the other strand, in which case BLASTX would have had to reverse-complement it to find the match. I want to know whether that happened, so that I can automatically reverse-complement one of the two sequences. BLASTX knew whether the hit was on the reverse complement when it found it, but I can't see any hint of this in the XML, nor can I see how to get that information from other queries. Yes, I could go to the trouble of reverse-complementing the hit sequence myself, trying the three reading frames, translating to amino acids, and seeing whether it matches my original sequence better, but that is what BLASTX has just done, and I would rather get the information from BLASTX and/or the NCBI databases. Can I?
That is, how can I determine whether BLASTX had to reverse-complement my query sequence when it found a hit?
Thanks! David
Thanks! The query start/end one makes sense to me, but how does query-frame tell me the direction?
Alas, the query_from/query_to and hit_from/hit_to scheme doesn't work.
I have an example where BLASTX returns query_from less than query_to AND hit_from less than hit_to, and yet the query sequence is definitely the reverse complement relative to both the response protein sequence and the response DNA sequence (the nucleotide sequence that is eLinked to the protein in the BLASTX response).
Any other thoughts on how to solve this?
If Hsp_query-frame is positive (plus strand), then the query sequence was matched as supplied. If Hsp_query-frame is negative (minus strand), then the reverse complement of the query sequence was matched.
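As a rough illustration of that rule, here is a minimal Python sketch that reads Hsp_query-frame for each HSP in a BLAST XML report and uses its sign to decide whether the query needs reverse-complementing. It uses only the standard library. The sample XML is a stripped-down, made-up fragment (the hit IDs are hypothetical), and the function names hsp_strands and orient_query are my own, not part of any BLAST tooling. It also assumes plain A/C/G/T sequences with no ambiguity codes.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed fragment in the shape of BLAST XML output.
# Hit IDs are hypothetical; a real report carries many more elements.
SAMPLE_XML = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>hypothetical_hit_1</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_query-frame>2</Hsp_query-frame></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>hypothetical_hit_2</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_query-frame>-3</Hsp_query-frame></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

# Complement table for unambiguous DNA only (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hsp_strands(xml_text):
    """Return (hit_id, query_frame, strand) for every HSP in the report."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        for hsp in hit.iter("Hsp"):
            frame = int(hsp.findtext("Hsp_query-frame"))
            # Positive frame: query matched as supplied.
            # Negative frame: reverse complement of the query matched.
            strand = "plus" if frame > 0 else "minus"
            results.append((hit_id, frame, strand))
    return results


def orient_query(query_seq, frame):
    """Return the query in the orientation that BLASTX translated."""
    return reverse_complement(query_seq) if frame < 0 else query_seq


if __name__ == "__main__":
    for hit_id, frame, strand in hsp_strands(SAMPLE_XML):
        print(hit_id, frame, strand)
```

A hit can have several HSPs with different frames, so the sketch reports the frame per HSP rather than per hit; which HSP to trust (for example the first or best-scoring one) is left to the caller.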