Blast+ remote search for refseq reference genome
0
0
Entering edit mode
5 weeks ago
Chiranth • 0

Hello all i am currently using the latest blastn: 2.16.0+ i have a protein sequence and i want to tblastn it aganist refseq reference genome using the remote function in the offline Blast+ that i have I am building a tool and i need to see from what reference genome(basically the start and end postion) this protein sequence is mapped onto here is the following command: tblastn -query query.fasta -db refseq_representative_genomes -remote -out results.txt -max_target_seqs 5 -entrez_query "txid4932[Organism]"

please tell me what exactly should i use in db?

Here is the database that i actually wanna tblastn it aganist

I have also tried using the API service but it doesn't work, it only works for nt

here's the sample code:

import requests
import time
accession="AJR77291.1"
taxid='4932'

subj_seq_url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&retmode=text&id={accession}"
response = requests.get(subj_seq_url)
if response.status_code == 200 and response.text:
            sequence = "".join(line.strip() for line in response.text.splitlines() if not line.startswith(">"))
            print(f"Retrieved sequence for accession {accession}:\n{sequence[:60]}...")  # Show a snippet
else:
            print(f"Failed to fetch sequence for accession {accession}. Skipping to the next accession...")


tblastn_url = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
tblastn_params = {
    "CMD": "Put",
    "PROGRAM": "tblastn",
    "DATABASE": "ref_euk_rep_genomes",  #prok_complete_genomes--> for prokaryotes
    "QUERY": sequence,
    "ENTREZ_QUERY": "txid4932 [organism]",  # Replace with actual taxid
    "HITLIST_SIZE": 5,
    "FORMAT_TYPE": "Text"
}


tblastn_response = requests.post(tblastn_url, data=tblastn_params)
with open("tblastn_1.txt", "w", encoding='UTF-8') as file:
        file.write(tblastn_response.text)
if "RID" in tblastn_response.text and "RTOE" in tblastn_response.text:
            rid_start = tblastn_response.text.find("RID = ") + len("RID = ")
            rid_end = tblastn_response.text.find("\n", rid_start)
            rid2 = tblastn_response.text[rid_start:rid_end].strip()

            rtoe_start = tblastn_response.text.find("RTOE = ") + len("RTOE = ")
            rtoe_end = tblastn_response.text.find("\n", rtoe_start)
            rtoe2 = int(tblastn_response.text[rtoe_start:rtoe_end].strip())
            print(f"tBLASTn job submitted successfully for {accession}. RID: {rid2}, Estimated time: {rtoe2} seconds.")

else:
            print(f"Failed to submit tBLASTn for {accession} using organism and taxid. Trying with genus...")  

print(f"Waiting for tBLASTn for {accession} to complete...")
time.sleep(rtoe2)



while True:
        result_params = {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid2}

        result_response2 = requests.get(tblastn_url, params=result_params)

        if "Status=WAITING" in result_response2.text:
            print("Searching...")
            time.sleep(60)
            continue

        if "Status=FAILED" in result_response2.text:
            print(f"tBLASTn search failed for {accession}. Trying next accession...")
            break

        if "Status=READY" in result_response2.text:
            print(f"tBLASTn search complete for {accession}. Retrieving results...")
            break


result_params = {"CMD": "Get", "FORMAT_TYPE": "Text", "RID": rid2}
result_response2 = requests.get(tblastn_url, params=result_params)

output_file = f"tblastn_results1_{accession}.txt"
with open(output_file, "w") as file:
      file.write(result_response2.text)
print(f"tBLASTn results for {accession} saved to {output_file}.")
API refseq • 322 views
ADD COMMENT
0
Entering edit mode

small comment already : you might be better of using the tabular output format , it's much easier to parse afterwards

ADD REPLY
0
Entering edit mode

Thanks,i'll keep that in mind. but i want it to work first, when i try the remote blast offline command i'm getting a:

Error: NCBI C++ Exception:
    T0 "E:\RemoteApp\tmp64_19\2_1719320056\c++\src\connect\ncbi_conn_streambuf.cpp", line 763: Error: (CIO_Exception::eIO_NotSupported) ncbi::CConn_Streambuf::showmanyc() - I/O error
    T0 "E:\RemoteApp\tmp64_19\2_1719320056\c++\src\serial\objistrasnb.cpp", line 1231: Error: (CIO_Exception::eIO_NotSupported) ncbi::CObjectIStreamAsnBinary::ReadClassSequential() -  Frame type= eFrameClassMember, Member name= description
    T0 "E:\RemoteApp\tmp64_19\2_1719320056\c++\src\serial\objistrasnb.cpp", line 1237: Error: (CIO_Exception::eIO_NotSupported) ncbi::CObjectIStreamAsnBinary::ReadClassSequential() -  Frame type= eFrameClass, Object type= Blast4-database-info
ADD REPLY
0
Entering edit mode

NCBI remote blast is meant as a convenience for use with a small number of sequences. It should not be built into a tool or anything that you need to reliably work.

Web blast also has the following limits in place that you should keep in mind ( New web BLAST default parameters and search limits are now in place at NCBI ) It would be best to download the refseq_pro/eu-karyotic genomes locally and run your search against the database, if you want this to reliably work. NCBI makes preformatted blast databases available for these.

ADD REPLY

Login before adding your answer.

Traffic: 1836 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6