Entering edit mode
22 months ago
ANDRES FELIPE
•
0
I'm trying to search homologs for a query sequence in a list of genomes given their accesion numbers obtained from ref seq nucleotides database. I have no results and I don't understand how to find the homologs of the protein in a genome. Please help me! Thank you.
The next one is the code:
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
# Read or define your query protein sequence
query_sequence = "MAAGVK"
# Define the genome accession numbers
genome_accession_numbers = ["NC_000913.3", "NC_060200.1"] # Example accession numbers for E. coli and S. cerevisiae
# Perform BLASTP search against each genome
for accession in genome_accession_numbers:
# Build the entrez query for the specific genome
entrez_query = f"refseq_genomic[filter] AND {accession}[accession]"
# Perform BLASTP search for the specific genome
result_handle = NCBIWWW.qblast("blastp", "refseq_select", query_sequence, entrez_query=entrez_query)
# Parse the results
blast_records = NCBIXML.parse(result_handle)
# Process each record
for record in blast_records:
for alignment in record.alignments:
for hsp in alignment.hsps:
# Access relevant information
print("Hit ID:", alignment.hit_id)
print("Hit Description:", alignment.hit_def)
print("Alignment Length:", hsp.align_length)
print("Bit Score:", hsp.bits)
print("E-Value:", hsp.expect)
print("Alignment Sequence:", hsp.query)
print("Hit Sequence:", hsp.sbjct)
print("\n")
# Close the result handle
result_handle.close()
So should I just change from the code from
NCBIWWW.qblast("blastp",...)
toNCBIWWW.qblast("tblastn",...)
and it should work?give it a shot!
It didn't work :c
Please invest some proper effort. "It doesn't work" is not sufficient feedback. What is the exact error?
Two possible issues. If you want to do
tblastn
you will need to reduce the default word size to perhaps 2 since the query is so short (6 AA). This does not work.It may be best to use the same strategy with
blastp
search. Here is an example using your query against E coli proteins with a word size of 2. This result is limited to 100 hits but there will likely be more. The link should stay valid for a day or two: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=AE4RBNUD01NThis search may not work via command line remote blast.