Help searching homologs in a genome
1
0
Entering edit mode
22 months ago

I'm trying to search homologs for a query sequence in a list of genomes given their accesion numbers obtained from ref seq nucleotides database. I have no results and I don't understand how to find the homologs of the protein in a genome. Please help me! Thank you.

The next one is the code:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

# Read or define your query protein sequence
query_sequence = "MAAGVK"

# Define the genome accession numbers
genome_accession_numbers = ["NC_000913.3", "NC_060200.1"]  # Example accession numbers for E. coli and S. cerevisiae

# Perform BLASTP search against each genome
for accession in genome_accession_numbers:
    # Build the entrez query for the specific genome
    entrez_query = f"refseq_genomic[filter] AND {accession}[accession]"

    # Perform BLASTP search for the specific genome
    result_handle = NCBIWWW.qblast("blastp", "refseq_select", query_sequence, entrez_query=entrez_query)

    # Parse the results
    blast_records = NCBIXML.parse(result_handle)

    # Process each record
    for record in blast_records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                # Access relevant information
                print("Hit ID:", alignment.hit_id)
                print("Hit Description:", alignment.hit_def)
                print("Alignment Length:", hsp.align_length)
                print("Bit Score:", hsp.bits)
                print("E-Value:", hsp.expect)
                print("Alignment Sequence:", hsp.query)
                print("Hit Sequence:", hsp.sbjct)
                print("\n")

# Close the result handle
result_handle.close()
blast • 1.1k views
ADD COMMENT
0
Entering edit mode
22 months ago
LChart 4.9k

I'm not super familiar with the Bio.Blast bindings, but it looks like you're trying to blastp protein sequences against DNA sequences. The query tool for peptides against (implicitly translated) nucleotides is tblastn.

ADD COMMENT
0
Entering edit mode

So should I just change from the code from NCBIWWW.qblast("blastp",...) to NCBIWWW.qblast("tblastn",...) and it should work?

ADD REPLY
0
Entering edit mode

give it a shot!

ADD REPLY
0
Entering edit mode

It didn't work :c

ADD REPLY
0
Entering edit mode

Please invest some proper effort. "It doesn't work" is not sufficient feedback. What is the exact error?

ADD REPLY
0
Entering edit mode

Two possible issues. If you want to do tblastn you will need to reduce the default word size to perhaps 2 since the query is so short (6 AA). This does not work.

It may be best to use the same strategy with blastp search. Here is an example using your query against E coli proteins with a word size of 2. This result is limited to 100 hits but there will likely be more. The link should stay valid for a day or two: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=AE4RBNUD01N

Query           1    MAAGVK  6
EKH6153926.1    588  MAAGVK  593
EZH04849.1      223  MAAGVK  228
ENG99890.1      223  MAAGVK  228
WP_192959513.1  223  MAAGVK  228
STP69032.1      223  MAAGVK  228
GHL80845.1      220  MAAGVK  225
GHL01723.1      220  MAAGVK  225
WP_282955576.1  268  MAAGVK  273
GHL26235.1      220  MAAGVK  225
KAA2030300.1    257  MAAGVK  262
HBD5769007.1    113  MAAGVK  118
HAH4572450.1    223  MAAGVK  228
STM17458.1      223  MAAGVK  228
NEY29015.1      223  MAAGVK  228
WP_259431145.1  223  MAAGVK  228

This search may not work via command line remote blast.

ADD REPLY

Login before adding your answer.

Traffic: 2424 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6