Question

Help searching homologs in a genome

0

Entering edit mode

2.0 years ago

ANDRES FELIPE • 0

I'm trying to search homologs for a query sequence in a list of genomes given their accesion numbers obtained from ref seq nucleotides database. I have no results and I don't understand how to find the homologs of the protein in a genome. Please help me! Thank you.

The next one is the code:

from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

# Read or define your query protein sequence
query_sequence = "MAAGVK"

# Define the genome accession numbers
genome_accession_numbers = ["NC_000913.3", "NC_060200.1"]  # Example accession numbers for E. coli and S. cerevisiae

# Perform BLASTP search against each genome
for accession in genome_accession_numbers:
    # Build the entrez query for the specific genome
    entrez_query = f"refseq_genomic[filter] AND {accession}[accession]"

    # Perform BLASTP search for the specific genome
    result_handle = NCBIWWW.qblast("blastp", "refseq_select", query_sequence, entrez_query=entrez_query)

    # Parse the results
    blast_records = NCBIXML.parse(result_handle)

    # Process each record
    for record in blast_records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                # Access relevant information
                print("Hit ID:", alignment.hit_id)
                print("Hit Description:", alignment.hit_def)
                print("Alignment Length:", hsp.align_length)
                print("Bit Score:", hsp.bits)
                print("E-Value:", hsp.expect)
                print("Alignment Sequence:", hsp.query)
                print("Hit Sequence:", hsp.sbjct)
                print("\n")

# Close the result handle
result_handle.close()

blast • 1.2k views

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 2.0 years ago by ANDRES FELIPE • 0

GenoMax · Answer 1 · 2023-07-06

0

Entering edit mode

2.0 years ago

LChart 5.0k

I'm not super familiar with the Bio.Blast bindings, but it looks like you're trying to blastp protein sequences against DNA sequences. The query tool for peptides against (implicitly translated) nucleotides is tblastn.

ADD COMMENT • link 2.0 years ago by LChart 5.0k

0

Entering edit mode

So should I just change from the code from NCBIWWW.qblast("blastp",...) to NCBIWWW.qblast("tblastn",...) and it should work?

ADD REPLY • link updated 2.0 years ago by GenoMax 152k • written 2.0 years ago by ANDRES FELIPE • 0

0

Entering edit mode

give it a shot!

ADD REPLY • link 2.0 years ago by LChart 5.0k

0

Entering edit mode

It didn't work :c

ADD REPLY • link 2.0 years ago by ANDRES FELIPE • 0

0

Entering edit mode

Please invest some proper effort. "It doesn't work" is not sufficient feedback. What is the exact error?

ADD REPLY • link 2.0 years ago by Ram 45k

0

Entering edit mode

Two possible issues. If you want to do tblastn you will need to reduce the default word size to perhaps 2 since the query is so short (6 AA). This does not work.

It may be best to use the same strategy with blastp search. Here is an example using your query against E coli proteins with a word size of 2. This result is limited to 100 hits but there will likely be more. The link should stay valid for a day or two: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=AE4RBNUD01N

Query           1    MAAGVK  6
EKH6153926.1    588  MAAGVK  593
EZH04849.1      223  MAAGVK  228
ENG99890.1      223  MAAGVK  228
WP_192959513.1  223  MAAGVK  228
STP69032.1      223  MAAGVK  228
GHL80845.1      220  MAAGVK  225
GHL01723.1      220  MAAGVK  225
WP_282955576.1  268  MAAGVK  273
GHL26235.1      220  MAAGVK  225
KAA2030300.1    257  MAAGVK  262
HBD5769007.1    113  MAAGVK  118
HAH4572450.1    223  MAAGVK  228
STM17458.1      223  MAAGVK  228
NEY29015.1      223  MAAGVK  228
WP_259431145.1  223  MAAGVK  228

This search may not work via command line remote blast.

ADD REPLY • link 2.0 years ago by GenoMax 152k