From nucleotide or proteine sequences to EC number using biopython
0
0
Entering edit mode
10 months ago
friguiahlem8 ▴ 30

Hi,

if I have a fasta file containing nucleotide sequences or proteines sequences is it possible to get EC number using biopython for example

1.1.1.169
1.1.1.205
1.1.1.25
1.1.1.302
1.1.1.330
1.1.1.34

ps : I'm working on fungus so I need the fungal database from NCBI but I didn't know how to downloaded or construct it from fasta file using makedb command line and the second point is since I have a large fasta query file to pass I believe that I must work woth blast+ locally but also I don't know how to proceed Thanks

ECnumber • 519 views
ADD COMMENT
0
Entering edit mode

Can you say a bit more about your nucleotide and protein sequences? Are your nucleotide and protein sequences complete coding sequences of well-defined reference sequences or de novo assembled sequences from experimental data like metatranscriptomics for example? I ask b/c this can inform how to proceed.

ADD REPLY
0
Entering edit mode

jv Thank you for responding !! my nucleotide/proteine sequences are Illumina raw reads for RNA-Seq they have been deposited in the NCBI Sequence Read Archive . Genome assemblies and annotations have been deposited in the NCBI BioProject database.

I hope you can help me with the appropriate script because I have been trying but no solution !! what I did for instance is : 1/ download executable blast :https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html 2/ decompress : tar -zxvf ncbi-blast-2.15.0+-x64-linux.tar.gz

3/dowload database :swissprot database fasta file : https://www.uniprot.org/help/downloads 4/decompress : gunzip uniprot_sprot.fasta.gz 5/create the database : /root/blastEC/ncbi-blast-2.15.0+/bin/makeblastdb -in /root/blastEC/database/second/uniprot_sprot.fasta -dbtype prot -parse_seqids

then I runned this script only for blast for instance but I believe it's not correct at all because I don't generate the same thing as when I runned on the web ( I have the same description for all my hits , I don't have the scientific name etc ..) : I also didn't know how to retriefe the EC number :(

from Bio.Blast import NCBIXML import subprocess from io import StringIO

class YourBlastClass: OUT_FMT = "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

def __init__(self):
    pass

def blast_sequence_local(self, sequence, database_path, evalue, thread, cov, num_alignments):
    cmd = [
        "/root/blastEC/ncbi-blast-2.15.0+/bin/blastp",
        "-db", database_path,
        "-query", "-",
        "-evalue", str(evalue),
        "-num_threads", str(thread),
        "-qcov_hsp_perc", str(cov),
        "-num_alignments", str(num_alignments),
        "-outfmt", self.OUT_FMT,
        "-show_gis",
        "-task", "blastp-fast",
    ]

    try:
        result = subprocess.run(cmd, input=sequence, text=True, capture_output=True, check=True)
        lines = result.stdout.split('\n')
        best_hits = []

        for line in lines:
            if line:
                fields = line.split('\t')
                qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore = fields

                # Add more fields as needed
                # For example, you might want to extract additional information from the descriptions

                best_hits.append({
                    "qseqid": qseqid,
                    "sseqid": sseqid,
                    "pident": pident,
                    "length": length,
                    "mismatch": mismatch,
                    "gapopen": gapopen,
                    "qstart": qstart,
                    "qend": qend,
                    "sstart": sstart,
                    "send": send,
                    "evalue": evalue,
                    "bitscore": bitscore,
                    # Add more fields as needed
                    # ...
                })

        if not best_hits:
            print("No significant hits found.")

    except subprocess.CalledProcessError as e:
        print(f"Error in BLAST search. Non-zero exit status: {e.returncode}")
        print("BLAST command:")
        print(" ".join(cmd))
        print("BLAST output:")
        print(e.stdout)
        print("BLAST error:")
        print(e.stderr)
        raise
    except Exception as e:
        print(f"Error in BLAST search: {e}")
        if 'result' in locals():
            print("BLAST output:")
            print(result.stdout)
        raise

    return best_hits

if __name__ == "__main__": blast_obj = YourBlastClass()

fasta_file_path = '/root/blastEC/P_murina_1.fasta'
database_path = '/root/blastEC/database/second/uniprot_sprot.fasta'
evalue_threshold = 1e-4
num_threads = 4
query_coverage_percentage = 60
num_alignments_to_show = 10

with open(fasta_file_path, 'r') as fasta_file:
    sequence = fasta_file.read()

best_hits = blast_obj.blast_sequence_local(sequence, database_path,
                                           evalue=evalue_threshold,
                                           thread=num_threads,
                                           cov=query_coverage_percentage,
                                           num_alignments=num_alignments_to_show)

output_file_path = '/root/blastEC/best_hits_output.txt'
with open(output_file_path, 'w') as output_file:
    if best_hits:
        output_file.write("qseqid\tsseqid\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\n")
        for hit in best_hits:
            output_file.write(f"{hit['qseqid']}\t{hit['sseqid']}\t{hit['pident']}\t{hit['length']}\t{hit['mismatch']}\t{hit['gapopen']}\t{hit['qstart']}\t{hit['qend']}\t{hit['sstart']}\t{hit['send']}\t{hit['evalue']}\t{hit['bitscore']}\n")
            # Add more fields as needed
            # ...
    else:
        output_file.write("No significant hits found.\n")

print(f"Best hits have been written to {output_file_path}")

I also tried diamond https://github.com/bbuchfink/diamond/wiki and i generated my output file matches.tsv but I still have the same problem I don't know how to generate the EC number

ADD REPLY

Login before adding your answer.

Traffic: 1685 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6