12 months ago
friguiahlem8
Hi,
If I have a FASTA file containing nucleotide or protein sequences, is it possible to get EC numbers using Biopython? For example:
1.1.1.169
1.1.1.205
1.1.1.25
1.1.1.302
1.1.1.330
1.1.1.34
PS: I'm working on fungi, so I need the fungal database from NCBI, but I don't know how to download it or how to build it from a FASTA file with the makedb command line. Second, since I have a large FASTA query file, I believe I must run BLAST+ locally, but I don't know how to proceed with that either. Thanks
Can you say a bit more about your nucleotide and protein sequences? Are your nucleotide and protein sequences complete coding sequences of well-defined reference sequences or de novo assembled sequences from experimental data like metatranscriptomics for example? I ask b/c this can inform how to proceed.
jv Thank you for responding! My nucleotide/protein sequences are Illumina raw reads from RNA-Seq; they have been deposited in the NCBI Sequence Read Archive. Genome assemblies and annotations have been deposited in the NCBI BioProject database.
I hope you can help me with an appropriate script, because I have been trying without finding a solution! What I did so far:
1/ Downloaded the BLAST executables: https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html
2/ Decompressed them: tar -zxvf ncbi-blast-2.15.0+-x64-linux.tar.gz
3/ Downloaded the database (Swiss-Prot FASTA file): https://www.uniprot.org/help/downloads
4/ Decompressed it: gunzip uniprot_sprot.fasta.gz
5/ Created the database: /root/blastEC/ncbi-blast-2.15.0+/bin/makeblastdb -in /root/blastEC/database/second/uniprot_sprot.fasta -dbtype prot -parse_seqids
Then I ran this script, for now only for the BLAST step, but I believe it is not correct, because I don't get the same output as when I run the search on the web (I get the same description for all my hits, I don't have the scientific name, etc.). I also don't know how to retrieve the EC number :(
from Bio.Blast import NCBIXML
import subprocess
from io import StringIO

class YourBlastClass:
    OUT_FMT = "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

if __name__ == "__main__":
    blast_obj = YourBlastClass()
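A note on the script above: the tabular columns listed in OUT_FMT carry no description field, and NCBIXML parses BLAST XML output (outfmt 5), not tabular output. One possible way to get a description and organism per hit is to add the `stitle` column and parse the UniProt-style title. The following is only a minimal sketch, assuming `blastp` is on the PATH and the database was built from the Swiss-Prot FASTA file; the function names, file names, and default parameter values are hypothetical.

```python
import re

# Tabular output with the subject title appended as the last column.
OUT_FMT = "6 qseqid sseqid pident length evalue bitscore stitle"


def build_blastp_command(query, db, out, evalue=1e-5, threads=4):
    """Return the blastp argument list (hypothetical helper, runs nothing)."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", OUT_FMT,
        "-evalue", str(evalue),
        "-max_target_seqs", "5",
        "-num_threads", str(threads),
    ]


def parse_uniprot_title(stitle):
    """Split a UniProt FASTA title into (protein name, organism).

    Assumes the UniProt header layout 'Name OS=Organism OX=taxid ...'.
    Returns (title, None) when the title does not follow that layout.
    """
    text = stitle.strip()
    first, _, rest = text.partition(" ")
    if "|" in first:  # drop a leading 'sp|ACCESSION|ENTRY_NAME' token if present
        text = rest
    match = re.match(r"(.*?)\s+OS=(.*?)\s+OX=", text)
    if match is None:
        return text, None
    return match.group(1), match.group(2)


# Usage (not executed here, paths are placeholders):
#   import subprocess
#   cmd = build_blastp_command("query.fasta", "uniprot_sprot.fasta", "hits.tsv")
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to print and check before launching a long search on a large query file.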
I also tried DIAMOND (https://github.com/bbuchfink/diamond/wiki) and generated my output file matches.tsv, but I still have the same problem: I don't know how to get the EC number.
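Neither the BLAST nor the DIAMOND tabular output contains EC numbers, so one option is to join the subject accession of each best hit against a separate accession-to-EC table. The sketch below assumes a 12-column tabular hits file (the default column order of BLAST outfmt 6 and DIAMOND) and a tab-separated annotation table with columns named `Entry` and `EC number`; those column names, the helper names, and the example rows are assumptions for illustration, not real annotations.

```python
import csv
import io


def accession_from_sseqid(sseqid):
    """'sp|X00001|NAME_YEAST' -> 'X00001'; 'X00001.1' -> 'X00001'."""
    parts = sseqid.split("|")
    if len(parts) >= 3:
        return parts[1]
    return sseqid.split(".")[0]


def load_ec_table(handle):
    """Read a TSV with 'Entry' and 'EC number' columns into {accession: [ec, ...]}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        ecs = [ec.strip() for ec in row["EC number"].split(";") if ec.strip()]
        if ecs:
            table[row["Entry"]] = ecs
    return table


def best_hit_ec(hits_handle, ec_table):
    """Keep the highest-bitscore hit per query and look up its EC numbers."""
    best = {}
    for fields in csv.reader(hits_handle, delimiter="\t"):
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (accession_from_sseqid(subject), bitscore)
    return {q: ec_table.get(acc, []) for q, (acc, _) in best.items()}


# Made-up example rows (hypothetical accessions, EC values taken from the question).
ec_tsv = io.StringIO("Entry\tEC number\nX00001\t1.1.1.169\nX00002\t1.1.1.25; 1.1.1.205\n")
hits_tsv = io.StringIO(
    "read1\tsp|X00001|A_YEAST\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "read1\tsp|X00002|B_YEAST\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "read2\tsp|X00002|B_YEAST\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-40\t180\n"
)
result = best_hit_ec(hits_tsv, load_ec_table(ec_tsv))
print(result)
```

With real data, the two `io.StringIO` objects would be replaced by open file handles, for example `open("matches.tsv")`. Taking only the best hit is a simplification; a transferred EC number is a prediction whose reliability depends on identity and coverage thresholds that are not set here.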