Hi everybody,
I need to download sequences (FASTA) with their annotation data (GFF3) from NCBI based on their accession numbers. I've used Entrez efetch for that job, retrieved the data in asn.1, and converted it to FASTA and GFF3 with asn2fasta and annotwriter from the NCBI C++ Toolkit. However, for some RefSeq records the raw sequence is not part of the asn.1 record, and asn2fasta needs to download it from some NCBI web service. That takes ages compared to plain efetch.
For example, efetch takes 1.3 seconds to download the FASTA sequences for these two RefSeq accessions, "NW_003726435.1, NW_003729148.1", while asn2fasta, with the asn.1 records already saved in a file, takes about 40 seconds (about 37 seconds for one sequence).
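For reference, the fast path in this comparison (asking efetch for FASTA directly) can be sketched in Python against the public E-utilities endpoint. This is only a minimal sketch: the function names are made up for illustration, and db="nuccore" is an assumption, since the exact efetch command used is not shown here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nuccore", rettype="fasta"):
    """Build an efetch URL for a list of accessions (hypothetical helper).

    db="nuccore" is an assumption; adjust it to the database you query.
    """
    params = {
        "db": db,
        "id": ",".join(accessions),  # efetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accessions, timeout=60):
    """Download FASTA text for the given accessions (network request)."""
    with urlopen(build_efetch_url(accessions), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_fasta(["NW_003726435.1", "NW_003729148.1"])[:200])
```

This only covers the sequence part. It does not produce GFF3, and it does not explain why asn2fasta is slow.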
Does anybody have any idea why asn2fasta is so slow, and/or how to make it run faster?
Best regards
This really is a question for the NCBI help desk. Be aware that it may take 2-3 business days to get an answer from them, so be patient. Please come back and post the official response here when you get one.