Hi all!
I am building a local pipeline to identify unknown transcripts. One step of this pipeline checks whether the unknown sequences have a similar, already-annotated counterpart. For this, I BLAST the transcripts locally, which gives me the accession, the coordinates, and the strand of each hit in the other genome. With this information, I hoped to extract any annotations located at the hit's position in that genome. I tried efetch with the following call, which returns the output shown below it:
efetch -db nuccore -id "CP040608.1" -seq_start 17402 -seq_stop 16692 -strand 1 -format ft
>Feature gb|CP040608.1|
<1 647 gene
locus_tag FBF02_00060
<1 647 CDS
product desulfoferrodoxin FeS4 iron-binding domain-containing protein
transl_table 11
protein_id gb|QJE54075.1||gnl|PRJNA258022|FBF02_00060
inference COORDINATES: similar to AA sequence:RefSeq:YP_002343484.1
Unfortunately, I expected the region to be annotated only on the plus strand, yet changing -strand to 2 returns the same result.
Do you have any idea why this is happening? Is there an alternative to efetch? I expect to process ~10,000 hits, and efetch is quite slow and restrictive for that many queries.
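To make the kind of alternative I have in mind more concrete, here is a rough Python sketch. It assumes the annotation of each hit genome has already been downloaded once as GFF3 text, so that features overlapping a hit can be looked up locally instead of calling efetch once per hit. The function name, the accession "ACC0001.1", the locus tags, and all coordinates in the example are made up for illustration; they are not taken from my data.

```python
def overlapping_features(gff3_text, seqid, start, stop, strand=None):
    """Return GFF3 features on `seqid` that overlap [start, stop].

    Coordinates are 1-based and inclusive, as in GFF3. `strand` may be
    "+", "-", or None (None means: do not filter by strand).
    """
    # BLAST reports minus-strand hits with start > stop, so normalise first.
    lo, hi = sorted((start, stop))
    hits = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[0] != seqid:
            continue  # malformed line or a different sequence
        f_start, f_end = int(cols[3]), int(cols[4])
        if f_end < lo or f_start > hi:
            continue  # no overlap with the hit interval
        if strand is not None and cols[6] != strand:
            continue  # feature is on the other strand
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        hits.append({
            "type": cols[2],
            "start": f_start,
            "end": f_end,
            "strand": cols[6],
            "attributes": attrs,
        })
    return hits


# Hypothetical annotation with two genes, purely for illustration.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "ACC0001.1\texample\tgene\t16500\t17338\t.\t-\t.\tID=gene1;locus_tag=HYPO_0001",
    "ACC0001.1\texample\tgene\t20000\t21000\t.\t+\t.\tID=gene2;locus_tag=HYPO_0002",
])

if __name__ == "__main__":
    for feature in overlapping_features(EXAMPLE_GFF3, "ACC0001.1", 17402, 16692):
        print(feature["type"], feature["start"], feature["end"],
              feature["strand"], feature["attributes"].get("locus_tag"))
```

Passing strand="+" or strand="-" would restrict the lookup to one strand, which is the behaviour I was hoping to get from efetch. For ~10,000 hits a linear scan per hit is probably too slow on large genomes, so an interval index per accession might be needed; I have not tested this at scale.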
Thanks in advance!