I'm using the command-line E-utilities tools to get nucleotide sequences from BioSample IDs. Right now I'm working it out for just one ID, and then I will address all 50+ IDs I need to fetch. I would like to limit the efetch to capture only the complete genome.
Initial code showing the results that I want to filter:
esearch -db biosample -query "SAMN04014953" | elink -target nuccore | efilter -source refseq | efetch -format fasta | grep '>'
Output:
>NZ_CP021549.1 Klebsiella pneumoniae strain AR_0112, complete genome
>NZ_CP021548.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000005, complete sequence
>NZ_CP021547.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000003, complete sequence
>NZ_CP021546.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000002, complete sequence
>NZ_CP021545.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000001, complete sequence
>NZ_CP021544.1 Klebsiella pneumoniae strain AR_0112 plasmid tig00000000, complete sequence
I want only the first result. (To save time, I'd rather set up that filter before the efetch step than filter after fetching all the results.)
How I've tried to alter the efilter step in the above query:
efilter -source refseq -field "complete genome"
This produces no output.
efilter -source refseq -query "genome"
This produces the same output as removing the -query "genome" from the efilter command.
Any suggestions? I've looked through the Entrez Direct guide and did a Google search, but I still can't come up with a solution.
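One possible way to filter before any sequence is downloaded is to fetch the lightweight document summaries first, pick accessions by title, and only then request FASTA. The following is a minimal sketch, not a confirmed solution: `pick_complete_genome` is a hypothetical helper name, and the usage lines assume that the nuccore document summary exposes `AccessionVersion` and `Title` elements and that the chromosome title ends in "complete genome", as in the output above.

```shell
# Hypothetical helper: read "accession<TAB>title" lines on stdin and print
# the accessions whose title ends in "complete genome". Plasmid records in
# the example output end in "complete sequence", so they are skipped.
pick_complete_genome() {
  awk -F '\t' '$2 ~ /complete genome$/ { print $1 }'
}

# Usage sketch (needs Entrez Direct and network access; the docsum element
# names are an assumption):
# esearch -db biosample -query "SAMN04014953" | elink -target nuccore |
#   efilter -source refseq | efetch -format docsum |
#   xtract -pattern DocumentSummary -element AccessionVersion Title |
#   pick_complete_genome |
#   xargs -I{} efetch -db nuccore -id {} -format fasta
```

The title match is a plain text test, so it would need adjusting for records whose titles are worded differently.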
I would suggest that you use Kai Blin's NCBI genome download tool.
You can also parse a file NCBI makes available; see "How to download all the complete genomes for mycobacteria from NCBI?"
Thanks. I read through the documentation and it doesn't look like this includes any of the features I need. Namely, I need to link 50+ BioSample IDs to their Nucleotide IDs, and then fetch the associated RefSeq complete genomes. I'm starting with a list of BioSample IDs that I care about, and not all are K. pneumoniae (that was just an example for illustration). Have I misunderstood the tool's functions?
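For reference, the workflow described here (a list of BioSample IDs in, one RefSeq complete genome per ID out) could be sketched as a loop around the pipeline from the question. This is only an illustration under stated assumptions: the function names and the one-ID-per-line input file are hypothetical, and the filter runs after efetch by matching "complete genome" in the FASTA header.

```shell
# Hypothetical wrapper around the pipeline from the question
# (needs Entrez Direct and network access).
fetch_refseq_fasta() {
  esearch -db biosample -query "$1" | elink -target nuccore |
    efilter -source refseq | efetch -format fasta
}

# Keep only FASTA records whose header line contains "complete genome".
keep_complete_genome() {
  awk '/^>/ { keep = ($0 ~ /complete genome/) } keep'
}

# Read one BioSample ID per line from the file given as $1 and write
# the filtered records to <ID>.fasta in the current directory.
fetch_all() {
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    # stdin is redirected so the pipeline cannot consume the ID list
    fetch_refseq_fasta "$id" < /dev/null | keep_complete_genome > "$id.fasta"
  done < "$1"
}

# Usage: fetch_all biosample_ids.txt
```

Writing one file per ID keeps it easy to spot a BioSample that returned no complete genome, since its output file would be empty.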
If you knew which organisms you were interested in, this would be a simpler solution. I mentioned it to make you aware that it exists.