Hi all,
I want to download genome FASTA files from NCBI based on a GCF/GCA assembly accession. I used this command at first:
esearch -db assembly -query GCF_000023565.1 | elink -target nuccore | efetch -format fasta > {genome_out}
to download genomes, but I noticed the result is "duplicated", meaning I get each complete genome twice: once under a >NC_ header and once under a >CP header.
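To make the duplication concrete: the same sequence comes back once under an NC_ accession and once under a CP accession. Below is a minimal sketch of a post-filter that keeps only the records whose header starts with a chosen prefix. The helper name `keep_prefix` is hypothetical (it is not part of EDirect), and it assumes the accession is the first token after ">" in every header.

```shell
# keep_prefix PREFIX: read FASTA on stdin and print only the records whose
# header line starts with ">PREFIX". Hypothetical helper, not an EDirect tool.
keep_prefix() {
  awk -v p="$1" '
    /^>/ { keep = (index($0, ">" p) == 1) }  # re-decide at every header line
    keep                                      # print lines while the current record is kept
  '
}

# Intended use (needs network access and EDirect, so it is left commented out):
# esearch -db assembly -query GCF_000023565.1 | elink -target nuccore \
#   | efetch -format fasta | keep_prefix NC_ > genome.fasta
```

This only hides the duplicates after download, so both copies are still fetched. The prefix is passed as an argument because other assemblies may use a different RefSeq prefix (for example NZ_), in which case NC_ alone would drop everything.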
Then I switched to the command
esearch -db nucleotide -query GCF_000023565.1 | efetch -format fasta > {genome_out}
which worked (the file size is half that of the original output).
Now when I try downloading GCF_000648595 with the new command I get an empty file. When I switch back to the original command I do manage to get a fasta file but it seems "duplicated" again.
From the NCBI site I can see that the new command only works on IDs with "WGS project".
I hope someone can shed some light on this. I am hoping for a single command that can deal with both the GCF_000023565.1 and GCF_000648595 examples, and if that is not possible, then maybe a way of switching between different commands based on some condition.
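For the "switching based on some condition" idea, here is a rough sketch of what I have in mind, using an empty output file as the condition. The function names `fetch_direct`, `fetch_via_elink` and `fetch_genome` are made up for illustration; the two wrapped commands are exactly the ones from this post, and I am assuming that an empty file is a reliable sign that the direct query found nothing.

```shell
# Hypothetical wrappers around the two commands from this post.
fetch_direct() {
  esearch -db nucleotide -query "$1" | efetch -format fasta
}

fetch_via_elink() {
  esearch -db assembly -query "$1" | elink -target nuccore | efetch -format fasta
}

# fetch_genome ACCESSION OUTFILE: try the direct query first and fall back
# to the elink route only if the first attempt left an empty file.
fetch_genome() {
  fetch_direct "$1" > "$2"
  if [ ! -s "$2" ]; then          # -s is true only for a non-empty file
    fetch_via_elink "$1" > "$2"
  fi
  [ -s "$2" ]                     # non-zero exit status if both attempts were empty
}

# Example call (needs network access and EDirect):
# fetch_genome GCF_000648595 GCF_000648595.fasta
```

This would at least stop the empty files, but whenever the fallback is used the output would still contain both copies of each sequence, so it would still need filtering by header afterwards.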
I have tried multiple other parameters and none seems to work.
Any help will be highly appreciated.
There was a similar question earlier today.
Thanks. The command I'm using is the exact one GenoMax gave me in that post. However, that thread was about an NCBI error, which I also ran into, but it is not the issue I'm facing now.