Hi, I am looking to download the NCBI accession IDs of all CDSs from all viral genomes into a text file. Please let me know how to do this. I tried the following command, but it downloads the accession ID of each whole viral genome:
esearch -db nucleotide -query "virus [orgn]" | efetch -format acc >gi_virus_id.txt
I assume you mean the CDSs from those accession IDs?
You could do something like this, but it would be better to use a much more specific query than just "virus".
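One possible sketch, assuming NCBI Entrez Direct (esearch/efetch) is installed and that efetch's `fasta_cds_na` format writes one FASTA header per CDS carrying a `[protein_id=...]` tag. CDS features without a protein_id (for example some pseudogenes) are simply skipped. The function names are hypothetical, and the query you pass in should be narrower than just "virus".

```shell
#!/usr/bin/env bash
# Sketch: list the protein_id of every CDS in nucleotide records matching a query.
# Assumes Entrez Direct (esearch, efetch) is on PATH; function names are hypothetical.

# Read CDS FASTA on stdin and print one protein_id per line, e.g.
# "... [protein_id=YP_009724389.1] ..." -> "YP_009724389.1".
# Headers without a protein_id tag produce no output.
extract_cds_ids() {
  grep '^>' | sed -n 's/.*\[protein_id=\([^]]*\)].*/\1/p'
}

# Fetch the CDS FASTA for all nuccore records matching the query in $1
# and keep only the CDS (protein) accessions.
fetch_viral_cds_ids() {
  local query="$1"
  esearch -db nuccore -query "$query" \
    | efetch -format fasta_cds_na \
    | extract_cds_ids
}
```

For example, `fetch_viral_cds_ids 'txid10239[Organism:exp] AND refseq[filter]' > viral_cds_ids.txt` would restrict the search to RefSeq records under the Viruses taxon (taxid 10239). Downloading full CDS FASTA just to read the headers is wasteful for very large result sets, so starting from a small query is advisable.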
Thanks for your help. However, I am looking to download just the gi numbers, not the FASTA sequences. Our goal in downloading the gi numbers of all genes from all viruses is to check whether our samples (FASTQ files) extracted from animals show any similarity to viral genes, which could give us an idea of whether the samples are pathogenic. Also, I mentioned CDS by mistake: I need to download the gi numbers of genes from all viruses, not CDSs. From each gi's header information, such as "Influenza B virus (B/Victoria/4/2012) polymerase PA (PA) gene" or "HIV-1 isolate F7S2CL10 gag protein (gag) gene", we can see the virus names.
gi numbers are deprecated for use outside NCBI. You should switch your workflow to use accession numbers instead.
Ok! Could you please tell me the command for downloading the accession numbers of all genes in all viruses?
Hi, please let me know if you have any idea how to download the accession numbers of genes in all virus species.
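Following the advice to use accession numbers rather than gi numbers, here is a rough sketch that has not been checked against the live service. It assumes Entrez Direct (esearch, efetch, xtract) is installed, that `txid10239[Organism:exp]` (the Viruses taxon and all its descendants) is the intended scope, and that requiring "gene" in the record title is an acceptable approximation of "gene records". That title filter is only a heuristic and will miss or over-include some records. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list accession.version and title of viral nucleotide records
# whose title mentions "gene". Assumes Entrez Direct (esearch, efetch,
# xtract) is installed; function names are hypothetical.

# Build the Entrez query. txid10239 is the NCBI Taxonomy ID for Viruses;
# [Organism:exp] also matches all descendant taxa. Passing "refseq" as the
# first argument additionally restricts the search to RefSeq records.
viral_gene_query() {
  local query='txid10239[Organism:exp] AND gene[Title]'
  if [ "${1:-}" = "refseq" ]; then
    query="$query AND refseq[filter]"
  fi
  printf '%s\n' "$query"
}

# Print one "accession.version<TAB>title" line per matching record,
# so the virus name in each title stays next to its accession.
fetch_viral_gene_accessions() {
  esearch -db nuccore -query "$(viral_gene_query "${1:-}")" \
    | efetch -format docsum \
    | xtract -pattern DocumentSummary -element AccessionVersion Title
}
```

For example, `fetch_viral_gene_accessions > viral_gene_accessions.tsv` followed by `cut -f1 viral_gene_accessions.tsv > viral_gene_accessions.txt` would give a plain list of accessions. If the titles are not needed, piping the esearch result to `efetch -format acc` (as in the original command) is simpler.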