Thank you all for your kind help and direction.
I have, however, used a different approach to gather the accession numbers, as I couldn't install efetch and esearch (E-utilities) on my system.
Also, doing it manually was impossible for such a huge dataset.
My approach is a little exhaustive, but it helped me, so I am sharing it here in case anyone needs it:
The wget URL:
wget "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10234&lvl=3&lin=f&keep=1&srchmode=1&unlock"
Here I replaced the taxid with $i,
which is read from a file named list:
for i in $(cat list); do wget "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=${i}&lvl=3&lin=f&keep=1&srchmode=1&unlock" ; done
Then an index file is created, named like index_******_ taxid_*****
grep -E "Scientific name|/genome/\?term=txid${i}" "wwwtax.cgi?mode=Info&id=${i}&lvl=3&lin=f&keep=1&srchmode=1&unlock" > "Details_${i}"
Run inside the same loop, this saves in Details_$i
the details of taxids whose genome is available, such as taxid 10244:
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10244&lvl=3&lin=f&keep=1&srchmode=1&unlock
has a genome link, whereas taxid 10234,
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10234&lvl=3&lin=f&keep=1&srchmode=1&unlock
doesn't.
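The download and grep steps above could be combined into one loop. This is only a minimal sketch, not a tested pipeline: the function names and the page_<taxid>.html file name are my own placeholders (wget would otherwise name the file after the URL), and it assumes the list file holds one taxid per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; the URL and grep patterns come from the post.

# Build the Taxonomy Browser URL for one taxid.
taxonomy_url() {
    printf 'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=%s&lvl=3&lin=f&keep=1&srchmode=1&unlock\n' "$1"
}

# Keep only the lines of a saved page that carry the scientific name
# or a genome link for the given taxid. The "?" is escaped so that
# grep -E treats it as a literal character.
filter_details() {
    local taxid="$1" page="$2"
    grep -E "Scientific name|/genome/\?term=txid${taxid}" "$page"
}

# Download and filter every taxid listed in a file (one taxid per line).
# page_<taxid>.html is an assumed output name, chosen for readability.
fetch_all() {
    local list="$1" taxid
    while read -r taxid; do
        [ -n "$taxid" ] || continue
        wget -q -O "page_${taxid}.html" "$(taxonomy_url "$taxid")"
        filter_details "$taxid" "page_${taxid}.html" > "Details_${taxid}"
    done < "$list"
}

# Example (needs network access):
#   fetch_all list
```

Reading the list with while read instead of for i in `cat list` avoids surprises if the file contains blank lines or stray whitespace.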
So grep keeps the genome link only for those taxids in the Details files.
From these Details files, get the NC_**** accession numbers
using the following URL:
https://www.ncbi.nlm.nih.gov/genome/?term=txid10244[Organism:exp]
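This last step could be scripted roughly as follows. It is a sketch under assumptions: the function names and the Accessions_<taxid> output file are hypothetical, and it assumes the NC_ accessions appear as plain text in the downloaded genome page, which I have not verified for every taxid.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; the genome URL pattern comes from the post.

# Build the genome search URL for one taxid.
genome_url() {
    printf 'https://www.ncbi.nlm.nih.gov/genome/?term=txid%s[Organism:exp]\n' "$1"
}

# Succeed only if the Details file contains a genome link for the taxid.
# -F matches the string literally, so "?" needs no escaping here.
has_genome_link() {
    local taxid="$1" details="$2"
    grep -qF "/genome/?term=txid${taxid}" "$details"
}

# Read text on stdin and print the unique NC_ accessions found in it,
# with or without a version suffix (e.g. NC_001559.1).
extract_nc() {
    grep -oE 'NC_[0-9]+(\.[0-9]+)?' | sort -u
}

# Fetch the genome page for a taxid and save its NC_ accessions.
# Taxids without a genome link in their Details file are skipped.
fetch_accessions() {
    local taxid="$1"
    has_genome_link "$taxid" "Details_${taxid}" || return 0
    wget -q -O - "$(genome_url "$taxid")" | extract_nc > "Accessions_${taxid}"
}

# Example (needs network access):
#   while read -r t; do fetch_accessions "$t"; done < list
```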
I hope this helps someone in the future, or that someone can improve it to make it more organised.
Thanks once again, Biostars, especially Sej Modha and genomax, for your help and kind guidance.
Thank you
My system does not support these utilities, so it shows a "command not found" error. Is there a curl/wget link to get the NC_XXX data for each taxid? Would any other workaround be possible?
You'd have to install these utilities on your computer; they can be downloaded from: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/
You might also find this eutils tutorial helpful.
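On the curl/wget question: the E-utilities are also reachable as plain URLs, so something like the sketch below may work without installing anything. Treat it as an assumption-laden outline rather than a verified recipe: the function names are hypothetical, retmax=500 is an arbitrary limit, and I am assuming rettype=acc returns one accession per line for the nuccore database.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; endpoints are the public E-utilities URLs.

# Build an esearch URL listing nuccore record IDs for one taxid.
# retmax=500 is an arbitrary cap chosen for this sketch.
esearch_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=txid%s[Organism:exp]&retmax=500\n' "$1"
}

# Read esearch XML on stdin and print the record IDs as one
# comma-separated line, e.g. 111,222
extract_ids() {
    grep -oE '<Id>[0-9]+</Id>' | sed -E 's/<\/?Id>//g' | paste -s -d, -
}

# Build an efetch URL that asks for accessions only, as plain text.
efetch_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=acc&retmode=text\n' "$1"
}

# Example (needs network access):
#   ids=$(wget -q -O - "$(esearch_url 10244)" | extract_ids)
#   wget -q -O - "$(efetch_url "$ids")" | grep '^NC_'
```

The search returns every nuccore record under the taxid, so the final grep '^NC_' is what narrows the output to NC_ accessions.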