My goal is to download a fixed number of sequences (e.g., 1 million) from the NCBI protein database while excluding a specific organism. I can write an Entrez search that excludes the organism, but I can't find a way to limit the number of sequences I download. Downloading all of the results isn't feasible either (the result set is too large for the NCBI website).
Is there a particular Entrez command or script that would allow this? There was a similar question posted previously, but no direct answer was found.
Also, is there a more reliable way to download sequences from search results than the 'Send to: File' drop-down menu in the upper right of the screen? When I download larger numbers of sequences, the connection often breaks before the download finishes.
Thanks for your help! Please point me to the right thread if this question has in fact already been answered.
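For reference, here is a minimal sketch of one way this could be scripted against the NCBI E-utilities (ESearch with `usehistory=y`, then EFetch in small batches), using only the Python standard library. It is not taken from any answer in this thread. The helper names (`build_term`, `batch_ranges`, `parse_esearch`, `fetch_url`, `download_fasta`), the batch size, the retry settings, and the e-mail address are all placeholders chosen for illustration, and NCBI's limits on large result sets may differ from what is assumed here, so check the current E-utilities documentation before relying on it.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_term(base_query, excluded_organism):
    """Combine a base query with a NOT clause on the [Organism] field."""
    return f'({base_query}) NOT "{excluded_organism}"[Organism]'


def batch_ranges(total, batch_size):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def parse_esearch(xml_text):
    """Extract the hit count, WebEnv and QueryKey from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return (
        int(root.findtext("Count")),
        root.findtext("WebEnv"),
        root.findtext("QueryKey"),
    )


def fetch_url(url, retries=3, pause=5.0):
    """GET a URL as text, retrying a few times if the connection fails."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=120) as resp:
                return resp.read().decode("utf-8")
        except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
            if attempt == retries - 1:
                raise
            time.sleep(pause)


def download_fasta(term, wanted, out_path, batch_size=500, email="you@example.org"):
    """Write up to `wanted` FASTA records matching `term` to `out_path`."""
    # Run the search once and keep the result set on the NCBI history server.
    search = urllib.parse.urlencode(
        {"db": "protein", "term": term, "usehistory": "y", "retmax": 0, "email": email}
    )
    count, webenv, query_key = parse_esearch(fetch_url(f"{EUTILS}/esearch.fcgi?{search}"))

    with open(out_path, "w") as out:
        # Fetch only the first min(wanted, count) records, one small batch at a time.
        for start, size in batch_ranges(min(wanted, count), batch_size):
            fetch = urllib.parse.urlencode(
                {
                    "db": "protein",
                    "query_key": query_key,
                    "WebEnv": webenv,
                    "retstart": start,
                    "retmax": size,
                    "rettype": "fasta",
                    "retmode": "text",
                    "email": email,
                }
            )
            out.write(fetch_url(f"{EUTILS}/efetch.fcgi?{fetch}"))
            time.sleep(0.4)  # stay under 3 requests per second without an API key
```

A call might look like `download_fasta(build_term("bacteria[Organism]", "Escherichia coli"), 1_000_000, "subset.fasta")`, where the base query and organism name are only examples. Because each request is small and retried on failure, a dropped connection costs one batch rather than the whole download. Note that this takes the first N hits in the search's sort order, so the result is not a random sample of the matching records.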
Thank you very much! Exactly what I was looking for. Also, thanks for the heads up about bias.