I am seeking a way to download a random set of >600 proteins from NCBI (for example) that are NOT in my target protein set. For example, I have a set of FASTA files for hemoglobin and myoglobin (all available species) and want to find FASTA records that do NOT contain these proteins. I want this 'null set' of proteins for comparison.
Forgive me, this might be similar to another question, but I could not find the proper keywords for my search.
All the Entrez sites support standard boolean queries. It's a bit 'hacky', but you could simply search the protein database for something like "NOT haemoglobin". I'm not sure whether a bare NOT will work without a positive term for it to subtract from, but give it a try.
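In case it helps, here is a rough sketch of how that boolean query could be scripted against the NCBI E-utilities, using only the Python standard library. Several things here are my assumptions and not something I have verified against your data: anchoring the query with `all[filter]` so the NOT has something to subtract from, restricting the excluded names to the `[Title]` field, and the helper names (`build_null_query`, `esearch_url`, `efetch_fasta_url`, `fetch_ids`, `sample_ids`), which are purely illustrative. Note also that the sample is drawn from one page of search results, so it is not a uniform random draw over the whole database.

```python
import json
import random
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_null_query(exclude_terms, anchor="all[filter]"):
    """Build an Entrez query matching records whose title has none of the terms.

    The anchor term is an assumption: it gives NOT a positive set to subtract from.
    """
    excluded = " OR ".join(f"{term}[Title]" for term in exclude_terms)
    return f"{anchor} NOT ({excluded})"


def esearch_url(query, retmax=5000, retstart=0):
    """URL for an esearch request against the protein database (JSON output)."""
    params = {
        "db": "protein",
        "term": query,
        "retmax": retmax,
        "retstart": retstart,
        "retmode": "json",
    }
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def efetch_fasta_url(ids):
    """URL for an efetch request returning FASTA text for the given IDs."""
    params = {
        "db": "protein",
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_ids(query, retmax=5000):
    """Run the search over the network and return the list of matching IDs."""
    with urllib.request.urlopen(esearch_url(query, retmax=retmax)) as resp:
        return json.load(resp)["esearchresult"]["idlist"]


def sample_ids(ids, k=600, seed=None):
    """Pick up to k distinct IDs at random; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


# Example use (performs network requests, so it is left commented out):
# query = build_null_query(["hemoglobin", "haemoglobin", "myoglobin"])
# null_ids = sample_ids(fetch_ids(query), k=600, seed=42)
# print(efetch_fasta_url(null_ids[:100]))  # fetch the FASTA in batches
```

If you try this, keep the requests in small batches; NCBI asks for no more than three requests per second unless you use an API key.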
Hi jrj, yes, I tried that early on and it did not work. However, searching around Biostars I found (how can I combine uniprot's random and limit functionality?), which I had seen before...