Hi, everyone! Do you know where I can download protein sequences? So far I have downloaded the NCBI NR database and UniProt. Besides these well-known databases, are there any others? Thank you very much! Best wishes!
I am not aware of a metagenome protein sequence database, since the information you get from this type of sequencing is often incomplete. Advances are now being made with genomes sequenced from metagenomes, but that information is likely not fully validated.
There are papers (for example https://www.nature.com/articles/sdata2017203, https://www.nature.com/articles/s41587-018-0008-8 and https://www.ncbi.nlm.nih.gov/pubmed/30320765) that describe metagenomic genome assemblies. You would likely need to download these genomes yourself and build databases from them, e.g. from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA482748. It would not be a trivial exercise. I am not sure whether these sequences make it into the gene/protein sections of GenBank at some point.
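To give an idea of what building your own database from downloaded files might involve, here is a minimal sketch in Python. It assumes you already have predicted protein sequences in FASTA format from several sources; the function names (`read_fasta`, `merge_nonredundant`, `format_fasta`) are made up for illustration, and treating exact identical sequences as duplicates is just one simple choice, not how NR or any published resource is built.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_nonredundant(sources: Iterable[Iterable[str]]) -> List[Tuple[str, str]]:
    """Merge several FASTA sources, keeping the first header seen for each
    exact sequence (assumption: exact-match deduplication is good enough)."""
    seen: Dict[str, str] = {}
    for lines in sources:
        for header, seq in read_fasta(lines):
            # Normalise case and drop a trailing stop-codon marker, if any.
            seq = seq.upper().rstrip("*")
            if seq and seq not in seen:
                seen[seq] = header
    return [(header, seq) for seq, header in seen.items()]


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Return the records as FASTA text with wrapped sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy in-memory input; real use would pass open file handles instead.
    set_a = [">p1 example", "MKV", "LL*", ">p2", "AAA"]
    set_b = [">q1", "mkvll", ">q2", "GGG"]
    print(format_fasta(merge_nonredundant([set_a, set_b])), end="")
```

If you write the merged text to a file (say `merged.faa`, a placeholder name), it could then be indexed for searching with something like `makeblastdb -in merged.faa -dbtype prot -out merged_db`. For very large collections you would probably want a clustering tool rather than exact-match deduplication.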
There are a lot of posts here about that. Like this one:
Difference between NCBI non-redundant and refseq database
Go to the upper left-hand corner, press "LATEST",
and enter your question in the middle of the page.
Everything depends on your goal, your species, etc.
Human and bacterial proteins are the most studied ones.
How to download database of Human protein sequences with sub cellular locations?
Looking For A Database Of All Proteins Expressed/Predicted In Completely Sequenced Genomes
And there may be a lot of details:
Do you need some unique protein database? Are you interested in some particular
organelles, like mitochondria? Or some protein domain or motif?
Extracting Sub-cellular location from Uniprot into tabular format
Download all bacterial proteins from the same family
how to get protein motif sequence from pfam database?
Do you need a curated protein database, or doesn't it matter?
Do you need some enzymes?
A: extract EC number from entrez esearch query
Do you need to do it computationally or manually?
Etc.
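If you go the computational route, a download can often be scripted. As a rough sketch only: the snippet below builds a URL for the UniProt REST service that should return curated (reviewed) proteins for one organism in FASTA format. The endpoint and the query fields `organism_id` and `reviewed` are written from my understanding of the current UniProt website API, so please check them against the UniProt help pages before relying on this; the function name `uniprot_fasta_url` is just made up for the example.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UniProt REST API for bulk (streamed) downloads.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_fasta_url(organism_id: int, reviewed_only: bool = True) -> str:
    """Build a download URL for all UniProtKB entries of one organism.

    organism_id is an NCBI taxonomy ID (9606 is human).
    reviewed_only restricts the result to curated (Swiss-Prot) entries.
    """
    terms = [f"organism_id:{organism_id}"]
    if reviewed_only:
        terms.append("reviewed:true")
    params = {"query": " AND ".join(terms), "format": "fasta"}
    return UNIPROT_STREAM + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the URL; it can be opened in a browser or fetched with any
    # download tool. Nothing is downloaded by this script itself.
    print(uniprot_fasta_url(9606))
```

Changing the taxonomy ID or dropping the reviewed filter gives you a different slice of the same database, which is why it helps to decide first what exactly you need.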
Thanks a lot for your reply! I am collecting as much protein sequence data as possible. I searched as you suggested above but did not find anything valuable.
What exactly didn't you find?
For example, a metagenome protein sequence database.
If there is nothing similar on Biostars, Google is your first choice. Chrome is the better browser for this as well.
https://www.google.com/search?source=hp&ei=A3dlXLm7DcjIrgSukbzADQ&q=metagenomics+protein+sequence+database&oq=metagenome+protein+sequence+database&gs_l=psy-ab.1.0.33i22i29i30.6422.6422..12106...0.0..0.242.306.1j0j1......0....2j1..gws-wiz.....0.oiryon8GYfk
There are many links inside; again, I don't know what exactly you need. For example (some of the NCBI links are papers describing these tools):
This review looks very informative:
A review of methods and databases for metagenomic classification and assembly
Florian P. Breitwieser, Jennifer Lu and Steven L. Salzberg
http://ccb.jhu.edu/people/salzberg/docs/Breitwieser-etal-2017-Metagenomics-review-reprint.pdf
https://www.uniprot.org/help/unimes
https://omictools.com/protein-database-search-category
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3794082/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337752/
Another option is Google Scholar: https://scholar.google.com
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-386
https://reader.elsevier.com/reader/sd/pii/S0950705118304933?token=BA1889DBEA721A9AACB49274C21FA766158A34A80C41C8FB156078EDC89DED188720C85CD3ACBA480F64C061C6C07F33