Hi, I have to get RNA and protein sequences for a large set of gene IDs. How can I do it without entering thousands of queries? Thanks.
You have at least these three options:
1) Use esearch and efetch from the Entrez E-utilities (eutils), which give scriptable access to tons of NCBI data: https://www.ncbi.nlm.nih.gov/books/NBK25498/#chapter3.Application_3_Retrieving_large
2) Bio.Entrez (I have never used it, but it might be an option): http://biopython.org/DIST/docs/api/Bio.Entrez-module.html It looks like it provides the same functionality as eutils in Entrez.
3) I personally like to download the data from FTP and then extract what I need at that moment. That way I can do lots of different extractions for different projects from the same data.
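To make option 1 a bit more concrete, here is a rough sketch of how the batching could look in plain Python against the E-utilities web endpoints. It assumes your IDs are NCBI Gene IDs, that ELink is used to go from genes to linked sequences, and that batches of 200 IDs with a short pause between requests are acceptable; the helper names (chunks, build_url, parse_link_ids, fetch_sequences) are made up for this example. Note that joining the IDs with commas returns one combined set of links, so the gene-to-sequence mapping is not kept.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def chunks(ids, size=200):
    """Split a list of IDs into batches of at most `size` (200 is an assumption)."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_url(tool, **params):
    """Build an E-utilities URL, e.g. .../efetch.fcgi?db=protein&id=..."""
    return f"{EUTILS}/{tool}.fcgi?{urllib.parse.urlencode(params)}"


def parse_link_ids(xml_text):
    """Collect the linked sequence IDs from an ELink XML response."""
    root = ET.fromstring(xml_text)
    return [link.findtext("Id") for link in root.iter("Link")]


def http_get(url):
    """Download a URL and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def fetch_sequences(gene_ids, db, linkname, pause=0.4):
    """Return FASTA text for the sequences in `db` linked to `gene_ids`."""
    fasta = []
    for batch in chunks(gene_ids):
        # Step 1: ask ELink which sequence records are linked to these genes.
        link_xml = http_get(build_url("elink", dbfrom="gene", db=db,
                                      linkname=linkname, id=",".join(batch)))
        time.sleep(pause)  # stay under the request rate limit
        # Step 2: fetch the linked records as FASTA, again in batches.
        for seq_batch in chunks(parse_link_ids(link_xml)):
            fasta.append(http_get(build_url("efetch", db=db, rettype="fasta",
                                            retmode="text",
                                            id=",".join(seq_batch))))
            time.sleep(pause)
    return "".join(fasta)
```

Something like fetch_sequences(my_ids, "nuccore", "gene_nuccore_refseqrna") should then give the RefSeq RNAs, and fetch_sequences(my_ids, "protein", "gene_protein_refseq") the RefSeq proteins. The link names are as I remember them from the Entrez link descriptions, so please check them before relying on this.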
Could you also please specify which databases you had in mind for your task? If you meant Ensembl, UniProt or something else, then only the third option will work.
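For the third option, the extraction step could look roughly like the sketch below. It assumes you have already downloaded a FASTA file and that the first word of each header line is the sequence ID you want to match on; the function names are made up for this example. Keep in mind that these files are usually keyed by transcript or protein accession rather than by gene ID, so you will probably need a gene-to-accession mapping first (for NCBI, the gene2refseq table on the FTP site is one such mapping).

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def extract_records(handle, wanted_ids):
    """Yield only the records whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    for header, sequence in read_fasta(handle):
        fields = header.split()
        seq_id = fields[0] if fields else ""
        if seq_id in wanted:
            yield header, sequence
```

You would open the downloaded file, pass the file object as handle together with your list of accessions, and write the returned records to a new file. Because the file is read line by line, this also works for large downloads.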
You can write a Python program using Biopython to retrieve these sequences. Alternatively, you can use MATLAB.