Retrieving FASTA sequences for a set of GenBank IDs, together with their corresponding protein sequences
7.7 years ago

Hi, I have to get RNA and protein sequences for a large set of gene IDs. How can I do this without entering thousands of queries? Thanks.

sequence • 1.8k views
7.7 years ago

You have at least these three options:

1) Use esearch and efetch from the Entrez E-utilities, which give command-line access to a large amount of NCBI data: https://www.ncbi.nlm.nih.gov/books/NBK25498/#chapter3.Application_3_Retrieving_large

2) Use Bio.Entrez from Biopython: http://biopython.org/DIST/docs/api/Bio.Entrez-module.html I have never used it, but it looks like it provides the same functionality as the E-utilities.

3) Download the data from the FTP site and then extract what you need locally. This is what I personally prefer, because the same download can serve many different extractions for different projects.
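As a rough illustration of option 1, here is a minimal sketch that batches a long ID list into efetch requests using only the Python standard library. It assumes the IDs are nucleotide accessions that the `nuccore` database accepts, which the question does not confirm. The function names, the batch size of 200, and the pause between requests are my own choices for illustration. The `rettype` value `fasta_cds_aa` asks efetch for the protein translations of the coding regions annotated on each nucleotide record.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(ids, size=200):
    """Yield successive lists of at most `size` IDs (200 is an assumed batch size)."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def efetch_url(ids, db="nuccore", rettype="fasta"):
    """Build one efetch URL for a batch of IDs; db and rettype defaults are assumptions."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    # Keep commas unescaped so the ID list stays readable in the URL.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


def fetch_all(ids, db="nuccore", rettype="fasta", pause=0.4):
    """Download every batch and return the concatenated FASTA text (needs network)."""
    chunks = []
    for batch in batches(ids):
        with urlopen(efetch_url(batch, db, rettype)) as response:
            chunks.append(response.read().decode())
        time.sleep(pause)  # stay under NCBI's limit of 3 requests per second without an API key
    return "".join(chunks)
```

With a list called `ids`, `fetch_all(ids)` would return the nucleotide FASTA records and `fetch_all(ids, rettype="fasta_cds_aa")` the translated coding sequences. If the IDs are Entrez Gene IDs rather than sequence accessions, they would first have to be mapped to sequence records (for example with elink), which this sketch does not cover.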

Could you also specify which databases you have in mind for this task? If you meant Ensembl, UniProt or something else, then only the third option will work.

7.7 years ago
Charles Yin ▴ 180

You can write a Python program using Biopython to retrieve these sequences. Alternatively, you can use MATLAB.
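A Biopython version could look roughly like the sketch below. It assumes Biopython is installed and that the IDs are stored one per line in a text file. The helper names and the `nuccore` default are hypothetical choices of mine, not something stated in this thread. `Bio.Entrez.efetch` is the Biopython wrapper around the efetch E-utility, and NCBI asks callers to supply a contact email.

```python
def parse_id_file(text):
    """Return unique IDs in first-seen order, skipping blank lines and '#' comments."""
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return list(dict.fromkeys(ids))  # drop duplicates, keep order


def fetch_fasta(ids, email, db="nuccore", rettype="fasta"):
    """Fetch FASTA text for the given IDs via Biopython (needs network access)."""
    # Imported here so that parse_id_file stays usable without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # contact address requested by NCBI
    handle = Entrez.efetch(db=db, id=",".join(ids), rettype=rettype, retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

A typical call would read the ID file with `parse_id_file(open("ids.txt").read())`, where `ids.txt` is a placeholder file name, and pass the result to `fetch_fasta` together with your own email address. For thousands of IDs it is safer to call `fetch_fasta` on slices of a few hundred IDs at a time rather than on the whole list at once.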
