Question

Entrez Direct to retrieve protein and coding sequences from NCBI accession

0

6.1 years ago

dllopezr ▴ 130

Hi everyone!

I have a bunch of FASTA files of gene sequences that I downloaded from NCBI through the Entrez Direct tool. I am wondering if it is possible to obtain the protein and coding sequences of these genes using the accession, which is in this format: NZ_CP006694.1:1104181-1105143

where the numbers following the : give the region of the sequence where the gene is located.

Can you help me with that?

Thank you so much

entrez ncbi Coding Sequences Retrieve protein • 1.8k views

1

An R-based solution would be to use the Bioconductor package biomaRt; please see my post here. Since you already have the exact chromosomal position, you can easily convert this to sequences. You can find the appropriate filters (chromosome, start and end position) using listFilters(ensembl), and the attribute (protein / DNA sequence) using listAttributes(ensembl).

6.1 years ago by thomaskuilman ▴ 850

score 1 · Answer 1 · 2018-10-25

1

6.1 years ago

vkkodali_ncbi ★ 3.8k

I think you can use EDirect for this as follows:

efetch -db nuccore -id 'NZ_CP006694.1' -seq_start 1104181 -seq_stop 1105143 -format fasta_cds_aa
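Since the question mentions many accessions in the ACCESSION:START-STOP form, here is a minimal sketch of how the command above might be wrapped for reuse. The function names parse_region and fetch_cds are hypothetical helpers, not part of EDirect. The sketch assumes efetch is installed and on the PATH, and that every region string has exactly one colon and one dash. The fasta_cds_na format is the nucleotide counterpart of fasta_cds_aa and should return the coding sequences rather than the protein translations.

```shell
# Split "ACCESSION:START-STOP" into three space-separated fields.
# (hypothetical helper; assumes exactly one ':' and one '-' in the range)
parse_region() {
    region=$1
    acc=${region%%:*}       # text before the colon
    range=${region#*:}      # text after the colon
    start=${range%%-*}      # text before the dash
    stop=${range#*-}        # text after the dash
    printf '%s %s %s\n' "$acc" "$start" "$stop"
}

# Fetch the CDS records annotated in a region.
# $1 = region string, $2 = efetch format: fasta_cds_aa (protein)
#      or fasta_cds_na (nucleotide coding sequence)
fetch_cds() {
    fmt=$2
    parsed=$(parse_region "$1")
    # Deliberately unquoted: word-split the three fields into $1 $2 $3.
    set -- $parsed
    efetch -db nuccore -id "$1" -seq_start "$2" -seq_stop "$3" -format "$fmt"
}
```

For example, fetch_cds 'NZ_CP006694.1:1104181-1105143' fasta_cds_aa would run the same command as above, and swapping in fasta_cds_na would request the coding sequence instead. To process a list, the function can be called once per line of a file of region strings.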
