Entrez direct E-utilities - "efetch" command to retrieve CDS with protein accessions does not work
8.2 years ago
al-ash ▴ 210

UPDATE: problem solved. It was just a typo, and the following line does what it is supposed to:

efetch -db protein -format fasta_cds_na -id XP_003399879.1

ORIGINAL REQUEST: I'm using Entrez Direct E-utilities to retrieve sequences by protein ID. Fetching the CDS for a protein ID is not working for me with the following command (shown with an example protein accession):

efetch -db protein -format fasta_cd_na -id XP_003399879.1

although the command to fetch the protein FASTA works:

efetch -db protein -format fasta -id XP_003399879.1

Could you point me towards a mistake? Or is it because the efetch command does not work this way? Thanks!

Entrez Direct E-utilities efetch CDS retrieve • 8.8k views

Curious: what kind of ID is that?

6.1 years ago
h.mon 35k

The problem is a typo in your command to retrieve the CDS: it should be -format fasta_cds_na, not -format fasta_cd_na. The following works:

efetch -db protein -format fasta_cds_na -id XP_003399879.1
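
Since a misspelled format name was the whole problem, a small guard around efetch can catch this kind of typo before any request is sent. The sketch below is only an illustration: is_known_format and fetch_protein are hypothetical helper names, not part of EDirect, and KNOWN_FORMATS is a short, hand-picked subset of format names that you would extend yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of EDirect: refuse format names that are
# not in a short, hand-maintained list before calling efetch.

# Partial list of efetch -format values for sequence records
# (an assumption for this sketch; extend it as needed).
KNOWN_FORMATS="fasta fasta_cds_na fasta_cds_aa gb gp"

# Return 0 if the given format name is in KNOWN_FORMATS, 1 otherwise.
is_known_format() {
  local wanted="$1" fmt
  for fmt in $KNOWN_FORMATS; do
    [ "$fmt" = "$wanted" ] && return 0
  done
  return 1
}

# Fetch a record for a protein accession; exits with status 2 on an
# unknown format name instead of calling efetch.
fetch_protein() {
  local format="$1" accession="$2"
  if ! is_known_format "$format"; then
    echo "unknown -format value: $format" >&2
    return 2
  fi
  efetch -db protein -format "$format" -id "$accession"
}
```

With these definitions loaded, fetch_protein fasta_cd_na XP_003399879.1 stops with an error message, while fetch_protein fasta_cds_na XP_003399879.1 runs the efetch command shown above.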
8.2 years ago
DCGenomics ▴ 330

The following EDirect commands will get the CDS FASTA from a protein accession:

elink -db protein -id XP_003399879.1 -target nuccore | \
  efilter -molecule mrna | \
  efetch -format fasta_cds_na

This solution doesn't work for me. It returns:

QueryKey value not found in filter input

QueryKey value not found in fetch input

8.2 years ago
piet ★ 1.9k

The coding sequence (CDS) is a nucleotide sequence, so you have to retrieve it from the 'nucleotide' database rather than from the 'protein' database. For XP_003399879.1, the coding sequence is XM_003399831.2:24..1547.
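
As a rough sketch of that route, the interval quoted in this answer can be turned into an efetch call against nuccore using the -seq_start and -seq_stop options. The coordinates are taken from this answer as-is, and fetch_cds_region is a hypothetical helper name used only for illustration. The arithmetic at the top is a quick sanity check that the interval length is a multiple of three.

```shell
#!/usr/bin/env bash
# CDS coordinates quoted in this answer for XM_003399831.2
# (1-based, inclusive; taken on trust from the post).
cds_start=24
cds_stop=1547

# Length of an inclusive interval is stop - start + 1.
cds_length=$(( cds_stop - cds_start + 1 ))
echo "CDS length: ${cds_length} nt, $(( cds_length / 3 )) codons"

# Hypothetical helper: fetch only the given interval of a nucleotide record.
# Arguments: accession, start, stop.
fetch_cds_region() {
  efetch -db nuccore -id "$1" -format fasta -seq_start "$2" -seq_stop "$3"
}
```

Calling fetch_cds_region XM_003399831.2 "$cds_start" "$cds_stop" would then request just that stretch of the mRNA record, assuming EDirect is installed and on the PATH.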


In other words, it is not possible to use efetch with a protein ID as input to obtain the CDS sequence directly, right? Instead, it is still necessary to first convert the protein ID to a gene ID. I'm a bit surprised that the tool cannot do this job. Anyway, thanks for your reply!
