Retrieve CDS sequence from XP accession number
Hi!
I'm looking for a faster way to retrieve CDS sequences for a list of NCBI protein accession numbers. I know how to do it manually, one accession at a time, but I need a quicker way to solve this problem.
Thank you
CS
Actual examples of @Sej's suggestion (output truncated to save space):
$ efetch -db nuccore -id XP_001563456 -format fasta_cds_na
>lcl|XM_001563406.1_cds_XP_001563456.1_1 [locus_tag=LBRM_15_0070] [db_xref=UniProtKB/TrEMBL:A4H7X7,GeneID:5413973] [protein=conserved hypothetical protein] [protein_id=XP_001563456.1] [location=1..1404] [gbkey=CDS]
ATGCCCTTGTCCTGCGTCGCCAAAGCTGAGGATGTCTTGCAGAAGACTGTGCATCTCTCCAGAGGCGGCC
TCTGCGCAGAGTTCACAGCGGAGGACATCCAGCGCATCACGGACGCCGACGTGCTCCGCTACCTCTCCAC
CCACTCTAATGCACGCACCGAATTGGACGGCGGTATCAACACCGCACCTGTTGAAAAGTCGCTCGCTCCT
GTGACGGGGGCGGCAGACATGGAGGTGCACATGGAGGCCTTGCAGGAGGCGATCAGCACATTTATTACAG
$ efetch -db nuccore -id XP_001563456 -format fasta_cds_aa
>lcl|XM_001563406.1_prot_XP_001563456.1_1 [locus_tag=LBRM_15_0070] [db_xref=UniProtKB/TrEMBL:A4H7X7,GeneID:5413973] [protein=conserved hypothetical protein] [protein_id=XP_001563456.1] [location=1..1404] [gbkey=CDS]
MPLSCVAKAEDVLQKTVHLSRGGLCAEFTAEDIQRITDADVLRYLSTHSNARTELDGGINTAPVEKSLAP
VTGAADMEVHMEALQEAISTFITVVDNEGCRYEIRVGALGHVQVPIDDDSYADGASLHEDEGDIEVAPAS
DAVHVGMSGEKSAVTEEATSAAVSRPSSEVTPAASHQKGWPVRRPQPSKPVRPARAAAHLSARVRQQNRF
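Each record header above carries a [protein_id=...] field, so when many CDS sequences come back in one file you can relabel every record with the protein accession it belongs to. This is a minimal sketch that assumes the header layout shown above; `relabel_by_protein_id` is a hypothetical helper name, not part of EDirect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read FASTA on stdin and replace each header that
# contains a [protein_id=...] field with just that protein accession.
# Headers without the field, and all sequence lines, pass through unchanged.
relabel_by_protein_id() {
  sed -E 's/^>.*\[protein_id=([^]]+)].*$/>\1/'
}
```

Usage, reusing the command from the example (cds.fna is a hypothetical output name): `efetch -db nuccore -id XP_001563456 -format fasta_cds_na | relabel_by_protein_id > cds.fna`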
You can use NCBI E-utilities (via Entrez Direct on the command line) for batch retrieval. More information is available at: https://www.ncbi.nlm.nih.gov/books/NBK179288/ and http://bioinformatics.cvr.ac.uk/blog/ncbi-entrez-direct-unix-e-utilities/
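A minimal sketch of batch retrieval, assuming your protein accessions are stored one per line in a plain-text file (the name accessions.txt is hypothetical). It joins them into the comma-separated list that efetch's -id option accepts and reuses the `-db nuccore -format fasta_cds_na` options from the single-accession example; `join_ids` and `fetch_cds` are illustrative helper names, not EDirect commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join a file of accessions (one per line) into a
# comma-separated string. Spaces, tabs and carriage returns are stripped
# (e.g. a stray trailing space or Windows line endings), and blank lines
# are dropped.
join_ids() {
  tr -d ' \t\r' < "$1" | grep -v '^$' | paste -sd, -
}

# Hypothetical helper: fetch the CDS nucleotide sequences for every
# accession in the file in a single efetch call (requires EDirect and
# network access).
fetch_cds() {
  efetch -db nuccore -id "$(join_ids "$1")" -format fasta_cds_na
}

# Usage: fetch_cds accessions.txt > cds.fna
```

Swapping `fasta_cds_na` for `fasta_cds_aa` in `fetch_cds` would return the translated sequences instead, as in the second example.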