Question

Convert mitochondrial RefSeq peptide IDs (YP_XX) using biomaRt

9.0 years ago

glocke01 ▴ 190

I downloaded human.1.protein.faa.gz from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/, which includes sequences for mitochondrial proteins such as this one:

>gi|251831107|ref|YP_003024026.1| NADH dehydrogenase subunit 1 (mitochondrion) [Homo sapiens]
MPMANLLLLIVPILIAMAFLMLTERKILGYMQLRKGPNVVGPYGLLQPFADAMKLFTKEPLKPATSTITLYITAPTLALT
IALLLWTPLPMPNPLVNLNLGLLFILATSSLAVYSILWSGWASNSNYALIGALRAVAQTISYEVTLAIILLSTLLMSGSF
NLSTLITTQEHLWLLLPSWPLAMMWFISTLAETNRTPFDLAEGESELVSGFNIEYAAGPFALFFMAEYTNIIMMNTLTTT
IFLGTTYDALSPELYTTYFVTKTLLLTSLFLWIRTAYPRFRYDQLMHLLWKNFLPLTLALLMWYVSMPITISSIPPQT

The difficulty is that I can't get biomaRt to recognize the ID YP_003024026. human.1.protein.faa includes RefSeq IDs like NP_12345, which biomaRt recognizes as "refseq_peptide", and XP_12345, which biomaRt recognizes as "refseq_peptide_predicted". I can't figure out how to get it to recognize the YP sequences. I want to find the corresponding Entrez Gene ID for each one.

In the meantime, there are only 13 YP_ entries, so I've solved this problem "by hand" by opening http://www.ncbi.nlm.nih.gov/protein/YP_003024026 and scrolling down to find the GeneID. I'd prefer to do it "the right way" in case the IDs change in the future.

Any advice?

refseq biomaRt • 2.0k views


score 2 · Answer 1 · 2016-05-16

9.0 years ago

Denise CS ★ 5.2k

Believe it or not, YP does get recognised as a RefSeq Protein ID. I've just tried it in the web interface of Ensembl BioMart and it worked. Use YP_003024026 instead of YP_003024026.1 (that is, drop the version suffix) and choose RefSeq protein ID(s) as your filter. You can have a mixed bag of NPs and YPs in your filter values and it will work just as well.
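For the biomaRt side, here is a minimal sketch of the same lookup, not something I have run against the current release. It assumes the `hsapiens_gene_ensembl` dataset and that the Entrez attribute is called `entrezgene_id` (older Ensembl releases used `entrezgene`; check with `listAttributes(mart)`). `strip_version` and `refseq_to_entrez` are made-up helper names, and NP_12345 is just the placeholder ID from the question.

```r
# Strip the version suffix (e.g. ".1") carried by the FASTA headers;
# the BioMart filter is given the bare accession.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Hypothetical helper: map curated RefSeq peptide IDs (NP_/YP_) to Entrez Gene IDs.
# Requires the biomaRt package and network access, so it is only defined here.
refseq_to_entrez <- function(ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("refseq_peptide", "entrezgene_id"),  # attribute name is an assumption
    filters    = "refseq_peptide",
    values     = strip_version(ids),
    mart       = mart
  )
}

ids <- strip_version(c("YP_003024026.1", "NP_12345.1"))
# refseq_to_entrez(ids)  # uncomment to query Ensembl
```

Doing the version stripping in code means the lookup keeps working if the accession versions in the FASTA file change later.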


YP seems to be another prefix for curated RefSeq proteins, according to this README. That explains why RefSeq Protein ID is the filter to use in BioMart: YP is treated like NP (curated), not like XP (predicted protein).
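If the input file mixes all three prefixes, the accessions can be grouped by filter before querying. This is a rough sketch that only encodes the mapping described in this thread (NP and YP to `refseq_peptide`, XP to `refseq_peptide_predicted`). `filter_for` is a made-up helper name, and the NP/XP IDs are the placeholders from the question.

```r
# Pick the BioMart filter name for each RefSeq protein accession,
# based only on the prefix before the underscore.
filter_for <- function(ids) {
  prefix <- sub("_.*$", "", ids)
  ifelse(prefix %in% c("NP", "YP"), "refseq_peptide",
         ifelse(prefix == "XP", "refseq_peptide_predicted", NA_character_))
}

ids <- c("YP_003024026", "NP_12345", "XP_12345")

# Group accessions by the filter they belong to; each group can then be
# passed to a separate getBM() call with the matching filter.
by_filter <- split(ids, filter_for(ids))
```

Any prefix not covered here comes back as `NA` and is dropped by `split`, so unexpected accession types are left out of the queries instead of being sent with the wrong filter.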

9.0 years ago by Denise CS ★ 5.2k