List of accession numbers for nucleotide sequences to protein sequences using R
7.2 years ago
arla_21 • 0

Hi, I'm sure this is simple, but I am quite new to the area, so be gentle. I have a list of accession numbers corresponding to full-length nucleotide sequences, and I want to use them to download the protein sequences for all of those records using rentrez. I can do this easily for one accession number:

library(rentrez)
search1 <- entrez_search(db="nuccore", term="JQ348844[ACCN]")
protein_links <- entrez_link(dbfrom="nuccore", id=search1$ids, db="all")
protein_seq <- entrez_fetch(db="protein", rettype="fasta", id=protein_links$links$nuccore_protein)

You can't input more than one accession into the term field of the first search. I'm sure this can be done with a simple loop or something similar, but I want a single file at the end containing the protein sequences for all of the input accession numbers.
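One way the loop described above might look is sketched below. It repeats the same search, link and fetch steps for each accession and writes all the FASTA records to one file. The helper names (accession_term, fetch_proteins_for_accession, fetch_proteins_to_file), the half-second pause and the file names are assumptions made for illustration and are not part of rentrez. The rentrez functions are called with the rentrez:: prefix, so the package has to be installed but does not need to be attached.

```r
# Hypothetical helper: build the search term for a single accession.
accession_term <- function(accession) {
  paste0(trimws(accession), "[ACCN]")
}

# Hypothetical helper: fetch the proteins linked to one nucleotide
# accession as a single FASTA string ("" if nothing is found).
fetch_proteins_for_accession <- function(accession) {
  search <- rentrez::entrez_search(db = "nuccore",
                                   term = accession_term(accession))
  if (length(search$ids) == 0) return("")

  links <- rentrez::entrez_link(dbfrom = "nuccore", id = search$ids,
                                db = "protein")
  protein_ids <- links$links$nuccore_protein
  if (length(protein_ids) == 0) return("")

  # Assumed pause between accessions to go easy on NCBI's servers.
  Sys.sleep(0.5)
  rentrez::entrez_fetch(db = "protein", id = protein_ids, rettype = "fasta")
}

# Hypothetical helper: loop over all accessions and write one FASTA file.
fetch_proteins_to_file <- function(accessions, out_file) {
  fasta <- vapply(accessions, fetch_proteins_for_accession, character(1))
  writeLines(fasta[nzchar(fasta)], out_file)
  invisible(out_file)
}

# Usage (needs network access to NCBI; file names are placeholders):
# accessions <- readLines("accessions.txt")
# fetch_proteins_to_file(accessions, "proteins.fasta")
```

Accessions with no search hit or no linked protein are skipped silently in this sketch; you may prefer to print a warning for those so that missing records are easy to spot.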

Sorry if this is a stupid question! Thanks in advance

R rentrez • 2.1k views
7.2 years ago
tarek.mohamed ▴ 370

Hi,

You can do this using the BSgenome package in R:

library("BSgenome")
available.genomes()   # genome packages available to install
installed.genomes()   # genome packages already installed
hg38_genome <- getBSgenome("BSgenome.Hsapiens.NCBI.GRCh38")
hg38_genome
seq <- getSeq(hg38_genome, target_genes)

Here, "target_genes" is a character vector containing the names of the sequences in hg38_genome from which to get the subsequences. Hope this is helpful! Tarek
