So I am trying to execute the following code on a list of IDs instead of an individual ID:
source("https://bioconductor.org/biocLite.R")
#install.packages('reutils')
#install.packages('Peptides')
#biocLite(pkgs = c('GenomeInfoDb','GenomicRanges'))
#install.packages('plyr')
#install.packages('devtools')
#devtools::install_github("gschofl/biofiles")
library(Peptides)
library(reutils)
library(Biostrings)
library(biofiles)
library(plyr)
library(stringr)
library(tibble)
#install.packages('data.table')
library(data.table)
#this is exactly the end format of the data frame I want, but for a list of UIDs instead of 1 UID like 124511
fetch <- efetch(124511, db=db, rettype = 'gp', retmode = retmode, retmax = returnAmount)
rec <- gbRecord(fetch)
seq <- getSequence((ft(rec)))
m <- as.data.frame(seq)
setnames(m, "x", "sequence")
protienName <- names(seq)
m <- add_column(m, protienName, .after = 0)
m$molecularweight <- mw(m$sequence)
m$m<- str_count(m$sequence, 'm')
m$cc <- str_count(m$sequence, 'cc')
logvec <- grepl('(Protein)|(Region)', m$protienName)
m <- subset(m, logvec)
The problem is that efetch() can only use one ID at a time, so I must either write a for loop or use an apply function over the list of protein IDs. If I took the code as is and ran it over a list, each iteration would overwrite the result of the previous one. Therefore I was hoping someone could help me append to the data.frame on each iteration, or show me a way to keep each iteration from replacing the previous one.
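For the "each iteration replaces the previous one" part, a minimal sketch of one common base R pattern is to build one data frame per ID with lapply() and stack them afterwards. The function build_table_for_id() and the IDs below are hypothetical placeholders standing in for the efetch/gbRecord/getSequence steps above, so the sketch runs without network access; it is not the actual pipeline.

```r
# Hypothetical stand-in for the per-ID pipeline in the question
# (efetch -> gbRecord -> getSequence -> data frame). The rows are
# made-up placeholder values, not real records.
build_table_for_id <- function(uid) {
  data.frame(
    uid = uid,
    protienName = paste0("feature_", seq_len(2)),
    sequence = c("mkcc", "mmac"),
    stringsAsFactors = FALSE
  )
}

uids <- c(1, 2, 3)  # placeholder IDs, not real UIDs

# lapply() returns a list with one data frame per ID,
# so no iteration overwrites another.
tables <- lapply(uids, build_table_for_id)

# Stack the per-ID data frames into a single data frame.
combined <- do.call(rbind, tables)
print(nrow(combined))
```

Since data.table is already loaded in the question, data.table::rbindlist(tables) would be an alternative way to do the final stacking step.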
efetch() can take a list of IDs: https://www.rdocumentation.org/packages/reutils/versions/0.2.2/topics/efetch
Wow... how about that. For some reason I really thought it couldn't! I will try just feeding it a list then. Thanks.