I am trying to loop over a data.frame in R that contains proteins and all of their subregions. The identifying column is geneid. The first occurrence of each geneid is always the whole protein, and the following occurrences are its subregions. I want to align each subregion with its whole protein to find the start and stop positions, and then add those positions back to the data frame. The data looks like this (see the imgur link: http://imgur.com/a/lRSrz):
This is the code I have so far. It doesn't work yet. I know it has some obvious errors, but I am trying to work through them:
library(Biostrings)  # provides matchPattern()

for(i in 1:length(keyplayers$geneid)) {
  id <- keyplayers$geneid[[i]]
  a <- i + 1
  while(keyplayers$geneid[[a]] == keyplayers$geneid[[i]]) {
    pat <- matchPattern(keyplayers$sequence[[a]], keyplayers$sequence[[i]])
    keyplayers$start[i] <- start(pat)
    keyplayers$end[i] <- end(pat)
  }
}
EDIT: I have been iterating through the code trying to find a solution. The code above now returns the same start and stop for every row, so I am getting close.
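A likely reason every row ends up with the same start and stop is that the loop writes into keyplayers$start[i] and keyplayers$end[i], the whole-protein row, instead of row a, the subregion that was just matched. There are a few other problems too: a is never incremented, so the inner while loop cannot advance; nothing stops a from running past the last row; and the outer for loop also treats every subregion row as if it were a whole protein. The sketch below is one possible fix, assuming (as described in the question) that the rows for each geneid are contiguous and the whole protein comes first. The toy keyplayers data is hypothetical, and to keep the sketch dependency-free it uses base R regexpr(..., fixed = TRUE) in place of Biostrings::matchPattern(). Both find exact matches, and only the first hit is kept here.

```r
# Hypothetical toy data in the layout described in the question:
# the first row of each geneid is the whole protein, later rows are subregions.
keyplayers <- data.frame(
  geneid   = c("g1", "g1", "g1", "g2", "g2"),
  sequence = c("MKTAYIAKQR", "TAYI", "KQR", "MSLLTEV", "LLT"),
  stringsAsFactors = FALSE
)

keyplayers$start <- NA_integer_
keyplayers$end   <- NA_integer_

n <- nrow(keyplayers)
i <- 1
while (i <= n) {
  a <- i + 1                       # first candidate subregion row
  # stop at the end of the data frame or when the geneid changes
  while (a <= n && keyplayers$geneid[a] == keyplayers$geneid[i]) {
    # exact substring search, standing in for Biostrings::matchPattern()
    hit <- regexpr(keyplayers$sequence[a], keyplayers$sequence[i], fixed = TRUE)
    if (hit > 0) {                 # regexpr returns -1 when there is no match
      keyplayers$start[a] <- as.integer(hit)
      keyplayers$end[a]   <- as.integer(hit) + attr(hit, "match.length") - 1L
    }
    a <- a + 1                     # advance, otherwise the loop never ends
  }
  i <- a                           # jump straight to the next whole protein
}
keyplayers
```

The results are written to row a, so each subregion keeps its own coordinates, while the whole-protein rows stay NA.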
Thanks in advance for the help.
Without looking at the rest of the code: haven't you forgotten a "1:" in
for(i in 1:length(x$geneid)){
to loop through the whole data frame?
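A side note on that loop header: 1:length(x) misbehaves when x is empty, because 1:0 evaluates to c(1, 0), so the loop body would run twice on an empty data frame. seq_along() (or seq_len(nrow(...))) avoids that. A small illustration with a hypothetical empty column:

```r
geneid <- character(0)          # hypothetical empty geneid column
idx_colon <- 1:length(geneid)   # c(1L, 0L): a for loop would run twice
idx_seq   <- seq_along(geneid)  # integer(0): a for loop would not run at all
```

So for(i in seq_along(x$geneid)) is the safer way to write the same loop.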
NB: It would be easier if you could provide the first lines of the data frame.
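One common way to share the first lines as copy-pasteable text rather than an image is dput(head(...)), which prints R code that rebuilds those rows. The data below is a hypothetical stand-in for the real keyplayers data frame:

```r
# hypothetical stand-in for the real keyplayers data
keyplayers <- data.frame(
  geneid   = c("g1", "g1", "g1", "g2"),
  sequence = c("MKTAYIAKQR", "TAYI", "KQR", "MSLLTEV"),
  stringsAsFactors = FALSE
)
# prints a structure(list(...)) expression for the first 3 rows
dput(head(keyplayers, 3))
```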
Good catch, I did forget that.
Ah, it is provided in the imgur link: http://imgur.com/a/lRSrz
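With that data layout, the same alignment can also be done without manual index bookkeeping. This is an alternative sketch, not the approach from the question: it marks each geneid's first row with !duplicated(), looks up the whole-protein sequence for every row with match(), and searches each row's sequence inside it. The toy data is hypothetical, and base R regexpr(..., fixed = TRUE) stands in for Biostrings::matchPattern(). Unlike a row-by-row loop, it does not require the rows of a geneid to be adjacent, only that the whole protein is the first occurrence.

```r
# Hypothetical toy data: the first row per geneid is the whole protein
keyplayers <- data.frame(
  geneid   = c("g1", "g1", "g1", "g2", "g2"),
  sequence = c("MKTAYIAKQR", "TAYI", "KQR", "MSLLTEV", "LLT"),
  stringsAsFactors = FALSE
)

# TRUE on the first row of each geneid, i.e. the whole protein
is_whole <- !duplicated(keyplayers$geneid)
# whole-protein sequence for every row, looked up by geneid
whole_seq <- keyplayers$sequence[is_whole][
  match(keyplayers$geneid, keyplayers$geneid[is_whole])]

# first exact (non-regex) match of each row's sequence in its whole protein
hits <- mapply(function(sub, whole) regexpr(sub, whole, fixed = TRUE),
               keyplayers$sequence, whole_seq,
               SIMPLIFY = FALSE, USE.NAMES = FALSE)
starts <- vapply(hits, as.integer, integer(1))
ends   <- starts + vapply(hits, attr, integer(1), which = "match.length") - 1L

# whole-protein rows and subregions with no match (-1) get NA
skip <- is_whole | starts < 0
keyplayers$start <- ifelse(skip, NA_integer_, starts)
keyplayers$end   <- ifelse(skip, NA_integer_, ends)
keyplayers
```

If there are many rows per protein, this avoids growing the data frame cell by cell inside a loop, at the cost of storing a whole-protein sequence per row in whole_seq.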