Hello,
I have a large file with 105 protein sequences. To obtain this file I used the 'seqinr' package:
library(seqinr)
# Read every protein as a single string (one list element per sequence)
myfasta <- read.fasta(file = "mydata.fasta", seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
# Table of sequence IDs to keep (column "ID")
subsetlist <- read.table("mylist.txt", header = TRUE)
# Keep only the sequences whose names appear in the ID list
my_fasta_sub <- myfasta[names(myfasta) %in% subsetlist$ID]
write.fasta(sequences = my_fasta_sub, names = names(my_fasta_sub), nbchar = 80, file.out = "myresult.fasta")
Next, I would like to search for several peptide sequences within these 105 protein sequences and, for each match, find the two amino acids immediately before the first residue of the peptide. Does anyone have an idea how I can do this? I'm new to R.
Example of peptide sequences:
DAIPAVEVFEGEPGNK
AVFQLLDSMGPSLPIAEYIASLDRPR
GFCFITFKEEEPVKK
HAFSGGRDTIEEHR
I would like to know which two amino acids precede each peptide, for example:
_ _ DAIPAVEVFEGEPGNK
_ _ AVFQLLDSMGPSLPIAEYIASLDRPR
_ _ GFCFITFKEEEPVKK
_ _ HAFSGGRDTIEEHR
Thanks in advance!
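The lookup asked for in the question can be sketched in base R with `gregexpr(fixed = TRUE)` to locate each peptide and `substr()` to cut out the residues in front of it. This is a minimal sketch under stated assumptions: the protein strings and the helper name `find_flanks` are hypothetical, each protein is assumed to be a single uppercase string (as `read.fasta(..., seqtype = "AA", as.string = TRUE)` should give, so `my_fasta_sub` could be passed in place of the toy list), and padding with `-` when a peptide starts at position 1 or 2 is an arbitrary choice mirroring the `_ _` placeholders.

```r
# Hypothetical toy data; with real data, pass my_fasta_sub instead.
proteins <- list(
  protA = "MKLVDAIPAVEVFEGEPGNKTT",
  protB = "HAFSGGRDTIEEHRQQ",
  protC = "MSTGFCFITFKEEEPVKK"
)
peptides <- c("DAIPAVEVFEGEPGNK", "HAFSGGRDTIEEHR", "GFCFITFKEEEPVKK")

# Hypothetical helper: for every peptide, scan every protein and record each
# exact match together with the n residues immediately upstream of it.
find_flanks <- function(proteins, peptides, n = 2) {
  rows <- list()
  for (pep in peptides) {
    for (id in names(proteins)) {
      prot_seq <- proteins[[id]]
      # fixed = TRUE: literal substring search, no regex interpretation
      hits <- gregexpr(pep, prot_seq, fixed = TRUE)[[1]]
      if (hits[1] == -1) next  # peptide not found in this protein
      for (start in hits) {
        before <- substr(prot_seq, max(1, start - n), start - 1)
        # Pad with "-" when the peptide is too close to the N-terminus
        before <- paste0(strrep("-", n - nchar(before)), before)
        rows[[length(rows) + 1]] <- data.frame(
          peptide = pep, protein = id, start = start, before = before,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  do.call(rbind, rows)  # NULL if nothing matched
}

res <- find_flanks(proteins, peptides)
print(res)
```

Note that `gregexpr()` does not report overlapping occurrences of the same peptide within one protein; for typical peptide lengths this rarely matters.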
Use zero-length assertions (lookarounds) in R. Without zero-length assertions:
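A zero-length assertion can be illustrated with a lookahead: the pattern `..(?=PEPTIDE)` matches any two residues only if the peptide follows them, but the peptide itself is not part of the match, so `regmatches()` returns just the two flanking residues. This is a sketch with hypothetical toy strings; lookarounds need `perl = TRUE`, the peptide is pasted into the pattern directly (safe here only because it consists of plain letters), `regexpr()` reports only the first occurrence per protein, and `..` cannot match when the peptide starts at position 1 or 2.

```r
# Hypothetical toy data
prot <- c(protA = "MKLVDAIPAVEVFEGEPGNKTT", protB = "HAFSGGRDTIEEHRQQ")
pep <- "DAIPAVEVFEGEPGNK"

# ".." = any two residues; "(?=...)" = zero-length lookahead (not consumed)
pattern <- paste0("..(?=", pep, ")")
m <- regexpr(pattern, prot, perl = TRUE)

# NA for proteins without a match, otherwise the two preceding residues
before <- rep(NA_character_, length(prot))
names(before) <- names(prot)
hit <- m != -1
before[hit] <- regmatches(prot, m)
print(before)
```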
If both sets of sequences are in the same order, i.e. the order of the protein sequences (pepseq1 above) exactly matches the order of the query sequences (pepseq above):
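When the i-th query peptide is known to belong to the i-th protein, each pair can be searched directly. The sketch below reuses the names `pepseq` (queries) and `pepseq1` (proteins) from the answer, but their contents here are hypothetical toy strings. Because `regexpr()` accepts only one pattern at a time, the pairs are walked with `mapply()`; `substr()` is already vectorised, so the flanking residues are extracted in one call.

```r
# Hypothetical toy data: pepseq[i] is expected to occur in pepseq1[i]
pepseq  <- c("DAIPAVEVFEGEPGNK", "GFCFITFKEEEPVKK", "HAFSGGRDTIEEHR")
pepseq1 <- c("MKLVDAIPAVEVFEGEPGNKTT", "MSTGFCFITFKEEEPVKK", "QQHAFSGGRDTIEEHR")

# Start position of each query in its own protein (-1 if absent);
# [1] drops the match.length attribute so mapply returns a plain vector
starts <- mapply(function(pep, prot) regexpr(pep, prot, fixed = TRUE)[1],
                 pepseq, pepseq1, USE.NAMES = FALSE)

# Two residues before each match; NA where the query was not found.
# If a peptide starts at position 1 or 2, fewer than two residues are returned.
before <- ifelse(starts > 0,
                 substr(pepseq1, pmax(1, starts - 2), starts - 1),
                 NA_character_)
print(data.frame(peptide = pepseq, start = starts, before = before))
```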