Hello,
I'm using the following code to read a text file containing more than 200 FASTA sequences and to align each of these proteins with itself.
seqs = read.fasta(file = "sequencefasta.txt", seqtype = "AA")
self_aligned_score = rep(NA, length(seqs))
for (i in 1:length(seqs)) {
  cat('aligning ', i, '/', length(seqs), ' with itself..\n')
  seq = paste(toupper(getSequence(seqs[i][[1]])), sep = "", collapse = "")
  align_score = pairwiseAlignment(seq, seq, scoreOnly = TRUE, gapExtension = 0.5, type = "local", substitutionMatrix = "BLOSUM50")
  self_aligned_score[i] = align_score
}
But I get the following error, and I have no idea what causes it.
Error in .Call2("XStringSet_align_pairwiseAlignment", pattern, subject, : key 32 not in lookup table
Here is a snippet of my sequencefasta.txt file:
>sp|O00141|SGK1_HUMAN Serine/threonine-protein kinase Sgk1 OS=Homo sapiens GN=SGK1 PE=1 SV=2
MTVKTEAAKGTLTYSRMRGMVAILIAFMKQRRMGLNDFIQKIANNSYACKHPEVQSILKI
SQPQEPELMNANPSPPPSPSQQINLGPSSNPHAKPSDFHFLKVIGKGSFGKVLLARHKAE
EVFYAVKVLQKKAILKKKEEKHIMSERNVLLKNVKHPFLVGLHFSFQTADKLYFVLDYIN
GGELFYHLQRERCFLEPRARFYAAEIASALGYLHSLNIVYRDLKPENILLDSQGHIVLTD
FGLCKENIEHNSTTSTFCGTPEYLAPEVLHKQPYDRTVDWWCLGAVLYEMLYGLPPFYSR
NTAEMYDNILNKPLQLKPNITNSARHLLEGLLQKDRTKRLGAKDDFMEIKSHVFFSLINW
DDLINKKITPPFNPNVSGPNDLRHFDPEFTEEPVPNSIGKSPDSVLVTASVKEAAEAFLG
FSYAPPTDSFL
>sp|O00311|CDC7_HUMAN Cell division cycle 7-related protein kinase OS=Homo sapiens GN=CDC7 PE=1 SV=1
MEASLGIQMDEPMAFSPQRDRFQAEGSLKKNEQNFKLAGVKKDIEKLYEAVPQLSNVFKI
EDKIGEGTFSSVYLATAQLQVGPEEKIALKHLIPTSHPIRIAAELQCLTVAGGQDNVMGV
KYCFRKNDHVVIAMPYLEHESFLDILNSLSFQEVREYMLNLFKALKRIHQFGIVHRDVKP
SNFLYNRRLKKYALVDFGLAQGTHDTKIELLKFVQSEAQQERCSQNKSHIITGNKIPLSG
PVPKELDQQSTTKASVKRPYTNAQIQIKQGKDGKEGSVGLSVQRSVFGERNFNIHSSISH
ESPAVKLMKQSKTVDVLSRKLATKKKAISTKVMNSAVMRKTASSCPASLTCDCYATDKVC
SICLSRRQQVAPRAGTPGFRAPEVLTKCPNQTTAIDMWSAGVIFLSLLSGRYPFYKASDD
LTALAQIMTIRGSRETIQAAKTFGKSILCSKEVPAQDLRKLCERLRGMDSSTPKLTSDIQ
GHASHQPAISEKTDHKASCLVQTPPGQYSGNSFKKGDSNSCEHCFDEYNTNLEGWNEVPD
EAYDLLDKLLDLNPASRITAEEALLHPFFKDMSL
Which libraries are you using to import the fasta and to align the sequences?
I use these:
I think the issue might be that you have some non-standard characters in your sequences. Can you check that they contain only amino acid letters, with no "X"s and no other characters that do not correspond to an amino acid?
I checked for "X"s and there were none. Besides, the Wikipedia article on amino acids lists X as a valid letter for an unknown residue: https://www.wikiwand.com/en/Amino_acid#/Table_of_standard_amino_acid_abbreviations_and_properties
Yes, it is standard, but maybe for some reason this library doesn't recognise it. Did you check for all characters that are not standard amino acids, or just X?
I checked for X, U, O, B, Z, and J. None of them were present.
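Rather than checking letter by letter, it may be easier to tabulate every character that is not one of the 20 standard amino acids, including whitespace and other non-letters. The sketch below uses base R only; find_unexpected_chars is a hypothetical helper name, and the toy input is made up for illustration, not taken from your file. It also reports each character's numeric code, on the assumption that the "key" in the error message is the byte value of the offending character. If that assumption holds, key 32 would be a space, since utf8ToInt(" ") is 32.

```r
# The 20 standard amino acid one-letter codes.
standard_aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Hypothetical helper: tabulate every character outside `allowed`
# across a character vector of sequences.
find_unexpected_chars <- function(sequences, allowed = standard_aa) {
  chars <- unlist(strsplit(toupper(sequences), ""))
  chars <- chars[nzchar(chars)]          # drop empty strings from empty inputs
  bad <- chars[!chars %in% allowed]
  if (length(bad) == 0) {
    return(data.frame(char = character(0), code = integer(0),
                      count = integer(0), stringsAsFactors = FALSE))
  }
  counts <- table(bad)
  data.frame(
    char = names(counts),
    code = vapply(names(counts), utf8ToInt, integer(1)),  # numeric code of each character
    count = as.integer(counts),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy input (illustrative only): the second sequence ends with a space.
toy <- c("MTVKTEAAKG", "MEASLGIQMD ")
result <- find_unexpected_chars(toy)
print(result)
```

To run this on the real data, the same paste(toupper(getSequence(...)), collapse = "") expression from the loop could be used to build the character vector. If spaces or other whitespace do turn up, removing them with gsub("[[:space:]]", "", seq) before calling pairwiseAlignment would be one thing to try.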