Hi everyone, I am very new to the R environment. I am trying to produce a final FASTA file by selecting a subset of sequences from a downloaded FASTA file, using seqinr. For this purpose I wrote the following script:
myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
subsetlist<-read.table("list.txt", header=TRUE)
myfasta[names(myfasta) %in% subsetlist$id]
write.fasta(sequences = myfasta, names = names(myfasta), nbchar = 80, file.out = "parsedMRP2.fasta")
parsedMRP2<- read.fasta("parsedMRP2.fasta", set.attributes = FALSE)
MRP2.fas contains 5039 sequences, while list.txt lists the 3000 sequences I want to keep. When I run my script, the final file (parsedMRP2.fasta) always contains all 5039 sequences instead of the 3000 listed in list.txt.
Could you please help me? Thank you very much in advance.
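The likely cause is the subsetting line: myfasta[names(myfasta) %in% subsetlist$id] computes the subset but never stores it, so write.fasta still receives the full myfasta. Below is a minimal sketch of the fix, assuming the IDs in the id column match the FASTA names exactly (seqinr takes the name as the header text up to the first whitespace). The temporary files, the toy sequences seq1 to seq3, and the object name subfasta are made up for illustration; swap in MRP2.fas, list.txt and parsedMRP2.fasta.

```r
library(seqinr)

# Toy inputs standing in for MRP2.fas and list.txt (hypothetical contents).
fasta_in <- tempfile(fileext = ".fas")
writeLines(c(">seq1", "MKV", ">seq2", "MAA", ">seq3", "MTT"), fasta_in)
list_in <- tempfile(fileext = ".txt")
writeLines(c("id", "seq1", "seq3"), list_in)

myfasta <- read.fasta(file = fasta_in, seqtype = "AA",
                      as.string = TRUE, set.attributes = FALSE)
subsetlist <- read.table(list_in, header = TRUE)

# Store the filtered list in a new object: subsetting alone
# does not modify myfasta in place.
subfasta <- myfasta[names(myfasta) %in% subsetlist$id]

# Write the subset, not the original list.
fasta_out <- tempfile(fileext = ".fasta")
write.fasta(sequences = subfasta, names = names(subfasta),
            nbchar = 80, file.out = fasta_out)

# Read the result back and count the records to confirm the filter worked.
parsed <- read.fasta(fasta_out, seqtype = "AA",
                     as.string = TRUE, set.attributes = FALSE)
length(parsed)
```

If the count is still lower than expected after this change, comparing setdiff(subsetlist$id, names(myfasta)) against the list shows which IDs did not match any FASTA name.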
If you need an alternative, faSomeRecords from Jim Kent's utilities is a great, fast option. The Linux version is linked, but a macOS version is available as well. After downloading, do not forget to add execute permissions (chmod a+x faSomeRecords).