I am trying to subset a FASTA file as described in another post (How To Extract Multiple Fasta Sequences At A Time From A File Containing Sequences Ids Using R). I simply want to extract a subset of sequences from a FASTA file, given a list of sequence IDs.
I load the package and the data as described in the other thread:
library("seqinr")
fastafile <- read.fasta(file = "proteins.fasta",
                        seqtype = "AA", as.string = TRUE, set.attributes = FALSE)
I also load the list of IDs I want to use for subsetting (the column in subsetlist is called id):
subsetlist<-read.table("~/scripts/subsetfasta/test.txt", header=TRUE)
When I attempt to use the solution from the previous thread:
fastafile[names(fastafile) %in% subsetlist$id]
I get the following:
named list()
What am I doing wrong or missing?
Best regards,
Henrik
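An empty named list() from %in% means that none of the names in fastafile are exactly equal to any value in subsetlist$id. The actual cause cannot be determined from the post, but a first step is to look at both sides of the comparison. The sketch below uses base R only, with a made-up named list standing in for the output of read.fasta and a made-up data frame standing in for test.txt; the sequence names and the trailing-space problem are assumptions chosen for illustration. Note that if the column were not actually called id, subsetlist$id would be NULL and the subset would also come back empty.

```r
# Hypothetical stand-in for the list returned by read.fasta();
# the list names play the role of the sequence IDs.
fastafile <- list(
  "sp|P12345|PROT1" = "MKTAYIAKQR",
  "sp|P67890|PROT2" = "MSLNEQW",
  "sp|Q11111|PROT3" = "MGDVEK"
)

# Hypothetical stand-in for read.table(..., header = TRUE).
# The first ID carries a trailing space (an assumed problem for this demo).
subsetlist <- data.frame(
  id = c("sp|P12345|PROT1 ", "sp|Q11111|PROT3"),
  stringsAsFactors = FALSE
)

# 1. Check that the column exists and inspect both sides of the match.
str(subsetlist)
is.null(subsetlist$id)
head(names(fastafile))
head(subsetlist$id)

# 2. See which IDs match exactly as they are.
intersect(names(fastafile), subsetlist$id)

# 3. Normalise the IDs (character type, no surrounding whitespace), then subset.
ids <- trimws(as.character(subsetlist$id))
subset_fasta <- fastafile[names(fastafile) %in% ids]
names(subset_fasta)
```

If intersect() returns character(0) on the real data, comparing the output of head(names(fastafile)) and head(subsetlist$id) side by side usually shows how the two differ, for example extra description text in the FASTA names or a header line that was read as data.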
Dear all,
Does someone know how to do partial matching based on a column in a data frame (here, the column id in the data frame subsetlist), for the case where id contains only part of each sequence name in the FASTA file? %in% needs a complete match.
Thanks! ellen
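One possible approach to the partial-matching question, as a minimal base R sketch: test each sequence name for whether it contains any of the IDs as a substring, using grepl with fixed = TRUE so that the IDs are treated as literal text rather than regular expressions. The list and data frame below are made-up stand-ins for the read.fasta output and the ID table, not data from this thread.

```r
# Hypothetical stand-in for the list returned by read.fasta().
fastafile <- list(
  "sp|P12345|PROT1" = "MKTAYIAKQR",
  "sp|P67890|PROT2" = "MSLNEQW",
  "sp|Q11111|PROT3" = "MGDVEK"
)

# Hypothetical ID table: each id is only part of a sequence name.
subsetlist <- data.frame(id = c("P12345", "Q11111"), stringsAsFactors = FALSE)
ids <- as.character(subsetlist$id)

# For every sequence name, check whether at least one ID occurs inside it.
# fixed = TRUE matches the ID literally, so characters such as "|" or "."
# are not interpreted as regular-expression syntax.
keep <- vapply(
  names(fastafile),
  function(nm) any(vapply(ids, function(id) grepl(id, nm, fixed = TRUE), logical(1))),
  logical(1)
)

subset_fasta <- fastafile[keep]
names(subset_fasta)
```

Substring matching can select more than intended: an ID such as P1234 would also match a name containing P12345. If the IDs always sit in a fixed field of the name, extracting that field first and then using %in% is stricter.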
Please open a new question and add a reproducible example.