Hello everyone,
I apologise in advance if the terminology used in the title is misleading; I am not totally familiar with all object type terms, but I believe what I have posted is at least mostly correct.
I have a script for extracting sequences from a phyDat object (see packages 'ape' and 'phangorn') in R that is based on using 'subset' and a character vector of the column names I wish to retain. See code below:
newalign <- as.phyDat(subset(aligndf, select = seqkeep))
In this case, 'aligndf' is the complete original alignment that has been transformed into a data frame in an earlier part of the script. Here I use 'subset' and 'select' to generate a new alignment object via 'as.phyDat' that consists only of the sequences whose names are contained in the object 'seqkeep'. As an example, the contents of 'seqkeep' look like the following:
[1] "hominin23" [2] "hominin33" [3] "hominin47"
This procedure works well and gives me exactly what I wanted: a new alignment that consists only of the sequences given in 'seqkeep'.
When I then try to build a second alignment that consists only of the sequences not in 'seqkeep', I run into a problem. No matter what I try, the resulting alignment is the complete original alignment, which still includes the 'seqkeep' sequences.
Here are my most recent attempts based on some guides I have seen online:
remainalign <- as.phyDat(subset(aligndf, aligndf =! seqkeep))
remainalign <- as.phyDat(subset(aligndf, !(aligndf == seqkeep)))
Could anyone advise me on how to do this correctly in R?
Thank you for your help.
See the Warning section of ?subset:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
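Following that advice, here is a minimal sketch of the complement selection using '[' and column names. The toy data frame and the names 'hominin51', 'hominin60', 'keepdf', 'seqdrop', 'remaindf' and 'remaindf2' are made up for illustration; I am assuming, as in the question, that 'aligndf' has one column per sequence. As far as I can tell, the first attempt returns everything because 'aligndf =! seqkeep' is parsed as a named argument 'aligndf = !seqkeep', which the data frame method of 'subset' does not use, so no filtering happens. The second attempt compares the cell contents of the data frame with the names, not the column names themselves.

```r
# Toy stand-in for 'aligndf': one column per sequence, one row per site.
# (Hypothetical data; the real 'aligndf' comes from the original alignment.)
aligndf <- data.frame(
  hominin23 = c("a", "c", "g"),
  hominin33 = c("a", "c", "t"),
  hominin47 = c("a", "t", "g"),
  hominin51 = c("g", "c", "g"),
  hominin60 = c("a", "c", "c"),
  stringsAsFactors = FALSE
)
seqkeep <- c("hominin23", "hominin33", "hominin47")

# Sequences to keep, selected by column name with `[` instead of subset().
keepdf <- aligndf[, seqkeep, drop = FALSE]

# Complement: every column whose name is NOT in 'seqkeep'.
seqdrop <- setdiff(names(aligndf), seqkeep)
remaindf <- aligndf[, seqdrop, drop = FALSE]

# Equivalent form using a logical index on the column names.
remaindf2 <- aligndf[, !(names(aligndf) %in% seqkeep), drop = FALSE]

names(remaindf)
# [1] "hominin51" "hominin60"
```

With the real data, the result would then be wrapped in 'as.phyDat' exactly as in the working line from the question, e.g. as.phyDat(remaindf). The 'drop = FALSE' keeps the result a data frame even if only one sequence remains.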