I have a sequence alignment in FASTA format containing 219 sequences. I am testing a new phylogenetic method and I am curious how subsets of differing sizes and compositions drawn from my full alignment might affect which sites are selected for tree building, and thus the tree topology.
I am using 'ape' and 'phangorn' in R and have found that I can subset defined sequences using the following method:
library(phangorn)
testalign <- read.phyDat("alignment.fasta", format = "fasta", type = "DNA")
sub10 <- subset(testalign, subset = 1:10)
In this case I am creating a subset of sequences 1 through 10. Ideally I would like to extract subsets of this alignment, each of a random size between 3 and 218, and then write these subsets out as individual alignment files. I would prefer, of course, that the sequences in each subset not be taken in the order in which they appear in the original file (i.e. not 1:10, but 10 sequences drawn at random from the alignment of 219).
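To make the goal concrete, here is a rough sketch of the kind of thing I am picturing, and I am not sure it is the right approach. The helper name write_random_subsets, its arguments (n_subsets, min_size, max_size, out_dir) and the output file naming are all placeholders I made up; only read.phyDat, subset and write.phyDat come from phangorn.

```r
library(phangorn)

# Hypothetical helper: draw n_subsets random subsets of the alignment and
# write each one to its own FASTA file. Returns the subset sizes invisibly.
write_random_subsets <- function(align, n_subsets = 10, min_size = 3,
                                 max_size = length(align) - 1, out_dir = ".") {
  n_seqs <- length(align)          # number of sequences in the phyDat object
  sizes <- integer(n_subsets)
  for (i in seq_len(n_subsets)) {
    # Random subset size in [min_size, max_size]; sample.int avoids the
    # surprising behaviour of sample() when given a single number
    k <- min_size + sample.int(max_size - min_size + 1, 1) - 1
    # k distinct sequence indices, drawn without replacement in random order
    idx <- sample.int(n_seqs, k)
    sub <- subset(align, subset = idx)
    # Assumed naming scheme: replicate number plus subset size
    out <- file.path(out_dir, sprintf("subset_%03d_n%d.fasta", i, k))
    write.phyDat(sub, file = out, format = "fasta")
    sizes[i] <- k
  }
  invisible(sizes)
}

# Intended use with my alignment (219 sequences, so sizes fall in 3..218):
# testalign <- read.phyDat("alignment.fasta", format = "fasta", type = "DNA")
# set.seed(1)  # optional, to make the draws reproducible
# write_random_subsets(testalign, n_subsets = 20)
```

Each call to sample.int(n_seqs, k) draws without replacement, so no sequence appears twice within a subset, but different subsets may overlap.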
Could anyone advise on how I might achieve this?