Hi everyone,
I have a DNA sequence alignment in .fasta format that I'd like to read into R. I then want to choose one of a given set of variant sites (columns, if we think of the alignment as a matrix whose rows are sequences) and randomise that site x times, producing x alignments that are identical except for the shuffling of the site of interest.
Importantly, I want to keep the base composition of the site. For example, if a site has a 'C' frequency of 0.7 and a 'T' frequency of 0.3, I would like to retain this. All I want to do is shuffle which sequences have which nucleotide.
Does anyone know of a software package that can do this? Or, alternatively, of a quick way in R to isolate the column of interest and randomly rearrange its contents?
Thank you
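For reference, here is a minimal sketch of the whole workflow. It is written in Python rather than R and assumes an aligned FASTA file in which every sequence has the same length. The helper names (`read_fasta`, `shuffle_site`, `write_fasta`) and the 0-based site index are hypothetical. The key step is a random permutation of one column. Because a permutation only reorders the bases, the site's composition (e.g. C = 0.7, T = 0.3) is unchanged by construction. In R, the same permutation step is `sample()` applied to the column vector.

```python
import random
from collections import Counter


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def shuffle_site(records, site, rng):
    """Return a copy of the alignment with column `site` (0-based) permuted."""
    column = [seq[site] for _, seq in records]
    rng.shuffle(column)  # same bases, reassigned to different sequences
    return [(h, seq[:site] + base + seq[site + 1:])
            for (h, seq), base in zip(records, column)]


def write_fasta(records, path):
    """Write (header, sequence) pairs to `path` in FASTA format."""
    with open(path, "w") as fh:
        for header, seq in records:
            fh.write(f">{header}\n{seq}\n")


if __name__ == "__main__":
    # Toy alignment standing in for a real file (hypothetical data).
    aln = read_fasta([">s1", "ACGT", ">s2", "ACTT", ">s3", "AGTT", ">s4", "ACTA"])
    rng = random.Random(42)  # seeded so the replicates are reproducible
    site, x = 1, 5
    replicates = [shuffle_site(aln, site, rng) for _ in range(x)]
    for rep in replicates:
        # Base composition of the shuffled site is preserved in every replicate.
        assert Counter(s[site] for _, s in rep) == Counter(s[site] for _, s in aln)
```

With a real file, `with open("alignment.fasta") as fh: aln = read_fasta(fh)` reads the alignment. Calling `write_fasta(rep, f"shuffled_{i}.fasta")` for each replicate then produces the x output alignments (file names are placeholders).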
You might want to shuffle the nucleotides within the chosen column using the Fisher-Yates algorithm.
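The Fisher-Yates shuffle mentioned above can be sketched as follows, using the in-place (Durstenfeld) variant. Applied to the list of bases in one alignment column, it produces every ordering with equal probability. It only changes which sequence gets which base, never which bases are present. Python's own `random.shuffle` already implements this algorithm, so this version is purely illustrative. The function name `fisher_yates` is hypothetical.

```python
import random


def fisher_yates(items, rng=random):
    """Shuffle `items` in place (Durstenfeld's version of Fisher-Yates)."""
    # Walk from the last index down to 1, swapping each element with a
    # uniformly chosen element at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        items[i], items[j] = items[j], items[i]
    return items


column = list("CCCCCCCTTT")  # e.g. a site with C = 0.7, T = 0.3
shuffled = fisher_yates(column[:], random.Random(1))
print("".join(shuffled))  # still seven C's and three T's, in a new order
```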
Did you try fasta-shuffle-letters from the MEME Suite? (http://meme-suite.org/doc/fasta-shuffle-letters.html)